pairwise alignment prelab.pdf
-
7/27/2019 Pairwise Alignment prelab.pdf
1/87
PAIRWISE ALIGNMENT
-
SEQUENCES ARE RELATED
Darwin: all organisms are related through descent with modification
Sequences are related through descent with modification
Similar molecules have similar functions in different organisms
Phylogenetic tree based on ribosomal RNA: three domains of life
-
WHY COMPARE SEQUENCES?
To determine evolutionary relationships
To decide if two proteins (or genes) are related structurally or functionally
Example: Protein 1 binds oxygen, and Protein 2 shows sequence similarity to it. Does Protein 2 also bind oxygen?
-
WHY COMPARE SEQUENCES?
To identify domains or motifs that are shared between proteins
-
TERMINOLOGIES
Similarity: the extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation.
Identity: the extent to which two sequences are invariant, i.e., carry the same residue at aligned positions.
-
TERMINOLOGIES
Conservation: changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physicochemical properties of the original residue.
-
CONSERVED RESIDUES
Residues conserved among various G protein-coupled receptors are highlighted in green
-
CONSERVATION OF FUNCTION
Alignments can reveal which parts of the sequences are likely to be important for function, if the proteins are involved in similar processes.
In parts of a protein sequence that are not very critical for its function, random mutations can easily accumulate.
In parts of the sequence that are critical for the function of the protein, hardly any mutations are accepted; nearly all changes in such regions would destroy the function.
-
INSERTIONS/DELETIONS AND PROTEIN STRUCTURE
Loop structures: insertions/deletions here are not so significant
Why is it that two similar sequences may have large insertions/deletions?
Some insertions and deletions may not significantly affect the structure of a protein
-
COMPARING THE PROTEIN KINASE KRAF_HUMAN AND THE UNCHARACTERIZED O22558 FROM ARABIDOPSIS USING BLAST
546 AA
Score = 185 bits (464), Expect = 1e-45
Identities = 107/283 (37%), Positives = 172/283 (59%), Gaps = 15/283 (5%)
Query: 337 DSSYYWEIEASEVMLSTRIGSGSFGTVYKGKWHG-DVAVKILKVVDPTPEQFQAFRNEVA 395
D + WEI+ +++ + ++ SGS+G +++G + +VA+K LK E + F EV
Sbjct: 274 DGTDEWEIDVTQLKIEKKVASGSYGDLHRGTYCSQEVAIKFLKPDRVNNEMLREFSQEVF 333
Query: 396 VLRKTRHVNILLFMGYMTKD-NLAIVTQWCEGSSLYKHLHVQETKFQMFQLIDIARQTAQ 454
++RK RH N++ F+G T+ L IVT++ S+Y LH Q+ F++ L+ +A A+
Sbjct: 334 IMRKVRHKNVVQFLGACTRSPTLCIVTEFMARGSIYDFLHKQKCAFKLQTLLKVALDVAK 393
Query: 455 GMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDFGLATVKSRWSGSQQVEQPTGSVLWMAP 514
GM YLH NIIHRD+K+ N+ + E VK+ DFG+A V+ SG E TG+ WMAP
Sbjct: 394 GMSYLHQNNIIHRDLKTANLLMDEHGLVKVADFGVARVQIE-SGVMTAE--TGTYRWMAP 450
Query: 515 EVIRMQDNNPFSFQSDVYSYGIVLYELMTGELPYSHINNRDQIIFMVGRGYASPDLSKLY 574
EVI ++ P++ ++DV+SY IVL+EL+TG++PY+ + + +V +G P + K
Sbjct: 451 EVI---EHKPYNHKADVFSYAIVLWELLTGDIPYAFLTPLQAAVGVVQKG-LRPKIPK-- 504
Query: 575 KNCPKAMKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN 617
K PK +K L+ C + E+RPLF +I IE+LQ + ++N
Sbjct: 505 KTHPK-VKGLLERCWHQDPEQRPLFEEI---IEMLQQIMKEVN 543
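The Identities and Gaps lines in a report like this are column counts over the aligned region. A minimal sketch of that bookkeeping, assuming the two aligned rows are given as equal-length strings with '-' for gaps; `alignment_stats` is a hypothetical helper, not BLAST's own code, and the Positives count is left out because it additionally needs a scoring matrix such as BLOSUM62.

```python
def alignment_stats(row1: str, row2: str) -> dict:
    """Count identical and gapped columns in a pairwise alignment.

    Assumes both aligned rows have the same length and use '-' for gaps.
    Percentages are taken over the full alignment length, gap columns included.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    length = len(row1)
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(1 for a, b in zip(row1, row2) if a == b and a != "-")
    # A column is a gap column when either row carries '-'.
    gaps = sum(1 for a, b in zip(row1, row2) if a == "-" or b == "-")
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100 * identities / length if length else 0.0,
        "percent_gaps": 100 * gaps / length if length else 0.0,
    }


# Columns 31-40 of the first Query/Sbjct block of the alignment above.
stats = alignment_stats("KWHG-DVAVK", "TYCSQEVAIK")
print(stats["identities"], stats["gaps"], stats["percent_identity"])  # 3 1 30.0
```

Only V, A and K are identical in these ten columns, and the single '-' counts as one gap column.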
-
SIMILARITY AND HOMOLOGY
Similarity can be expressed as a percentage. It does not imply any reason for the observed sameness.
Homology is an evolutionary term used to describe a relationship via descent from a common ancestor.
Homologous things are often similar, but not always (e.g., a whale flipper and a human arm).
Homology is NEVER expressed as a percentage.
-
HOMOLOGY
Homologous sequences can be divided into three groups:
Orthologous sequences: sequences that diverged due to a speciation event (e.g., human β-globin and mouse β-globin).
Paralogous sequences: sequences that diverged due to a gene duplication event (e.g., human α-globin and human β-globin, and the various versions of both).
Xenologous sequences: sequences for which the history of one of them involves interspecies transfer since the time of their common ancestor.
-
HOMOLOGY
-
SIMILARITY AND HOMOLOGY
Sequence homology can be reliably inferred from statistically significant similarity over a majority of the sequence length.
Non-homology CANNOT be inferred from non-similarity, because non-similar things can still share a common ancestor.
Homologous proteins share common structures, but not necessarily common sequence or function.
Homology is all or nothing. There is no such thing as 50% homology.
-
QUESTION 1
True or false: homology is synonymous with similarity.
-
SEARCHING SEQUENCE DATABASES
When we search a sequence database, we are usually looking for related sequences.
Unfortunately, the algorithms we have for searching databases do not search for homology; they search for similarity.
When similarity is found, we must determine whether it is a result of homology or comes from another source.
-
WHY SEARCH FOR SIMILARITY?
I have just sequenced something. What is known about the thing I sequenced?
I have a unique sequence. Is there similarity to another gene that has a known function?
I found a new protein in a lower organism. Is it similar to a protein from another species?
I have decided to work on a new gene. The people in the field will not give me the plasmid. I need the complete cDNA sequence to perform RT-PCR or some other experiment.
-
SEQUENCE ALIGNMENT: DEFINITION
The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
-
SEQUENCE ALIGNMENT
Comparing sequences provides clues as to which genes or proteins may have the same function
Sequences are compared by aligning them: sliding them along each other to find the most matches with few gaps
An alignment can be scored: count matches, and penalize mismatches and gaps
-
QUESTION 2
Whenever possible, it is better to
A. Compare proteins than to compare genes
B. Compare genes than to compare proteins
Discuss as a group and cite points that defend your argument
-
IT IS MUCH EASIER TO ALIGN PROTEINS
4 DNA bases vs. 20 amino acids: less chance similarity between protein sequences
There are varying degrees of similarity between different amino acids
Protein databanks are much smaller than DNA databanks
-
PAIRWISE ALIGNMENT
The alignment of two sequences (DNA or protein) is a relatively straightforward computational problem.
Two sequences can always be aligned.
Sequence alignments have to be scored.
Often there is more than one solution with the same score.
-
PAIRWISE ALIGNMENTS: PURPOSE
identification of sequences with significant similarity to (a) sequence(s) in a sequence repository
identification of all homologous sequences within the repository
identification of domains with sequence similarity
-
METHODS OF ALIGNMENT
By hand: slide sequences on two lines of a word processor
Dot plot
Rigorous mathematical approach
Dynamic programming (slow, optimal)
Heuristic methods (fast, approximate)
BLAST and FASTA (use word matching and hash tables)
-
ALIGNMENT BY HAND
GATCGCCTA_TTACGTCCTGGAC
AGGCATACGTA_GCCCTTTCGC
A scoring system is still essential to find the best alignment
-
DOTPLOTS
Not technically an alignment
Gives a picture of correspondence between pairs of sequences
Each dot represents similarity between segments of the two sequences
-
QUESTION 3
Do diagonals correspond
to conserved regions?
A. Yes
B. No
-
QUESTION 3 REDUX
Take note that the dots are placed at grid points where the two sequences have identical residues.
Do diagonals correspond
to conserved regions?
A. Yes
B. No
-
A LIMITATION TO DOT MATRIX COMPARISON
Where part of one sequence shares a long stretch of similarity with the other sequence, a diagonal of dots will be evident in the matrix.
However, when single bases are compared at each position, most of the dots in the matrix will be due to background similarity.
That is, for any two nucleotides compared between the two sequences, there is a 1 in 4 chance of a match, assuming equal frequencies of A, G, C and T.
-
SIMPLE DOT PLOT
Figure: dot plot of GGCTTGACCGG (horizontal axis) against GGATTGACCCG (vertical axis), single nucleotides compared at each grid point
-
A SOLUTION
This background noise can be filtered out by comparing groups of l nucleotides, rather than single nucleotides, at each position.
For example, if we compare dinucleotides (l = 2), the probability of two dinucleotides chosen at random from each sequence matching is 1/16, rather than 1/4.
Therefore, the number of background matches will be lower.
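The 1/4 and 1/16 figures follow from treating each position as an independent random draw. A small sketch, assuming independence and the base frequencies passed in (the GC-rich composition is hypothetical), computes the chance that two random windows of a given length match exactly:

```python
def background_match_prob(freqs: dict, window: int) -> float:
    """Chance that two randomly chosen windows of length `window` are identical.

    Assumes every position is drawn independently with the given residue
    frequencies: one position matches with probability sum(p**2), and all
    positions in the window must match, so that value is raised to the power
    `window`.
    """
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    per_position = sum(p * p for p in freqs.values())
    return per_position ** window


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for window in (1, 2, 3):
    # Equal base frequencies give 1/4, 1/16 and 1/64.
    print(window, background_match_prob(equal, window))

# Hypothetical GC-rich composition: background matches become slightly more likely.
gc_rich = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
print(background_match_prob(gc_rich, 2))
```

Multiplying by the number of window pairs compared gives a rough expected count of background dots; for two 11-nucleotide sequences compared one base at a time, that is about 121 × 1/4 ≈ 30 dots.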
-
A FILTERED DOT PLOT
Figure: the same comparison of GGCTTGACCGG (horizontal axis) against GGATTGACCCG (vertical axis), filtered to remove background matches
-
THE DOT MATRIX ALGORITHM
The dot-matrix algorithm can be generalized for sequences s and t of sizes m and n, respectively, and window size l.
For each position in sequence s, compare a window of l nucleotides centered at that position with each window of l nucleotides in sequence t.
Conceptually, you can think of windows of length l sliding along each axis, so that all possible windows of l nucleotides are compared between the two sequences.
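The generalized algorithm can be sketched as follows, assuming identity-only comparisons of DNA windows. The names `dot_plot` and `render` are hypothetical; the `stringency` parameter is taken here to mean the minimum number of identical positions within a window (tools that use a matrix such as BLOSUM62 typically score windows with it instead); and windows are indexed by their first residue rather than centered.

```python
from typing import List, Optional


def dot_plot(s: str, t: str, window: int = 1,
             stringency: Optional[int] = None) -> List[List[bool]]:
    """Windowed dot matrix: rows follow s, columns follow t.

    grid[i][j] is True when the windows s[i:i+window] and t[j:j+window]
    have at least `stringency` identical positions (default: all positions).
    Indexing windows by their first residue instead of their centre only
    shifts the picture by window // 2.
    """
    if stringency is None:
        stringency = window
    grid = []
    for i in range(len(s) - window + 1):
        row = []
        for j in range(len(t) - window + 1):
            matches = sum(1 for k in range(window) if s[i + k] == t[j + k])
            row.append(matches >= stringency)
        grid.append(row)
    return grid


def render(s: str, t: str, grid: List[List[bool]]) -> str:
    """ASCII picture with t along the top, s down the side and '*' for a dot."""
    lines = ["  " + " ".join(t)]
    for i, row in enumerate(grid):
        lines.append(s[i] + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)


top, side = "GGCTTGACCGG", "GGATTGACCCG"  # the sequences from the dot plot slides
for window in (1, 2, 3):
    grid = dot_plot(side, top, window=window)
    print(f"window {window}: {sum(map(sum, grid))} dots")
    print(render(side, top, grid))
```

As the window grows, isolated background dots disappear while the diagonal for the shared TTGACC stretch survives.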
-
l = 3
Figure: GGCTTGACCGG (horizontal axis) against GGATTGACCCG (vertical axis), compared with a window of l = 3 nucleotides
-
DOT MATRIX SEQUENCE COMPARISON EXAMPLES
-
COMPARING A PROTEIN WITH ITSELF
Proteins can be compared with themselves to show internal duplications or repeating sequences.
A self-matrix produces a central diagonal line through the origin, indicating an exact match between the x and y axes.
The parallel diagonals that appear off the central line are indicative of repeated sequence elements in different locations of the same protein.
-
HAPTOGLOBIN
Haptoglobin is a protein that is secreted into the blood by the liver. This protein binds free hemoglobin.
The concentration of "free" hemoglobin (that is, outside red blood cells) in plasma (the fluid portion of blood) is ordinarily very low.
However, free hemoglobin is released when red blood cells hemolyze for any reason.
After haptoglobin binds hemoglobin, the complex is taken up by the liver.
The liver recycles the iron, heme, and amino acids contained in the hemoglobin protein.
-
OUR COMPARISON
Files used:
1006264A: Haptoglobin H2
DNA sequencing shows that the intragenic duplication within the human haptoglobin Hp2 allele was formed by a non-homologous, probably random, crossing-over within different introns of two Hp1 genes.
A repeated sequence (starting with ADDGCP...) is observed at positions 30-90 and 90-150, probably due to a duplication event in one of these locations.
duplication event in one of these locations.
-
Window: 30 Stringency: 3
Blosum 62 matrix
-
SEARCHING FOR REPEATS IN DOTPLOTS
One of the strengths of dot-matrix searches is that they make repeats easy to detect by comparing a sequence against itself.
In self-comparisons, direct repeats appear as diagonals parallel to the main line of identity.
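Scanning a self-comparison for such off-diagonal runs can be automated. A minimal sketch, assuming exact identities only (no substitution matrix and no window filtering); `direct_repeats` and the toy sequence are hypothetical, chosen so that the segment WVKLSTP occurs twice:

```python
from typing import List, Tuple


def direct_repeats(seq: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Report diagonal runs of identities in the self dot plot of seq.

    Diagonal d = j - i = 0 is the main line of identity, and a self plot is
    symmetric, so only diagonals d > 0 are scanned. Each run of at least
    min_len consecutive matches is returned as (i, j, length), meaning
    seq[i:i + length] == seq[j:j + length].
    """
    n = len(seq)
    repeats = []
    for d in range(1, n):
        run = 0
        for i in range(n - d):
            if seq[i] == seq[i + d]:
                run += 1
                continue
            if run >= min_len:  # a run just ended at position i - 1
                repeats.append((i - run, i - run + d, run))
            run = 0
        if run >= min_len:  # a run that reaches the end of the diagonal
            start = n - d - run
            repeats.append((start, start + d, run))
    return repeats


toy = "MQWVKLSTPWVKLSTPR"  # hypothetical sequence with WVKLSTP at 2 and 9
print(direct_repeats(toy, min_len=5))
```

The repeat shows up on diagonal d = 7, the distance between the two copies, exactly as a parallel line would appear in the plot.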
-
COMPARISON OF TWO SIMILAR SEQUENCES
Files used:
P03035: Repressor protein from Salmonella phage P22
RPBPL: Repressor protein from E. coli phage lambda
Lambda phages infect E. coli. They can be lytic, destroying the host cell and making hundreds of progeny.
They can also be lysogenic, and live quietly within the DNA of the bacterium.
A phage gene makes the repressor protein that prevents the phage from going destructively lytic.
Phage P22 is a related phage that also makes a repressor.
Both proteins form dimers and bind DNA to prevent lysis.
-
LAMBDA REPRESSOR/OPERATOR COMPLEX (1LMB)
-
DOT MATRIX SEQUENCE COMPARISON
A diagonal row of dots represents a region of sequence similarity.
Background matching also appears as scattered dots.
-
Window: 10 Stringency: 1
Blosum 62 matrix
-
Window: 10 Stringency: 3
Blosum 62 matrix
-
Window: 30 Stringency: 1
Blosum 62 matrix
-
Window: 30 Stringency: 3
Blosum 62 matrix
-
QUESTION 4
Which of the following combinations of parameters will
produce the least background noise?
A. Low window, low stringency
B. Low window, high stringency
C. High window, low stringency
D. High window, high stringency
-
DISADVANTAGES TO DOT PLOTS
While dot-matrix searches provide a great deal of information in a visual fashion, they can only be considered semi-quantitative, and therefore do not lend themselves to statistical analysis.
Also, dot-matrix searches do not provide a precise alignment between two sequences.
-
RIGOROUS ALGORITHMS: DYNAMIC PROGRAMMING
-
ALGORITHM
An algorithm is a complete, unambiguous procedure for solving a specified problem in a finite number of steps.
Algorithms leave nothing undefined and require no intuition to achieve their end.
-
FIVE FEATURES OF AN ALGORITHM:
An algorithm must stop after a finite number of steps.
All steps of the algorithm must be precisely defined.
Input to the algorithm must be specified.
Output of the algorithm must be specified. There must be at
least one output.
An algorithm must be effective - i.e. its operations must be
basic and doable.
-
DYNAMIC PROGRAMMING
Algorithmic technique for optimization problems that have two properties:
Optimal substructure: an optimal solution can be computed from optimal solutions to subproblems
Overlapping subproblems: subproblems overlap such that the total number of distinct subproblems to be solved is relatively small
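Both properties show up in a toy problem: counting how many different gapped alignments two sequences admit. This is an illustrative sketch (not from the slides) that uses Python's `functools.lru_cache` to store each subproblem's answer; adjacent gap columns written in different orders are counted as different alignments.

```python
from functools import lru_cache


def count_alignments(m: int, n: int) -> int:
    """Number of distinct gapped alignments of sequences of lengths m and n.

    The last column of any alignment is residue/residue, residue/gap or
    gap/residue, so the count for (m, n) is assembled from the counts for
    (m-1, n-1), (m-1, n) and (m, n-1), the same prefix decomposition the
    alignment recurrences use. Those smaller problems overlap heavily; the
    cache means each of the at most (m + 1) * (n + 1) distinct subproblems
    is solved only once.
    """

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 1  # remaining residues can only face gaps, in one way
        return count(i - 1, j - 1) + count(i - 1, j) + count(i, j - 1)

    return count(m, n)


for length in (3, 10):
    print(length, count_alignments(length, length))
```

This prints 63 for two 3-residue sequences and 8097453 for two 10-residue sequences, while at most 16 and 121 distinct subproblems are evaluated. The explosion in possible alignments is why alignment relies on dynamic programming rather than enumeration.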
-
RIGOROUS ALGORITHMS
Needleman-Wunsch (Global)
Smith-Waterman (Local)
-
GLOBAL VS. LOCAL ALIGNMENT
Global alignment algorithms align two sequences over their entire length, from the beginning to the end of both, adding gaps where needed.
Local alignment algorithms find the region (or regions) of highest similarity between two sequences and build the alignment outward from there.
-
GLOBAL VS. LOCAL ALIGNMENT
-
GLOBAL ALIGNMENT
The Needleman-Wunsch algorithm creates a global alignment over the length of both sequences (needle).
Global algorithms are often not effective for highly diverged sequences: they do not reflect the biological reality that two sequences may share only limited regions of conserved sequence.
Sometimes two sequences may be derived from ancient recombination events where only a single functional domain is shared.
Global methods are useful when you want to force two sequences to align over their entire length.
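A compact sketch of the Needleman-Wunsch recurrence, assuming a plain match/mismatch score and a linear gap penalty. The values match = 1, mismatch = -1 and gap = -2 are illustrative assumptions, not the defaults of needle, which uses a substitution matrix and separate gap opening and extension penalties.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    The first row and column force leading gaps, and the traceback starts
    from the bottom-right cell, so every residue of both sequences is used.
    """
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair
                          F[i - 1][j] + gap,   # a[i-1] against a gap
                          F[i][j - 1] + gap)   # b[j-1] against a gap

    # Traceback: retrace one choice that produced each cell's score.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The gap lands in the shorter sequence because a global alignment must account for every residue of both inputs.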
-
LOCAL ALIGNMENT
Identify the most similar sub-region shared between two sequences
There is no attempt to force entire sequences into an alignment, just those parts that appear to have good similarity, according to some criterion.
Smith-Waterman (water)
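The local version changes three things in the global recurrence: scores are floored at zero, the first row and column stay zero, and the traceback starts at the best cell. A sketch under the same simplifying assumptions, with illustrative scores (match = 2, mismatch = -1, gap = -2) that are not the defaults of water:

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best-scoring local alignment of a and b with a linear gap penalty.

    Cell scores are floored at 0 (an alignment may start anywhere), the
    first row and column stay 0, and the traceback starts at the highest
    cell and stops at the first 0.
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))  # (6, 'ACG', 'ACG')
```

Only the shared ACG segment is reported; the unrelated flanks are left out rather than forced into the alignment.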
-
LOCAL ALIGNMENTS
It may seem that one should always use local alignments.
However, it may be difficult to spot an overall similarity, as opposed to just a domain-to-domain similarity, if one uses only local alignment.
So global alignment is useful in some cases.
The popular programs BLAST and FASTA for searching sequence databases produce local alignments.
-
GAPS AND INSERTIONS
In an alignment, much better correspondence can be obtained between two sequences if a gap can be introduced in one sequence.
Alternatively, an insertion could be allowed in the other sequence.
Biologically, this corresponds to a mutation event that eliminates part of a gene or introduces new DNA into a gene.
-
GAPS
Positions at which a letter is paired with a null are called
gaps.
Gap scores are typically negative.
-
QUESTION 5
Which is more significant? The presence of a gap or the
length of a gap?
A. The presence of a gap
B. The length of a gap
-
GAPS
Since a single mutational event may cause the insertion or
deletion of more than one residue, the presence of a gap is
considered more significant than the length of the gap.
-
OPTIMAL ALIGNMENT
The alignment that is the best, given a defined set of rules and parameter values for comparing different alignments. There is no such thing as the single best alignment, since optimality always depends on the assumptions one bases the alignment on.
For example, what penalty should gaps carry?
All sequence alignment procedures make some such assumptions.
-
PARAMETERS OF SEQUENCE ALIGNMENT
Scoring systems:
Each symbol pairing is assigned a numerical value, based on a symbol comparison table.
Gap penalties:
Opening: The cost to introduce a gap
Extension: The cost to elongate a gap
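The difference between charging per gap and per gapped position is easy to see numerically. A sketch, assuming penalties are expressed as positive costs subtracted from the alignment score, with illustrative values (opening 10, extension 1) that are not any program's defaults:

```python
def linear_gap_cost(length: int, per_position: float) -> float:
    """Linear model: every gapped position costs the same."""
    return length * per_position


def affine_gap_cost(length: int, opening: float, extension: float) -> float:
    """Affine model: pay `opening` once, then `extension` for each extra position.

    Conventions differ between programs (some also charge the extension for
    the first position); this sketch uses opening + (length - 1) * extension.
    """
    if length <= 0:
        return 0
    return opening + (length - 1) * extension


one_gap_of_three = affine_gap_cost(3, 10, 1)
three_gaps_of_one = 3 * affine_gap_cost(1, 10, 1)
print(one_gap_of_three, three_gaps_of_one)                 # 12 30
print(linear_gap_cost(3, 10), 3 * linear_gap_cost(1, 10))  # 30 30
```

Under the affine model a single 3-residue gap is much cheaper than three separate 1-residue gaps, which encodes the idea that the presence of a gap matters more than its length; the linear model charges both the same.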
-
DNA SCORING SYSTEMS: VERY SIMPLE
Match: 1
Mismatch: 0
Score = 5
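With match = 1 and mismatch = 0, the score of an ungapped DNA alignment is simply the number of identical columns. A sketch on a hypothetical pair (not the sequences pictured on the slide), with a variant that penalizes mismatches for comparison:

```python
def score_ungapped(a: str, b: str, match: int = 1, mismatch: int = 0) -> int:
    """Sum per-column scores for two equal-length, ungapped DNA strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


# Hypothetical pair with 5 identical columns out of 8.
print(score_ungapped("ACGTACGT", "ACCTAGGA"))               # 5
print(score_ungapped("ACGTACGT", "ACCTAGGA", mismatch=-1))  # 2
```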
-
PROTEIN SCORING SYSTEMS
-
PROTEIN SCORING SYSTEMS
Amino acids have different biochemical and physical
properties that influence their relative replaceability in
evolution.
-
PROTEIN SCORING SYSTEMS
Scoring matrices reflect:
Number of mutations to convert one to another
Chemical similarity
Observed mutation frequencies
The probability of occurrence of each amino acid
Widely used scoring matrices:
PAM
BLOSUM
-
PAM MATRICES
Point Accepted Mutation
Family of matrices: PAM 80, PAM 120, PAM 250
The number with a PAM matrix represents the evolutionary distance between the sequences on which the matrix is based.
PAM 250 = 250 mutations per 100 residues
Greater numbers denote greater evolutionary distance
-
PAM MATRICES
Derived from global alignments of protein families. Family members share at least 85% identity
Construction of phylogenetic tree and ancestral sequences of each protein family
Computation of number of replacements for each pair of amino acids
-
PAM 250 MATRIX
-
PAM: LIMITATIONS
Based on only one original dataset
Based mainly on small globular proteins so the matrix is biased
Examines proteins with few differences (85% identity)
-
BLOSUM MATRICES
BLOcks SUbstitution Matrix
Derived from alignments of domains of distantly related proteins
Different BLOSUMn matrices are calculated independently from blocks (ungapped local alignments)
BLOSUMn is built from BLOCKS in which sequences sharing at least n percent identity are clustered and counted as one
BLOSUM 62 represents closer sequences than BLOSUM 45
-
BLOSUM MATRICES
Built from the BLOCKS database: from the most conserved regions of aligned sequences
~2000 blocks from 500 families have been used
BLOSUM 62 is the most popular matrix and is the default matrix for the standard BLAST program.
-
BLOSUM 50 MATRIX
A   5
R  -2   7
N  -1  -1   7
D  -2  -2   2   8
C  -1  -4  -2  -4  13
Q  -1   1   0   0  -3   7
E  -1   0   0   2  -3   2   6
G   0  -3   0  -1  -3  -2  -3   8
H  -2   0   1  -1  -3   1   0  -2  10
I  -1  -4  -3  -4  -2  -3  -4  -4  -4   5
L  -2  -3  -4  -4  -2  -2  -3  -4  -3   2   5
K  -1   3   0  -1  -3   2   1  -2   0  -3  -3   6
M  -1  -2  -2  -4  -2   0  -2  -3  -1   2   3  -2   7
F  -3  -3  -4  -5  -2  -4  -3  -4  -1   0   1  -4   0   8
P  -1  -3  -2  -1  -4  -1  -1  -2  -2  -3  -4  -1  -3  -4  10
S   1  -1   1   0  -1   0  -1   0  -1  -3  -3   0  -2  -3  -1   5
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   2   5
W  -3  -3  -4  -5  -5  -1  -3  -3  -3  -3  -2  -3  -1   1  -4  -4  -3  15
Y  -2  -1  -2  -3  -3  -1  -2  -3   2  -1  -1  -2   0   4  -3  -2  -2   2   8
V   0  -3  -3  -4  -1  -3  -3  -4  -4   4   1  -3   1  -1  -3  -2   0  -3  -1   5
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
Positive scores on diagonal (identities)
Similar residues get higher (positive) scores
Dissimilar residues get smaller (negative) scores
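A lower-triangular table like the one above is enough to score any residue pair, because substitution matrices are symmetric. A sketch that loads the BLOSUM50 values above into a lookup table and scores an example alignment; the helper names are hypothetical, and the linear gap score of -8 is an illustrative assumption, not part of the matrix:

```python
BLOSUM50_LOWER = """\
A 5
R -2 7
N -1 -1 7
D -2 -2 2 8
C -1 -4 -2 -4 13
Q -1 1 0 0 -3 7
E -1 0 0 2 -3 2 6
G 0 -3 0 -1 -3 -2 -3 8
H -2 0 1 -1 -3 1 0 -2 10
I -1 -4 -3 -4 -2 -3 -4 -4 -4 5
L -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5
K -1 3 0 -1 -3 2 1 -2 0 -3 -3 6
M -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7
F -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8
P -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10
S 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5
W -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15
Y -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8
V 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5"""


def load_lower_triangle(text: str) -> dict:
    """Expand a lower-triangular table into a symmetric {(x, y): score} dict."""
    matrix = {}
    order = []
    for line in text.splitlines():
        residue, *values = line.split()
        order.append(residue)
        if len(values) != len(order):
            raise ValueError(f"row {residue} should have {len(order)} values")
        for other, value in zip(order, values):
            matrix[(residue, other)] = int(value)
            matrix[(other, residue)] = int(value)
    return matrix


BLOSUM50 = load_lower_triangle(BLOSUM50_LOWER)


def score_alignment(row1: str, row2: str, matrix: dict, gap: int = -8) -> int:
    """Sum matrix scores over aligned columns; each '-' column scores `gap`."""
    total = 0
    for x, y in zip(row1, row2):
        total += gap if x == "-" or y == "-" else matrix[(x, y)]
    return total


print(BLOSUM50[("W", "W")], BLOSUM50[("A", "P")], BLOSUM50[("I", "V")])  # 15 -1 4
print(score_alignment("HEAGAWGHE-E", "--P-AW-HEAE", BLOSUM50))            # 1
```

The example alignment scores 1: its five gap columns (-40) and the A/P column (-1) are offset by the W/W, H/H, A/A and two E/E identities (15 + 10 + 5 + 6 + 6 = 42).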
-
QUESTION 6
Which is the appropriate matrix to use when comparing
highly divergent sequences?
A. BLOSUM with a lower n
B. PAM with lower n
C. BLOSUM with a higher n
D. Both B and C
-
PAM VS. BLOSUM
PAM 100 ≈ BLOSUM 90
PAM 120 ≈ BLOSUM 80
PAM 160 ≈ BLOSUM 60
PAM 200 ≈ BLOSUM 52
PAM 250 ≈ BLOSUM 45
Further down the list: more distant sequences
PAM 120 for general use; BLOSUM 62 for general use
PAM 160 for close relations; BLOSUM 80 for close relations
PAM 250 for distant relations; BLOSUM 45 for distant relations
-
TIPS ON CHOOSING A SCORING MATRIX
Generally, BLOSUM matrices perform better than PAM
matrices for local similarity searches (Henikoff & Henikoff,
1993).
When comparing closely related proteins one should use
lower PAM or higher BLOSUM matrices, for distantly
related proteins higher PAM or lower BLOSUM matrices.
For database searching, the commonly used matrix is BLOSUM62.
-
BLOSUM
Lower BLOSUM series means more divergence
Better for finding local alignments
Based on groups of related sequences counted as one
Built from a vast amount of data
Built from local alignments
PAM
Higher PAM series means more divergence
Better for finding global alignments and remote homologs
Based on minimum replacement or maximum parsimony
Built from a small amount of data
Built from global alignments
-
SCORING INSERTIONS AND DELETIONS
The creation of a gap is penalized with a negative score
value
-
WHY GAP PENALTIES?
-
WHY GAP PENALTIES?
The optimal alignment of two similar sequences is usually the one that maximizes the number of matches and minimizes the number of gaps.
Permitting the insertion of arbitrarily many gaps can lead to high-scoring alignments of non-homologous sequences.
Penalizing gaps forces alignments to have relatively few gaps.
-
BALANCING GAPS WITH MISMATCHES
Gaps must get a steep penalty, or else you'll end up with nonsense alignments.
In real sequences, multi-base (or multi-amino-acid) gaps are quite common.
Affine gap penalties give a big penalty for each new gap, but a much smaller gap extension penalty.
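Affine penalties need extra bookkeeping in dynamic programming, because the cost of the next gap position depends on whether a gap is already open. One standard way to handle this is Gotoh's three-matrix recurrence, which the slides do not name. The sketch below computes only the optimal global score (no traceback), assuming the illustrative substitution scores +2/-1 and the gap cost convention opening + (length - 1) × extension:

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, sub, opening: int, extension: int):
    """Best global alignment score when a gap of length k costs
    opening + (k - 1) * extension (Gotoh-style three-state recurrence).

    M[i][j]: alignment of a[:i] and b[:j] ending with a[i-1] paired to b[j-1]
    X[i][j]: same prefixes, ending with a[i-1] against a gap
    Y[i][j]: same prefixes, ending with b[j-1] against a gap
    Staying in X or Y costs `extension`; entering them costs `opening`.
    """
    m, n = len(a), len(b)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = -opening - (i - 1) * extension  # leading gap in b
    for j in range(1, n + 1):
        Y[0][j] = -opening - (j - 1) * extension  # leading gap in a
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = sub(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - opening,
                          X[i - 1][j] - extension,
                          Y[i - 1][j] - opening)
            Y[i][j] = max(M[i][j - 1] - opening,
                          Y[i][j - 1] - extension,
                          X[i][j - 1] - opening)
    return max(M[m][n], X[m][n], Y[m][n])


def simple(x: str, y: str) -> int:
    """Illustrative substitution score: +2 for a match, -1 for a mismatch."""
    return 2 if x == y else -1


print(affine_global_score("AAGGTT", "AATT", simple, opening=3, extension=1))  # 4
print(affine_global_score("AAGGTT", "AATT", simple, opening=3, extension=3))  # 2
```

With opening 3 and extension 1, the best alignment (AAGGTT over AA--TT) scores 4; raising the extension to equal the opening turns the model into a linear one, and the same alignment then scores 2.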
-
SCORING INSERTIONS AND DELETIONS