Chapter 6 - Profiles 1
Chapter 6 - Profiles
Assume we have a family of sequences. To search for other sequences in the family we can
• Search with a single sequence from the family
• Search with several sequences from the family together
  – Consensus sequences (regular expressions)
    • Regular expression, e.g. A-[FR]-X(2,3)-M
    • Sequences matching it: GARCCMH, LCAFARLMLMA
  – Weight matrices or position-specific scoring matrices
    • Not considering gaps
  – Profiles
  – Profiles as Hidden Markov Models
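The regular-expression example can be tried out directly. The sketch below is a minimal illustration, not a full pattern parser: the helper name prosite_to_regex is hypothetical, and it handles only the elements used on this slide (single residues, [..] sets, X, and repeat counts). It translates the pattern into a Python regular expression and searches the two example sequences.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern such as A-[FR]-X(2,3)-M into a Python regex.

    Only residues, [..] sets, X and (n) / (n,m) repeats are handled.
    """
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as (2) or (2,3).
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        symbol, low, high = match.groups()
        # X stands for any amino acid; other symbols are kept verbatim.
        regex = "." if symbol.upper() == "X" else symbol
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

regex = prosite_to_regex("A-[FR]-X(2,3)-M")
for seq in ("GARCCMH", "LCAFARLMLMA"):
    hit = re.search(regex, seq)
    print(seq, hit.group(0) if hit else None)
```

Both example sequences contain a match: ARCCM in the first and AFARLM in the second.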
Search with a family of sequences
1. Align the sequences (multiple alignment)
2. Make a profile from part of the alignment
3. Search in the database with the profile
4. As an option, revise the profile and search again (iteratively)
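Steps 2 to 4 can be sketched as a small loop. This is a toy illustration under several assumptions that are not from the slides: the family segments are already aligned without gaps (step 1 is taken as done), the profile is a log-odds matrix with add-one pseudocounts and a uniform background, a database sequence is a hit if its best ungapped window scores at least a hand-picked threshold, and all sequences use only the 20 standard amino acid letters. It is not the procedure of any particular program.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned, pseudocount=1.0):
    """Step 2: one dict of log-odds scores per column of an ungapped alignment."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({a: math.log((counts[a] + pseudocount) / total / background)
                        for a in ALPHABET})
    return profile

def best_window(profile, sequence):
    """Step 3: return (score, segment) for the best ungapped window."""
    width = len(profile)
    best = (-math.inf, "")
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        score = sum(col[a] for col, a in zip(profile, segment))
        best = max(best, (score, segment))
    return best

def iterative_search(seed, database, threshold, max_rounds=5):
    """Step 4: rebuild the profile from all hits until no new hit appears."""
    members = list(seed)
    for _ in range(max_rounds):
        profile = build_profile(members)
        added = False
        for sequence in database:
            score, segment = best_window(profile, sequence)
            if score >= threshold and segment not in members:
                members.append(segment)
                added = True
        if not added:
            break
    return members

seed = ["PQVG", "PQVA", "LQVG"]
database = ["AAPQVGAA", "WWWWWWWW", "MKLQVAEE", "GGLQIAGG"]
print(iterative_search(seed, database, threshold=2.8))
```

With these toy inputs the segment LQIA scores below the threshold against the first profile, but it is picked up in the second round, after LQVA has been added and the profile revised.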
Multiple alignments and profiles
What weight does amino acid a have in position r in the profile?
Example
Clustal X (1.64b) multiple sequence alignment

XENLA1   ALVSGPQD------NELDG--MQL
XENLA2   AQVNGPQD------NELDG--MQF
MOUSE1   PQVEQLEL------GGSP---GDL
RAT1     PQVPQLEL------GGGPEA-GDL
MOUSE2   PQVAQLEL------GGGPGA-GDL
RAT2     PQVAQLEL------GGGPGA-GDL   Removed
CRILO    PQVAQLEL------GGGPGA-DDL
RABIT    LQVGQAEL------GGGPGA-GGL
BOVIN    PQVGALEL------AGGPG-----
SHEEP    PQVGALEL------AGGPG-----   Removed
PIG      PQAGAVEL------GGGLGG---L
CANFA    LQVRDVEL------AGAPGE-GGL
HUMAN    LQVGQVEL------GGGPGA-GSL
CHICK    P-LVSSPL------RGEAGV-LPF
ORENI    LLGFLPPKAGGAVVQGGEN---EV
VERMO    LLGFLPAKSGGAAAGG-ENEVAEF
         12345678******567890*234
* means removed

Profile (Pos is the alignment column; the ruler above prints only its last digit)

Pos Cons   A   B   C   D   E   F   G   H   I   K   L   M   N   P   Q   R   S   T   V   W   X   Y   Z  Gap  Len
  1 P      1   0 -18 -17 -12 -14 -21 -13  -3 -10   1  -2 -15  26  -6 -12  -3  -2  -1 -32   0 -18   0  100  100
  2 q     -4   0 -18  -5   2 -10 -17   2  -3   3   0   1  -3  -7  11   3  -4  -3  -4 -17   0 -10   0   50  100
  3 V      1   0  -5 -23 -17  -6 -15 -17  15 -15   9   7 -17 -16 -13 -17  -7  -3  18 -26   0 -14   0  100  100
  4 G      0   0 -12  -8  -7 -14   0  -5 -13  -6 -14 -10  -2  -9  -5  -6  -1  -3  -8 -22   0 -11   0  100  100
  5 Q      2   0 -15   1   1 -25   4  -3 -17  -1 -15 -11   1  -7   3  -2   3  -1 -12 -30   0 -20   0  100  100
  6 P      1   0 -13 -17 -11 -14 -21 -13   0 -10   0  -1 -13  18  -7 -13  -1   0   3 -32   0 -17   0  100  100
  7 E      0   0 -29  12  19 -36 -10   0 -25   7 -24 -19   3  20  13   2   2   0 -17 -41   0 -26   0  100  100
  8 L     -8   0 -20 -15 -10  -1 -29 -10   7  -7  14   9 -13 -17  -6 -10 -12  -8   3 -20   0  -8   0  100  100
 15 g      3   0 -16   5   2 -36  21   0 -28   3 -28 -21  10  -8   4   5   4  -2 -20 -32   0 -25   0   34   34
 16 G      4   0 -21   6   0 -49  51 -10 -41  -6 -40 -32   4 -13  -4  -7   3  -9 -30 -40   0 -37   0  100  100
 17 G      3   0 -16  -3  -4 -31  23 -11 -22  -8 -20 -16  -2 -12  -5  -9   0  -6 -16 -33   0 -27   0  100  100
 18 P      3   0 -24   7   6 -32 -10  -5 -21  -1 -20 -17   0  27   2  -6   2   0 -14 -43   0 -25   0  100  100
 19 g      3   0 -19   5  -2 -45  49  -8 -39  -6 -38 -30   9 -13  -5  -6   4  -7 -28 -37   0 -33   0   50   78
 20 a      5   0  -3  -2   0 -12   0  -5  -3  -3  -6  -3  -2  -3  -1  -4   1   0   0 -19   0 -12   0   50   78
 22 g     -1   0 -11  -9  -9 -12   7  -9  -6  -9  -4   0  -6 -13  -7 -10  -4  -6  -6 -18   0 -14   0   50   78
 23 q      0   0 -22  13  11 -33   4   0 -26   3 -25 -19   6   6   7   0   3   0 -19 -36   0 -23   0   50   78
 24 L    -12   0 -10 -37 -28  28 -42 -13  22 -22  29  21 -27 -24 -17 -23 -20 -12  15   1   0  10   0  100  100
  *       17   0   0  10  17   3  52   0   0   1  36   2   4  22  21   2   5   0  16   0   0   0   0
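To see how such a profile is read, the sketch below scores three-residue segments against the first three profile positions (consensus P, q, V) by adding up the table entries for the residue found at each position. The values are copied from the profile above. The function name segment_score is hypothetical, and the Gap and Len columns are ignored, so this is only an ungapped illustration.

```python
# Column order of the profile table.
COLUMNS = "ABCDEFGHIKLMNPQRSTVWXYZ"

# Profile positions 1-3 (consensus P, q, V), one score per letter in COLUMNS.
ROWS = [
    [1, 0, -18, -17, -12, -14, -21, -13, -3, -10, 1, -2,
     -15, 26, -6, -12, -3, -2, -1, -32, 0, -18, 0],
    [-4, 0, -18, -5, 2, -10, -17, 2, -3, 3, 0, 1,
     -3, -7, 11, 3, -4, -3, -4, -17, 0, -10, 0],
    [1, 0, -5, -23, -17, -6, -15, -17, 15, -15, 9, 7,
     -17, -16, -13, -17, -7, -3, 18, -26, 0, -14, 0],
]

def segment_score(segment: str) -> int:
    """Sum the profile entries for each residue at its position (no gaps)."""
    if len(segment) != len(ROWS):
        raise ValueError("segment must have one residue per profile position")
    return sum(row[COLUMNS.index(a)] for row, a in zip(ROWS, segment))

for segment in ("PQV", "ALV", "WWW"):
    print(segment, segment_score(segment))
```

The consensus-like segment PQV scores 26 + 11 + 18 = 55, the XENLA1 start ALV scores 1 + 0 + 18 = 19, and WWW scores -75.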
What to take into account when creating a profile?
1. The observed amino acids in position r in the alignment.
2. The number of independent ‘observations’ that have been used for constructing the alignment of position r (for example, the number of different a.a. in the column).
3. The similarity of a to the amino acids observed in column r, to allow for not yet observed amino acids. Amino acid a is more likely to occur in unknown family members if there are many amino acids similar to a in the known sequences. Thus a ‘background’ scoring matrix should be used.
4. The background (a priori) distribution of the amino acids.
5. The diversity and similarity of the sequences, resulting in the importance (or weight) of each sequence. The known sequences are normally not uniformly distributed in the ‘family space’, and should have different weights in the calculation.
6. The number of gaps over column r and the neighbouring columns.
These points are not independent. How these aspects are treated varies with the different methods for profile construction.
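As one concrete illustration of point 5, the sketch below computes position-based sequence weights in the style of Henikoff and Henikoff: in every column a total weight of 1 is shared equally among the different symbols present, and each symbol's share is split equally among the sequences carrying it. This is one published scheme among several, chosen here only as an example and not necessarily the one these slides have in mind. The function name is hypothetical, and gap characters, if present, would be treated as an ordinary symbol.

```python
from collections import Counter

def position_based_weights(aligned):
    """Return one weight per sequence, normalised to sum to 1."""
    weights = [0.0] * len(aligned)
    for column in zip(*aligned):
        counts = Counter(column)
        distinct = len(counts)  # number of different symbols in this column
        for i, symbol in enumerate(column):
            # The column's total of 1 is shared among its distinct symbols,
            # then among the sequences that carry each symbol.
            weights[i] += 1.0 / (distinct * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]

# Columns 1-8 of five sequences from the example alignment.
names = ["XENLA1", "XENLA2", "MOUSE1", "RAT1", "MOUSE2"]
aligned = ["ALVSGPQD", "AQVNGPQD", "PQVEQLEL", "PQVPQLEL", "PQVAQLEL"]
for name, weight in zip(names, position_based_weights(aligned)):
    print(f"{name:8s} {weight:.3f}")
```

On these five sequences the two XENLA sequences receive the largest weights, because the three rodent sequences are nearly identical to each other and therefore share their weight.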
Database search with a profile
Notations
Position weight
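[Formulas 1 to 3 for the position weight: not recoverable from the transcript. The surviving fragments use the symbols V, m and T with indices r and rb, and natural logarithms.]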
Sequence weights are not considered here.
1. All a.a. in the column count equally
2. A.a. occurring many times are favored
3. A.a. occurring many times are ‘punished’
PSI-BLAST
Hidden Markov Model