Comparing Protein Sequences
Tutorial 4
Today’s menu:
• PAM and BLOSUM score matrices
• PSI-BLAST
• PHI-BLAST
PAM & BLOSUM
• PAM matrices are based on global alignments of closely related proteins.
• PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence.
• Other PAM matrices are extrapolated from PAM1.
• BLOSUM matrices are based on local alignments.
• BLOSUM62 is a matrix calculated from comparisons of sequences with at most 62% identity in the blocks.
• All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)
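The extrapolation of higher PAM matrices from PAM1 can be illustrated with a small sketch. A PAM1 mutation probability matrix is multiplied by itself n times to estimate the substitution probabilities at n PAM units. The 3-letter alphabet and the values in TOY_PAM1 below are invented for illustration only; they are not real PAM data, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each residue stays unchanged with probability 0.99,
# i.e. about 1% divergence. Values are made up, not real PAM data.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)

for row in toy_pam250:
    print(["%.3f" % p for p in row])
```

In the extrapolated matrix the probability of a residue staying unchanged is much lower than in the toy PAM1, which is why high-numbered PAM matrices suit divergent sequences.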
Use Recommendations
Query length   Matrix     Gap costs (existence, extension)
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1
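The recommendation table can be written as a small lookup. This is a minimal sketch: the function name recommend_matrix is hypothetical, and because the table's ranges overlap at 50 and 85, the sketch assumes that a length of exactly 50 falls in the PAM70 row and exactly 85 in the BLOSUM80 row.

```python
def recommend_matrix(query_length):
    """Return (matrix, gap_existence, gap_extension) for a protein query length.

    Boundary handling at 50 and 85 is an assumption; the table does not
    say which row those exact lengths belong to.
    """
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)


for length in (20, 40, 70, 162):
    print(length, recommend_matrix(length))
```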
Example
• Query: >ADRM1_HUMAN (a glycosylated plasma membrane protein which promotes cell adhesion)
• Database: nr, limited to human
• BLAST program: BLASTP
• Matrices: PAM30, BLOSUM45
What differences do we observe?
[Figure: BLASTP results with PAM30 and with BLOSUM45]
• With BLOSUM45 we found both closely related and divergent sequences.
• With PAM30 we found only closely related sequences.
With BLOSUM45 we can discover interesting relations between proteins
...
Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens
With PAM30
With BLOSUM45
Using different scoring matrices can produce slightly different alignments:
The same alignment can be resolved in several ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
PSI-BLAST
Position Specific Iterative BLAST
We will analyze the following archaeal uncharacterized protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
E-value threshold for the initial BLAST search (default: 10)
E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
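The two thresholds play different roles: the first decides which hits are reported at all, the second decides which of those hits feed the position-specific matrix for the next iteration. The sketch below only illustrates that filtering step. The hit names and E-values are invented, the function name split_hits is hypothetical, and treating a hit at exactly the threshold as passing is an assumption.

```python
REPORT_EVALUE = 10.0       # threshold for the initial BLAST search
INCLUSION_EVALUE = 0.005   # threshold for inclusion in PSI-BLAST iterations


def split_hits(hits, report=REPORT_EVALUE, include=INCLUSION_EVALUE):
    """Split (name, evalue) hits into reported hits and hits kept for the next round."""
    reported = [hit for hit in hits if hit[1] <= report]
    included = [hit for hit in reported if hit[1] <= include]
    return reported, included


# Invented hits, for illustration only.
example_hits = [
    ("hit_A", 1e-40),
    ("hit_B", 0.001),
    ("hit_C", 0.2),
    ("hit_D", 50.0),
]

reported, included = split_hits(example_hits)
print("reported:", [name for name, _ in reported])
print("included in next iteration:", [name for name, _ in included])
```

Lowering the inclusion threshold makes each iteration more conservative; raising it pulls in more distant candidates at the risk of including unrelated sequences.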
The query itself
Orthologous sequences in two other archaeal species
Other homologous sequences
...
...
...
Is MJ0577 a filament protein?
Is MJ0577 a cationic amino acid transporter?
Is MJ0577 a universal stress protein?
PHI-BLAST
Pattern Hit Initiated BLAST
A-T-X-[AVG]-R-S
Pattern symbols
[] = groups the amino acids that can occur at a given position
() = gives the number of times a residue (or group of residues) is repeated
- = separates positions
x = any amino acid
Making a pattern
[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
…LIDEADKTT…
…IMDEADEFL…
…LLDEADKCL…
…ILDEADRIL…
…VVDEADNFI…
…LVDEADKGI…
…LMDEADEFL…
…MLDEADRSI…
…LIDEADKML…
…MLDEADNWI…
…LVDEADRFL…
Example
>gi|71154193|sp|P0A9P6|DEAD_ECOLI Cold-shock DEAD box protein A (ATP-dependent RNA helicase deaD)
MAEFETTFADLGLKAPILEALNDLGYEKPSPIQAECIPHLLNGRDVLGMAQTGSGKTAAFSLPLLQNLDP
ELKAPQILVLAPTRELAVQVAEAMTDFSKHMRGVNVVALYGGQRYDVQLRALRQGPQIVVGTPGRLLDHL
KRGTLDLSKLSGLVLDEADEMLRMGFIEDVETIMAQIPEGHQTALFSATMPEAIRRITRRFMKEPQEVRI
QSSVTTRPDISQSYWTVWGMRKNEALVRFLEAEDFDAAIIFVRTKNATLEVAEALERNGYNSAALNGDMN
QALREQTLERLKDGRLDILIATDVAARGLDVERISLVVNYDIPMDSESYVHRIGRTGRAGRAGRALLFVE
NRERRLLRNIERTMKLTIPEVELPNAELLGKRRLEKFAAKVQQQLESSDLDQYRALLSKIQPTAEGEELD
LETLAAALLKMAQGERTLIVPPDAPMRPKREFRDRDDRGPRDRNDRGPRGDREDRPRRERRDVGDMQLYR
IEVGRDDGVEVRHIVGAIANEGDISSRYIGNIKLFASHSTIELPKGMPGEVLQHFTRTRILNKPMNMQLL
GDAQPHTGGERRGGGRGFGGERREGGRNFSGERREGGRGDGRRFSGERREGRAPRRDDSTGRRRFGGDA
The DEAD box pattern: [LIVM](2)-D-E-A-D-[RKEN]-x-[LI]
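To see what the pattern syntax means in practice, the pattern can be translated into a regular expression and searched against a sequence. This is a minimal sketch of pattern matching only, not of how PHI-BLAST itself searches. The function name pattern_to_regex is hypothetical, the translation covers only the symbols used in this tutorial, and FRAGMENT is a short stretch copied from the DEAD_ECOLI sequence above.

```python
import re

# One pattern element: a bracketed group or a single residue (x = any),
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Translate a dash-separated pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError("cannot parse pattern element: %r" % element)
        residues, low, high = match.groups()
        if residues in ("x", "X"):
            residues = "."  # x stands for any amino acid
        if low and high:
            residues += "{%s,%s}" % (low, high)
        elif low:
            residues += "{%s}" % low
        parts.append(residues)
    return "".join(parts)


DEAD_BOX_PATTERN = "[LIVM](2)-D-E-A-D-[RKEN]-x-[LI]"
DEAD_BOX_REGEX = pattern_to_regex(DEAD_BOX_PATTERN)

# Short stretch taken from the DEAD_ECOLI sequence.
FRAGMENT = "LSGLVLDEADEMLRMGF"

hit = re.search(DEAD_BOX_REGEX, FRAGMENT)
print(DEAD_BOX_REGEX)
print(hit.group(), "at offset", hit.start())
```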