Protein Sequence Analysis


Page 1:

Protein Sequence Analysis

Page 2:

Protein sequence motifs

• Premise: the sequence of a protein gives clues about its structure and function.
• In the 80s, scientists looked directly for clusters of residues that were indicative of function.

Page 3:

Prosite

• In some cases the sequence of an unknown protein is too distantly related to any protein of known structure to detect its resemblance by overall sequence alignment. However, relationships can be revealed by the occurrence in its sequence of a particular cluster of residue types, which is variously known as a pattern, motif, signature or fingerprint. These motifs arise because specific region(s) of a protein which may be important, for example, for their binding properties or for their enzymatic activity are conserved in both structure and sequence. These structural requirements impose very tight constraints on the evolution of this small but important portion(s) of a protein sequence. The use of protein sequence patterns or profiles to determine the function of proteins is becoming very rapidly one of the essential tools of sequence analysis. Many authors (3,4) have recognized this reality. Based on these observations, we decided in 1988, to actively pursue the development of a database of regular expression-like patterns, which would be used to search against sequences of unknown function.

Kay Hofmann, Philipp Bucher, Laurent Falquet and Amos Bairoch

The PROSITE database, its status in 1999

Page 4:

Basic idea

• It is a heuristic approach. Start with the following:
  • A collection of sequences with the same function.
  • Region/residues known to be significant for maintaining structure and function.
• Develop a pattern of conserved residues around the residues of interest
• Iterate for appropriate sensitivity and specificity
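The "iterate for appropriate sensitivity and specificity" step can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `evaluate_pattern` is hypothetical, the positives reuse the three example sequences from the alignment slide, the negatives are invented toy strings, and Python's `re` module stands in for whatever search tool is actually used.

```python
import re

def evaluate_pattern(regex, positives, negatives):
    """Return (sensitivity, specificity) of a regex over labeled sequences."""
    pat = re.compile(regex)
    tp = sum(1 for s in positives if pat.search(s))  # family members found
    fn = len(positives) - tp                         # family members missed
    fp = sum(1 for s in negatives if pat.search(s))  # unrelated sequences hit
    tn = len(negatives) - fp                         # unrelated sequences rejected
    sensitivity = tp / (tp + fn) if positives else 0.0
    specificity = tn / (tn + fp) if negatives else 0.0
    return sensitivity, specificity

# Toy data: positives from the alignment slide, negatives are hypothetical.
positives = ["ALRDFATHDDF", "SMTAEATHDSI", "ECDQAATHEAS"]
negatives = ["MKATHKLLV", "GGSATHDPQ", "MKVLAAGIV"]

loose = evaluate_pattern("ATH", positives, negatives)      # short, permissive pattern
tight = evaluate_pattern("ATH[DE]", positives, negatives)  # one more constrained position
```

On this toy set the longer pattern keeps every positive while rejecting one more negative, which is the kind of trade-off each refinement round would check.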

Page 5:

From alignment to regular expressions

*
ALRDFATHDDF
SMTAEATHDSI
ECDQAATHEAS

ATH-[DE]

• Search Swissprot with the resulting pattern
• Refine pattern to eliminate false positives
• Iterate
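As a rough sketch of how the pattern above could be read off the alignment, the snippet below collapses each chosen column into a literal residue or a character class and then searches the same sequences with the result. The helper `columns_to_pattern` is hypothetical, the column window is picked by hand (the slides do not say how it is chosen), and the alignment is assumed to be ungapped.

```python
import re

def columns_to_pattern(aligned, start, end):
    """Build a regex from columns [start, end) of an ungapped alignment."""
    parts = []
    for col in range(start, end):
        residues = sorted({seq[col] for seq in aligned})
        if len(residues) == 1:
            parts.append(residues[0])                    # fully conserved column
        else:
            parts.append("[" + "".join(residues) + "]")  # alternatives observed
    return "".join(parts)

aligned = ["ALRDFATHDDF", "SMTAEATHDSI", "ECDQAATHEAS"]

# Columns 6-9 (1-based) cover the conserved A, T, H and the D/E position.
pattern = columns_to_pattern(aligned, 5, 9)

# Search each sequence with the derived pattern; start offsets are 0-based.
hits = [re.search(pattern, seq).start() for seq in aligned]
```

Here `pattern` comes out as `ATH[DE]`, the regular-expression form of the slide's ATH-[DE]. Searching a real database and pruning false positives would repeat the same search over many more sequences.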

Page 6:

Zinc Finger domain

Page 7:

Proteins containing zf domains

How can we find a motif corresponding to a zf domain?

Page 8:

The sequence analysis perspective

• Zinc Finger motif
  • C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
• 2 conserved C, and 2 conserved H
• How can we search a database using these motifs?
  • The motif is described using a regular expression. What is a regular expression?
  • How can we search for a match to a regular expression? Not allowed to use Perl :-)
• The ‘regular expression’ motif is weak. How can we make it stronger?
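One way to make the zinc finger motif searchable is to translate the PROSITE-style notation into an ordinary regular expression. The sketch below does this with Python's `re` module. It hands the matching to an existing regex engine, so it does not answer the slide's question of how matching itself works. The function `prosite_to_regex` and the toy sequence are hypothetical, and only the elements used in this motif (plus `{...}` exclusions) are handled.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    out = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                      # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} lists forbidden residues
        if lo is not None:                  # repeat count (n) or range (n,m)
            core += "{" + lo + ("," + hi if hi else "") + "}"
        out.append(core)
    return "".join(out)

ZF = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
zf_regex = prosite_to_regex(ZF)

# Hypothetical toy sequence built to contain one occurrence of the motif.
toy = "MK" + "C" + "PE" + "C" + "GKA" + "F" + "SRSDELTR" + "H" + "IRT" + "H" + "TG"
match = re.search(zf_regex, toy)
```

Scanning a database then amounts to running `re.search` (or `re.finditer`) over every sequence. Because `x` positions accept any residue, the pattern constrains only a handful of positions, which is consistent with the slide's remark that the motif is weak.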

Page 9:

Page 10:

Profiles

Page 11:

Scoring Profiles

$S(i, j) = \sum_{k} f_{ik} \, M[k, j]$
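The slide does not define the symbols, so the sketch below rests on one common reading, stated here as an assumption: f_ik is the frequency of residue type k in alignment column i, M[k, j] is a substitution score between residues k and j, and S(i, j) is the score for placing residue j at profile position i. The scoring function `toy_matrix` is a hypothetical stand-in for a real substitution matrix, and the four-column alignment is a toy example.

```python
from collections import Counter

def column_frequencies(aligned):
    """f[i][k]: fraction of sequences with residue k in column i."""
    n = len(aligned)
    return [
        {k: c / n for k, c in Counter(seq[i] for seq in aligned).items()}
        for i in range(len(aligned[0]))
    ]

def toy_matrix(k, j):
    """Hypothetical stand-in for a substitution matrix entry M[k, j]."""
    return 2.0 if k == j else -1.0

def profile_score(freqs, i, j, M=toy_matrix):
    """S(i, j) = sum over k of f_ik * M[k, j]; unobserved k contribute zero."""
    return sum(f_ik * M(k, j) for k, f_ik in freqs[i].items())

def score_window(freqs, segment):
    """Sum S(i, segment[i]) over an ungapped window as long as the profile."""
    return sum(profile_score(freqs, i, res) for i, res in enumerate(segment))

aligned = ["ATHD", "ATHD", "ATHE"]  # toy ungapped alignment
freqs = column_frequencies(aligned)

s_d = profile_score(freqs, 3, "D")  # residue seen in 2 of 3 sequences
s_e = profile_score(freqs, 3, "E")  # residue seen in 1 of 3 sequences
s_k = profile_score(freqs, 3, "K")  # residue never seen in this column
```

Unlike a regular expression, which either accepts or rejects a residue, the profile gives graded scores: the more frequent D outscores E, and an unseen K is penalized rather than excluded outright.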