Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna


Post on 13-Jan-2016


Page 1: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Prediction of protein contact maps

Piero Fariselli

Department of Biology, University of Bologna

Page 2: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

From Sequence to Function

Functional Genomics and Proteomics

Genomic sequences

>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSGDLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDESKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYHWPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDEYSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGIKSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITRGNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVSLAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPYYLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNTKRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH

Protein sequences

Protein structures

Protein functions

Page 3: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Protein Folding

T T C C P S I V A R S N F N V C R L P G T P E A L C A T Y T G C I I I P G A T C P G D Y A N

Page 4: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

(Rost B.) http://dodo.cpmc.columbia.edu/cubic/papers/

Page 5: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Data Bases of Sequences and Structures

>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSGDLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDESKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYHWPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDEYSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGIKSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITRGNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVSLAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPYYLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNTKRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH

EMBL: 195,241,608 sequences; 292,078,866,691 nucleotides

UniProt: 428,650 sequences; 154,416,236 residues

PDB: 68,000 structures; membrane proteins 1%

November 2009

Page 6: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

What is a multiple alignment? The short answer is this:

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--

Page 7: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Evolutionary information

• Multiple Sequence Alignment (MSA) of similar sequences

• Sequence profile: for each position, a 20-valued vector contains the amino acid composition of the aligned sequences.

MSA:

 1  Y K D Y H S - D K K K G E L - -
 2  Y R D Y Q T - D Q K K G D L - -
 3  Y R D Y Q S - D H K K G E L - -
 4  Y R D Y V S - D H K K G E L - -
 5  Y R D Y Q F - D Q K K G S L - -
 6  Y K D Y N T - H Q K K N E S - -
 7  Y R D Y Q T - D H K K A D L - -
 8  G Y G F G - - L I K N T E T T K
 9  T K G Y G F G L I K N T E T T K
10  T K G Y G F G L I K N T E T T K

Sequence profile (rows: amino acids; columns: sequence position):

A     0   0   0   0   0   0   0   0   0   0   0  10   0   0   0   0
C     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
D     0   0  70   0   0   0   0  60   0   0   0   0  20   0   0   0
E     0   0   0   0   0   0   0   0   0   0   0   0  70   0   0   0
F     0   0   0  10   0  33   0   0   0   0   0   0   0   0   0   0
G    10   0  30   0  30   0 100   0   0   0   0  50   0   0   0   0
H     0   0   0   0  10   0   0  10  30   0   0   0   0   0   0   0
K     0  40   0   0   0   0   0   0  10 100  70   0   0   0   0 100
I     0   0   0   0   0   0   0   0  30   0   0   0   0   0   0   0
L     0   0   0   0   0   0   0  30   0   0   0   0   0   0   0   0
M     0   0   0   0   0   0   0   0   0   0   0   0   0  60   0   0
N     0   0   0   0  10   0   0   0   0   0  30  10   0   0   0   0
P     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Q     0   0   0   0  40   0   0   0  30   0   0   0   0   0   0   0
R     0  50   0   0   0   0   0   0   0   0   0   0   0   0   0   0
S     0   0   0   0   0  33   0   0   0   0   0   0  10  10   0   0
T    20   0   0   0   0  33   0   0   0   0   0  30   0  30 100   0
V     0   0   0   0  10   0   0   0   0   0   0   0   0   0   0   0
W     0  10   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Y    70   0   0  90   0   0   0   0   0   0   0   0   0   0   0   0
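As a rough illustration of how a sequence profile can be derived from an MSA, the sketch below counts the residues in each alignment column and converts the counts to percentages. The function name `sequence_profile`, the toy alignment and the choice to leave gaps out of the normalisation are assumptions made for this example, not details taken from the slides.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sequence_profile(alignment, gap="-"):
    """Return one 20-valued vector of percentages per alignment column.

    Assumption: gap characters are ignored when normalising a column.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != gap)
        total = sum(counts.values())
        profile.append(
            [100.0 * counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return profile

# Hypothetical toy alignment (4 sequences, 6 columns).
msa = ["YKDYHS", "YRDYQT", "YRDYQS", "GYGFG-"]
profile = sequence_profile(msa)
```

Each entry of `profile` is the vector for one position, so `profile[0][AMINO_ACIDS.index("Y")]` gives the percentage of Y in the first column of the toy alignment.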

Page 8: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna
Page 9: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna
Page 10: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

3D structure prediction of proteins

[Figure: prediction methods arranged along a homology (%) axis from 0 to 100. Labels: New folds, Existing folds; Ab initio prediction, Threading, Building by homology.]

Page 11: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Contacts and Contact Maps

Contact definition

[Figure: residues in contact in a 3D structure. Labelled residues: F 156, V 238, I 240, I 269, V 271, F 297, V 299.]

Page 12: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Protein contact definitions:

1. Based on Cα
2. Based on Cβ
3. All-atom (without hydrogens)

Page 13: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

From the 3D structure to the contact map

Given a protein of length L and a square matrix M of dimension L × L:

For each pair of residues i and j
    calculate the distance between i and j
    if distance < threshold
        put 1 in the cell M(i,j)
    otherwise
        put 0 in the cell M(i,j)
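The procedure above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: each residue is represented by a single point (for example one representative atom), coordinates are in Å, and the function name `contact_map` and the toy coordinates are invented for the example.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Binary L x L contact map from an (L, 3) array of residue coordinates.

    Assumption: one representative point per residue, distances in angstroms.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors, shape (L, L, 3), then Euclidean distances.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # 1 where the distance is below the threshold, 0 otherwise.
    return (dist < threshold).astype(int)

# Hypothetical toy chain: four residues on a line, 3.8 angstroms apart.
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
M = contact_map(coords, threshold=8.0)
```

The diagonal is trivially 1 here because every residue is at distance 0 from itself; in practice such cells are usually excluded by a minimum sequence separation.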

Page 14: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Computation of Contact Maps

[Figure: from the 3D structure (labelled residues: F 156, V 238, I 240, I 269, V 271, F 297, V 299) to the contact map. Both axes of the map carry the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN.]

Page 15: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Protein Structural Classes

All-α    All-β

α+β    α/β

Page 16: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

An Example of a Contact map (All-α)

[Figure: contact map of 1C5A; both axes run from 0 to 70; four structural elements are labelled 1 to 4.]

Page 17: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

An Example of a Contact map (All-β)

[Figure: contact map of 1SFP; both axes run from 0 to 120; N and C termini labelled.]

Page 18: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

An Example of Contact map ()

[Figure: contact map of 6PTI; both axes run from 0 to 60; N and C termini labelled.]

Page 19: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

From the contact map to the 3D structure

Two methods have been proposed:

1. Bohr et al., “Protein Structure from Distance Inequalities”, J. Mol. Biol. 1993, 231:861-869 => based on a steepest descent procedure

2. Vendruscolo and Domany, Fold. Des. 1998, 2:295-306 => based on a modified Metropolis procedure

Page 20: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

6pti Reconstruction Efficiency (58 residues)

[Figure: RMSD versus M (number of random flips).]

At M = 200:
• No. of eliminated true contacts: 6% of real contacts
• No. of added false contacts: 52% of real contacts

Vendruscolo and Domany, Fold. Des. 1998

Page 21: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

From the contact map to the 3D structure:

the reconstruction efficiency

Page 22: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

3-D Modelling through Contact Maps example: Bacteriorhodopsin

[Figure: contact map of 1QHJ (1.9 Å) and the resulting model, with N and C termini labelled. RMSD = 2.5 Å.]

Page 23: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

[Figure: RMSD (0 to 5) versus % missing contacts (0 to 100).]

MARC efficiency in 3D reconstruction from the protein contact map after progressive elimination of true contacts (6pti)

Page 24: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

[Figure: RMSD (0 to 7) versus % wrong contacts (0 to 40).]

MARC efficiency in 3D reconstruction after progressive addition of wrong contacts to a protein contact map with 30% of true contacts (6pti)

Page 25: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Prediction of Contact Maps

Page 26: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Prediction of Contact Maps

Several methods have been applied:

Bohr et al., FEBS 1990 261:43-46 => based on neural networks

Göbel et al., PROTEINS 1994 18:309-317 => based on correlated mutations in proteins

Thomas et al., Prot. Eng. 1996 9:941-948 => based on a statistical method and evolutionary information

Olmea and Valencia, Fold. Des. 1997 2:S25-S32 => based on correlated mutations and other information

Fariselli and Casadio, Prot. Eng. 1999 12:15-21 => based on neural networks and evolutionary information

Fariselli et al., CASP4/ and Prot. Eng. in press => neural networks and other information

Pollastri and Baldi, Bioinformatics 2002 18:S62-S70 => recurrent neural networks

Page 27: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Relevant points

• Contact threshold

• Sequence separation (or sequence gap)

• No. of contacts vs No. of non-contacts

Page 28: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Contact Threshold

[Figure: contacts at a threshold of 16 Å; both axes run from 0 to 50.]

Page 29: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Contact Threshold

[Figure: contacts at thresholds of 16 Å and 12 Å; both axes run from 0 to 50.]

Page 30: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Contact Threshold

[Figure: contacts at thresholds of 16 Å, 12 Å and 8 Å; both axes run from 0 to 50.]

Page 31: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Contact Threshold

[Figure: contacts at thresholds of 16 Å, 12 Å, 8 Å and 6 Å; both axes run from 0 to 50.]
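To make the effect of the contact threshold concrete, the sketch below counts residue pairs in contact at the four thresholds shown on these slides. The chain is a synthetic random walk with 3.8 Å steps, not a real protein, and the helper `count_contacts` is a name chosen for this example.

```python
import numpy as np

def count_contacts(coords, threshold):
    """Number of residue pairs (i < j) closer than the threshold (angstroms)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each pair counted once
    return int((dist[upper] < threshold).sum())

# Assumption: a synthetic 50-residue random walk stands in for a structure.
rng = np.random.default_rng(0)
steps = rng.normal(size=(50, 3))
steps *= 3.8 / np.linalg.norm(steps, axis=1, keepdims=True)  # fixed step length
coords = np.cumsum(steps, axis=0)

counts = {t: count_contacts(coords, t) for t in (16.0, 12.0, 8.0, 6.0)}
```

Lowering the threshold can only remove pairs, so the counts in `counts` never increase from 16 Å down to 6 Å.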

Page 32: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Sequence separation

[Figure: the sequence …VTISCTGSSSNIGAGNHVKWYQQLPG… with positions 1, 20, 40 and 100 marked.]

Page 33: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The Sequence Separation

[Figure: contact map; both axes run from 0 to 60.]

Example of a sequence separation = 10 residues
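A sequence-separation cut can be applied to a contact map by masking every cell whose indices are too close along the chain. The sketch below is one way to do this; the function name `filter_by_separation` and the use of `>=` for the comparison are assumptions for illustration.

```python
import numpy as np

def filter_by_separation(cmap, min_separation):
    """Zero out contacts between residues with |i - j| < min_separation."""
    cmap = np.asarray(cmap)
    i, j = np.indices(cmap.shape)
    return np.where(np.abs(i - j) >= min_separation, cmap, 0)

# Hypothetical toy map: every pair of a 5-residue chain marked as a contact.
toy_map = np.ones((5, 5), dtype=int)
filtered = filter_by_separation(toy_map, min_separation=3)
```

With a separation of 3 on this toy map, only the pairs (0,3), (0,4) and (1,4) and their symmetric counterparts survive.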

Page 34: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Frequency distribution of the real and hypothetical contacts as a function of sequence separation

Page 35: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

[Figure: number of contacts versus protein length.]

Relation between the number of contacts and the protein length

Page 36: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Evaluation of the efficiency of contact map predictions

1) Accuracy:

A = Ncp* / Ncp

where Ncp* and Ncp are the number of correctly assigned contacts and the total number of predicted contacts, respectively.

2) Improvement over a random predictor:

R = A / (Nc/Np)

where Nc/Np is the accuracy of a random predictor; Nc is the number of real contacts in the protein of length Lp, and Np is the number of all possible contacts.

3) Difference in the distribution of the inter-residue distances in the 3D structure for predicted pairs compared with all pair distances in the structure (Pazos et al., 1997):

$$X_d = \sum_{i=1}^{n} \frac{P_{ic} - P_{ia}}{n \, d_i}$$

where n is the number of bins of the distance distribution (15 equally distributed bins from 4 to 60 Å cluster all the possible distances of residue pairs observed in the protein structure); $d_i$ is the upper limit (normalised to 60 Å) of each bin, e.g. 8 Å for the 4 to 8 Å bin; $P_{ic}$ and $P_{ia}$ are the percentage of predicted contact pairs (with distance between $d_{i-1}$ and $d_i$) and that of all possible pairs, respectively.
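The three measures can be sketched as short functions. This is a minimal illustration, not the authors' evaluation code: contacts are represented as sets of (i, j) index pairs, the function names are invented, and the bin edges (15 bins of 4 Å covering 0 to 60 Å) are an assumption, since the slide does not list them explicitly.

```python
import numpy as np

def accuracy(predicted, real):
    """A = Ncp* / Ncp: fraction of predicted contacts that are real."""
    predicted, real = set(predicted), set(real)
    return len(predicted & real) / len(predicted)

def improvement_over_random(a, n_real, n_possible):
    """R = A / (Nc / Np)."""
    return a / (n_real / n_possible)

def xd(pred_distances, all_distances, edges=np.arange(0.0, 64.0, 4.0), d_max=60.0):
    """Xd = sum_i (P_ic - P_ia) / (n * d_i), with P in percent.

    Assumption: 15 bins of 4 angstroms from 0 to 60; d_i is the upper bin
    edge divided by d_max. Distances outside the bins are ignored.
    """
    n = len(edges) - 1
    p_c, _ = np.histogram(pred_distances, bins=edges)
    p_a, _ = np.histogram(all_distances, bins=edges)
    p_c = 100.0 * p_c / p_c.sum()
    p_a = 100.0 * p_a / p_a.sum()
    d = edges[1:] / d_max
    return float(np.sum((p_c - p_a) / d) / n)

# Hypothetical toy data.
a = accuracy({(1, 9), (2, 12), (3, 20)}, {(1, 9), (3, 20), (5, 30), (6, 40)})
r = improvement_over_random(0.5, n_real=10, n_possible=100)
x = xd(pred_distances=[6.0, 6.0], all_distances=[6.0, 6.0, 30.0, 30.0])
```

A positive `x` means the predicted pairs are shifted towards shorter distances than the background of all pairs, which is the behaviour a useful predictor should show.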

Page 37: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Tools out of machine learning approaches: Neural Networks

[Figure: a data base subset with known mapping (e.g. the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN) is used for training, which yields general rules; the general rules are applied to a new sequence to give a prediction.]

Page 38: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Contact definition used:

• C- C distance < 0.8 nm

• Sequence gap > 7 residues

Page 39: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

L < 100:
1a1i_A 1a1t_A 1a68 1a7i 1acp 1ah9 1aho 1aie 1ail 1ajj
1aoo 1ap0 1ark 1awd 1awj 1awo 1bbo 1bc8_C 1brf 1c5a
1cfh 1ctj 1cyo 1fna 1hev 1hrz_A 1kbs 1mbh 1mbj 1msi
1mzm 1nxb 1ocp 1opd 1pce 1plc 1pou 1ppt 1rof 1sco
1spy 1sro 1tbn 1tiv 1tle 1tsg 1ubi 1uxd 2acy 2adx
2bop_A 2ech 2fdn 2fn2 2fow 2hfh 2hoa 2hqi 2lef_A 2sn3
2sxl 3gat_A 3mef_A 4mt2 5pti

L: 100-169:
1a62 1a6g 1acz 1asx 1aud_A 1ax3 1b10 1bc4 1bd8 1bea
1bfe_A 1bfg 1bgf 1bkf 1bkr_A 1br0 1bsn 1bv1 1bxa 1c25
1cew_I 1cfe 1cyx 1dun 1eca 1erv 1exg 1hfc 1ifc 1jvr
1kpf 1kte 1mak 1npk 1pdn_C 1pkp 1poa 1put 1ra9 1rcf
1rie 1skz 1tam 1vsd 1whi 2fsp 2gdm 2ilk 2lfb 2pil
2tgi 2ucz 3chy 3lzt 3nul 5p21 7rsa

L: 170-299:
1ad2 1akz 1amm 1aol 1ap8 1bf8 1bjk 1byq_A 1c3d 1cdi
1cne 1cnv 1csn 1ezm 1fts 1juk 1kid 1mml 1mrj 1nls
1ppn 1rgs 1rhs 1thv 1vin 1xnb 1yub 1zin 2baa 2fha

L > 300:
16pk 1a8e 1ads 1arv 1axn 1b0m 1bg2 1bgp 1bxo 1dlc
1irk 1iso 1kvu 1moq 1svb 1uro_A 1ysc 2cae 2dpg 2pgd
3grs

The database of proteins used to train and test the contact map predictors.

Page 40: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Neural Network-based predictor

• 1 output neuron (contact/non-contact)

• 1 hidden layer with 8 neurons

• Input layer with 1071 input neurons:
    • Ordered residue pairs (1050 neurons)
    • Secondary structures (18 neurons)
    • Correlated mutations (1 neuron)
    • Sequence conservation (2 neurons)
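The layer sizes listed above can be checked with a small forward pass. The sketch below only reproduces the dimensions (1071 inputs, 8 hidden neurons, 1 output); the sigmoid activation, the random weights and all names are assumptions for illustration, since the slide does not give the activation function or the trained parameters.

```python
import numpy as np

# Input blocks as listed on the slide: 1050 + 18 + 1 + 2 = 1071.
N_INPUT = 1050 + 18 + 1 + 2
N_HIDDEN = 8
N_OUTPUT = 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    """One hidden layer, one output neuron (contact / non-contact score)."""
    hidden = sigmoid(x @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

# Assumption: random placeholder weights stand in for a trained network.
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(N_INPUT, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(scale=0.01, size=(N_HIDDEN, N_OUTPUT))
b2 = np.zeros(N_OUTPUT)

x = rng.random(N_INPUT)          # hypothetical input vector for one residue pair
score = forward(x, w1, b1, w2, b2)
```

One such input vector is built for each residue pair (i, j), so a protein of length L requires on the order of L² evaluations of the network.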

Page 41: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

(A) An alignment of 5 (hypothetical) sequences as they are represented in an HSSP file (Sander and Schneider, 1991). i and j stand for the positions of the two residues making or not making contact (A and D in the leading sequence, i.e. sequence 1). (B) Single sequence coding. The position representing the couple (AD) in the vector is set to 1.0 while the other positions are set to 0. (C) Multiple sequence coding. For each sequence in the alignment (1 to 5 in the scheme in A) the couple of residues in positions i and j is counted. The final input coding, representing the frequency of each couple in the alignment, is normalized to the number of sequences.

Representation of the input coding based on ordered couples.
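The two codings described in the caption can be sketched as follows. The example assumes that a couple and its mirror image (AD and DA) share one slot, which gives 210 slots for the 20 amino acids; the slides do not state the vector length, so this is an assumption, as are the names `couple_slot` and `encode_couples` and the toy alignments.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: AD and DA map to the same slot, giving 210 slots in total.
COUPLE_INDEX = {
    pair: k for k, pair in enumerate(combinations_with_replacement(AMINO_ACIDS, 2))
}

def couple_slot(a, b):
    """Index of the residue couple (a, b) in the coding vector."""
    return COUPLE_INDEX[tuple(sorted((a, b)))]

def encode_couples(alignment, i, j):
    """Frequency of each couple at positions i, j, normalised to the number
    of sequences. With a single sequence this reduces to a one-hot vector."""
    vector = [0.0] * len(COUPLE_INDEX)
    for seq in alignment:
        a, b = seq[i], seq[j]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:  # skip gaps and unknowns
            vector[couple_slot(a, b)] += 1.0
    return [v / len(alignment) for v in vector]

# Hypothetical toy data.
single = encode_couples(["ACD"], 0, 2)                        # single sequence coding
multi = encode_couples(["ACD", "ACD", "ACE", "GCD"], 0, 2)    # multiple sequence coding
```

In the toy alignment the couple AD occurs in two of four sequences, so its slot in `multi` holds 0.5, while AE and DG each hold 0.25.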

Page 42: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Correlated mutations

Multiple sequence alignment (N sequences):

1 MVKGPGLYTDIGKKARDLLYKDYHSDKKFTISTYSPTGVAITSS
2 MVKGPGLYSDIGKRARDLLYRDYQSDHKFTLTTYTANGVAITST
3 MVKGPGLYTEIGKKARDLLYRDYQGDQKFSVTTYSSTGVAITTT

The N sequences give M = N·(N-1)/2 couples of sequences: (1, 2), (1, 3) and (2, 3).

M-valued vectors Vi and Vj for the two positions i and j (S: McLachlan substitution matrix):

Couple    Vi        Vj
(1, 2)    S(T;S)    S(I;L)
(1, 3)    S(T;T)    S(I;V)
(2, 3)    S(S;T)    S(L;V)

Correlation:

$$C_{ij} = \frac{1}{M} \sum_{k=1}^{M} \frac{\left(V_i(k) - \bar{V}_i\right)\left(V_j(k) - \bar{V}_j\right)}{\sigma_{V_i}\,\sigma_{V_j}}$$
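The correlation between two alignment columns can be sketched as below. The McLachlan substitution matrix is not reproduced here; a simple identity score (1 for identical residues, 0 otherwise) is used as a stand-in, so the numbers are illustrative only. The function names and the toy alignment are likewise assumptions.

```python
from itertools import combinations
from statistics import mean, pstdev

def similarity(a, b):
    # Stand-in for the McLachlan substitution matrix S(a;b) (assumption).
    return 1.0 if a == b else 0.0

def column_vector(alignment, pos):
    """M-valued vector: one similarity per couple of sequences, M = N(N-1)/2."""
    return [similarity(s[pos], t[pos]) for s, t in combinations(alignment, 2)]

def correlation(alignment, i, j):
    """Correlation coefficient between the vectors of positions i and j."""
    vi, vj = column_vector(alignment, i), column_vector(alignment, j)
    si, sj = pstdev(vi), pstdev(vj)
    if si == 0 or sj == 0:
        return 0.0  # constant vector: the correlation is undefined, report 0
    mi, mj = mean(vi), mean(vj)
    return sum((a - mi) * (b - mj) for a, b in zip(vi, vj)) / (len(vi) * si * sj)

# Hypothetical toy alignment: columns 0 and 1 change together, column 2 does not.
toy = ["AKL", "AKL", "GRL", "GRV"]
c01 = correlation(toy, 0, 1)
c02 = correlation(toy, 0, 2)
```

Columns 0 and 1 of the toy alignment mutate in lockstep, so `c01` is 1, whereas column 2 varies independently of column 0 and `c02` is 0.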

Page 43: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

The neural network architecture for prediction of contact maps

Page 44: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Accuracy of contact map prediction using a cross-validated data set (170 proteins)

[Figure: histogram of the number of proteins (0 to 35) versus accuracy (0 to 0.7).]

Page 45: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0087: 310 residues (A = 0.20, FR/NF)

[Figure: N and C termini labelled.]

Page 46: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0106: 123 residues (A = 0.06, FR/NF)

[Figure: N and C termini labelled.]

Page 47: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0128: 222 residues (A = 0.24, CM)

[Figure: N and C termini labelled.]

Page 48: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0110: 128 residues (A = 0.30, FR)

[Figure: N and C termini labelled.]

Page 49: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0125: 141 residues (A = 0.03, CM)

[Figure: N and C termini labelled.]

Page 50: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

T0124: 242 residues (A = 0.01, NF)

[Figure: N and C termini labelled.]

Page 51: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

TARGET: T0115 (300 residues) (A = 0.17, FR/NF)
PDB code: 1FWK (Homoserine kinase, Methanococcus jannaschii)

[Figure: contact map; both axes are sequence position, from 0 to 300; N and C termini labelled.]

Page 52: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

Predictive performance on 29 targets

Target  Q3 (SS)  Predicted  Predicted  Observed  Observed  Lp    Nal    Xd     A     Class
                 Fr(H)      Fr(E)      Fr(H)     Fr(E)

… or …
T0087   0.777    0.453      0.204      0.401     0.207     310   15     10.5   0.20  FR/NF
T0089   0.762    0.333      0.328      0.397     0.331     419   37     6.6    0.10  CM/FR/NF
T0094   0.701    0.441      0.254      0.294     0.356     181   2      1.4    0.04  FR/NF
T0099   0.625    0.000      0.411      0.000     0.071     56    358    6.0    0.25  CM/FR
T0100   0.664    0.117      0.456      0.088     0.377     342   132    9.2    0.11  FR
T0101   0.718    0.040      0.403      0.045     0.290     400   8      4.3    0.07  FR
T0105   0.649    0.181      0.298      0.202     0.213     94    43     0.5    0.08  FR/NF
T0107   0.803    0.016      0.463      0.074     0.463     188   16     7.8    0.14  FR
T0108   0.754    0.006      0.475      0.078     0.374     206   9      9.4    0.18  FR
T0109   0.819    0.527      0.165      0.451     0.159     182   19     7.7    0.18  FR
T0110   0.832    0.474      0.326      0.451     0.159     128   27     8.3    0.30  FR
T0111   0.742    0.493      0.142      0.428     0.177     431   222    7.2    0.08  CM
T0112   0.701    0.213      0.365      0.316     0.270     352   704    5.8    0.17  CM/FR
T0115   0.821    0.412      0.250      0.375     0.260     300   45     7.5    0.17  FR/NF
T0116   0.793    0.520      0.162      0.464     0.194     811   165    7.5    0.09  FR/NF
T0121   0.796    0.285      0.336      0.304     0.349     372   1000   14.7   0.16  CM/FR
T0122   0.888    0.515      0.170      0.527     0.166     248   103    13.2   0.19  CM
T0123   0.681    0.263      0.300      0.138     0.425     160   70     12.9   0.25  CM
T0126   0.704    0.420      0.216      0.241     0.370     163   8      7.9    0.12  FR
T0127   0.750    0.512      0.154      0.509     0.142     350   70     9.2    0.14  FR
T0128   0.801    0.526      0.123      0.493     0.133     222   551    20.2   0.24  CM

all-β
T0114   0.747    0.000      0.529      0.000     0.483     87    1      7.3    0.26  FR

all-α
T0096   0.874    0.712      0.036      0.757     0.045     239   22     2.7    0.07  FR/NF
T0097   0.846    0.779      0.000      0.663     0.000     105   7      2.4    0.08  FR/NF
T0098   0.731    0.647      0.109      0.714     0.000     121   36     0.1    0.07  NF
T0102   0.457    0.357      0.414      0.814     0.000     70    2      1.2    0.14  FR
T0106   0.728    0.320      0.128      0.408     0.056     128   123    2.3    0.06  FR/NF
T0124   0.868    0.979      0.000      0.855     0.000     242   429    -5.0   0.01  NF
T0125   0.803    0.679      0.066      0.723     0.015     141   5      0.0    0.03  CM

Q3 = secondary structure prediction accuracy; Fr(H) and Fr(E) = frequency of predicted and observed alpha and beta structures in the chain; Lp = protein length in residues; Nal = number of sequences in the alignment; Xd and A are as defined in equations 3 and 1, respectively; Class is the classification of targets by prediction difficulty: CM = comparative modeling, FR = fold recognition, NF = new fold.

Page 53: Prediction of protein contact maps Piero Fariselli Department of Biology University of Bologna

COMMENTS

• The predictor is trained mainly on globular mixed proteins

• Contacts among beta structures dominate

• Contacts in all-alpha proteins are more difficult to predict

• A filtering algorithm is needed