Source: predictioncenter.org
CASP13
Predicting Contacts
Assessor: András Fiser, Department of Systems and Computational Biology, Department of Biochemistry
Possible questions
• Does contact prediction accuracy correlate with that of structure modeling?
• How well did you do among yourselves?
• How well did you do compared with previous CASPs?
• Some insight analysis:
– Are you capturing the same set of contacts?
– Are there particular types of contacts that you are predicting accurately?
– How important is the quality of sequence information?
Best structure prediction (out of 98) vs. best contact predictions (out of 46)
FM and TBM/FM: G043 G322 G089 G145 G224 G261 G354 G498 G197 G460 G324 G135 G196 G055 G418 G117 G208 G274 G086 G192 G071 G222 G044
FM: G043 G322 G089 G145 G224 G498 G261 G354 G197 G324 G196 G208 G460 G135 G055 G117 G418 G366 G192 G274 G086 G457 G044
Contacts: X X G089 (20) X G224 (11) G498 (1) X X X X X X X X X X X X X X X X X
Contacts only: G498 (6) G032* G180* G323* G491* G106* G164 (46) G189* G352* G125* G224 (5) G036* G392* G351 (54) G122 (67) G386* G475* G154* G292* G089 (3) G430* G041 (63) G091*
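The parenthesized numbers on this slide read as cross-ranks, i.e. each group's position in the other ranking: G498 is 1st among contact predictors and 6th in the FM list, and G089 is 3rd in FM and 20th in contacts. A minimal sketch of that lookup, using the two 23-entry lists transcribed above; `cross_ranks` is a hypothetical helper, and `None` simply means the group is not among the transcribed entries of the other list.

```python
from typing import List, Optional, Tuple


def cross_ranks(primary: List[str], other: List[str]) -> List[Tuple[str, Optional[int]]]:
    """For each group in `primary`, return its 1-based rank in `other` (None if absent)."""
    position = {group: rank for rank, group in enumerate(other, start=1)}
    return [(group, position.get(group)) for group in primary]


# Top-23 FM structure ranking and top-23 contact ranking, as transcribed from the slide.
fm = ("G043 G322 G089 G145 G224 G498 G261 G354 G197 G324 G196 G208 "
      "G460 G135 G055 G117 G418 G366 G192 G274 G086 G457 G044").split()
contacts = ("G498 G032 G180 G323 G491 G106 G164 G189 G352 G125 G224 G036 "
            "G392 G351 G122 G386 G475 G154 G292 G089 G430 G041 G091").split()

for group, rank in cross_ranks(fm, contacts):
    print(group, rank if rank is not None else "X")
```

Run on these lists, the loop reproduces the "Contacts" row (X, X, 20, X, 11, 1, then X for the remaining 17 groups); swapping the arguments recovers the structure ranks 6, 5 and 3 shown for G498, G224 and G089.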
Contacts: X X (G036) (12) G089 (20) X (G032) (2) G224 (11) G498 (1) X (G180, G32) (3, 2) X (G229) (39) X X (G498) X X X X X X (G491) X X X X X X X
Contacts only: G498 (6) G032* (2) G322 G180* (2) G322 G323* (2) G322 G491* (16) G117 G106* G164 (46) G189* G352* G125* G224 (5) G036* (2) or (50) G116 G392* G351 (54) G122 (67) G386* G475* G154* G292* G089 (3) G430* G041 (63) G091*
It is difficult to establish a clear relationship between contact and structure prediction: we do not know how well one could perform with a "top" contact prediction.
Some top-performing structure prediction groups did not submit contact predictions: we do not know whether their contact predictions are better than those of others.
Among groups that submitted both structure and contact predictions there are surprising inconsistencies! Either it is important to know how to use contact information, and/or contact information is not as important as one thought.
(89 submitted: 30/31 targets…)
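Are we predicting different contacts? Jaccard distance ("1 − intersection over union")
$$d_J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} = 1 - \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le d_J(A,B) \le 1$$
where $d_J = 0$ means the two contact sets are identical and $d_J = 1$ means they share no contacts.
Computed on the top L/5 predicted contacts, where L is the sequence length.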
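A minimal sketch of the Jaccard comparison on top-L/5 lists, assuming each prediction is an (i, j, probability) triple. The helper names, the floor rounding of L/5 and the convention that two empty sets have distance 0 are illustrative assumptions, not the assessors' code.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]  # residue pair (i, j), stored with i < j


def top_contacts(predictions: Iterable[Tuple[int, int, float]], length: int) -> Set[Contact]:
    """Keep the L/5 highest-probability predictions (L/5 rounded down, at least 1)."""
    n = max(1, length // 5)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    return {(min(i, j), max(i, j)) for i, j, _ in ranked[:n]}


def jaccard_distance(a: Set[Contact], b: Set[Contact]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (distance 0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Two hypothetical predictions for a 10-residue chain (L/5 = 2 contacts each).
group_a = [(1, 8, 0.9), (2, 9, 0.8), (3, 10, 0.1)]
group_b = [(1, 8, 0.7), (1, 10, 0.6), (2, 9, 0.2)]
print(jaccard_distance(top_contacts(group_a, 10), top_contacts(group_b, 10)))
```

Here the two top-2 lists share one of three distinct contacts, so the distance is 2/3.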
[Figure: Jaccard distances between groups' top-L/5 contact lists; labelled groups: RRMD, RRMD-plus, Deltacontact, Gammacontact, Tripletres, Tripletres_AT]
Performance using different criteria
Models (FM or FM+TBM) × contacts (top 10, L/5, L/2, L, or FL) × probability (0 or 0.5) × contact definition (medium/long, long, extra long) = 2 × 5 × 2 × 3 = 60 combinations, each evaluated with F1, precision/recall, Z-score sum, Z-score average, etc.
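The 60 evaluation settings are the Cartesian product of the four axes. A minimal sketch that enumerates them; the labels are copied from the slide ("FL" is kept as written), and reading "probability" as a cutoff on predicted contact probability is an assumption.

```python
from itertools import product

# Evaluation axes as listed on the slide.
models = ["FM", "FM+TBM"]
list_sizes = ["top 10", "L/5", "L/2", "L", "FL"]
probability_cutoffs = [0.0, 0.5]  # assumed: minimum predicted probability
contact_definitions = ["medium/long", "long", "extra long"]

combinations = list(product(models, list_sizes, probability_cutoffs, contact_definitions))
print(len(combinations))  # 2 * 5 * 2 * 3 = 60
for combo in combinations[:3]:
    print(combo)
```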
Long/medium contacts, FM only, Z-score > 0
Long/medium contacts (FM only), sum of Z-scores (> 0)
[Figure panels: Long/medium top 10; Long/medium L/5; Long/medium L/2; Long/medium L]
Groups 032 and 323 are the same.
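CASP assessments commonly rank groups by summing per-target Z-scores with negative values floored at zero, so that a single poor target cannot outweigh good ones; "sum Zscore (> 0)" appears to follow this idea. A minimal sketch under that assumption: `summed_positive_z` is a hypothetical name, it uses the population standard deviation, and it skips the outlier-removal pass that some CASP assessments apply before recomputing Z-scores (the slides do not say whether one was used).

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict


def summed_positive_z(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """scores[target][group] -> metric (e.g. F1); returns group -> sum of max(z, 0) over targets."""
    totals: Dict[str, float] = defaultdict(float)
    for by_group in scores.values():
        values = list(by_group.values())
        mu, sd = mean(values), pstdev(values)
        for group, value in by_group.items():
            z = (value - mu) / sd if sd > 0 else 0.0  # identical scores carry no signal
            totals[group] += max(z, 0.0)              # negative Z-scores contribute nothing
    return dict(totals)


# Hypothetical F1 values for two targets; group C did not submit for T1.
example = {
    "T1": {"A": 1.0, "B": 0.0},
    "T2": {"A": 0.0, "B": 1.0, "C": 0.5},
}
print(summed_positive_z(example))
```

Because negative Z-scores are floored at zero, a missing target costs a group nothing beyond the positive score it might have earned.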
Long contacts (FM only), sum of Z-scores (> 0)
[Figure panels: Long L; Long L/2; Long L/5; Long top 10]
Extra-long contacts only (FM only), Z-score > 0
[Figure panels: Extra long top 10; Extra long L/5; Extra long L/2; Extra long L]
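The three contact definitions differ by the minimum sequence separation |i − j|. In the standard CASP convention a contact is a residue pair whose Cβ atoms (Cα for glycine) lie within 8 Å, medium range starts at separation 12 and long range at 24. The slides do not give the extra-long cutoff, so `EXTRA_LONG_MIN = 48` below is a placeholder assumption. A minimal filter sketch with hypothetical helper names:

```python
from typing import List, Tuple

EXTRA_LONG_MIN = 48  # hypothetical: the slides do not state the extra-long cutoff

# Minimum sequence separation |i - j| for each contact definition on the slides.
MIN_SEPARATION = {
    "medium/long": 12,  # CASP medium range starts at 12
    "long": 24,         # CASP long range starts at 24
    "extra long": EXTRA_LONG_MIN,
}


def in_category(i: int, j: int, category: str) -> bool:
    return abs(i - j) >= MIN_SEPARATION[category]


def filter_contacts(contacts: List[Tuple[int, int]], category: str) -> List[Tuple[int, int]]:
    """Keep only the contacts whose sequence separation qualifies for `category`."""
    return [(i, j) for i, j in contacts if in_category(i, j, category)]


pairs = [(1, 10), (1, 13), (1, 30), (1, 60)]
for name in MIN_SEPARATION:
    print(name, filter_contacts(pairs, name))
```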
Improvement in contact prediction accuracy over CASP10-13 meetings
[Figure: average precision, long contacts, L/5 lists; y-axis 0–70; series: CASP10, CASP11, CASP12]
CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant
CASP12: 31 groups, 22 non-redundant
Improvement in contact prediction accuracy over CASP10-13 meetings
[Figure: average precision, long contacts, L/5 lists; y-axis 0–70; series: CASP10, CASP11, CASP12, CASP13]
CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant
CASP12: 31 groups, 24 non-redundant
CASP13: 44 groups, 34 non-redundant
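A minimal sketch of the plotted quantity, assuming "average precision" here means the plain mean, over targets, of the precision of each group's top-L/5 long-range contacts (|i − j| ≥ 24). It is not the information-retrieval "average precision" metric. Dividing by the number of long-range pairs actually listed (rather than always by L/5) is an assumption, and the helper names are hypothetical.

```python
from statistics import mean
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def precision_top_l5_long(ranked: List[Pair], native: Set[Pair], length: int) -> float:
    """Precision of the first L/5 long-range (|i - j| >= 24) pairs in a ranked list.

    `ranked` is ordered by decreasing predicted probability; `native` holds the
    contacts observed in the experimental structure, stored with i < j.
    """
    long_range = [(i, j) for i, j in ranked if abs(i - j) >= 24]
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in native for i, j in top)
    return hits / len(top)


def average_precision(targets: List[Tuple[List[Pair], Set[Pair], int]]) -> float:
    """Plain mean of the per-target precisions."""
    return mean(precision_top_l5_long(r, n, length) for r, n, length in targets)


# Two hypothetical 50-residue targets: (ranked predictions, native contacts, length).
t1 = ([(1, 30), (2, 5), (3, 40), (4, 50)], {(1, 30), (4, 50)}, 50)
t2 = ([(5, 45), (10, 40)], {(5, 45), (10, 40)}, 50)
print(average_precision([t1, t2]))
```

In this toy case the short-range pair (2, 5) is skipped, target 1 scores 2/3, target 2 scores 1, and the mean is 5/6.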
T0953s1d1
Good F-score: 63.4
Poor F-score: 14.64
Best TS model (G43), cyan: GDT_TS 54.48
Contact model (G164), green: GDT_TS 41.05
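The F-score combines precision P (fraction of predicted contacts that are native) and recall R (fraction of native contacts that were predicted) as their harmonic mean, F = 2PR / (P + R). A minimal sketch on sets of residue pairs; the function name is hypothetical, and which predicted contacts enter the set (for example a top-L/5 list) is left to the caller.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def contact_f_score(predicted: Set[Pair], native: Set[Pair]) -> float:
    """F1 = 2PR / (P + R) between predicted and native contact sets (pairs stored with i < j)."""
    true_positives = len(predicted & native)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(native)
    return 2 * precision * recall / (precision + recall)


predicted = {(1, 30), (2, 40), (3, 50), (4, 60)}
native = {(1, 30), (2, 40), (5, 70)}
print(round(100 * contact_f_score(predicted, native), 2))  # scaled to 0-100, as on the slides
```

With two of four predictions correct (P = 1/2) and two of three native contacts found (R = 2/3), F1 = 4/7.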
Relationship between sequence profile depth and success (F-score) of predicting contacts
• Less reliant on sequence profiles.
[Figure: number of hits vs. F-score (20–60); series: Psiblast, HHBlits]
Limited signal coming from sequence
20.7 23 10 36032.9 252 25 13845.7 14 1 4917 46 3728 50 17 33
31.6 40 22 3739.5 46 37 3621.9 669 37 4034.4 591 4131 89 46 168
33.3 38 20 6020.5 3021 1905 51132 172 172 457
25.4 609 183 46725 6130 129 465
34.7 132 31 20134.7 91 111 20116 30 6 917 30 6 55
35.7 278 17 34723 19 9 35
20.6 38 23 16218.3 38 23 16936 194 1 300
24.4 194 1 2746.4 58 31 45
18 58 31 5425 58 31 51
53.7 1 1 029 14 14 36
24.2 1266 1028 47843.7 1 1 219.1 3752 50032.1 4 3 923 584 110 43036 1 1 019 7 4 18
51.4 21 6 12351.4 77 13 12330 231 126 267
64.3 302 85 44164.3 545 343 44127.2 1730 68 47018.2 629 1163 5018.2 3755 38 5032 1380 53 465
Column headers (two side-by-side panels): F-score, e-5, e-20, Neff | F-score, e-5, e-20, Neff; Blast, Blast+HHblits | Blast, Blast+HHblits
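The slides do not define Neff. A widely used definition, assumed here, down-weights redundant sequences: each aligned sequence gets weight 1/m, where m counts the sequences (itself included) sharing at least 80% identity with it, and Neff is the sum of these weights. Some tools additionally normalize by the square root of the alignment length; this sketch does not. `neff` is a hypothetical helper that treats gap characters like any other symbol when computing identity.

```python
from typing import List


def neff(msa: List[str], identity_threshold: float = 0.8) -> float:
    """Effective number of sequences in an alignment of equal-length rows."""
    if not msa:
        return 0.0
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    def identity(a: str, b: str) -> float:
        # Fraction of alignment columns with the same symbol (gaps included).
        return sum(x == y for x, y in zip(a, b)) / length

    total = 0.0
    for a in msa:
        # Each row counts itself, so the cluster size is always at least 1.
        cluster_size = sum(identity(a, b) >= identity_threshold for b in msa)
        total += 1.0 / cluster_size
    return total


alignment = ["ACDEFGHIKL", "ACDEFGHIKV", "WWYYPPQQRR"]
print(neff(alignment))
```

The first two rows are 90% identical and share one unit of weight, while the unrelated third row counts fully, giving Neff = 2.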
What is what?
Green: parallel (parallel with the diagonal) + diffuse (helical)
Blue: anti-parallel (orthogonal to the diagonal) + compact (strand)
Performance vs. secondary structure interactions
[Figure: F-score (0–70) by interaction type: E−E (β-β), H−H (α-α), E−H (β-α), C−C (coil-coil), E/H−C (β/α-coil), compared with a random model]
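Each contact can be assigned to one of these interaction classes from a three-state secondary-structure assignment (H helix, E strand, C coil) of its two residues. A minimal sketch, assuming a 0-based per-residue string such as a reduced DSSP assignment; `ss_category` and `group_by_ss` are hypothetical names, and per-class F-scores would then be computed on each group separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[int, int]


def ss_category(ss: str, i: int, j: int) -> str:
    """Map the three-state labels of residues i and j to the slide's interaction classes."""
    states = {ss[i], ss[j]}
    if not states <= {"H", "E", "C"}:
        raise ValueError(f"unexpected secondary-structure labels: {states}")
    if states == {"E"}:
        return "E-E"
    if states == {"H"}:
        return "H-H"
    if states == {"E", "H"}:
        return "E-H"
    if states == {"C"}:
        return "C-C"
    return "E/H-C"  # one coil residue paired with a helix or strand residue


def group_by_ss(contacts: Iterable[Pair], ss: str) -> Dict[str, Set[Pair]]:
    """Split a list of contacts into secondary-structure interaction classes."""
    groups: Dict[str, Set[Pair]] = defaultdict(set)
    for i, j in contacts:
        groups[ss_category(ss, i, j)].add((i, j))
    return dict(groups)


ss = "HHHHCCEEEE"
print(group_by_ss([(0, 1), (6, 9), (0, 6), (4, 5), (0, 4)], ss))
```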
Topology dependence of success rates, class level
[Figure: mean F-score (<F-score>, 0–30) by structural class: E (all-β), H (all-α), M (α/β)]
Correlation with size
[Figure: F-score × 100 (accuracy) vs. protein length (0–500 residues)]
R = 0.32
Without a single outlying point: R = 0.19
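The R values are presumably Pearson correlation coefficients between protein length and F-score, and the second value shows how much one point can move R on a small sample. A minimal sketch with hypothetical toy data (not the CASP13 values): in the example, one long protein with a high score produces essentially all of the correlation, and leaving it out brings R to zero.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pearson_r_without(xs: Sequence[float], ys: Sequence[float], drop: int) -> float:
    """Recompute R after leaving out the point at index `drop`."""
    keep = [k for k in range(len(xs)) if k != drop]
    return pearson_r([xs[k] for k in keep], [ys[k] for k in keep])


# Hypothetical (length, F-score x 100) points, not the CASP13 data.
lengths = [60, 120, 180, 240, 480]
fscores = [30, 10, 40, 20, 60]
print(pearson_r(lengths, fscores), pearson_r_without(lengths, fscores, drop=4))
```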
Conclusions
• Contact prediction methods made a major advance over the last two years.
• Many different subsets of correct contacts can be predicted and used successfully in 3D modeling.
• It is difficult to directly correlate predicted contacts with 3D predictions because of ambiguity and lack of overlap between categories, but:
– The best 3D predictors have either even better contact predictions or better ways to use contact information.
– From the few examples where both contacts and 3D structures were predicted, we see strong inconsistencies: it is important to know how to use contact information.
• Often very few homologous sequences were available, yet very good contact predictions were made.
– Less emphasis on covariance-based methods (supported by the abstracts of the invited groups).
Acknowledgement
CASP and Prediction Center at UC Davis, Davis, USA: Andriy Kryshtafovych, Bohdan Monastyrskyy, Krzysztof Fidelis
CASP organizers
Albert Einstein College of Medicine, New York, USA: Rojan Shrestha, Eduardo Fajardo, Nelson Gil