1
Protein Function Prediction Using Structural Homology
Kevin Drew, Lars Malmstroem, Glenn Butterfoss, Richard Bonneau
[Figure: five RNA sequences, each containing the motif CUUCGGA, beside a tree with nodes 1-9 and labels t1-t8. Panel labels: Evolution, Structure, Function.]
4
Motivation: Genome Annotation
Cheaper sequencing technologies
New protein sequences
Proteins w/ unknown function
Shibu Yooseph et al. (2007) PLoS Biology
4
Genome Annotation using Homology
Score = 107 bits (264), Expect = 6e-23
Identities = 63/160 (39%), Positives = 97/160 (60%), Gaps = 7/160 (4%)
Query: 1 MSVMYKKILYPTDFSETAEIALKHVKTLKAEEVILLDEREIKKRDIFSLLLGVA 60
         M M++K+L+PTDFSE A A++ + ++ EVILLDE +++ L+ G +
Sbjct: 1 MIFMFRKVLFPTDFSEGAYRAVEVFEKMEVGEVILLDEGTLEE-----LMDGYS 55
...
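The percentages in a BLAST report are column counts over the alignment. A minimal sketch, assuming two aligned strings of equal length with "-" marking gaps; the function name `percent_identity` and the toy alignment are illustrative and are not part of BLAST or of the hit shown above.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    if not query:
        raise ValueError("alignment is empty")
    # A column counts as identical only if the residues match and neither is a gap.
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


# Toy alignment (hypothetical): 7 identical columns out of 10.
toy_query = "PTDFSETAEI"
toy_subject = "PTDFSE-AYR"
toy_identity = percent_identity(toy_query, toy_subject)

# Counts reported in the hit above: 63 identical columns out of 160.
reported_identity = 100.0 * 63 / 160
```

The reported ratio 63/160 is 39.375%, which the BLAST report shows as 39%.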
=
Sequence Homology (40% - 60%):
Structural Homology: Function Annotations
5
Structural Homology: Example
Bacteriocin AS-48, CASP4
Structure: 1E68 = 1NKL
Sequence: 4% identity
GYFCESCRKIIQKLEDMVGPQPNEDTVTQAASQVCDKLKILRGLCKKIMRSFLRRISWDILTGKKPQAICVDIKICKE
MAKEFGIPAAVAGTVLNVVEAGGWVTTIVSILTAVGSGGLSLLAAAGRESIKAYLKKEIKKKGKRAVIAW
Function: Cyclic Bacterial Lysin = NK Lysin
Bonneau, R., Tsai, J., Ruczinski, I., Baker, D. Functional Inferences from Blind ab Initio Protein Structure Predictions. J. Structural Biology. (2001)
Rosetta
Local Sequence Bias
Non-local Interactions
[Figure: residue letter labels and an "Experimental" panel label from the Rosetta slide image]
Drew, K., Chivian, D., Bonneau, R. Ab initio structure prediction. In: Bourne, P.E. (2007) Structural Bioinformatics (Methods of Biochemical Analysis, V. 44). Second Edition. New York: John Wiley & Sons; ISBN: 0471201995.
Completed and ongoing projects
• Bacteria and Archaea: Bonneau & Baliga (2004) Genome Biology
– Annotation of Halobacterium NRC-1
– Identification of transcription factors
– Role of chemotaxis sensing domains
• Yeast: Malmstroem, Baker, Bonneau (2006) PLoS Biology
• Human & others: Bonneau, Malmstroem, IBM (in progress)
worldcommunitygrid.org & grid.org
Collaborators: Lars Malmstroem, Viktors Berstis, Mike Riffle, Leroy Hood, David Baker
9
Gene Ontology (GO)
Molecular Function GO DAG
specificity
Molecular Function GO:0003674
Binding GO:0005488
Protein Binding GO:0005515
Clathrin Binding GO:0030276
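The four terms above form a chain of increasing specificity in the GO graph. A minimal sketch of walking such a graph, assuming each term on the slide is a direct child of the one listed before it; the dictionary `GO_PARENTS` and the helper names are illustrative, not part of any GO library.

```python
# Child -> parents edges for the four terms on the slide.
# Assumption: each term is a direct child of the term listed before it.
GO_PARENTS = {
    "GO:0003674": [],              # molecular function (root)
    "GO:0005488": ["GO:0003674"],  # binding
    "GO:0005515": ["GO:0005488"],  # protein binding
    "GO:0030276": ["GO:0005515"],  # clathrin binding
}


def ancestors(term, parents=GO_PARENTS):
    """Return every term reachable through parent edges, excluding the term itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def depth(term, parents=GO_PARENTS):
    """Longest path to the root, used here as a rough proxy for specificity."""
    if not parents[term]:
        return 0
    return 1 + max(depth(p, parents) for p in parents[term])


clathrin_ancestors = ancestors("GO:0030276")
clathrin_depth = depth("GO:0030276")
```

Because a term implies all of its ancestors, a prediction of clathrin binding is also a prediction of protein binding and of binding.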
10
Additional Evidence of Function for Integration with Structure
• GO Biological Process
• GO Cellular Component
• Experimental Data
– Mass Spec Pull Down
– Fluorescent Localization
• Generally boosts confidence of predictions
11
Protein Domain Prediction
Query Sequence
• Transmembrane helix prediction
• Signal peptide prediction
• Disorder prediction
• Ginzu: PSI-BLAST (PDB), Fold Recognition, MSA / Pfam / Unassigned
• Rosetta
[Figure: sequence annotated with TM regions, signal peptide, Domain 1, Domain 2, Domain 3; labels PDB and Rosetta]
12
Function Prediction Overview
...
Known Structures
Predicted Structures
P(Structure)
Gene Ontology Terms
P(Function|Structure)
P(Function)
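One way to read the three quantities above is through the law of total probability: P(Function) is the sum over candidate structure classes s of P(Function | s) times P(s). The sketch below assumes that reading; the superfamily labels and all numbers are invented for illustration.

```python
def function_probability(p_structure, p_function_given_structure):
    """Marginalize a function probability over candidate structure classes.

    p_structure: dict mapping structure class -> P(Structure = class)
    p_function_given_structure: dict mapping class -> P(Function | class)
    """
    total = sum(p_structure.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("structure probabilities must sum to 1")
    # Classes missing from the conditional table contribute zero.
    return sum(
        p_s * p_function_given_structure.get(s, 0.0)
        for s, p_s in p_structure.items()
    )


# Hypothetical structure prediction for one domain.
p_structure = {"sf_A": 0.7, "sf_B": 0.2, "sf_C": 0.1}
# Hypothetical chance of the function within each class.
p_function_given = {"sf_A": 0.9, "sf_B": 0.1, "sf_C": 0.0}

p_function = function_probability(p_structure, p_function_given)
```

A confident structure prediction concentrates the weight on one class, so that class's function statistics dominate the result.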
16
Matching Predicted Structures to
17
Training Data Derived from GO and
GO: 1.6 million sequences
Cluster Centers: 280,511
GO + Astral BLAST hits: 643,173
Removal of Benchmark: < 280,511
18
Naïve Bayes
• In words: the probability that a variable, y, is true given features, x, divided by the probability that y is false given the same features x.
– Take the log; if it is > 0, y is more likely to be true than false.
• y = molecular function and x = {sf, bp, cc}, where sf = SCOP superfamily, bp = GO biological process, cc = GO cellular component
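The log ratio described above can be computed term by term once the features are treated as conditionally independent, which is the naive Bayes assumption. A minimal sketch with invented probabilities; the function name and every number are hypothetical, and the natural log is an arbitrary choice.

```python
import math


def naive_bayes_llr(prior_true, likelihoods_true, likelihoods_false):
    """log [P(y | x) / P(not y | x)] under the naive Bayes independence assumption.

    prior_true: P(y)
    likelihoods_true: dict feature -> P(feature value | y)
    likelihoods_false: dict feature -> P(feature value | not y)
    """
    if not 0.0 < prior_true < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Start from the prior log odds, then add one log ratio per feature.
    llr = math.log(prior_true / (1.0 - prior_true))
    for feature, p_true in likelihoods_true.items():
        llr += math.log(p_true / likelihoods_false[feature])
    return llr


# Illustrative numbers only: a rare function supported by all three features.
prior = 0.01
p_x_given_y = {"sf": 0.60, "bp": 0.30, "cc": 0.50}
p_x_given_not_y = {"sf": 0.01, "bp": 0.05, "cc": 0.25}

llr = naive_bayes_llr(prior, p_x_given_y, p_x_given_not_y)
is_more_likely_true = llr > 0
```

With these numbers the strongly negative prior term is outweighed by the three feature terms, so the ratio ends up above zero.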
19
Full Function Prediction Formula
Naive Bayes
Structure Contribution
Additional Evidence
Prior
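The formula itself did not survive in this transcript. As a hedged sketch only, a generic naive Bayes log-likelihood ratio for y = molecular function and x = {sf, bp, cc} splits into terms that match the labels on these slides. The exact form the authors used, in particular how P(Structure) weights the structure term, is not recoverable here, so the grouping below is an assumption.

```latex
\mathrm{LLR}(y \mid x)
  = \log\frac{P(y \mid sf, bp, cc)}{P(\neg y \mid sf, bp, cc)}
  = \underbrace{\log\frac{P(y)}{P(\neg y)}}_{\text{Prior}}
  + \underbrace{\log\frac{P(sf \mid y)}{P(sf \mid \neg y)}}_{\text{Structure Contribution}}
  + \underbrace{\log\frac{P(bp \mid y)}{P(bp \mid \neg y)}
  + \log\frac{P(cc \mid y)}{P(cc \mid \neg y)}}_{\text{Additional Evidence}}
```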
23
Results: Solved Structures
How accurate are we when we predict SCOP Superfamily for PDB Structures?
24
How accurate are we when we predict Structure for Swissprot Proteins?
MCM Score
Grey = all MCM scores; sea green = predictions judged correct against structures solved since; KS test: D = 0.67, p-value = 0e+00
Grey Bar = total number of structure predictions
Green Bar = number of correct structure predictions
% above bar = percent correct
Low Confidence High Confidence
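The KS test quoted above compares two score distributions. A minimal pure-Python sketch of the two-sample KS statistic D, the largest gap between the two empirical cumulative distributions; the score lists are invented, the function name is illustrative, and the p-value calculation is left out.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical scores: the second sample is shifted toward higher values.
all_scores = [0.1, 0.2, 0.3, 0.4]
correct_scores = [0.3, 0.4, 0.5, 0.6]

d_stat = ks_statistic(all_scores, correct_scores)
```

D ranges from 0 for identical samples to 1 for samples that do not overlap at all.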
25
How accurate are our function predictions using structure only?
Log Likelihood Ratio
3083 Domains
Grey Bar = total number of function predictions
Green Bar = number of correct function predictions
% above bar = percent correct
Low Confidence High Confidence
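The bar charts described above group predictions by confidence and report the percent correct in each group. A minimal sketch of that tabulation, assuming each prediction is a (score, is_correct) pair; the bin edges, the data, and the function name are invented for illustration.

```python
def accuracy_by_bin(predictions, bin_edges):
    """Tabulate predictions per score bin.

    predictions: list of (score, is_correct) pairs
    bin_edges: ascending edges; each bin covers [low, high)
    Returns a list of (low, high, total, correct, percent_correct) tuples,
    with percent_correct set to None for empty bins.
    """
    rows = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [ok for score, ok in predictions if low <= score < high]
        total = len(in_bin)    # height of the grey bar
        correct = sum(in_bin)  # height of the green bar
        percent = 100.0 * correct / total if total else None
        rows.append((low, high, total, correct, percent))
    return rows


# Hypothetical predictions: (log likelihood ratio, whether it was correct).
toy_predictions = [
    (0.5, False), (0.8, True),
    (1.2, False), (1.9, True),
    (2.5, True), (3.1, True),
]

rows = accuracy_by_bin(toy_predictions, [0, 1, 2, 4])
```

If the score is informative, the percent correct should rise from the low-confidence bins to the high-confidence bins.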
26
What does structure provide over GO process alone?
Log Likelihood Ratio
Domain Coverage: 3083 domains
Process (orange); Process & Structure (green); # = number of domains
Low Confidence High Confidence
27
Uniqueness and Specificity of GO Functions
Swissprot LLR >= 2
Unique Functions by Evidence
GO ID        GO Name                                     Percent of Genes with Terms
GO:0005198   structural molecule activity                0.03
GO:0003735   structural constituent of ribosome          0.02
GO:0003676   nucleic acid binding                        0.17
GO:0003723   RNA binding                                 0.04
GO:0016491   oxidoreductase activity                     0.16
GO:0046872   metal ion binding                           0.11
GO:0016787   hydrolase activity                          0.24
GO:0043167   ion binding                                 0.12
GO:0043169   cation binding                              0.11
GO:0005509   calcium ion binding                         0.01
…
GO:0004550   nucleoside diphosphate kinase activity      0.0009
GO:0005496   steroid binding                             0.001
GO:0042379   chemokine receptor binding                  0.0006
GO:0030234   enzyme regulator activity                   0.01
GO:0016788   hydrolase activity, acting on ester bonds   0.04
GO:0008289   lipid binding                               0.005
GO:0004812   aminoacyl-tRNA ligase activity              0.01
GO:0005506   iron ion binding                            0.03
GO:0005216   ion channel activity                        0.003
28
Acknowledgements
Bonneau Lab: Glenn Butterfoss, Thadeous Kacmarczyk, Peter Waltman, Aviv Madar, Kevin Belasco, Alex Pine, Richard Bonneau
NYU: Sasha Levy, Peter McKenney, Jane Carlton, Dennis Shasha, Kris Gunsalus, Fabio Piano, Patrick Eichenberger, Biology Department
University of Washington: Lars Malmstroem, David Baker, Trisha N. Davis, Michael Riffle, Yeast Resource Center
IBM: Viktors Berstis, Keith J Uplinger, Bill Boverman
Funding: DOD, DOE, NSF
Rosetta-Commons
Data & Results: http://www.yeastrc.org/pdr/