protein function prediction using structural homology

Post on 16-Oct-2021

Page 1: Protein Function Prediction Using Structural Homology

Protein Function Prediction Using Structural Homology

Kevin Drew, Lars Malmstroem, Glenn Butterfoss, Richard Bonneau

Page 2: Protein Function Prediction Using Structural Homology

AAGACUUCGGAUCUGGCGACACCC

UACACUUCGGAUGACACCAAAGUG

AGGUCUUCGGAACGGGCACCAUU

CCAACUUCGGAUUUUGCUACCAUA

AAGCCUUCGGAGCGGGCGUAACUC

[Figure: phylogenetic tree relating the five sequences above, with nodes numbered 1 to 9 and branch lengths t1 to t8. Labels: Evolution, Structure, Function.]

Page 3: Protein Function Prediction Using Structural Homology

Motivation: Genome Annotation

Cheaper sequencing technologies

New protein sequences

Proteins with unknown function

Shibu Yooseph et al. (2007) PLoS Biology

Page 4: Protein Function Prediction Using Structural Homology

Genome Annotation using Homology

Score = 107 bits (264), Expect = 6e-23
Identities = 63/160 (39%), Positives = 97/160 (60%), Gaps = 7/160 (4%)

Query: 1  MSVMYKKILYPTDFSETAEIALKHVKTLKAEEVILLDEREIKKRDIFSLLLGVA  60
          M  M++K+L+PTDFSE A  A++  + ++  EVILLDE  +++     L+ G +
Sbjct: 1  MIFMFRKVLFPTDFSEGAYRAVEVFEKMEVGEVILLDEGTLEE-----LMDGYS  55
...

=

Sequence Homology (40% - 60%):

Structural Homology: Function Annotations
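
The identity and gap counts in a BLAST report can be recomputed from the aligned strings. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the two strings are only the 54 aligned columns visible on the slide (the full alignment spans 160 columns). The header percentages are consistent with integer truncation of the raw ratios, which is what `truncated_percent` assumes.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identical columns and gap columns in two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column is an identity when both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # A column is a gap column when either sequence has '-'.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return {"length": len(query), "identities": identities, "gaps": gaps}


def truncated_percent(count: int, total: int) -> int:
    """Integer percentage, truncated (assumed to match the report header)."""
    return (100 * count) // total


# Only the aligned columns shown on the slide, not the full alignment.
QUERY = "MSVMYKKILYPTDFSETAEIALKHVKTLKAEEVILLDEREIKKRDIFSLLLGVA"
SBJCT = "MIFMFRKVLFPTDFSEGAYRAVEVFEKMEVGEVILLDEGTLEE-----LMDGYS"

stats = alignment_stats(QUERY, SBJCT)
print(stats)
print(truncated_percent(63, 160), truncated_percent(97, 160), truncated_percent(7, 160))
```

Counting "Positives" would additionally need a substitution matrix such as BLOSUM62, which is left out of this sketch.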

Page 5: Protein Function Prediction Using Structural Homology

Structural Homology: Example
Bacteriocin AS-48, CASP4

Structure: 1E68 = 1NKL

Function: Cyclic Bacterial Lysin = NK Lysin

Sequence: 4% identity
1E68 MAKEFGIPAAVAGTVLNVVEAGGWVTTIVSILTAVGSGGLSLLAAAGRESIKAYLKKEIKKKGKRAVIAW
1NKL GYFCESCRKIIQKLEDMVGPQPNEDTVTQAASQVCDKLKILRGLCKKIMRSFLRRISWDILTGKKPQAICVDIKICKE

Bonneau, R., Tsai, J., Ruczinski, I., Baker, D. Functional Inferences from Blind ab Initio Protein Structure Predictions. J. Structural Biology. (2001)

Page 6: Protein Function Prediction Using Structural Homology

Rosetta

Local Sequence Bias

Non-local Interactions

Drew, K., Chivian, D., Bonneau, R. Ab initio structure prediction. In: Bourne, P.E. (2007) Structural Bioinformatics (Methods of Biochemical Analysis, V. 44). Second Edition. New York: John Wiley & Sons; ISBN: 0471201995.

Page 7: Protein Function Prediction Using Structural Homology

Completed and ongoing projects

Bacteria and Archaea: Bonneau & Baliga (2004) Genome Biology: Annotation of Halobacterium NRC-1
• identification of transcription factors
• role of chemotaxis sensing domains

Yeast: Malmstroem, Baker, Bonneau (2006) PLoS Biology

Human & others: Bonneau, Malmstroem, IBM: Human and others (in progress)

Page 8: Protein Function Prediction Using Structural Homology

worldcommunitygrid.org & grid.org

collaborators: Lars Malmstroem, Viktors Berstis, Mike Riffle, Leroy Hood, David Baker

Page 9: Protein Function Prediction Using Structural Homology

Gene Ontology (GO)

Molecular Function GO DAG

specificity

Molecular Function GO:0003674

Binding GO:0005488

Protein Binding GO:0005515

Clathrin Binding GO:0030276
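
The path from general to specific terms on this slide can be sketched as a small parent map. This is an illustrative Python sketch only: it encodes just the four terms shown, the helper names are hypothetical, and using depth as a stand-in for specificity is an assumption (the real GO graph is much larger and terms can have several parents).

```python
# is_a parent links for the four terms on the slide only.
PARENTS = {
    "GO:0003674": [],              # molecular function (root)
    "GO:0005488": ["GO:0003674"],  # binding
    "GO:0005515": ["GO:0005488"],  # protein binding
    "GO:0030276": ["GO:0005515"],  # clathrin binding
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def depth(term, parents=PARENTS):
    """Longest path from the root to `term`, a crude proxy for specificity."""
    term_parents = parents[term]
    if not term_parents:
        return 0
    return 1 + max(depth(p, parents) for p in term_parents)


print(sorted(ancestors("GO:0030276")))
print(depth("GO:0030276"))
```

A protein annotated with clathrin binding is implicitly annotated with every ancestor term, which is why predictions at deeper levels carry more information.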

Page 10: Protein Function Prediction Using Structural Homology

Additional Evidence of Function for Integration with Structure

• GO Biological Process
• GO Cellular Component
• Experimental Data
  – Mass Spec Pull Down
  – Fluorescent Localization
• Generally boosts confidence of predictions

Page 11: Protein Function Prediction Using Structural Homology

Protein Domain Prediction (Ginzu)

Query Sequence
• Transmembrane helix, signal peptide, and disorder prediction
• PSI-BLAST against PDB
• Fold Recognition
• MSA / Pfam / Unassigned

[Figure: the query sequence with TM regions and the signal peptide marked, parsed into Domain 1, Domain 2, and Domain 3; domain labels include PDB and Rosetta.]

Page 12: Protein Function Prediction Using Structural Homology

Function Prediction Overview

...

Known Structures

Predicted Structures

P(Structure)

Gene Ontology Terms

P(Function|Structure)

P(Function)
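
One natural reading of this diagram is that P(Function) is obtained by summing P(Function|Structure) over the candidate structures, weighted by P(Structure). The Python sketch below illustrates that reading only; the function name, the superfamily labels `sf_A` and `sf_B`, and all probabilities are hypothetical and are not taken from the slides.

```python
def function_probability(p_structure, p_function_given_structure):
    """P(function) = sum over structures s of P(function | s) * P(s)."""
    totals = {}
    for structure, p_s in p_structure.items():
        conditional = p_function_given_structure.get(structure, {})
        for function, p_f in conditional.items():
            totals[function] = totals.get(function, 0.0) + p_f * p_s
    return totals


# Hypothetical confidence in two candidate superfamilies for one domain.
p_structure = {"sf_A": 0.7, "sf_B": 0.3}

# Hypothetical P(GO term | superfamily) values.
p_function_given_structure = {
    "sf_A": {"GO:0005515": 0.8, "GO:0016787": 0.1},
    "sf_B": {"GO:0005515": 0.2, "GO:0016787": 0.6},
}

p_function = function_probability(p_structure, p_function_given_structure)
print(p_function)
```

With these made-up numbers, protein binding (GO:0005515) scores higher because the more confident superfamily assignment favors it.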

Page 16: Protein Function Prediction Using Structural Homology

Matching Predicted Structures to

Page 17: Protein Function Prediction Using Structural Homology

Training Data Derived from GO and

GO: 1.6 million sequences

GO + Astral BLAST hits: 643,173

Cluster centers: 280,511

Removal of benchmark: < 280,511

Page 18: Protein Function Prediction Using Structural Homology

Naïve Bayes

• In words: what is the probability that a variable, y, is true given features, x, over the probability that y is false given the same features x?
  – Take the log: if it is > 0, y is more likely to be true than false.

• y = molecular function and x = {sf, bp, cc} (SCOP superfamily, GO biological process, GO cellular component)
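
Written out, the ratio described in the bullets above takes the following standard form. This is a reconstruction under the usual naive Bayes assumption that the features are conditionally independent given y; the equation on the original slide did not survive in the transcript, so the exact notation is assumed.

```latex
\mathrm{LLR}(y \mid x)
  = \log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}
  = \log \frac{P(y = 1)}{P(y = 0)}
  + \sum_{i \in \{sf,\, bp,\, cc\}} \log \frac{P(x_i \mid y = 1)}{P(x_i \mid y = 0)}
```

The evidence term P(x) cancels in the ratio, so only the prior odds and the per-feature likelihood ratios remain; LLR > 0 means y is more likely true than false.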

Page 19: Protein Function Prediction Using Structural Homology

Full Function Prediction Formula

Naive Bayes

Page 20: Protein Function Prediction Using Structural Homology

Full Function Prediction Formula

Naive Bayes

Structure Contribution

Page 22: Protein Function Prediction Using Structural Homology

Full Function Prediction Formula

Naive Bayes

Additional Evidence

Prior
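
The labels on this slide (prior, structure contribution, additional evidence) suggest a log-likelihood ratio built from three kinds of terms. The formula itself is not in the transcript, so the Python sketch below is only one plausible arrangement: weighting the structure likelihoods by the confidence in each predicted superfamily is an assumption, and every name and number is hypothetical.

```python
import math


def log_ratio(p_true, p_false):
    """Log of a likelihood (or odds) ratio."""
    return math.log(p_true / p_false)


def function_llr(prior, structure_likelihoods, p_superfamily, evidence_likelihoods):
    """Naive Bayes log-likelihood ratio for one molecular function.

    prior: P(y = 1) for the function.
    structure_likelihoods: {sf: (P(sf | y = 1), P(sf | y = 0))}.
    p_superfamily: {sf: confidence that the domain belongs to sf}.
    evidence_likelihoods: list of (P(x | y = 1), P(x | y = 0)) pairs,
        e.g. for GO biological process and cellular component.
    """
    # Prior term: log odds of the function before seeing any evidence.
    llr = log_ratio(prior, 1.0 - prior)

    # Structure term (assumed form): likelihoods averaged over the
    # uncertain superfamily assignment.
    num = sum(p * structure_likelihoods[sf][0] for sf, p in p_superfamily.items())
    den = sum(p * structure_likelihoods[sf][1] for sf, p in p_superfamily.items())
    llr += log_ratio(num, den)

    # Additional evidence: one independent term per feature.
    for p_true, p_false in evidence_likelihoods:
        llr += log_ratio(p_true, p_false)
    return llr


# Hypothetical inputs for a single domain and a single GO term.
llr = function_llr(
    prior=0.01,
    structure_likelihoods={"sf_A": (0.6, 0.05), "sf_B": (0.1, 0.2)},
    p_superfamily={"sf_A": 0.8, "sf_B": 0.2},
    evidence_likelihoods=[(0.5, 0.1)],
)
print(round(llr, 3))
```

In this made-up case the structure and evidence terms are both positive but do not overcome the low prior, so the total stays below zero.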

Page 23: Protein Function Prediction Using Structural Homology

Results: Solved Structures

How accurate are we when we predict SCOP Superfamily for PDB Structures?

Page 24: Protein Function Prediction Using Structural Homology

How accurate are we when we predict Structure for Swissprot Proteins?

MCM Score

Grey = all MCM scores; sea green = correct, based on structures solved since the prediction. KS test: D = 0.67, p-value = 0e+00

Grey bar = total number of structure predictions
Green bar = number of correct structure predictions
% above bar = percent correct

Low Confidence High Confidence
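
The KS test quoted on this slide compares two score distributions through the largest gap between their empirical CDFs. The sketch below computes only that D statistic in plain Python; it is not the authors' analysis, the scores are hypothetical, and the p-value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical scores: all predictions versus those later confirmed correct.
all_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
correct_scores = [0.6, 0.7, 0.8, 0.9]

d = ks_statistic(all_scores, correct_scores)
print(d)
```

A large D, as reported on the slide, indicates that correct predictions concentrate at different (here, higher) scores than predictions overall.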

Page 25: Protein Function Prediction Using Structural Homology

How accurate are our function predictions using structure only?

Log Likelihood Ratio

3083 Domains

Grey bar = total number of function predictions
Green bar = number of correct function predictions
% above bar = percent correct

Low Confidence High Confidence
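
The bar chart described here (total predictions, correct predictions, and percent correct per confidence bin) can be tabulated as follows. This is a minimal Python sketch with hypothetical scores and bin edges; it is not the 3083-domain benchmark itself.

```python
def binned_accuracy(predictions, edges):
    """Tabulate accuracy per score bin.

    predictions: list of (score, is_correct) pairs.
    edges: ascending bin edges; each bin is the half-open range [low, high).
    Returns a list of (low, high, total, correct, percent_correct) rows,
    with percent_correct set to None for empty bins.
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [ok for score, ok in predictions if low <= score < high]
        total = len(in_bin)
        correct = sum(in_bin)
        percent = 100.0 * correct / total if total else None
        rows.append((low, high, total, correct, percent))
    return rows


# Hypothetical (log likelihood ratio, correct?) pairs.
preds = [(0.5, False), (1.2, True), (1.8, False),
         (2.5, True), (3.1, True), (3.9, True)]

rows = binned_accuracy(preds, [0, 1, 2, 3, 4])
for row in rows:
    print(row)
```

If the score is a useful confidence measure, percent correct should rise from the low-confidence bins to the high-confidence bins.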

Page 26: Protein Function Prediction Using Structural Homology

What does structure provide over GO process alone?

Log Likelihood Ratio

Domain Coverage: 3083 domains

Process (orange)
Process & Structure (green)
# = number of domains

Low Confidence High Confidence

Page 27: Protein Function Prediction Using Structural Homology

Uniqueness and Specificity of GO Functions

Swissprot LLR >= 2

Unique Functions by Evidence

GO ID        GO Name                                      Percent of Genes with Terms
GO:0005198   structural molecule activity                 0.03
GO:0003735   structural constituent of ribosome           0.02
GO:0003676   nucleic acid binding                         0.17
GO:0003723   RNA binding                                  0.04
GO:0016491   oxidoreductase activity                      0.16
GO:0046872   metal ion binding                            0.11
GO:0016787   hydrolase activity                           0.24
GO:0043167   ion binding                                  0.12
GO:0043169   cation binding                               0.11
GO:0005509   calcium ion binding                          0.01
…
GO:0004550   nucleoside diphosphate kinase activity       0.0009
GO:0005496   steroid binding                              0.001
GO:0042379   chemokine receptor binding                   0.0006
GO:0030234   enzyme regulator activity                    0.01
GO:0016788   hydrolase activity, acting on ester bonds    0.04
GO:0008289   lipid binding                                0.005
GO:0004812   aminoacyl-tRNA ligase activity               0.01
GO:0005506   iron ion binding                             0.03
GO:0005216   ion channel activity                         0.003
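
One common way to turn term frequencies like these into a specificity score is information content, the negative log of the frequency. The slides do not say which measure was used, so this Python sketch is an assumption; it also treats the last column as a fraction of genes rather than a percentage, and uses only a few of the listed values.

```python
import math


def information_content(frequency):
    """Information content in bits: -log2 of the annotation frequency."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    return -math.log2(frequency)


# A subset of the listed terms, read as fractions of genes (assumed).
FREQUENCIES = {
    "GO:0016787": 0.24,    # hydrolase activity
    "GO:0003676": 0.17,    # nucleic acid binding
    "GO:0005509": 0.01,    # calcium ion binding
    "GO:0004550": 0.0009,  # nucleoside diphosphate kinase activity
    "GO:0042379": 0.0006,  # chemokine receptor binding
}

# Most specific (rarest) terms first.
ranked = sorted(FREQUENCIES,
                key=lambda term: information_content(FREQUENCIES[term]),
                reverse=True)
for term in ranked:
    print(term, round(information_content(FREQUENCIES[term]), 2))
```

Rare terms such as chemokine receptor binding rank as the most specific, while broad terms such as hydrolase activity rank last.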

Page 28: Protein Function Prediction Using Structural Homology

Acknowledgements

Bonneau Lab: Glenn Butterfoss, Thadeous Kacmarczyk, Peter Waltman, Aviv Madar, Kevin Belasco, Alex Pine, Richard Bonneau

NYU: Sasha Levy, Peter McKenney, Jane Carlton, Dennis Shasha, Kris Gunsalus, Fabio Piano, Patrick Eichenberger, Biology Department

University of Washington: Lars Malmstroem, David Baker, Trisha N. Davis, Michael Riffle, Yeast Resource Center

IBM: Viktors Berstis, Keith J Uplinger, Bill Boverman

Funding: DOD, DOE, NSF

Rosetta-Commons

Data & Results: http://www.yeastrc.org/pdr/