Proteome Analyst
Transparent High-throughput Protein Annotation: Function, Localization and Custom Predictors
Duane Szafron, Paul Lu, Russell Greiner, David Wishart, Zhiyong Lu, Brett Poulin, Roman Eisner, John Anvik, Cam Macdonell

Proteome Analyst
Proteome: one of many ‘-omes’; the set of all proteins in an organism
Analysis: prediction of protein function or localization from sequence data

Analyze a Protein
We have examples of annotated proteins in various protein classes.
We have more examples of unannotated proteins.
What do we do?
  Find homologues to each protein and assume similar function.
  Find characteristics of each protein that affect function.

Analyzing Proteins
One protein? Just do it.
5 proteins? Post-doc familiar with protein classes.
50 proteins? Grad student.
5000 proteins? Summer students.

Proteome Analyst
High-throughput Transparent Prediction of
  Protein Function
  Protein Localization
  Custom Classification

Machine Learning Task
Training
  INPUT: sequences, classes
  OUTPUT: Classifier
Analysis
  INPUT: sequences, Classifier
  OUTPUT: classes, explanation

Training
INPUT: sequences, classes
PA Tools: sequences → features
ML Algorithm: features, classes → Classifier
OUTPUT: Classifier

Training: INPUT
>class A<Training Seq 1
MVGSGLLWLALVSCILTQASAVQRGYGNPIEASSYGL...
>class B<Training Seq 2
LLDEPFRSTENSAGSQGCDKNMSGWYRFVGEGGVRMS...
>class B<Training Seq 3
EVIAYLRDPNCSSILQTEERNWVSVTSPVQASACRNI...
...
Labels: classes (the >...< header tags); protein sequences (the residue lines)
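
The training input above looks like FASTA with the class label wrapped between `>` and `<`. A minimal sketch of a reader for it follows, assuming that header layout holds for every record. The function name `parse_labeled_fasta` is hypothetical, and the example sequences are the truncated prefixes shown on the slide.

```python
from typing import Iterator, List, Tuple


def parse_labeled_fasta(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (class_label, name, sequence) for each '>label<name' record."""
    label = name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, name, "".join(chunks)
            # Assumed header layout: '>' + class label + '<' + record name.
            label, _, name = line[1:].partition("<")
            label, name = label.strip(), name.strip()
            chunks = []
        elif label is not None:
            # Sequence lines may wrap; they are joined per record.
            chunks.append(line)
    if label is not None:
        yield label, name, "".join(chunks)


example = """>class A<Training Seq 1
MVGSGLLWLALVSCILTQASAVQRGYGNPIEASSYGL
>class B<Training Seq 2
LLDEPFRSTENSAGSQGCDKNMSGWYRFVGEGGVRMS
"""
records = list(parse_labeled_fasta(example))
```

Each record becomes one (class, name, sequence) triple, which is the shape the training step needs.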

Training: PA Tools
sequences → features
Homology Tools (BLAST)
  sequence → homologues
  homologues → annotations
  annotations → features

Homology Tool
sequence → features
  sequence → (BLAST against seq DB) → homologues
  homologues → (retrieve) → annotations
  annotations → (parse) → features
Example annotation record:
  DBSOURCE swissprot: locus MPPB_NEUCR, ...
  xrefs (non-sequence databases): ...InterPro IPR001431,...
  KEYWORDS Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide; Oxidoreductase; Electron transport; Respiratory chain.
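
The "parse" step above turns a homologue's annotation record into features. A minimal sketch follows, assuming that only the KEYWORDS field is used and that tokens are lowercased. The slides do not say which fields Proteome Analyst actually reads, and `keyword_features` is a hypothetical name.

```python
from typing import Set


def keyword_features(annotation: str) -> Set[str]:
    """Collect the tokens on KEYWORDS lines of an annotation record."""
    features: Set[str] = set()
    for line in annotation.splitlines():
        line = line.strip()
        if not line.startswith("KEYWORDS"):
            continue
        # Drop the field name and the trailing period, then split on ';'.
        body = line[len("KEYWORDS"):].strip().rstrip(".")
        for token in body.split(";"):
            token = token.strip().lower()  # lowercasing is an assumption
            if token:
                features.add(token)
    return features


record = """DBSOURCE swissprot: locus MPPB_NEUCR, ...
KEYWORDS Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide."""
features = keyword_features(record)
```

Treating each keyword as a present/absent feature keeps the representation small and readable, which suits a classifier whose decisions are meant to be explained.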

Training: PA Tools
sequences → features
Homology Tools (BLAST)
  sequence → homologues
  homologues → annotations
  annotations → features
Pattern Tools (PFAM, ProSite, …)
  sequences → motifs
  motifs → features

Pattern Tool
sequence → features
Not included in current results.
  sequence → (find in pattern DB) → patterns
  patterns → (parse) → features
Example pattern matches:
  Pfam; PF00234; tryp_alpha_amyl; 1.
  PROSITE; PS00940; GAMMA_THIONIN; 1.
  PROSITE; PS00305; 11S_SEED_STORAGE; 1.

Training: ML Algorithm
features, classes → Classifier
Any ML algorithm may be used; the default is naïve Bayes:
  consistently near-best accuracy (SVM, ANN slightly better)
  efficient (for high-throughput)
  easy to interpret
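
To make the training step concrete, here is a minimal sketch of a naïve Bayes learner over present/absent features. It assumes a Bernoulli model with add-one smoothing. The slides only say "naïve Bayes", so the variant, the function names, and the toy classes and features are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]], Set[str]]


def train_naive_bayes(examples: Iterable[Tuple[Set[str], str]]) -> Model:
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing."""
    examples = list(examples)
    class_counts = Counter(label for _, label in examples)
    feature_counts: Dict[str, Counter] = defaultdict(Counter)
    vocabulary: Set[str] = set()
    for features, label in examples:
        vocabulary |= features
        feature_counts[label].update(features)
    total = len(examples)
    log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
    # P(feature present | class), smoothed so no probability is 0 or 1.
    p_present = {
        c: {f: (feature_counts[c][f] + 1) / (class_counts[c] + 2) for f in vocabulary}
        for c in class_counts
    }
    return log_prior, p_present, vocabulary


def predict(model: Model, features: Set[str]) -> str:
    """Return the class with the highest log joint probability."""
    log_prior, p_present, vocabulary = model

    def score(c: str) -> float:
        s = log_prior[c]
        for f in vocabulary:
            p = p_present[c][f]
            s += math.log(p if f in features else 1.0 - p)
        return s

    return max(log_prior, key=score)


# Hypothetical training set: (feature set, class label) pairs.
training = [
    ({"hydrolase", "zinc"}, "class A"),
    ({"hydrolase", "metalloprotease"}, "class A"),
    ({"electron transport", "respiratory chain"}, "class B"),
    ({"oxidoreductase", "electron transport"}, "class B"),
]
model = train_naive_bayes(training)
prediction = predict(model, {"hydrolase"})
```

Training is a single counting pass over the examples, which is one reason the method scales to high-throughput use.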

Training: OUTPUT
Classifier

Analysis (Classification)
INPUT: sequences
PA Tools: sequences → features
Classifier: features → classes, explanation
OUTPUT: classes

Analysis: INPUT
>Seq 1
DTILNINFQCAYPLDMKVSLQAALQPIVSSLNVSVDG...
>Seq 2
AVELSVESVLYVGAILEQGDTSRFNLVLRNCYATPTE...
>Seq 3
HVEENGQSSESRFSVQMFMFAGHYDLVFLHCEIHLCD...
...
Label: protein sequences

Analysis: PA Tools
sequences → features
Homology Tools (BLAST)
  sequence → homologues
  homologues → annotations
  annotations → features
Pattern Tools (PFAM, ProSite, …)
  sequences → motifs
  motifs → features

Analysis: Classification
features → classes
naïve Bayes
  returns probabilities of each class for each sequence
  efficient (for high-throughput)
  easy to interpret
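
The per-class probabilities mentioned above can be obtained by normalising the naïve Bayes joint scores. A minimal sketch follows, assuming a Bernoulli model whose parameters are already known. The classes, features and numbers in the toy model are hypothetical, not taken from Proteome Analyst.

```python
import math
from typing import Dict, Set


def class_probabilities(
    log_prior: Dict[str, float],
    p_present: Dict[str, Dict[str, float]],
    features: Set[str],
) -> Dict[str, float]:
    """Return P(class | features) for a Bernoulli naive Bayes model."""
    log_joint = {}
    for c, prior in log_prior.items():
        total = prior
        for f, p in p_present[c].items():
            total += math.log(p if f in features else 1.0 - p)
        log_joint[c] = total
    # Normalise in log space (log-sum-exp) to avoid underflow.
    peak = max(log_joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {c: math.exp(v - log_norm) for c, v in log_joint.items()}


# Toy model with hypothetical classes, features and probabilities.
log_prior = {"mitochondrion": math.log(0.5), "nucleus": math.log(0.5)}
p_present = {
    "mitochondrion": {"transit peptide": 0.8, "dna-binding": 0.1},
    "nucleus": {"transit peptide": 0.2, "dna-binding": 0.7},
}
probs = class_probabilities(log_prior, p_present, {"transit peptide"})
```

Because each feature contributes one additive log term, the same loop also shows which features pushed a sequence toward a class.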

Analysis: Classification
features → classes, explanation

Results: General Function
GeneQuiz classification
5-fold cross-validation accuracy on 14 classes
  E. coli (2370)  82.5%
  Yeast (2359)    78.8%
  Fly (3842)      76.6%
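
For readers unfamiliar with the evaluation protocol, here is a generic sketch of k-fold cross-validation. The slides do not state how folds were drawn (shuffling, stratification), so those details are assumptions, and `k_fold_accuracy` and the toy learner are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input, class label)


def k_fold_accuracy(
    data: Sequence[Example],
    train: Callable[[List[Example]], Callable[[str], str]],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Fraction of examples classified correctly when each is held out once."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint index sets
    correct = 0
    for held_out in folds:
        held = set(held_out)
        training = [data[i] for i in order if i not in held]
        classify = train(training)  # fit on the other k-1 folds only
        correct += sum(classify(data[i][0]) == data[i][1] for i in held_out)
    return correct / len(data)


def train_first_letter(training: List[Example]) -> Callable[[str], str]:
    """Toy learner: remember which class each first letter maps to."""
    table = {x[0]: y for x, y in training}
    return lambda x: table.get(x[0], "unknown")


data = [("a%d" % i, "class A") for i in range(10)]
data += [("b%d" % i, "class B") for i in range(10)]
accuracy = k_fold_accuracy(data, train_first_letter, k=5)
```

Every example is scored exactly once by a classifier that never saw it during training, which is what makes the reported accuracy an estimate of performance on new sequences.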

Results: Specific Function
K+ Ion Channel Proteins
5-fold cross-validation accuracy on 78 sequences, 4 classes
              Accuracy
  1st effort  97.4%
  2nd effort  100%

Results: Localization
Sub-cellular localization prediction
3146 sequences from 10 classes
                    Accuracy  Coverage
  Nair and Rost     81.5%     36.9%
  Proteome Analyst  87.8%     100%
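
The table reports two numbers per method. A minimal sketch of how such a pair can be computed follows, assuming coverage means the fraction of sequences that receive any prediction and accuracy is measured over those predictions only. The slides do not define either term, so both definitions and all data below are assumptions.

```python
from typing import Optional, Sequence, Tuple


def accuracy_and_coverage(
    truths: Sequence[str], predictions: Sequence[Optional[str]]
) -> Tuple[float, float]:
    """Return (accuracy over answered items, fraction of items answered)."""
    # None marks a sequence for which the predictor gave no answer.
    answered = [(t, p) for t, p in zip(truths, predictions) if p is not None]
    coverage = len(answered) / len(truths)
    accuracy = (
        sum(t == p for t, p in answered) / len(answered) if answered else 0.0
    )
    return accuracy, coverage


# Hypothetical labels and predictions for four sequences.
truths = ["nucleus", "cytoplasm", "nucleus", "mitochondrion"]
predictions = ["nucleus", None, "cytoplasm", "mitochondrion"]
accuracy, coverage = accuracy_and_coverage(truths, predictions)
```

Under these definitions a method can trade coverage for accuracy by declining to answer on hard cases, so the two columns are best read together.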

Proteome Analyst
High-throughput Transparent Prediction of
  Protein Function
  Protein Localization
  Custom Classification

Acknowledgements
Student developers: Cynthia Luk, Samer Nassar, Kevin McKee
Biologists: Warren Gallin, Kathy Magor
Data: Nair and Rost
Funding:
  PENCE - Protein Engineering Network of Centres of Excellence
  NSERC - Natural Sciences and Engineering Research Council
  Sun Microsystems
  AICML - Alberta Ingenuity Centre for Machine Learning
Many ‘-ome’ jokes: my wife, Jen