![Page 1: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/1.jpg)
LSM3241: Bioinformatics and Biocomputing
Lecture 3: Machine learning method for protein function prediction
Prof. Chen Yu Zong
Tel: 6516-6877
Email: csccyz@nus.edu.sg
http://bidd.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore
![Page 2: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/2.jpg)
Protein Function and Functional Family
Proteins of similar functional characteristics can be grouped into a family.
![Page 3: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/3.jpg)
Protein Function and Functional Family
Proteins of similar functional characteristics can be grouped into a family.
![Page 4: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/4.jpg)
Protein Function and Functional Family
Proteins of similar functional characteristics can be grouped into a family.
![Page 5: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/5.jpg)
Functional Classification of Proteins by SVM
• A protein is classified as either belonging (+) or not belonging (-) to a functional family.
• By screening against all families, the function of the protein can be identified (example: SVMProt).
Diagram: a protein is screened by the Family-1 SVM (-), the Family-2 SVM (-), and the Family-3 SVM (+), so the protein belongs to Family-3.
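The screening scheme above can be sketched in a few lines of Python. This is a minimal illustration, not SVMProt itself: the weights and biases are made-up placeholders standing in for trained per-family SVMs, each classifier is reduced to a linear decision function, and the function names are hypothetical.

```python
# Hypothetical linear decision functions, one per family: (weights, bias).
# Real per-family classifiers would be trained SVMs; these numbers are invented.
FAMILY_CLASSIFIERS = {
    "Family-1": ((1.0, -1.0, 0.0), -0.5),
    "Family-2": ((1.0, 0.0, -1.0), -0.5),
    "Family-3": ((0.0, 1.0, 1.0), -1.5),
}


def decision_value(weights, bias, features):
    """Signed score of a feature vector under one linear classifier."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def screen_families(features):
    """Run every family classifier and record '+' (member) or '-' (nonmember)."""
    return {
        family: "+" if decision_value(weights, bias, features) > 0 else "-"
        for family, (weights, bias) in FAMILY_CLASSIFIERS.items()
    }


def predicted_families(features):
    """Families whose classifier labels the protein as a member."""
    return [f for f, label in screen_families(features).items() if label == "+"]


protein = (0.0, 1.0, 1.0)  # illustrative feature vector
print(screen_families(protein))
print(predicted_families(protein))
```

Each family gets its own binary decision, so a protein can in principle be assigned to several families, or to none.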
![Page 6: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/6.jpg)
Functional Classification of Proteins by SVM
What is SVM?
• Support vector machines (SVMs) are a statistical machine learning method that learns from examples and classifies objects into one of two classes.
Advantages of SVM:
• Accommodates diversity among class members (no "racial discrimination").
• Uses sequence-derived physico-chemical features as the basis for classification.
• Suitable for functional classification of novel proteins (distantly related proteins, homologous proteins of different functions).
![Page 7: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/7.jpg)
Machine Learning Method
Inductive learning: example-based learning
Figure labels: descriptor, positive examples, negative examples.
![Page 8: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/8.jpg)
Machine Learning Method
Feature vectors:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)
Figure labels: descriptor, feature vector, positive examples, negative examples.
![Page 9: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/9.jpg)
SVM Method
Feature vectors in input space:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)
Figure: input space with axes X, Y, and Z; the feature vectors B, A, E, and F are plotted as points.
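A small sketch of how these feature vectors can be stored and compared. It groups examples that share the same coordinates: A and C coincide, as do B and D, which may be why the plot labels only four points. The variable and function names are illustrative only.

```python
from collections import defaultdict

# Feature vectors as listed on the slide: one (X, Y, Z) tuple per example.
feature_vectors = {
    "A": (1, 1, 1),
    "B": (0, 1, 1),
    "C": (1, 1, 1),
    "D": (0, 1, 1),
    "E": (0, 0, 0),
    "F": (1, 0, 1),
}


def group_by_point(vectors):
    """Map each distinct point in input space to the examples located there."""
    groups = defaultdict(list)
    for name, point in vectors.items():
        groups[point].append(name)
    return dict(groups)


points = group_by_point(feature_vectors)
for point, names in points.items():
    print(point, names)
```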
![Page 10: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/10.jpg)
SVM Method
Project to a higher dimensional space
Figure labels: border, new border, protein family members, nonmembers.
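A toy sketch of the projection idea, assuming one-dimensional data that no single threshold can split: members sit near the origin and nonmembers lie on both sides. Mapping each point x to (x, x²) makes a straight border possible in the new space. The data, the mapping, and the border value 2.5 are illustrative choices, not taken from the lecture.

```python
# Invented 1-D data: members cluster near zero, nonmembers flank them.
members = [-1.0, 0.0, 1.0]
nonmembers = [-3.0, -2.0, 2.0, 3.0]


def separable_by_threshold(pos, neg):
    """True if one class lies entirely on one side of the other in 1-D."""
    return max(pos) < min(neg) or max(neg) < min(pos)


def project(x):
    """Map a 1-D point into 2-D by adding its square as a second coordinate."""
    return (x, x * x)


def is_member(point, border=2.5):
    """Linear border in the projected space: the horizontal line x2 = border."""
    return point[1] < border


projected_members = [project(x) for x in members]
projected_nonmembers = [project(x) for x in nonmembers]

print(separable_by_threshold(members, nonmembers))          # input space
print(all(is_member(p) for p in projected_members))          # projected space
print(not any(is_member(p) for p in projected_nonmembers))   # projected space
```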
![Page 11: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/11.jpg)
SVM Method
Figure labels: support vectors, new border, protein family members, nonmembers.
![Page 12: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/12.jpg)
SVM Method
Figure labels: protein family members, nonmembers, new border, support vectors.
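The role of support vectors can be seen in a deliberately simple one-dimensional sketch. When two classes are separable along a line, the maximum-margin border is the midpoint between the closest opposing examples, and those two examples are the support vectors. The values below are invented for illustration; a real SVM solves an optimization problem over many dimensions.

```python
def max_margin_border_1d(members, nonmembers):
    """Return (border, margin, support_vectors) for 1-D data.

    Assumes every member lies above every nonmember.
    """
    lowest_member = min(members)
    highest_nonmember = max(nonmembers)
    if highest_nonmember >= lowest_member:
        raise ValueError("classes are not separable by a single threshold")
    border = (lowest_member + highest_nonmember) / 2
    margin = (lowest_member - highest_nonmember) / 2
    return border, margin, (highest_nonmember, lowest_member)


members = [4.0, 5.5, 7.0]
nonmembers = [0.5, 1.0, 2.0]
border, margin, support_vectors = max_margin_border_1d(members, nonmembers)
print(border, margin, support_vectors)

# Dropping an example that is not a support vector leaves the border unchanged.
border_without_far_point, _, _ = max_margin_border_1d([4.0, 5.5], nonmembers)
print(border_without_far_point == border)
```

Only the examples nearest the border determine where it sits, which is why they are singled out in the figure.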
![Page 13: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/13.jpg)
SVM Method
Border line is nonlinear
![Page 14: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/14.jpg)
SVM Method
Non-linear transformation: use of kernel function
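To make the kernel idea concrete, the sketch below checks a standard identity for two-dimensional inputs: the degree-2 polynomial kernel K(x, y) = (x · y)² equals an ordinary dot product taken after the explicit non-linear mapping φ(x) = (x₁², √2·x₁x₂, x₂²). The lecture does not say which kernel is used, so this choice is an assumption made for illustration.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel computed directly in input space."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit non-linear map from 2-D input space to 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x = (1.0, 2.0)
y = (3.0, -1.0)
kernel_value = poly2_kernel(x, y)        # no explicit projection needed
explicit_value = dot(phi(x), phi(y))     # same quantity via the projection
print(kernel_value, explicit_value)
```

The kernel gives the higher-dimensional dot product without ever constructing the projected vectors.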
![Page 15: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/15.jpg)
SVM Method
Non-linear transformation
![Page 16: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/16.jpg)
SVM Method
![Page 17: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/17.jpg)
SVM Method
![Page 18: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/18.jpg)
SVM Method
![Page 19: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/19.jpg)
SVM Method
![Page 20: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/20.jpg)
SVM for Classification of Proteins
How to represent a protein?
• Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
– amino acid composition
– hydrophobicity
– normalized van der Waals volume
– polarity
– polarizability
– charge
– surface tension
– secondary structure
– solvent accessibility
• Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties.
Nucleic Acids Res., 31: 3692-3697
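A simplified sketch of the composition, transition, and distribution descriptors for a single property. It assumes a three-class hydrophobicity grouping of the amino acids (polar, neutral, hydrophobic) of the kind commonly used for such descriptors; the grouping, the rounding convention in the distribution step, and all function names are assumptions for illustration, and the exact encoding in the cited paper may differ.

```python
# Assumed three-class hydrophobicity grouping (illustrative only).
HYDROPHOBICITY_CLASSES = {
    "1": set("RKEDQN"),   # polar
    "2": set("GASTPHY"),  # neutral
    "3": set("CLVIMFW"),  # hydrophobic
}


def encode(sequence, classes=HYDROPHOBICITY_CLASSES):
    """Replace each residue by its class label; unknown residues raise KeyError."""
    lookup = {aa: label for label, group in classes.items() for aa in group}
    return "".join(lookup[aa] for aa in sequence.upper())


def composition(encoded, labels="123"):
    """C: fraction of residues falling in each class."""
    return [encoded.count(label) / len(encoded) for label in labels]


def transition(encoded, pairs=("12", "13", "23")):
    """T: fraction of adjacent residue pairs that switch between two classes."""
    steps = list(zip(encoded, encoded[1:]))
    if not steps:
        raise ValueError("sequence must contain at least two residues")
    return [sum(1 for a, b in steps if {a, b} == set(pair)) / len(steps)
            for pair in pairs]


def distribution(encoded, labels="123", fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """D: relative position of the first, 25%, 50%, 75%, and last residue of each class."""
    values = []
    for label in labels:
        positions = [i + 1 for i, c in enumerate(encoded) if c == label]
        for fraction in fractions:
            if not positions:
                values.append(0.0)
                continue
            k = max(1, int(fraction * len(positions)))  # rounding conventions vary
            values.append(positions[k - 1] / len(encoded))
    return values


encoded = encode("AKLDCG")  # made-up six-residue sequence
feature_vector = composition(encoded) + transition(encoded) + distribution(encoded)
print(encoded)
print(len(feature_vector))
```

Repeating the same three descriptors for each tabulated property and concatenating the results yields a fixed-length feature vector regardless of sequence length.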
![Page 21: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/21.jpg)
SVM for Classification of Proteins
How to represent a protein?
![Page 22: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/22.jpg)
SVM for Classification of Proteins
How to represent a protein?
From protein sequence:
To feature vector:
(C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, … )
Nucleic Acids Res., 31: 3692-3697
![Page 23: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/23.jpg)
Protein function prediction software SVMProt
Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.
Workflow diagram: your protein sequence is input either through the internet (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) or on a local machine (labeled Option 1 and Option 2 in the figure) and sent to a computer loaded with SVMProt, which holds a support vector machine classifier for every protein functional family. The output is the identified functional families and the protein functional indications, answering the question: which functional families does your protein belong to?
Nucl. Acids Res. 31, 3692-3697 (2003)
![Page 24: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/24.jpg)
Protein function prediction software SVMProt
Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.
Protein families covered:
46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families, 2 regulator/factor families.
SVMProt web version at: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi
Nucl. Acids Res. 31, 3692-3697 (2003)
![Page 25: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/25.jpg)
Protein function prediction software SVMProt
Nucl. Acids Res. 31, 3692-3697 (2003)
Check covered protein families here
Input sequence here
Check format here
![Page 26: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/26.jpg)
Protein function prediction software SVMProt
Nucl. Acids Res. 31, 3692-3697 (2003)
Probability of correct prediction
Prediction score
![Page 27: LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg](https://reader036.vdocuments.mx/reader036/viewer/2022081603/56649e7d5503460f94b8010b/html5/thumbnails/27.jpg)
Summary of Today's Lecture
• Machine learning method for protein function prediction.
• Use of SVMProt for probing protein function.