Protein Evolution: SARS coronavirus as an example
CZ5225 Methods in Computational Biology Lecture 2-3: Protein Families and Family Prediction Methods Prof. Chen Yu Zong Tel: 6874-6877 Email: [email protected] http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, NUS August 2004.
![Page 1: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/1.jpg)
CZ5225 Methods in Computational Biology
Lecture 2-3: Protein Families and Family Prediction Methods
Prof. Chen Yu Zong
Tel: 6874-6877
Email: [email protected]
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, NUS
August 2004
![Page 2: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/2.jpg)
Protein Evolution: SARS coronavirus as an example
![Page 3: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/3.jpg)
SARS Coronavirus
A novel coronavirus identified as the cause of severe acute respiratory syndrome (SARS)
![Page 4: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/4.jpg)
SARS Infection
How the SARS coronavirus enters a cell and reproduces
![Page 5: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/5.jpg)
Protein Evolution
Generation of different species
![Page 6: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/6.jpg)
Protein Families
• Sequence alignment-based families.
– Based on the principle of the sequence-structure-function relationship.
– Derived by multiple sequence alignment
– Database: PFAM (Nucleic Acids Res. 30:276-280)
• Structure-based families.
– Derived by visual inspection and comparison of structures
– Database: SCOP (J. Mol. Biol. 247, 536-540)
• Functional families.
– Databases:
  • G-protein coupled receptors: GPCRDB (Nucleic Acids Res. 29: 346-349), ORDB (Nucleic Acids Res. 30:354-360)
  • Nuclear receptors: NucleaRDB (Nucleic Acids Res. 29: 346-349)
  • Enzymes: BRENDA (Nucleic Acids Res. 30, 47-49)
  • Transporters: TC-DB (Microbiol Mol Biol Rev. 64:354-411)
  • Ligand-gated ion channels: LGICdb (Nucleic Acids Res. 29: 294-295)
  • Therapeutic targets: TTD (Nucleic Acids Res. 30, 412-415)
  • Drug side-effect targets: DART (Drug Safety 26: 685-690)
![Page 7: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/7.jpg)
Protein Families
Sequence families ≠ Structural families ≠ Functional families
Sequence similar, structure different
Sequence different, structure similar
Sequence similar, function different (distantly related proteins)
Sequence different, function similar
Homework: find examples
![Page 8: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/8.jpg)
Protein Family Prediction Methods
Sequence alignment-based families:
• Multiple sequence alignment (HMM): HMMER; JMB 235, 1501-153; JMB 301, 173-190
Structure-based families:
• Visual inspection and comparison of structures
Functional families:
• Statistical learning methods:
– Neural network: ProtFun (Bioinformatics, 19:635-642)
– Support vector machines: SVMProt (Nucleic Acids Res., 31: 3692-3697)
![Page 9: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/9.jpg)
Sequence Comparison as a Mathematical Problem:
Example:
Sequence a: ATTCTTGC
Sequence b: ATCCTATTCTAGC
Best alignment (one gap):
ATTCTTGC
ATCCTATTCTAGC
Bad alignment (two gaps):
AT TCTT GC
ATCCTATTCTAGC
Many alignments can be constructed; which is the best?
![Page 10: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/10.jpg)
How to rate an alignment?
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-,x)=w(x,-)=-3)
C - - - T T A A C T
C G G A T C A - - T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
Alignment score: +12
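The scoring rule above can be checked with a few lines of Python. This is a minimal sketch, not code from the lecture: the names `w` and `alignment_score` are chosen here for illustration, and the two aligned rows are written as equal-length strings with `-` as the gap symbol.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # scores from the slide

def w(x: str, y: str) -> int:
    """Score one alignment column; '-' is the gap symbol."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH

def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))

print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

The printed value agrees with the column-by-column sum on the slide.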
![Page 11: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/11.jpg)
Alignment Graph
Sequence a: CTTAACT
Sequence b: CGGATCAT
[Figure: alignment grid with columns labeled C G G A T C A T (sequence b) and rows labeled C T T A A C T (sequence a)]
Alignment shown:
C---TTAACT
CGGATCA--T
![Page 12: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/12.jpg)
An optimal alignment: the alignment of maximum score
• Let $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$.
• $S_{i,j}$: the score of an optimal alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$
• With proper initializations, $S_{i,j}$ can be computed as follows.

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$
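The recurrence translates directly into a dynamic-programming table. The sketch below assumes the scoring scheme used elsewhere in these slides (match +8, mismatch -5, gap -3) and the boundary values $S_{i,0} = -3i$ and $S_{0,j} = -3j$; the function name `global_alignment_matrix` is illustrative only.

```python
def global_alignment_matrix(a, b, match=8, mismatch=-5, gap=-3):
    """Fill the (m+1) x (n+1) table S of optimal global alignment scores."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary: a prefix aligned against nothing costs one gap per symbol.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    # Interior: take the best of the three cases in the recurrence.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(S[i - 1][j] + gap, S[i][j - 1] + gap, diag)
    return S

S = global_alignment_matrix("CTTAACT", "CGGATCAT")
print(S[3][5], S[7][8])  # 5 14
```

The table has (m+1)(n+1) cells and each cell is filled in constant time, so the whole computation takes time proportional to m times n.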
![Page 13: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/13.jpg)
Computing $S_{i,j}$
[Figure: score matrix indexed by i and j; the edges into cell (i, j) are labeled w(ai,-), w(-,bj), and w(ai,bj); the last cell is Sm,n]
![Page 14: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/14.jpg)
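Initializations

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -3 | -6 | -9 | -12 | -15 | -18 | -21 | -24 |
| C | -3 |   |   |   |   |   |   |   |   |
| T | -6 |   |   |   |   |   |   |   |   |
| T | -9 |   |   |   |   |   |   |   |   |
| A | -12 |   |   |   |   |   |   |   |   |
| A | -15 |   |   |   |   |   |   |   |   |
| C | -18 |   |   |   |   |   |   |   |   |
| T | -21 |   |   |   |   |   |   |   |   |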
![Page 15: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/15.jpg)
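$S_{3,5} = ?$

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -3 | -6 | -9 | -12 | -15 | -18 | -21 | -24 |
| C | -3 | 8 | 5 | 2 | -1 | -4 | -7 | -10 | -13 |
| T | -6 | 5 | 3 | 0 | -3 | 7 | 4 | 1 | -2 |
| T | -9 | 2 | 0 | -2 | -5 | ? |   |   |   |
| A | -12 |   |   |   |   |   |   |   |   |
| A | -15 |   |   |   |   |   |   |   |   |
| C | -18 |   |   |   |   |   |   |   |   |
| T | -21 |   |   |   |   |   |   |   |   |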
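Worked out from the recurrence, with $a_3 = \mathrm{T}$ and $b_5 = \mathrm{T}$ (a match, so $w(a_3, b_5) = 8$) and the neighboring entries $S_{2,5} = 7$, $S_{3,4} = -5$, $S_{2,4} = -3$ read from the table:

$$
S_{3,5} = \max
\begin{cases}
S_{2,5} + w(a_3, -) = 7 - 3 = 4 \\
S_{3,4} + w(-, b_5) = -5 - 3 = -8 \\
S_{2,4} + w(a_3, b_5) = -3 + 8 = 5
\end{cases}
= 5
$$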
![Page 16: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/16.jpg)
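$S_{3,5} = ?$

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -3 | -6 | -9 | -12 | -15 | -18 | -21 | -24 |
| C | -3 | 8 | 5 | 2 | -1 | -4 | -7 | -10 | -13 |
| T | -6 | 5 | 3 | 0 | -3 | 7 | 4 | 1 | -2 |
| T | -9 | 2 | 0 | -2 | -5 | 5 | 2 | -1 | 9 |
| A | -12 | -1 | -3 | -5 | 6 | 3 | 0 | 10 | 7 |
| A | -15 | -4 | -6 | -8 | 3 | 1 | -2 | 8 | 5 |
| C | -18 | -7 | -9 | -11 | 0 | -2 | 9 | 6 | 3 |
| T | -21 | -10 | -12 | -14 | -3 | 8 | 6 | 4 | 14 |

Optimal score: 14 (bottom-right cell)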
![Page 17: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/17.jpg)
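C T T A A C - T
C G G A T C A T

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | -3 | -6 | -9 | -12 | -15 | -18 | -21 | -24 |
| C | -3 | 8 | 5 | 2 | -1 | -4 | -7 | -10 | -13 |
| T | -6 | 5 | 3 | 0 | -3 | 7 | 4 | 1 | -2 |
| T | -9 | 2 | 0 | -2 | -5 | 5 | 2 | -1 | 9 |
| A | -12 | -1 | -3 | -5 | 6 | 3 | 0 | 10 | 7 |
| A | -15 | -4 | -6 | -8 | 3 | 1 | -2 | 8 | 5 |
| C | -18 | -7 | -9 | -11 | 0 | -2 | 9 | 6 | 3 |
| T | -21 | -10 | -12 | -14 | -3 | 8 | 6 | 4 | 14 |

8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14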
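The alignment itself is recovered by walking back from the bottom-right cell and, at each step, asking which of the three cases produced the current score. A minimal sketch follows, under the same assumed scoring scheme (+8, -5, -3); the function name `global_align` and the tie-breaking order (diagonal first, then up, then left) are choices made here, and other orders can return a different alignment with the same score.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    def w(x, y):
        return match if x == y else mismatch

    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j] + gap,
                          S[i][j - 1] + gap,
                          S[i - 1][j - 1] + w(a[i - 1], b[j - 1]))

    # Traceback from (m, n) to (0, 0).
    i, j = m, n
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")   # gap in b
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])   # gap in a
            j -= 1
    return S[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))

print(global_align("CTTAACT", "CGGATCAT"))
# (14, 'CTTAAC-T', 'CGGATCAT')
```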
![Page 18: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/18.jpg)
Global Alignment vs. Local Alignment
• global alignment:
• local alignment:
![Page 19: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/19.jpg)
An optimal local alignment
• $S_{i,j}$: the score of an optimal local alignment ending at $a_i$ and $b_j$
• With proper initializations, $S_{i,j}$ can be computed as follows.

$$
S_{i,j} = \max
\begin{cases}
0 \\
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$
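Compared with the global case, the only changes are the extra 0 in the maximum and a boundary of zeros; the best local score is the largest entry anywhere in the table. The sketch below assumes the same scoring scheme (+8, -5, -3); `local_alignment` is an illustrative name.

```python
def local_alignment(a, b, match=8, mismatch=-5, gap=-3):
    """Return (best_score, (i, j), S) for the local alignment table."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment start afresh at any cell.
            S[i][j] = max(0, S[i - 1][j] + gap, S[i][j - 1] + gap, diag)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    return best, best_pos, S

best, pos, S = local_alignment("CTTAACT", "CGGATCAT")
print(best, pos)  # 18 (7, 8)
```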
![Page 20: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/20.jpg)
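Local alignment
Match: 8
Mismatch: -5
Gap symbol: -3

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 8 | 5 | 2 | 0 | 0 | 8 | 5 | 2 |
| T | 0 | 5 | 3 | 0 | 0 | 8 | 5 | 3 | 13 |
| T | 0 | 2 | 0 | 0 | 0 | 8 | 5 | 2 | 11 |
| A | 0 | 0 | 0 | 0 | 8 | 5 | 3 | ? |   |
| A | 0 |   |   |   |   |   |   |   |   |
| C | 0 |   |   |   |   |   |   |   |   |
| T | 0 |   |   |   |   |   |   |   |   |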
![Page 21: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/21.jpg)
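Local alignment

|   |   | C | G | G | A | T | C | A | T |
|---|---|---|---|---|---|---|---|---|---|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 8 | 5 | 2 | 0 | 0 | 8 | 5 | 2 |
| T | 0 | 5 | 3 | 0 | 0 | 8 | 5 | 3 | 13 |
| T | 0 | 2 | 0 | 0 | 0 | 8 | 5 | 2 | 11 |
| A | 0 | 0 | 0 | 0 | 8 | 5 | 3 | 13 | 10 |
| A | 0 | 0 | 0 | 0 | 8 | 5 | 2 | 11 | 8 |
| C | 0 | 8 | 5 | 2 | 5 | 3 | 13 | 10 | 7 |
| T | 0 | 5 | 3 | 0 | 2 | 13 | 10 | 8 | 18 |

The best score: 18 (bottom-right cell)
A - C - T
A T C A T
8 - 3 + 8 - 3 + 8 = 18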
![Page 22: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/22.jpg)
Multiple sequence alignment (MSA)
• The multiple sequence alignment problem is to simultaneously align more than two sequences.
Seq1: GCTC
Seq2: AC
Seq3: GATC
Aligned:
GC-TC
A---C
G-ATC
![Page 23: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/23.jpg)
How to score an MSA?
• Sum-of-Pairs (SP-score)
Score(GC-TC / A---C / G-ATC) = Score(GC-TC / A---C) + Score(GC-TC / G-ATC) + Score(A---C / G-ATC)
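The sum-of-pairs rule can be sketched as a loop over all pairs of rows. Two assumptions are made here that the slide does not state: the pairwise scheme from the earlier slides (match +8, mismatch -5, gap -3) is reused, and a column where both rows have a gap scores 0. The names `column_score` and `sp_score` are illustrative.

```python
from itertools import combinations

def column_score(x, y, match=8, mismatch=-5, gap=-3):
    """Score one column of a pairwise projection of the MSA."""
    if x == "-" and y == "-":
        return 0  # assumption: gap against gap contributes nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sp_score(rows):
    """Sum the pairwise alignment scores over every pair of MSA rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all MSA rows must have the same length")
    return sum(column_score(x, y)
               for r1, r2 in combinations(rows, 2)
               for x, y in zip(r1, r2))

print(sp_score(["GC-TC", "A---C", "G-ATC"]))  # 12
```

Under these assumptions the three pairwise scores are -3, 18, and -3, which sum to 12.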
![Page 24: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/24.jpg)
Functional Classification by SVM
• A protein is classified as either belonging (+) or not belonging (-) to a functional family.
• By screening against all families, the function of this protein can be identified (example: SVMProt).
• What is SVM? A support vector machine is a statistical machine learning method that learns from examples and classifies objects into one of two classes.
• Advantages of SVM: accommodates diversity among class members; uses sequence-derived physico-chemical features as the basis for classification; suitable for functional family classification.
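For reference, the two-class decision of a linear SVM is usually written as below. This is the standard textbook form rather than a formula taken from these slides; $\mathbf{x}$ is the feature vector of a protein, and $\mathbf{w}$ and $b$ are learned from the training examples.

$$
f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b),
\qquad
f(\mathbf{x}) = +1 \Rightarrow \text{belongs to the family},
\quad
f(\mathbf{x}) = -1 \Rightarrow \text{does not belong}
$$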
![Page 25: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/25.jpg)
SVM References
• C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line).
• R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy).
• S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy).
• Online lecture notes
![Page 26: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/26.jpg)
Introduction to Machine Learning
Goal:
To “improve” (gaining knowledge, enhancing computing capability)
Tasks:
• Forming concepts by data generalization.
• Compiling knowledge into compact form.
• Finding useful explanations for valid concepts.
• Clustering data into classes.
Reference:
Machine Learning in Molecular Biology Sequence Analysis.
Internet links:
http://www.ai.univie.ac.at/oefai/ml/ml-resources.html
![Page 27: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/27.jpg)
Introduction to Machine Learning
Categories:
• Inductive learning.
– Forming concepts from data without much domain knowledge (learning from examples).
• Analytic learning.
– Use of existing knowledge to derive new useful concepts (explanation-based learning).
• Connectionist learning.
– Use of artificial neural networks to search for or represent concepts.
• Genetic algorithms.
– Searching for the most effective concept by means of Darwin’s “survival of the fittest” approach.
![Page 28: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/28.jpg)
Machine Learning Methods
Inductive learning:
Concept learning and example-based learning
Concept learning:
![Page 29: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/29.jpg)
Machine Learning Methods
Analytic learning:
![Page 30: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/30.jpg)
Machine Learning Methods
Neural network:
![Page 31: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/31.jpg)
Machine Learning Methods
Genetic algorithms:
[Figure labels: Strength, Pattern, Classification]
![Page 32: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/32.jpg)
![Page 33: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/33.jpg)
SVM
![Page 34: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/34.jpg)
SVM
![Page 35: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/35.jpg)
SVM
![Page 36: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/36.jpg)
SVM
![Page 37: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/37.jpg)
SVM
![Page 38: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/38.jpg)
SVM
![Page 39: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/39.jpg)
SVM
![Page 40: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/40.jpg)
SVM
![Page 41: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/41.jpg)
SVM
![Page 42: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/42.jpg)
SVM
![Page 43: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/43.jpg)
SVM
![Page 44: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/44.jpg)
SVM for Classification of Proteins
How to represent a protein?
• Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
– amino acid composition
– hydrophobicity
– normalized van der Waals volume
– polarity
– polarizability
– charge
– surface tension
– secondary structure
– solvent accessibility
• Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties.
Nucleic Acids Res., 31: 3692-3697
![Page 45: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/45.jpg)
SVM for Classification of Proteins
Descriptors for amino acid composition of protein:
C=(53.33, 46.67)
T=(51.72)
D=(3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0)
Nucleic Acids Res., 31: 3692-3697
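One way to see how numbers like these arise is to compute C, T, and D for a sequence already encoded into property classes. The sketch below is an illustration under stated assumptions, not the SVMProt implementation: the 30-residue two-class string is hypothetical (the slide does not show the sequence) and was chosen because it reproduces the values above; D is taken as the position, in percent of sequence length, of the first, 25%, 50%, 75%, and 100% occurrence of each class; and fractional ranks are rounded down, which is also an assumption.

```python
import math

def ctd_descriptors(encoded, classes):
    """Composition, transition, and distribution descriptors, in percent."""
    n = len(encoded)
    # C: share of each class in the sequence.
    comp = [round(100 * encoded.count(c) / n, 2) for c in classes]
    # T: share of adjacent pairs that switch between two given classes.
    trans = []
    for p in range(len(classes)):
        for q in range(p + 1, len(classes)):
            pair = {classes[p], classes[q]}
            changes = sum(1 for u, v in zip(encoded, encoded[1:]) if {u, v} == pair)
            trans.append(round(100 * changes / (n - 1), 2))
    # D: relative position of selected occurrences of each class.
    dist = []
    for c in classes:
        positions = [k for k, ch in enumerate(encoded, start=1) if ch == c]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                dist.append(0.0)
                continue
            rank = max(1, math.floor(len(positions) * frac))  # assumed rounding
            dist.append(round(100 * positions[rank - 1] / n, 2))
    return comp, trans, dist

# Hypothetical encoded sequence: 16 residues of class A, 14 of class E.
seq = "AEAAAEAEEAAAAAEAEEEAAEEAEEEAAE"
C, T, D = ctd_descriptors(seq, ["A", "E"])
print(C)  # [53.33, 46.67]
print(T)  # [51.72]
print(D)  # [3.33, 16.67, 40.0, 66.67, 96.67, 6.67, 26.67, 60.0, 76.67, 100.0]
```

With more than two classes the same function returns one T value per pair of classes and five D values per class.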
![Page 46: Protein Evolution: SARS coronavirus as an example](https://reader036.vdocuments.mx/reader036/viewer/2022081503/56814d80550346895dbadedc/html5/thumbnails/46.jpg)
CZ5225 Methods in Computational Biology
Assignment 1
• Project 1: Protein family classification by SVM
– Construction of training and testing datasets
– Generating feature vectors
– SVM classification and analysis
– Write a report and include a softcopy of your datasets
• Project 2: Develop a program for pair-wise sequence alignment using a simple scoring scheme.
– Write the code in any programming language
– Test it on a few examples (such as estrogen receptor and progesterone receptor)
– Can you extend your program to multiple alignment?
– Write a report and include a softcopy of your program