![Page 1: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/1.jpg)
Predicting Secondary Structure of All-Helical Proteins Using
Hidden Markov Support Vector Machines
Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee,
Marten van Dijk, and Srinivas Devadas
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Workshop on Pattern Recognition in Bioinformatics – August 20, 2006
![Page 2: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/2.jpg)
Protein Structure Prediction
• Classical problem: given sequence, predict structure
• High-level approaches
  1. Energy-minimization (ab-initio) techniques
     - Elegant, but often lack correct parameters
  2. Homology-based techniques
     - Useful, but hard to predict new proteins
[Diagram: Sequence → Structure]
• Our approach: use energy minimization, but learn parameters from existing proteins
![Page 3: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/3.jpg)
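Our Framework (Training)
[Flow diagram, reconstructed from its labels]
• Protein Data Bank supplies an amino-acid sequence and its correct structure
• Prediction Algorithm: amino-acid sequence + energy parameters → predicted structure
• Predicted structure correct: Done!
• Predicted structure incorrect: add constraint energy(incorrect) > energy(correct)
• Learning Algorithm: constraints → energy parameters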
![Page 4: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/4.jpg)
Our Framework (Testing)
[Flow diagram] Amino-acid sequence + energy parameters → Prediction Algorithm → predicted structure
![Page 5: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/5.jpg)
Initial Focus: Secondary Structure
• Classify each residue as alpha helix, beta strand, or coil
  – In this paper, restrict to all-alpha proteins
• Applications:
  – Informing tertiary structure predictors
  – Identification of homologous proteins
  – Identification of active sites (coils)
![Page 6: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/6.jpg)
Secondary Structure Predictors
[Plot, built up over several slides: prediction accuracy (Q3, 50% to 100%) versus year (1975 to 2010). Each method family is plotted in two variants: sequence only, and sequence + alignment.]
• Statistical methods: Chou/Fasman, GOR, Zvelebil et al., DSC
• Neural networks: Qian/Sejnowski, PHD, Riis/Krogh, PSIPred (two points), Peterson, Porter, SSPro, SSPro4
• SVMs: Hua/Sun, Casbon, Ceroni, Ward, Kim, Nguyen, Hu
• HMMs: Schmidler et al., HMMSTR, Nguyen, Martin (two points), Won, THIS PAPER
• Callouts on the plot:
  – 1400-2900 parameters
  – 680 MB of support vectors
  – 471 parameters
  – 302 params (THIS PAPER)
  – Exploits biochemical models; offers biological insight
![Page 13: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/13.jpg)
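Our Framework Applied to Helix Prediction
[Flow diagram: the training framework with each box instantiated]
• Protein Data Bank supplies an amino-acid sequence and its correct structure
• Structure: alpha helices (example labels on the slide: MNIFEMLRIDEGL, HHHHHHHHH)
• Prediction Algorithm: Hidden Markov Model (amino-acid sequence + energy parameters → predicted structure)
• Predicted structure correct: Done!
• Predicted structure incorrect: add constraint energy(incorrect) > energy(correct)
• Learning Algorithm: Support Vector Machines (constraints → energy parameters)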
![Page 14: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/14.jpg)
Energy Parameters

| Description of Energy Parameters | Number of Parameters | Name |
|---|---|---|
| Energy of residue R in a helix | 20 | $H_R$ |
| Energy of residue R at offset i (-3…3) from N-cap | 140 | $N_{R,i}$ |
| Energy of residue R at offset i (-3…3) from C-cap | 140 | $C_{R,i}$ |
| Penalty for coils of length 1 or 2 | 2 | |
| Total | 302 | |

• Example:
  Sequence: MNIFELRIDEGL
  Structure: HHHHHH (the helix covers residues FELRID)
  Energy = $H_F + H_E + H_L + H_R + H_I + H_D$ (Helix)
  + $N_{M,-3} + N_{N,-2} + N_{I,-1} + N_{F,0} + N_{E,1} + N_{L,2} + N_{R,3}$ (N-cap)
  + $C_{L,-3} + C_{R,-2} + C_{I,-1} + C_{D,0} + C_{E,1} + C_{G,2} + C_{L,3}$ (C-cap)
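The energy example can be reproduced mechanically. The following is a minimal sketch, not the authors' implementation: the function names `energy_terms` and `structure_energy`, the tuple-keyed `params` dictionary, and the 0-based helix indices are all assumptions made for illustration, and the two coil-length penalties are left out.

```python
def energy_terms(seq, start, end):
    """List the parameters that contribute for one helix seq[start..end] (0-based, inclusive).

    Each term is ("H", residue) for a helix residue, or (kind, residue, offset)
    with kind "N" or "C" for residues at offsets -3..3 around the helix caps.
    Hypothetical representation; coil-length penalties are not modeled here.
    """
    terms = [("H", residue) for residue in seq[start:end + 1]]
    for kind, cap in (("N", start), ("C", end)):
        for offset in range(-3, 4):
            pos = cap + offset
            if 0 <= pos < len(seq):  # skip offsets that fall off the sequence
                terms.append((kind, seq[pos], offset))
    return terms


def structure_energy(seq, start, end, params):
    """Sum the parameter values of all contributing terms (missing ones count as 0)."""
    return sum(params.get(term, 0.0) for term in energy_terms(seq, start, end))


# The slide's example: helix FELRID inside MNIFELRIDEGL.
for term in energy_terms("MNIFELRIDEGL", 3, 8):
    print(term)
```

The printed terms should line up with the Helix, N-cap, and C-cap rows of the example, which is a quick way to check that the cap offsets are anchored on the first and last helix residue.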
![Page 20: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/20.jpg)
Learning the Parameters
[Plot, built up over several slides: Feature Space. Horizontal axis A: # of alanines in helices. Vertical axis G: # of glycines in helices. Markers: legal structures, the correct structure, and the current predicted structure. An arrow shows the energy parameters w; a separating hyperplane is shown when the parameters are refined.]
Energy(structure) = $H_A \cdot A + H_G \cdot G = w \cdot [A\ G]$, where w represents the energy parameters $[H_A\ H_G]$
Highest energy in direction of energy parameters w
1. Predict structure
2. Refine parameters
3. Predict structure
4. Refine parameters
5. Predict structure
6. Refine parameters
7. Predict structure (structure already predicted)
8. Terminate
Details in paper:
- How to converge faster
- Early termination condition
- [Tsochantaridis et al., ICML’02]
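The predict/refine loop can be sketched on a toy feature space like the one in the plot. This is an illustration under stated assumptions, not the method from the paper: the names `predict` and `learn` are hypothetical, structures are represented directly by their (A, G) feature vectors, and the "refine" step uses a simple perceptron-style update as a stand-in for the SVM refinement. The loop here stops once the correct structure is predicted, whereas the slides terminate when the predicted structure has already been seen.

```python
def dot(w, f):
    """Energy of a structure with feature vector f under parameters w."""
    return sum(wi * fi for wi, fi in zip(w, f))


def predict(w, candidates):
    """Return the legal structure (feature vector) with minimal energy."""
    return min(candidates, key=lambda f: dot(w, f))


def learn(candidates, correct, max_iters=100):
    """Alternate prediction and refinement until the correct structure wins.

    Refinement is a perceptron-style step (an assumed stand-in for the SVM):
    it raises the energy of the wrongly predicted structure relative to the
    correct one, pushing toward energy(incorrect) > energy(correct).
    """
    w = [0.0] * len(correct)
    mistakes = []
    for _ in range(max_iters):
        predicted = predict(w, candidates)          # step: predict structure
        if predicted == correct:
            break                                   # step: terminate
        mistakes.append(predicted)
        w = [wi + (p - c) for wi, p, c in zip(w, predicted, correct)]  # step: refine
    return w, mistakes


# Toy feature space: (alanines in helices, glycines in helices).
legal_structures = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
w, mistakes = learn(legal_structures, correct=(2, 1))
print(w, mistakes)
```

A perceptron-style update is only guaranteed to settle when some w makes the correct structure the unique minimum, which is why the toy example places the correct structure at a corner of the feature space.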
![Page 40: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/40.jpg)
Experimental Methodology
• Data set: 300 non-homologous all-alpha proteins
  – From EVA’s sequence-unique subset of the PDB, July 2005
  – Only consider alpha helices (“H” symbol in DSSP)
• Randomly split into 150 training, 150 test proteins
![Page 41: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/41.jpg)
Results

| Metric | Value | Explanation |
|---|---|---|
| Q | 77.6% | percent of residues correctly predicted |
| SOV | 73.4% | segment overlap measure [Zemla’99] |

• Comparison to others
  – Best HMM method to date that does not utilize alignment info
    • Offers 3.5% (Q), 0.2% (SOV) over previous best
  – Lags behind neural networks; e.g., Porter overall SOV = 76.6%
  – However, we could likely gain 6-8% from alignment profiles
• Caveats
  – Moving beyond all-alpha proteins, we could suffer 3%
  – By considering 3/10 helices, we could decrease 2%
Citations shown on the slide: [Nguyen02], [Rost93], [Jones99]
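The Q metric reported on this slide is a per-residue accuracy. A minimal sketch of how such a score could be computed follows; the function name `q_score` and the use of one-letter label strings (for example "H" for helix, "C" for coil) are assumptions for illustration. SOV is a segment-level measure and is not covered by this sketch.

```python
def q_score(true_labels, predicted_labels):
    """Percent of residues whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * matches / len(true_labels)


# One of four residues is mislabeled, so three of four match.
print(q_score("HHCC", "HCCC"))
```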
![Page 42: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/42.jpg)
Conclusions
• Represents first step toward learning biophysical parameters for energy minimization techniques
  – Iterative, demand-driven learning process using SVMs
• Promising results on alpha-helix prediction
  – 77.6% among best Q for methods without alignment info
• Future work: super-secondary structure
  – Will predict full “contact maps” rather than 3-state labels
  – For beta sheets, replace HMMs by multi-tape grammars
http://protein.csail.mit.edu/
![Page 43: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/43.jpg)
Extra Slides
![Page 44: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/44.jpg)
Prediction Algorithm
[Flow diagram] Amino-acid sequence + energy parameters → Prediction Algorithm (structure with minimal energy) → predicted structure
• Parameters represent energetic benefit of a given feature in a protein structure
  – Features are fixed, chosen by designer
  – Example features:
    • Number of prolines in an alpha helix
    • Number of coils shorter than 2 residues
• $\text{Energy}(\text{structure}) = \sum_{\text{features} \in \text{structure}} \text{Energy}(\text{feature})$
• Minimal-energy structure found with dynamic programming
  – Idea: consider all structures, exploiting overlapping problems
  – Implemented as HMM using Viterbi algorithm
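The dynamic-programming idea can be illustrated with a deliberately simplified two-state model. This sketch is not the HMM from the paper: it assumes only a per-residue helix energy and a single hypothetical `switch_penalty` charged whenever the label changes between helix ("H") and coil ("C"), and it ignores the N-cap, C-cap, and coil-length parameters. The function name `min_energy_labels` is likewise an assumption.

```python
def min_energy_labels(seq, helix_energy, switch_penalty=0.0):
    """Viterbi-style search for the minimal-energy H/C labeling of seq.

    helix_energy maps a residue to its energy when labeled "H"; coil residues
    cost 0. Returns (energy, labels). Simplified model for illustration only.
    """
    if not seq:
        return 0.0, ""
    states = "HC"

    def emit(state, residue):
        return helix_energy.get(residue, 0.0) if state == "H" else 0.0

    # best[s] = (energy, labels) of the best labeling of the prefix ending in state s
    best = {s: (emit(s, seq[0]), s) for s in states}
    for residue in seq[1:]:
        new_best = {}
        for s in states:
            options = []
            for prev in states:
                energy, labels = best[prev]
                step = (switch_penalty if prev != s else 0.0) + emit(s, residue)
                options.append((energy + step, labels + s))
            new_best[s] = min(options)  # keep only the cheapest way to end in s
        best = new_best
    return min(best.values())


# Alanine favors helix, glycine disfavors it (made-up values).
print(min_energy_labels("AAGAA", {"A": -1.0, "G": 2.0}, switch_penalty=0.5))
```

Only the best labeling per ending state is kept at each position, so the search touches every labeling implicitly while doing work linear in the sequence length; this is the "overlapping problems" idea from the slide.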
![Page 45: Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,](https://reader030.vdocuments.mx/reader030/viewer/2022032708/56649e7d5503460f94b80c80/html5/thumbnails/45.jpg)
Learning Algorithm
[Flow diagram] Constraints energy(incorrect) > energy(correct) → Learning Algorithm → energy parameters
• Constraints have form:
  For all incorrectly predicted structures $S_i$, in future selection of the parameters w:
  $\text{Energy}_w(S_i) > \text{Energy}_w(\text{correct structure})$
  Constraints are linear in the energy parameters.
• If feasible, could solve with linear programming
• In general, solve with Support Vector Machines (SVMs)
  – $\text{Energy}_w(S_i) \ge \text{Energy}_w(\text{correct structure}) + 1 - \xi_i$, with $\xi_i \ge 0$
  – Find parameters w minimizing $\tfrac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i$
  – Provides general solution using soft-margin criterion
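To make the soft-margin objective concrete, here is a small sketch that minimizes it by plain subgradient descent. This is a swapped-in technique chosen for brevity, not the solver used by the authors. It assumes the energy is linear, Energy_w(S) = w · f(S), so each constraint can be written with a difference vector d_i = f(S_i) − f(correct structure) as w · d_i ≥ 1 − ξ_i. The function names, step size, and iteration count are hypothetical.

```python
def soft_margin_objective(w, diffs, C):
    """0.5*||w||^2 + (C/n) * sum_i max(0, 1 - w.d_i), the slack form of the objective."""
    n = len(diffs)
    norm_sq = sum(wi * wi for wi in w)
    slacks = [max(0.0, 1.0 - sum(wi * di for wi, di in zip(w, d))) for d in diffs]
    return 0.5 * norm_sq + (C / n) * sum(slacks)


def fit_soft_margin(diffs, C=10.0, lr=0.01, steps=2000):
    """Approximate minimizer via subgradient descent (illustrative, not exact)."""
    n = len(diffs)
    w = [0.0] * len(diffs[0])
    for _ in range(steps):
        grad = list(w)  # gradient of the 0.5*||w||^2 term
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            if margin < 1.0:  # violated constraint contributes -(C/n) * d_i
                grad = [g - (C / n) * di for g, di in zip(grad, d)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Two constraints with made-up difference vectors.
diffs = [(1.0, 0.0), (0.0, 1.0)]
w = fit_soft_margin(diffs)
print(w, soft_margin_objective(w, diffs, C=10.0))
```

With a fixed step size the iterate hovers near the optimum rather than converging exactly, which is acceptable for a sketch; a quadratic-programming solver would give the exact soft-margin solution.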