Page 1: Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang

Machine Learning Methods of Protein Secondary Structure Prediction

Presented by Chao Wang

• What is secondary structure?

• How to evaluate secondary structure prediction?

• How does secondary structure prediction affect the accuracy of tertiary structure prediction?

• Our perspective: "elite"

What is secondary structure?

• Hydrogen bond: a non-covalent bond

A hydrogen bond is identified if E in the following equation is less than -0.5 kcal/mol
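
The equation on this slide did not survive in the transcript. Assuming it is the standard DSSP (Kabsch and Sander) electrostatic model, the energy is E = q1 * q2 * (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) * f, with q1 = 0.42e, q2 = 0.20e, f = 332, distances in angstroms and E in kcal/mol. A minimal sketch under that assumption follows; the function names are illustrative and not from the slides.

```python
import math

# Constants of the standard DSSP definition (assumed here, because the
# slide's own equation is missing from the transcript).
Q1 = 0.42            # partial charge on the C=O group, in units of e
Q2 = 0.20            # partial charge on the N-H group, in units of e
F = 332.0            # dimensional factor: angstroms in, kcal/mol out
HBOND_CUTOFF = -0.5  # kcal/mol, the threshold quoted on the slide


def hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group and an N-H group.

    Each argument is an (x, y, z) atom coordinate in angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(c, o, n, h):
    """True when the energy falls below the -0.5 kcal/mol cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF


# Idealized collinear geometry: C=O ... H-N along the x axis.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.14, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

With the idealized geometry above the energy is close to -3 kcal/mol, well below the cutoff. Moving the N-H group about 10 angstroms further away brings the energy near zero, so no bond is assigned.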

8-state annotation by DSSP
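
For reference, the eight DSSP states are H (α-helix), G (3-10 helix), I (π-helix), E (extended strand), B (isolated β-bridge), T (turn), S (bend) and coil (left blank by DSSP). A minimal sketch of one widely used reduction to three states is shown below. The slides do not say which reduction is used, so this mapping is an assumption, and other schemes exist.

```python
# One common 8-state to 3-state reduction (an assumption, not taken from
# the slides): helices -> H, strand and bridge -> E, everything else -> C.
DSSP_8_TO_3 = {
    "H": "H",  # alpha-helix
    "G": "H",  # 3-10 helix
    "I": "H",  # pi-helix
    "E": "E",  # extended beta-strand
    "B": "E",  # isolated beta-bridge
    "T": "C",  # hydrogen-bonded turn
    "S": "C",  # bend
    "-": "C",  # coil, written as "-" in many files
    " ": "C",  # coil, as left blank in raw DSSP output
}


def reduce_to_three_states(dssp_string):
    """Map an 8-state DSSP string to a 3-state H/E/C string.

    Raises KeyError on a symbol outside the table above.
    """
    return "".join(DSSP_8_TO_3[state] for state in dssp_string)


print(reduce_to_three_states("HHGGIEEBTS-"))
```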

Prediction

• Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. Such methods were typically ~60% accurate in predicting which of the three states (helix/sheet/coil) a residue adopts.
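
The accuracy figures quoted here are per-residue: the fraction of residues whose predicted state matches the observed one, usually called Q3 for three states. A minimal sketch of that measure, with an illustrative function name and toy strings:

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings over the alphabet H/E/C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Toy example: 6 of 8 residues agree.
print(q3_accuracy("HHHECCCC", "HHHHCCEC"))
```

The same function applied to 8-state strings gives the corresponding 8-state accuracy.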

A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignment; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position. For illustration, a given protein might have a glycine at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.
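
The alignment-derived input described above can be sketched as a per-column amino-acid distribution plus a window of 7 residues on either side of the target position. The sketch below uses raw column frequencies from a toy alignment for illustration only; real predictors typically use position-specific scoring matrices, and all names here are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Per-column amino-acid frequencies for equal-length aligned sequences.

    Gaps and other non-standard symbols are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues)
        profile.append(
            [residues.count(a) / total if total else 0.0 for a in AMINO_ACIDS]
        )
    return profile


def window_features(profile, position, half_width=7):
    """Concatenate the profile columns within half_width of position.

    Positions that fall off either end of the sequence are zero-padded,
    so the result always has (2 * half_width + 1) * 20 entries.
    """
    zero = [0.0] * len(AMINO_ACIDS)
    features = []
    for i in range(position - half_width, position + half_width + 1):
        features.extend(profile[i] if 0 <= i < len(profile) else zero)
    return features


# Toy alignment: glycine in the query, alanine in the other homologs.
alignment = ["GAL", "AAL", "AAV", "A-L"]
profile = column_frequencies(alignment)
print(profile[0][AMINO_ACIDS.index("A")])  # alanine frequency in column 0
print(len(window_features(profile, 0)))    # 15 columns x 20 amino acids
```

A feature vector of this kind, one per residue, is what a neural network or support vector machine would take as input.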

Outline

• CNF model by Jinbo

• Multi-step learning model by Yaoqi

• Iterative deep learning model by Yaoqi

• Our perspective: Elite.

– A new experiment to detect how elite affects secondary structure prediction.

• Methods

– How to model the probability

– Feature selection

• Results

– vs. other methods

– Improvement

Protein 8-class secondary structure prediction using conditional neural fields

Zhiyong Wang, Feng Zhao, Jian Peng, and Jinbo Xu. Proteomics. 2011

Model

Training & Prediction

Features

Training/testing set

Results

• The CNF model outperforms SSpro8 on each state

• Effect of the regularization factor: accuracy is insensitive to it, and is best when the factor is set to 9.

• Effect of Neff: for SS prediction, using evolutionary information from as many homologs as possible may not be the best strategy. Instead, when many sequence homologs are available, a subset of them should be used to build the sequence profile.
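
The slide does not define Neff. One definition used by profile-based tools, assumed here, is the exponential of the Shannon entropy averaged over profile columns, which ranges from 1 (fully conserved) to 20 (uniform over the amino acids). A minimal sketch under that assumption, taking a profile as a list of per-column frequency lists:

```python
import math


def neff(profile):
    """Effective number of homologs: exp of the mean column entropy.

    profile is a list of columns, each a list of amino-acid frequencies
    that sum to 1. The entropy uses the natural logarithm. This is an
    assumed definition; the slides do not give the exact formula.
    """
    if not profile:
        raise ValueError("profile must contain at least one column")
    total_entropy = 0.0
    for column in profile:
        total_entropy -= sum(p * math.log(p) for p in column if p > 0)
    return math.exp(total_entropy / len(profile))


conserved = [[1.0] + [0.0] * 19]  # one residue type only
uniform = [[0.05] * 20]           # all 20 residue types equally likely
print(neff(conserved), neff(uniform))
```

Under this definition, a profile built from a very large and diverse set of homologs has a high Neff, which is the regime where the slide suggests using only a subset of homologs.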

J Comput Chem. 2012
