![Page 1: Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha](https://reader036.vdocuments.mx/reader036/viewer/2022062518/56649ec05503460f94bcaf43/html5/thumbnails/1.jpg)
Neural Networks for Protein Structure Prediction
Brown, JMB 1999
CS 466
Saurabh Sinha
Outline
• The goal is to predict the “secondary structure” of a protein from its sequence
• An artificial neural network is used for this task
• Evaluation of prediction accuracy
What is Protein Structure?
Image source: http://academic.brooklyn.cuny.edu/biology/bio4fv/page/3d_prot.htm
Image source: http://matcmadison.edu/biotech/resources/proteins/labManual/images/220_04_114.png
Protein Structure
• An amino acid sequence “folds” into a complex 3-D structure
• Finding out this 3-D structure is a crucial and challenging task
• Experimental methods (e.g., X-ray crystallography) are very tedious
• Computational predictions are a possibility, but very difficult
What is “secondary structure”?
Figure: a “strand” and a “helix” (image source: http://www.wiley.com/college/pratt/0471393878/student/structure/secondary_structure/secondary_structure.gif)
Figure: “strand” and “helix” elements within a protein structure (image source: http://www.npaci.edu/features/00/Mar/protein.jpg)
Secondary structure prediction
• Well, the whole 3-D “tertiary” protein structure may be hard to predict from sequence
• But can we at least predict the secondary structural elements such as “strand”, “helix” or “coil”?
• This is what this paper does
• ... and so do many other papers (it is a hard problem!)
A survey of structure prediction
• The most reliable technique is “comparative modeling”
  – Find a protein P whose amino acid sequence is very similar to your “target” protein T
  – Hope that this other protein P has a known structure
  – Predict a structure similar to that of P, after carefully considering how the sequences of P and T differ
A survey of structure prediction
• Comparative modeling fails if we don’t have a suitable homologous “template” protein P for our protein T
• “Ab initio” tertiary methods attempt to predict the structure without using a known protein structure as a template
  – Incorporate basic physical and chemical principles into the structure calculation
  – Gets very hairy, and highly computationally intensive
• The other option is prediction of secondary structure only (i.e., making the goal more modest)
  – These predictions may be used to provide constraints for tertiary structure prediction
Secondary structure prediction
• Early methods were based on stereochemical principles
• Later methods realized that we can do better by using not only the one sequence T (our sequence), but also a family of “related sequences”
• Search for sequences similar to T, build a multiple alignment of these, and predict secondary structure from the multiple alignment of sequences
What’s multiple alignment doing here?
• Most conserved regions of a protein sequence are either functionally important or buried in the protein “core”
• More variable regions are usually on the surface of the protein
  – there are few constraints on what type of amino acids have to be here (apart from a bias towards hydrophilic residues)
• Multiple alignment tells us which portions are conserved and which are not
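One simple way to turn "conserved vs. variable" into numbers is the Shannon entropy of each alignment column: 0 bits for a fully conserved column, larger values for more variable ones. This is a minimal sketch of that idea only; it is one common conservation measure, not necessarily what PSIPRED uses, and the function names and toy alignment below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log2(1/p) over the residue types observed in this column
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column of an equal-length (already aligned) set of sequences."""
    length = len(alignment[0])
    assert all(len(seq) == length for seq in alignment), "rows must be aligned"
    return [column_entropy("".join(seq[j] for seq in alignment)) for j in range(length)]


# Made-up alignment: columns 0, 1 and 4 are fully conserved, column 3 is the most variable.
toy = ["ACDKL", "ACEKL", "ACDRL", "ACEAL"]
print([round(h, 2) for h in conservation_profile(toy)])  # → [0.0, 0.0, 1.0, 1.5, 0.0]
```

Low-entropy columns are candidates for buried or functionally important positions; high-entropy columns suggest surface positions.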
Figure: hydrophobic core (image source: http://bio.nagaokaut.ac.jp/~mbp-lab/img/hpc.png)
What’s multiple alignment doing here?
• Therefore, by looking at the multiple alignment, we could predict which residues are in the core of the protein and which are on the surface (“solvent accessibility”)
• Secondary structure is then predicted by comparing the accessibility patterns associated with helices, strands, etc.
• This approach (Benner & Gerloff) was mostly manual
• Today’s paper suggests an automated method
The PSI-PRED algorithm
• Given an amino-acid sequence, predict secondary structure elements in the protein
• Three stages:
  1. Generation of a sequence profile (the “multiple alignment” step)
  2. Prediction of an initial secondary structure (the neural network step)
  3. Filtering of the predicted structure (another neural network step)
Generation of sequence profile
• A BLAST-like program called “PSI-BLAST” is used for this step
• We saw BLAST earlier: it is a fast way to find high-scoring local alignments
• PSI-BLAST is an iterative approach
  – an initial scan of a protein database using the target sequence T
  – align all matching sequences to construct a “sequence profile”
  – scan the database using this new profile (and repeat)
• It can also pick out and align distantly related protein sequences for our target sequence T
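The iterate-scan-rebuild loop can be illustrated with a toy version. This is a sketch under heavy simplifying assumptions, not PSI-BLAST itself: all sequences are taken to have the same length as the query (so "alignment" is position-by-position, with no gaps), the background is uniform, pseudocounts are a flat 1 per amino acid, and the names `build_profile`, `iterative_search` and the score threshold are hypothetical. Real PSI-BLAST uses gapped local alignment, E-value cutoffs, sequence weighting and more careful pseudocounts.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=1.0):
    """Log-odds profile (one dict per position) from equal-length sequences.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for j in range(len(seqs[0])):
        counts = {a: pseudocount for a in AMINO_ACIDS}
        for s in seqs:
            counts[s[j]] += 1
        total = sum(counts.values())
        profile.append({a: math.log(counts[a] / total / background) for a in AMINO_ACIDS})
    return profile


def score(profile, seq):
    """Ungapped score of seq against the profile (no alignment step, for simplicity)."""
    return sum(col[res] for col, res in zip(profile, seq))


def iterative_search(query, database, threshold, max_iter=3):
    """Scan with the current profile, rebuild it from all hits, and repeat."""
    hits = [query]
    for _ in range(max_iter):
        profile = build_profile(hits)
        new_hits = [query] + [s for s in database
                              if s != query and score(profile, s) > threshold]
        if new_hits == hits:  # converged: no new sequences were picked up
            break
        hits = new_hits
    return hits


QUERY = "ACDEFGHIKL"
# Made-up database: a close relative, a relative of that relative, and an unrelated sequence.
DATABASE = ["ACDEFGHIMN", "ACDEPQRSMN", "WWWWWWWWWW"]
print(iterative_search(QUERY, DATABASE, threshold=3.0, max_iter=1))
print(iterative_search(QUERY, DATABASE, threshold=3.0))
```

With a single iteration only the close relative is found; the second iteration's profile, which now includes that relative, is enough to pull in the more distant `ACDEPQRSMN`. This is the sense in which iterating lets PSI-BLAST reach distantly related sequences.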
The sequence profile looks like this
• Has 20 x M numbers (M = length of the target sequence T)
• The numbers are log-likelihood (log-odds) scores of each residue at each position
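For reference, profile entries of this kind are usually written in log-odds form. In the formula below, $q_{a,j}$ (an estimated frequency of amino acid $a$ at position $j$, taken from the aligned sequences and typically smoothed with pseudocounts) and $p_a$ (a background frequency of $a$) are notation introduced here for illustration; PSI-BLAST additionally rescales and rounds its scores, which is omitted.

$$S_{a,j} = \log \frac{q_{a,j}}{p_a}, \qquad a \in \{\text{20 amino acids}\}, \quad j = 1, \dots, M$$

A positive $S_{a,j}$ means amino acid $a$ occurs at position $j$ more often than expected by chance; a negative one means less often.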
Preparing for the second step
• Feed the sequence profile to an artificial neural network
• But before feeding, apply a simple “scaling” to bring the numbers to a 0-1 scale:
$$x \mapsto \frac{1}{1 + e^{-x}}$$
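The scaling step above can be sketched as follows: every profile score $x$ is passed through the logistic function, so large positive scores approach 1, large negative scores approach 0, and a score of 0 maps to exactly 0.5. The function name and the toy rows are hypothetical; real rows have 20 entries.

```python
import math


def scale_profile(profile):
    """Map every profile score x to 1 / (1 + e^-x), which lies strictly between 0 and 1."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in profile]


# Two made-up, shortened profile rows (real rows have one score per amino acid).
rows = [[0, 3, -3], [-7, 1, 5]]
scaled = scale_profile(rows)
print([[round(v, 3) for v in row] for row in scaled])
```

The mapping is monotonic, so the ordering of scores within a row is preserved. `math.exp` can overflow for extremely negative inputs, which is not a concern for the modest range of profile scores.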
Intro to Neural nets (the second and third steps of PSIPRED)
Artificial Neural Network
• Supervised learning algorithm
• Training examples: each example has a label
  – the “class” of the example, e.g., “positive” or “negative”
  – here, “helix”, “strand”, or “coil”
• Learns how to predict the class of an example
Artificial Neural Network
• Directed graph
• Nodes or “units” or “neurons”
• Edges between units
• Each edge has a weight (not known a priori)
Layered Architecture
Input here is a four-dimensional vector. Each dimension goes into one input unit
Image source: http://www.akri.org/cognition/images/annet2.gif
Layered Architecture
Figure: units arranged in layers (image source: http://www.geocomputation.org/2000/GC016/GC016_01.GIF)
What a unit (neuron) does
• Unit $i$ receives a total input $x_i$ from the units connected to it, and produces an output $y_i = f_i(x_i)$, where $f_i(\cdot)$ is the “transfer function” of unit $i$ and $N$ is the set of units

$$x_i = \sum_{j \in N \setminus \{i\}} w_{ij}\, y_j + w_i$$

$$y_i = f_i(x_i) = f_i\Big(\sum_{j \in N \setminus \{i\}} w_{ij}\, y_j + w_i\Big)$$

$w_i$ is called the “bias” of the unit
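The two equations above translate directly into code: a weighted sum of the incoming outputs $y_j$, plus the bias $w_i$, passed through the transfer function. This is a minimal sketch; the function names are hypothetical and the sigmoid is just one possible choice of transfer function.

```python
import math


def sigmoid(x):
    """A common sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(weights, inputs, bias, transfer=sigmoid):
    """y_i = f_i(sum_j w_ij * y_j + w_i) for a single unit i."""
    total_input = sum(w * y for w, y in zip(weights, inputs)) + bias  # this is x_i
    return transfer(total_input)


# Made-up numbers: x_i = 0.5*1.0 + (-1.0)*2.0 + 1.5 = 0.0, so y_i = sigmoid(0) = 0.5
y = unit_output([0.5, -1.0], [1.0, 2.0], bias=1.5)
print(y)  # → 0.5
```

Passing `transfer=lambda x: x` gives a linear unit instead of a sigmoidal one.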
Weights, bias and transfer function
• Unit takes n inputs; each input edge has weight $w_i$
• Bias $b$; output $a = f(\text{weighted sum of inputs} + b)$
• Transfer function $f()$: linear, sigmoidal, or other
Weights, bias and transfer function
• Weights $w_{ij}$ and bias $w_i$ of each unit are “parameters” of the ANN
  – Parameter values are learned from the training data
• Transfer function is usually the same for every unit in the same layer
• Graphical architecture (connectivity) is decided by you
  – Could use a fully connected architecture: all units in one layer connect to all units in the “next” layer
Where’s the algorithm?
• It’s in the training of parameters!
• Given several examples and their labels: the training data
• Search for parameter values such that output units make correct predictions on the training examples
• “Back-propagation” algorithm
  – Read up more on neural nets if you are interested
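To make "search for parameter values" concrete, here is a minimal back-propagation sketch: a network with one hidden layer of sigmoid units and one sigmoid output unit, trained by per-example gradient descent on squared error. The toy task (logical OR), hidden-layer size, learning rate and epoch count are all hypothetical choices for illustration; this is not PSIPRED's training setup.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Return hidden-layer outputs and the network output for input vector x."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, out


def loss(params, data):
    """Mean squared error over the training examples."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, lr=1.0, epochs=3000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in data:  # update after every example
            h, out = forward((W1, b1, w2, b2), x)
            # Error term at the output unit, for the loss 0.5 * (out - t)^2
            delta_out = (out - t) * out * (1 - out)
            # Propagate the error back to each hidden unit (uses w2 before it is updated)
            delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= lr * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * delta_h[j] * x[i]
                b1[j] -= lr * delta_h[j]
            b2 -= lr * delta_out
    return W1, b1, w2, b2


OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params = train(OR_DATA)
print(round(loss(params, OR_DATA), 4))
print([round(forward(params, x)[1]) for x, _ in OR_DATA])
```

The key step is computing each hidden unit's error term from the output unit's error term, weighted by the connecting weight; that backward flow of errors is what gives the algorithm its name.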
Back to PSIPRED …
Step 2
• Feed the sequence profile to the input layer of an ANN
• Not the whole profile, only a window of 15 consecutive positions
• For each position, there are 20 numbers in the profile (one for each amino acid)
• Therefore ~ 15 x 20 = 300 numbers fed
• Therefore, ~ 300 “input units” in the ANN
• 3 output units, for “strand”, “helix”, “coil”
  – each number is the confidence in that secondary structure for the central position in the window of 15
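The windowing described above can be sketched as follows: for each position of the sequence, take the 15 profile rows centred on it and concatenate them into one 15 x 20 = 300-number input vector. How PSIPRED handles windows that run past the ends of the sequence is not covered on these slides; padding with zeros here is an assumption, and the function name is hypothetical.

```python
def profile_windows(profile, window=15):
    """Yield one flattened input vector per residue position.

    profile: M rows, each a list of 20 scaled scores. Window positions that fall
    off either end of the sequence are padded with zeros (an assumption here).
    """
    half = window // 2
    padding = [0.0] * len(profile[0])
    for centre in range(len(profile)):
        vector = []
        for j in range(centre - half, centre + half + 1):
            vector.extend(profile[j] if 0 <= j < len(profile) else padding)
        yield vector


# Made-up 20-residue profile in which every score in row i equals i + 1.
toy_profile = [[i + 1.0] * 20 for i in range(20)]
vectors = list(profile_windows(toy_profile))
print(len(vectors), len(vectors[0]))  # → 20 300
```

Each vector is labelled with the structure of its central residue, so one protein of length M yields M training or prediction examples.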
Figure: a window of 15 profile positions feeds the input layer, which connects to a hidden layer and then to three output units (helix, strand, coil); example output values: 0.18, 0.09, 0.67
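A forward pass through the network in the figure can be sketched with two fully connected layers of sigmoid units: 300 inputs, a hidden layer, and 3 outputs. The weights here are random and untrained, so the outputs are meaningless; the sketch only shows the shapes and the data flow. The hidden-layer size of 10 and all names are hypothetical choices, not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(weights, biases, inputs):
    """Fully connected layer of sigmoid units: one weight row per unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained (random) weights and zero biases, only to illustrate the shapes."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(42)
N_HIDDEN = 10  # hypothetical hidden-layer size for this sketch
W_hidden, b_hidden = random_layer(300, N_HIDDEN, rng)
W_out, b_out = random_layer(N_HIDDEN, 3, rng)

window_input = [rng.random() for _ in range(300)]  # stands in for one 15 x 20 window
h = dense_layer(W_hidden, b_hidden, window_input)
helix, strand, coil = dense_layer(W_out, b_out, h)

scores = {"helix": helix, "strand": strand, "coil": coil}
prediction = max(scores, key=scores.get)  # class with the highest confidence
print({k: round(v, 2) for k, v in scores.items()}, prediction)
```

After training, the three outputs play the role of the confidences in the figure, and the largest one gives the predicted state of the central residue.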
Step 3
• Feed the output of the 1st ANN to the 2nd ANN
• Each window of 15 positions gave 3 numbers from the 1st ANN
• Take 15 successive windows’ outputs and feed them to the 2nd ANN
• Therefore, ~ 15 x 3 = 45 input units in the ANN
• 3 output units, for “strand”, “helix”, “coil”
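Building the 2nd network's input can be sketched the same way as the 1st: slide a window of 15 over the per-position (helix, strand, coil) outputs of the 1st network and concatenate them into 15 x 3 = 45 numbers. Zero padding at the sequence ends is an assumption, and the function name and toy data are hypothetical.

```python
def second_stage_inputs(first_stage, window=15):
    """45 inputs (15 positions x 3 outputs) for each position of the sequence.

    first_stage: M rows of (helix, strand, coil) outputs from the 1st network.
    Positions beyond either end of the sequence are padded with zeros (an assumption).
    """
    half = window // 2
    pad = [0.0, 0.0, 0.0]
    rows = [list(r) for r in first_stage]
    n = len(rows)
    return [
        [v for j in range(c - half, c + half + 1) for v in (rows[j] if 0 <= j < n else pad)]
        for c in range(n)
    ]


# Made-up 1st-stage output: mostly "coil", with one isolated "strand" call at position 15.
first = [(0.1, 0.2, 0.7)] * 30
first[15] = (0.1, 0.8, 0.1)
inputs = second_stage_inputs(first)
print(len(inputs), len(inputs[0]))  # → 30 45
```

Because each 45-number input covers the neighbours of a position, the 2nd network can learn patterns such as "an isolated strand call surrounded by coil is probably wrong", which is the filtering role described on the slides.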
Test of performance
Cross-validation
• Partition the labeled data (sequences with known structure) into a “training set” (two thirds of the examples) and a “test set” (the remaining one third)
• Train PSIPRED on the training set, make predictions on the test set, and compare them with the known answers
• What is an answer?
  – For each position of the sequence, a prediction of what secondary structure that position is involved in
  – That is, a sequence over “H/S/C” (helix/strand/coil)
• How to compare an answer with the known answer?
  – Number of positions that match
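The evaluation described above can be sketched in a few lines: a random two-thirds/one-third split of the labeled examples, and the fraction of positions where the predicted H/S/C label matches the known one (this per-residue, three-state accuracy is often called Q3). The function names and the toy strings are hypothetical.

```python
import random


def split_two_thirds(items, seed=0):
    """Shuffle, then put two thirds in the training set and the rest in the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = (2 * len(items)) // 3
    return items[:cut], items[cut:]


def q3_accuracy(predicted, actual):
    """Fraction of positions where the predicted H/S/C label matches the known one."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up prediction vs. known structure: 7 of 8 positions match.
print(q3_accuracy("HHHCCSSC", "HHHCCCSC"))  # → 0.875
```

In practice the accuracy would be summed over all positions of all test-set proteins rather than computed for a single sequence.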