Artificial Neural Networks 2
Morten Nielsen
Department of Systems Biology, DTU
Outline
• Optimization procedures
– Gradient descent (this you already know)
• Network training
– Back propagation
– Cross-validation
– Over-fitting
– Examples
Neural network. Error estimate
[Figure: inputs I1 and I2, with weights w1 and w2, feed a linear function that produces the output o]
Neural networks
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function $F(x)$ is defined and differentiable in a neighborhood of a point $a$, then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$. It follows that, if
$$b = a - \gamma \nabla F(a)$$
for $\gamma > 0$ a small enough number, then $F(b) \leq F(a)$.
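The update rule quoted above can be tried on a one-dimensional function. This is a minimal sketch: the function F(x) = (x - 3)^2, the step size 0.1, and the name `gradient_descent` are illustrative assumptions, not taken from the slides.

```python
def gradient_descent(grad, a, gamma=0.1, steps=50):
    """Repeatedly step from a in the direction of the negative gradient."""
    for _ in range(steps):
        a = a - gamma * grad(a)  # b = a - gamma * grad F(a)
    return a

# Hypothetical example function and its derivative
def F(x):
    return (x - 3.0) ** 2

def dF(x):
    return 2.0 * (x - 3.0)

a = 0.0
b = a - 0.1 * dF(a)              # a single gradient descent step
x_min = gradient_descent(dF, a)  # many steps approach the minimum at x = 3

print(F(a), F(b), x_min)
```

With a step size that is too large the iteration can overshoot and F can increase, which is why the statement requires a small enough step.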
Gradient descent (example)
Weights are changed in the opposite direction of the gradient of the error
[Figure: inputs I1 and I2, with weights w1 and w2, feed a linear function that produces the output o]
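For the linear function with two inputs, the weight update can be sketched as follows. The squared error E = 1/2 (o - t)^2, the learning rate name `epsilon`, and all numeric values are assumptions made for illustration; the slides' own formulas did not survive in this transcript.

```python
def train_linear(I, target, w, epsilon=0.1, steps=20):
    """Gradient descent for o = sum_i w_i * I_i with E = 0.5 * (o - target)^2."""
    errors = []
    for _ in range(steps):
        o = sum(wi * Ii for wi, Ii in zip(w, I))
        errors.append(0.5 * (o - target) ** 2)
        # dE/dw_i = (o - target) * I_i; move each weight against its gradient
        w = [wi - epsilon * (o - target) * Ii for wi, Ii in zip(w, I)]
    return w, errors

# Hypothetical inputs, target and starting weights
w, errors = train_linear(I=[1.0, 0.5], target=1.0, w=[0.2, -0.3])
print(w, errors[0], errors[-1])
```

Each weight changes in proportion to its own input value, so an input of zero leaves the corresponding weight untouched.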
What about the hidden layer?
Hidden to output layer
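The equations for this step did not survive in the transcript. A sketch of the standard derivation, assuming a squared error $E = \tfrac{1}{2}(O - t)^2$ with target $t$, a sigmoid activation $g$, a single output neuron (so the output index on $w$ is dropped), and a learning rate written here as $\varepsilon$ (an assumed symbol):

```latex
\begin{align}
o &= \sum_j w_j H_j, \qquad O = g(o), \qquad g(x) = \frac{1}{1 + e^{-x}}, \qquad g'(x) = g(x)\,\bigl(1 - g(x)\bigr) \\
\frac{\partial E}{\partial w_j}
  &= \frac{\partial E}{\partial O}\,\frac{\partial O}{\partial o}\,\frac{\partial o}{\partial w_j}
   = (O - t)\; g'(o)\; H_j \\
\Delta w_j &= -\varepsilon\,\frac{\partial E}{\partial w_j}
   = -\varepsilon\,(O - t)\; O\,(1 - O)\; H_j
\end{align}
```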
Input to hidden layer
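The equations for this step are also missing from the transcript. A sketch of the standard chain-rule derivation, under the same assumptions as for the output layer (squared error $E = \tfrac{1}{2}(O - t)^2$, sigmoid $g$, one output neuron, assumed learning-rate symbol $\varepsilon$), and assuming that $v_{jk}$ connects input $k$ to hidden neuron $j$:

```latex
\begin{align}
h_j &= \sum_k v_{jk} I_k, \qquad H_j = g(h_j), \qquad o = \sum_j w_j H_j, \qquad O = g(o) \\
\frac{\partial E}{\partial v_{jk}}
  &= \frac{\partial E}{\partial O}\,\frac{\partial O}{\partial o}\,
     \frac{\partial o}{\partial H_j}\,\frac{\partial H_j}{\partial h_j}\,
     \frac{\partial h_j}{\partial v_{jk}}
   = (O - t)\; g'(o)\; w_j\; g'(h_j)\; I_k \\
\Delta v_{jk} &= -\varepsilon\,\frac{\partial E}{\partial v_{jk}}
   = -\varepsilon\,(O - t)\; O\,(1 - O)\; w_j\; H_j\,(1 - H_j)\; I_k
\end{align}
```

The factor $(O - t)\,g'(o)$ is shared with the hidden-to-output update, which is what lets the error be propagated backwards layer by layer.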
Summary
Or, with the neuron values stored per layer in an array X:
Ik=X[0][k]
Hj=X[1][j]
Oi=X[2][i]
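Can you do it yourself?
[Figure: network with inputs I1=1, I2=1; input-to-hidden weights v11=1, v12=1, v21=-1, v22=1; hidden neurons (h1, H1) and (h2, H2); hidden-to-output weights w1=-1, w2=1; output neuron (o, O)]
What is the output (O) from the network?
What are the wij and vjk values if the target value is 0 and the learning rate is 0.5?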
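One way to check a hand calculation for this exercise is a small script. It is a sketch under stated assumptions: sigmoid activations on the hidden and output neurons, a squared error E = 1/2 (O - target)^2, and the index convention that v[j][k] connects input k to hidden neuron j. None of these are confirmed by the transcript, and the function names are illustrative.

```python
import math

def g(x):
    """Sigmoid activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(I, v, w):
    """Return hidden net inputs h, hidden outputs H, net output o, output O."""
    h = [sum(v[j][k] * I[k] for k in range(len(I))) for j in range(len(v))]
    H = [g(x) for x in h]
    o = sum(w[j] * H[j] for j in range(len(w)))
    return h, H, o, g(o)

def backprop_step(I, v, w, target, epsilon):
    """One gradient descent update of all weights for a single example."""
    h, H, o, O = forward(I, v, w)
    delta = (O - target) * O * (1.0 - O)  # dE/do for the sigmoid output
    new_w = [w[j] - epsilon * delta * H[j] for j in range(len(w))]
    new_v = [[v[j][k] - epsilon * delta * w[j] * H[j] * (1.0 - H[j]) * I[k]
              for k in range(len(I))]
             for j in range(len(v))]
    return new_v, new_w

I = [1.0, 1.0]
v = [[1.0, 1.0],    # v11, v12
     [-1.0, 1.0]]   # v21, v22
w = [-1.0, 1.0]     # w1, w2
target, epsilon = 0.0, 0.5

O_before = forward(I, v, w)[3]
v_new, w_new = backprop_step(I, v, w, target, epsilon)
O_after = forward(I, v_new, w_new)[3]

E_before = 0.5 * (O_before - target) ** 2
E_after = 0.5 * (O_after - target) ** 2
print(O_before, O_after, E_before, E_after)
```

Both updates use the old weights: `new_v` is computed from the original `w`, not from `new_w`, as the gradient is taken at the current point.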
Can you do it yourself (learning rate 0.5)? Has the error decreased?
Before:
I1=1 I2=1
v11=1 v12=1 v21=-1 v22=1
w1=-1 w2=1
h1=___ H1=___
h2=___ H2=___
o=___ O=___
After:
I1=1 I2=1
v11=___ v12=___ v21=___ v22=___
w1=___ w2=___
h1=___ H1=___
h2=___ H2=___
o=___ O=___
Sequence encoding
• Change in weight is linearly dependent on input value
• “True” sparse encoding is therefore highly inefficient
• Sparse is most often encoded as
– +1/-1 or 0.9/0.05
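A sparse encoding with the 0.9/0.05 values mentioned above could look like this. It is a sketch: the alphabet ordering, the function name `sparse_encode`, and the example peptide are illustrative assumptions.

```python
# The 20 standard amino acids; the ordering is an arbitrary choice
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(peptide, on=0.9, off=0.05):
    """Encode each residue as 20 values: `on` for its own letter, `off` otherwise."""
    encoded = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid: {residue!r}")
        encoded.extend(on if aa == residue else off for aa in AMINO_ACIDS)
    return encoded

x = sparse_encode("FMIDWILDA")  # example 9mer peptide
print(len(x))
```

Because no input is exactly zero, every weight receives a nonzero update for every training example, which is the motivation for avoiding a strict 0/1 encoding.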
Training and error reduction
Size matters
• A network contains a very large set of parameters
– A network with 5 hidden neurons predicting binding for 9meric peptides has more than 9x20x5=900 weights
• Over-fitting is a problem
• Stop training when test performance is optimal
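The "more than 900" figure can be broken down as below. This is a sketch that assumes a single output neuron and one bias weight per hidden and output neuron; the slide only states the input-to-hidden product.

```python
def count_weights(n_positions, alphabet_size, n_hidden, n_outputs=1, bias=True):
    """Return (input-to-hidden, hidden-to-output, bias) weight counts."""
    n_inputs = n_positions * alphabet_size
    input_to_hidden = n_inputs * n_hidden
    hidden_to_output = n_hidden * n_outputs
    biases = (n_hidden + n_outputs) if bias else 0  # assumed bias weights
    return input_to_hidden, hidden_to_output, biases

parts = count_weights(n_positions=9, alphabet_size=20, n_hidden=5)
total = sum(parts)
print(parts, total)
```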
Neural network training
[Figure: temperature plotted against years]
What is going on?
[Figure: temperature plotted against years]
Examples
Train on 500 A0201 and 60 A0101 binding data
Evaluate on 1266 A0201 peptides
NH=1: PCC = 0.77
NH=5: PCC = 0.72
(NH: number of hidden neurons; PCC: Pearson correlation coefficient)
Neural network training. Cross validation
Cross validation
Train on 4/5 of data
Test on 1/5
=> Produce 5 different neural networks, each with a different prediction focus
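The 4/5 versus 1/5 partitioning can be sketched with plain index lists. The interleaved fold assignment and the data set size of 560 are illustrative assumptions; in practice the data are often shuffled or clustered by similarity first.

```python
def k_fold_splits(n_items, k=5):
    """Return k (train_indices, test_indices) pairs; each item is tested exactly once."""
    indices = list(range(n_items))
    folds = [indices[i::k] for i in range(k)]  # interleaved assignment to folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

splits = k_fold_splits(560, k=5)
for train, test in splits:
    print(len(train), len(test))
```

Each of the five splits would train its own network, which gives the five networks mentioned above.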
Neural network training curve
Maximum test set performance: most capable of generalizing
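Picking the stopping point from a training curve can be sketched as below. The error values are made up for illustration only and the function name is hypothetical; a real run would record the test error after each training epoch.

```python
def select_stopping_epoch(test_errors):
    """Return (epoch, error) where the test error is lowest."""
    best = min(range(len(test_errors)), key=lambda e: test_errors[e])
    return best, test_errors[best]

# Made-up curves: training error keeps falling, test error turns upward
train_errors = [0.30, 0.22, 0.17, 0.13, 0.10, 0.08, 0.06]
test_errors = [0.31, 0.25, 0.21, 0.20, 0.22, 0.25, 0.29]

epoch, error = select_stopping_epoch(test_errors)
print(epoch, error)
```

The weights saved at that epoch would be the ones kept, as later epochs only improve the fit to the training data.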
5 fold training
Which network to choose?
How many folds?
• Cross validation is always good, but how many folds?
– Few folds -> small training data sets
– Many folds -> small test data sets
• Example from Tuesday's exercise
– 560 peptides for training
• 50 fold (roughly 10 peptides per test set, few data to stop training)
• 2 fold (280 peptides per test set, few data to train)
• 5 fold (roughly 110 peptides per test set, 450 per training set)
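The trade-off can be made concrete by computing the average set sizes for 560 peptides. This is a small sketch; the function name `fold_sizes` is illustrative.

```python
def fold_sizes(n_items, k):
    """Average (test set size, training set size) for k-fold cross validation."""
    test = n_items / k
    return test, n_items - test

sizes = {k: fold_sizes(560, k) for k in (50, 2, 5)}
for k, (test, train) in sizes.items():
    print(f"{k:2d} fold: {test:6.1f} per test set, {train:6.1f} per training set")
```

The exact averages are slightly different from the rounded numbers on the slide.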
Problems with 5 fold cross validation
• Use test set to stop training, and test set performance to evaluate training
– Over-fitting?
• If test set is small: yes
• If test set is large: no
• Confirm using “true” 5 fold cross validation
– 1/5 for evaluation
– 4/5 for 4 fold cross-validation
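The "true" scheme can be sketched as a nested loop over folds: the evaluation fold is never used for training or for stopping. The interleaved fold assignment, the data set size, and the function name are illustrative assumptions.

```python
def nested_cross_validation(n_items, k=5):
    """Return a list of (evaluation_indices, inner_splits) pairs.

    Each inner split is (train_indices, stop_indices); the stop set decides
    when to stop training, the evaluation set only measures performance.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    plan = []
    for e in range(k):
        inner = []
        for s in range(k):
            if s == e:
                continue
            train = [idx for j in range(k) if j not in (e, s) for idx in folds[j]]
            inner.append((train, folds[s]))
        plan.append((folds[e], inner))
    return plan

plan = nested_cross_validation(560)
evaluation, inner = plan[0]
print(len(evaluation), [(len(train), len(stop)) for train, stop in inner])
```

For each evaluation fold, the four inner networks can be combined as an ensemble when predicting the evaluation data.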
Conventional 5 fold cross validation
“True” 5 fold cross validation
When to be careful
• When data is scarce, the difference between the performance obtained using “conventional” versus “true” cross validation can be very large
• When data is abundant the difference is small, and the “true” cross validation performance might even be higher than the “conventional” one due to the ensemble aspect of the “true” cross validation approach
Do hidden neurons matter?
• The environment matters
NetMHCpan
Context matters
• FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
• FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
• DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
• DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101
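The idea that the same peptide gets a different input depending on the MHC molecule can be sketched by encoding the peptide together with the MHC sequence. This is an illustration only: the 0.9/0.05 sparse encoding, the alphabet ordering, and the function names are assumptions, and this is not presented as the actual NetMHCpan input scheme.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an arbitrary choice

def encode(sequence, on=0.9, off=0.05):
    """Sparse-encode a sequence with 20 values per position."""
    return [on if aa == residue else off
            for residue in sequence for aa in AMINO_ACIDS]

def pan_specific_input(peptide, mhc_sequence):
    """Concatenate peptide and MHC sequence into one input vector."""
    return encode(peptide + mhc_sequence)

peptide = "FMIDWILDA"
a0201 = "YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY"
a0101 = "YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY"

x_a0201 = pan_specific_input(peptide, a0201)
x_a0101 = pan_specific_input(peptide, a0101)
print(len(x_a0201), x_a0201 == x_a0101)
```

The peptide part of the two vectors is identical; only the MHC part differs, which is what allows one network to give the peptide a high value for A0201 and a low value for A0101.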
Example
Summary
• Gradient descent is used to determine the updates for the synapses in the neural network
• Some relatively simple math defines the gradients
– Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
– Hidden layers are a bit more complex, but still ok
• Always train networks using a test set to stop training
– Be careful when reporting predictive performance
• Use “true” cross-validation for small data sets
• And hidden neurons do matter (sometimes)
And some more stuff for the long cold winter nights
• Can it be made differently?
Predicting accuracy
• Can it be made differently?
Reliability
• Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system
Master thesis by Frederik Otzen Bagger
Making sense of ANN weights