A Project Report On
SIMULATING A FEED FORWARD ARTIFICIAL NEURAL
NETWORK IN C++
Submitted in partial fulfilment of the requirements
For the award of degree
Of
INTEGRATED DUAL DEGREE
In
COMPUTER SCIENCE AND ENGINEERING
(With Specialization in Information Technology)
Submitted by
Vaibhav Dhattarwal
CSE-IDD
Enrolment No: 08211018
Under the guidance of
DR. DURGA TOSHINWAL
Professor
ELECTRONICS AND COMPUTER ENGINEERING DEPARTMENT
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE-247667
OCTOBER 2012
Abstract
This report presents an overview of how a feed forward artificial neural network was
implemented in C++. An Artificial neural network is a system composed of many simple
processing elements operating in parallel whose function is determined by network structure,
connection strengths, and the processing performed at computing elements or nodes. A neural
network is a massively parallel distributed processor that has a natural inclination for storing
experiential knowledge and making it available for use. This report also provides a brief
overview of artificial neural networks and questions their practical applicability. This is
followed by a detailed explanation of the design and implementation of a three-layer feed
forward neural network using back propagation algorithm.
Table of Contents
Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Objective of the Project
Chapter 2 Artificial Neural Network
2.1 Neural Network Definition
2.2 Neural Network Applications
2.3 Neural Network Categorization
2.4 Types of Neural Network
Chapter 3 Design
3.1 The Back Propagation Algorithm
3.2 Pseudo Code for One Layer
3.3 Pseudo Code for All Layers
Chapter 4 Implementation
4.1 Pseudo Code for training patterns
4.2 Pseudo Code for minimizing error
Chapter 5 Results
References
List of Figures
Figure Title
2.1 an Artificial Neural Network
3.1 the sigmoid curve
3.2 the design for calculating output activation
5.1 Output Screenshot
1 Introduction
An Artificial Neural Network (ANN), usually called neural network (NN), is a mathematical
model or computational model that is inspired by the structure and functional aspects of
biological neural networks. A neural network consists of an interconnected group of artificial
neurons, and it processes information using a connection based approach to computation. In
most cases an ANN is an adaptive system that changes its structure based on external or
internal information that flows through the network during the learning phase. Modern neural
networks are non-linear statistical data modelling tools. They are usually used to model
complex relationships between inputs and outputs or to find patterns in data.
A neural network is a set of connected input/output units in which each connection has an
associated weight. During the learning phase, the network learns by adjusting these weights so
as to predict the correct class labels of the input tuples. Neural networks have the remarkable
ability to derive meaning from complicated or imprecise data and can be used to extract
patterns and detect trends that are too complex to be noticed by either humans or other
computer techniques. They are well suited to continuous-valued inputs and outputs, are best
at identifying patterns or trends in data, and suit prediction or forecasting needs.
Neural networks can also be used to infer rules from the patterns they find in data. They are
useful in providing information on associations, classifications, clusters, and forecasting.
Using neural networks as a tool, data warehousing firms can harvest information from datasets
in the data mining process. Neural networks can be trained to store, recognize, and
associatively retrieve patterns or database entries; to solve combinatorial optimization
problems; to filter noise from measurement data; and to control ill-defined problems. In
summary, they estimate sampled functions when the form of the functions is unknown. These two
abilities, pattern recognition and function estimation, make neural networks a widely used
tool in data mining. As model-free estimators with this dual nature, neural networks serve
data mining in a variety of ways.
Neural networks, depending on the architecture, provide associations, classifications, clusters,
prediction and forecasting to the data mining industry. A neural network essentially comprises
three pieces: the architecture or model, the learning algorithm, and the activation functions.
With neural networks, valuable information can be mined from large volumes of historical data
and used efficiently in financial applications. Hence, the application of neural networks to
financial forecasting has become very popular.
1.1 Objective of the Project
This report introduces Artificial Neural Networks and describes their main categories. The
objective of this project is to implement a Feed Forward Artificial Neural Network in C++
using the back propagation algorithm. The design of the simulation is discussed first,
followed by an explanation of the implementation of the network. The results produced by the
program are also included in this report.
2 Artificial Neural Network
Figure 2.1 an Artificial Neural Network
First of all, when we are talking about a neural network, we should more properly say
"artificial neural network" (ANN), because that is what we mean most of the time in this
project. Biological neural networks are much more complicated than the mathematical
models we use for ANNs. But it is customary to be lazy and drop the "A" or the "artificial".
2.1 Neural Network Definition
There is no universally accepted definition of an NN. But perhaps most people in the field
would agree that an NN is a network of many simple processors ("units"), each possibly
having a small amount of local memory. The units are connected by communication channels
("connections") which usually carry numeric data, encoded by any of various means. The
units operate only on their local data and on the inputs they receive via the connections. The
restriction to local operations is often relaxed during training.
Some NNs are models of biological neural networks and some are not, but historically, much
of the inspiration for the field of NNs came from the desire to produce artificial systems
capable of sophisticated, perhaps "intelligent", computations similar to those that the human
brain routinely performs, and thereby possibly to enhance our understanding of the human
brain.
Most NNs have some sort of "training" rule whereby the weights of connections are adjusted
on the basis of data. In other words, NNs "learn" from examples, as children learn to
distinguish dogs from cats based on examples of dogs and cats. If trained carefully, NNs may
exhibit some capability for generalization beyond the training data, that is, to produce
approximately correct results for new cases that were not used for training.
NNs normally have great potential for parallelism, since the computations of the components
are largely independent of each other. Some people regard massive parallelism and high
connectivity to be defining characteristics of NNs, but such requirements rule out various
simple models, such as simple linear regression (a minimal feed forward net with only two
units plus bias), which are usefully regarded as special cases of NNs.
Some popular descriptive definitions of Neural Networks
A neural network is a system composed of many simple processing elements
operating in parallel whose function is determined by network structure, connection
strengths, and the processing performed at computing elements or nodes. A neural
network is a massively parallel distributed processor that has a natural propensity for
storing experiential knowledge and making it available for use. It resembles the brain
in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the
knowledge.
A neural network is a circuit composed of a very large number of simple processing
elements that are neural based. Each element operates only on local information.
Furthermore each element operates asynchronously; thus there is no overall system
clock.
Artificial neural systems, or neural networks, are physical cellular systems which can
acquire, store, and utilize experiential knowledge.
2.2 Neural Network Applications
Practical applications of NNs most often employ supervised learning. For supervised
learning, you must provide training data that includes both the input and the desired result
(the target value). After successful training, you can present input data alone to the NN (that
is, input data without the desired result), and the NN will compute an output value that
approximates the desired result. However, for training to be successful, you may need lots of
training data and lots of computer time to do the training. In many applications, such as
image and text processing, you will have to do a lot of work to select appropriate input data
and to code the data as numeric values.
In practice, NNs are especially useful for classification and function approximation/mapping
problems which are tolerant of some imprecision, which have lots of training data available,
but to which hard and fast rules (such as those that might be used in an expert system) cannot
easily be applied. Almost any finite-dimensional vector function on a compact set can be
approximated to arbitrary precision by feed forward NNs (which are the type most often used
in practical applications) if you have enough data and enough computing resources.
In principle, NNs can compute any computable function, i.e., they can do everything a
normal digital computer can do, or perhaps even more, under some assumptions of doubtful
practicality.
Neural Networks are interesting for quite a lot of very different people:
Computer scientists want to find out about the properties of non-symbolic information
processing with neural nets and about learning systems in general.
Statisticians use neural nets as flexible, nonlinear regression and classification
models.
Engineers of many kinds exploit the capabilities of neural networks in many areas,
such as signal processing and automatic control.
Cognitive scientists view neural networks as a possible apparatus to describe models
of thinking and consciousness (High-level brain function).
Neurophysiologists use neural networks to describe and explore medium-level brain
function (e.g. memory, sensory system, and motorics).
Physicists use neural networks to model phenomena in statistical mechanics and for a
lot of other tasks.
Biologists use Neural Networks to interpret nucleotide sequences.
Philosophers and some other people may also be interested in Neural Networks for
various reasons.
2.3 Neural Network Categorization
There are many kinds of NNs by now. Nobody knows exactly how many. New ones (or at
least variations of old ones) are invented every week. Below is a collection of some of the
most well known methods:
The two main kinds of learning algorithms are supervised and unsupervised.
In supervised learning, the correct results (target values, desired outputs) are known
and are given to the NN during training so that the NN can adjust its weights to try
matching its outputs to the target values. After training, the NN is tested by giving it
only input values, not target values, and seeing how close it comes to outputting the
correct target values.
In unsupervised learning, the NN is not provided with the correct results during
training. Unsupervised NNs usually perform some kind of data compression, such as
dimensionality reduction or clustering.
The distinction between supervised and unsupervised methods is not always clear-cut. An
unsupervised method can learn a summary of a probability distribution, then that summarized
distribution can be used to make predictions. Furthermore, supervised methods come in two
sub varieties: auto-associative and hetero-associative. In auto-associative learning, the target
values are the same as the inputs, whereas in hetero-associative learning, the targets are
generally different from the inputs. Many unsupervised methods are equivalent to auto-
associative supervised methods.
Two major kinds of network topology are feed forward and feedback.
In a feed forward NN, the connections between units do not form cycles. Feed
forward NNs usually produce a response to an input quickly. Most Feed forward NNs
can be trained using a wide variety of efficient conventional numerical methods in
addition to algorithms invented by NN researchers.
In a feedback or recurrent NN, there are cycles in the connections. In some
feedback NNs, each time an input is presented, the NN must iterate for a potentially
long time before it produces a response. Feedback NNs are usually more difficult to
train than Feed forward NNs.
Some kinds of NNs can be implemented as either Feed forward or feedback networks.
NNs also differ in the kinds of data they accept. Two major kinds of data are categorical and
quantitative.
Categorical variables take only a finite (technically, countable) number of possible
values, and there are usually several or more cases falling into each category.
Categorical variables may have symbolic values (e.g., "male" and "female", or "red",
"green" and "blue") that must be encoded into numbers before being given to the
network. Both supervised learning with categorical target values and unsupervised
learning with categorical outputs are called "classification."
Quantitative variables are numerical measurements of some attribute, such as length
in meters. The measurements must be made in such a way that at least some
arithmetic relations among the measurements reflect analogous relations among the
attributes of the objects that are measured. Supervised learning with quantitative
target values is called "regression."
Some variables can be treated as either categorical or quantitative, such as number of children
or any binary variable. Most regression algorithms can also be used for supervised
classification by encoding categorical target values as 0/1 binary variables and using those
binary variables as target values for the regression algorithm. The outputs of the network are
posterior probabilities when any of the most common training methods are used.
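The 0/1 encoding of categorical target values described above could be sketched in C++ as
follows. This is only an illustration under stated assumptions: the function name oneHot and
the convention that categories are numbered from zero are not taken from the project.

```cpp
#include <cstddef>
#include <vector>

// Encode a category index as a vector of 0/1 target values (one entry per category).
// oneHot and its parameter names are hypothetical, chosen for this sketch.
std::vector<double> oneHot(std::size_t category, std::size_t numCategories) {
    std::vector<double> target(numCategories, 0.0);  // every output starts at 0
    if (category < numCategories) {
        target[category] = 1.0;                      // mark the active category
    }
    return target;
}
```

For three colours numbered red = 0, green = 1 and blue = 2, oneHot(1, 3) yields the target
vector (0, 1, 0) for "green".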
2.4 Types of Neural Network
Here are some well-known kinds of Neural Networks:
A. Supervised
1. Feed forward
Linear
Hebbian
Perceptron
Adaline
Higher Order
Functional Link
MLP: Multilayer perceptron
Backprop
Cascade Correlation
Quickprop
RPROP
RBF networks
OLS: Orthogonal Least Squares
CMAC: Cerebellar Model Articulation Controller
Classification only
LVQ: Learning Vector Quantization
PNN: Probabilistic Neural Network
Regression only
GRNN: General Regression Neural Network
2. Feedback
BAM: Bidirectional Associative Memory
Boltzmann Machine
Recurrent time series
Back propagation through time
Elman
FIR: Finite Impulse Response
Jordan
Real-time recurrent network
Recurrent back propagation
TDNN: Time Delay NN
3. Competitive
ARTMAP
Fuzzy ARTMAP
Gaussian ARTMAP
Counter propagation
Neocognitron
B. Unsupervised
1. Competitive
Vector Quantization
Grossberg
Kohonen
Conscience
Self-Organizing Map
Kohonen
GTM:
Local Linear
Adaptive resonance theory
ART 1
ART 2
ART 2-A
ART 3
Fuzzy ART
DCL: Differential Competitive Learning
2. Dimension Reduction
Hebbian
Oja
Sanger
Differential Hebbian
3. Auto association
Linear autoassociator
BSB: Brain State in a Box
Hopfield
3 Design
The simplified process for training a Feed Forward Neural Network is as follows:
1. Input data is presented to the network and propagated through the network until it
reaches the output layer. This forward process produces a predicted output.
2. The predicted output is subtracted from the actual (target) output and an error value for
the network is calculated.
3. The neural network then uses supervised learning, which in most cases is back
propagation, to train the network. Back propagation is a learning algorithm for
adjusting the weights. It starts with the weights between the processing elements (PEs)
of the output layer and those of the last hidden layer, and works backwards through
the network.
4. Once back propagation has finished, the forward process starts again, and this cycle
is continued until the error between predicted and actual outputs is minimized.
3.1 The Back Propagation Algorithm
Back propagation, or backward propagation of errors, is a common method of training artificial
neural networks so as to minimize an objective function. The algorithm performs learning on
layered feed forward ANNs: the artificial neurons are organized in layers and send their
signals “forward”, and then the errors are propagated backwards. Back propagation uses
supervised learning, which means that we provide the algorithm with examples of the inputs and
the outputs we want the network to compute, and then the error (the difference between actual
and expected results) is calculated. The idea of the back propagation algorithm is to reduce
this error until the ANN learns the training data.
Algorithm for a 3-layer network:
1. Initialize the weights in the network
2. Do
   a. For each example E in the training set
      o O = Neural-net-output (network, E); forward pass
      o T = teacher output for E
      o Calculate error (T - O) at the output units
      o Compute ΔWho for all weights from hidden layer to output layer; backward pass
      o Compute ΔWih for all weights from input layer to hidden layer; backward pass continued
      o Update the weights in the network
3. Until all examples classified correctly or stopping criterion satisfied
4. Return the network
The Back Propagation learning algorithm can be divided into two phases:
Phase 1: Propagation
This phase involves the following steps:
1. Forward propagation of a training pattern's input through the neural network.
2. Backward propagation of the output activations through the neural network, using the
training pattern's target, to generate the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight-synapse the following steps are used:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Move the weight in the opposite direction of the gradient by subtracting a fraction of the
gradient (set by the learning rate) from the weight.
Repeat phase 1 and 2 until the performance of the network is satisfactory.
3.2 Pseudo Code for one Layer
A single neuron (i.e. processing element, PE) takes in total input PEinput and produces output
activation PEout. In this project, the activation function is the sigmoid function, so
PEout = Sigmoid(PEinput), although other activation functions (e.g. linear or hyperbolic
tangent) are often used. The sigmoid function is the special case of the logistic function
shown below and defined by the formula
$$S(t) = \frac{1}{1 + e^{-t}}$$
Figure 3.1 the sigmoid curve
This has the effect of squashing the infinite range of PEinput into the range 0 to 1. It also
has the convenient property that its derivative takes the particularly simple form
$$\frac{dS}{dt} = S(t)\,\bigl(1 - S(t)\bigr)$$
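The sigmoid and its derivative can be written as two small C++ functions. The names sigmoid
and sigmoidDerivativeFromOutput are assumptions for this sketch; the second function takes the
already computed output S rather than the input t, which is the form used later when
propagating errors backwards.

```cpp
#include <cmath>

// Logistic sigmoid: squashes any real input into the open interval (0, 1).
double sigmoid(double t) {
    return 1.0 / (1.0 + std::exp(-t));
}

// Derivative dS/dt expressed through the sigmoid's own output s = sigmoid(t).
double sigmoidDerivativeFromOutput(double s) {
    return s * (1.0 - s);
}
```

At t = 0 the sigmoid evaluates to 0.5, where the derivative reaches its maximum of 0.25.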
Typically, the input PEinput into a given neuron will be the weighted sum of output activations
feeding in from a number of other neurons. It is convenient to think of the activations flowing
through layers of neurons. So, if there are NumUnitLayer1 neurons in layer 1, the total
activation flowing into our layer 2 neuron is the sum over the product OutputLayer1[i]*Wt[i],
where Wt[i] is the strength/weight of the connection between PE[i] in layer 1 and our PE in
layer 2. Each neuron will also have a bias, or resting state, that is added to the sum of inputs,
and it is convenient to call this Wt[0]. We can then write
InputLayer2 = Wt[0];                              // start from the resting state (bias) weight
for (i = 1; i <= NumUnitLayer1; i++)              // loop over the layer 1 units
{
    InputLayer2 += OutputLayer1[i] * Wt[i];       // accumulate the weighted activation
}
OutputLayer2 = 1.0 / (1.0 + exp(-InputLayer2));   // sigmoid gives the activation output
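A self-contained C++ version of this single-neuron computation might look as follows. It is a
sketch under assumptions: the function name neuronOutput is hypothetical, and index 0 of the
activation vector is left unused so that the indices line up with the pseudo code, where Wt[0]
is the bias.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Output activation of one processing element (hypothetical helper).
// wt[0] is the bias weight; wt[i] for i >= 1 weights outputLayer1[i].
// outputLayer1[0] is unused so that indices match the pseudo code.
double neuronOutput(const std::vector<double>& outputLayer1,
                    const std::vector<double>& wt) {
    double input = wt.empty() ? 0.0 : wt[0];  // start from the bias
    for (std::size_t i = 1; i < outputLayer1.size() && i < wt.size(); ++i) {
        input += outputLayer1[i] * wt[i];     // weighted sum of incoming activations
    }
    return 1.0 / (1.0 + std::exp(-input));    // sigmoid activation
}
```

For activations (1.0, 2.0), weights (1.0, -0.75) and bias 0.5, the total input is
0.5 + 1.0 - 1.5 = 0, so the neuron outputs 0.5.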
Similarly layer 2 will have many processing elements as well, so it is appropriate to write the
weights between PE[i] in layer 1 and PE[j] in layer 2 as a two dimensional array Wt[i][j].
Thus to get the output of PE[j] in layer 2 we have
InputLayer2[j] = Wt[0][j];                              // bias weight of PE[j]
for (i = 1; i <= NumUnitLayer1; i++)
{
    InputLayer2[j] += OutputLayer1[i] * Wt[i][j];       // accumulate the weighted activation
}
OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j]));   // sigmoid gives the activation output
Now we know that Layer 2 has number of processing units given by NumUnitLayer2 and the
above code calculates the output for only one processing element PE[j]. However we require
the output for all the processing elements in Layer 2. Hence we introduce another loop to get
all the layer 2 outputs
for (j = 1; j <= NumUnitLayer2; j++)                        // loop over the layer 2 units
{
    InputLayer2[j] = Wt[0][j];                              // bias weight of PE[j]
    for (i = 1; i <= NumUnitLayer1; i++)
    {
        InputLayer2[j] += OutputLayer1[i] * Wt[i][j];       // accumulate the weighted activation
    }
    OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j]));   // sigmoid output of PE[j]
}
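The nested loop above could be packaged as a C++ function that returns all layer 2 outputs at
once. The names Matrix and layerOutput are assumptions for this sketch. As in the pseudo code,
row 0 of the weight matrix holds the bias weights, and index 0 of each activation vector (and
column 0 of the matrix) is unused.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias for a weight table

// Computes the outputs of every unit in layer 2 from the outputs of layer 1.
// wt[i][j] is the weight from unit i in layer 1 to unit j in layer 2; wt[0][j] is the bias.
// wt is assumed to have one row per entry of outputLayer1 and numUnitLayer2 + 1 columns.
std::vector<double> layerOutput(const std::vector<double>& outputLayer1,
                                const Matrix& wt, std::size_t numUnitLayer2) {
    std::vector<double> outputLayer2(numUnitLayer2 + 1, 0.0);
    for (std::size_t j = 1; j <= numUnitLayer2; ++j) {
        double input = wt[0][j];                              // bias of unit j
        for (std::size_t i = 1; i < outputLayer1.size(); ++i) {
            input += outputLayer1[i] * wt[i][j];              // weighted activation
        }
        outputLayer2[j] = 1.0 / (1.0 + std::exp(-input));     // sigmoid
    }
    return outputLayer2;
}
```

Returning a fresh vector keeps the function free of shared state, at the cost of one
allocation per call; a performance-sensitive version might reuse a buffer instead.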
3.3 Pseudo Code for all Layers
Now that we have calculated the output for all the processing elements in one layer, we can
write the code which calculates the output for all the layers in our network. A three-layer
network, i.e. one with a single hidden layer, is sufficient for many purposes, so our layer 2
outputs feed into a third layer in the same way as in the above cases. The feed forward neural
network chosen for this project has three layers 1, 2, 3, and the calculation of the output
for all three layers of the network is as follows:
for (j = 1; j <= NumUnitLayer2; j++)                        // computes Layer 2 outputs
{
    InputLayer2[j] = WtLayer1Layer2[0][j];                  // bias weight
    for (i = 1; i <= NumUnitLayer1; i++)
    {
        InputLayer2[j] += OutputLayer1[i] * WtLayer1Layer2[i][j];
    }
    OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j]));   // sigmoid output
}
for (k = 1; k <= NumUnitLayer3; k++)                        // computes Layer 3 outputs
{
    InputLayer3[k] = WtLayer2Layer3[0][k];                  // bias weight
    for (j = 1; j <= NumUnitLayer2; j++)
    {
        InputLayer3[k] += OutputLayer2[j] * WtLayer2Layer3[j][k];
    }
    OutputLayer3[k] = 1.0 / (1.0 + exp(-InputLayer3[k]));   // sigmoid output
}
To avoid confusion in the pseudo code there is a different index for each layer: i, j, k
for Layers 1, 2, 3 respectively. The weight arrays are also named differently to distinguish
between the layers: WtLayer1Layer2 and WtLayer2Layer3. For three layer networks, it is
traditional to call layer 1 the Input layer, layer 2 the Hidden layer, and layer 3 the Output
layer. The neural network in this project has a design similar to the figure shown below.
Figure 3.2 the design for calculating output activation
Now we can denote the layers 1, 2, 3 as input layer, hidden layer, and output layer
respectively. The weights for the connections have also been denoted appropriately. As
shown in the above figure, the initial bias weights are also included in the input for each layer
and consequently the output also.
for (j = 1; j <= NumUnitHidden; j++)                        // computes Hidden Layer PE outputs
{
    InputHidden[j] = WtInputHidden[0][j];                   // bias weight
    for (i = 1; i <= NumUnitInput; i++)
    {
        InputHidden[j] += OutputInput[i] * WtInputHidden[i][j];
    }
    OutputHidden[j] = 1.0 / (1.0 + exp(-InputHidden[j]));   // sigmoid output
}
for (k = 1; k <= NumUnitOutput; k++)                        // computes Output Layer PE outputs
{
    InputOutput[k] = WtHiddenOutput[0][k];                  // bias weight
    for (j = 1; j <= NumUnitHidden; j++)
    {
        InputOutput[k] += OutputHidden[j] * WtHiddenOutput[j][k];
    }
    Output[k] = 1.0 / (1.0 + exp(-InputOutput[k]));         // sigmoid output
}
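The complete forward pass through the input, hidden and output layers could be sketched in
C++ as below. The names Matrix, ThreeLayerNet, feedLayer and forward are assumptions for this
example, not the project's identifiers. Row 0 of each weight matrix holds the bias weights,
and index 0 of every activation vector is unused, mirroring the 1-based pseudo code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias for a weight table

// Hypothetical container for the two weight tables of a three-layer network.
struct ThreeLayerNet {
    Matrix wtInputHidden;   // (NumUnitInput + 1) rows, (NumUnitHidden + 1) columns
    Matrix wtHiddenOutput;  // (NumUnitHidden + 1) rows, (NumUnitOutput + 1) columns
};

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Feeds one layer's activations through a weight table (assumed non-empty).
static std::vector<double> feedLayer(const std::vector<double>& in, const Matrix& wt) {
    const std::size_t numOut = wt[0].size() - 1;     // column 0 is unused
    std::vector<double> out(numOut + 1, 0.0);
    for (std::size_t j = 1; j <= numOut; ++j) {
        double sum = wt[0][j];                       // bias weight
        for (std::size_t i = 1; i < in.size(); ++i) {
            sum += in[i] * wt[i][j];
        }
        out[j] = sigmoid(sum);
    }
    return out;
}

// Input layer -> hidden layer -> output layer.
std::vector<double> forward(const ThreeLayerNet& net, const std::vector<double>& input) {
    const std::vector<double> hidden = feedLayer(input, net.wtInputHidden);
    return feedLayer(hidden, net.wtHiddenOutput);
}
```

Because both layers use the same helper, adding a further hidden layer would only require one
more weight table and one more feedLayer call.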
4 Implementation
4.1 Pseudo Code for training patterns
In this project, there is a whole set of training patterns, NumExamples in number, i.e. pairs
of input and target output vectors,
Input[E][i] , Target[E][k]
labelled by the index E. The network learns by minimizing some measure of the error of the
network's actual outputs compared with the target outputs. The sum squared error over all the
output units, indexed by k, and all training patterns, indexed by E, is given by
Error = 0.0;
for (E = 1; E <= NumExamples; E++)          // loop over the training patterns
{
    for (k = 1; k <= NumUnitOutput; k++)    // loop over the output units
    {
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]);
    }
}
The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning
algorithm. If we insert the above code for computing the network outputs into the E loop of
this, we end up with
Error = 0.0;
for (E = 1; E <= NumExamples; E++)
{   // computes for all training patterns (E)
    for (j = 1; j <= NumUnitHidden; j++)
    {   // hidden layer outputs
        InputHidden[E][j] = WtInputHidden[0][j];
        for (i = 1; i <= NumUnitInput; i++)
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInputHidden[i][j];
        }
        OutputHidden[E][j] = 1.0 / (1.0 + exp(-InputHidden[E][j]));
    }
    for (k = 1; k <= NumUnitOutput; k++)
    {   // output layer outputs and error contribution
        InputOutput[E][k] = WtHiddenOutput[0][k];
        for (j = 1; j <= NumUnitHidden; j++)
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHiddenOutput[j][k];
        }
        Output[E][k] = 1.0 / (1.0 + exp(-InputOutput[E][k]));
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]);
    }
}
4.2 Pseudo Code for minimizing error
The next stage of the project involves iteratively adjusting the weights to minimize the network's error. The method adopted in this project is by 'gradient descent' on the error function. We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWt) and shift the weights by a small amount in the direction that reduces the error. As stated before, we use the back-propagation algorithm. After the calculation of the above sum squared error, we can compute and apply one iteration (or 'epoch') of the required weight changes ΔWho and ΔWih using
Error = 0.0;
for (E = 1; E <= NumExamples; E++)
{   // computes for all training patterns (E)
    for (j = 1; j <= NumUnitHidden; j++)
    {   // forward pass: hidden layer
        InputHidden[E][j] = WtInputHidden[0][j];
        for (i = 1; i <= NumUnitInput; i++)
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInputHidden[i][j];
        }
        OutputHidden[E][j] = 1.0 / (1.0 + exp(-InputHidden[E][j]));
    }
    for (k = 1; k <= NumUnitOutput; k++)
    {   // forward pass: output layer, error and output deltas
        InputOutput[E][k] = WtHiddenOutput[0][k];
        for (j = 1; j <= NumUnitHidden; j++)
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHiddenOutput[j][k];
        }
        Output[E][k] = 1.0 / (1.0 + exp(-InputOutput[E][k]));
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]);
        // output error times the derivative of the sigmoid
        ΔOutput[k] = (Target[E][k] - Output[E][k]) * Output[E][k] * (1.0 - Output[E][k]);
    }
    for (j = 1; j <= NumUnitHidden; j++)
    {   // back propagation of error to hidden layer
        SumΔOutput[j] = 0.0;
        for (k = 1; k <= NumUnitOutput; k++)
        {
            SumΔOutput[j] += WtHiddenOutput[j][k] * ΔOutput[k];
        }
        // propagated error times the derivative of the sigmoid
        ΔH[j] = SumΔOutput[j] * OutputHidden[E][j] * (1.0 - OutputHidden[E][j]);
    }
    for (j = 1; j <= NumUnitHidden; j++)
    {   // this loop updates the input-to-hidden weights
        ΔWih[0][j] = β * ΔH[j] + α * ΔWih[0][j];
        WtInputHidden[0][j] += ΔWih[0][j];
        for (i = 1; i <= NumUnitInput; i++)
        {
            ΔWih[i][j] = β * OutputInput[E][i] * ΔH[j] + α * ΔWih[i][j];
            WtInputHidden[i][j] += ΔWih[i][j];
        }
    }
    for (k = 1; k <= NumUnitOutput; k++)
    {   // this loop updates the hidden-to-output weights
        ΔWho[0][k] = β * ΔOutput[k] + α * ΔWho[0][k];
        WtHiddenOutput[0][k] += ΔWho[0][k];
        for (j = 1; j <= NumUnitHidden; j++)
        {
            ΔWho[j][k] = β * OutputHidden[E][j] * ΔOutput[k] + α * ΔWho[j][k];
            WtHiddenOutput[j][k] += ΔWho[j][k];
        }
    }
}
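The per-pattern pass above can be condensed into one C++ function, shown here as a sketch
under stated assumptions rather than as the project's actual source. The names Matrix, Net and
trainPattern are hypothetical, the momentum term is left out (equivalent to α = 0) to keep the
example short, and the vectors are assumed to be non-empty with index 0 unused.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias for a weight table

struct Net {                 // hypothetical container; row 0 of each table holds biases
    Matrix wtInputHidden;    // (NumUnitInput + 1) x (NumUnitHidden + 1)
    Matrix wtHiddenOutput;   // (NumUnitHidden + 1) x (NumUnitOutput + 1)
};

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One forward and backward pass for a single pattern. Updates the weights in place
// and returns the pattern's error as measured before the update.
double trainPattern(Net& net, const std::vector<double>& input,
                    const std::vector<double>& target, double beta) {
    const std::size_t nIn = input.size() - 1;
    const std::size_t nHid = net.wtHiddenOutput.size() - 1;
    const std::size_t nOut = target.size() - 1;
    std::vector<double> hidden(nHid + 1, 0.0), output(nOut + 1, 0.0);
    std::vector<double> deltaOutput(nOut + 1, 0.0), deltaHidden(nHid + 1, 0.0);
    double error = 0.0;

    for (std::size_t j = 1; j <= nHid; ++j) {            // forward pass: hidden layer
        double sum = net.wtInputHidden[0][j];
        for (std::size_t i = 1; i <= nIn; ++i) sum += input[i] * net.wtInputHidden[i][j];
        hidden[j] = sigmoid(sum);
    }
    for (std::size_t k = 1; k <= nOut; ++k) {            // forward pass: output layer
        double sum = net.wtHiddenOutput[0][k];
        for (std::size_t j = 1; j <= nHid; ++j) sum += hidden[j] * net.wtHiddenOutput[j][k];
        output[k] = sigmoid(sum);
        const double diff = target[k] - output[k];
        error += 0.5 * diff * diff;
        deltaOutput[k] = diff * output[k] * (1.0 - output[k]);
    }
    for (std::size_t j = 1; j <= nHid; ++j) {            // propagate error to hidden layer
        double sum = 0.0;
        for (std::size_t k = 1; k <= nOut; ++k) sum += net.wtHiddenOutput[j][k] * deltaOutput[k];
        deltaHidden[j] = sum * hidden[j] * (1.0 - hidden[j]);
    }
    for (std::size_t j = 1; j <= nHid; ++j) {            // update input-to-hidden weights
        net.wtInputHidden[0][j] += beta * deltaHidden[j];
        for (std::size_t i = 1; i <= nIn; ++i)
            net.wtInputHidden[i][j] += beta * input[i] * deltaHidden[j];
    }
    for (std::size_t k = 1; k <= nOut; ++k) {            // update hidden-to-output weights
        net.wtHiddenOutput[0][k] += beta * deltaOutput[k];
        for (std::size_t j = 1; j <= nHid; ++j)
            net.wtHiddenOutput[j][k] += beta * hidden[j] * deltaOutput[k];
    }
    return error;
}
```

Calling trainPattern repeatedly on the same pattern with a modest learning rate should drive
the returned error downwards, which is a quick sanity check for the sign conventions.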
The weight changes ΔWih and ΔWho are each made up of two components. The first, scaled by the
learning rate β, is the gradient descent contribution. The second, scaled by α, is a
'momentum' term which effectively keeps a moving average of the gradient descent weight
change contributions, and thus smooths out the overall weight changes.
The complete training process will consist of repeating the above weight updates for a
number of epochs until some error criterion is met.
5 Results
Figure 5.1 Output Screenshot
The program based on the design discussed in the previous section was executed
successfully. The pseudo code was successfully implemented and the three layered feed
forward neural network was simulated on the basis of the back propagation algorithm.
6 References
[1] Pinkus, A. (1999), "Approximation theory of the MLP model in neural networks," Acta Numerica, 8, 143-196.
[2] Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan.
[3] Nigrin, A. (1993), Neural Networks for Pattern Recognition, Cambridge, MA: The MIT Press.
[4] Zurada, J.M. (1992), Introduction To Artificial Neural Systems, Boston: PWS Publishing Company.
[5] Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.
[6] Cichocki, A. and Unbehauen, R. (1993), Neural Networks for Optimization and Signal Processing, NY: John Wiley & Sons, ISBN 0-471-93010-5.
[7] Diamantaras, K.I. and Kung, S.Y. (1996), Principal Component Neural Networks: Theory and Applications, NY: Wiley.
[8] Fausett, L. (1994), Fundamentals of Neural Networks, Englewood Cliffs, NJ: Prentice Hall.
[9] Kosko, B. (1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice Hall.
[10] Masters, T. (1993), Practical Neural Network Recipes in C++, San Diego: Academic Press.
[11] Masters, T. (1995), Advanced Algorithms for Neural Networks: A C++ Sourcebook, NY: John Wiley & Sons, ISBN 0-471-10588-0.
[12] Oja, E. (1989), "Neural networks, principal components, and subspaces," International Journal of Neural Systems, 1, 61-68.
[13] Pao, Y.H. (1989), Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley Publishing Company, ISBN 0-201-12584-6.
[14] Reed, R.D. and Marks, R.J., II (1999), Neural Smithing: Supervised Learning in Feed forward Artificial Neural Networks, Cambridge, MA: The MIT Press, ISBN 0-262-18190-8.
[15] Sanger, T.D. (1989), "Optimal unsupervised learning in a single-layer linear Feed forward neural network," Neural Networks, 2, 459-473.