
A NEW APPROACH FOR OPTIMIZING BACKPROPAGATION TRAINING WITH VARIABLE GAIN USING PSO

Magy M. Kandil*, Fayza A. Mohamed*, Fathy Saleh**, Magda Fayek**

* Atomic Energy Authority of Egypt, PO code 11787, Cairo
** Cairo University, Faculty of Engineering

Abstract
The issue of variable weight, learning rate, step size and bias in the backpropagation training algorithm has been widely investigated. Several techniques employing heuristic factors have been suggested to improve training time and reduce convergence to local minima. In this contribution, backpropagation training based on a variable gain of the log-sigmoid function is optimized using the Stretched Particle Swarm Optimization technique. The method is computationally efficient and possesses interesting convergence properties, utilizing estimates of the gain to reduce computational cost. The algorithm has been implemented and trained for real-time pattern recognition, and the results have been very satisfactory.

Keywords: Feedforward neural network, Particle Swarm Optimization algorithm, Stretching function, Levenberg-Marquardt, real-time pattern recognition.

1. Introduction
The efficient supervised training of feedforward neural networks (FNNs) is a subject of considerable ongoing research, and numerous algorithms have been proposed to this end. The backpropagation (BP) algorithm [1] is one of the most common supervised training methods. Although BP training has proved efficient in many applications, its convergence tends to be slow and often yields suboptimal solutions [2]. Attempts to speed up training and reduce convergence to local minima have been made in the context of gradient descent [3,4,5,6,7,8,9]. However, these methods are based on variable weight, learning rate, step size and bias to dynamically adapt the BP algorithm, and they use a constant gain for the sigmoid function throughout the training cycle. In this contribution, BP training with a variable gain of the log-sigmoid function is presented.

The new approach is based on a modification of the backpropagation training cycle that allows the gain of the log-sigmoid function to vary, as a consequence of minimizing the objective function (a function of the mean square error) and observing the trajectory in the weight space. Its convergence is guaranteed, utilizing estimates of the gain obtained with the Stretched Particle Swarm Optimization technique (SPSO). The new approach is trained and compared with the popular Levenberg-Marquardt backpropagation training method on application examples.

2. The Backpropagation Training
When a backpropagation network is cycled, the activations of the input units are propagated forward to the output layer through the connecting weights. The net input to a unit is determined by the weighted sum of its inputs. A scalar input p is transmitted through a connection that multiplies it by the scalar weight w, to form the product wp, again a scalar. The net input is calculated as:

n = Wp + b   (1)

The neuron output is calculated as:

a = f(Wp + b)   (2)

The actual output a, influenced by the bias, depends on the transfer function. Typically, after the transfer function is applied with constant gain, the parameters w and b are adjusted so that the neuron input/output relationship meets the specific goal. The log-sigmoid function is common in BP networks because it permits adaptation using classical optimization techniques. It takes the net input, constrains the neuron output a to some desired range, and then feeds this output forward to the next layer. The purpose of the function is to smooth the output so that differentiation can be performed to facilitate the learning process.

The log-sigmoid function calculates the output vector a of a layer of neurons given a network input vector p, the layer's weight matrix W and bias vector b, denoted by

a = logsig(λ(Wp + b))   (3)

This transfer function takes a net input of any value between plus and minus infinity, and squashes

GVIP 05 Conference, 19-21 December 2005, CICC, Cairo, Egypt


the output into the range 0 to 1, according to the expression:

F(a) = 1 / (1 + e^(−λ(Wp + b)))   (4)

where λ is the gain value.

The output from the log-sigmoid function is passed to a linear transfer function known as purelin. This function is used because the weights often go outside the range of the previous function [10]. When the final feedforward process is finished, the net compares the neuron outputs to the associated target vectors. The comparative analysis is finished once the net feeds the adjusted weights and biases backward (backpropagation) to the first layer. The linear transfer function purelin has output equal to its input:

a = n   (5)

In most research the gain value of the log-sigmoid is considered constant during the backpropagation training cycle. In our new approach, the gain value λ takes different values, corresponding to the values of the net input and weights, which are the variable parameters of the learning objective error function. This objective error function is optimized using SPSO to produce the global minimum of the error between target and actual output. The gain values vary during each backpropagation training cycle to increase the speed of training, to reach the goal in minimum time, and to overcome the local minima problem, which is the major problem in most neural networks.

2.1 Adapting the gain of the log-sigmoid function
In our new approach, the gain value of the log-sigmoid function is adapted (optimized). The optimization is performed using SPSO.

2.1.1 The Particle Swarm Optimization
PSO's precursor was a simulator of social behavior that was used to visualize the movement of a bird flock. Several versions of the simulation model were developed, incorporating concepts such as nearest-neighbor velocity matching and acceleration by distance [11,12]. Two variants of the PSO algorithm were developed: one with a global neighborhood and one with a local neighborhood [13].
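The gain-parameterized log-sigmoid and linear layers of Equations (3)-(5) can be sketched as follows (a minimal illustration; the function and variable names are ours, not the paper's):

```python
import numpy as np

def logsig(n, gain=1.0):
    """Log-sigmoid transfer function with variable gain (Eq. 4)."""
    return 1.0 / (1.0 + np.exp(-gain * n))

def purelin(n):
    """Linear transfer function: output equals input (Eq. 5)."""
    return n

def forward(p, W1, b1, W2, b2, gain):
    """Forward pass: log-sigmoid hidden layer followed by a purelin output."""
    a1 = logsig(W1 @ p + b1, gain)   # Eq. (3): a = logsig(lambda * (Wp + b))
    return purelin(W2 @ a1 + b2)
```

Increasing the gain sharpens the sigmoid towards a step function; gain = 1 recovers the standard logsig.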
In our new approach, the global variant was considered because it exhibits faster convergence than the local one. Suppose that the search space is D-dimensional; then the i-th particle of the swarm can be represented by a D-dimensional vector, Xi = (xi1, xi2, . . . , xiD). The velocity (position change) of this particle can be represented by another D-dimensional vector Vi = (vi1, vi2, . . . , viD). The best position previously visited by the i-th particle is denoted Pi = (pi1, pi2, . . . , piD). Defining g as the index of the best particle in the swarm (i.e., the g-th particle is the best), and letting the superscripts denote the iteration number, the swarm is manipulated according to the following two equations [12]:

v_id^(n+1) = v_id^n + c·r1^n·(p_id^n − x_id^n) + c·r2^n·(p_gd^n − x_id^n)   (6)

x_id^(n+1) = x_id^n + v_id^(n+1)   (7)

where d = 1, 2, . . ., D; i = 1, 2, . . ., N, with N the size of the swarm; c is a positive constant, called the acceleration constant; r1, r2 are random numbers uniformly distributed in [0,1]; and n = 1, 2, . . . is the iteration number. Equations (6) and (7) define the initial version of the PSO algorithm. Since there was no actual mechanism for controlling the velocity of a particle, it was necessary to impose a maximum value Vmax on it; if the velocity exceeded this threshold, it was set equal to Vmax. This parameter proved crucial, because large values could result in particles moving past good solutions, while small values could result in insufficient exploration of the search space. This lack of a control mechanism for the velocity resulted in low efficiency for PSO. Various attempts have been made to improve the performance of the baseline PSO, with varying success. Eberhart and Shi focused on optimizing the update equations for the particles [13]. Angeline used a selection mechanism in an attempt to improve the general quality of the particles in a swarm. Kennedy used cluster analysis to modify the update equation, so that particles attempt to conform to the center of their clusters rather than to a global best. The velocity-control problem was addressed by incorporating a weight parameter for the previous velocity of the particle. Thus, in later versions of PSO, Equations (6) and (7) are changed to the following [14]:

v_id^(n+1) = ω·v_id^n + c1·r1^n·(p_id^n − x_id^n) + c2·r2^n·(p_gd^n − x_id^n)   (8)

x_id^(n+1) = x_id^n + v_id^(n+1)   (9)

where ω is called the inertia weight; c1, c2 are two positive constants, called the cognitive and social parameters respectively; and χ is a constriction factor, which is used as an alternative to ω to limit the velocity. The role of these parameters is discussed in [14].

2.1.2 Equipping PSO with Function "Stretching"
PSO suffers from the local minima problem. There are several methods to overcome local minima in PSO, such as genetic algorithms, hill climbing, the deflation function and the stretching function. In this paper, the stretching function is used to modify the PSO algorithm because it yields faster convergence than the other methods. "Stretching" is a technique that provides a way to escape from local minima when PSO's


convergence stalls. It consists of a two-stage transformation of the original objective function f(x), which can be applied immediately after a local minimum x̄ of f(x) has been detected. This transformation was proposed by [15] and is defined as follows:

G(x) = f(x) + (γ1/2)·‖x − x̄‖·(sign(f(x) − f(x̄)) + 1)   (10)

H(x) = G(x) + γ2·(sign(f(x) − f(x̄)) + 1) / (2·tanh(µ·(G(x) − G(x̄))))   (11)

where x̄ denotes the detected local minimum,

and γ1, γ2 and µ are arbitrarily chosen positive constants, with sign(·) the well-known three-valued sign function.
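The two-stage transformation of Equations (10)-(11) can be sketched as follows (an illustrative implementation; the function names are ours, and the default constants are the values the paper uses later):

```python
import numpy as np

def stretch(f, x_bar, gamma1=1.5, gamma2=0.5, mu=1e-3):
    """Build the two-stage stretched objective H from f and a detected
    local minimum x_bar (Eqs. 10-11)."""
    f_bar = f(x_bar)

    def G(x):
        # Stage 1 (Eq. 10): elevate f, removing local minima above f(x_bar)
        return f(x) + 0.5 * gamma1 * np.linalg.norm(x - x_bar) * (
            np.sign(f(x) - f_bar) + 1)

    def H(x):
        # Stage 2 (Eq. 11): stretch the neighborhood of x_bar upwards
        return G(x) + gamma2 * (np.sign(f(x) - f_bar) + 1) / (
            2 * np.tanh(mu * (G(x) - G(x_bar))))

    return H
```

Points with function values below f(x_bar) leave both stages unchanged, so the global minimum keeps its location.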

The first transformation stage, Equation (10), elevates the function f(x) and eliminates all the local minima located above f(x̄). The second stage, Equation (11), stretches the neighborhood of x̄ upwards, since it assigns higher function values to those points. Neither stage alters the local minima with function values below f(x̄); thus, the location of the global minimum is left unaltered.

2.1.3 Implementation of the Stretched Particle Swarm Optimization (SPSO)
In our new approach, the gain parameter λ is used to optimize the squared error function by SPSO. The fitness value corresponds to a forward propagation through the network, i.e. to the objective learning function (squared error function), and the position vector corresponds to the optimum gain value of the log-sigmoid transfer function of the network. The particle's best neighbor (yielding the lowest error) and the global best particle are used to guide the particle to new solutions. At the end of the algorithm, the position of the global best particle serves as the answer. During each epoch, every particle is accelerated towards its best neighboring position as well as in the direction of the global best position. This is achieved by calculating a new velocity term for each particle based on its current velocity, its distance from its best neighbor, and its distance from the global best position. An inertia weight, reduced linearly by epoch, is multiplied by the current velocity, and the other two components are weighted randomly to produce the new velocity value for the particle, which in turn affects the particle's position during the next epoch. The fitness function is

fitness = ½·(actual output − target)² = ½·(F(a) − T)² = ½·(1 / (1 + e^(−λ(Wp + b))) − T)²   (12)

where λ is the gain value, W is the sum of weights, p is the sum of input data, b is the bias value and T is the target.
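A compact sketch of the PSO loop described above, searching a single scalar gain λ against the fitness of Equation (12). All names, the single-neuron setup and the search range are our illustrative assumptions, not the paper's implementation, and the stretching wrapper that would be applied when convergence stalls is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(lam, n, target):
    """Eq. (12): squared error of the gain-parameterized log-sigmoid output."""
    return 0.5 * (1.0 / (1.0 + np.exp(-lam * n)) - target) ** 2

def spso_gain(n, target, particles=20, epochs=100, c1=0.5, c2=0.5):
    """PSO search over the scalar gain lambda; the inertia weight decays
    linearly from 1.0 to 0.4 as described in the paper."""
    x = rng.uniform(0.1, 5.0, particles)          # positions: candidate gains
    v = np.zeros(particles)                       # velocities
    pbest, pbest_f = x.copy(), fitness(x, n, target)
    g = int(np.argmin(pbest_f))                   # index of the global best
    for epoch in range(epochs):
        w = 1.0 - 0.6 * epoch / (epochs - 1)      # inertia weight 1.0 -> 0.4
        r1, r2 = rng.random(particles), rng.random(particles)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[g] - x)  # Eq. (8)
        x = x + v                                                     # Eq. (9)
        f = fitness(x, n, target)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = int(np.argmin(pbest_f))
    return pbest[g]
```

For example, `spso_gain(n=2.0, target=0.9)` searches for a gain that drives the log-sigmoid of a fixed net input of 2.0 towards the target 0.9.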

In the new approach, the initialization of the swarm (the values of λ) and the velocities is performed randomly and uniformly in the search space; the values γ1 = 1.5, γ2 = 0.5 and µ = 0.001 were fixed, and the default values c1 = c2 = 0.5 were used. Although the choice of parameter values does not seem critical to the success of the method, faster convergence can be obtained by proper fine-tuning. A time-decreasing inertia weight, starting from 1 and gradually decreasing towards 0.4, proved superior to a constant value.

2.2 Adapting weights and biases
The backpropagation algorithm was created by generalizing the Widrow-Hoff learning rule [15], known as the delta rule or the Least Mean Square (LMS) method, which involves gradient descent techniques in which the performance index is the mean square error (MSE). Gradient descent is the technique in which parameters such as weights and biases are moved in the direction opposite to the error gradient [16]. The LMS method is much like the gradient descent technique, differing only in how the error derivatives are calculated. When the BP algorithm was used, it was found insufficient to train with an acceptable level of computation: applied to large data sets, it produced poor results and lengthy training times. Therefore a variation of BP, the Levenberg-Marquardt backpropagation (LMBP), was used. LMBP can be thought of as an enhanced BP because it combines the gradient descent method of BP with the Gauss-Newton method [17]. The combination of these techniques allows the neural network to train with a high degree of efficiency. The backpropagation algorithms BP and LMBP have proved efficient in many problems.
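A minimal sketch of the Levenberg-Marquardt update at the heart of LMBP; the network, Jacobian and names here are illustrative assumptions rather than the paper's code:

```python
import numpy as np

def lm_step(w, jacobian, residuals, mu):
    """One Levenberg-Marquardt update: dw = -(J^T J + mu*I)^(-1) J^T e.
    A large mu behaves like gradient descent, a small mu like Gauss-Newton."""
    J = jacobian(w)                  # m x n Jacobian of the residuals
    e = residuals(w)                 # m-vector of errors (output - target)
    A = J.T @ J + mu * np.eye(J.shape[1])
    return w - np.linalg.solve(A, J.T @ e)

# toy example: fit y = w*x to the point (x=2, y=6); residual e = w*x - y
resid = lambda w: np.array([w[0] * 2.0 - 6.0])
jac = lambda w: np.array([[2.0]])
w = np.array([0.0])
for _ in range(20):
    w = lm_step(w, jac, resid, mu=1e-2)
# w converges to 3.0
```

The parameter µ is the quantity the paper uses to observe the variation between the SPSO and LMBP phases of training.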
However, their speed of convergence is a main drawback; in addition, they often get stuck in local minima of the mean square error function, which decreases their efficiency. The new approach is presented as a new supervised learning algorithm to overcome these drawbacks.

3. The new approach
In our new approach, the optimization of the mean squared error function, to produce its global minimum, is carried out alternately by two algorithms: SPSO optimizes the gain of the log-sigmoid transfer function, while the LMBP approximation learning rule optimizes the weights and biases. The purpose of training the new approach is the creation of a set of weight and bias matrices and gain vectors that make up a mathematical model producing the target. This set optimizes the objective learning function to give the global minimum of the error between the actual output and the target, producing the best and fastest convergence.


During training, the gains of the log-sigmoid function, the weights and the biases vary in order to optimize the error function. The association of the target vectors with the output vectors of the neural net is made by passing the input vectors through the adjusted neural net; the neural net is then trained to predict the target. While training the new approach, the variation between the SPSO and LMBP methods can be observed through the Levenberg-Marquardt parameter µ. As the programs run, the training process is depicted on the monitor screen with a display frequency set to the designer's preference (df), along with a maximum number of epochs to train (me) and a sum-square error goal (eg). In the next section we discuss a simulation using LMBP and the new approach for real-time pattern recognition.

4. Implementation of LMBP and the new approach for real-time pattern recognition
The backpropagation algorithm is rarely used for real-time pattern recognition since, during its training cycle, its convergence tends to be slow and yields suboptimal solutions due to the local minima problem. Our new approach can be used for real-time pattern recognition due to its high speed of convergence and globally unique solution.

4.1 Application of the new approach to the rooms of the nuclear reactor of Egypt
The reactor in Egypt is monitored, controlled and managed by the Instrumentation and Control Systems and control loops. This is a distributed, computer-based system which reads all plant data (about 1000 signals) related to nuclear, non-nuclear and radiation-protection instrumentation. It is connected to a LAN and to digital cameras showing each room of the reactor that cannot be entered by humans due to the high radiation level. The new approach is used in real time to recognize the status of the concerned rooms by detecting any critical cases, mainly fire smoke, for surveillance.
For each of the mentioned rooms, 100 runs have been performed on each group of three images. Each room was sampled with 100 images covering normal and abnormal cases (different shapes of fire smoke), using both LMBP and the new approach. After the results are obtained, the average performance is reported in terms of the mean value and the standard deviation over the required number of image evaluations, together with the success percentage of the two algorithms. The features of the three images that form the input data are arranged in a 3×3 matrix of nine variables, three per row; each row contains the covariance, mean and standard deviation of one image. The target data is a single column vector to be generated for the given input data; the target represents the status of the room: one for normal, zero for abnormal. The neural net reads the input vectors and target vectors in the formulation

p = [covariance, standard deviation, mean] = [3×3]
T = [target vector] = [1×3]

The new approach is implemented in MATLAB 7 with a frequency of progress displays (in epochs) df = 2, a maximum number of epochs to train me = 300 and a sum-square error goal eg = 1e-10. In the following subsections we discuss a simulation using the new approach for real-time pattern recognition on three example rooms.

4.1.1 Training results for the first example room (instrumentation room)
The instrumentation room contains the instrumentation (rod drive mechanism) system which moves the reactivity control rods to operate or shut down the reactor. Figure 1 and Figure 2 show one image of the instrumentation room in the normal and abnormal (fire smoke) states.

Fig. 1. The first room in the normal state.    Fig. 2. The first room with fire smoke.
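The 3×3 feature matrix described above could be computed along these lines (a sketch; the function names are ours, grayscale pixel arrays are assumed, and the per-image "covariance" is taken here as the pixel variance):

```python
import numpy as np

def image_features(image):
    """Covariance (pixel variance here, an assumption), standard deviation
    and mean of one grayscale image, as one row of the input matrix p."""
    px = np.asarray(image, dtype=float).ravel()
    return np.array([np.var(px), np.std(px), np.mean(px)])

def room_input(images):
    """Stack the features of the three sampled images into the 3x3 input p,
    one row per image."""
    assert len(images) == 3
    return np.vstack([image_features(im) for im in images])
```
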

Figure 3 and Figure 4 show the average training performance of LMBP and the new approach when each is applied one hundred times to three samplings of real-time images of the first example room. In the figures, TRAINLM is the Levenberg-Marquardt learning function, MSE is the mean square error of the gradient descent technique (the performance index), and Gradient is the gradient value of the mean square error.


Figure 3. The average training performance of the LMBP algorithm on the images of the first example room.

Figure 4. The average training performance of the new approach on the images of the first room.

Comparing Figure 3 with Figure 4, the new approach reduces the mean square error (MSE), relative to LMBP, to 2.67034e-012, and the number of epochs needed to reach the goal drops to four. Thus, the new approach reaches the global minimum error faster than LMBP. Table 1 compares the average training performance of LMBP and the new approach in terms of the gradient, mean error and standard deviation over the three sampled images of the instrumentation room, together with the success percentage of the two algorithms.

Algorithm name     Gradient       Mean Error     Standard Deviation   Success
LMBP               1.35694e-005   2.04479e-011   1.8541e-004          90%
The new approach   6.89025e-007   2.67034e-012   2.9783e-009          100%

Table 1. Comparison of the average training performance of LMBP and the new approach for the images of the instrumentation room.

From Table 1, the new approach has the best (lowest) gradient, mean error and standard deviation. In addition, it has the best success rate, 100%, in reaching the goal.

4.1.2 Training results for the second example room
Figure 5 and Figure 6 show one image of the second example room in the normal and abnormal (fire smoke) states.

Fig. 5. The second room in the normal state.    Fig. 6. The second room with fire smoke.

Figure 7 and Figure 8 show the average training performance of LMBP and the new approach when each is applied one hundred times to three samplings of real-time images of the second example room.

Figure 7. The average training performance of LMBP on the images of the second room.

Figure 8. The average training performance of the new approach on the images of the second room.

Comparing Figure 7 with Figure 8, the new approach reduces the mean square error (MSE), relative to LMBP, to 4.42539e-013, and the number of epochs needed to reach the goal drops to three. Thus, the new approach reaches the global minimum error faster than LMBP. Table 2 compares the average training performance of LMBP and the new approach in terms of the gradient, mean error and standard deviation over the three sampled images of the second room, together with the success percentage of the two algorithms.

Algorithm name     Gradient       Mean Error     Standard Deviation   Success
LMBP               4.1783e-005    3.488e-011     5.0766e-003          91%
The new approach   3.53146e-006   4.42539e-013   2.9783e-009          100%

Table 2. Comparison of the average training performance of LMBP and the new approach.

From Table 2, the new approach has the best (lowest) gradient, mean error and standard deviation. In addition, it is faster and has the best success rate, 100%, in reaching the goal.

4.1.3 Training results for the third example room
Figure 9 and Figure 10 show one image of the third example room in the normal and abnormal states.


Fig. 9. The third room in the normal state.    Fig. 10. The third room with fire smoke.

Figure 11 and Figure 12 show the average training performance of LMBP and the new approach when each is applied one hundred times to three samplings of real-time images of the third example room.

Figure 11. The average training performance of LMBP on the images of the third room.

Figure 12. The average training performance of the new approach on the images of the third room.

Comparing Figure 11 with Figure 12, the new approach reduces the mean square error (MSE), relative to LMBP, to 5.73791e-013, and the number of epochs needed to reach the goal drops to five. Thus, the new approach reaches the global minimum error faster than LMBP. Table 3 compares the average training performance of LMBP and the new approach in terms of the gradient, mean error and standard deviation over the three sampled images of the third room, together with the success percentage of the two algorithms.

Algorithm name     Gradient       Mean Error     Standard Deviation   Success
LMBP               6.61043e-006   5.70249e-012   3.0766e-003          92%
The new approach   4.94864e-005   5.73791e-013   —                    —

Table 3. Comparison of the average training performance of LMBP and the new approach.

From Table 3, the new approach has the best gradient, MSE and standard deviation. In addition, it is faster and has the best success rate, 100%, in reaching the goal.

5. Conclusion
In this research, a new approach is proposed for optimizing the backpropagation algorithm using the Stretched Particle Swarm Optimization technique (SPSO). The approach focuses on the gain parameter of the log-sigmoid transfer function as its major contribution to increasing the speed of convergence in the training and learning phase. In addition, it optimizes the mean squared error function to overcome the local minima problem, which is the major problem in most neural networks. The results suggest that the new approach can be used successfully for real-time pattern recognition (online training) in the surveillance of the rooms of the nuclear reactor, which have high radiation levels, to detect critical cases, mainly fire smoke.

6. References
[1] Armijo, L., "Minimization of functions having Lipschitz continuous first partial derivatives", Pacific Journal of Mathematics, 16, 1-3, 1966.
[2] Baldi, P. & Hornik, K., "Neural networks and principal component analysis: learning from examples without local minima", Neural Networks, 2, 53-58, 1989.
[3] Bello, M. G., "Enhanced training algorithms and integrated training/architecture selection for multilayer perceptron networks", IEEE Transactions on Neural Networks, 3, 864-874, 1992.
[4] Battiti, R., "Accelerated backpropagation learning: two optimization methods", Complex Systems, 3, 331-342, 1989.
[5] Kollias, S. & Anastassiou, D., "An adaptive least squares algorithm for the efficient training of artificial neural networks", IEEE Transactions on Circuits & Systems, 36, 1092-1101, 1989.
[6] Rumelhart, D. E. & McClelland, J. L., "Parallel Distributed Processing", Cambridge, MA: MIT Press, 1986.
[7] Antsaklis, P. J. & Moody, J. O., "The Dependence Identification Neural Network Construction Algorithm", IEEE Transactions on Neural Networks, Vol., No. 1, January 1996, pp. 3-15.
[8] Baran, R. H. & Coughlin, J. P., "Neural Computation in Hopfield Networks and Boltzmann Machines", Newark: University of Delaware Press, 1994.
[9] Barbeau, E. J., "Polynomials", New York: Springer-Verlag, 1989.
[10] Beale, M. & Demuth, H., "Training Functions in a Neural Network Design Tool", Proceedings of the Third Workshop on Neural Networks Academic/Industrial/NASA/Defense '92, SPIE Vol. 1721, 1992.
[11] Kennedy, J. & Eberhart, R., "Swarm Intelligence", Morgan Kaufmann Publishers, 2001.


[12] Kennedy, J. & Eberhart, R., "Particle Swarm Optimization", Proceedings of the IEEE International Conference on Neural Networks, IV, pp. 1942-1948, IEEE Service Center, Piscataway, NJ, 1995.
[13] Shi, Y. & Eberhart, R. C., "Parameter Selection in Particle Swarm Optimization", Evolutionary Programming VII, Lecture Notes in Computer Science 1447, pp. 591-600, Springer, 1998.
[14] Shi, Y. & Eberhart, R., "A Modified Particle Swarm Optimizer", IEEE International Conference on Evolutionary Computation, Anchorage, Alaska, 1998.

[15] Bose, N. K. & Liang, P., "Neural Network Fundamentals with Graphs, Algorithms, and Applications", McGraw-Hill International Editions, 1996.
[16] Antonio, A. & Frank, W., "Enhancements to SIMMOD: A Neural Network Post-processor to Estimate Aircraft Fuel Consumption", Research Report RR-97-8, Virginia Tech, Blacksburg, VA 24061, December 1997.
[17] Xinghuo, Y. & Onder, M., "A Backpropagation Learning Framework for Feedforward Neural Networks", IEEE, 2001.
