
Rice Yield Prediction - A Comparison between Enhanced Back Propagation Learning Algorithms

Puteh Saad¹,², Nor Khairah Jamaludin¹, Nursalasawati Rusli¹, Aryati Bakre and Siti Sakira Kamarudin³

¹Artificial Intelligence and Software Engineering Research Lab, School of Computer & Communication Engineering, Kolej Universiti Kejuruteraan Utara Malaysia (KUKUM), Blok A, Kompleks Pusat Pengajian, Jalan Kangar-Arau, 02600 Jejawi, Perlis, Malaysia.
Telephone: 604-9798302/8310/8322/8144, Fax: 604-9798141.
[email protected], [email protected]

²Faculty of Computer Science and Information System, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia.
[email protected]

³School of Information Technology, Universiti Utara Malaysia, Sintok, Kedah, Malaysia.
[email protected]

Abstract

The Back Propagation (BP) algorithm has been widely used to solve various problems; however, it suffers from slow convergence and instability. In recent years, improvements have been proposed to overcome these deficiencies. In this study, we examine the performance of four enhanced BP algorithms in predicting rice yield in the MADA plantation area in Kedah, Malaysia. Among the four algorithms explored, Conjugate Gradient Descent exhibits the best performance.

Keywords:

Back-Propagation Algorithm, Quick Propagation, Conjugate Gradient Descent, Levenberg-Marquardt Algorithm, Rice Yield Prediction.

1. Introduction

Back Propagation is by far the most widely used algorithm to train the Multi-Layer Perceptron (MLP) for optimization, function approximation and pattern recognition. However, it is afflicted with several deficiencies, the major ones being slow convergence and instability; the training problem is classified as NP-complete [4]. The slow convergence rate and instability are attributed to the following reasons:

i. The presence of local minima (i.e. isolated valleys) in addition to the global minimum. Since Back Propagation is basically a hill-climbing technique, it runs the risk of being trapped in a local minimum, where every small change in the synaptic weights increases the error function, yet somewhere else in the weight space there exists another set of synaptic weights for which the error function is smaller than the local minimum in which the network is stuck. Clearly, it is undesirable to have the learning process terminate at a local minimum, especially if it is located far above the global minimum [1].

ii. The error surface may be highly curved along a weight dimension, in which case the derivative of the error surface with respect to that weight is large in magnitude. In this situation, the adjustment applied to the weight is large, which may cause the algorithm to overshoot the minimum of the error surface [9].


iii. The direction of the negative gradient vector may point away from the minimum of the error surface, hence the adjustments applied to the weights may induce the algorithm to move in the wrong direction. Consequently, the rate of convergence in BP training tends to be relatively slow [8].

The improvements made to overcome the above phenomena rely on heuristic techniques and numerical optimization techniques. In this study three techniques are chosen: Quick Propagation, Conjugate Gradient Descent and Levenberg-Marquardt. Quick Propagation and Conjugate Gradient Descent are categorized as heuristic techniques, and Levenberg-Marquardt as a numerical optimization technique.

Section 2 describes the Quick Propagation, Conjugate Gradient Descent and Levenberg-Marquardt enhancement techniques. Section 3 presents the methodology adopted to perform the yield prediction. Section 4 discusses the results obtained, and the paper ends with a conclusion in Section 5.

2. The Enhanced Back Propagation Algorithms

Quick Propagation computes the average gradient of the error surface across all cases before updating the weights once at the end of the epoch.
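As a concrete illustration, the batch (epoch-wise) update just described can be sketched as follows; this is a minimal sketch, where `grad_fn` and `samples` are hypothetical placeholders supplied by the surrounding training code, not part of any particular library:

```python
import numpy as np

def batch_epoch_update(weights, grad_fn, samples, lr=0.1):
    """Average the error gradient over all training cases, then
    adjust the weights once at the end of the epoch."""
    grads = [grad_fn(weights, x, y) for x, y in samples]  # per-case gradients
    avg_grad = np.mean(grads, axis=0)                     # average across the epoch
    return weights - lr * avg_grad                        # single weight update
```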

In the standard BP, the error function decreases most rapidly along the negative of the gradient; however, the fastest convergence is not guaranteed. Conjugate Gradient Descent overcomes this shortcoming by constructing a series of line searches across the error surface. It first works out the direction of steepest descent, just as Back Propagation would do [2]:

$p_0 = -g_0$

A line search is then performed to determine the optimal distance to move along the current search direction:

$x_{k+1} = x_k + \alpha_k p_k$

where $x_k$ is the vector of current weights and biases, $\alpha_k$ is the learning rate and $p_k$ is the search direction.

The next search direction is determined so that it is conjugate to previous search directions. The general procedure for determining the new search direction is to combine the new steepest descent direction with the previous search direction:

$p_k = -g_k + \beta_k p_{k-1}$

The constant $\beta_k$ is computed based on the Fletcher-Reeves update:

$\beta_k = \dfrac{g_k^T g_k}{g_{k-1}^T g_{k-1}}$

where $g_k$ is the gradient at step $k$.
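A minimal sketch of one Conjugate Gradient Descent iteration with the Fletcher-Reeves update, directly following the equations above, might look as follows; `grad_fn` and `line_search` are assumed helper functions, not library calls:

```python
import numpy as np

def fletcher_reeves_step(x, p_prev, g_prev, grad_fn, line_search):
    """One iteration of Conjugate Gradient Descent (Fletcher-Reeves)."""
    g = grad_fn(x)                       # gradient g_k at the current weights
    beta = (g @ g) / (g_prev @ g_prev)   # beta_k = g_k^T g_k / g_{k-1}^T g_{k-1}
    p = -g + beta * p_prev               # new conjugate search direction p_k
    alpha = line_search(x, p)            # optimal step length along p_k
    return x + alpha * p, p, g           # x_{k+1} = x_k + alpha_k * p_k

# The very first direction is plain steepest descent: p_0 = -grad_fn(x_0).
```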

The Levenberg-Marquardt algorithm was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares, the Hessian matrix can be approximated as [6][3]:


$H = J^T J$

and the gradient can be computed as

$g = J^T e$

where $J$ is the Jacobian matrix, which contains the first derivatives of the network errors with respect to the weights and biases, and $e$ is a vector of network errors.

The weights and biases are updated according to the following formula:

$x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e$

where $\mu$ is a scalar. $\mu$ is decreased after each successful step and increased only when a tentative step would increase the performance function. Hence, the performance function is always reduced at each iteration of the algorithm.
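The update rule and the $\mu$ schedule above can be sketched in a few lines; `jacobian_fn` and `error_fn` are assumed helpers that return $J$ and $e$ for the current weights:

```python
import numpy as np

def levenberg_marquardt_step(x, jacobian_fn, error_fn, mu,
                             mu_dec=0.1, mu_inc=10.0):
    """One tentative Levenberg-Marquardt step with the mu schedule above."""
    J, e = jacobian_fn(x), error_fn(x)
    H = J.T @ J                          # Gauss-Newton approximation of the Hessian
    g = J.T @ e                          # gradient
    step = np.linalg.solve(H + mu * np.eye(H.shape[0]), g)
    x_new = x - step                     # x_{k+1} = x_k - [J^T J + mu I]^{-1} J^T e
    if np.sum(error_fn(x_new) ** 2) < np.sum(e ** 2):
        return x_new, mu * mu_dec        # successful step: decrease mu
    return x, mu * mu_inc                # rejected step: increase mu and retry
```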

3. The Methodology

In this study the domain chosen to evaluate the above algorithms is rice yield prediction. The samples are obtained from the Muda Agricultural Development Authority (MADA), Kedah, Malaysia, from 4 areas consisting of 27 localities (A1 to G4). Only three parameters that affect rice yield are considered, namely diseases, pests and weeds. Each parameter is further subdivided into various types: there are 12 types of diseases and pests and 11 types of weeds. The effect of each type of disease is summed into a single value, and similarly for pests and weeds, so that only 3 parameters are generated. Hence the input data table consists of 3 columns representing the factors affecting yield, with the yield obtained in the last column; rows represent localities. Five seasons (1/95 to 1/97) for 27 localities, producing a total of 135 rows, are used to train the Neural Network. One season (2/97) for the 27 localities is used for prediction. Table 1 depicts an example of the rice yield data used for training, and Table 2 contains the data for the three parameters used to predict the yield.

The Neural Network is trained using each of the above enhanced techniques on the data in Table 1. Since there are three parameters affecting the yield, the number of nodes in the input layer is 3. The number of nodes in the output layer is one, since there is a single output. The number of nodes in the hidden layer is determined by training the Neural Network while varying it; based on the smallest sum-of-squares error, the suitable number of hidden nodes for the 4 learning algorithms is depicted in Table 3.
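The hidden-layer sizing procedure just described can be sketched as a simple search; `train_mlp` is a hypothetical stand-in for any of the four training algorithms, and the returned model is assumed to expose a `predict` method:

```python
import numpy as np

def select_hidden_nodes(X, y, train_mlp, candidates=range(1, 11)):
    """Pick the hidden-layer size with the smallest sum-of-squares error."""
    best_h, best_sse = None, np.inf
    for h in candidates:
        model = train_mlp(n_input=3, n_hidden=h, n_output=1, X=X, y=y)
        sse = np.sum((model.predict(X) - y) ** 2)  # sum-of-squares training error
        if sse < best_sse:
            best_h, best_sse = h, sse
    return best_h
```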

Table 1: A Sample of Rice Yield Data to Train the Neural Network

Locality   Pests    Diseases  Weeds    Yield
A1(1)      63.63    189.11    91.85    4223
B1(2)      538.46   283.51    236      4276
C1(3)      151.61   224.38    326.7    3652
D1(4)      189.84   809.26    341.31   4686
E1(5)      176.66   334.08    280.5    3948
A2(6)      936.05   439.96    540.5    4625
B2(7)      868.85   555.59    622.35   4712
C2(8)      560.16   600.63    2507.6   4704
D2(9)      562.05   476.2     394.3    5344
E2(10)     816.12   542.06    1958.59  4764
F2(11)     846.1    547.62    1759.2   4716
G2(12)     68.76    298.88    295.48   3998
H2(13)     297.45   455.05    434.95   4808
I2(14)     317.87   505.23    892.26   4005
A3(15)     9.37     668.71    126.61   3672
B3(16)     302.9    495.91    992.5    5196
C3(17)     142.15   580.36    95.23    5226
D3(18)     277.92   663.16    613.05   4406
E3(19)     239.5    516.06    76.8     3808
F3(20)     549.95   379.32    374.8    5292
A4(21)     205.44   528.04    154.53   4814
B4(22)     255.85   703.12    193.16   4621
C4(23)     378.56   545.12    514.1    4675
D4(24)     440.25   760.78    753.03   5529
E4(25)     470.14   707.12    497      4456
F4(26)     392.2    750.29    345.94   4883
G4(27)     216.35   478.69    650.9    4747

Table 2: Data for the Three Parameters to Predict the Yield

Locality   Pests    Diseases  Weeds
A1(1)      74.7     21.8      161.34
B1(2)      90.74    22.95     136.94
C1(3)      88       26.54     124.97
D1(4)      56.81    53.21     305.4
E1(5)      138.03   204       323
A2(6)      459.95   31.5      476
B2(7)      299.03   17.64     107.4
C2(8)      524.95   132       1174.3
D2(9)      369.8    0.1       175.4
E2(10)     832.8    14.55     3136.62
F2(11)     478      0.1       278
G2(12)     125.15   4.2       102.6
H2(13)     185.05   2.6       573.05
I2(14)     102.2    8         269.25
A3(15)     18.26    0.6       81.94
B3(16)     152.52   51.25     789.9
C3(17)     361.21   0.1       221.32
D3(18)     225.79   0.5       707.45
E3(19)     31.02    0.48      37.76
F3(20)     114.19   4.89      141.05
A4(21)     211.49   157.1     182.3
B4(22)     87.1     492.7     273.5
C4(23)     827.01   380       689
D4(24)     799.15   128       497
E4(25)     473.95   52.5      517.5
F4(26)     398.1    26.5      472.5
G4(27)     457.1    51.02     698


4. Results and Discussion

The number of nodes in the input layer  = 3
The number of hidden layers             = 1
The number of nodes in the output layer = 1

Table 3 depicts the number of nodes in the hidden layer for each algorithm. Conjugate Gradient Descent and Levenberg-Marquardt use only two nodes in the hidden layer, whereas Back Propagation uses double that number. Quick Propagation, being a heuristic technique, uses 3 nodes in the hidden layer.

Table 3: Number of Nodes in the Hidden Layer for the Learning Algorithms

Algorithm                    Number of nodes in the hidden layer
Quick Propagation            3
Conjugate Gradient Descent   2
Levenberg-Marquardt          2
Back Propagation             4

The Conjugate Gradient algorithm requires fewer nodes because of its nature of searching for the minimum of the error function along straight lines, as compared to the Back Propagation algorithm, which searches for the minimum of the error function in steps proportional to the learning rate.

The Levenberg-Marquardt algorithm compromises between a linear model and a gradient-descent approach, and thus fewer nodes are used in the hidden layer. A move to the next step is allowed only if the error value is less than the current one, and the allowable downhill movement consists of a sufficiently small step.

As for Quick Propagation, it enhances the Back Propagation algorithm merely by computing the average gradient before updating the weights. It thus still models the non-linear relationship in the data in the same way, and hence there is only a slight improvement in the number of nodes in the hidden layer as compared to the Back Propagation algorithm.

The Neural Network model fitted with each of the above learning algorithms is then used to predict the yield based on the sample data in Table 2. A graph of average absolute error versus each of the algorithms is plotted, as depicted in Figure 1. The results obtained tally with the number of nodes used in the hidden layer: with the highest number of nodes in the hidden layer, the Back Propagation algorithm shows the highest absolute error.

The absolute error for Quick Propagation is slightly better than that of Back Propagation, the absolute error for Levenberg-Marquardt is better than that of Quick Propagation, and Conjugate Gradient Descent displays the lowest absolute error. The lowest error, exhibited by the Conjugate Gradient Descent algorithm, is due to its choice of search direction for reaching the minimum error value, which keeps the algorithm from becoming stuck in local minima.
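For reference, the comparison metric plotted in Figure 1 is simply the average absolute error between actual and predicted yield over the 27 localities; a minimal sketch:

```python
import numpy as np

def average_absolute_error(actual, predicted):
    """Mean of |actual - predicted| across all localities."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))
```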


[Figure 1: Average Absolute Error versus Different Enhanced Back Propagation Algorithms]

In order to illustrate the performance of each learning algorithm, a graph of actual and predicted yield is plotted against locality. Figure 2 depicts the performance of the Quick Propagation algorithm, Figure 3 that of the Conjugate Gradient Descent algorithm, Figure 4 that of the Levenberg-Marquardt algorithm, and Figure 5 that of the Back Propagation algorithm.

[Figure 2: Yield versus Locality of Quick Propagation (actual vs. predicted yield, localities 1-27)]

[Figure 3: Yield versus Locality of Conjugate Gradient Descent (actual vs. predicted yield, localities 1-27)]


[Figure 4: Yield versus Locality of Levenberg-Marquardt (actual vs. predicted yield, localities 1-27)]

[Figure 5: Yield versus Locality of Back Propagation (actual vs. predicted yield, localities 1-27)]

Based on Figures 2 to 5, the Conjugate Gradient Descent algorithm portrays the best performance compared to the others, followed by Levenberg-Marquardt and then Quick Propagation. The Back Propagation algorithm failed to produce the desired output due to the major problem of becoming stuck in local minima. The outstanding performance of the Conjugate Gradient Descent algorithm is due to its strategy of choosing successive weight-correction steps that are conjugate to the previous search directions, which gives it a quadratic convergence property that helps it avoid the local-minima phenomenon. Levenberg-Marquardt is another viable choice for training the Neural Network for rice yield prediction; its superiority over the Back Propagation algorithm is that training is based on a second-order derivative approach that avoids the local-minima problem and exhibits faster convergence. However, the Levenberg-Marquardt algorithm has a major drawback: it requires the storage of some matrices that can be quite large for this kind of problem. Thus, when Levenberg-Marquardt and Conjugate Gradient Descent are compared, Conjugate Gradient Descent wins. From this finding, Conjugate Gradient Descent is adopted to train the Neural Network to predict the rice yield in this study.

5. Conclusion

In this study, 4 supervised learning algorithms are explored to predict rice yield based on weeds, diseases and pests in the Muda Agricultural Development Authority (MADA) area, Kedah, Malaysia. It is found that the Conjugate Gradient Descent algorithm exhibits outstanding performance compared to the Levenberg-Marquardt, Quick Propagation and Back Propagation algorithms. Although Conjugate Gradient Descent is based on heuristic approaches, performing line searches for the minimum error value, it is suitable for this kind of problem due to its quadratic convergence property. The next step of this study is to incorporate the Neural Network model as the intelligent component of an Intelligent Decision Support System to help paddy farmers predict yield based on the factors affecting it.


Acknowledgement

We are indebted to the Ministry of Science, Technology and Environment (MOSTE), Malaysia for providing funds to carry out this research project under IRPA Grant No. 04-02-06-0066 EA001, entitled "Development of Intelligent Decision Support System for Rice Yield Prediction in Precision Farming". We are also indebted to KUKUM for providing funds under a short-term research grant entitled "The Development of Neural Network Learning Algorithm to Forecast Crop Yield and Recognition of 2-D Images".

References

[1] Ahmed, W.A.M., Saad, E.S.M. and Aziz, E.S.A. 2001. Modified back propagation algorithm for learning artificial neural networks. In Proceedings of the Eighteenth National Radio Science Conference (NRSC 2001). 345-352.

[2] Charalambous, C. 1992. Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings G (Circuits, Devices and Systems). 139(3). 301-310.

[3] Hagan, M.T. and Menhaj, M.B. 1994. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks. 5(6). 989-993.

[4] Looney, C.G. 1997. Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists. New York: Oxford University Press.

[5] Luh, P.B. and Zhang, L. 1999. A novel neural learning algorithm for multilayer perceptrons. In Proceedings of the International Joint Conference on Neural Networks (IJCNN '99). 3. 1696-1701.

[6] Marquardt, D. 1963. An algorithm for least squares estimation of non-linear parameters. Journal of the Society for Industrial and Applied Mathematics. 11(2). 431-441.

[7] Ng, S.C. and Leung, S.H. 2001. On solving the local minima problem of adaptive learning by using deterministic weight evolution algorithm. In Proceedings of the 2001 Congress on Evolutionary Computation. 1. 251-255.

[8] Sidani, A. and Sidani, T. 1994. A comprehensive study of the backpropagation algorithm and modifications. Southcon/94 Conference Record. 80-84.

[9] Wen, J., Zhao, J.L., Luo, S.W. and Han, Z. 2000. The improvements of BP neural network learning algorithm. In Proceedings of the 5th International Conference on Signal Processing (WCCC-ICSP 2000). 3. 1647-1649.

Jilid 16, Bil. 1 (Jun 2004) Jurnal Teknologi Maklumat