PhD Thesis

Neural Networks for Variational Problems in Engineering

Roberto Lopez Gonzalez

Director: Prof. Eugenio Oñate Ibáñez de Navarra
Co-director: Dr. Eva Balsa Canto
Tutor: Dr. Lluís Belanche Muñoz

PhD Program in Artificial Intelligence
Department of Computer Languages and Systems
Technical University of Catalonia

21 September 2008

Dedicado a mi familia, Rufino, Esmeralda y Ana Belén. (Dedicated to my family: Rufino, Esmeralda and Ana Belén.)

Abstract

Many problems arising in science and engineering aim to find a function for which a specified functional takes on an optimal value. Some examples include optimal control, inverse analysis and optimal shape design. Only some of these, regarded as variational problems, can be solved analytically, and the only general technique is to approximate the solution using direct methods. Unfortunately, variational problems are very difficult to solve, and it becomes necessary to innovate in the field of numerical methods in order to overcome the difficulties.

The objective of this PhD Thesis is to develop a conceptual theory of neural networks from the perspective of functional analysis and variational calculus. Within this formulation, learning means solving a variational problem by minimizing an objective functional associated with the neural network. The choice of the objective functional depends on the particular application. On the other hand, its evaluation might need the integration of functions, ordinary differential equations or partial differential equations.

As will be shown, neural networks are able to deal with a wide range of applications in mathematics and physics. More specifically, a variational formulation for the multilayer perceptron provides a direct method for solving variational problems. This includes typical applications such as function regression, pattern recognition or time series prediction, but also new ones such as optimal control, inverse problems and optimal shape design.

This wider range of applications means that a standard neural network is not able to deal with some particular problems, and it needs to be augmented. In this work an extended class of multilayer perceptron is developed which, besides the traditional neuron models and network architectures, includes independent parameters, boundary conditions and lower and upper bounds.

The computational performance of this numerical method is investigated here through the solution of different validation problems with analytical solution. Moreover, a variational formulation for an extended class of multilayer perceptron is applied to several engineering cases within optimal control, inverse problems and optimal shape design. Finally, this work comes with the open source neural networks C++ library Flood, which has been implemented following the functional analysis and calculus of variations theories.

Resumen (translated from the Spanish)

Many problems in science and engineering consist of finding a function for which a given functional takes an extreme value. Examples include optimal control, inverse problems and optimal shape design. Only some of these, known as variational problems, have an analytical solution, and the only general technique is to approximate the solution using direct methods. However, variational problems are very difficult to solve, and it is necessary to innovate in the field of numerical methods in order to mitigate these difficulties.

The objective of this Doctoral Thesis is to develop a conceptual theory of neural networks from the point of view of functional analysis and the calculus of variations. Within this formulation, learning is equivalent to solving a variational problem by minimizing an objective functional associated with the neural network. The choice of the objective functional depends on the particular application. On the other hand, its evaluation may require the integration of functions, ordinary differential equations or partial differential equations.

As will be seen, neural networks are capable of solving a wide range of applications in mathematics and physics. More specifically, a variational formulation for the multilayer perceptron provides a direct method for solving variational problems. This includes typical applications such as function regression, pattern recognition and time series prediction, as well as new ones such as optimal control, inverse problems and optimal shape design.

This increase in applications implies that a standard neural network may be inappropriate for, or incapable of, dealing with some particular problems, and needs to be extended. In this work an extended class of multilayer perceptron is developed which, in addition to the usual neuron models and network architectures, includes independent parameters, boundary conditions and lower and upper bounds.

The computational performance of this numerical method is investigated through the solution of several validation problems with analytical solution. In addition, a variational formulation for an extended class of multilayer perceptron is applied to different engineering cases within optimal control, inverse problems and optimal shape design. Finally, this work is accompanied by the open source neural networks C++ library Flood, which has been implemented following the theory of functional analysis and the calculus of variations.

Acknowledgments

Special thanks to my Director, Prof. Eugenio Oñate, for his belief in and support of this project. Many thanks to my Co-Director, Dr. Eva Balsa, and my Tutor, Dr. Lluís Belanche, for their fruitful comments and suggestions.

I also express my gratitude to all the people who have collaborated on any part of this PhD Thesis. They include Dr. Carlos Agelet, Begoña Carmona, Dr. Michele Chiumenti, Dr. Pooyan Dadvand, Alan Daring, Xavier Diego, Enrique Escolano, Dr. Roberto Flores, Dr. Julio Garcia, Kevin Lau and Inma Ortigosa.

Finally I would like to thank all my colleagues at the International Center for Numerical Methods in Engineering (CIMNE) for their pleasant friendship.

Contents

1 Introduction
2 Preliminaries
  2.1 Extreme values of functionals
  2.2 The simplest variational problem
  2.3 The simplest constrained variational problem
  2.4 Direct methods in variational problems
3 A variational formulation for the multilayer perceptron
  3.1 Introduction
  3.2 The perceptron neuron model
    3.2.1 The elements of the perceptron
    3.2.2 The perceptron function space
  3.3 The multilayer perceptron network architecture
    3.3.1 Feed-forward architectures
    3.3.2 The multilayer perceptron function space
    3.3.3 The Jacobian matrix
    3.3.4 Universal approximation
    3.3.5 Pre and post-processing
    3.3.6 Multilayer perceptron extensions
    3.3.7 The input-output activity diagram
  3.4 The objective functional
    3.4.1 The variational problem
    3.4.2 The reduced function optimization problem
    3.4.3 The objective function gradient
    3.4.4 The objective function Hessian
  3.5 The training algorithm
    3.5.1 The function optimization problem
    3.5.2 Random search
    3.5.3 Gradient descent
    3.5.4 Newton's method
    3.5.5 Conjugate gradient
    3.5.6 Quasi-Newton method
    3.5.7 One-dimensional minimization algorithms
    3.5.8 Evolutionary algorithm
4 Modeling of data
  4.1 Problem formulation
    4.1.1 Function regression
    4.1.2 Pattern recognition
    4.1.3 The sum squared error
    4.1.4 The normalized squared error
    4.1.5 The Minkowski error
    4.1.6 Regularization theory
    4.1.7 Linear regression analysis
  4.2 The residuary resistance of sailing yachts problem
    4.2.1 Introduction
    4.2.2 Experimental data
    4.2.3 The Delft series
    4.2.4 Neural networks approach
    4.2.5 Conclusions
  4.3 The airfoil self-noise problem
    4.3.1 Introduction
    4.3.2 Experimental data
    4.3.3 The Brooks-Pope-Marcolini (BPM) model
    4.3.4 Neural networks approach
    4.3.5 Conclusions
5 Classical problems in the calculus of variations
  5.1 The geodesic problem
    5.1.1 Problem statement
    5.1.2 Selection of function space
    5.1.3 Formulation of variational problem
    5.1.4 Solution of reduced function optimization problem
  5.2 The brachistochrone problem
    5.2.1 Problem statement
    5.2.2 Selection of function space
    5.2.3 Formulation of variational problem
    5.2.4 Solution of reduced function optimization problem
  5.3 The catenary problem
    5.3.1 Problem statement
    5.3.2 Selection of function space
    5.3.3 Formulation of variational problem
    5.3.4 Solution of reduced function optimization problem
  5.4 The isoperimetric problem
    5.4.1 Problem statement
    5.4.2 Selection of function space
    5.4.3 Formulation of variational problem
    5.4.4 Solution of reduced function optimization problem
6 Optimal control problems
  6.1 Problem formulation
  6.2 Validation examples
    6.2.1 The car problem
  6.3 The fed batch fermenter problem
    6.3.1 Introduction
    6.3.2 Problem statement
    6.3.3 Numerical results
    6.3.4 Conclusions
  6.4 The aircraft landing problem
    6.4.1 Introduction
    6.4.2 Problem statement
    6.4.3 Numerical results
    6.4.4 Conclusions
7 Inverse problems
  7.1 Problem formulation
  7.2 Validation examples
    7.2.1 The boundary temperature estimation problem
    7.2.2 The thermal conductivity estimation problem
  7.3 Microstructural modeling of aluminium alloys
    7.3.1 Introduction
    7.3.2 Dissolution models for aluminium alloys
    7.3.3 Experimental data
    7.3.4 Numerical results
    7.3.5 Conclusions
8 Optimal shape design
  8.1 Mathematical formulation
  8.2 Validation examples
    8.2.1 The minimum drag problem
  8.3 Optimal airfoil design
    8.3.1 Introduction
    8.3.2 Problem statement
    8.3.3 Numerical results
    8.3.4 Conclusions
9 Conclusions and future work
A The software model of Flood
  A.1 The Unified Modeling Language (UML)
  A.2 Classes
  A.3 Associations
  A.4 Derived classes
  A.5 Attributes and operations
    A.5.1 Perceptron
    A.5.2 Multilayer perceptron
    A.5.3 Objective functional
    A.5.4 Training algorithm
B Numerical integration
  B.1 Integration of functions
    B.1.1 Introduction
    B.1.2 Closed Newton-Cotes formulas
    B.1.3 Extended Newton-Cotes formulas
    B.1.4 Ordinary differential equation approach
  B.2 Ordinary differential equations
    B.2.1 Introduction
    B.2.2 The Euler method
    B.2.3 The Runge-Kutta method
    B.2.4 The Runge-Kutta-Fehlberg method
  B.3 Partial differential equations
    B.3.1 Introduction
    B.3.2 The finite differences method
    B.3.3 The finite element method
C Related publications
D Related projects
E Related software

List of Figures

2.1 Illustration of the arc length functional.
2.2 Illustration of the sum squared error functional.
2.3 The brachistochrone problem statement.
2.4 Analytical solution to the brachistochrone problem.
2.5 The isoperimetric problem statement.
2.6 Analytical solution to the isoperimetric problem.
2.7 Illustration of the Euler method.
2.8 Illustration of the Ritz method.
3.1 Activity diagram for the learning problem in the multilayer perceptron.
3.2 Perceptron neuron model.
3.3 A threshold activation function.
3.4 A sigmoid activation function.
3.5 A linear activation function.
3.6 The feed-forward network architecture.
3.7 A two-layer perceptron.
3.8 An activity diagram for the input-output process in the multilayer perceptron.
3.9 Geometrical representation of the objective function.
3.10 Illustration of the objective function gradient vector.
3.11 Training process in the multilayer perceptron.
3.12 Training process with the gradient descent training algorithm.
3.13 Training process with Newton's method.
3.14 Training process with the conjugate gradient training algorithm.
3.15 Training process with the quasi-Newton method.
3.16 Training process with the evolutionary algorithm.
3.17 Illustration of the roulette wheel selection method.
3.18 Illustration of the stochastic universal sampling selection method.
3.19 Illustration of line recombination.
3.20 Illustration of intermediate recombination.
3.21 Illustration of uniform mutation.
3.22 Illustration of normal mutation.
4.1 Network architecture for the yacht resistance problem.
4.2 Evaluation history for the yacht resistance problem.
4.3 Gradient norm history for the yacht resistance problem.
4.4 Linear regression analysis plot for the yacht resistance problem.
4.5 Airfoil self-noise mechanisms.
4.6 Optimal network architecture for the airfoil self-noise problem.
4.7 Evaluation history for the airfoil self-noise problem.
4.8 Gradient norm history for the airfoil self-noise problem.
4.9 Linear regression analysis in the airfoil noise problem.
5.1 The geodesic problem statement.
5.2 Network architecture for the geodesic problem.
5.3 Initial guess for the geodesic problem.
5.4 Evaluation history for the geodesic problem.
5.5 Gradient norm history for the geodesic problem.
5.6 Neural network results for the geodesic problem.
5.7 The brachistochrone problem statement.
5.8 Network architecture for the brachistochrone problem.
5.9 Initial guess for the brachistochrone problem.
5.10 Evaluation history for the brachistochrone problem.
5.11 Gradient norm history for the brachistochrone problem.
5.12 Neural network results for the brachistochrone problem.
5.13 The catenary problem statement.
5.14 Network architecture for the catenary problem.
5.15 Initial guess for the catenary problem.
5.16 Evaluation history for the catenary problem.
5.17 Gradient norm history for the catenary problem.
5.18 Neural network results for the catenary problem.
5.19 The isoperimetric problem statement.
5.20 Network architecture for the isoperimetric problem.
5.21 Initial guess for the isoperimetric problem.
5.22 Evaluation history for the isoperimetric problem.
5.23 Gradient norm history for the isoperimetric problem.
5.24 Neural network results for the isoperimetric problem.
6.1 The car problem statement.
6.2 Network architecture for the car problem.
6.3 Evaluation history for the car problem.
6.4 Gradient norm history for the car problem.
6.5 Neural network solution for the optimal acceleration in the car problem.
6.6 Neural network solution for the optimal deceleration in the car problem.
6.7 Corresponding optimal trajectory for the position in the car problem.
6.8 Corresponding optimal trajectory for the position in the car problem.
6.9 The fed batch fermenter.
6.10 Network architecture for the fed batch fermenter problem.
6.11 Evaluation history in the fed batch fermenter problem.
6.12 Gradient norm history in the fed batch fermenter problem.
6.13 Optimal control for the fed batch fermenter.
6.14 Optimal trajectory for the concentration of cell mass in the fed batch fermenter.
6.15 Optimal trajectory for the concentration of substrate in the fed batch fermenter.
6.16 Optimal trajectory for the concentration of product in the fed batch fermenter.
6.17 Optimal trajectory for the broth volume in the fed batch fermenter problem.
6.18 Optimal specific growth rate in the fed batch fermenter problem.
6.19 Optimal specific productivity in the fed batch fermenter problem.
6.20 Elevator deflection angle and pitch angle.
6.21 Desired altitude in the aircraft landing problem.
6.22 Network architecture for the aircraft landing problem.
6.23 Evaluation history for the aircraft landing problem.
6.24 Optimal control (elevator deflection angle) for the aircraft landing problem.
6.25 Optimal altitude trajectory for the aircraft landing problem.
6.26 Optimal altitude rate trajectory for the aircraft landing problem.
6.27 Optimal pitch angle trajectory for the aircraft landing problem.
6.28 Optimal pitch angle rate trajectory for the aircraft landing problem.
7.1 The boundary temperature estimation problem statement.
7.2 Analytical boundary temperature and corresponding center temperature.
7.3 Network architecture for the boundary temperature estimation problem.
7.4 Evaluation history for the boundary temperature estimation problem.
7.5 Gradient norm history for the boundary temperature estimation problem.
7.6 Neural network results for the boundary and corresponding center temperatures.
7.7 The thermal conductivity estimation problem statement.
7.8 Network architecture for the thermal conductivity estimation problem.
7.9 Evaluation history for the thermal conductivity estimation problem.
7.10 Gradient norm history for the thermal conductivity estimation problem.
7.11 Neural network results for the thermal conductivity estimation problem.
7.12 Vickers hardness test for aluminium alloy 2014-T6.
7.13 Vickers hardness test for aluminium alloy 7449-T79.
7.14 Network architecture for aluminium alloys 2014-T6 and 7449-T79.
7.15 Training history for aluminium alloy 2014-T6.
7.16 Training history for aluminium alloy 7449-T79.
7.17 Dissolution model for aluminium alloy 2014-T6.
7.18 Dissolution model for aluminium alloy 7449-T79.
8.1 The minimum drag problem statement.
8.2 Network architecture for the minimum drag problem.
8.3 Initial guess for the minimum drag problem.
8.4 Evaluation history for the minimum drag problem.
8.5 Gradient norm history for the minimum drag problem.
8.6 Neural network results for the minimum drag problem.
8.7 Network architecture for the design of an optimal airfoil.
8.8 Baseline for the design of an optimal airfoil.
8.9 Evaluation history in the optimal airfoil design problem.
8.10 Results for the camber in the optimal airfoil design problem.
8.11 Results for the thickness in the optimal airfoil design problem.
8.12 Results for the upper and lower surface coordinates in the optimal airfoil design problem.
8.13 Pressure coefficient distribution for the optimal airfoil design.
A.1 A conceptual diagram for the multilayer perceptron.
A.2 Aggregation of associations to the conceptual diagram.
A.3 Aggregation of derived classes to the association diagram.
A.4 Attributes and operations of the Perceptron class.
A.5 Attributes and operations of the class MultilayerPerceptron.
A.6 Attributes and operations of some ObjectiveFunctional classes.
A.7 Attributes and operations of the TrainingAlgorithm classes.

List of Tables

4.1 Basic data set statistics in the yacht resistance problem.
4.2 Training and validation errors in the yacht resistance problem.
4.3 Training results for the yacht resistance problem.
4.4 Linear regression analysis parameters for the yacht resistance problem.
4.5 Basic input-target data set statistics in the airfoil noise problem.
4.6 Training and validation errors in the airfoil noise problem.
4.7 Training results for the airfoil self-noise problem.
4.8 Linear regression analysis parameters for the airfoil self-noise problem.
5.1 Training results for the geodesic problem.
5.2 Training results for the brachistochrone problem.
5.3 Training results for the catenary problem.
5.4 Training results for the isoperimetric problem.
6.1 Training results for the car problem.
6.2 Minimum and maximum for pre and post-processing in the fermenter problem.
6.3 Training results for the fed batch fermenter problem.
6.4 Initial conditions for the aircraft landing problem.
6.5 Weight factors in the aircraft landing problem.
6.6 Training results for the aircraft landing problem.
7.1 Training results for the boundary temperature estimation problem.
7.2 Training results for the thermal conductivity estimation problem.
7.3 Parameters for aluminium alloys 2014-T6 and 7449-T79.
7.4 Training results for aluminium alloys 2014-T6 and 7449-T79.
8.1 Training results for the minimum drag problem.
8.2 Training operators and parameters for the minimum Eulerian drag problem.
8.3 Training results for the optimal airfoil design problem.
B.1 Cash and Karp parameters for the Runge-Kutta-Fehlberg method.

List of Applications

Applications listed by class

1. Function regression:
   Residuary resistance of sailing yachts modeling, see Section 4.2.
   Airfoil self-noise prediction, see Section 4.3.
2. Optimal control:
   Fed batch fermenter, see Section 6.3.
   Aircraft landing systems, see Section 6.4.
3. Inverse problems:
   Microstructural modeling of aluminium alloys, see Section 7.3.
4. Optimal shape design:
   Optimal airfoil design, see Section 8.3.

Applications listed by area

1. Aeronautical:
   Airfoil self-noise prediction, see Section 4.3.
   Optimal control of aircraft landing systems, see Section 6.4.
   Optimal airfoil design, see Section 8.3.
2. Chemical:
   Maximum yield in a fed batch fermenter, see Section 6.3.
3. Naval:
   Residuary resistance of sailing yachts prediction, see Section 4.2.
4. Metallurgical:
   Microstructural modeling of aluminium alloys, see Section 7.3.

Chapter 1. Introduction

Queen Dido of Carthage was apparently the first person to attack a problem that can readily be solved by using the calculus of variations. Dido, having been promised all of the land she could enclose with a bull's hide, cleverly cut the hide into many lengths and tied the ends together. Having done this, her problem was to find the closed curve with a fixed perimeter that encloses the maximum area [53]. The problem is based on a passage from Virgil's Aeneid:

  The Kingdom you see is Carthage, the Tyrians, the town of Agenor;
  But the country around is Libya, no folk to meet in war.
  Dido, who left the city of Tyre to escape her brother,
  Rules here - a long and labyrinthine tale of wrong
  Is hers, but I will touch on its salient points in order...
  Dido, in great disquiet, organised her friends for escape.
  They met together, all those who harshly hated the tyrant
  Or keenly feared him: they seized some ships which chanced to be ready...
  They came to this spot, where to-day you can behold the mighty
  Battlements and the rising citadel of New Carthage,
  And purchased a site, which was named Bull's Hide after the bargain
  By which they should get as much land as they could enclose with a bull's hide.

Although the circle appears to be an obvious solution to Dido's problem, proving this fact is rather difficult. Zenodorus proved that the area of the circle is larger than that of any polygon having the same perimeter, but the problem was not rigorously solved until 1838 by Jakob Steiner [99].

The history of variational calculus dates back to the ancient Greeks, but it was not until the seventeenth century in western Europe that substantial progress was made. A problem of historical interest is the brachistochrone problem, posed by Johann Bernoulli in 1696. The term brachistochrone derives from the Greek brachistos (the shortest) and chronos (time):

  Given two points A and B in a vertical plane, what is the curve traced out by a particle acted on only by gravity, which starts at A and reaches B in the shortest time?

Sir Isaac Newton was challenged to solve the problem, and did so the very next day.
In fact, the solution to the brachistochrone problem, which is a segment of a cycloid, is credited to Johann and Jacob Bernoulli, Sir Isaac Newton and Guillaume de L'Hopital [53].

In that context, the so-called Dido's problem was renamed the isoperimetric problem, and it was stated as:

  Of all simple closed curves in the plane of a given length, which encloses the maximum area?

Another important variational problem is the data modeling problem. A conventional approach is the method of least squares, which was first proposed by Adrien Marie Legendre and Carl Friedrich Gauss in the early nineteenth century as a way of inferring planetary trajectories from noisy data [6].

Note that, in all these problems, curves are sought which are optimal in some sense. More specifically, the aim of a variational problem is to find a function for which a specified functional takes on a minimal or maximal value. By a functional, we mean a correspondence which assigns a number to each function belonging to some class [29]. The calculus of variations gives methods for finding extremals of functionals, and problems that consist in finding minimal and maximal values of functionals are called variational problems [29].

From an engineering point of view, variational problems can be classified according to the way in which they are applied for a particular purpose. In this way, some classes of variational problems of practical interest are optimal control, inverse problems and optimal shape design. They are often defined by integrals, ordinary differential equations or partial differential equations.

Optimal control is playing an increasingly important role in the design of modern engineering systems. The aim here is the optimization, in some defined sense, of a physical process. More specifically, the objective of these problems is to determine the control signals that will cause a process to satisfy the physical constraints and at the same time minimize or maximize some performance criterion [53] [10]. As a simple example, consider the problem of a rocket launching a satellite into an orbit around the earth. An associated optimal control problem is to choose the controls (the thrust attitude angle and the rate of emission of the exhaust gases) so that the rocket takes the satellite into its prescribed orbit with minimum expenditure of fuel or in minimum time.

Inverse problems can be described as being opposed to direct problems. In a direct problem the cause is given, and the effect is determined. In an inverse problem the effect is given, and the cause is estimated [54] [90] [85]. There are two main types of inverse problems: input estimation, in which the system properties and output are known and the input is to be estimated; and properties estimation, in which the system input and output are known and the properties are to be estimated. Inverse problems can be found in many areas of science and engineering. A typical inverse problem in geophysics is to find the subsurface inhomogeneities from collected scattered fields caused by acoustic waves sent at the surface.

Optimal shape design is a very interesting field for industrial applications. The goal in these problems is to computerize the development process of some tool, and therefore shorten the time it takes to create or to improve an existing one. Being more precise, in an optimal shape design process one wishes to optimize some performance criterion involving the solution of a mathematical model with respect to its domain of definition [18]. One example is the design of airfoils, which proceeds from a knowledge of the boundary layer properties and the relation between geometry and pressure distribution [31] [74].
The performance goal here might vary: weight reduction, stress reinforcement, drag reduction and even noise reduction can be obtained. On the other hand, the airfoil may be required to achieve this performance with constraints on thickness, pitching moment, etc.

However, while some simple variational problems can be solved analytically by means of the Euler-Lagrange equation [29], the only practical technique for general problems is to approximate the solution using direct methods [29]. The fundamental idea underlying the so-called direct methods is to consider the variational problem as a limit problem for a function optimization problem in many dimensions [29]. Two direct methods of considerable concern are those due to Euler and Ritz [29]. Alternative techniques based on Laguerre polynomials [19], Legendre polynomials [22], Chebyshev polynomials [48] or, more recently, wavelets [86] [51], for instance, have also been proposed. Unfortunately, all these approaches run into difficulties: deficient approximation or convergence properties of the base functions, the usually large dimension of the resulting function optimization problem, the presence of local minima, and solutions presenting oscillatory behavior are some of the most typical complications. Therefore, new numerical methods need to be developed in order to overcome these difficulties.
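To make the idea of a direct method concrete, the following C++ sketch is an illustration added here rather than code from the thesis or from Flood. It applies a Ritz-type approach to a simple case: the arc length between the fixed points (0, 0) and (1, 1) is minimized over the trial family y(x) = x + sum_k a_k x^(k+1) (1 - x), which satisfies the boundary conditions by construction. The functional is evaluated with the trapezoidal rule and the coefficients are improved by a naive coordinate search; the trial basis, the optimizer and all tolerances are assumptions made only for this example.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Trial function y(x) = x + sum_k a_k * x^(k+1) * (1 - x); it satisfies
// y(0) = 0 and y(1) = 1 for any choice of the coefficients a_k.
double trial(double x, const std::vector<double>& a) {
    double y = x;
    for (std::size_t k = 0; k < a.size(); ++k) {
        y += a[k] * std::pow(x, static_cast<double>(k + 1)) * (1.0 - x);
    }
    return y;
}

// Arc length functional L[y] = integral of sqrt(1 + y'(x)^2) over [0, 1],
// evaluated with a finite-difference derivative and the trapezoidal rule.
double arc_length(const std::vector<double>& a, int n = 1000) {
    const double h = 1.0 / n;
    double sum = 0.0;
    for (int i = 0; i <= n; ++i) {
        const double x = i * h;
        const double x1 = std::min(x + h, 1.0), x0 = std::max(x - h, 0.0);
        const double dy = (trial(x1, a) - trial(x0, a)) / (x1 - x0);
        const double f = std::sqrt(1.0 + dy * dy);
        sum += (i == 0 || i == n) ? 0.5 * f : f;
    }
    return sum * h;
}

int main() {
    std::vector<double> a(3, 0.5);  // deliberately poor initial coefficients
    double step = 0.1;
    // Naive coordinate search: accept any coefficient change that lowers L[y].
    for (int it = 0; it < 200; ++it) {
        for (std::size_t k = 0; k < a.size(); ++k) {
            for (double d : {+step, -step}) {
                std::vector<double> b = a;
                b[k] += d;
                if (arc_length(b) < arc_length(a)) a = b;
            }
        }
        step *= 0.95;
    }
    std::printf("L[y] = %.6f (straight line gives sqrt(2) = %.6f)\n",
                arc_length(a), std::sqrt(2.0));
}

Running it drives the coefficients toward zero, so the approximation approaches the straight line joining the two points and the functional value approaches sqrt(2), which is exactly the finite-dimensional reduction of a variational problem that the direct methods above exploit.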
During its development, artificial intelligence has been moving toward new methods of knowledge representation and processing that are closer to human reasoning. In this regard, a new computational paradigm has been established with many developments and applications: artificial neural networks. An artificial neural network, or simply a neural network, can be defined as a biologically inspired computational model which consists of a network architecture composed of artificial neurons [41]. This structure contains a set of parameters, which can be adjusted to perform certain tasks.

Even though neural networks have similarities to the human brain, they are not meant to model it, but to be useful models for problem solving and knowledge engineering in a humanlike way. The human brain is much more complex and, unfortunately, many of its cognitive functions are still not well known. But the more that is learnt about the human brain, the better computational models can be developed and put to practical use.

One way to understand the ideas behind neural computation is to look at the history of this science [3]. McCulloch and Pitts described in 1943 a model of a neuron that is binary and has a fixed threshold [72]. A network of such neurons can perform logical operations, and it is capable of universal computation. Hebb, in his book published in 1949 [42], proposed neural network architectures and the first training algorithm. This is used to form a theory of how collections of cells might form a concept. Rosenblatt, in 1958, put together the ideas of McCulloch, Pitts and Hebb to present the perceptron neuron model and its training algorithm [87]. Minsky and Papert demonstrated the theoretical limits of the perceptron in 1969 [73]. Many researchers then abandoned neural networks and started to develop other artificial intelligence methods and systems.

New models, among them the associative memories [47], self-organizing networks [55], the multilayer perceptron and the back-propagation training algorithm [89], and the adaptive resonance theory (ART) [20], were developed later, which brought researchers back to the field of neural networks. Now, many more types of neural networks have been designed and used. The bidirectional associative memory, radial basis functions, probabilistic RAM neural networks, fuzzy neural networks and oscillatory neural networks are only a small number of the models developed.

Of all these kinds of neural network, the multilayer perceptron is a very important one, and much of the literature in the area refers to it. Traditionally, the learning problem for the multilayer perceptron has been formulated in terms of the minimization of an error function of the free parameters, in order to fit the neural network outputs to an input-target data set [15]. In that way, the learning tasks allowed are data modeling type problems, such as function regression, pattern recognition or time series prediction.

The function regression problem can be regarded as the problem of approximating a function from noisy or corrupted data [41]. Here the neural network learns from knowledge represented by a training data set consisting of input-target examples. The targets are a specification of what the response to the inputs should be [15].

The pattern recognition (or classification) problem can be stated as the process whereby a received pattern, characterized by a distinct set of features, is assigned to one of a prescribed number of classes [41]. Here the neural network learns from knowledge represented by a training data set consisting of input-target examples. The inputs include a set of features which characterize a pattern. The targets specify the class that each pattern belongs to [15].

The time series prediction problem can be regarded as the problem of predicting the future state of a system from a set of past observations of that system [41]. A time series prediction problem can also be solved by approximating a function from input-target data, where the inputs include data from past observations and the targets include the corresponding future observations in relation to those inputs [15].
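As a minimal illustration of this traditional error-minimization view (a sketch added here, not code from the thesis or from the Flood library), the following C++ program fits a single sigmoid neuron to a small input-target data set by gradient descent on the sum squared error; the data set, the architecture and the learning rate are assumptions chosen only for the example.

#include <cmath>
#include <cstdio>
#include <vector>

// One neuron: y(x) = sigmoid(w * x + b). This is the simplest member of the
// multilayer perceptron family discussed in the text.
struct Neuron {
    double w = 0.0, b = 0.0;
    double operator()(double x) const { return 1.0 / (1.0 + std::exp(-(w * x + b))); }
};

// Sum squared error over an input-target data set: E = sum_i (y(x_i) - t_i)^2.
double sum_squared_error(const Neuron& n, const std::vector<double>& x,
                         const std::vector<double>& t) {
    double e = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = n(x[i]) - t[i];
        e += d * d;
    }
    return e;
}

int main() {
    // Toy regression data: targets follow a smooth increasing trend.
    const std::vector<double> x = {-2.0, -1.0, 0.0, 1.0, 2.0};
    const std::vector<double> t = { 0.05, 0.20, 0.50, 0.80, 0.95};

    Neuron n;
    const double eta = 0.5;                      // learning rate (assumed)
    for (int epoch = 0; epoch < 2000; ++epoch) { // batch gradient descent on (w, b)
        double gw = 0.0, gb = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double y = n(x[i]);
            const double delta = 2.0 * (y - t[i]) * y * (1.0 - y); // dE/d(w*x+b)
            gw += delta * x[i];
            gb += delta;
        }
        n.w -= eta * gw;
        n.b -= eta * gb;
    }
    std::printf("E = %.6f, w = %.3f, b = %.3f\n",
                sum_squared_error(n, x, t), n.w, n.b);
}

The essential point is that the free parameters are adjusted until the network outputs match the targets; the variational formulation developed in this thesis replaces the error function over a data set by a more general objective functional over a function space.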
Regarding engineering applications, neural networks are nowadays being used to solve a whole range of problems [58]. The most common case studies include modeling hitherto intractable processes, designing complex feedback control signals, building meta-models for posterior optimization and detecting device faults. All these applications need an input-target data set in order to evaluate the error function, and therefore fall into one of the three categories of data modeling enumerated above.

In this work a variational formulation for the multilayer perceptron is presented. Within this formulation, the learning problem is stated in terms of solving a variational problem by minimizing an objective functional on the function space spanned by the neural network. The choice of a suitable objective functional depends on the particular application, and its evaluation might need the integration of functions, ordinary differential equations or partial differential equations.

As we will see, neural networks are not only able to solve data modeling applications, but also a wide range of mathematical and physical problems. More specifically, a variational formulation for neural networks provides a direct method for solving variational problems.

In order to validate this numerical method we train a multilayer perceptron to solve some classical problems in the calculus of variations, and compare the neural network results against the analytical results. This variational formulation is also applied to some traditional learning tasks, such as function regression. Finally, validation examples and real applications of different classes of variational problems in engineering, such as optimal control, inverse analysis and shape design, are also included.

It is important to mention that some efforts have already been made in this field. Zoppoli et al. [103] discussed the approximation properties of different classes of neural networks as linear combinations of non-fixed basis functions, and applied this theory to some stochastic functional optimization problems. In [91], Sarkar and Modak make use of a neural network to determine several optimal control profiles for different chemical reactors. Also, Franco-Lara and Weuster-Botz [35] estimated optimal feeding strategies for fed-batch bioprocesses using neural networks.

This work complements that research and introduces a number of novel issues. First, a conceptual theory for neural networks from a variational point of view is written. In particular, we introduce the idea of the function space spanned by a multilayer perceptron. This neural network can also be extended so as to include independent parameters, boundary conditions and bounds. On the other hand, learning tasks are here stated as variational problems. They are defined by an objective functional, which has a parameterized function associated with it. In addition, we explain the solution approach of the reduced function optimization problem by means of the training algorithm. Second, the range of engineering applications for neural networks is here augmented so as to include shape design, optimal control and inverse problems. The performance of this computational tool is studied through the solution of a validation case for each of these types of problems. Third, the embedding of different numerical methods, such as the Simpson method, the Runge-Kutta method or the finite element method, for evaluating the performance of a neural network is here considered.

In summary, the emphasis of this PhD Thesis is placed on demonstrating the applicability of the multilayer perceptron to various classes of variational problems in engineering. In this regard, a single software tool can be used for many different applications, saving engineers time and effort and therefore reducing cost to the companies.

All the examples presented here are written in a self-contained fashion, and the main concepts are described for every single example, so that readers need only look at the types of applications which interest them.

Together with this PhD Thesis there is the open source neural networks C++ library Flood [61]. It uses the same concepts and terminology as here, and contains everything needed to solve most of the applications included.

In Chapter 2 the basic mathematical theory concerning the calculus of variations is presented. These are the most important concepts from which the rest of this work is developed.

In Chapter 3 an extended class of multilayer perceptron is presented so as to include independent parameters, boundary conditions and lower and upper bounds. The learning problem for that neural network is then formulated from the perspective of functional analysis and variational calculus.

In Chapter 4 the task of data modeling is described from a variational point of view. In this way, function regression and pattern recognition fit into the variational formulation proposed. Several practical applications are also solved.
In Chapter 5 some classical problems in the calculus of variations are solved using neural networks. They are intended to be used for validation purposes, but also as a starting point for understanding the solution approach of more complex applications.

In Chapter 6 the most important aspects of optimal control are reviewed and this type of problem is formulated as a learning task for the multilayer perceptron. In order to validate this direct method an optimal control problem with analytical solution is solved. Two other real applications are also approached with a neural network.

In Chapter 7 the theory of inverse problems is introduced and stated as a possible application for neural networks. This is also validated through two different, artificially generated case studies.

In Chapter 8 the mathematical basis of optimal shape design is stated and the suitability of a multilayer perceptron for solving these problems enunciated. Validation examples and more complex applications are also included here.

In Annex A the software model of Flood [61] is constructed following a top-down approach. This provides different views of this class library, from the highest conceptual level down to the details.

In Annex B, and for the sake of completeness, some basics of numerical methods for the integration of functions, ordinary differential equations and partial differential equations are included. They are meant to be utilities to be used by the neural network when needed, in order to evaluate the objective functional.

Chapter 2. Preliminaries

The calculus of variations, also known as functional differentiation, is a branch of mathematics which gives methods for finding minimal and maximal values of functionals. In this way, problems that consist in finding extrema of functionals are called variational problems.

In this chapter we introduce the main concepts concerning optimum values of functionals. We also cite some methods to solve problems in the calculus of variations.

2.1 Extreme values of functionals

Variational problems involve determining a function for which a specific functional takes on a minimum or a maximum value. Here we introduce some concepts concerning functionals, their variations and their extreme values.

Vector spaces

To begin, let us introduce the concept of vector space.

Definition 1 (Vector space). A vector space $V$ is a set that is closed under the operations of element addition and scalar multiplication. That is, for any two elements $u, v \in V$ and any scalar $\alpha \in \mathbb{R}$,

1. $u + v \in V$.
2. $\alpha v \in V$.

Example 1. The Euclidean $n$-space, denoted $\mathbb{R}^n$ and consisting of the space of all $n$-tuples of real numbers $(u_1, u_2, \ldots, u_n)$, is a vector space. Elements of $\mathbb{R}^n$ are called $n$-vectors. The operation of element addition is componentwise; the operation of scalar multiplication is multiplication on each term separately.

Normed vector spaces

In a vector space, besides the operations of element addition and scalar multiplication, a norm is usually introduced [56].

Definition 2 (Normed vector space). A vector space $V$ is said to be normed if each element $u \in V$ is assigned a nonnegative number $\|u\|$, called the norm of $u$, such that

1. $\|u\| = 0$ if and only if $u = 0$.
2. $\|\alpha u\| = |\alpha| \, \|u\|$.
3. $\|u + v\| \leq \|u\| + \|v\|$.

Example 2. The Euclidean $n$-space is a normed vector space. A norm of an $n$-vector $u = (u_1, u_2, \ldots, u_n)$ can be

$$\|u\| = \sqrt{\sum_{i=1}^{n} u_i^2}. \qquad (2.1)$$

Function spaces

Here we introduce the concept of function space, which is directly related to the concept of normed vector space [56].

Definition 3 (Function space). A function space is a normed vector space whose elements are functions. Function spaces might be of infinite dimension.

Example 3. The space $C^n(x_a, x_b)$, consisting of all functions $y(x)$ defined on a closed interval $[x_a, x_b]$ which have bounded continuous derivatives up to order $n$, is a function space of infinite dimension. By addition of elements in $C^n$ and multiplication of elements in $C^n$ by scalars, we mean ordinary addition of functions and multiplication of functions by numbers. A norm in $C^n(x_a, x_b)$ can be

$$\|y(x)\|_n = \sum_{i=0}^{n} \sup_{x \in [x_a, x_b]} |y^{(i)}(x)|, \qquad (2.2)$$

where $y^{(0)}(x)$ denotes the function $y(x)$ itself and $y^{(i)}(x)$ its derivative of order $i$. Thus, two functions in $C^n$ are regarded as close together if the values of the functions themselves and of all their derivatives up to order $n$ are close together. It is easily verified that all the axioms of a normed linear space are satisfied for the space $C^n$.

Two important $C^n$ function spaces are $C^0$, the space of continuous functions, and $C^1$, the space of continuously differentiable functions.

Example 4. The space $P^n(x_a, x_b)$, consisting of all polynomials of order $n$ defined on an interval $[x_a, x_b]$, is a function space of dimension $n + 1$. Elements in $P^n$ are of the form

$$p_n(x) = \sum_{k=0}^{n} \alpha_k x^k. \qquad (2.3)$$

The operations of element addition and scalar multiplication in $P^n(x_a, x_b)$ are defined just as in Example 3. The $l$-norm is defined as

$$\|p_n\|_l = \left( \sum_{k=0}^{n} |\alpha_k|^l \right)^{1/l}, \qquad (2.4)$$

for $l \geq 1$. This formula gives the special cases

$$\|p_n\|_1 = \sum_{k=0}^{n} |\alpha_k|, \qquad (2.5)$$

$$\|p_n\|_2 = \sqrt{\sum_{k=0}^{n} |\alpha_k|^2}, \qquad (2.6)$$

$$\|p_n\|_\infty = \max_{0 \leq k \leq n} |\alpha_k|. \qquad (2.7)$$

It is easily verified that all the axioms of a normed linear space are satisfied for the space $P^n$.

Functionals

By a functional, we mean a correspondence which assigns a number to each function belonging to some class [56].

Definition 4 (Functional). Let $V$ be some function space. A functional $F[y(x)]$ is a correspondence which assigns a number $F \in \mathbb{R}$ to each function $y(x) \in V$,

$$F : V \to \mathbb{R}, \qquad y(x) \mapsto F[y(x)].$$

$V$ is called the domain of the functional.

Example 5 (Arc length). Let $A = (x_a, y_a)$ and $B = (x_b, y_b)$ be two points on the plane, and consider the collection of all functions $y(x) \in C^1(x_a, x_b)$ which connect $A$ to $B$, i.e., such that $y(x_a) = y_a$ and $y(x_b) = y_b$. The arc length $L$ of a curve $y(x)$ is a functional. The value $L[y(x)]$ is given by the integral

$$L[y(x)] = \int_A^B ds = \int_A^B \sqrt{dx^2 + dy^2} = \int_{x_a}^{x_b} \sqrt{1 + [y'(x)]^2} \, dx. \qquad (2.8)$$

Figure 2.1 represents the arc length functional graphically.

Example 6 (Sum squared error). Let $y(x)$ be a given function, and consider a collection of data points $(x_1, y_1), \ldots, (x_n, y_n)$. The squared error $E$ of the curve $y(x)$ with respect to the points $(x_1, y_1), \ldots, (x_n, y_n)$ is a functional. The value $E[y(x)]$ is given by

$$E[y(x)] = \sum_{i=1}^{n} (y(x_i) - y_i)^2. \qquad (2.9)$$

Figure 2.1: Illustration of the arc length functional.

Figure 2.2: Illustration of the sum squared error functional.

Figure 2.2 shows the sum squared error functional.
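As a small computational companion to Examples 5 and 6, added here for illustration only, the following C++ sketch evaluates both functionals for the concrete curve y(x) = x^2 on [0, 1]: the arc length (2.8) is approximated with the composite trapezoidal rule, and the sum squared error (2.9) is computed for a few assumed data points. The point is simply that each functional maps a whole function to a single number.

#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// The curve y(x) = x^2 and its derivative, used as the argument of both functionals.
double y(double x)  { return x * x; }
double dy(double x) { return 2.0 * x; }

// Arc length functional (2.8): L[y] = integral of sqrt(1 + y'(x)^2) over [a, b],
// approximated with the composite trapezoidal rule.
double arc_length(double a, double b, int n = 1000) {
    const double h = (b - a) / n;
    double sum = 0.0;
    for (int i = 0; i <= n; ++i) {
        const double x = a + i * h;
        const double f = std::sqrt(1.0 + dy(x) * dy(x));
        sum += (i == 0 || i == n) ? 0.5 * f : f;
    }
    return sum * h;
}

// Sum squared error functional (2.9): E[y] = sum_i (y(x_i) - y_i)^2.
double sum_squared_error(const std::vector<std::pair<double, double>>& data) {
    double e = 0.0;
    for (const auto& p : data) {
        const double d = y(p.first) - p.second;
        e += d * d;
    }
    return e;
}

int main() {
    // Assumed data points; any finite collection works.
    const std::vector<std::pair<double, double>> data = {
        {0.0, 0.1}, {0.5, 0.2}, {1.0, 0.9}};
    std::printf("L[y] = %.6f\n", arc_length(0.0, 1.0)); // exact value is about 1.478943
    std::printf("E[y] = %.6f\n", sum_squared_error(data));
}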
Both of these examples have one property in common that is a characteristic feature of all functionals: given a functional F, to each function y(x) there corresponds a unique number F[y(x)], just as when we have a function y, to each number x there corresponds a unique number y(x).

Continuity of functionals

The concept of continuity of functionals naturally appears in functional analysis [56].

Definition 5 (Continuous functional). Let V be some function space. A functional F[y(x)] defined on V is said to be continuous at y(x) ∈ V if for any ε > 0 there is a δ > 0 such that, if for some ŷ(x) ∈ V

‖y(x) - ŷ(x)‖ < δ,   (2.10)

then

|F[y(x)] - F[ŷ(x)]| < ε.   (2.11)

In the sequel we shall consider only continuous functionals and, for brevity, we shall omit the word continuous.

Linearity of functionals

Let us introduce the concept of linearity of functionals, which will be useful to us later [56].

Definition 6 (Linear functional). Let V be some function space. A functional F[y(x)] defined on V is said to be linear if

1. F[αy(x)] = αF[y(x)] for any α ∈ R and any y(x) ∈ V.
2. F[y_1(x) + y_2(x)] = F[y_1(x)] + F[y_2(x)] for any y_1(x), y_2(x) ∈ V.
3. F[y(x)] is continuous for any y(x) ∈ V.

The increment of a functional

In order to consider the variation of a functional, we must first define the concept of increment [53].

Definition 7 (Increment of a functional). Let V be some function space and let F be a functional defined on V. The increment of F, denoted by ΔF, is defined as

ΔF(y(x), δy(x)) = F[y(x) + δy(x)] - F[y(x)],   (2.12)

for any y(x), δy(x) ∈ V.

The variation of a functional

We are now ready for considering the variation (or differential) of a functional [53].

Definition 8 (Variation of a functional). Let V be some function space and let F be a functional defined on V. Let us write the increment of F in the form

ΔF[y(x), δy(x)] = δF[y(x), δy(x)] + G[y(x), δy(x)] ‖δy(x)‖,   (2.13)

where y(x), δy(x) ∈ V and δF is a linear functional. If

\lim_{‖δy(x)‖ → 0} G[y(x), δy(x)] = 0,   (2.14)

then F is said to be differentiable on y(x) and δF is called the variation (or differential) of F at y(x).

Extrema of functionals

Next we use the concept of variation to establish a necessary condition for a functional to have an extremum. We begin by introducing the concept of extrema of functionals [53].

Definition 9 (Extremum of a functional). Let V be some function space and let F be a functional defined on V. The function y*(x) is said to yield a relative minimum (maximum) for F if there is an ε > 0 such that

F[y(x)] - F[y*(x)] ≥ 0   (≤ 0),   (2.15)

for all y(x) ∈ V for which ‖y(x) - y*(x)‖ < ε.

If Equation (2.15) is satisfied for arbitrarily large ε, then F[y*(x)] is a global minimum (maximum). The function y*(x) is called an extremal, and F[y*(x)] is referred to as an extremum.

The fundamental theorem in the calculus of variations provides the necessary condition for a function to be an extremal of a functional [53].

Theorem 1 (The fundamental theorem in the calculus of variations). Let V be some function space and let F be a functional defined on V. If y(x) ∈ V is an extremal of F, then

δF[y(x), δy(x)] = 0,   (2.16)

for any δy(x) ∈ V.

2.2 The simplest variational problem

We shall now consider what might be called the simplest variational problem [36], which can be formulated as follows:

Problem 1 (The simplest variational problem). Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a and y(x_b) = y_b. Find a function y(x) ∈ V for which the functional

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx,   (2.17)

defined on V, takes on a minimum or maximum value.

In other words, the simplest variational problem consists of finding an extremum of a functional of the form (2.17), where the class of admissible functions consists of all smooth curves joining two points. Such variational problems can be solved analytically by means of the Euler-Lagrange equation [36].

Theorem 2 (Euler-Lagrange equation). Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a and y(x_b) = y_b, and let F[y(x)] be a functional defined on V of the form

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx.   (2.18)

Then, a necessary condition for the function y(x) ∈ V to be an extremum of F[y(x)] is that it satisfies the differential equation

\frac{∂F}{∂y} - \frac{d}{dx} \frac{∂F}{∂y'} = 0.   (2.19)

The Euler-Lagrange equation plays a fundamental role in the calculus of variations and it is, in general, a second order differential equation. The solution will then depend on two arbitrary constants, which can be determined from the boundary conditions y(x_a) = y_a and y(x_b) = y_b.

Example 7 (The brachistochrone problem). The statement of this problem is:

Given two points A = (x_a, y_a) and B = (x_b, y_b) in a vertical plane, what is the curve traced out by a particle acted on only by gravity, which starts at A and reaches B in the shortest time?

Figure 2.3 is a graphical statement of the brachistochrone problem.

Figure 2.3: The brachistochrone problem statement.

The time to travel from point A to point B is given by the integral

t = \int_{A}^{B} \frac{ds}{v(s)},   (2.20)

where s is the arc length and v is the speed. The speed at any point is given by a simple application of conservation of energy,

m g y_a = m g y + \frac{1}{2} m v^2,   (2.21)

where g = 9.81 m/s² is the gravitational acceleration. This equation gives

v = \sqrt{2 g (y_a - y)}.   (2.22)

Plugging this into Equation (2.20), together with the identity

ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + [y'(x)]^2} \, dx,   (2.23)

yields

t = \frac{1}{\sqrt{2g}} \int_{x_a}^{x_b} \sqrt{\frac{1 + [y'(x)]^2}{y_a - y(x)}} \, dx.   (2.24)

In this way, the brachistochrone problem can be formulated from a variational point of view as:

Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b], which are subject to the boundary conditions y(x_a) = y_a and y(x_b) = y_b. Find a function y(x) ∈ V for which the functional

T[y(x)] = \frac{1}{\sqrt{2g}} \int_{x_a}^{x_b} \sqrt{\frac{1 + [y'(x)]^2}{y_a - y(x)}} \, dx,   (2.25)

defined on V, takes on a minimum value.

The functional to be varied is thus

F[x, y(x), y'(x)] = \sqrt{\frac{1 + [y'(x)]^2}{y_a - y(x)}}.   (2.26)

Making use of the Euler-Lagrange equation, the set of parametric equations (x, y)(θ) for the brachistochrone are given by

x(θ) = x_a + r(θ - \sin θ),   (2.27)
y(θ) = y_a - r(1 - \cos θ),   (2.28)

for θ ∈ [θ_a, θ_b]. These are the equations of a cycloid, which is the locus of a point on the rim of a circle rolling along a horizontal line. By adjustments of the constants θ_a, θ_b and r, it is possible to construct one and only one cycloid which passes through the points (x_a, y_a) and (x_b, y_b).

Equations (2.27) and (2.28) provide a descent time which is an absolute minimum when compared with all other arcs. This descent time can be obtained as

T[(x, y)(θ)] = \frac{1}{\sqrt{2g}} \int_{0}^{θ_b} \sqrt{\frac{1 + \sin^2 θ / (1 - \cos θ)^2}{r (1 - \cos θ)}} \; r (1 - \cos θ) \, dθ = \sqrt{\frac{r}{g}} \, θ_b.   (2.29)

Taking, for example, the points A and B to be A = (0, 1) and B = (1, 0), Equations (2.27) and (2.28) become

x(θ) = 0.583 (θ - \sin θ),   (2.30)
y(θ) = 1 - 0.583 (1 - \cos θ),   (2.31)

for θ ∈ [0, 2.412]. Figure 2.4 shows the shape of the brachistochrone for this particular example.

Figure 2.4: Analytical solution to the brachistochrone problem.

Finally, Equation (2.29) gives the descent time for this chute,

T[(x, y)(θ)] = 0.577.   (2.32)
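Before moving on to constrained problems, it may help to see the functional (2.25) evaluated numerically for a candidate curve. The following fragment is a minimal sketch, not part of Flood: it applies the midpoint rule to the descent-time integral, which avoids the integrable singularity at x = x_a, and uses the straight line joining A = (0, 1) and B = (1, 0) as a placeholder candidate. Its descent time comes out noticeably larger than the 0.577 obtained for the cycloid in Equation (2.32).

    #include <cmath>
    #include <cstdio>

    // Descent time functional (2.25), evaluated with the midpoint rule. The
    // midpoint rule avoids the integrable singularity at x = xa, where
    // y(x) = ya and the denominator vanishes.
    double descent_time(double (*y)(double), double (*dydx)(double),
                        double xa, double xb, double ya, int n) {
        const double g = 9.81;
        const double h = (xb - xa) / n;
        double integral = 0.0;
        for (int i = 0; i < n; ++i) {
            const double x = xa + (i + 0.5) * h;
            integral += std::sqrt((1.0 + dydx(x) * dydx(x)) / (ya - y(x))) * h;
        }
        return integral / std::sqrt(2.0 * g);
    }

    // Candidate curve: the straight line joining A = (0, 1) and B = (1, 0).
    double line(double x) { return 1.0 - x; }
    double line_slope(double) { return -1.0; }

    int main() {
        // The straight line takes longer than the cycloid time of Equation (2.32).
        std::printf("T[line] = %f\n",
                    descent_time(line, line_slope, 0.0, 1.0, 1.0, 100000));
        return 0;
    }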

2.3 The simplest constrained variational problem

In the simplest variational problem we have considered so far, the class of admissible curves was specified, apart from smoothness requirements, by boundary conditions. However, there are many applications of the calculus of variations in which constraints are imposed on the admissible curves. Such constraints are expressed as functionals.

Problem 2 (The simplest constrained variational problem). Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a and y(x_b) = y_b, and such that

C[y(x)] = \int_{x_a}^{x_b} C[x, y(x), y'(x)] \, dx = 0.   (2.33)

Find a function y(x) ∈ V for which the functional

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx,   (2.34)

defined on V, takes on a minimum or a maximum value.

In other words, the simplest constrained variational problem consists of finding an extremum of a functional of the form (2.34), where the class of admissible functions consists of all smooth curves joining two points and satisfying some constraint of the form (2.33).

The most common procedure for solving the simplest constrained variational problem is the use of Lagrange multipliers [36]. In this method, the constrained problem is converted to an unconstrained problem. The resulting unconstrained problem can then be solved using the Euler-Lagrange equation.

Theorem 3 (Lagrange multipliers). Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a, y(x_b) = y_b, and such that

C[y(x)] = \int_{x_a}^{x_b} C[x, y(x), y'(x)] \, dx = 0.   (2.35)

Let y(x) be an extremal for the functional defined on V

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx.   (2.36)

Then, there exists a constant λ such that y(x) is also an extremal of the functional

\bar{F}[y(x)] = \int_{x_a}^{x_b} \big( F[x, y(x), y'(x)] + λ C[x, y(x), y'(x)] \big) \, dx = \int_{x_a}^{x_b} \bar{F}[x, y(x), y'(x)] \, dx.   (2.37)

The constant λ is called the Lagrange multiplier.

The simplest constrained variational problem can then be solved by means of the Euler-Lagrange equation for the functional \bar{F},

\frac{∂\bar{F}}{∂y} - \frac{d}{dx} \frac{∂\bar{F}}{∂y'} = 0.   (2.38)

Example 8 (The isoperimetric problem). Dido's problem, which is also known as the isoperimetric problem, can be solved by means of Lagrange multipliers and the Euler-Lagrange equation:

Of all simple closed curves in the plane of a given length l, which encloses the maximum area?

Figure 2.5 states this problem graphically.

Figure 2.5: The isoperimetric problem statement.

Here we cannot use a function y(x) to specify the curve, since closed curves will necessarily make the function multi-valued. Instead, we use the parametric equations x = x(t) and y = y(t), for t ∈ [0, 1], and such that x(0) = x(1), y(0) = y(1) (where no further intersections occur).

For a plane curve specified in parametric equations as (x, y)(t), the arc length is given by

l = \int_{0}^{1} \sqrt{[x'(t)]^2 + [y'(t)]^2} \, dt.   (2.39)

On the other hand, Green's theorem gives the signed area as

a = \frac{1}{2} \int_{0}^{1} \big( x(t) y'(t) - x'(t) y(t) \big) \, dt.   (2.40)

Thus, the isoperimetric problem can be formulated from a variational point of view as follows:

Let V be the space of all smooth functions (x, y)(t) defined on the interval [0, 1], which satisfy the boundary conditions x(0) = 0, x(1) = 0, y(0) = 0, y(1) = 0, and such that

L[(x, y)(t)] = \int_{0}^{1} \sqrt{[x'(t)]^2 + [y'(t)]^2} \, dt - l = 0.   (2.41)

Find a function (x, y)(t) ∈ V for which the functional defined on V

A[(x, y)(t)] = \frac{1}{2} \int_{0}^{1} \big( x(t) y'(t) - x'(t) y(t) \big) \, dt   (2.42)

takes on a maximum value.

We can reformulate this constrained variational problem as an unconstrained variational problem by means of Lagrange multipliers:

Let V be the space of all smooth functions (x, y)(t) defined on the interval [0, 1], which satisfy the boundary conditions x(0) = 0, x(1) = 0, y(0) = 0, y(1) = 0. Find a function (x, y)(t) ∈ V for which the functional defined on V

F[(x, y)(t)] = \int_{0}^{1} \Big( \sqrt{[x'(t)]^2 + [y'(t)]^2} + λ \big( x(t) y'(t) - x'(t) y(t) \big) \Big) \, dt   (2.43)

takes on a maximum value.

The functional to be varied is therefore

\bar{F}[x(t), x'(t), y(t), y'(t)] = \sqrt{[x'(t)]^2 + [y'(t)]^2} + λ \big( x(t) y'(t) - x'(t) y(t) \big).   (2.44)

Integrating the Euler-Lagrange equation here, a circle with radius r is obtained. A set of parametric equations (x, y)(t) for the circle is given by

x(t) = a + r \cos (2πt),   (2.45)
y(t) = b + r \sin (2πt),   (2.46)

for t ∈ [0, 1]. By adjustments of the constants a, b and r, it is possible to construct a circle with perimeter l which satisfies the boundary conditions (x(0), y(0)) = (0, 0) and (x(1), y(1)) = (0, 0).

Taking, for instance, the perimeter of the circle to be l = 1, Equations (2.45) and (2.46) become

x(t) = -\frac{1}{2π} + \frac{1}{2π} \cos (2πt),   (2.47)
y(t) = \frac{1}{2π} \sin (2πt),   (2.48)

for t ∈ [0, 1]. Figure 2.6 shows the shape of the circle for this particular example.

Figure 2.6: Analytical solution to the isoperimetric problem.

The area of such a circle is

A[(x, y)(t)] = \frac{1}{4π}.   (2.49)

2.4 Direct methods in variational problems

While many variational problems can be solved analytically by means of the Euler-Lagrange equation, the only practical technique for general variational problems is to approximate the solution using direct methods [29].

The fundamental idea underlying the so called direct methods is to consider a variational problem as a limit problem for some function optimization problem in many dimensions. This new problem is then solved by usual methods. In this way, the difference between variational problems and function optimization problems is in the number of dimensions. Variational problems entail infinite dimensions, while function optimization problems involve finite dimensions.

In this section we describe two direct methods, the Euler method and the Ritz method.

The Euler method

The first direct method studied here is the Euler method, also called the finite differences method [29]. In order to understand the Euler method, consider the simplest variational problem:

Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a, y(x_b) = y_b. Find a function y(x) ∈ V for which the functional defined on V

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx   (2.50)

takes on a minimum or a maximum value.

The main idea of the Euler method is that the values of the functional F[y(x)] are considered not along all admissible curves y(x) ∈ V, but along polygonal curves which consist of a number n + 1 of line segments with vertices at points

(x_a, y_a), (x_1, y_1), . . . , (x_n, y_n), (x_b, y_b),

where x_i = x_a + ih, for i = 1, . . . , n, and being h = (x_b - x_a)/(n + 1), see Figure 2.7.

Along such polygonal curves the functional F[y(x)] turns into a function f(y_1, . . . , y_n). The problem is then to choose y_1, . . . , y_n so that the function f(y_1, . . . , y_n) has an extremum. The necessary condition for the ordinates y_1, . . . , y_n to be an extremal of f is that they satisfy

∇f(y_1, . . . , y_n) = \left( \frac{∂f}{∂y_1}, . . . , \frac{∂f}{∂y_n} \right) = 0.   (2.51)

By doing so we shall have a polygonal curve which is an approximate solution to the variational problem in question.

Figure 2.7: Illustration of the Euler method.
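To make the reduction from a functional to a function concrete, the following fragment is a minimal sketch, not the Flood implementation, of the Euler discretization: the integral (2.50) is replaced by a sum over the segments of the polygonal curve, so that the functional becomes an ordinary function of the ordinates. The integrand used in main, the arc length of Example 5, and the interval are placeholder choices.

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // Euler (finite differences) method: along a polygonal curve with equally
    // spaced abscissas, the functional F[y(x)] = \int F(x, y, y') dx becomes an
    // ordinary function of the ordinates. The vector y holds all ordinates of the
    // polygonal curve, with y.front() = ya and y.back() = yb fixed by the
    // boundary conditions.
    double discretized_functional(const std::function<double(double, double, double)>& F,
                                  const std::vector<double>& y, double xa, double xb) {
        const int segments = static_cast<int>(y.size()) - 1;
        const double h = (xb - xa) / segments;
        double sum = 0.0;
        for (int i = 0; i < segments; ++i) {
            const double x = xa + i * h;
            const double slope = (y[i + 1] - y[i]) / h;   // forward-difference derivative
            sum += F(x, y[i], slope) * h;
        }
        return sum;
    }

    int main() {
        // Arc length integrand of Example 5: F(x, y, y') = sqrt(1 + y'^2).
        const auto arc_length = [](double, double, double slope) {
            return std::sqrt(1.0 + slope * slope);
        };

        // Polygonal curve joining A = (0, 0) and B = (1, 1) with its ordinates on
        // the straight line, which is the extremal of the arc length functional.
        std::vector<double> y(11);
        for (std::size_t i = 0; i < y.size(); ++i) y[i] = i / 10.0;

        std::printf("f(y_1, ..., y_n) = %f\n",
                    discretized_functional(arc_length, y, 0.0, 1.0));   // ~ sqrt(2)
        return 0;
    }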

The Ritz method

The second direct method studied in this section is the Ritz method [29]. As before, in order to understand the Ritz method, consider the simplest variational problem:

Let V be the space of all smooth functions y(x) defined on an interval [x_a, x_b] which satisfy the boundary conditions y(x_a) = y_a, y(x_b) = y_b. Find a function y(x) ∈ V for which the functional defined on V

F[y(x)] = \int_{x_a}^{x_b} F[x, y(x), y'(x)] \, dx   (2.52)

takes on a minimum or a maximum value.

The idea of the Ritz method is that the values of the functional F[y(x)] are considered not along all admissible curves y(x) ∈ V, but only along all possible linear combinations of a certain sequence of n functions g_1(x), . . . , g_n(x),

y(x) = \sum_{i=1}^{n} α_i g_i(x).   (2.53)

The elements in the function space defined by Equation (2.53) must satisfy the boundary conditions for a given problem, which is a restriction on the choice of the sequence of functions g_i(x). Along such linear combinations the functional F[y(x)] becomes a function of the coefficients, f(α_1, . . . , α_n). The problem is then to choose α_1, . . . , α_n so that the function f(α_1, . . . , α_n) has an extremum. The necessary condition for the coefficients α_1, . . . , α_n to be an extremal of f is that they satisfy

∇f(α_1, . . . , α_n) = \left( \frac{∂f}{∂α_1}, . . . , \frac{∂f}{∂α_n} \right) = 0.   (2.54)

By doing so we shall obtain a curve which is an approximate solution to the variational problem in question. The initial choice of the functions g_1, . . . , g_n, which are called coordinate functions, is of great importance, and therefore a successful application of Ritz's method depends on an adequate choice of the coordinate functions. Figure 2.8 illustrates this method.

Figure 2.8: Illustration of the Ritz method.
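The following fragment is a minimal sketch of the Ritz reduction, independent of Flood. The coordinate functions g_i(x) = sin(iπx) are a hypothetical choice that vanishes at x = 0 and x = 1, so homogeneous boundary conditions are satisfied automatically, and the illustrative functional being discretized, F[y] = \int_0^1 ( [y'(x)]^2/2 - y(x) ) dx, is likewise only an example; along the combinations (2.53) it becomes an ordinary function f(α_1, . . . , α_n) of the coefficients.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    const double pi = 3.141592653589793;

    // Coordinate functions g_i(x) = sin(i*pi*x) and their derivatives; they
    // vanish at x = 0 and x = 1, so every combination satisfies the boundary
    // conditions.
    double g(int i, double x)  { return std::sin(i * pi * x); }
    double dg(int i, double x) { return i * pi * std::cos(i * pi * x); }

    // Ritz trial function (2.53) and its derivative.
    double y(const std::vector<double>& alpha, double x) {
        double s = 0.0;
        for (std::size_t i = 0; i < alpha.size(); ++i) s += alpha[i] * g(int(i) + 1, x);
        return s;
    }
    double dydx(const std::vector<double>& alpha, double x) {
        double s = 0.0;
        for (std::size_t i = 0; i < alpha.size(); ++i) s += alpha[i] * dg(int(i) + 1, x);
        return s;
    }

    // Along the combinations (2.53), the illustrative functional
    // F[y] = \int_0^1 ( [y'(x)]^2 / 2 - y(x) ) dx becomes a function f(alpha).
    double f(const std::vector<double>& alpha, int points = 1000) {
        const double h = 1.0 / points;
        double sum = 0.0;
        for (int k = 0; k < points; ++k) {
            const double x = (k + 0.5) * h;
            const double yp = dydx(alpha, x);
            sum += (0.5 * yp * yp - y(alpha, x)) * h;
        }
        return sum;
    }

    int main() {
        std::printf("f(alpha) = %f\n", f({0.1, 0.0, 0.0}));
        return 0;
    }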
Chapter 3
A variational formulation for the multilayer perceptron

There are many different types of neural networks. The multilayer perceptron is an important one, and most of the literature in the field refers to this neural network. In this Chapter, the learning problem in the multilayer perceptron is formulated from a variational point of view. In this way, learning tasks are stated in terms of finding a function which causes some functional to assume an extreme value. As we shall see, the multilayer perceptron provides a general framework for solving variational problems.

3.1 Introduction

The multilayer perceptron is characterized by a neuron model, a network architecture and associated objective functionals and training algorithms. These four concepts are briefly described next.

1. Neuron model. A neuron model is a mathematical model of the behavior of a single neuron in a biological nervous system. The characteristic neuron model in the multilayer perceptron is the so called perceptron. The perceptron neuron model receives information in the form of a set of numerical input signals. This information is then integrated with a set of free parameters to produce a message in the form of a single numerical output signal.

2. Network architecture. In the same way a biological nervous system is composed of interconnected biological neurons, an artificial neural network is built up by organizing artificial neurons in a network architecture. In this way, the architecture of a network refers to the number of neurons, their arrangement and connectivity. The characteristic network architecture in the multilayer perceptron is the so called feed-forward architecture.

3. Objective functional. The objective functional plays an important role in the use of a neural network. It defines the task the neural network is required to do and provides a measure of the quality of the representation that the network is required to learn. The choice of a suitable objective functional depends on the particular application.

4. Training algorithm. The procedure used to carry out the learning process is called training algorithm, or learning algorithm. The training algorithm is applied to the network in order to obtain a desired performance. The type of training is determined by the way in which the adjustment of the free parameters in the neural network takes place.

Figure 3.1 depicts an activity diagram for the learning problem in the multilayer perceptron. The solving approach here consists of three steps. The first step is to choose a suitable parameterized function space in which the solution to the problem is to be approximated. The elements of this family of functions are those spanned by a multilayer perceptron. In the second step the variational problem is formulated by selecting an appropriate objective functional, defined on the function space chosen before. The third step is to solve the reduced function optimization problem. This is performed with a training algorithm capable of finding an optimal set of parameters.

Figure 3.1: Activity diagram for the learning problem in the multilayer perceptron.

3.2 The perceptron neuron model

As we have said, a neuron model is the basic information processing unit in a neural network. The perceptron is the characteristic neuron model in the multilayer perceptron. Following current practice [98], the term perceptron is here applied in a more general way than by Rosenblatt, and covers the types of units that were later derived from the original perceptron.

3.2.1 The elements of the perceptron

The block diagram in Figure 3.2 is a graphical representation of the perceptron. Here we identify three basic elements, which transform the input signals (x_1, . . . , x_n) into a single output signal y [13]:

- A set of free parameters α, which consists of a bias b and a vector of synaptic weights (w_1, . . . , w_n).
- A combination function h, which combines the input signals and the free parameters to produce a single net input signal u.
- An activation function or transfer function g, which takes as argument the net input signal and produces the output signal.

Figure 3.2: Perceptron neuron model.

Next we describe in more detail the three basic elements of this neuron model [41]:

Free parameters

The free parameters allow a neuron model to be trained to perform a task. In the perceptron, the set of free parameters is

α = (b, w) ∈ R × R^n,   (3.1)

where b is called the bias and w = (w_1, . . . , w_n) is called the synaptic weight vector. Note then that the number of free parameters of this neuron model is 1 + n, where n is the number of inputs in the neuron.

Combination function

In the perceptron, the combination function h computes the inner product of the input vector x = (x_1, . . . , x_n) and the synaptic weight vector w = (w_1, . . . , w_n) to produce a net input signal u. This model also includes a bias externally applied, denoted by b, which increases or reduces the net input signal to the activation function, depending on whether it is positive or negative, respectively. The bias is often represented as a synaptic weight connected to an input fixed to +1,

h(x; b, w) = b + \sum_{i=1}^{n} w_i x_i.   (3.2)

Activation function

The activation function g defines the output signal y from the neuron in terms of its net input signal u. In practice we can consider many useful activation functions [27]. Three of the most used activation functions are the threshold function, the linear function and the sigmoid function [41].

Threshold function. For this activation function, represented in Figure 3.3, we have

g(u) = \begin{cases} -1 & u < 0 \\ 1 & u ≥ 0 \end{cases}   (3.3)

Figure 3.3: A threshold activation function.

That is, the threshold activation function limits the output of the neuron to -1 if the net input u is less than 0, or 1 if u is equal to or greater than 0.

There might be some cases when we need to compute the activation derivative of the neuron,

g'(u) ≡ \frac{dg(u)}{du},   (3.4)

or its second derivative,

g''(u) ≡ \frac{d^2 g(u)}{du^2}.   (3.5)

The problem with the threshold activation function is that it is not differentiable at the point u = 0.

Sigmoid function. The sigmoid function is the most used activation function when constructing neural networks. It is a monotonically increasing function which exhibits a good balance between a linear and a non-linear behavior. An example of a sigmoid function is the hyperbolic tangent function, defined by

g(u) = \tanh (u).   (3.6)

Figure 3.4: A sigmoid activation function.

The hyperbolic tangent function is represented in Figure 3.4. For this sigmoid function, the activation derivative is given by

g'(u) = 1 - \tanh^2 (u),   (3.7)

and the second derivative of the activation is given by

g''(u) = -2 \tanh (u) (1 - \tanh^2 (u)).   (3.8)

Linear function. For the linear activation function, described in Figure 3.5, we have

g(u) = u.   (3.9)

Thus, the output signal of a neuron model with linear activation function is equal to its net input.

For the linear function, the activation derivative is given by

g'(u) = 1,   (3.10)
and the second derivative of the activation is given by

g''(u) = 0.   (3.11)

Figure 3.5: A linear activation function.
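Since these derivatives reappear when the Jacobian matrix is computed in Section 3.3.3, it is convenient to have them written down once. The following fragment is a small, self-contained sketch of the hyperbolic tangent, linear and threshold activations and their derivatives, following Equations (3.3) to (3.11); it is not the Flood implementation.

    #include <cmath>
    #include <cstdio>

    // Hyperbolic tangent activation (3.6) and its derivatives (3.7)-(3.8).
    double sigmoid(double u)                   { return std::tanh(u); }
    double sigmoid_derivative(double u)        { const double t = std::tanh(u); return 1.0 - t * t; }
    double sigmoid_second_derivative(double u) { const double t = std::tanh(u); return -2.0 * t * (1.0 - t * t); }

    // Linear activation (3.9)-(3.11).
    double linear(double u)                    { return u; }
    double linear_derivative(double)           { return 1.0; }
    double linear_second_derivative(double)    { return 0.0; }

    // Threshold activation (3.3); note that it is not differentiable at u = 0.
    double threshold(double u) { return u < 0.0 ? -1.0 : 1.0; }

    int main() {
        std::printf("tanh(0.5) = %f, derivative = %f\n", sigmoid(0.5), sigmoid_derivative(0.5));
        std::printf("linear(0.5) = %f, threshold(-0.5) = %f\n", linear(0.5), threshold(-0.5));
        return 0;
    }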

3.2.2 The perceptron function space

Mathematically, a perceptron neuron model may be viewed as a parameterized function space V from an input X ⊆ R^n to an output Y ⊆ R. Elements of V are parameterized by the bias and the vector of synaptic weights of the neuron. In this way the dimension of V is n + 1.

We can write down an expression for the elements of the function space which a perceptron can define [15]. The net input to the neuron is obtained by first forming a linear combination of the input signals and the synaptic weights, and then adding the bias, to give

u = b + \sum_{i=1}^{n} w_i x_i.   (3.12)

The output of the neuron is obtained transforming the linear combination in Equation (3.12) with an activation function g to give

y(x; b, w) = g\left( b + \sum_{i=1}^{n} w_i x_i \right).   (3.13)
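The following fragment sketches Equation (3.13) directly in C++. It is only an illustration and does not reproduce the LinearPerceptron or SigmoidPerceptron classes of Flood; the free parameters in main are hypothetical values chosen for the example.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    double tanh_activation(double u) { return std::tanh(u); }

    // A perceptron as a parameterized function y(x; b, w), Equation (3.13).
    struct Perceptron {
        double bias;                           // b
        std::vector<double> weights;           // (w_1, ..., w_n)
        double (*activation)(double);          // g

        double output(const std::vector<double>& x) const {
            double u = bias;                   // combination function (3.2)
            for (std::size_t i = 0; i < x.size(); ++i) u += weights[i] * x[i];
            return activation(u);              // activation function
        }
    };

    int main() {
        // Hypothetical two-input sigmoid perceptron.
        const Perceptron p{-0.5, {1.0, 2.0}, tanh_activation};
        std::printf("y = %f\n", p.output({0.3, 0.7}));
        return 0;
    }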
Distinct activation functions cause distinct families of functions which a perceptron can define. Similarly, distinct sets of free parameters cause distinct elements in the function space which a specific perceptron defines. The concepts of linear and sigmoid perceptron are implemented within the classes LinearPerceptron and SigmoidPerceptron of the C++ library Flood [61].

Although a single perceptron can solve some simple learning tasks, the power of neural computation comes from connecting many neurons in a network architecture [98].

3.3 The multilayer perceptron network architecture

Neurons can be combined to form a neural network. The architecture of a neural network refers to the number of neurons, their arrangement and connectivity. Any network architecture can be symbolized as a directed and labeled graph, where nodes represent neurons and edges represent connectivities among neurons. An edge label represents the free parameter of the neuron for which the flow goes in [13].

Most neural networks, even biological neural networks, exhibit a layered structure. In this work layers are the basis to determine the architecture of a neural network [98]. Thus, a neural network typically consists of a set of sensorial nodes which constitute the input layer, one or more hidden layers of neurons and a set of neurons which constitute the output layer.

As it was said above, the characteristic neuron model of the multilayer perceptron is the perceptron. On the other hand, the multilayer perceptron has a feed-forward network architecture.

3.3.1 Feed-forward architectures

Feed-forward architectures contain no cycles, i.e., the architecture of a feed-forward neural network can then be represented as an acyclic graph. Hence, neurons in a feed-forward neural network are grouped into a sequence of c + 1 layers L^{(1)}, . . . , L^{(c+1)}, so that neurons in any layer are connected only to neurons in the next layer. The input layer L^{(0)} consists of n external inputs and is not counted as a layer of neurons; the hidden layers L^{(1)}, . . . , L^{(c)} contain h_1, . . . , h_c hidden neurons, respectively; and the output layer L^{(c+1)} is composed of m output neurons. Communication proceeds layer by layer from the input layer via the hidden layers up to the output layer. The states of the output neurons represent the result of the computation [98].

Figure 3.6 shows a feed-forward architecture, with n inputs, c hidden layers with h_i neurons, for i = 1, . . . , c, and m neurons in the output layer.

Figure 3.6: The feed-forward network architecture.

In this way, in a feed-forward neural network, the output of each neuron is a function of the inputs. Thus, given an input to such a neural network, the activations of all neurons in the output layer can be computed in a deterministic pass [15].

3.3.2 The multilayer perceptron function space

In Section 3.2 we considered the space of functions that a perceptron neuron model can define. As it happens with a single perceptron, a multilayer perceptron neural network may be viewed as a parameterized function space V from an input X ⊆ R^n to an output Y ⊆ R^m. Elements of V are parameterized by the biases and synaptic weights in the neural network, which can be grouped together in an s-dimensional vector α = (α_1, . . . , α_s). The dimension of the function space V is therefore s.

Figure 3.7 shows a multilayer perceptron, with n inputs, one hidden layer with h_1 neurons and m neurons in the output layer. Biases in the hidden layer are represented as synaptic weights from an extra input with a fixed value of x_0 = 1. Similarly, biases in the output layer are represented as synaptic weights from an extra hidden neuron, with an activation also fixed to y^{(1)}_0 = 1.

Figure 3.7: A two-layer perceptron.

We can write down the analytical expression for the elements of the function space which the multilayer perceptron shown in Figure 3.7 can define [15]. Indeed, the net input to the hidden neuron j is obtained by first forming a linear combination of the n input signals, and adding the bias, to give

u^{(1)}_j = b^{(1)}_j + \sum_{i=1}^{n} w^{(1)}_{ji} x_i = \sum_{i=0}^{n} α^{(1)}_{ji} x_i,   (3.14)

for j = 1, . . . , h_1. Here α^{(1)}_{j0} denotes the bias of neuron j in layer (1), for which x_0 = 1, and α^{(1)}_{ji}, for i = 1, . . . , n, denotes the synaptic weight of neuron j in layer (1) which comes from input i.

The output of hidden neuron j is obtained transforming the linear combination in Equation (3.14) with an activation function g^{(1)} to give

y^{(1)}_j = g^{(1)}\left( u^{(1)}_j \right),   (3.15)

for j = 1, . . . , h_1. The neural network outputs are obtained transforming the output signals of the neurons in the hidden layer by the neurons in the output layer. Thus, the net input for each output neuron k is obtained forming a linear combination of the output signals from the hidden layer neurons of the form

u^{(2)}_k = b^{(2)}_k + \sum_{j=1}^{h_1} w^{(2)}_{kj} y^{(1)}_j = \sum_{j=0}^{h_1} α^{(2)}_{kj} y^{(1)}_j,   (3.16)

for k = 1, . . . , m. The value α^{(2)}_{k0} denotes the bias of neuron k in layer (2), for which y^{(1)}_0 = 1; similarly, the value α^{(2)}_{kj}, j = 1, . . . , h_1, denotes the synaptic weight of neuron k in layer (2) which comes from input j.

The activation of the output neuron k is obtained transforming the linear combination in Equation (3.16) with an activation function g^{(2)} to give

y_k = g^{(2)}\left( u^{(2)}_k \right),   (3.17)
for k = 1, . . . , m. Note that the activation function in the output layer does not need to be the same as that in the hidden layer.

If we combine (3.14), (3.15), (3.16) and (3.17) we obtain an explicit expression for the function represented by the neural network diagram in Figure 3.7 of the form

y_k = g^{(2)}\left( \sum_{j=0}^{h_1} α^{(2)}_{kj} \, g^{(1)}\left( \sum_{i=0}^{n} α^{(1)}_{ji} x_i \right) \right),   (3.18)

for k = 1, . . . , m. In this way, the multilayer perceptron can be considered as a function of many variables composed by superposition and addition of functions of one variable. The neural network shown in Figure 3.7 corresponds to a transformation of the input variables by two successive networks with a single layer. This class of networks can be extended by considering new successive transformations of the same kind, which correspond to networks with more hidden layers.

Distinct activation functions cause distinct families of functions which a multilayer perceptron can define. Similarly, distinct sets of free parameters cause distinct elements in the function space which a specific multilayer perceptron defines.

A multilayer perceptron with a sigmoid hidden layer and a linear output layer is implemented within the C++ class MultilayerPerceptron of Flood [61].
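As an illustration of Equation (3.18), the following fragment computes the outputs of a one-hidden-layer network with a hyperbolic tangent hidden layer and a linear output layer. It is a minimal sketch rather than the MultilayerPerceptron class of Flood, and the weight values in main are hypothetical.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Matrix = std::vector<std::vector<double>>;

    // Forward pass of Equation (3.18) for one hidden layer: tanh hidden
    // activations and a linear output layer. w1[j] holds the bias and weights
    // of hidden neuron j, w2[k] those of output neuron k (bias first, as x_0 = 1).
    std::vector<double> forward(const Matrix& w1, const Matrix& w2,
                                const std::vector<double>& x) {
        std::vector<double> hidden(w1.size());
        for (std::size_t j = 0; j < w1.size(); ++j) {
            double u = w1[j][0];                               // bias, Equation (3.14)
            for (std::size_t i = 0; i < x.size(); ++i) u += w1[j][i + 1] * x[i];
            hidden[j] = std::tanh(u);                          // Equation (3.15)
        }
        std::vector<double> y(w2.size());
        for (std::size_t k = 0; k < w2.size(); ++k) {
            double u = w2[k][0];                               // bias, Equation (3.16)
            for (std::size_t j = 0; j < hidden.size(); ++j) u += w2[k][j + 1] * hidden[j];
            y[k] = u;                                          // linear output, Equation (3.17)
        }
        return y;
    }

    int main() {
        // Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
        const Matrix w1 = {{0.1, 0.5, -0.3}, {-0.2, 0.8, 0.4}};
        const Matrix w2 = {{0.0, 1.0, -1.0}};
        std::printf("y_1 = %f\n", forward(w1, w2, {0.2, 0.6})[0]);
        return 0;
    }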

3.3.3 The Jacobian matrix

There are some cases when, in order to evaluate the objective functional of a multilayer perceptron, we need to compute the derivatives of the network outputs with respect to the inputs. These derivatives can be grouped together in the Jacobian matrix, whose elements are given by

J_{ki} ≡ \frac{∂y_k}{∂x_i},   (3.19)

for i = 1, . . . , n, k = 1, . . . , m, and where each such derivative is evaluated with all other inputs held fixed.

In Sections 5.2 and 5.4 we make use of the Jacobian matrix to evaluate two different objective functionals for the multilayer perceptron.

The Jacobian matrix can be evaluated either by using a back-propagation procedure, or by means of numerical differentiation.

The back-propagation algorithm for the calculus of the Jacobian matrix

Here we evaluate the Jacobian matrix for the multilayer perceptron using the back-propagation algorithm. Results are derived for the one hidden layer perceptron shown in Figure 3.7, but they are easily generalized to networks with several hidden layers of neurons.

We start by writing the element J_{ki} in the form

J_{ki} = \frac{∂y_k}{∂x_i} = \sum_{j=1}^{h_1} \frac{∂y_k}{∂u^{(1)}_j} \frac{∂u^{(1)}_j}{∂x_i},   (3.20)

for i = 1, . . . , n and k = 1, . . . , m. In order to determine the derivative ∂u^{(1)}_j / ∂x_i let us consider the net input signal for each neuron in the hidden layer,

u^{(1)}_j = \sum_{i=0}^{n} α^{(1)}_{ji} x_i,   (3.21)

for j = 1, . . . , h_1. Thus, the mentioned derivative yields

\frac{∂u^{(1)}_j}{∂x_i} = α^{(1)}_{ji},   (3.22)

for i = 1, . . . , n and j = 1, . . . , h_1. Equation (3.20) becomes

J_{ki} = \sum_{j=1}^{h_1} α^{(1)}_{ji} \frac{∂y_k}{∂u^{(1)}_j},   (3.23)

for i = 1, . . . , n and k = 1, . . . , m. We now write down a recursive back-propagation formula to determine the derivatives ∂y_k / ∂u^{(1)}_j,

\frac{∂y_k}{∂u^{(1)}_j} = \sum_{l=1}^{m} \frac{∂y_k}{∂u^{(2)}_l} \frac{∂u^{(2)}_l}{∂u^{(1)}_j},   (3.24)

for j = 1, . . . , h_1 and k = 1, . . . , m. The derivative ∂u^{(2)}_l / ∂u^{(1)}_j can be calculated by first considering the net input signal of each neuron in the output layer,

u^{(2)}_l = \sum_{j=0}^{h_1} α^{(2)}_{lj} y^{(1)}_j,   (3.25)

for l = 1, . . . , m. The activation of each neuron in the hidden layer is given by

y^{(1)}_j = g^{(1)}(u^{(1)}_j),   (3.26)

for j = 1, . . . , h_1. So we can write an expression for the net input signal of each neuron in the output layer in the form

u^{(2)}_l = \sum_{j=0}^{h_1} α^{(2)}_{lj} \, g^{(1)}(u^{(1)}_j),   (3.27)

for l = 1, . . . , m. Thus, the derivative ∂u^{(2)}_l / ∂u^{(1)}_j yields

\frac{∂u^{(2)}_l}{∂u^{(1)}_j} = w^{(2)}_{lj} \, g^{(1)'}(u^{(1)}_j),   (3.28)

for j = 1, . . . , h_1 and l = 1, . . . , m. Equation (3.24) becomes

\frac{∂y_k}{∂u^{(1)}_j} = g^{(1)'}(u^{(1)}_j) \sum_{l=1}^{m} α^{(2)}_{lj} \frac{∂y_k}{∂u^{(2)}_l},   (3.29)

for j = 1, . . . , h_1 and k = 1, . . . , m. For the output neurons we have

y_k = g^{(2)}(u^{(2)}_k),   (3.30)

for k = 1, . . . , m, and from which

\frac{∂y_k}{∂u^{(2)}_{k'}} = g^{(2)'}(u^{(2)}_k) \, δ_{kk'}.   (3.31)

Here δ_{kk'} is the Kronecker delta symbol, which equals 1 if k = k' and 0 otherwise. We can therefore summarize the procedure for evaluating the Jacobian matrix as follows [15]:

1. Apply the input vector corresponding to the point in input space at which the Jacobian matrix is to be found, and forward propagate in the usual way to obtain the activations of all of the hidden and output neurons in the network.

2. For each row k of the Jacobian matrix, corresponding to the output neuron k, back-propagate to the hidden units in the network using the recursive relation (3.29), and starting with (3.31).

3. Use (3.23) to do the back-propagation to the inputs. The second and third steps are then repeated for each value of k, corresponding to each row of the Jacobian matrix.

A C++ software implementation of this algorithm can be found inside the class MultilayerPerceptron of Flood [61].
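To make the three steps concrete for the network of Figure 3.7, the fragment below is a minimal sketch, again not the Flood code, that applies Equations (3.23), (3.29) and (3.31) to a one-hidden-layer network with tanh hidden activations and a linear output layer, for which g^{(2)'} = 1; the weights in main are hypothetical.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Matrix = std::vector<std::vector<double>>;

    // Jacobian J_ki = dy_k/dx_i of a one-hidden-layer network with tanh hidden
    // activations and a linear output layer, following (3.23), (3.29) and (3.31).
    // w1[j] = {bias, weights} of hidden neuron j; w2[k] = {bias, weights} of
    // output neuron k.
    Matrix jacobian(const Matrix& w1, const Matrix& w2, const std::vector<double>& x) {
        const std::size_t h1 = w1.size(), m = w2.size(), n = x.size();

        // Step 1: forward propagate to obtain the hidden derivatives g^(1)'(u_j).
        std::vector<double> dg1(h1);
        for (std::size_t j = 0; j < h1; ++j) {
            double u = w1[j][0];
            for (std::size_t i = 0; i < n; ++i) u += w1[j][i + 1] * x[i];
            const double t = std::tanh(u);
            dg1[j] = 1.0 - t * t;                               // Equation (3.7)
        }

        Matrix J(m, std::vector<double>(n, 0.0));
        for (std::size_t k = 0; k < m; ++k) {
            for (std::size_t j = 0; j < h1; ++j) {
                // Step 2: back-propagate dy_k/du^(1)_j, Equation (3.29); the
                // linear output layer makes g^(2)' = 1 in Equation (3.31).
                const double dy_du1 = dg1[j] * w2[k][j + 1];
                // Step 3: back-propagate to the inputs, Equation (3.23).
                for (std::size_t i = 0; i < n; ++i) J[k][i] += w1[j][i + 1] * dy_du1;
            }
        }
        return J;
    }

    int main() {
        const Matrix w1 = {{0.1, 0.5, -0.3}, {-0.2, 0.8, 0.4}};   // hypothetical weights
        const Matrix w2 = {{0.0, 1.0, -1.0}};
        const Matrix J = jacobian(w1, w2, {0.2, 0.6});
        std::printf("J = [%f, %f]\n", J[0][0], J[0][1]);
        return 0;
    }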

Numerical differentiation for the calculus of the Jacobian matrix

The Jacobian matrix for the multilayer perceptron J_{ki} can also be evaluated using numerical differentiation [15]. This can be done by perturbing each input in turn, and approximating the derivatives by using forward differences,

\frac{∂y_k}{∂x_i} = \frac{y_k(x_i + ε) - y_k(x_i)}{ε} + O(ε),   (3.32)

for i = 1, . . . , n, k = 1, . . . , m and for some small numerical value of ε.

The accuracy of the finite differences method can be improved significantly by using central differences of the form

\frac{∂y_k}{∂x_i} = \frac{y_k(x_i + ε) - y_k(x_i - ε)}{2ε} + O(ε^2),   (3.33)

also for i = 1, . . . , n, k = 1, . . . , m and for some small numerical value of ε.

In a software implementation the Jacobian matrix for the multilayer perceptron J_{ki} should be evaluated using the back-propagation algorithm, since this gives the greatest accuracy and numerical efficiency [15]. The implementation of such an algorithm should be checked against numerical differentiation.
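Such a check is straightforward to write. The fragment below is a generic sketch of the central difference approximation (3.33): it perturbs each input of an arbitrary vector-valued map in turn, so its output can be compared entry by entry with the Jacobian obtained from back-propagation; the map used in main is a placeholder.

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // Central difference approximation (3.33) of the Jacobian J_ki = dy_k/dx_i
    // for any vector-valued map y(x), useful as a check of back-propagation.
    std::vector<std::vector<double>> numerical_jacobian(
        const std::function<std::vector<double>(const std::vector<double>&)>& y,
        std::vector<double> x, double epsilon = 1.0e-6) {
        const std::vector<double> y0 = y(x);
        std::vector<std::vector<double>> J(y0.size(), std::vector<double>(x.size()));
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double xi = x[i];
            x[i] = xi + epsilon; const std::vector<double> y_plus = y(x);
            x[i] = xi - epsilon; const std::vector<double> y_minus = y(x);
            x[i] = xi;                                   // restore the perturbed input
            for (std::size_t k = 0; k < y0.size(); ++k)
                J[k][i] = (y_plus[k] - y_minus[k]) / (2.0 * epsilon);
        }
        return J;
    }

    int main() {
        // Placeholder map y(x) = (x_1 x_2, sin x_1); its exact Jacobian is known.
        const auto y = [](const std::vector<double>& x) {
            return std::vector<double>{x[0] * x[1], std::sin(x[0])};
        };
        const auto J = numerical_jacobian(y, {0.2, 0.6});
        std::printf("J = [[%f, %f], [%f, %f]]\n", J[0][0], J[0][1], J[1][0], J[1][1]);
        return 0;
    }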