
Multilayer Perceptron Network with Modified Sigmoid Activation Functions

Tobias Ebert, Oliver Bänfer, and Oliver Nelles

University of Siegen, Department of Mechanical Engineering,
D-57068 Siegen, Germany
[email protected]

Abstract. Models in today's microcontrollers, e.g. engine control units, are realized with a multitude of characteristic curves and look-up tables. The increasing complexity of these models causes an exponential growth of the required calibration memory. Hence, neural networks, e.g. multilayer perceptron networks (MLP), which provide a solution for this problem, become more important for modeling. Usually sigmoid functions are used as activation functions. The calculation of the exponential function required for this is very demanding on low-performance microcontrollers. Thus, in this paper a modified activation function for the efficient implementation of MLP networks is proposed. The advantages of such networks compared to standard look-up tables are illustrated by the application to an intake manifold model of a combustion engine.

Keywords: nonlinear system identification, neural network, multilayer perceptron network, modified activation function.

1 Introduction

The requirements of modern vehicles with respect to fuel consumption, emissions and driveability can be fulfilled only with increasing variability of the combustion engine. The corresponding actuators are driven by the engine control unit (ECU), which requires models of growing complexity. For the model of the cylinder air mass an exemplary comparison of manifold absolute pressure (MAP) based methods and a neural network is performed. Both approaches benefit from taking into account prior knowledge about the physics of the modelled system for the model generation. While MAP-based methods are the best choice for simple systems with few degrees of freedom, the modeling effort for this method grows exponentially with increasing complexity. Thus, for complex models, i.e. for models with more than five inputs, neural networks such as multilayer perceptron (MLP) networks are superior. Furthermore, some functions in the ECU, e.g. the intake manifold model, face the challenge that the outputs of the model have to be computed in real time, because the results are necessary for each cycle of the combustion engine. Thus an efficient implementation of the MLP on the inbuilt microcontroller is of particular importance.



The computation of the standard activation function (AF) is not possible, because the calculation of an exponential function is too demanding for the microcontroller. The only way to implement this standard AF would be to store the function in a look-up table (LUT). A more promising method is to replace the sigmoid with a polynomial of similar form. This has the crucial advantage that the complex computation of the exponential function is reduced to a simple multiplication for the polynomial function.

This article is organized as follows. Section 2 gives an overview of multilayer perceptron networks and specifies their advantages. In Sect. 3 the standard sigmoid activation function and a modified activation function, consisting of piecewise polynomial functions, are introduced and their properties are discussed. The modified activation function is implemented in an intake manifold model and the results are discussed in Sect. 4. This paper ends by summarizing the important conclusions.

2 Multilayer Perceptron Networks

An MLP network is a universal approximator [5] and consists of simple calculation units, the neurons, and directed, weighted connections between them. These neurons are arranged in so-called layers. Usually a perceptron network consists of a layer of inputs and one or more neuron layers with trainable weights. A perceptron network with a single trainable layer can divide the input space by means of a hyperplane. A two-stage perceptron can classify convex polygons. Finally, a three-stage perceptron can classify sets of any form by combining and separating arbitrarily many convex polygons. Hence an MLP with one input, two hidden and one output layer, as shown in Fig. 1 and further described in [6], is a common form.

Fig. 1. A feedforward MLP network with three trainable layers: two hidden layers and one output layer

Figure 2 shows a neuron of the hidden layer of a multilayer perceptron. This single neuron is called a perceptron. The operation of this neuron can be split into two parts. First, ridge construction is used to project the inputs $u = [u_1\ u_2\ \cdots\ u_p]^T$ onto the weights. In the second part, the nonlinear activation function $g(x)$ transforms the projection result.


Fig. 2. A perceptron: The ith hidden neuron of a multilayer perceptron

The perceptron depicted in Fig. 2 depends on nonlinear hidden layer parameters. These parameters are called hidden layer weights:

$$ \theta_i^{(\mathrm{nl})} = [\,w_{i0}\ \ w_{i1}\ \ w_{i2}\ \cdots\ w_{ip}\,]^{T} \,. \qquad (1) $$

The weights $w_{i0}$ realize an offset, sometimes called "bias" or "threshold". These hidden layer weights determine the shape of the basis functions.

With $M_1$ and $M_2$ as the number of neurons in the first and second hidden layer, respectively, $w_i$ as output layer weights, and $w_{jl}^{(1)}$ and $w_{ij}^{(2)}$ as weights of the first and second hidden layer, the basis function formulation becomes

$$ y = \sum_{i=0}^{M_2} w_i\, \Phi_i\!\left( \sum_{j=0}^{M_1} w_{ij}^{(2)}\, \xi_j \right) \quad \text{with } \Phi_0(\cdot) = 1 \,, \qquad (2) $$

and with the outputs of the first hidden layer neurons

$$ \xi_j = \Psi_j\!\left( \sum_{l=0}^{p} w_{jl}^{(1)}\, u_l \right) \quad \text{with } \Psi_0(\cdot) = 1 \text{ and } u_0 = 1 \,. \qquad (3) $$

Usually the activation functions of both hidden layers, $\Phi_i$ and $\Psi_j$, are chosen to be of saturation type, see Sect. 3. With $p$ as the number of inputs and $q$ as the number of outputs, the number of trainable weights of an MLP with two hidden layers is

$$ M_1\,(p + 1) + M_2\,(M_1 + 1) + (M_2 + 1)\,q \,. \qquad (4) $$

Owing to the term $M_1 M_2$ the number of weights grows quadratically with an increasing number of hidden layer neurons.
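To make the notation of Eqs. (2)–(4) concrete, the following is a minimal C sketch of the forward pass through such a network. The layer sizes (p = 3, M1 = M2 = 4, one output) and all identifiers are illustrative choices, not values from the paper.

```c
#include <math.h>

/* Illustrative layer sizes (not from the paper): p inputs, M1 and M2
   hidden neurons, one output (q = 1).                                  */
#define P  3
#define M1 4
#define M2 4

/* Forward pass of Eqs. (2) and (3). Index 0 of every weight vector is
   the offset ("bias") weight, multiplied by u0 = 1, xi0 = 1, Phi0 = 1. */
static double mlp_output(const double u[P],
                         const double w1[M1][P + 1],   /* 1st hidden layer */
                         const double w2[M2][M1 + 1],  /* 2nd hidden layer */
                         const double w[M2 + 1])       /* output layer     */
{
    double xi[M1 + 1];   /* outputs of the first hidden layer, Eq. (3) */
    double y;
    int i, j, l;

    xi[0] = 1.0;                                  /* Psi_0(.) = 1       */
    for (j = 0; j < M1; j++) {
        double x = w1[j][0];                      /* offset, u0 = 1     */
        for (l = 0; l < P; l++)
            x += w1[j][l + 1] * u[l];
        xi[j + 1] = tanh(x);                      /* saturation-type AF */
    }

    y = w[0];                                     /* Phi_0(.) = 1       */
    for (i = 0; i < M2; i++) {                    /* Eq. (2)            */
        double x = 0.0;
        for (j = 0; j <= M1; j++)
            x += w2[i][j] * xi[j];
        y += w[i + 1] * tanh(x);
    }
    return y;
}
```

With these illustrative sizes, Eq. (4) gives 4·4 + 4·5 + 5·1 = 41 trainable weights.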

3 Activation Function

Typically, the activation function of hidden layer neurons is chosen to be of saturation type. Common choices are sigmoid functions such as the hyperbolic tangent

$$ g(x) = \tanh(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)} \,. \qquad (5) $$


This function exhibits the interesting property that its derivative can be expressed as a simple function of its output:

$$ \frac{d\Phi_i}{dx} = \frac{d\tanh(x)}{dx} = \frac{1}{\cosh^2(x)} = \frac{\cosh^2(x) - \sinh^2(x)}{\cosh^2(x)} = 1 - \tanh^2(x) = 1 - \Phi_i^2 \,. \qquad (6) $$

These derivatives are required in any gradient-based optimization technique applied for training of an MLP network; see Sect. 4.1. The gradient with respect to the weights of the second hidden layer and $\tanh(\cdot)$ activation function is ($i = 1, \ldots, M_2$, $j = 0, \ldots, M_1$):

$$ \frac{\partial y}{\partial w_{ij}^{(2)}} = w_i \left(1 - \Phi_i^2\right) \Psi_j \quad \text{with } \Psi_0 = 1 \,. \qquad (7) $$

The gradient with respect to the weights of the first hidden layer and $\tanh(\cdot)$ activation function is ($j = 1, \ldots, M_1$, $l = 0, \ldots, p$):

$$ \frac{\partial y}{\partial w_{jl}^{(1)}} = \sum_{i=0}^{M_2} w_i \left(1 - \Phi_i^2\right) w_{ij}^{(2)} \left(1 - \Psi_j^2\right) u_l \quad \text{with } u_0 = 1 \,. \qquad (8) $$

The gradient with respect to the $i$th output layer weight of an MLP is ($i = 0, \ldots, M_2$):

$$ \frac{\partial y}{\partial w_i} = \Phi_i \quad \text{with } \Phi_0 = 1 \,. \qquad (9) $$

Note that the basis functions $\Phi_i$ in the above equations depend on the network inputs and the hidden layer weights. These arguments are omitted for better readability.
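As an illustration, Eqs. (7)–(9) can be evaluated for a single data sample as in the C sketch below. The layer sizes are the same illustrative ones as in Sect. 2; the arrays phi and psi are assumed to hold the activations Phi_i and Psi_j (= xi_j) stored during the forward pass, with phi[0] = psi[0] = 1.

```c
/* Gradient of the network output with respect to all weights for one
   data sample, following Eqs. (7)-(9). Same illustrative sizes as in
   the forward-pass sketch above.                                       */
static void mlp_gradient(const double u[P],
                         const double w[M2 + 1],
                         const double w2[M2][M1 + 1],
                         const double psi[M1 + 1],     /* Psi_j = xi_j  */
                         const double phi[M2 + 1],     /* Phi_i         */
                         double dy_dw[M2 + 1],
                         double dy_dw2[M2][M1 + 1],
                         double dy_dw1[M1][P + 1])
{
    int i, j, l;

    /* Eq. (9): dy/dw_i = Phi_i, i = 0, ..., M2                         */
    for (i = 0; i <= M2; i++)
        dy_dw[i] = phi[i];

    /* Eq. (7): dy/dw2_ij = w_i (1 - Phi_i^2) Psi_j                     */
    for (i = 0; i < M2; i++)
        for (j = 0; j <= M1; j++)
            dy_dw2[i][j] = w[i + 1]
                         * (1.0 - phi[i + 1] * phi[i + 1]) * psi[j];

    /* Eq. (8): dy/dw1_jl = sum_i w_i (1 - Phi_i^2) w2_ij (1 - Psi_j^2) u_l */
    for (j = 0; j < M1; j++) {
        double back = 0.0;
        for (i = 0; i < M2; i++)
            back += w[i + 1] * (1.0 - phi[i + 1] * phi[i + 1]) * w2[i][j + 1];
        for (l = 0; l <= P; l++) {
            double ul = (l == 0) ? 1.0 : u[l - 1];     /* u0 = 1 */
            dy_dw1[j][l] = back * (1.0 - psi[j + 1] * psi[j + 1]) * ul;
        }
    }
}
```

A gradient-based trainer, such as the Levenberg-Marquardt algorithm used in Sect. 4.1, assembles these single-sample derivatives into its Jacobian.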

3.1 Modified Activation Function

The problem in the transition from a standard MLP network to an MLP on a microcontroller (μC) is to overcome the limitations the μC imposes on MLP calculations. It is important to distinguish two different problems. The minor problem is the limited computational performance due to the architecture of the μC. Thus calculations may have to be done with reduced accuracy, e.g. 16 bit. Early results show that this limitation has an observable impairing effect on the resulting model, but it does not render the method useless.

The focus of this work, and the major problem, however, is how to compute the network output in real time on a μC with low performance. To calculate an MLP output a series of calculations has to be carried out. While basic operations, e.g. addition, subtraction, multiplication and division, are uncomplicated, there is one bottleneck which cannot be solved in real time on a low-performance μC: the calculation of the exponential function in the hyperbolic tangent. To overcome this, a suitable approximation has to be found. Traditionally a look-up table (LUT) with linear interpolation would be used in an ECU for this purpose.


An accompanying flaw of this approach is that a high-resolution LUT would be needed to achieve satisfying accuracy and smooth model transitions, which demands a correspondingly high consumption of data memory. Therefore, as a trade-off between computing time and accuracy, an approximation with piecewise polynomial models (PPM) is explored.
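For comparison, a minimal C sketch of the traditional LUT approach with linear interpolation is given below; the grid range and resolution are illustrative assumptions, not values used in an actual ECU.

```c
#include <math.h>

/* Traditional alternative: tanh() stored in a look-up table with linear
   interpolation. Range and resolution are illustrative choices; a coarse
   grid saves memory but degrades accuracy and smoothness.               */
#define LUT_N     33            /* number of stored support points       */
#define LUT_XMIN  (-4.0)
#define LUT_XMAX  ( 4.0)

static double lut[LUT_N];

static void lut_init(void)
{
    int k;
    for (k = 0; k < LUT_N; k++) {
        double x = LUT_XMIN + k * (LUT_XMAX - LUT_XMIN) / (LUT_N - 1);
        lut[k] = tanh(x);       /* filled offline or at startup          */
    }
}

static double lut_tanh(double x)
{
    double step = (LUT_XMAX - LUT_XMIN) / (LUT_N - 1);
    double pos, frac;
    int k;

    if (x <= LUT_XMIN) return -1.0;     /* saturate outside the grid     */
    if (x >= LUT_XMAX) return  1.0;

    pos  = (x - LUT_XMIN) / step;
    k    = (int)pos;
    frac = pos - k;
    return lut[k] + frac * (lut[k + 1] - lut[k]);  /* linear interpolation */
}
```

Every additional support point improves accuracy but costs data memory, which is exactly the trade-off the piecewise polynomial models are meant to avoid.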

A simple approximation of the sigmoid function is possible by the application of two quadratic functions, given by

$$ g(x) = \begin{cases} -1 & \text{for } x \le -3, \\[2pt] \dfrac{3m-1}{9}\,x^2 + m\,x & \text{for } -3 < x \le 0, \\[2pt] \dfrac{1-3m}{9}\,x^2 + m\,x & \text{for } 0 < x \le 3, \\[2pt] 1 & \text{for } x \ge 3. \end{cases} \qquad (10) $$

The aim is to minimize the quadratic difference between the hyperbolic tangent and this approximation. Numeric optimization yields $m \approx 0.875$; see Fig. 6a.
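A minimal C implementation of Eq. (10) with the optimized slope m ≈ 0.875 could look as follows; the function and variable names are illustrative.

```c
/* Piecewise quadratic approximation of tanh() from Eq. (10) with the
   optimized slope m = 0.875; only multiplications, additions and
   comparisons are needed, no exponential function.                    */
static double g_approx(double x)
{
    const double m = 0.875;

    if (x <= -3.0)
        return -1.0;
    if (x <= 0.0)
        return ((3.0 * m - 1.0) / 9.0 * x + m) * x;   /* Horner form */
    if (x <= 3.0)
        return ((1.0 - 3.0 * m) / 9.0 * x + m) * x;
    return 1.0;
}
```

In Horner form, (a·x + m)·x, each evaluation costs two multiplications and one addition besides the range comparisons. In a fixed-point implementation the coefficients (3m−1)/9 and (1−3m)/9 would of course be precomputed constants.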

4 Application: Intake Manifold Model

As complementary work to [2], the realization of a multilayer perceptron network in a resource-friendly algorithm has been investigated for the application to the intake manifold model, see Fig. 3. To comply with the ambitious emission laws, the ECU has to model the air mass flow exactly, depending on the engine operating point. The required injection fuel mass is calculated with the aid of the model in such a way that an optimal air-fuel ratio (λ = 1) is reached for the exhaust aftertreatment in the catalytic converter. Today the modeling of the air mass flow (MAF) as a function of the manifold air pressure (MAP) is done in a linearized form with the ordinate value OFFSET and the gradient SLOPE [4]. These two parameters are stored for steady state in the ECU with a set of numerous look-up tables subject to the engine speed N, the intake camshaft position CAM_IN, the exhaust camshaft position CAM_EX and the two-stage actuator settings for the swirl flap PORT and for the variable intake manifold VIM, see Fig. 4 [1].

4.1 Multilayer Perceptron Network Training

On the engine test bench, 360 operating points are measured by systematic combination of all inputs to calibrate the look-up table:

– 10 values for engine speed (N)
– 3 values for intake camshaft position (CAM_IN)
– 3 values for exhaust camshaft position (CAM_EX)
– 2 values for swirl flap position (PORT)
– 2 values for variable intake manifold length (VIM)


[Figure omitted: schematic of the intake manifold with air cleaner, air mass flow meter, EGR valve and exhaust system; labelled pressures p_amb, p_thr, p_im (MAP), p_ex and mass flows ṁ_thr, ṁ_egr, ṁ_cps, ṁ_cyl (MAF)]

Fig. 3. Intake Manifold Model

[Figure omitted: signal-flow diagram in which two blocks "Neural Network or Look-Up Table" map N, CAM_IN and CAM_EX to OFFSET and SLOPE; SLOPE is multiplied by MAP and combined with OFFSET to yield MAF]

Fig. 4. Signal flow for one PORT-VIM-combination

Figure 5 shows the measured set values for one PORT-VIM-combination. The two-stage actuator settings PORT and VIM are binary values and are therefore either zero or one. Hence, it has proven successful not to define both binary set values as inputs for the neural network, but to train one network for each PORT-VIM-combination. The three inputs (N, CAM_IN and CAM_EX) and the two outputs (OFFSET and SLOPE) form the training data. With the aid of these data, a look-up table can be generated or an MLP network can be trained. After initialization of the weights as recommended by Nguyen and Widrow [7], the MLP network was trained using the Levenberg-Marquardt algorithm [3].
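The Levenberg-Marquardt step itself is too extensive for a short sketch, but the weight initialization can be illustrated. The following C sketch shows one common formulation of the Nguyen-Widrow initialization [7] for a single layer; the random ranges and scaling details are assumptions that vary between implementations and are not necessarily the exact variant used here.

```c
#include <math.h>
#include <stdlib.h>

/* Uniform random number in [lo, hi]; seed with srand() before use.      */
static double rand_uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / (double)RAND_MAX);
}

/* One common formulation of the Nguyen-Widrow initialization [7] for a
   layer with n_in inputs and n_hidden neurons (a sketch, not necessarily
   the authors' exact variant).                                           */
static void nguyen_widrow_init(double *weights,   /* [n_hidden * n_in]    */
                               double *bias,      /* [n_hidden]           */
                               int n_in, int n_hidden)
{
    double beta = 0.7 * pow((double)n_hidden, 1.0 / (double)n_in);
    int i, j;

    for (i = 0; i < n_hidden; i++) {
        double norm = 0.0;
        for (j = 0; j < n_in; j++) {
            weights[i * n_in + j] = rand_uniform(-0.5, 0.5);
            norm += weights[i * n_in + j] * weights[i * n_in + j];
        }
        norm = sqrt(norm);
        for (j = 0; j < n_in; j++)      /* scale weight vector to length beta */
            weights[i * n_in + j] *= beta / norm;
        bias[i] = rand_uniform(-beta, beta);
    }
}
```

The idea behind this initialization is to spread the active regions of the sigmoid neurons roughly evenly over the input space, which typically speeds up convergence of the subsequent training.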

The MLP network is extremely sensitive to its activation function. Hence there will be a problem if the approximation of the AF allows overshooting outside the bounds of the standard hyperbolic tangent, [−1, 1]. To solve this problem a more complex approximation has to be used, which strictly forbids overshooting. A solution is to increase the number and order of the polynomial functions. A sufficient increase consists of three polynomials of third order; see Fig. 6b.

4.2 Results

In Fig. 7a the overfitting problem is clearly distinguishable, since the measured data samples for the engine speed (1000 rpm, 2000 rpm, 3000 rpm, 4000 rpm, 5000 rpm, 6000 rpm) are approximated very well.


[Figure omitted: measured training data for one PORT-VIM-combination plotted over the data samples; panels show N [RPM], CAM_IN [°CRK] and CAM_EX [°CRK]]

Fig. 5. Training input data for one PORT-VIM-combination

Fig. 6. a) Modified sigmoid function consisting of two polynomials of second order. b) Approximation of the activation function without overshooting.

The interpolation behaviour, however, is not satisfactory. Since no validation data are obtainable, the analysis of the neural network's interpolation behaviour is done by comparing the interpolated values of the MLP network and the look-up table.

If the MLP is allowed to use the bounded approximation, the interpolation behaviour of the MLP in Fig. 7b becomes almost equal to that of the LUT. Thus the bounded approximation helps to improve the interpolation behaviour. As a drawback, in comparison to [2] an MLP network needs many more training attempts until a reliable solution is found. This leads to extremely time-consuming network training compared to a local model tree technique.


Fig. 7. a) Interpolation behavior with CAM_IN = 90 °CRK, CAM_EX = −90 °CRK and unbounded approximation of the sigmoid function. b) Interpolation behavior with CAM_IN = 90 °CRK, CAM_EX = −90 °CRK and bounded approximation of the sigmoid function.

5 Conclusion

With the aid of modified activation functions, it was possible to replace all look-up tables of the intake manifold model with four different MLP networks, one for each combination of the two-stage actuator settings for the swirl flap and the variable intake manifold. The interpolation behaviour of the MLP network shows no indication of overfitting in comparison with the LUT interpolation. The realization in a series ECU yields an obvious reduction of the necessary read-only memory at the expense of a moderately higher computational effort. The number of required parameters of an MLP does not grow exponentially, but, as a good approximation, linearly with the number of inputs. Hence, this alternative way of modeling avoids the so-called "curse of dimensionality".

References

1. Bänfer, O., Nelles, O., Kainz, J., Beer, J.: Local model networks - the prospective method for modeling in electronic control units? ATZelektronik 3(6), 36–39 (2008)

2. Bänfer, O., Nelles, O., Kainz, J., Beer, J.: Local Model Networks with Modified Parabolic Membership Functions. In: 2009 International Conference on Artificial Intelligence and Computational Intelligence, pp. 179–183. IEEE, Los Alamitos (2009)

3. Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5(6), 989–993 (1994)

4. Heywood, J.B.: Internal Combustion Engine Fundamentals. McGraw-Hill, New York (1988)

5. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989)

6. Nelles, O.: Nonlinear System Identification. Springer, Berlin (2001)

7. Nguyen, D., Widrow, B.: Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In: Proceedings of the International Joint Conference on Neural Networks, Washington, vol. 3, pp. 21–26 (1990)