Chapter 12

TRAINING RECURRENT NETWORKS FOR FILTERING AND CONTROL

Martin T. Hagan, Orlando De Jesús, Roger Schultz

School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma
I. INTRODUCTION

Neural networks can be classified into recurrent and nonrecurrent categories. Nonrecurrent (feedforward) networks have no feedback elements; the output is calculated directly from the input through feedforward connections. In recurrent networks the output depends not only on the current input to the network, but also on the current or previous outputs or states of the network. For this reason, recurrent networks are more powerful than nonrecurrent networks and have important uses in control and signal processing applications.

This chapter introduces the Layered Digital Recurrent Network (LDRN), develops a general training algorithm for this network, and demonstrates the application of the LDRN to problems in controls and signal processing. In Section II we present the notation necessary to represent the LDRN. Section III contains a discussion of the dynamic backpropagation algorithms that are required to compute training gradients for recurrent networks. The concepts underlying the backpropagation-through-time and forward perturbation algorithms are presented in a unified framework and are demonstrated for a simple, single-loop recurrent network. In Section IV we describe a general forward perturbation algorithm for computing training gradients for the LDRN. Two application sections follow the discussion of dynamic backpropagation: neurocontrol and nonlinear filtering. These sections demonstrate the implementation of the general dynamic backpropagation algorithm. The control section (Section V) applies a neurocontrol architecture to the automatic equalization of an acoustic transmitter. The nonlinear filtering section (Section VI) demonstrates the application of a recurrent filtering network to a noise-cancellation application.
II. PRELIMINARIES

In this section we want to introduce the types of neural networks that are discussed in the remainder of this chapter. We also present the notation that we use to represent these networks. The networks we use are Layered Digital Recurrent Networks (LDRN). They are a generalization of the Layered Feedforward Network (LFFN), which has been modified to include feedback connections and delays. We begin here with a description of the LFFN and then show how it can be generalized to obtain the LDRN.
A. LAYERED FEEDFORWARD NETWORK

Figure 1 is an example of a layered feedforward network (two layers in this case). (See Demuth et al. [1998] for a full description of the notation used here.) The input vector to the network is represented by p^1, which has R^1 elements. The superscript represents the input number, since it is possible to have more than one input vector. The input is connected to Layer 1 through the input weight IW^{1,1}, where the first superscript represents the layer number and the second superscript represents the input number. The bias for the first layer is represented by b^1. The net input to Layer 1 is denoted by n^1, and is computed as

n^1 = IW^{1,1} p^1 + b^1.   (1)

The output of Layer 1, a^1, is computed by passing the net input through a transfer function, according to a^1 = f^1(n^1). The output a^1 has S^1 elements. The output of the first layer is input to the second layer through the layer weight LW^{2,1}. The overall output of the network is labeled y. This is typically chosen to be the output of the last layer in the network, as it is in Figure 1, although it could be the output of any layer in the network.

Figure 1. Example of a Layered Feedforward Network (a^1 = f^1(IW^{1,1} p^1 + b^1), a^2 = f^2(LW^{2,1} a^1 + b^2))

Each layer in the LFFN is made up of 1) a set of weight matrices that come into that layer (which may connect from other layers or from external inputs), 2) a bias vector, 3) a summing junction, and 4) a transfer function. (In the LDRN, a set of tapped delay lines may also be included in a layer, as we will see later.) In the example given in Figure 1, there is only one weight matrix associated with each layer, but it is possible to have weight matrices that are connected from several different input vectors and layer outputs. This will become clear when we introduce the LDRN network. Also, the example in Figure 1 has only two layers; our general LFFN can have an arbitrary number of layers. The layers do not have to be connected in sequence from Layer 1 to Layer M. For example, Layer 1 could be connected to both Layer 3 and Layer 4, by weights LW^{3,1} and LW^{4,1}, respectively.
Although the layers do not have to be connected in a linear sequence by layer number, it must be possible to compute the output of the network by a simple sequence of calculations. There cannot be any feedback loops in the network. The order in which the individual layer outputs must be computed in order to obtain the correct network output is called the simulation order.
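The forward calculation described above can be sketched in a few lines of numpy. This is our own illustration, not from the chapter; the layer sizes, weights, and transfer functions are arbitrary choices:

```python
import numpy as np

def lffn_forward(p, IW11, b1, LW21, b2, f1=np.tanh, f2=lambda n: n):
    """Forward pass of a two-layer LFFN as in Figure 1.

    n1 = IW11 @ p + b1,  a1 = f1(n1)        (Eq. 1)
    n2 = LW21 @ a1 + b2, y = a2 = f2(n2)
    """
    a1 = f1(IW11 @ p + b1)
    a2 = f2(LW21 @ a1 + b2)
    return a2

# Example dimensions: R = 3 inputs, S1 = 5 hidden units, S2 = 2 outputs
rng = np.random.default_rng(0)
IW11, b1 = rng.normal(size=(5, 3)), np.zeros(5)
LW21, b2 = rng.normal(size=(2, 5)), np.zeros(2)
y = lffn_forward(rng.normal(size=3), IW11, b1, LW21, b2)
assert y.shape == (2,)
```

For a general LFFN, the same pattern applies: evaluate each layer in simulation order, summing the weighted contributions of all connected inputs and layers before applying the transfer function.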
B. LAYERED DIGITAL RECURRENT NETWORK

We now introduce a class of recurrent networks that are based on the LFFN. The LFFN is a static network, in the sense that the network output can be computed directly from the network input, without the knowledge of initial network states. A Layered Digital Recurrent Network (LDRN) can contain feedback loops and time delays. The network response is a function of network inputs, as well as initial network states.

The components of the LDRN are the same as those of the LFFN, with the addition of the tapped delay line (TDL), which is shown in Figure 2. The output of the TDL is a vector containing current and previous values of the TDL input. In Figure 2 we show two abbreviated representations for the TDL. In the case on the left, the undelayed value of the input variable is included in the output vector. In the case on the right, only delayed values of the input are included in the output.
Figure 2. Tapped Delay Line
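A TDL is easy to render in code. The sketch below is our own (the class name and interface are invented for illustration); it supports both of the variants in Figure 2 via the `include_current` flag:

```python
from collections import deque

class TDL:
    """Tapped delay line with d delays, history initialized to zero.

    If include_current is True the output stacks [x(t); x(t-1); ...; x(t-d)]
    ((d+1)*S values); otherwise only [x(t-1); ...; x(t-d)] (d*S values).
    """
    def __init__(self, d, size, include_current=True):
        self.include_current = include_current
        self.hist = deque([[0.0] * size for _ in range(d)], maxlen=d)

    def __call__(self, x):
        out = ([list(x)] if self.include_current else []) + list(self.hist)
        self.hist.appendleft(list(x))           # x(t) becomes x(t-1) next step
        return [v for vec in out for v in vec]  # concatenate into one vector

tdl = TDL(d=2, size=1, include_current=False)
assert tdl([1.0]) == [0.0, 0.0]   # history starts at zero
assert tdl([2.0]) == [1.0, 0.0]   # [x(t-1), x(t-2)]
assert tdl([3.0]) == [2.0, 1.0]
```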
Figure 3 is an example of an LDRN. Like the LFFN, the LDRN is made up of layers. In addition to the weight matrices, bias, summing junction, and transfer function, which make up the layers of the LFFN, the layers of the LDRN also include any tapped delay lines that appear at the input of a weight matrix. (Any weight matrix in an LDRN can be preceded by a tapped delay line.) For example, Layer 1 of Figure 3 contains the weight LW^{1,2} and the TDL at its input. Note that all of the layer outputs and net inputs in the LDRN are explicit functions of time.

The output of the TDL in Figure 3 is labeled a^{2,2}(t). This indicates that it is a composite vector made up of delayed values of the output of Subnet 2 (indicated by the second superscript) and is an input to Subnet 2 (indicated by the first superscript). (A subnet is a series of layers which have no internal tapped delay
lines. The number of the subnet is the same as the number of its output layer. These concepts are defined more carefully in a later section.) These TDL outputs are important variables in our training algorithm for the LDRN.

Figure 3. Layered Digital Recurrent Network Example

In the LDRN, feedback is added to an LFFN. Therefore, unlike the LFFN, the output of the network is a function not only of the weights, biases, and network input, but also of the outputs of some of the network layers at previous points in time. For this reason, it is not a simple matter to calculate the gradient of the network output with respect to the weights and biases (which is needed to train the network). This is because the weights and biases have two different effects on the network output. The first is the direct effect, which can be calculated using the standard backpropagation algorithm [Hagan et al., 1996]. The second is an indirect effect, since some of the inputs to the network, such as a^{1,2}(t), are also functions of the weights and biases. In the next section we briefly describe the gradient calculations for the LFFN and show how they must be modified for the LDRN. The main development of the next two sections is a general gradient calculation for arbitrary LDRN's.
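One simulation step of an LDRN in the style of Figure 3 can be sketched as follows. This is our own toy rendering (layer sizes, weights, and the choice of tanh/linear transfer functions are assumptions), showing Layer 1 receiving the external input plus delayed Layer 2 outputs through a feedback TDL:

```python
import numpy as np

def ldrn_step(p, a2_delays, IW11, LW12, b1, LW21, b2):
    """One time step of a two-layer LDRN with feedback from Layer 2 to Layer 1.

    a2_delays is the TDL output a^{2,2}(t): the stacked previous outputs
    [a2(t-1); ...; a2(t-d)] of Layer 2, fed back through LW12.
    """
    n1 = IW11 @ p + LW12 @ a2_delays + b1
    a1 = np.tanh(n1)
    a2 = LW21 @ a1 + b2          # linear output layer; y(t) = a2(t)
    return a2

# Simulate for T steps with d = 2 feedback delays
rng = np.random.default_rng(1)
R, S1, S2, d, T = 3, 4, 2, 2, 5
IW11 = rng.normal(size=(S1, R))
LW12 = rng.normal(size=(S1, d * S2)) * 0.1   # small feedback weights
LW21 = rng.normal(size=(S2, S1))
b1, b2 = np.zeros(S1), np.zeros(S2)
hist = [np.zeros(S2)] * d                    # a2(t-1), a2(t-2) start at zero
for t in range(T):
    a2 = ldrn_step(rng.normal(size=R), np.concatenate(hist),
                   IW11, LW12, b1, LW21, b2)
    hist = [a2] + hist[:-1]                  # shift the tapped delay line
assert a2.shape == (S2,)
```

The essential difference from the LFFN sketch earlier is the state carried in `hist`: the network response now depends on the initial network states as well as the inputs.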
III. PRINCIPLES OF DYNAMIC LEARNING
Consider again the multilayer network of Figure 1. The basic simulation equation of such a network is

a^k = f^k( Σ_i IW^{k,i} p^i + Σ_j LW^{k,j} a^j + b^k ),   (2)

where k is incremented through the simulation order.

The task of the network is to learn associations between a specified set of input/output pairs: {(p1, t1), (p2, t2), ..., (pQ, tQ)}. The performance index for the network is

F(x) = Σ_{q=1}^{Q} (t_q − y_q)^T (t_q − y_q) = Σ_{q=1}^{Q} e_q^T e_q,   (3)

where y_q is the output of the network when the qth input, p_q, is presented, and x is a vector containing all of the weights and biases in the network. (Later we use x^i to represent the weights and biases in Layer i.) The network should learn the x vector that minimizes F.

For the standard backpropagation algorithm [Hagan et al., 1996] we use a steepest descent learning rule. The performance index is approximated by:

F̂ = e_q^T e_q,   (4)

where the total sum of squares is replaced by the squared errors for a single input/output pair. The approximate steepest (gradient) descent algorithm is then:

Δw^{k,l}_{i,j} = −α ∂F̂/∂w^{k,l}_{i,j},   Δb^k_i = −α ∂F̂/∂b^k_i,   (5)

where α is the learning rate. Define

s^k_i ≡ ∂F̂/∂n^k_i   (6)

as the sensitivity of the performance index to changes in the net input of unit i in layer k. Using the chain rule, we can show that

∂F̂/∂iw^{m,l}_{i,j} = (∂F̂/∂n^m_i) × (∂n^m_i/∂iw^{m,l}_{i,j}) = s^m_i p^l_j ,
∂F̂/∂lw^{m,l}_{i,j} = (∂F̂/∂n^m_i) × (∂n^m_i/∂lw^{m,l}_{i,j}) = s^m_i a^l_j ,
∂F̂/∂b^m_i = (∂F̂/∂n^m_i) × (∂n^m_i/∂b^m_i) = s^m_i .   (7)

It can also be shown that the sensitivities satisfy the following recurrence relation, in which m is incremented through the backpropagation order, which is the reverse of the simulation order:

s^m = Ḟ^m(n^m) Σ_i (LW^{i,m})^T s^i,   (8)

where
Ḟ^m(n^m) = diag[ ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m}) ],   (9)

and

ḟ^m(n^m_j) = ∂f^m(n^m_j)/∂n^m_j .   (10)

This recurrence relation is initialized at the output layer:

s^M = −2 Ḟ^M(n^M)(t_q − y_q).   (11)

The overall learning algorithm now proceeds as follows: first, propagate the input forward using Eq. (2); next, propagate the sensitivities back using Eq. (11) and Eq. (8); and finally, update the weights and biases using Eq. (5) and Eq. (7).

Now consider an LDRN, such as the one shown in Figure 3. Suppose that we use the same gradient descent algorithm, Eq. (5), that is used in the standard backpropagation algorithm. The problem in this case is that when we try to find the equivalent of Eq. (7) we note that the weights and biases have two different effects on the network output. The first is the direct effect, which is accounted for by Eq. (7). The second is an indirect effect, since some of the inputs to the network, such as a^{1,2}(t), are also functions of the weights and biases. To account for this indirect effect we must use dynamic backpropagation.

To illustrate dynamic backpropagation [Yang et al., 1993, Yang, 1994], consider Figure 4, which is a simple recurrent network. It consists of an LFFN with a single feedback loop added from the output of the network, which is connected to the input of the network through a single delay. In this figure the vector x represents all of the network parameters (weights and biases) and the vector a(t) represents the output of the LFFN at time step t.
Figure 4. Simple Recurrent Network, a(t) = NN(p(t), a(t−1), x)
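The single-loop recurrence of Figure 4 can be sketched concretely. This is our own minimal instance (a one-layer tanh network standing in for the LFFN; the weight names are invented):

```python
import numpy as np

def simple_recurrent(p_seq, W_p, W_a, b):
    """a(t) = NN(p(t), a(t-1), x): a one-layer instance of Figure 4.

    Here NN is a single tanh layer; W_p, W_a, and b together play the
    role of the parameter vector x. The initial state a(0) is zero.
    """
    a = np.zeros(b.shape)
    outputs = []
    for p in p_seq:
        a = np.tanh(W_p @ p + W_a @ a + b)   # feedback through one delay
        outputs.append(a)
    return outputs

rng = np.random.default_rng(2)
W_p, W_a, b = rng.normal(size=(2, 3)), 0.5 * np.eye(2), np.zeros(2)
outs = simple_recurrent([rng.normal(size=3) for _ in range(4)], W_p, W_a, b)
assert len(outs) == 4 and outs[-1].shape == (2,)
```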
Now suppose that we want to minimize

F(x) = Σ_{t=1}^{Q} (t(t) − a(t))^T (t(t) − a(t)).   (12)

In order to use gradient descent, we need to find the gradient of F with respect to the network parameters. There are two different approaches to this problem. They both use the chain rule, but are implemented in different ways:

∂F/∂x = Σ_{t=1}^{Q} [∂a(t)/∂x]^T × ∂^e F/∂a(t),   (13)

or

∂F/∂x = Σ_{t=1}^{Q} [∂^e a(t)/∂x]^T × ∂F/∂a(t),   (14)

where the superscript e indicates an explicit derivative, not accounting for indirect effects through time. The explicit derivatives can be obtained with the standard backpropagation algorithm, as in Eq. (8). To find the complete derivatives that are required in Eq. (13) and Eq. (14), we need the additional equations:

∂a(t)/∂x = ∂^e a(t)/∂x + [∂^e a(t)/∂a(t−1)] × ∂a(t−1)/∂x,   (15)

and

∂F/∂a(t) = ∂^e F/∂a(t) + [∂^e a(t+1)/∂a(t)]^T × ∂F/∂a(t+1).   (16)

Eq. (13) and Eq. (15) make up the forward perturbation (FP) algorithm. Note that the key term is

∂a(t)/∂x,   (17)

which must be propagated forward through time.

Eq. (14) and Eq. (16) make up the backpropagation-through-time (BTT) algorithm. Here the key term is

∂F/∂a(t),   (18)
which must be propagated backward through time.

In general, the FP algorithm requires somewhat more computation than the BTT algorithm. However, the BTT algorithm cannot be implemented in real time, since the outputs must be computed for all time steps, and then the derivatives must be backpropagated back to the initial time point. The FP algorithm is well suited for real-time implementation, since the derivatives can be calculated at each time step.
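The FP recursion of Eqs. (13) and (15) can be checked on a toy case. The sketch below is our own scalar example (not from the chapter): a(t) = tanh(w·a(t−1) + p(t)) with a single parameter w, validated against a finite-difference approximation of the gradient:

```python
import numpy as np

def fp_gradient(p_seq, t_seq, w):
    """Forward perturbation (Eqs. 13 and 15) for the scalar recurrence
    a(t) = tanh(w*a(t-1) + p(t)), with F = sum_t (t(t) - a(t))^2.

    The total derivative da(t)/dw is propagated forward in time:
      da(t)/dw = d^e a(t)/dw + [d^e a(t)/da(t-1)] * da(t-1)/dw   (Eq. 15)
    and the gradient is accumulated as in Eq. (13).
    """
    a, dadw, grad = 0.0, 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a_prev = a
        a = np.tanh(w * a + p)
        sech2 = 1.0 - a * a                        # derivative of tanh
        dadw = sech2 * a_prev + sech2 * w * dadw   # Eq. (15)
        grad += -2.0 * (tgt - a) * dadw            # Eq. (13)
    return grad

# Finite-difference check of the dynamic gradient
p_seq, t_seq, w, eps = [0.5, -0.3, 0.8], [0.2, 0.1, 0.4], 0.7, 1e-6
def loss(w):
    a, F = 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a = np.tanh(w * a + p)
        F += (tgt - a) ** 2
    return F
num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
assert abs(fp_gradient(p_seq, t_seq, w) - num) < 1e-6
```

Note that dropping the second term of Eq. (15) (the indirect effect) would leave only the explicit derivative and give the wrong gradient, which is exactly the error that dynamic backpropagation corrects.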
IV. DYNAMIC BACKPROP FOR THE LDRN
In this section, we generalize the FP algorithm so that it can be applied to arbitrary LDRN's. This is followed by applications of the LDRN and dynamic backpropagation to problems in filtering and control.

A. PRELIMINARIES

To explain this algorithm, we must create certain definitions related to the LDRN. We do that in the following paragraphs.

First, as we stated earlier, a layer consists of a set of weights, associated tapped delay lines, a summing function, and a transfer function. The network has inputs that are connected to special weights, called input weights, and denoted by IW^{i,j}, where j denotes the number of the input vector that enters the weight, and i denotes the number of the layer to which the weight is connected. The weights connecting one layer to another are called layer weights and are denoted by LW^{i,j}, where j denotes the number of the layer coming into the weight and i denotes the number of the layer at the output of the weight. In order to calculate the network response in stages, layer by layer, we need to proceed in the proper order, so that the necessary inputs at each layer will be available. This ordering of layers is called the simulation order. In order to backpropagate the derivatives for the gradient calculations, we must proceed in the opposite order, which is called the backpropagation order.

In order to simplify the description of the training algorithm, the LDRN is divided into subnets. A subnet is a section of a network that has no tapped delay lines, except at the subnet input. Every LDRN can be organized as a collection of subnets. We define the subnets by proceeding backwards from the last subnet to the first subnet. To locate the last subnet, start at the first layer in the backpropagation order and proceed through the backpropagation order until you find a layer containing delays, which becomes the first layer in the last subnet. The last subnet is then defined as containing all of the layers beginning at the layer containing delays and continuing through the simulation order to the first layer in the backpropagation order (or the last layer in the simulation order). This defines the last subnet. To find the preceding subnet, start with the next layer in the backpropagation order and proceed in the same way until you find the next layer with delays. This process continues until you reach the last layer in the backpropagation order, at which time the first subnet is defined. As with the layer
simulation order, we can also define a subnet simulation order that starts at the first subnet and continues until the last subnet.

For example, the LDRN shown in Figure 5 has three layers and two subnets. To simplify the algorithm, the subnet is denoted by the number of its output layer. For this network the simulation order is 1-2-3, the backpropagation order is 3-2-1, and the subnet simulation order is 1-3.

Figure 5. Three-layer LDRN with Two Subnets

B. EXPLICIT DERIVATIVES

We want to generalize the forward perturbation (FP) algorithm of Eq. (13) and Eq. (15) so that it can be applied to an arbitrary LDRN. Notice that we have two terms on the right-hand side of Eq. (15). We have an explicit derivative of the performance with respect to the weights, which accounts for the direct effect of the weights on the performance and can be computed through the standard backpropagation algorithm, as in Eq. (8). We also have a second term, which accounts for the fact that the weights have a secondary effect through the previous network output. In the general LDRN, we may have many different feedback loops, and therefore there could be many different terms on the right of Eq. (15), and each of those terms would have a separate equation like Eq. (15) to update the total derivative through time. For our development of the FP algorithm, we have one term (and an additional equation) for every place where one subnet is input to another subnet. Recall that the subnet boundaries are determined by the locations of the tapped delay lines. Within a subnet, standard backpropagation, as in Eq. (8), can be used to propagate the explicit derivatives, but at the subnet boundaries an equation like Eq. (15) must be used to calculate the total derivative, which includes both direct and indirect effects. In this subsection we describe the
computation of the explicit derivatives, and then in the following subsection we explain the total derivative computation.

A backpropagation process to calculate the explicit derivatives is needed for each subnet. These equations involve calculating the derivative of the subnet output with respect to each layer output in the subnet. The basic equation is

∂^e a^{jz}(t)/∂a^i(t) = Σ_k [∂^e a^{jz}(t)/∂n^k(t)] × [∂^e n^k(t)/∂a^i(t)],   (19)

where a^{jz}(t) represents a subnet output, a^i(t) is the output of a layer in subnet jz, and n^k(t) is the net input of layer k, which has a connection from layer i. The index i is incremented through the backpropagation order. If we define

S^{k,jz} ≡ ∂^e a^{jz}(t)/∂n^k(t), and note that ∂^e n^k(t)/∂a^i(t) = LW^{k,i},   (20)

then we can write Eq. (19) as

∂^e a^{jz}(t)/∂a^i(t) = Σ_k S^{k,jz} × LW^{k,i}.   (21)

This recursion, where i is incremented along the backpropagation order, begins at the last layer in the subnet:

∂^e a^{jz}(t)/∂a^{jz}(t) = I,   (22)

where I is an identity matrix whose dimension is the size of layer jz.

C. COMPLETE FP ALGORITHM FOR THE LDRN

We are now ready to describe a generalized FP algorithm for the arbitrary LDRN. There are two parts to the FP algorithm: Eq. (13) and Eq. (15). Eq. (13) remains the same for the LDRN as for the simple network in Figure 4. Eq. (15), however, must be modified. For the general case we have one equation like Eq. (15) for each subnet output. Each of these equations has a term for the explicit derivative and one additional term for each subnet output.

The complete FP algorithm for the LDRN network is given in the following flowchart. It contains three major sections. The first section computes the explicit (static) derivatives, as in Eq. (21), which are needed as part of the dynamic equations. The second section computes the dynamic derivatives of the subnet outputs with respect to the weights, as in Eq. (15). The final section computes the dynamic derivatives of performance with respect to the weights, as in Eq. (13).
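The recursion of Eqs. (19)-(22) can be illustrated on a toy two-layer subnet. This is our own example (layer sizes and weights are arbitrary; the subnet output layer is jz = 2 with a linear transfer function), verified against finite differences:

```python
import numpy as np

def subnet_explicit_jacobian(a0, W1, W2):
    """Eqs. (19)-(22) for a toy subnet:
       n1 = W1 a0, a1 = tanh(n1);  n2 = W2 a1, a2 = n2 (subnet output, jz = 2).

    Returns the explicit derivative of the subnet output a2 with respect
    to the subnet input a0, built by the backward recursion.
    """
    a1 = np.tanh(W1 @ a0)
    S22 = np.eye(W2.shape[0])            # Eq. (22): dA2/dn2 = Fdot2 = I (linear)
    dA2_dA1 = S22 @ W2                   # Eq. (21): S^{2,2} x LW^{2,1}
    S12 = dA2_dA1 @ np.diag(1 - a1**2)   # Eq. (20): S^{1,2} = d^e a2 / dn1
    dA2_dA0 = S12 @ W1                   # one more step of Eq. (21)
    return dA2_dA0

rng = np.random.default_rng(3)
W1, W2, a0 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
J = subnet_explicit_jacobian(a0, W1, W2)

# finite-difference check of each Jacobian column
f = lambda a: W2 @ np.tanh(W1 @ a)
eps = 1e-6
for j in range(3):
    d = np.zeros(3); d[j] = eps
    col = (f(a0 + d) - f(a0 - d)) / (2 * eps)
    assert np.allclose(J[:, j], col, atol=1e-6)
```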
[Flowchart, part 1: Explicit (Static) Derivatives. Initialize ∂F/∂x = 0 for all weights and biases in the network (x); set initial time t = 1. Select i = last layer in the backpropagation order. If layer i is the output of a subnet, set jz = i and set ∂^e a^{jz}(t)/∂a^{jz}(t) = I; otherwise compute ∂^e a^{jz}(t)/∂a^i(t) = Σ_k S^{k,jz} × LW^{k,i} over all layers k connected to layer i. Calculate ∂^e a^{jz}(t)/∂n^i(t), then calculate ∂^e a^{jz}(t)/∂x^i for the weights and biases x^i in layer i. If layer i is the input of a subnet, calculate ∂^e a^{jz}(t)/∂a^{jz,j}(t) for each layer j that connects to layer i through delays. Select i = next layer in the backpropagation order and repeat until the end of the backpropagation order.]
335
Training Recurrent Networks for Filtering and Control
[Flowchart, part 2: Dynamic Derivatives of Network Outputs with Respect to Weights. Set jz = first subnet and jj = first subnet. Initialize ∂a^{jz}(t)/∂x = ∂^e a^{jz}(t)/∂x for all weights and biases (x). If the output of subnet jj is an input to subnet jz, update ∂a^{jz}(t)/∂x = ∂a^{jz}(t)/∂x + [∂^e a^{jz}(t)/∂a^{jz,jj}(t)] × ∂a^{jz,jj}(t)/∂x for all weights and biases in the network (x). Increment jj until jj passes the last subnet; then increment jz (resetting jj to the first subnet) until jz passes the last subnet.]

[Flowchart, part 3: Dynamic Derivatives of Performance with Respect to Weights. Set jz = last subnet. Calculate ∂^e F/∂a^{jz}(t) for the output layer jz, then update ∂F/∂x = ∂F/∂x + [∂a^{jz}(t)/∂x]^T × ∂^e F/∂a^{jz}(t) for all weights and biases in the network (x). Increment time t = t + 1 and return to part 1 until t passes the last time point in the training set.]
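The three stages of the flowchart can be put together in one training loop. The sketch below is our own rendering for the simplest possible case, the scalar single-subnet recurrence a(t) = tanh(w·a(t−1) + p(t)); the parameter names, step size, and data are all invented for illustration:

```python
import numpy as np

def fp_train(p_seq, t_seq, w, alpha=0.005, epochs=500):
    """Gradient descent with forward-perturbation derivatives, following the
    three flowchart stages for a scalar single-subnet toy case:
      1. explicit (static) derivatives at each time step,
      2. dynamic derivative da(t)/dw propagated forward (Eq. 15),
      3. accumulation of dF/dw (Eq. 13) and a descent step (Eq. 5).
    """
    for _ in range(epochs):
        a, dadw, grad = 0.0, 0.0, 0.0
        for p, tgt in zip(p_seq, t_seq):
            a_prev = a
            a = np.tanh(w * a + p)
            sech2 = 1.0 - a * a
            # stage 1: explicit derivatives of a(t) w.r.t. w and a(t-1)
            e_dadw, e_dada = sech2 * a_prev, sech2 * w
            # stage 2: dynamic derivative, Eq. (15)
            dadw = e_dadw + e_dada * dadw
            # stage 3: accumulate dF/dw, Eq. (13)
            grad += -2.0 * (tgt - a) * dadw
        w -= alpha * grad                      # Eq. (5)
    return w

# Generate targets from a "true" recurrence, then train from w = 0
rng = np.random.default_rng(4)
p_seq, w_true = rng.normal(size=20), 0.6
a, t_seq = 0.0, []
for p in p_seq:
    a = np.tanh(w_true * a + p)
    t_seq.append(a)

def run_loss(w):
    a, F = 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a = np.tanh(w * a + p)
        F += (tgt - a) ** 2
    return F

w_hat = fp_train(p_seq, t_seq, w=0.0)
assert run_loss(w_hat) < run_loss(0.0)
```

For a full LDRN the same loop structure applies, with the scalar derivatives replaced by the matrix quantities S^{k,jz}, ∂a^{jz}(t)/∂x, and the per-subnet update equations of part 2.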
V. NEUROCONTROL APPLICATION

In this section we illustrate the application of the LDRN and dynamic backpropagation to a control problem. We wish to control the output of an acoustic transmitter, whose schematic diagram is shown in Figure 6.

Binary data is transmitted via acoustic signals from one pipeline location to another. Acoustic stress waves are imparted into the pipeline by the acoustic transmitter. The stress waves propagate through the pipeline to the acoustic receiver, which receives the transmitted signal. Tone bursts of different frequencies are used to represent either a 1 or a 0. The acoustic channel provided by the pipeline causes heavy distortion in the transmitted signal. There are often extraneous unwanted acoustic stress waves created by external sources, such as engines, geartrains, and pumps, that are imparted into the pipeline. The inherent channel distortion and the unwanted external noise can degrade the transmitted
signal such that signal detection and interpretation at the receiver are unreliable or impossible. One method of enhancing communication performance is to equalize the effects of the transmission channel by adjusting the transmitter output so that the measured signal imparted to the pipeline at a short distance from the transmitter is the desired signal. Ideally, the feedback measurement of this signal is taken at the receiver. Of course, in practice, the feedback signal is measured near the transmitter. To alleviate the effects of unwanted disturbance noise, the transmitter can actively cancel the disturbance noise by way of destructive interference. The transmitter creates stress waves that are out of phase with, and of equal magnitude to, the undesired signals. In this example, a neural network controller is used as both a channel equalizer and an active noise canceller.

Figure 6. Schematic of Acoustic Transmitter/Receiver System.

In addition to illustrating dynamic backpropagation on the neurocontroller for the acoustic transmitter, this section also demonstrates the effect of training with approximations to true dynamic derivatives. The evaluation is based on squared error performance and floating point operations. When approximations are used, the computational burden can be reduced, but the errors generally increase.

Figure 7 is a schematic of the control system. In this system, model reference adaptive control (MRAC) [Narendra, 1990] is applied to the control of the acoustic transmitter. The plant model is used only as a backpropagation path for the derivatives needed to adjust the controller weights; the plant model weights are not adjusted. The plant model is a 2-layer LDRN, with 10 input taps, 50 feedback taps, and 15 hidden neurons. The controller weights are adjusted such that the error ec(t), between a delayed reference input r(t) and the actual plant output c(t), is minimized. The controller structure consists of 40 input taps, 50 controller feedback taps, 75 plant output feedback taps, and 15 hidden neurons.
Figure 7. Self-Equalizing Acoustic Transmitter
If we apply the concepts described in the previous sections to the system shown in Figure 7, we notice that the total system is an LDRN that can be divided into two subnets. The last subnet (number 4) corresponds to the Neural Network Plant Model (with a^4(t) as the subnet output). The first subnet (number 2) corresponds to the Neural Controller (with a^2(t) as the subnet output). The subnets are fully connected, so we have two sets of training equations:

∂a^4(t)/∂x = ∂^e a^4(t)/∂x + [∂^e a^4(t)/∂a^{4,2}(t)] × ∂a^{4,2}(t)/∂x + [∂^e a^4(t)/∂a^{4,4}(t)] × ∂a^{4,4}(t)/∂x,   (23)

and

∂a^2(t)/∂x = ∂^e a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,4}(t)] × ∂a^{2,4}(t)/∂x + [∂^e a^2(t)/∂a^{2,2}(t)] × ∂a^{2,2}(t)/∂x.   (24)

We now show how these equations can be developed using our general FP algorithm, which was described in the flowchart in the previous section. We start in the last layer in the backpropagation order (layer 4), obtaining:

∂^e a^4(t)/∂a^4(t) = I;   S^{4,4} ≡ ∂^e a^4(t)/∂n^4(t) = Ḟ^4(n^4);

∂^e a^4(t)/∂b^4 = Ḟ^4(n^4);   ∂^e a^4(t)/∂LW^{4,3} = Ḟ^4(n^4) ⋅ [a^3(t)]^T.

Layer 3 is not the output of a subnet, so we apply:

∂^e a^4(t)/∂a^3(t) = S^{4,4} × LW^{4,3};   S^{3,4} ≡ ∂^e a^4(t)/∂n^3(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3);

∂^e a^4(t)/∂b^3 = S^{4,4} × LW^{4,3} × Ḟ^3(n^3);   ∂^e a^4(t)/∂LW^{3,2} = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × [a^{4,2}(t)]^T.
Layer 3 has two inputs with delays, therefore it is the beginning of the last subnet, and we calculate:

∂^e a^4(t)/∂LW^{3,4} = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × [a^{4,4}(t)]^T;

∂^e a^4(t)/∂a^{4,2}(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × LW^{3,2};   ∂^e a^4(t)/∂a^{4,4}(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × LW^{3,4}.

Layer 2 in the neural controller is the end of the first subnet. So we apply the equations:

∂^e a^2(t)/∂a^2(t) = I;   S^{2,2} ≡ ∂^e a^2(t)/∂n^2(t) = Ḟ^2(n^2);

∂^e a^2(t)/∂b^2 = Ḟ^2(n^2);   ∂^e a^2(t)/∂LW^{2,1} = Ḟ^2(n^2) ⋅ [a^1(t)]^T.

Layer 1 is not the output of a subnet, so we apply:

∂^e a^2(t)/∂a^1(t) = S^{2,2} × LW^{2,1};   S^{1,2} ≡ ∂^e a^2(t)/∂n^1(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1);

∂^e a^2(t)/∂b^1 = S^{2,2} × LW^{2,1} × Ḟ^1(n^1);

∂^e a^2(t)/∂LW^{1,2} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [a^{2,2}(t)]^T;   ∂^e a^2(t)/∂LW^{1,4} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [a^{2,4}(t)]^T.
Layer 1 has two inputs with delays, so it is the end of the first subnet, and we calculate:

∂^e a^2(t)/∂IW^{1,1} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [r(t)]^T;

∂^e a^2(t)/∂a^{2,2}(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × LW^{1,2};   ∂^e a^2(t)/∂a^{2,4}(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × LW^{1,4}.

Now that we have finished with the backpropagation step, we have the explicit derivatives for all of the weights and biases in the system. We are now ready to calculate the dynamic derivatives. We initialize

∂a^2(t)/∂x = ∂^e a^2(t)/∂x,

and calculate

∂a^2(t)/∂x = ∂a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,2}(t)] × ∂a^{2,2}(t)/∂x,

and

∂a^2(t)/∂x = ∂a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,4}(t)] × ∂a^{2,4}(t)/∂x.

This gives us Eq. (24) for all weights and biases. A similar process is performed for subnet 4 to obtain Eq. (23).

We have now computed all of the dynamic derivatives of the outputs of the subnets with respect to the weights. The next step is to compute the derivatives of the performance function with respect to the weights. We must first calculate

∂^e F/∂a^4(t) = −2(r(t) − a^4(t)),

to obtain

∂F/∂x = ∂F/∂x + [∂a^4(t)/∂x]^T × ∂^e F/∂a^4(t)
for all of the weights and biases in the neural controller. The process is repeated for each sample time in the training set.

The LDRN was trained using the preceding equations. Now we demonstrate the performance of the closed-loop acoustic transmitter system. The reference input used in the simulation consisted of a random sequence of tone burst pulses, as shown in Figure 8. The tone bursts are evenly spaced and appear randomly at one of two frequencies. In order to investigate the robustness of the controller, a periodic disturbance noise was added to the plant input, representing extraneous acoustic stress waves created by external sources. The open-loop plant response, with no controller, is shown in Figure 9. The closed-loop response, after the controller was trained to convergence, is shown in Figure 10.

Figure 8. Tone Burst Reference Input

In the next test, dynamic backpropagation was used in the plant model, but was not used in backpropagating derivatives in the controller. Only static backpropagation was used in the controller. (In Eq. (24) only the explicit derivative terms are calculated.) This procedure requires less computation than the full dynamic backpropagation but may not be as accurate. The results are shown in Figure 11.

For the last test, dynamic backpropagation was only used to compute a dynamic derivative across the first delay in the tapped-delay line between the plant model and the controller. All other derivatives were computed using only explicit (static) derivatives. The controller weights never converged in this case, so a plot of the results is not shown. The results are summarized in Table 1.
343
Training Recurrent Networks for Filtering and Control
Figure 9. Open-loop Plant Response
Figure 10. Closed-Loop System Response (Full Dynamic Training)
Figure 11. Response without Dynamic Controller Training
Table 1 provides a summary of the performance results for all three tests. These results highlight the increased computational burden when calculating dynamic derivatives instead of simply static derivatives. In this example, reasonable performance was possible even when dynamic derivatives were used only in the plant model. This derivative approximation decreased the computational burden by approximately 65%. Using essentially no dynamic derivatives in training reduced the computational burden by approximately 98%. However, performance in this case was unacceptable.
VI. RECURRENT FILTER
This section provides a second example of the application of the LDRN and dynamic backpropagation. We use a multi-loop recurrent network to predict an experimental acoustic signal. The prediction of acoustic signals is crucial in active sound cancellation systems. Acoustic environments are often very complex, due to the complexity of typical sound sources and the presence of reflected sound waves. The dynamic nature of acoustical systems makes the use of recurrent filter structures of great interest for prediction and control of this type of system.
Figure 12. Active Noise Control System
Figure 12 depicts a typical Active Noise Control (ANC) system. An acoustic noise source creates undesirable noise in a surrounding area. The goal of the active noise suppression system is to reduce the undesirable noise at a particular location
Derivative Method       Flops/Sample   Sum Squared Error
Full Dynamic            9.83 x 10^5    43.44
Plant Only Dynamic      3.48 x 10^5    55.53
No Dynamic              1.85 x 10^4    127.88

Table 1. Simulation Results for the Neural Controller
by using a loudspeaker to produce “anti-noise” that attenuates the unwanted noise by destructive interference. In order for a system of this type to work effectively, it is critical that the ANC system be able to predict (and then cancel) unwanted sound in the noise control zone.

In the first part of this section we develop the dynamic training equations for the ANC system. Then we present experimental results showing the prediction performance.

Figure 13 shows the structure of the LDRN used for predicting the acoustic data. In this network there are 3 cascaded recurrent structures. If we follow the methods described in Section IV, we see that the system is composed of three subnets. Therefore, we have three sets of training equations:

$$\frac{\partial a^6(t)}{\partial x} = \frac{\partial^e a^6(t)}{\partial x} + \frac{\partial^e a^6(t)}{\partial a^{6,4}(t)} \times \frac{\partial a^{6,4}(t)}{\partial x} + \frac{\partial^e a^6(t)}{\partial a^{6,6}(t)} \times \frac{\partial a^{6,6}(t)}{\partial x} \quad (25)$$

$$\frac{\partial a^4(t)}{\partial x} = \frac{\partial^e a^4(t)}{\partial x} + \frac{\partial^e a^4(t)}{\partial a^{4,2}(t)} \times \frac{\partial a^{4,2}(t)}{\partial x} + \frac{\partial^e a^4(t)}{\partial a^{4,4}(t)} \times \frac{\partial a^{4,4}(t)}{\partial x} \quad (26)$$

$$\frac{\partial a^2(t)}{\partial x} = \frac{\partial^e a^2(t)}{\partial x} + \frac{\partial^e a^2(t)}{\partial a^{2,2}(t)} \times \frac{\partial a^{2,2}(t)}{\partial x} \quad (27)$$

Notice that in Eq. (27) there is only one dynamic term. This is because there is only one tapped-delay input that comes from a subnet.

We now show how these equations can be developed using our general FP procedure, which was described in the flowchart of Section IV. We start in the last layer in the backpropagation order (Layer 6) to get the following equations:

$$\frac{\partial^e a^6(t)}{\partial a^6(t)} = I\,, \qquad S^{6,6} \equiv \frac{\partial^e a^6(t)}{\partial n^6(t)} = \dot{F}^6(n^6)$$

$$\frac{\partial^e a^6(t)}{\partial b^6} = \dot{F}^6(n^6)\,, \qquad \frac{\partial^e a^6(t)}{\partial LW^{6,5}} = \dot{F}^6(n^6) \cdot [a^5(t)]^T$$
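The recursions of Eq. (25) – Eq. (27) are easiest to see on a toy network. The sketch below uses a hypothetical scalar recurrent layer, a(t) = tanh(w·p(t) + lw·a(t−1)), as a stand-in for a subnet (the names `w`, `lw`, and the scalar setup are illustrative assumptions, not the chapter's LDRN code). At each step the dynamic derivative is the explicit derivative plus the explicit derivative with respect to the delayed output times the dynamic derivative carried over from the previous step:

```python
import numpy as np

def forward_perturbation(p, w, lw):
    """Outputs a(t) and dynamic derivatives d a(t) / d [w, lw] over time."""
    a_prev = 0.0
    da_prev = np.zeros(2)              # dynamic derivative from the previous step
    a_hist, da_hist = [], []
    for pt in p:
        a = np.tanh(w * pt + lw * a_prev)
        fdot = 1.0 - a ** 2            # derivative of tanh at n(t)
        da_explicit = fdot * np.array([pt, a_prev])   # explicit (direct) term
        da = da_explicit + fdot * lw * da_prev        # + dynamic (indirect) term
        a_hist.append(a)
        da_hist.append(da)
        a_prev, da_prev = a, da
    return np.array(a_hist), np.array(da_hist)

a, da = forward_perturbation(np.array([1.0, -0.5, 0.25]), w=0.3, lw=0.6)
```

A finite-difference check confirms that the recursion returns the total derivative, including the indirect effect through the delayed output.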
Figure 13. Cascaded Recurrent Neural Network
Layer 5 is not the output of a subnet, so the resulting equations are:

$$\frac{\partial^e a^6(t)}{\partial a^5(t)} = LW^{6,5} \times S^{6,6}$$

$$S^{5,6} \equiv \frac{\partial^e a^6(t)}{\partial n^5(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5)$$

$$\frac{\partial^e a^6(t)}{\partial b^5} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5)$$

$$\frac{\partial^e a^6(t)}{\partial LW^{5,4}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [a^{6,4}(t)]^T$$

$$\frac{\partial^e a^6(t)}{\partial IW^{5,1}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [r(t)]^T$$

$$\frac{\partial^e a^6(t)}{\partial LW^{5,6}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [a^{6,6}(t)]^T$$

Layer 5 has two tapped-delay inputs from subnets, so we must calculate the explicit derivatives of the subnet output with respect to these inputs to yield:

$$\frac{\partial^e a^6(t)}{\partial a^{6,4}(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times LW^{5,4}$$

$$\frac{\partial^e a^6(t)}{\partial a^{6,6}(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times LW^{5,6}$$

Layer 4 is the end of the second subnet, so we now calculate the explicit derivatives with respect to the second subnet output:

$$\frac{\partial^e a^4(t)}{\partial a^4(t)} = I\,, \qquad S^{4,4} \equiv \frac{\partial^e a^4(t)}{\partial n^4(t)} = \dot{F}^4(n^4)$$

$$\frac{\partial^e a^4(t)}{\partial b^4} = \dot{F}^4(n^4)\,, \qquad \frac{\partial^e a^4(t)}{\partial LW^{4,3}} = \dot{F}^4(n^4) \cdot [a^3(t)]^T$$

Layer 3 is not the output of a subnet, so the resulting equations are:

$$\frac{\partial^e a^4(t)}{\partial a^3(t)} = LW^{4,3} \times S^{4,4}$$

$$S^{3,4} \equiv \frac{\partial^e a^4(t)}{\partial n^3(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3)$$

$$\frac{\partial^e a^4(t)}{\partial b^3} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3)$$

$$\frac{\partial^e a^4(t)}{\partial LW^{3,2}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [a^{4,2}(t)]^T$$

$$\frac{\partial^e a^4(t)}{\partial IW^{3,1}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [r(t)]^T$$

$$\frac{\partial^e a^4(t)}{\partial LW^{3,4}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [a^{4,4}(t)]^T$$

Layer 3 has two delayed inputs from other subnets, so we must compute the following explicit derivatives:

$$\frac{\partial^e a^4(t)}{\partial a^{4,2}(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times LW^{3,2}$$

$$\frac{\partial^e a^4(t)}{\partial a^{4,4}(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times LW^{3,4}$$

Layer 2 is the end of the first subnet, so we apply the equations

$$\frac{\partial^e a^2(t)}{\partial a^2(t)} = I\,, \qquad S^{2,2} \equiv \frac{\partial^e a^2(t)}{\partial n^2(t)} = \dot{F}^2(n^2)$$
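The pattern in these layer equations, a diagonal transfer-function derivative chained with a layer weight matrix, can be verified numerically. The sketch below writes the same factors in plain Jacobian ordering (the chapter orders the factors in its own notation); all layer sizes, weights, and the tanh transfer function are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
W3, b3 = rng.normal(size=(4, 2)), rng.normal(size=4)     # hypothetical layer 3
LW43, b4 = rng.normal(size=(3, 4)), rng.normal(size=3)   # hypothetical layer 4
p = rng.normal(size=2)

def forward(b3_val):
    """Two cascaded tanh layers; returns (a4, a3)."""
    a3 = np.tanh(W3 @ p + b3_val)
    a4 = np.tanh(LW43 @ a3 + b4)
    return a4, a3

a4, a3 = forward(b3)
Fdot4 = np.diag(1 - a4 ** 2)    # diagonal transfer derivative (like S^{4,4})
Fdot3 = np.diag(1 - a3 ** 2)    # diagonal transfer derivative of layer 3
J = Fdot4 @ LW43 @ Fdot3        # explicit derivative of a4 with respect to b3
```

A column-by-column finite-difference check of `J` against perturbations of `b3` confirms the chained product.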
$$\frac{\partial^e a^2(t)}{\partial b^2} = \dot{F}^2(n^2)\,, \qquad \frac{\partial^e a^2(t)}{\partial LW^{2,1}} = \dot{F}^2(n^2) \cdot [a^1(t)]^T$$

Layer 1 is not the output of a subnet, so we apply

$$\frac{\partial^e a^2(t)}{\partial a^1(t)} = LW^{2,1} \times S^{2,2}$$

$$S^{1,2} \equiv \frac{\partial^e a^2(t)}{\partial n^1(t)} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1)$$

$$\frac{\partial^e a^2(t)}{\partial b^1} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1)$$

$$\frac{\partial^e a^2(t)}{\partial IW^{1,1}} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times [r(t)]^T$$

$$\frac{\partial^e a^2(t)}{\partial LW^{1,2}} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times [a^{2,2}(t)]^T$$

Layer 1 has one delayed input from another subnet, so we calculate

$$\frac{\partial^e a^2(t)}{\partial a^{2,2}(t)} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times LW^{1,2}$$

At this point, we have the explicit derivatives for all the weights and biases in the system. These explicit derivatives are used with Eq. (25) – Eq. (27) to compute the dynamic derivatives we need for training the network. Notice that the solution to Eq. (27) is an input to Eq. (26) and Eq. (27) on the following time step. The solution to Eq. (26) is an input to Eq. (25) and Eq. (26) on the following time step. Finally, the solution to Eq. (25) is an input to Eq. (25) on the following time step.

After all the dynamic derivatives of the output of the system with respect to the weights and biases have been computed, we must calculate

$$\frac{\partial^e F}{\partial a^6(t)} = -2\,\bigl(r(t) - a^6(t)\bigr)$$

We can then compute the derivative of the cost function with respect to all the weights and biases using
$$\frac{\partial F}{\partial x} = \frac{\partial F}{\partial x} + \left[\frac{\partial a^6(t)}{\partial x}\right]^T \times \frac{\partial^e F}{\partial a^6(t)}$$

The process is repeated for each sample time in the training set.

After the network was trained, it was used to predict experimentally recorded noise. The result is shown in Figure 14. The data was collected in an acoustically “live” environment that was conducive to sound reflection. The prediction results for the LDRN, trained with full dynamic backpropagation, are compared to two other systems. The first comparison system is an LDRN that is trained only with static derivatives. The second comparison system is a non-recurrent LFFN system with a tapped-delay line at the input. Figure 14 shows the actual and predicted signals when full dynamic backpropagation is used to train the LDRN. Figure 15 is a plot of the errors between actual and predicted signals.

Figure 14. Prediction Results for LDRN with Full Dynamic Training

Figure 15. Errors for LDRN with Full Dynamic Training

The next experiment uses the same data, but only explicit (static) derivatives were used. The errors between actual and predicted signals are shown in Figure 16.
Figure 16. Prediction Errors for Static Training

The results shown in Figure 16 are reasonably good. We can see some degradation in performance, which might be critical in certain situations. In sound cancellation applications, for example, the differences would certainly be detectable by the human ear.

As a final experiment, the data was processed using an LFFN, with TDL input. The network size was adjusted so that the number of weights was comparable to the LDRN used in the previous experiment. The prediction errors for the LFFN are shown in Figure 17.

Figure 17. LFFN Prediction Results

The LFFN prediction performance is not too bad, but is significantly worse than the LDRN performance, when trained with full dynamic backpropagation. A summary of the simulation results is provided in Table 2. Notice the dramatic increase in floating point operations required to process each sample when dynamic training is used.
VII. SUMMARY

This chapter has discussed the training of recurrent neural networks for control and signal processing. When computing training gradients in recurrent networks, there are two different effects that we must account for. The first is a direct effect, which explains the immediate impact of a change in the weights on the output of the network at the current time. The second is an indirect effect, which accounts for the fact that some of the inputs to the network are previous network outputs, which are also functions of the weights. To account for this indirect effect we must use dynamic backpropagation.

This chapter has introduced the Layered Digital Recurrent Network (LDRN), which is a general class of recurrent network. A universal dynamic training algorithm for the LDRN was also developed. The LDRN was then applied to problems in control and signal processing. A number of practical issues must be addressed when applying dynamic training. Computational requirements for dynamic training can be much higher than those for static training, but static training is not as accurate and may not converge. The appropriate form of training to use (dynamic, static, or some combination) varies from problem to problem.
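The direct/indirect distinction can be made concrete with a scalar recurrent model a(t) = tanh(w·p(t) + lw·a(t−1)) and cost F = Σ (r(t) − a(t))². This scalar model is a hypothetical stand-in, not the LDRN itself: the static gradient keeps only the explicit (direct) term, while the dynamic gradient adds the recursion through the delayed output:

```python
import numpy as np

def gradients(p, r, w, lw):
    """Static (explicit-only) and full dynamic gradients of F = sum (r - a)^2."""
    a_prev, da_prev = 0.0, np.zeros(2)
    g_static, g_dynamic = np.zeros(2), np.zeros(2)
    for pt, rt in zip(p, r):
        a = np.tanh(w * pt + lw * a_prev)
        fdot = 1.0 - a ** 2
        da_exp = fdot * np.array([pt, a_prev])   # direct effect only
        da = da_exp + fdot * lw * da_prev        # direct + indirect effect
        e = -2.0 * (rt - a)                      # dF/da for this sample
        g_static += e * da_exp
        g_dynamic += e * da
        a_prev, da_prev = a, da
    return g_static, g_dynamic
```

Only the dynamic gradient matches a finite-difference check of the cost; the static gradient omits the indirect effect and therefore differs whenever the feedback weight is nonzero.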
Prediction Method             Flops/Sample   Mean Squared Prediction Error
LDRN, Full Dynamic Training   4.32 x 10^4    0.0050
LDRN, Static Training         5.19 x 10^3    0.0087
LFFN                          1.85 x 10^3    0.0120

Table 2. Simulation Results

REFERENCES

Demuth, H. B. and Beale, M., Users’ Guide for the Neural Network Toolbox for MATLAB, The MathWorks, Natick, MA, 1998.

Hagan, M. T., Demuth, H. B., and Beale, M., Neural Network Design, PWS Publishing Company, Boston, 1996.
Narendra, K. S. and Parthasarathy, K., Identification and control of dynamical systems using neural networks, IEEE Transactions on Neural Networks, 1(1), 4, 1990.

Yang, W., Neurocontrol Using Dynamic Learning, Doctoral Thesis, Oklahoma State University, Stillwater, 1994.

Yang, W. and Hagan, M. T., Training recurrent networks, Proceedings of the 7th Oklahoma Symposium on Artificial Intelligence, Stillwater, 226, 1993.