Chapter 12

TRAINING RECURRENT NETWORKS FOR FILTERING AND CONTROL

Martin T. Hagan, Orlando De Jesús, Roger Schultz

School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma
I. INTRODUCTION

Neural networks can be classified into recurrent and nonrecurrent categories. Nonrecurrent (feedforward) networks have no feedback elements; the output is calculated directly from the input through feedforward connections. In recurrent networks the output depends not only on the current input to the network, but also on the current or previous outputs or states of the network. For this reason, recurrent networks are more powerful than nonrecurrent networks and have important uses in control and signal processing applications.

This chapter introduces the Layered Digital Recurrent Network (LDRN), develops a general training algorithm for this network, and demonstrates the application of the LDRN to problems in controls and signal processing. In Section II we present the notation necessary to represent the LDRN. Section III contains a discussion of the dynamic backpropagation algorithms that are required to compute training gradients for recurrent networks. The concepts underlying the backpropagation-through-time and forward perturbation algorithms are presented in a unified framework and are demonstrated for a simple, single-loop recurrent network. In Section IV we describe a general forward perturbation algorithm for computing training gradients for the LDRN. Two application sections follow the discussion of dynamic backpropagation: neurocontrol and nonlinear filtering. These sections demonstrate the implementation of the general dynamic backpropagation algorithm. The control section (Section V) applies a neurocontrol architecture to the automatic equalization of an acoustic transmitter. The nonlinear filtering section (Section VI) demonstrates the application of a recurrent filtering network to a noise-cancellation application.
II. PRELIMINARIES

In this section we want to introduce the types of neural networks that are discussed in the remainder of this chapter. We also present the notation that we use to represent these networks. The networks we use are Layered Digital Recurrent Networks (LDRN). They are a generalization of the Layered Feedforward Network (LFFN), which has been modified to include feedback connections and delays. We begin here with a description of the LFFN and then show how it can be generalized to obtain the LDRN.
A. LAYERED FEEDFORWARD NETWORK

Figure 1 is an example of a layered feedforward network (two layers in this case). (See Demuth et al. [1998] for a full description of the notation used here.) The input vector to the network is represented by p^1, which has R^1 elements. The superscript represents the input number, since it is possible to have more than one input vector. The input is connected to Layer 1 through the input weight IW^{1,1}, where the first superscript represents the layer number and the second superscript represents the input number. The bias for the first layer is represented by b^1. The net input to Layer 1 is denoted by n^1, and is computed as

n^1 = IW^{1,1} p^1 + b^1.   (1)

The output of Layer 1, a^1, is computed by passing the net input through a transfer function, according to a^1 = f^1(n^1). The output a^1 has S^1 elements. The output of the first layer is input to the second layer through the layer weight LW^{2,1}. The overall output of the network is labeled y. This is typically chosen to be the output of the last layer in the network, as it is in Figure 1, although it could be the output of any layer in the network.

Figure 1. Example of a Layered Feedforward Network (a^1 = f^1(IW^{1,1} p^1 + b^1), a^2 = f^2(LW^{2,1} a^1 + b^2))

Each layer in the LFFN is made up of 1) a set of weight matrices that come into that layer (which may connect from other layers or from external inputs), 2) a bias vector, 3) a summing junction, and 4) a transfer function. (In the LDRN, a set of tapped delay lines may also be included in a layer, as we will see later.) In the example given in Figure 1, there is only one weight matrix associated with each layer, but it is possible to have weight matrices that are connected from several different input vectors and layer outputs. This will become clear when we introduce the LDRN network. Also, the example in Figure 1 has only two layers; our general LFFN can have an arbitrary number of layers. The layers do not have to be connected in sequence from Layer 1 to Layer M. For example, Layer 1 could be connected to both Layer 3 and Layer 4, by weights LW^{3,1} and LW^{4,1}, respectively.
Although the layers do not have to be connected in a linear sequence by layer number, it must be possible to compute the output of the network by a simple sequence of calculations. There cannot be any feedback loops in the network. The order in which the individual layer outputs must be computed in order to obtain the correct network output is called the simulation order.
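The forward calculation described above can be sketched in a few lines of numpy. This is our own illustration, not from the chapter; the layer sizes, weights, and transfer functions are arbitrary choices:

```python
import numpy as np

def lffn_forward(p, IW11, b1, LW21, b2, f1=np.tanh, f2=lambda n: n):
    """Forward pass of a two-layer LFFN as in Figure 1.

    n1 = IW11 @ p + b1,  a1 = f1(n1)        (Eq. 1)
    n2 = LW21 @ a1 + b2, y = a2 = f2(n2)
    """
    a1 = f1(IW11 @ p + b1)
    a2 = f2(LW21 @ a1 + b2)
    return a2

# Example dimensions: R = 3 inputs, S1 = 5 hidden units, S2 = 2 outputs
rng = np.random.default_rng(0)
IW11, b1 = rng.normal(size=(5, 3)), np.zeros(5)
LW21, b2 = rng.normal(size=(2, 5)), np.zeros(2)
y = lffn_forward(rng.normal(size=3), IW11, b1, LW21, b2)
assert y.shape == (2,)
```

For a general LFFN, the same pattern applies: evaluate each layer in simulation order, summing the weighted contributions of all connected inputs and layers before applying the transfer function.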
B. LAYERED DIGITAL RECURRENT NETWORK

We now introduce a class of recurrent networks that are based on the LFFN. The LFFN is a static network, in the sense that the network output can be computed directly from the network input, without the knowledge of initial network states. A Layered Digital Recurrent Network (LDRN) can contain feedback loops and time delays. The network response is a function of network inputs, as well as initial network states.

The components of the LDRN are the same as those of the LFFN, with the addition of the tapped delay line (TDL), which is shown in Figure 2. The output of the TDL is a vector containing current and previous values of the TDL input. In Figure 2 we show two abbreviated representations for the TDL. In the case on the left, the undelayed value of the input variable is included in the output vector. In the case on the right, only delayed values of the input are included in the output.
Figure 2. Tapped Delay Line
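A TDL is easy to render in code. The sketch below is our own (the class name and interface are invented for illustration); it supports both of the variants in Figure 2 via the `include_current` flag:

```python
from collections import deque

class TDL:
    """Tapped delay line with d delays, history initialized to zero.

    If include_current is True the output stacks [x(t); x(t-1); ...; x(t-d)]
    ((d+1)*S values); otherwise only [x(t-1); ...; x(t-d)] (d*S values).
    """
    def __init__(self, d, size, include_current=True):
        self.include_current = include_current
        self.hist = deque([[0.0] * size for _ in range(d)], maxlen=d)

    def __call__(self, x):
        out = ([list(x)] if self.include_current else []) + list(self.hist)
        self.hist.appendleft(list(x))           # x(t) becomes x(t-1) next step
        return [v for vec in out for v in vec]  # concatenate into one vector

tdl = TDL(d=2, size=1, include_current=False)
assert tdl([1.0]) == [0.0, 0.0]   # history starts at zero
assert tdl([2.0]) == [1.0, 0.0]   # [x(t-1), x(t-2)]
assert tdl([3.0]) == [2.0, 1.0]
```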
Figure 3 is an example of an LDRN. Like the LFFN, the LDRN is made up of layers. In addition to the weight matrices, bias, summing junction, and transfer function, which make up the layers of the LFFN, the layers of the LDRN also include any tapped delay lines that appear at the input of a weight matrix. (Any weight matrix in an LDRN can be preceded by a tapped delay line.) For example, Layer 1 of Figure 3 contains the weight LW^{1,2} and the TDL at its input. Note that all of the layer outputs and net inputs in the LDRN are explicit functions of time.

The output of the TDL in Figure 3 is labeled a^{2,2}(t). This indicates that it is a composite vector made up of delayed values of the output of Subnet 2 (indicated by the second superscript) and is an input to Subnet 2 (indicated by the first superscript). (A subnet is a series of layers which have no internal tapped delay
lines. The number of the subnet is the same as the number of its output layer. These concepts are defined more carefully in a later section.) These TDL outputs are important variables in our training algorithm for the LDRN.

Figure 3. Layered Digital Recurrent Network Example

In the LDRN, feedback is added to an LFFN. Therefore, unlike the LFFN, the output of the network is a function not only of the weights, biases, and network input, but also of the outputs of some of the network layers at previous points in time. For this reason, it is not a simple matter to calculate the gradient of the network output with respect to the weights and biases (which is needed to train the network). This is because the weights and biases have two different effects on the network output. The first is the direct effect, which can be calculated using the standard backpropagation algorithm [Hagan et al., 1996]. The second is an indirect effect, since some of the inputs to the network, such as a^{1,2}(t), are also functions of the weights and biases. In the next section we briefly describe the gradient calculations for the LFFN and show how they must be modified for the LDRN. The main development of the next two sections is a general gradient calculation for arbitrary LDRN's.
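One simulation step of an LDRN in the style of Figure 3 can be sketched as follows. This is our own toy rendering (layer sizes, weights, and the choice of tanh/linear transfer functions are assumptions), showing Layer 1 receiving the external input plus delayed Layer 2 outputs through a feedback TDL:

```python
import numpy as np

def ldrn_step(p, a2_delays, IW11, LW12, b1, LW21, b2):
    """One time step of a two-layer LDRN with feedback from Layer 2 to Layer 1.

    a2_delays is the TDL output a^{2,2}(t): the stacked previous outputs
    [a2(t-1); ...; a2(t-d)] of Layer 2, fed back through LW12.
    """
    n1 = IW11 @ p + LW12 @ a2_delays + b1
    a1 = np.tanh(n1)
    a2 = LW21 @ a1 + b2          # linear output layer; y(t) = a2(t)
    return a2

# Simulate for T steps with d = 2 feedback delays
rng = np.random.default_rng(1)
R, S1, S2, d, T = 3, 4, 2, 2, 5
IW11 = rng.normal(size=(S1, R))
LW12 = rng.normal(size=(S1, d * S2)) * 0.1   # small feedback weights
LW21 = rng.normal(size=(S2, S1))
b1, b2 = np.zeros(S1), np.zeros(S2)
hist = [np.zeros(S2)] * d                    # a2(t-1), a2(t-2) start at zero
for t in range(T):
    a2 = ldrn_step(rng.normal(size=R), np.concatenate(hist),
                   IW11, LW12, b1, LW21, b2)
    hist = [a2] + hist[:-1]                  # shift the tapped delay line
assert a2.shape == (S2,)
```

The essential difference from the LFFN sketch earlier is the state carried in `hist`: the network response now depends on the initial network states as well as the inputs.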
III. PRINCIPLES OF DYNAMIC LEARNING
Consider again the multilayer network of Figure 1. The basic simulation equation of such a network is

a^k = f^k( Σ_i IW^{k,i} p^i + Σ_j LW^{k,j} a^j + b^k ),   (2)

where k is incremented through the simulation order.

The task of the network is to learn associations between a specified set of input/output pairs: {(p1, t1), (p2, t2), ..., (pQ, tQ)}. The performance index for the network is

F(x) = Σ_{q=1}^{Q} (t_q − y_q)^T (t_q − y_q) = Σ_{q=1}^{Q} e_q^T e_q,   (3)

where y_q is the output of the network when the qth input, p_q, is presented, and x is a vector containing all of the weights and biases in the network. (Later we use x^i to represent the weights and biases in Layer i.) The network should learn the x vector that minimizes F.

For the standard backpropagation algorithm [Hagan et al., 1996] we use a steepest descent learning rule. The performance index is approximated by:

F̂ = e_q^T e_q,   (4)

where the total sum of squares is replaced by the squared errors for a single input/output pair. The approximate steepest (gradient) descent algorithm is then:

Δw^{k,l}_{i,j} = −α ∂F̂/∂w^{k,l}_{i,j},   Δb^k_i = −α ∂F̂/∂b^k_i,   (5)

where α is the learning rate. Define

s^k_i ≡ ∂F̂/∂n^k_i   (6)

as the sensitivity of the performance index to changes in the net input of unit i in layer k. Using the chain rule, we can show that

∂F̂/∂iw^{m,l}_{i,j} = (∂F̂/∂n^m_i) × (∂n^m_i/∂iw^{m,l}_{i,j}) = s^m_i p^l_j ,
∂F̂/∂lw^{m,l}_{i,j} = (∂F̂/∂n^m_i) × (∂n^m_i/∂lw^{m,l}_{i,j}) = s^m_i a^l_j ,
∂F̂/∂b^m_i = (∂F̂/∂n^m_i) × (∂n^m_i/∂b^m_i) = s^m_i .   (7)

It can also be shown that the sensitivities satisfy the following recurrence relation, in which m is incremented through the backpropagation order, which is the reverse of the simulation order:

s^m = Ḟ^m(n^m) Σ_i (LW^{i,m})^T s^i,   (8)

where
Ḟ^m(n^m) = diag[ ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m}) ],   (9)

and

ḟ^m(n^m_j) = ∂f^m(n^m_j)/∂n^m_j .   (10)

This recurrence relation is initialized at the output layer:

s^M = −2 Ḟ^M(n^M)(t_q − y_q).   (11)

The overall learning algorithm now proceeds as follows: first, propagate the input forward using Eq. (2); next, propagate the sensitivities back using Eq. (11) and Eq. (8); and finally, update the weights and biases using Eq. (5) and Eq. (7).

Now consider an LDRN, such as the one shown in Figure 3. Suppose that we use the same gradient descent algorithm, Eq. (5), that is used in the standard backpropagation algorithm. The problem in this case is that when we try to find the equivalent of Eq. (7) we note that the weights and biases have two different effects on the network output. The first is the direct effect, which is accounted for by Eq. (7). The second is an indirect effect, since some of the inputs to the network, such as a^{1,2}(t), are also functions of the weights and biases. To account for this indirect effect we must use dynamic backpropagation.

To illustrate dynamic backpropagation [Yang et al., 1993, Yang, 1994], consider Figure 4, which is a simple recurrent network. It consists of an LFFN with a single feedback loop added from the output of the network, which is connected to the input of the network through a single delay. In this figure the vector x represents all of the network parameters (weights and biases) and the vector a(t) represents the output of the LFFN at time step t.
Figure 4. Simple Recurrent Network, a(t) = NN(p(t), a(t−1), x)
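The single-loop recurrence of Figure 4 can be sketched concretely. This is our own minimal instance (a one-layer tanh network standing in for the LFFN; the weight names are invented):

```python
import numpy as np

def simple_recurrent(p_seq, W_p, W_a, b):
    """a(t) = NN(p(t), a(t-1), x): a one-layer instance of Figure 4.

    Here NN is a single tanh layer; W_p, W_a, and b together play the
    role of the parameter vector x. The initial state a(0) is zero.
    """
    a = np.zeros(b.shape)
    outputs = []
    for p in p_seq:
        a = np.tanh(W_p @ p + W_a @ a + b)   # feedback through one delay
        outputs.append(a)
    return outputs

rng = np.random.default_rng(2)
W_p, W_a, b = rng.normal(size=(2, 3)), 0.5 * np.eye(2), np.zeros(2)
outs = simple_recurrent([rng.normal(size=3) for _ in range(4)], W_p, W_a, b)
assert len(outs) == 4 and outs[-1].shape == (2,)
```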
Now suppose that we want to minimize

F(x) = Σ_{t=1}^{Q} (t(t) − a(t))^T (t(t) − a(t)).   (12)

In order to use gradient descent, we need to find the gradient of F with respect to the network parameters. There are two different approaches to this problem. They both use the chain rule, but are implemented in different ways:

∂F/∂x = Σ_{t=1}^{Q} [∂a(t)/∂x]^T × ∂^e F/∂a(t),   (13)

or

∂F/∂x = Σ_{t=1}^{Q} [∂^e a(t)/∂x]^T × ∂F/∂a(t),   (14)

where the superscript e indicates an explicit derivative, not accounting for indirect effects through time. The explicit derivatives can be obtained with the standard backpropagation algorithm, as in Eq. (8). To find the complete derivatives that are required in Eq. (13) and Eq. (14), we need the additional equations:

∂a(t)/∂x = ∂^e a(t)/∂x + [∂^e a(t)/∂a(t−1)] × ∂a(t−1)/∂x,   (15)

and

∂F/∂a(t) = ∂^e F/∂a(t) + [∂^e a(t+1)/∂a(t)]^T × ∂F/∂a(t+1).   (16)

Eq. (13) and Eq. (15) make up the forward perturbation (FP) algorithm. Note that the key term is

∂a(t)/∂x,   (17)

which must be propagated forward through time.

Eq. (14) and Eq. (16) make up the backpropagation-through-time (BTT) algorithm. Here the key term is

∂F/∂a(t),   (18)
which must be propagated backward through time.

In general, the FP algorithm requires somewhat more computation than the BTT algorithm. However, the BTT algorithm cannot be implemented in real time, since the outputs must be computed for all time steps, and then the derivatives must be backpropagated back to the initial time point. The FP algorithm is well suited for real-time implementation, since the derivatives can be calculated at each time step.
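The FP recursion of Eqs. (13) and (15) can be checked on a toy case. The sketch below is our own scalar example (not from the chapter): a(t) = tanh(w·a(t−1) + p(t)) with a single parameter w, validated against a finite-difference approximation of the gradient:

```python
import numpy as np

def fp_gradient(p_seq, t_seq, w):
    """Forward perturbation (Eqs. 13 and 15) for the scalar recurrence
    a(t) = tanh(w*a(t-1) + p(t)), with F = sum_t (t(t) - a(t))^2.

    The total derivative da(t)/dw is propagated forward in time:
      da(t)/dw = d^e a(t)/dw + [d^e a(t)/da(t-1)] * da(t-1)/dw   (Eq. 15)
    and the gradient is accumulated as in Eq. (13).
    """
    a, dadw, grad = 0.0, 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a_prev = a
        a = np.tanh(w * a + p)
        sech2 = 1.0 - a * a                        # derivative of tanh
        dadw = sech2 * a_prev + sech2 * w * dadw   # Eq. (15)
        grad += -2.0 * (tgt - a) * dadw            # Eq. (13)
    return grad

# Finite-difference check of the dynamic gradient
p_seq, t_seq, w, eps = [0.5, -0.3, 0.8], [0.2, 0.1, 0.4], 0.7, 1e-6
def loss(w):
    a, F = 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a = np.tanh(w * a + p)
        F += (tgt - a) ** 2
    return F
num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
assert abs(fp_gradient(p_seq, t_seq, w) - num) < 1e-6
```

Note that dropping the second term of Eq. (15) (the indirect effect) would leave only the explicit derivative and give the wrong gradient, which is exactly the error that dynamic backpropagation corrects.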
IV. DYNAMIC BACKPROP FOR THE LDRN
In this section, we generalize the FP algorithm so that it can be applied to arbitrary LDRN's. This is followed by applications of the LDRN and dynamic backpropagation to problems in filtering and control.

A. PRELIMINARIES

To explain this algorithm, we must create certain definitions related to the LDRN. We do that in the following paragraphs.

First, as we stated earlier, a layer consists of a set of weights, associated tapped delay lines, a summing function, and a transfer function. The network has inputs that are connected to special weights, called input weights, and denoted by IW^{i,j}, where j denotes the number of the input vector that enters the weight, and i denotes the number of the layer to which the weight is connected. The weights connecting one layer to another are called layer weights and are denoted by LW^{i,j}, where j denotes the number of the layer coming into the weight and i denotes the number of the layer at the output of the weight. In order to calculate the network response in stages, layer by layer, we need to proceed in the proper order, so that the necessary inputs at each layer will be available. This ordering of layers is called the simulation order. In order to backpropagate the derivatives for the gradient calculations, we must proceed in the opposite order, which is called the backpropagation order.

In order to simplify the description of the training algorithm, the LDRN is divided into subnets. A subnet is a section of a network that has no tapped delay lines, except at the subnet input. Every LDRN can be organized as a collection of subnets. We define the subnets by proceeding backwards from the last subnet to the first subnet. To locate the last subnet, start at the first layer in the backpropagation order and proceed through the backpropagation order until you find a layer containing delays, which becomes the first layer in the last subnet. The last subnet is then defined as containing all of the layers beginning at the layer containing delays and continuing through the simulation order to the first layer in the backpropagation order (or the last layer in the simulation order). This defines the last subnet. To find the preceding subnet, start with the next layer in the backpropagation order and proceed in the same way until you find the next layer with delays. This process continues until you reach the last layer in the backpropagation order, at which time the first subnet is defined. As with the layer
simulation order, we can also define a subnet simulation order that starts at the first subnet and continues until the last subnet.

For example, the LDRN shown in Figure 5 has three layers and two subnets. To simplify the algorithm, the subnet is denoted by the number of its output layer. For this network the simulation order is 1-2-3, the backpropagation order is 3-2-1, and the subnet simulation order is 1-3.

Figure 5. Three-layer LDRN with Two Subnets

B. EXPLICIT DERIVATIVES

We want to generalize the forward perturbation (FP) algorithm of Eq. (13) and Eq. (15) so that it can be applied to an arbitrary LDRN. Notice that we have two terms on the right-hand side of Eq. (15). We have an explicit derivative of the performance with respect to the weights, which accounts for the direct effect of the weights on the performance and can be computed through the standard backpropagation algorithm, as in Eq. (8). We also have a second term, which accounts for the fact that the weights have a secondary effect through the previous network output. In the general LDRN, we may have many different feedback loops, and therefore there could be many different terms on the right of Eq. (15), and each of those terms would have a separate equation like Eq. (15) to update the total derivative through time. For our development of the FP algorithm, we have one term (and an additional equation) for every place where one subnet is input to another subnet. Recall that the subnet boundaries are determined by the locations of the tapped delay lines. Within a subnet, standard backpropagation, as in Eq. (8), can be used to propagate the explicit derivatives, but at the subnet boundaries an equation like Eq. (15) must be used to calculate the total derivative, which includes both direct and indirect effects. In this subsection we describe the
computation of the explicit derivatives, and then in the following subsection we explain the total derivative computation.

A backpropagation process to calculate the explicit derivatives is needed for each subnet. These equations involve calculating the derivative of the subnet output with respect to each layer output in the subnet. The basic equation is

∂^e a^{jz}(t)/∂a^i(t) = Σ_k [∂^e a^{jz}(t)/∂n^k(t)] × [∂^e n^k(t)/∂a^i(t)],   (19)

where a^{jz}(t) represents a subnet output, a^i(t) is the output of a layer in subnet jz, and n^k(t) is the net input of layer k, which has a connection from layer i. The index i is incremented through the backpropagation order. If we define

S^{k,jz} ≡ ∂^e a^{jz}(t)/∂n^k(t), and note that ∂^e n^k(t)/∂a^i(t) = LW^{k,i},   (20)

then we can write Eq. (19) as

∂^e a^{jz}(t)/∂a^i(t) = Σ_k S^{k,jz} × LW^{k,i}.   (21)

This recursion, where i is incremented along the backpropagation order, begins at the last layer in the subnet:

∂^e a^{jz}(t)/∂a^{jz}(t) = I,   (22)

where I is an identity matrix whose dimension is the size of layer jz.

C. COMPLETE FP ALGORITHM FOR THE LDRN

We are now ready to describe a generalized FP algorithm for the arbitrary LDRN. There are two parts to the FP algorithm: Eq. (13) and Eq. (15). Eq. (13) remains the same for the LDRN as for the simple network in Figure 4. Eq. (15), however, must be modified. For the general case we have one equation like Eq. (15) for each subnet output. Each of these equations has a term for the explicit derivative and one additional term for each subnet output.

The complete FP algorithm for the LDRN network is given in the following flowchart. It contains three major sections. The first section computes the explicit (static) derivatives, as in Eq. (21), which are needed as part of the dynamic equations. The second section computes the dynamic derivatives of the subnet outputs with respect to the weights, as in Eq. (15). The final section computes the dynamic derivatives of performance with respect to the weights, as in Eq. (13).
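The recursion of Eqs. (19)-(22) can be illustrated on a toy two-layer subnet. This is our own example (layer sizes and weights are arbitrary; the subnet output layer is jz = 2 with a linear transfer function), verified against finite differences:

```python
import numpy as np

def subnet_explicit_jacobian(a0, W1, W2):
    """Eqs. (19)-(22) for a toy subnet:
       n1 = W1 a0, a1 = tanh(n1);  n2 = W2 a1, a2 = n2 (subnet output, jz = 2).

    Returns the explicit derivative of the subnet output a2 with respect
    to the subnet input a0, built by the backward recursion.
    """
    a1 = np.tanh(W1 @ a0)
    S22 = np.eye(W2.shape[0])            # Eq. (22): dA2/dn2 = Fdot2 = I (linear)
    dA2_dA1 = S22 @ W2                   # Eq. (21): S^{2,2} x LW^{2,1}
    S12 = dA2_dA1 @ np.diag(1 - a1**2)   # Eq. (20): S^{1,2} = d^e a2 / dn1
    dA2_dA0 = S12 @ W1                   # one more step of Eq. (21)
    return dA2_dA0

rng = np.random.default_rng(3)
W1, W2, a0 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
J = subnet_explicit_jacobian(a0, W1, W2)

# finite-difference check of each Jacobian column
f = lambda a: W2 @ np.tanh(W1 @ a)
eps = 1e-6
for j in range(3):
    d = np.zeros(3); d[j] = eps
    col = (f(a0 + d) - f(a0 - d)) / (2 * eps)
    assert np.allclose(J[:, j], col, atol=1e-6)
```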
[Flowchart, part 1: Explicit (Static) Derivatives. Initialize ∂F/∂x = 0 for all weights and biases in the network (x); set initial time t = 1. Select i = last layer in the backpropagation order. If layer i is the output of a subnet, set jz = i and set ∂^e a^{jz}(t)/∂a^{jz}(t) = I; otherwise compute ∂^e a^{jz}(t)/∂a^i(t) = Σ_k S^{k,jz} × LW^{k,i} over all layers k connected to layer i. Calculate ∂^e a^{jz}(t)/∂n^i(t), then calculate ∂^e a^{jz}(t)/∂x^i for the weights and biases x^i in layer i. If layer i is the input of a subnet, calculate ∂^e a^{jz}(t)/∂a^{jz,j}(t) for each layer j that connects to layer i through delays. Select i = next layer in the backpropagation order and repeat until the end of the backpropagation order.]
335
Training Recurrent Networks for Filtering and Control
[Flowchart, part 2: Dynamic Derivatives of Network Outputs with Respect to Weights. Set jz = first subnet and jj = first subnet. Initialize ∂a^{jz}(t)/∂x = ∂^e a^{jz}(t)/∂x for all weights and biases (x). If the output of subnet jj is an input to subnet jz, update ∂a^{jz}(t)/∂x = ∂a^{jz}(t)/∂x + [∂^e a^{jz}(t)/∂a^{jz,jj}(t)] × ∂a^{jz,jj}(t)/∂x for all weights and biases in the network (x). Increment jj until jj passes the last subnet; then increment jz (resetting jj to the first subnet) until jz passes the last subnet.]

[Flowchart, part 3: Dynamic Derivatives of Performance with Respect to Weights. Set jz = last subnet. Calculate ∂^e F/∂a^{jz}(t) for the output layer jz, then update ∂F/∂x = ∂F/∂x + [∂a^{jz}(t)/∂x]^T × ∂^e F/∂a^{jz}(t) for all weights and biases in the network (x). Increment time t = t + 1 and return to part 1 until t passes the last time point in the training set.]
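The three stages of the flowchart can be put together in one training loop. The sketch below is our own rendering for the simplest possible case, the scalar single-subnet recurrence a(t) = tanh(w·a(t−1) + p(t)); the parameter names, step size, and data are all invented for illustration:

```python
import numpy as np

def fp_train(p_seq, t_seq, w, alpha=0.005, epochs=500):
    """Gradient descent with forward-perturbation derivatives, following the
    three flowchart stages for a scalar single-subnet toy case:
      1. explicit (static) derivatives at each time step,
      2. dynamic derivative da(t)/dw propagated forward (Eq. 15),
      3. accumulation of dF/dw (Eq. 13) and a descent step (Eq. 5).
    """
    for _ in range(epochs):
        a, dadw, grad = 0.0, 0.0, 0.0
        for p, tgt in zip(p_seq, t_seq):
            a_prev = a
            a = np.tanh(w * a + p)
            sech2 = 1.0 - a * a
            # stage 1: explicit derivatives of a(t) w.r.t. w and a(t-1)
            e_dadw, e_dada = sech2 * a_prev, sech2 * w
            # stage 2: dynamic derivative, Eq. (15)
            dadw = e_dadw + e_dada * dadw
            # stage 3: accumulate dF/dw, Eq. (13)
            grad += -2.0 * (tgt - a) * dadw
        w -= alpha * grad                      # Eq. (5)
    return w

# Generate targets from a "true" recurrence, then train from w = 0
rng = np.random.default_rng(4)
p_seq, w_true = rng.normal(size=20), 0.6
a, t_seq = 0.0, []
for p in p_seq:
    a = np.tanh(w_true * a + p)
    t_seq.append(a)

def run_loss(w):
    a, F = 0.0, 0.0
    for p, tgt in zip(p_seq, t_seq):
        a = np.tanh(w * a + p)
        F += (tgt - a) ** 2
    return F

w_hat = fp_train(p_seq, t_seq, w=0.0)
assert run_loss(w_hat) < run_loss(0.0)
```

For a full LDRN the same loop structure applies, with the scalar derivatives replaced by the matrix quantities S^{k,jz}, ∂a^{jz}(t)/∂x, and the per-subnet update equations of part 2.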
V. NEUROCONTROL APPLICATION

In this section we illustrate the application of the LDRN and dynamic backpropagation to a control problem. We wish to control the output of an acoustic transmitter, whose schematic diagram is shown in Figure 6.

Binary data is transmitted via acoustic signals from one pipeline location to another. Acoustic stress waves are imparted into the pipeline by the acoustic transmitter. The stress waves propagate through the pipeline to the acoustic receiver, which receives the transmitted signal. Tone bursts of different frequencies are used to represent either a 1 or a 0. The acoustic channel provided by the pipeline causes heavy distortion in the transmitted signal. There are often extraneous unwanted acoustic stress waves created by external sources, such as engines, geartrains, and pumps, that are imparted into the pipeline. The inherent channel distortion and the unwanted external noise can degrade the transmitted
signal such that signal detection and interpretation at the receiver are unreliable or impossible. One method of enhancing communication performance is to equalize the effects of the transmission channel by adjusting the transmitter output so that the measured signal imparted to the pipeline at a short distance from the transmitter is the desired signal. Ideally, the feedback measurement of this signal is taken at the receiver. Of course, in practice, the feedback signal is measured near the transmitter. To alleviate the effects of unwanted disturbance noise, the transmitter can actively cancel the disturbance noise by way of destructive interference. The transmitter creates stress waves that are out of phase with, and of equal magnitude to, the undesired signals. In this example, a neural network controller is used as both a channel equalizer and an active noise canceller.

Figure 6. Schematic of Acoustic Transmitter/Receiver System.

In addition to illustrating dynamic backpropagation on the neurocontroller for the acoustic transmitter, this section also demonstrates the effect of training with approximations to true dynamic derivatives. The evaluation is based on squared error performance and floating point operations. When approximations are used, the computational burden can be reduced, but the errors generally increase.

Figure 7 is a schematic of the control system. In this system, model reference adaptive control (MRAC) [Narendra, 1990] is applied to the control of the acoustic transmitter. The plant model is used only as a backpropagation path for the derivatives needed to adjust the controller weights; the plant model weights are not adjusted. The plant model is a 2-layer LDRN, with 10 input taps, 50 feedback taps, and 15 hidden neurons. The controller weights are adjusted such that the error ec(t), between a delayed reference input r(t) and the actual plant output c(t), is minimized. The controller structure consists of 40 input taps, 50 controller feedback taps, 75 plant output feedback taps, and 15 hidden neurons.
Figure 7. Self-Equalizing Acoustic Transmitter
If we apply the concepts described in the previous sections to the system shown in Figure 7, we notice that the total system is an LDRN that can be divided into two subnets. The last subnet (number 4) corresponds to the Neural Network Plant Model (with a^4(t) as the subnet output). The first subnet (number 2) corresponds to the Neural Controller (with a^2(t) as the subnet output). The subnets are fully connected, so we have two sets of training equations:

∂a^4(t)/∂x = ∂^e a^4(t)/∂x + [∂^e a^4(t)/∂a^{4,2}(t)] × ∂a^{4,2}(t)/∂x + [∂^e a^4(t)/∂a^{4,4}(t)] × ∂a^{4,4}(t)/∂x,   (23)

and

∂a^2(t)/∂x = ∂^e a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,4}(t)] × ∂a^{2,4}(t)/∂x + [∂^e a^2(t)/∂a^{2,2}(t)] × ∂a^{2,2}(t)/∂x.   (24)

We now show how these equations can be developed using our general FP algorithm, which was described in the flowchart in the previous section. We start in the last layer in the backpropagation order (layer 4), obtaining:

∂^e a^4(t)/∂a^4(t) = I;   S^{4,4} ≡ ∂^e a^4(t)/∂n^4(t) = Ḟ^4(n^4);

∂^e a^4(t)/∂b^4 = Ḟ^4(n^4);   ∂^e a^4(t)/∂LW^{4,3} = Ḟ^4(n^4) ⋅ [a^3(t)]^T.

Layer 3 is not the output of a subnet, so we apply:

∂^e a^4(t)/∂a^3(t) = S^{4,4} × LW^{4,3};   S^{3,4} ≡ ∂^e a^4(t)/∂n^3(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3);

∂^e a^4(t)/∂b^3 = S^{4,4} × LW^{4,3} × Ḟ^3(n^3);   ∂^e a^4(t)/∂LW^{3,2} = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × [a^{4,2}(t)]^T.
Layer 3 has two inputs with delays, therefore it is the beginning of the last subnet, and we calculate:

∂^e a^4(t)/∂LW^{3,4} = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × [a^{4,4}(t)]^T;

∂^e a^4(t)/∂a^{4,2}(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × LW^{3,2};   ∂^e a^4(t)/∂a^{4,4}(t) = S^{4,4} × LW^{4,3} × Ḟ^3(n^3) × LW^{3,4}.

Layer 2 in the neural controller is the end of the first subnet. So we apply the equations:

∂^e a^2(t)/∂a^2(t) = I;   S^{2,2} ≡ ∂^e a^2(t)/∂n^2(t) = Ḟ^2(n^2);

∂^e a^2(t)/∂b^2 = Ḟ^2(n^2);   ∂^e a^2(t)/∂LW^{2,1} = Ḟ^2(n^2) ⋅ [a^1(t)]^T.

Layer 1 is not the output of a subnet, so we apply:

∂^e a^2(t)/∂a^1(t) = S^{2,2} × LW^{2,1};   S^{1,2} ≡ ∂^e a^2(t)/∂n^1(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1);

∂^e a^2(t)/∂b^1 = S^{2,2} × LW^{2,1} × Ḟ^1(n^1);

∂^e a^2(t)/∂LW^{1,2} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [a^{2,2}(t)]^T;   ∂^e a^2(t)/∂LW^{1,4} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [a^{2,4}(t)]^T.
Layer 1 has two inputs with delays, so it is the end of the first subnet, and we calculate:

∂^e a^2(t)/∂IW^{1,1} = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × [r(t)]^T;

∂^e a^2(t)/∂a^{2,2}(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × LW^{1,2};   ∂^e a^2(t)/∂a^{2,4}(t) = S^{2,2} × LW^{2,1} × Ḟ^1(n^1) × LW^{1,4}.

Now that we have finished with the backpropagation step, we have the explicit derivatives for all of the weights and biases in the system. We are now ready to calculate the dynamic derivatives. We initialize

∂a^2(t)/∂x = ∂^e a^2(t)/∂x,

and calculate

∂a^2(t)/∂x = ∂a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,2}(t)] × ∂a^{2,2}(t)/∂x,

and

∂a^2(t)/∂x = ∂a^2(t)/∂x + [∂^e a^2(t)/∂a^{2,4}(t)] × ∂a^{2,4}(t)/∂x.

This gives us Eq. (24) for all weights and biases. A similar process is performed for subnet 4 to obtain Eq. (23).

We have now computed all of the dynamic derivatives of the outputs of the subnets with respect to the weights. The next step is to compute the derivatives of the performance function with respect to the weights. We must first calculate

∂^e F/∂a^4(t) = −2(r(t) − a^4(t)),

to obtain

∂F/∂x = ∂F/∂x + [∂a^4(t)/∂x]^T × ∂^e F/∂a^4(t)
for all of the weights and biases in the neural controller. The process is repeated for each sample time in the training set.

The LDRN was trained using the preceding equations. Now we demonstrate the performance of the closed-loop acoustic transmitter system. The reference input used in the simulation consisted of a random sequence of tone burst pulses, as shown in Figure 8. The tone bursts are evenly spaced and appear randomly at one of two frequencies. In order to investigate the robustness of the controller, a periodic disturbance noise was added to the plant input, representing extraneous acoustic stress waves created by external sources. The open-loop plant response, with no controller, is shown in Figure 9. The closed-loop response, after the controller was trained to convergence, is shown in Figure 10.

Figure 8. Tone Burst Reference Input

In the next test, dynamic backpropagation was used in the plant model, but was not used in backpropagating derivatives in the controller. Only static backpropagation was used in the controller. (In Eq. (24) only the explicit derivative terms are calculated.) This procedure requires less computation than the full dynamic backpropagation but may not be as accurate. The results are shown in Figure 11.

For the last test, dynamic backpropagation was only used to compute a dynamic derivative across the first delay in the tapped-delay line between the plant model and the controller. All other derivatives were computed using only explicit (static) derivatives. The controller weights never converged in this case, so a plot of the results is not shown. The results are summarized in Table 1.
343
Training Recurrent Networks for Filtering and Control
Figure 9. Open-loop Plant Response
Figure 10. Closed-Loop System Response (Full Dynamic Training)
Figure 11. Response without Dynamic Controller Training
Table 1 provides a summary of the performance results for all three tests. These results highlight the increased computational burden when calculating dynamic derivatives instead of simply static derivatives. In this example, reasonable performance was possible even when dynamic derivatives were used only in the plant model. This derivative approximation decreased the computational burden by approximately 65%. Using essentially no dynamic derivatives in training reduced the computational burden by approximately 98%. However, performance in this case was unacceptable.
VI. RECURRENT FILTER
This section provides a second example of the application of the LDRN and dynamic backpropagation. We use a multi-loop recurrent network to predict an experimental acoustic signal. The prediction of acoustic signals is crucial in active sound cancellation systems. Acoustic environments are often very complex, due to the complexity of typical sound sources and the presence of reflected sound waves. The dynamic nature of acoustical systems makes the use of recurrent filter structures of great interest for prediction and control of this type of system.
Figure 12. Active Noise Control System
Figure 12 depicts a typical Active Noise Control (ANC) system. An acoustic noise source creates undesirable noise in a surrounding area. The goal of the active noise suppression system is to reduce the undesirable noise at a particular location
Derivative Method       Flops/Sample   Sum Squared Error
Full Dynamic            9.83 x 10^5    43.44
Plant Only Dynamic      3.48 x 10^5    55.53
No Dynamic              1.85 x 10^4    127.88

Table 1. Simulation Results for the Neural Controller
by using a loudspeaker to produce “anti-noise” that attenuates the unwanted noise by destructive interference. In order for a system of this type to work effectively, it is critical that the ANC system be able to predict (and then cancel) unwanted sound in the noise control zone.

In the first part of this section we develop the dynamic training equations for the ANC system. Then we present experimental results showing the prediction performance.

Figure 13 shows the structure of the LDRN used for predicting the acoustic data. In this network there are 3 cascaded recurrent structures. If we follow the methods described in Section IV, we see that the system is composed of three subnets. Therefore, we have three sets of training equations:

$$\frac{\partial a^6(t)}{\partial x} = \frac{\partial^e a^6(t)}{\partial x} + \frac{\partial^e a^6(t)}{\partial a^{6,4}(t)} \times \frac{\partial a^{6,4}(t)}{\partial x} + \frac{\partial^e a^6(t)}{\partial a^{6,6}(t)} \times \frac{\partial a^{6,6}(t)}{\partial x} \quad (25)$$

$$\frac{\partial a^4(t)}{\partial x} = \frac{\partial^e a^4(t)}{\partial x} + \frac{\partial^e a^4(t)}{\partial a^{4,2}(t)} \times \frac{\partial a^{4,2}(t)}{\partial x} + \frac{\partial^e a^4(t)}{\partial a^{4,4}(t)} \times \frac{\partial a^{4,4}(t)}{\partial x} \quad (26)$$

$$\frac{\partial a^2(t)}{\partial x} = \frac{\partial^e a^2(t)}{\partial x} + \frac{\partial^e a^2(t)}{\partial a^{2,2}(t)} \times \frac{\partial a^{2,2}(t)}{\partial x} \quad (27)$$

Notice that in Eq. (27) there is only one dynamic term. This is because there is only one tapped-delay input that comes from a subnet.

We now show how these equations can be developed using our general FP procedure, which was described in the flowchart of Section IV. We start in the last layer in the backpropagation order (Layer 6) to get the following equations:

$$\frac{\partial^e a^6(t)}{\partial a^6(t)} = I\,, \qquad S^{6,6} \equiv \frac{\partial^e a^6(t)}{\partial n^6(t)} = \dot{F}^6(n^6)$$

$$\frac{\partial^e a^6(t)}{\partial b^6} = \dot{F}^6(n^6)\,, \qquad \frac{\partial^e a^6(t)}{\partial LW^{6,5}} = \dot{F}^6(n^6) \cdot [a^5(t)]^T$$
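The recursions of Eq. (25) – Eq. (27) are easiest to see on a toy network. The sketch below uses a hypothetical scalar recurrent layer, a(t) = tanh(w·p(t) + lw·a(t−1)), as a stand-in for a subnet (the names `w`, `lw`, and the scalar setup are illustrative assumptions, not the chapter's LDRN code). At each step the dynamic derivative is the explicit derivative plus the explicit derivative with respect to the delayed output times the dynamic derivative carried over from the previous step:

```python
import numpy as np

def forward_perturbation(p, w, lw):
    """Outputs a(t) and dynamic derivatives d a(t) / d [w, lw] over time."""
    a_prev = 0.0
    da_prev = np.zeros(2)              # dynamic derivative from the previous step
    a_hist, da_hist = [], []
    for pt in p:
        a = np.tanh(w * pt + lw * a_prev)
        fdot = 1.0 - a ** 2            # derivative of tanh at n(t)
        da_explicit = fdot * np.array([pt, a_prev])   # explicit (direct) term
        da = da_explicit + fdot * lw * da_prev        # + dynamic (indirect) term
        a_hist.append(a)
        da_hist.append(da)
        a_prev, da_prev = a, da
    return np.array(a_hist), np.array(da_hist)

a, da = forward_perturbation(np.array([1.0, -0.5, 0.25]), w=0.3, lw=0.6)
```

A finite-difference check confirms that the recursion returns the total derivative, including the indirect effect through the delayed output.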
Figure 13. Cascaded Recurrent Neural Network
Layer 5 is not the output of a subnet, so the resulting equations are:

$$\frac{\partial^e a^6(t)}{\partial a^5(t)} = LW^{6,5} \times S^{6,6}$$

$$S^{5,6} \equiv \frac{\partial^e a^6(t)}{\partial n^5(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5)$$

$$\frac{\partial^e a^6(t)}{\partial b^5} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5)$$

$$\frac{\partial^e a^6(t)}{\partial LW^{5,4}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [a^{6,4}(t)]^T$$

$$\frac{\partial^e a^6(t)}{\partial IW^{5,1}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [r(t)]^T$$

$$\frac{\partial^e a^6(t)}{\partial LW^{5,6}} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times [a^{6,6}(t)]^T$$

Layer 5 has two tapped-delay inputs from subnets, so we must calculate the explicit derivatives of the subnet output with respect to these inputs to yield:

$$\frac{\partial^e a^6(t)}{\partial a^{6,4}(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times LW^{5,4}$$

$$\frac{\partial^e a^6(t)}{\partial a^{6,6}(t)} = LW^{6,5} \times S^{6,6} \times \dot{F}^5(n^5) \times LW^{5,6}$$

Layer 4 is the end of the second subnet, so we now calculate the explicit derivatives with respect to the second subnet output:

$$\frac{\partial^e a^4(t)}{\partial a^4(t)} = I\,, \qquad S^{4,4} \equiv \frac{\partial^e a^4(t)}{\partial n^4(t)} = \dot{F}^4(n^4)$$

$$\frac{\partial^e a^4(t)}{\partial b^4} = \dot{F}^4(n^4)\,, \qquad \frac{\partial^e a^4(t)}{\partial LW^{4,3}} = \dot{F}^4(n^4) \cdot [a^3(t)]^T$$

Layer 3 is not the output of a subnet, so the resulting equations are:

$$\frac{\partial^e a^4(t)}{\partial a^3(t)} = LW^{4,3} \times S^{4,4}$$

$$S^{3,4} \equiv \frac{\partial^e a^4(t)}{\partial n^3(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3)$$

$$\frac{\partial^e a^4(t)}{\partial b^3} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3)$$

$$\frac{\partial^e a^4(t)}{\partial LW^{3,2}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [a^{4,2}(t)]^T$$

$$\frac{\partial^e a^4(t)}{\partial IW^{3,1}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [r(t)]^T$$

$$\frac{\partial^e a^4(t)}{\partial LW^{3,4}} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times [a^{4,4}(t)]^T$$

Layer 3 has two delayed inputs from other subnets, so we must compute the following explicit derivatives:

$$\frac{\partial^e a^4(t)}{\partial a^{4,2}(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times LW^{3,2}$$

$$\frac{\partial^e a^4(t)}{\partial a^{4,4}(t)} = LW^{4,3} \times S^{4,4} \times \dot{F}^3(n^3) \times LW^{3,4}$$

Layer 2 is the end of the first subnet, so we apply the equations

$$\frac{\partial^e a^2(t)}{\partial a^2(t)} = I\,, \qquad S^{2,2} \equiv \frac{\partial^e a^2(t)}{\partial n^2(t)} = \dot{F}^2(n^2)$$
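The pattern in these layer equations, a diagonal transfer-function derivative chained with a layer weight matrix, can be verified numerically. The sketch below writes the same factors in plain Jacobian ordering (the chapter orders the factors in its own notation); all layer sizes, weights, and the tanh transfer function are made-up illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
W3, b3 = rng.normal(size=(4, 2)), rng.normal(size=4)     # hypothetical layer 3
LW43, b4 = rng.normal(size=(3, 4)), rng.normal(size=3)   # hypothetical layer 4
p = rng.normal(size=2)

def forward(b3_val):
    """Two cascaded tanh layers; returns (a4, a3)."""
    a3 = np.tanh(W3 @ p + b3_val)
    a4 = np.tanh(LW43 @ a3 + b4)
    return a4, a3

a4, a3 = forward(b3)
Fdot4 = np.diag(1 - a4 ** 2)    # diagonal transfer derivative (like S^{4,4})
Fdot3 = np.diag(1 - a3 ** 2)    # diagonal transfer derivative of layer 3
J = Fdot4 @ LW43 @ Fdot3        # explicit derivative of a4 with respect to b3
```

A column-by-column finite-difference check of `J` against perturbations of `b3` confirms the chained product.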
$$\frac{\partial^e a^2(t)}{\partial b^2} = \dot{F}^2(n^2)\,, \qquad \frac{\partial^e a^2(t)}{\partial LW^{2,1}} = \dot{F}^2(n^2) \cdot [a^1(t)]^T$$

Layer 1 is not the output of a subnet, so we apply

$$\frac{\partial^e a^2(t)}{\partial a^1(t)} = LW^{2,1} \times S^{2,2}$$

$$S^{1,2} \equiv \frac{\partial^e a^2(t)}{\partial n^1(t)} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1)$$

$$\frac{\partial^e a^2(t)}{\partial b^1} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1)$$

$$\frac{\partial^e a^2(t)}{\partial IW^{1,1}} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times [r(t)]^T$$

$$\frac{\partial^e a^2(t)}{\partial LW^{1,2}} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times [a^{2,2}(t)]^T$$

Layer 1 has one delayed input from another subnet, so we calculate

$$\frac{\partial^e a^2(t)}{\partial a^{2,2}(t)} = LW^{2,1} \times S^{2,2} \times \dot{F}^1(n^1) \times LW^{1,2}$$

At this point, we have the explicit derivatives for all the weights and biases in the system. These explicit derivatives are used with Eq. (25) – Eq. (27) to compute the dynamic derivatives we need for training the network. Notice that the solution to Eq. (27) is an input to Eq. (26) and Eq. (27) on the following time step. The solution to Eq. (26) is an input to Eq. (25) and Eq. (26) on the following time step. Finally, the solution to Eq. (25) is an input to Eq. (25) on the following time step.

After all the dynamic derivatives of the output of the system with respect to the weights and biases have been computed, we must calculate

$$\frac{\partial^e F}{\partial a^6(t)} = -2\,\bigl(r(t) - a^6(t)\bigr)$$

We can then compute the derivative of the cost function with respect to all the weights and biases using
$$\frac{\partial F}{\partial x} = \frac{\partial F}{\partial x} + \left[\frac{\partial a^6(t)}{\partial x}\right]^T \times \frac{\partial^e F}{\partial a^6(t)}$$

The process is repeated for each sample time in the training set.

After the network was trained, it was used to predict experimentally recorded noise. The result is shown in Figure 14. The data was collected in an acoustically “live” environment that was conducive to sound reflection. The prediction results for the LDRN, trained with full dynamic backpropagation, are compared to two other systems. The first comparison system is an LDRN that is trained only with static derivatives. The second comparison system is a non-recurrent LFFN system with a tapped-delay line at the input. Figure 14 shows the actual and predicted signals when full dynamic backpropagation is used to train the LDRN. Figure 15 is a plot of the errors between actual and predicted signals.

Figure 14. Prediction Results for LDRN with Full Dynamic Training

Figure 15. Errors for LDRN with Full Dynamic Training

The next experiment uses the same data, but only explicit (static) derivatives were used. The errors between actual and predicted signals are shown in Figure 16.
Figure 16. Prediction Errors for Static Training

The results shown in Figure 16 are reasonably good. We can see some degradation in performance, which might be critical in certain situations. In sound cancellation applications, for example, the differences would certainly be detectable by the human ear.

As a final experiment, the data was processed using an LFFN, with TDL input. The network size was adjusted so that the number of weights was comparable to the LDRN used in the previous experiment. The prediction errors for the LFFN are shown in Figure 17.

Figure 17. LFFN Prediction Results

The LFFN prediction performance is not too bad, but is significantly worse than the LDRN performance, when trained with full dynamic backpropagation. A summary of the simulation results is provided in Table 2. Notice the dramatic increase in floating point operations required to process each sample when dynamic training is used.
VII. SUMMARY

This chapter has discussed the training of recurrent neural networks for control and signal processing. When computing training gradients in recurrent networks, there are two different effects that we must account for. The first is a direct effect, which explains the immediate impact of a change in the weights on the output of the network at the current time. The second is an indirect effect, which accounts for the fact that some of the inputs to the network are previous network outputs, which are also functions of the weights. To account for this indirect effect we must use dynamic backpropagation.

This chapter has introduced the Layered Digital Recurrent Network (LDRN), which is a general class of recurrent network. A universal dynamic training algorithm for the LDRN was also developed. The LDRN was then applied to problems in control and signal processing. A number of practical issues must be addressed when applying dynamic training. Computational requirements for dynamic training can be much higher than those for static training, but static training is not as accurate and may not converge. The appropriate form of training to use (dynamic, static, or some combination) varies from problem to problem.
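The direct/indirect distinction can be made concrete with a scalar recurrent model a(t) = tanh(w·p(t) + lw·a(t−1)) and cost F = Σ (r(t) − a(t))². This scalar model is a hypothetical stand-in, not the LDRN itself: the static gradient keeps only the explicit (direct) term, while the dynamic gradient adds the recursion through the delayed output:

```python
import numpy as np

def gradients(p, r, w, lw):
    """Static (explicit-only) and full dynamic gradients of F = sum (r - a)^2."""
    a_prev, da_prev = 0.0, np.zeros(2)
    g_static, g_dynamic = np.zeros(2), np.zeros(2)
    for pt, rt in zip(p, r):
        a = np.tanh(w * pt + lw * a_prev)
        fdot = 1.0 - a ** 2
        da_exp = fdot * np.array([pt, a_prev])   # direct effect only
        da = da_exp + fdot * lw * da_prev        # direct + indirect effect
        e = -2.0 * (rt - a)                      # dF/da for this sample
        g_static += e * da_exp
        g_dynamic += e * da
        a_prev, da_prev = a, da
    return g_static, g_dynamic
```

Only the dynamic gradient matches a finite-difference check of the cost; the static gradient omits the indirect effect and therefore differs whenever the feedback weight is nonzero.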
Prediction Method             Flops/Sample   Mean Squared Prediction Error
LDRN, Full Dynamic Training   4.32 x 10^4    0.0050
LDRN, Static Training         5.19 x 10^3    0.0087
LFFN                          1.85 x 10^3    0.0120

Table 2. Simulation Results

REFERENCES

Demuth, H. B. and Beale, M., Users’ Guide for the Neural Network Toolbox for MATLAB, The MathWorks, Natick, MA, 1998.

Hagan, M. T., Demuth, H. B., and Beale, M., Neural Network Design, PWS Publishing Company, Boston, 1996.
Narendra, K. S. and Parthasarathy, K., Identification and control of dynamical systems using neural networks, IEEE Transactions on Neural Networks, 1(1), 4, 1990.

Yang, W., Neurocontrol Using Dynamic Learning, Doctoral Thesis, Oklahoma State University, Stillwater, 1994.

Yang, W. and Hagan, M. T., Training recurrent networks, Proceedings of the 7th Oklahoma Symposium on Artificial Intelligence, Stillwater, 226, 1993.