

Neurocomputing 74 (2010) 354–361


Robustness quantification of recurrent neural network using unscented transform

Xiaoyu Wang a, Yong Huang a,*, Nhan Nguyen b

a Department of Mechanical Engineering, Clemson University, Clemson, SC 29634-0921, USA
b Intelligent Systems Division, NASA Ames Research Center, Moffett Field, CA 94035, USA

Article info

Article history:

Received 21 August 2009
Received in revised form 31 December 2009
Accepted 19 March 2010
Available online 27 March 2010
Communicated by W. Lu

Keywords:

Recurrent neural network

Robustness

Uncertainty propagation

Unscented transform


Abstract

Artificial recurrent neural network has been proved to be a valuable tool in modeling nonlinear dynamical systems. Robustness study of recurrent neural network is critical to its successful implementation. The goal of robustness study is to reduce the sensitivity of modeling capability to parametric uncertainties or make the network fault tolerant. In this study, an uncertainty propagation analysis is performed to quantify the robustness of a recurrent neural network output due to perturbations in its trained weights. An uncertainty propagation analysis-based robustness measure has been proposed accordingly and further compared with available performance loss-based and sensitivity matrix-based approaches. Results show that the proposed robustness measure approach is more efficient, generic, and flexible to quantify the robustness of a recurrent neural network.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Most dynamical systems have a capability to tolerate various system variations without exceeding predetermined tolerance bounds in the vicinity of nominal dynamic behaviors, which is generally called robustness. Robustness analysis is usually performed to estimate the perturbation-induced performance variation or to quantify the system's resilience to any possible perturbations.

Dynamical systems can be modeled using different approaches, including the data-driven artificial neural network (ANN) method [1]. ANN, often abbreviated as neural network (NN), is a computational model emulating the information processing capability of biological neural networks. When a system is represented by an ANN-based model, the ANN is naturally expected to have certain robustness to various perturbations [2–4]. For example, ANN should accurately describe system behaviors even when its weights are altered due to different reasons such as hardware drifting over time [2], quantization and environmental noise in ANN implementation [4–8], software perturbations [9], and neural network connection faults [10]. While the effect of input uncertainties on ANN robustness has been studied [11], the effect of weight alteration is usually of greater interest in characterizing ANN robustness [4]. As the network weights are easily altered during ANN implementation, robustness analysis on the effect of weight perturbations is critical for ANN physical realization [4].

Based on its architectural organization, ANN can be classified into two main classes: the single-layer or multilayer feedforward NN (FFNN) and the recurrent neural network (RNN) [1], which has at least one internal and/or external feedback loop. The presence of feedback loops has a profound impact on network learning capability and its performance in modeling nonlinear dynamical phenomena [1].

The objective of this study is to investigate the robustness of RNN to weight perturbations in modeling applications using an uncertainty propagation analysis-based robustness measure. RNN is used here as it is increasingly favored in engineering applications related to dynamical and non-stationary systems [1]. While the effect of weight perturbations is of interest, the understanding of ANN modeling robustness is also applicable to the effects of input as well as other parameter perturbations.

2. Background

Analysis of ANN robustness has been of great interest since the network robustness information allows researchers to have a better understanding of network behavior under uncertainties. ANN robustness has been studied for different applications, including associative memory, classification, and modeling. For associative memory applications, the robustness is usually studied by establishing sufficient conditions for valid memory functions under uncertainties of network parameters such as weight and bias [12–14]. For classification applications, robustness analysis is conducted by investigating the relationship between permissible variations of inputs and the associated network classification performance [11]. For modeling applications, robustness is characterized by studying the effects of perturbations in weights [4,15,16] or inputs [8] on network outputs.

[Fig. 1. Proposed procedure for robustness quantification: input samples are drawn from the input space using the Latin hypercube method and fed, together with the initial weights and their perturbation, into the ANN; sigma points yield the ANN output covariances and standard deviations, whose L-1 norm gives the local robustness measures, which are averaged into the global robustness of the ANN.]

ANN robustness in modeling applications has been studied mainly using the performance loss-based approach [2,4,8] and the sensitivity matrix-based approach [15,16]. While these approaches have been developed for general FFNN, they can also be extended to RNN.

The performance loss-based approach is usually realized by computing the network modeling capability degradation due to any perturbation to its parameters. The performance loss is characterized in terms of the mean square error (MSE) over available measurement data sets [2] by introducing certain perturbations to trained network parameters such as weights. Perturbations can be introduced using a constant scaling factor, which linearly changes the value of parameters of interest [2,4], or using a certain probability distribution function such as the Gaussian distribution [8]. The upper boundary of the performance loss for all the input data indicates the network robustness. The sensitivity matrix-based approach is typically implemented using a differential analysis to compute the network sensitivity matrix (H), which is the Jacobian matrix containing derivatives of outputs with respect to parameters such as weights. A norm of the sensitivity matrix, such as the two-norm square [15] or the spectral norm [16], is usually calculated as the robustness index. Unfortunately, both approaches require a large amount of measurement data to cover the whole parameter space, and such data are usually limited in practical applications. Furthermore, each sensitivity matrix reflects the sensitivity of only an infinitesimal range centered on the nominal weight values.

As the aforementioned two main approaches may be mainly limited by the availability of measurement data, this study aims to quantify the network robustness using a new uncertainty propagation-based approach, which does not require any measurement data.

3. Robustness measure development

Based on the uncertainty propagation analysis, a new robustness measure is proposed here by a two-step procedure: (1) input sample generation using a Latin hypercube sampling (LHS) method and (2) robustness quantification using the unscented transform. This procedure is shown in Fig. 1 and elaborated as follows.

The first step is to uniformly generate $n$ input samples $\{\vec{x}^{(1)}, \vec{x}^{(2)}, \ldots, \vec{x}^{(n)}\}$ from the whole input space. This is done by implementing the LHS method, a type of stratified Monte Carlo sampling method [17], which can be nearly five times more effective than other traditional sampling methods [18]. During the first step $n$ samples are generated, and they are fed into the trained network, which may undergo certain weight perturbations.
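As a concrete illustration of this sampling step, the following minimal Python sketch draws Latin hypercube samples with SciPy. The six-dimensional input space scaled to [-1, 1] and all variable names are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of the input-sampling step, assuming a 6-dimensional
# input space scaled to [-1, 1]; names such as `n_samples` are illustrative.
import numpy as np
from scipy.stats import qmc

n_samples, n_inputs = 100, 6
sampler = qmc.LatinHypercube(d=n_inputs, seed=0)
unit_samples = sampler.random(n=n_samples)   # stratified samples in [0, 1]^d
x_samples = qmc.scale(unit_samples, -np.ones(n_inputs), np.ones(n_inputs))
```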

The second step is to quantify the network robustness using the unscented transform method based on the generated $n$ input samples. The proposed unscented transform-based robustness quantification approach includes two measures: (1) the local robustness, which measures the network robustness for a given input only, and (2) the global robustness, which evaluates the network robustness by collectively considering all possible inputs from the whole input space. As ANN is a nonlinear function approximator, which maps both inputs and weights to network outputs, the outputs can be viewed as a function of the weights for given inputs. As such, the local robustness can be interpreted as follows: for a specific input, how much do the outputs vary when the weights deviate from their trained values? It can be seen that the local robustness is input dependent. In this study, the local robustness for any input is defined as the L-1 norm of the output standard deviation vector.

During the second step, the distribution of the perturbed weight vector should first be predetermined. In this study, the trained weights are assumed to be contaminated with zero-mean, finite-variance multivariate normally distributed noise [8], and the contaminated weight vector $\vec{w}$ follows the normal distribution $\vec{w} \sim N(\vec{w}^{*}, \Sigma)$, where $\vec{w}^{*}$ is the trained weight vector (a column vector formed from the trained weight matrix by cascading the rows of the matrix into a row vector and then taking the transpose) and $\Sigma$ is the covariance matrix of $\vec{w}$. The standard deviation of $w_i$, the $i$th element (weight) of $\vec{w}$, is determined as follows:

$$\sigma_{w_i} = L\,w_i, \quad i = 1, 2, \ldots, l \tag{1}$$

where $\sigma_{w_i}$ is the square root of the $i$th diagonal element of $\Sigma$, $l$ is the dimension of the weight vector, and $L$ is the perturbation level, a constant specified based on application needs.
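The covariance implied by Eq. (1) can be assembled as below; this is a sketch under the stated assumptions, with `w_star` and `weight_covariance` as illustrative names. Taking the absolute value of the weights is an implementation detail not spelled out in the paper, added so that negative weights still yield a valid standard deviation.

```python
# A sketch of Eq. (1): build the diagonal covariance Sigma from the
# flattened trained weights `w_star` and perturbation level L.
import numpy as np

def weight_covariance(w_star: np.ndarray, L: float = 0.01) -> np.ndarray:
    sigma_w = L * np.abs(w_star)   # per-weight standard deviation, Eq. (1);
                                   # abs() is an assumption for negative weights
    return np.diag(sigma_w ** 2)   # diagonal covariance matrix Sigma
```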

The unscented transform is usually used to compute the statistics (mean and covariance) of a random vector that undergoes a nonlinear transformation. Here the unscented transform is used to compute the statistics of any ANN output due to perturbations introduced to the trained weights. With $l$ as the dimension of the trained weight vector, $2l+1$ sigma vectors $\vec{w}_i$ ($i = 0, 1, \ldots, 2l$) are generated around $\vec{w}$ based on the mean ($\vec{w}^{*}$) and covariance ($\Sigma$) of the contaminated weight vectors:

$$\vec{w}_0 = \vec{w}^{*} \tag{2}$$

$$\vec{w}_i = \vec{w}^{*} + \left(\sqrt{(l+\lambda)\Sigma}\right)_i, \quad i = 1, 2, \ldots, l \tag{3}$$

$$\vec{w}_i = \vec{w}^{*} - \left(\sqrt{(l+\lambda)\Sigma}\right)_{i-l}, \quad i = l+1, l+2, \ldots, 2l \tag{4}$$

where $\lambda = \alpha^2(l+\kappa) - l$ is a scaling parameter, $\alpha$ is a constant that determines the spread of the sigma vectors around $\vec{w}^{*}$ and is set as 0.1 in this study [19], $\kappa$ is a secondary scaling parameter and is set as 0 [19], and $(\sqrt{(l+\lambda)\Sigma})_i$ is the $i$th column of the square root of the matrix $(l+\lambda)\Sigma$. Accordingly, $2l+1$ new contaminated ANNs are formed based on the $2l+1$ sigma vectors.

[Fig. 2. RNN architecture: m input neurons, h hidden neurons, and n output neurons, with feedforward loops (solid lines) and state feedback loops (dashed lines).]

To compute the local robustness measure for the $j$th sample input $\vec{x}^{(j)}$, the sample is fed into these $2l+1$ networks, and the corresponding outputs, called the outputs of the sigma vectors $\vec{\psi}_i^{(j)}$ ($i = 0, 1, \ldots, 2l$), are obtained. Then $\vec{\psi}_i^{(j)}$ is used to compute the mean and covariance of $\vec{y}^{(j)}$, the network output vector for $\vec{x}^{(j)}$. The mean of $\vec{y}^{(j)}$ can be obtained by weighting the outputs of the sigma vectors:

$$\vec{y}^{(j)} \approx \sum_{i=0}^{2l} w_i^{(m)} \vec{\psi}_i^{(j)} \tag{5}$$

where $w_i^{(m)}$ is a weight used in the unscented transform. The covariance of $\vec{y}^{(j)}$, $\Sigma_{\vec{y}^{(j)}}$, is obtained by

$$\Sigma_{\vec{y}^{(j)}} = \sum_{i=0}^{2l} w_i^{(c)} \left(\vec{y}^{(j)} - \vec{\psi}_i^{(j)}\right)\left(\vec{y}^{(j)} - \vec{\psi}_i^{(j)}\right)^{T} \tag{6}$$

The weights in Eqs. (5) and (6) are given by

$$w_0^{(m)} = \frac{\lambda}{l+\lambda}, \quad w_0^{(c)} = \frac{\lambda}{l+\lambda} + (1 - \alpha^2 + \beta), \quad w_i^{(m)} = w_i^{(c)} = \frac{1}{2(l+\lambda)}, \quad i = 1, \ldots, 2l \tag{7}$$

where $\beta$ is a constant used to incorporate any prior knowledge of the distribution of $\vec{w}^{*}$ and is set as 2 for normal distributions [19]. The standard deviation of the $k$th element of $\vec{y}^{(j)}$ can be written as

$$\sigma_{y_k}^{(j)} = \sqrt{\Sigma_{\vec{y}^{(j)}}(k,k)} \tag{8}$$

where $\Sigma_{\vec{y}^{(j)}}(k,k)$ denotes the $k$th diagonal element of $\Sigma_{\vec{y}^{(j)}}$. A vector composed of the standard deviations of all the elements of the output vector $\vec{y}^{(j)}$ can be written as

$$\vec{\sigma}_{\vec{y}^{(j)}} = \left[\sigma_{y_1}^{(j)}, \sigma_{y_2}^{(j)}, \ldots, \sigma_{y_{n_y}}^{(j)}\right] \tag{9}$$

where $n_y$ is the dimension of $\vec{y}^{(j)}$.

The L-1 vector norm, which computes the summation of the absolute values of all elements of a vector, is used as the local robustness measure for the sample input $\vec{x}^{(j)}$:

$$R^{(j)} = \left\|\vec{\sigma}_{\vec{y}^{(j)}}\right\|_1 \tag{10}$$

For finite dimensional vector spaces, all vector norms are equivalent [20], and the L-1 norm is selected here due to its robustness to outliers and its ease of implementation [21].

Finally, the global robustness measure, which accounts for the effects of all the input samples from the input space, is defined as the average of the local robustness measures, $R_1 = (1/n)\sum_{i=1}^{n} R^{(i)}$, where $n$ is the number of input samples. This proposed approach is further applied to the RNNs of interest in the following section.

4. Methodology validation

4.1. RNN architecture and training algorithm

RNNs have been favored in more and more engineering applications in modeling of dynamical and non-stationary systems [1] such as cutting tool wear modeling [22]. The proposed method is applied to study the robustness of an RNN and an optimized RNN as discussed in the following.

The RNN network structure of interest is introduced here. The fully forward connected NN (FFCNN) [23] proposed by Werbos [24] is adopted as the backbone of the RNN of interest [22]. The RNN is evolved from FFCNN by introducing intra-neuron internal recurrency inside its hidden section. As shown in Fig. 2, the RNN comprises m neurons in its input section, h neurons in its hidden section, and n neurons in its output section. All input and output neurons use a linear activation function, and hidden neurons adopt a sigmoid activation function. In addition to the forward connections as in FFCNN, each neuron in the hidden section of the RNN takes feedback connections from the neurons to the right of it (the dashed lines in Fig. 2). This internal recurrency makes RNN fundamentally different from FFCNN since it operates not only on the input space but also on the internal state space, which makes it more suitable for modeling of non-stationary and dynamical systems [1].

The RNN architecture here is trained with the extended Kalman filter (EKF) algorithm, which was first introduced to train neural networks by Singhal and Wu [25]. Compared with the most widely adopted back-propagation (BP) algorithm, the EKF training algorithm has the following advantages: (1) it reaches the training steady state much faster for non-stationary processes [26], and (2) it outperforms the BP algorithm when the training data are limited [27]. Hence the EKF learning algorithm [1] is favored here, and the EKF-based network weight update at time step k is as follows:

$$\vec{w}_{k+1} = \vec{w}_k - K_k\left(\vec{y}_k - \vec{y}_k^{*}\right), \qquad \vec{y}_k = h(\vec{w}_k, u_k) \tag{11}$$

$$K_k = P_k H_k \left[R_k + H_k^{T} P_k H_k\right]^{-1} \tag{12}$$

$$P_{k+1} = Q_k + \left[I - K_k H_k^{T}\right] P_k \tag{13}$$

where $\vec{w}_k$ is the estimate of the optimal weight vector $\vec{w}^{*}$ at training time step $k$; $\vec{w}_{k+1}$ the estimate of the weight vector at the next time step; $K_k$ the Kalman gain matrix; $\vec{y}_k$ the output of the RNN, which is composed of the outputs of the neurons in the output section; $\vec{y}_k^{*}$ the target value for $\vec{y}_k$; $h(\cdot)$ defines the network output as a function of the weight vector $\vec{w}_k$ and the input $u_k$; $H$ the orderly derivative matrix of the network outputs with respect to the trainable network weights [23]; $R$ the covariance matrix of the measurement noise; $Q$ the covariance matrix of the process noise; $P$ the covariance matrix of the estimation error; $I$ the identity matrix; and $k$ and $k+1$ represent the $k$th and $(k+1)$th (one step after) time steps, respectively.

[Fig. 3. EKF-based RNN training flow chart: starting from P0, Q0, R0, and w0, the training data are used to compute the orderly derivatives and generate the Jacobian matrix H; Kalman filtering updates the weights via Eqs. (11)–(13) until the stop criteria are met, yielding the trained weights.]

[Fig. 4. Output of the benchmark system over 1000 time steps.]

The RNN training process is depicted in Fig. 3. First, the parameters of the EKF, $P$, $Q$, $R$, and $\vec{w}$, are initialized as $P_0$, $Q_0$, $R_0$, and $w_0$. Then the training data are fed into the network, and the orderly derivatives $\partial^{+}Y/\partial w_k$ of the network outputs with respect to the weight vectors are calculated [23] to form the Jacobian matrix $H_k$. The orderly derivatives are computed using the chain rule for all possible connections contributing to $Y$ via $w_k$; the derivatives take different forms depending on the specific weight connections. Finally, the Kalman filter equations (11)–(13) are applied to train the network weights until the specified stop criteria are met.

This study also applies a topology destructive optimization approach to further optimize the RNN. According to this approach, the number of hidden neurons is first chosen based on a trial and error method, and the network topology is further optimized by disconnecting some weights among the network neurons using a method first proposed by KrishnaKumar [28]. This optimized RNN is also trained using the aforementioned EKF algorithm [22].

4.2. Benchmark nonlinear dynamical system and RNN to model the system

A benchmark nonlinear dynamical system [29] that represents a single-input single-output (SISO) nonlinear plant is modeled using the RNN and the optimized RNN, and then the robustness of the networks is studied by the proposed method. This benchmark system (Eq. (14)), as shown in Fig. 4, has been selected due to its generality as well as its analytical tractability:

$$y_p(k+1) = f\left(y_p(k),\, y_p(k-1),\, y_p(k-2),\, u(k),\, u(k-1)\right) \tag{14}$$

where $f(x_1, x_2, x_3, x_4, x_5) = \dfrac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_2^2 + x_3^2}$ and

$$u(k) = \begin{cases} \sin(\pi k/25), & 0 \le k < 250 \\ 1.0, & 250 \le k < 500 \\ -1.0, & 500 \le k < 750 \\ 0.3\sin(\pi k/25) + 0.1\sin(\pi k/32) + 0.6\sin(\pi k/10), & 750 \le k < 1000 \end{cases}$$
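The benchmark data can be generated directly from Eq. (14); the sketch below does so under zero initial conditions, as stated in the text, with illustrative array names.

```python
# A sketch of benchmark data generation for Eq. (14) under zero initial
# conditions; `yp` holds y_p(0..1000).
import numpy as np

def u(k):
    if k < 250:
        return np.sin(np.pi * k / 25)
    if k < 500:
        return 1.0
    if k < 750:
        return -1.0
    return (0.3 * np.sin(np.pi * k / 25) + 0.1 * np.sin(np.pi * k / 32)
            + 0.6 * np.sin(np.pi * k / 10))

yp = np.zeros(1001)                 # yp[0..2] = 0: zero initial condition
for k in range(2, 1000):
    # f(x1..x5) with x1 = yp(k), x2 = yp(k-1), x3 = yp(k-2),
    # x4 = u(k), x5 = u(k-1)
    yp[k + 1] = ((yp[k] * yp[k - 1] * yp[k - 2] * u(k - 1) * (yp[k - 2] - 1)
                  + u(k)) / (1 + yp[k - 1] ** 2 + yp[k - 2] ** 2))
```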

In modeling this benchmark system using the RNN, six network inputs are selected as follows: three outputs at the current and previous steps, $y_p(k)$, $y_p(k-1)$, and $y_p(k-2)$; two control actions, $u(k)$ and $u(k-1)$; and a constant bias of 1. The network output is $y_p(k+1)$. The training pairs or patterns are generated using Eq. (14), and the zero initial condition is applied to generate the data $y_p(k)$, where $k = 1, 2, \ldots, 1000$. The network weights are first randomly initialized in the region [-1, 1]. In this study the error covariance matrix $P$, the process noise covariance matrix $Q$, and the measurement noise covariance matrix $R$ are all diagonal matrices, and all diagonal elements of $P$, $Q$, and $R$ are initialized as 100, 0.01, and 100, respectively, as in [27]. The number of hidden neurons is determined by trial and error to achieve the best modeling performance for the aforementioned matrices $P$, $Q$, and $R$, and the final network topology is 6–9–1 with six input, nine hidden, and one output neurons.

Either of the following two stop criteria is adopted here: (1) the number of training epochs (where an epoch, also called a training step, is a complete training pass covering all available 1000 training patterns) should be no larger than 100, and (2) the training error is less than a predetermined case-dependent value (here 3%) and the difference between the current training error and that of 20 epochs before is less than another predetermined case-dependent value (here 0.03%). The training error accounts for the overall modeling error of all available training patterns used in training, and the training error for a training epoch $j$ is represented using a normalized sum of square error (SSE) as follows:

$$e(j) = \sqrt{\frac{\left(\vec{y}_o(j) - \vec{y}^{*}\right)^{T}\left(\vec{y}_o(j) - \vec{y}^{*}\right)}{\vec{y}^{*T}\,\vec{y}^{*}}} \times 100\% \tag{15}$$

where $\vec{y}^{*}$ is the target data at an epoch and $\vec{y}_o(j)$ is the output of the RNN at epoch $j$.

Similarly, in modeling this benchmark system using the optimized RNN, the same network structure (6–9–1) is selected, and the network pruning method [22,28] is applied to determine the connectivity among neurons and optimize the network structure for improved accuracy and robustness [28].

The trained weight matrices of both the RNN and the optimized RNN can be found in the Appendix; the RNN has 150 trainable weights, while the optimized RNN has 96 trainable weights.

4.3. Robustness quantification results and discussion

The proposed robustness quantification method is applied to quantify the robustness of the trained RNN and the optimized RNN. Here two robustness measures are considered: the local robustness measure, which is input dependent, for any specific input sample based on Eq. (10), and the global robustness measure for the overall network robustness based on the average of the local robustness measures. Generally the perturbation level should be determined based on experimental observations of the hardware/software during the implementation of the ANN. In this study, the perturbation level L has been taken as 1% for simplicity, and 100 input samples have been used if not mentioned otherwise. Based on the proposed approach, a smaller robustness value means higher system robustness to external perturbations.

[Fig. 5. Local robustness measures for 100 input samples (local robustness versus the nth input sample).]

[Fig. 6. Global robustness measure of the RNN and the optimized RNN versus the number of input samples (perturbation level = 1%).]

[Fig. 7. Global robustness measures of the RNN and the optimized RNN under different perturbation levels.]

Fig. 5 shows the varying local robustness measures for the RNN using 100 sample inputs, which were uniformly generated from the input space. Each point represents a local robustness measure for an input sample. The global robustness measure is found to be 0.0136 by averaging the local robustness measures. A similar varying tendency of the local robustness measure has been observed with the optimized RNN, and its global robustness measure is found to be 0.0067, which is smaller than that of the RNN.

The proposed global robustness measure is dependent on two factors: the number of input samples and the perturbation level applied. The following sections study the effects of these two factors on the global robustness value.

4.3.1. Effect of the number of input samples on the global robustness measure

In general, the more input samples used, the more reliably the global robustness measure represents the system robustness performance over the whole input space. Unfortunately, it is impossible to compute the global measure in an exhaustive way by sampling all possible inputs. Therefore, a minimum number of input samples needed for global robustness quantification should be determined first. To find this minimum number, different numbers of input samples have been selected in this study, and their corresponding global robustness measures for both the RNN and the optimized RNN are computed and shown in Fig. 6. The smaller the robustness measure, the more robust the network to perturbations.

It can be seen that for each network the robustness measures converge to a steady value quickly once more than 10 uniformly generated input samples are used. Based on a conservative consideration, 100 input samples are used here and in the following sections. Based on the 100 input samples, the optimized RNN has a global robustness value of 0.0067, which is smaller than that of the RNN (0.0136), thereby implying that the optimized RNN is more robust than the RNN [28].

4.3.2. Effect of the perturbation level on the global robustness measure

Perturbation levels ranging from 1% to 20% have been applied to the trained network weights to study the network robustness under perturbed weights, and the results are shown in Fig. 7. It can be seen from Fig. 7 that the optimized RNN is more robust than the regular RNN for all the perturbation levels, which indicates that the optimization process has improved the network robustness, as observed before [28].

It can also be seen that the relationship between the robustness measure and the perturbation level can be approximated as linear. This linear pattern is attributed to the following reason. All the input and output neurons use a linear activation function; only the hidden neurons, which adopt a sigmoid activation function, may generate nonlinearity. However, most hidden neurons work in the linear region of their activation functions under small perturbations, which may lead to a linear relationship between the perturbation level and the robustness measure. It should be pointed out that this linear relationship will not be valid in the nonlinear region of the sigmoid activation function.

5. Discussion

5.1. Comparison among robustness measures

The proposed robustness quantification approach is further compared with the performance loss-based and sensitivity matrix-based approaches. The performance loss-based and sensitivity matrix-based measures are computed based on the training data mentioned in the previous section. To fairly compare the three approaches, the same training data (input data) are also used to compute the proposed robustness measure, and the LHS input sampling process is not applied here to generate input samples for the proposed approach.

Table 1. Comparison of robustness quantification approaches.

  Robustness quantification approach | RNN robustness (r1) | Optimized RNN robustness (r2) | Robustness ratio (r1/r2)
  Performance loss based             | 0.0395              | 0.0342                        | 1.16
  Sensitivity matrix based           | 2.7163              | 1.9988                        | 1.36
  Proposed                           | 0.0138              | 0.007                         | 1.97

Table 2. Sensitivity matrix-based robustness of RNN and optimized RNN.

  Measure                       | RNN robustness | Optimized RNN robustness
  Robustness (H matrix based)   | 2.7163         | 1.9988
  Robustness (S matrix based)   | 2.7274         | 1.9946

For both the performance loss-based approach and the proposed approach, a 1% perturbation level is used. For the performance loss-based approach, 10,000 networks are generated based on the perturbed weights, and the network outputs are then compared with the corresponding desired outputs to compute the MSE. The resulting maximum MSE is taken as the performance loss-based measure. For the sensitivity matrix-based approach, the sensitivity matrices for all the training inputs are obtained during the training process, and the spectral norm [16] of each matrix is computed as the local robustness measure. It should be emphasized that the output of the benchmark nonlinear dynamical system is a scalar, so the sensitivity matrix is actually a vector in this case. It is known that the matrix norm corresponding to the Euclidean vector norm is the spectral norm [30]; therefore the spectral norm of a vector is the same as its Euclidean norm. The average of the spectral norms of the sensitivity matrices is used as the sensitivity matrix-based measure.

It should be pointed out that the three robustness measures cannot be directly compared against each other because they are computed using different criteria. Instead, the ratio of the robustness measure of the RNN to that of the optimized RNN is studied to indicate the effectiveness of each quantification approach. A larger robustness ratio means that the quantification approach is more sensitive in quantifying the robustness difference.

Table 1 lists the comparison results. For all three quantification approaches, the optimized RNN is found to be more robust than the RNN. The proposed approach has a larger robustness ratio (1.97) than the performance loss-based approach (1.16) and the sensitivity matrix-based approach (1.36). As the largest robustness ratio value is associated with the most effective quantification approach, it is concluded that the proposed robustness quantification approach is the most effective one among these three approaches. A mathematical understanding of this sensitivity improvement due to the proposed approach is to be further developed in a future study.

5.2. Relation between the proposed and sensitivity matrix-based approaches

Under a small perturbation level, the uncertainty propagation analysis used in this proposed approach can also be related to the sensitivity matrix-based approach. Each element ($H_{ij} = \partial y_i / \partial w_j$) of the weight-output Jacobian sensitivity matrix $H$ represents the derivative of an output ($y_i$) with respect to a weight ($w_j$). Using the uncertainty propagation analysis under a small weight perturbation such as 0.01%, this $H_{ij}$ can be approximated by the standard deviation ratio $S_{ij} = \mathrm{std}(y_i^{(j)})/\mathrm{std}(w_j)$, where $\mathrm{std}(\cdot)$ is the standard deviation operator and $\mathrm{std}(y_i^{(j)})$ represents the standard deviation of the output $y_i$ under a perturbation only in the weight $w_j$. This standard deviation ratio $S_{ij}$ describes the dependence of the output variation on the weight variation. Here each perturbation is added only to a specific weight to compute $\mathrm{std}(y_i^{(j)})$ while all the other weights remain at their trained values. It should be pointed out that this is different from the proposed robustness quantification approach, where perturbations are added to all the weights simultaneously to compute $\vec{\sigma}_{\vec{y}^{(j)}}$.

Table 2 lists the sensitivity matrix-based robustness measures based on the spectral norms [16], computed using the traditional method ($H_{ij}$) and the uncertainty propagation analysis ($S_{ij}$). The two results closely match; therefore the proposed analysis can also be used to compute the sensitivity matrix-based measure.
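For illustration, the per-weight standard deviation ratio $S_{ij}$ can be estimated as below. This sketch uses a Monte Carlo estimate of the output standard deviation rather than the unscented transform, purely for brevity; `net` and all names are assumptions.

```python
# A sketch of the standard-deviation-ratio sensitivity S_ij: perturb one
# weight at a time (0.01% level per the text) and measure output spread.
import numpy as np

def std_ratio_sensitivity(net, w_star, x, level=1e-4, n_draws=1000):
    rng = np.random.default_rng(0)
    S = np.zeros(w_star.size)
    for j, wj in enumerate(w_star):
        std_wj = level * abs(wj) + 1e-12       # avoid zero std for zero weights
        outputs = []
        for _ in range(n_draws):
            w = w_star.copy()
            w[j] = wj + rng.normal(0.0, std_wj)  # perturb only weight j
            outputs.append(net(w, x))
        S[j] = np.std(outputs) / std_wj          # S_ij = std(y_i)/std(w_j)
    return S
```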

6. Conclusions

This study has proposed a new artificial neural network (ANN) robustness measure using uncertainty propagation analysis. The weight perturbation-induced output uncertainties are evaluated using the unscented transform. The proposed robustness quantification approach has been further applied to a recurrent neural network (RNN) and an optimized RNN in modeling a benchmark nonlinear dynamical system.

Compared with other existing robustness quantification approaches, this approach has the following advantages: (1) it is more efficient, since the unscented transform can effectively compute the uncertainties of the RNN outputs and only a small number of input samples from the whole input space is needed to quantify the network robustness; (2) it is more effective in quantifying the robustness, although the mathematical understanding of this sensitivity improvement is to be further investigated; and (3) it is easier to implement, as it evaluates the network robustness directly based on the RNN model, the weight perturbation can be easily adjusted as needed, and the proposed approach does not require any training data.

It should be pointed out that the proposed robustness measure is developed based on the assumption that the same level of perturbation is introduced to all weights. In real applications the perturbation level of each weight can be specified individually based on measurements, and the system robustness can be quantified using the same procedure. While this study has focused on the robustness to weight-related perturbations, the proposed approach can also be used to study the system robustness due to perturbations in inputs or other parameters.


Acknowledgements

The authors would like to acknowledge the financial support from the South Carolina Space Grant Consortium and the NASA Ames Research Center. The stimulating discussion with Dr. K. KrishnaKumar of NASA Ames was also crucial in this robustness study.

Appendix

The following matrices represent the weight information of the trained RNN and the trained optimized RNN, respectively.

Matrix 1. Weight matrix of the trained RNN

 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.05  0.61  1.91 -0.02  0.16  0.83 -0.19  0.00  0.53  0.45  0.19 -0.50  0.50 -0.42 -0.33  0.00
-1.02 -1.34 -0.12 -0.56  0.14  0.01  0.80 -0.95  0.86 -0.12  0.05 -0.03 -0.27  0.43  0.79  0.00
 0.71 -0.70  0.20  0.24  0.40 -0.23  0.36 -0.65  0.73  0.08 -0.96 -0.25  0.40 -0.85  0.49  0.00
-0.26 -0.92  0.96  0.05  0.50 -0.04 -0.94 -0.89 -0.80 -0.51  0.75  0.59  0.22 -0.41  0.53  0.00
 0.61 -0.45  1.10 -0.51  0.00  0.08 -0.92 -0.54 -0.10 -0.72 -0.44  0.11  0.95  0.05  0.79  0.00
 0.94 -0.07  0.71 -0.35 -0.09 -0.45  0.21 -0.65  0.56 -0.25 -0.53  0.78  0.12 -0.74  0.10  0.00
 0.32  0.68  0.20 -0.92  0.83  0.74 -0.31  0.45 -0.65 -0.78 -0.27  0.42 -0.86  0.14 -0.54  0.00
-0.91  0.17 -0.92 -0.46  0.61  0.05  0.25 -0.16 -0.82 -0.19  0.76 -1.05 -0.24 -0.18 -0.12  0.00
 0.77 -1.88  0.77  0.24 -0.31  1.00 -0.96  0.79  0.40  0.36 -0.60 -0.46  0.66 -0.01  0.69  0.00
 0.44  0.33 -0.56  0.35 -0.25 -0.05  0.65  0.33 -0.09 -0.52 -0.61  0.27  0.35  0.40  0.77  0.00

Matrix 2. Weight matrix of the trained optimized RNN

 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 0.00  0.00 -1.35 -0.82 -0.86 -0.52  0.25  0.70 -0.95 -0.61  0.43  0.99  0.12  0.19  0.48  0.00
 0.05 -0.42  0.43 -0.43  0.00 -0.46  0.00  0.91 -0.11 -0.51  0.60  0.83  0.34 -0.28 -0.56  0.00
 0.00  0.34  0.46  0.00  0.00  0.00  0.00  0.00  0.48  0.71 -0.70 -0.01  0.40 -0.34  0.04  0.00
 0.12  0.00  0.00  0.11  0.00  0.97 -0.07  0.00  0.00  0.19  0.04  0.39 -0.02 -0.94  0.40  0.00
 0.00  0.00  0.15  0.00  0.67 -0.31  0.00  0.87  0.89  0.00 -0.91 -0.50 -0.30  0.77 -0.52  0.00
 0.52 -0.86  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -0.66  0.04 -0.17  0.59 -0.31  0.00
 0.00  0.00  0.21  0.00  0.00  0.00  0.97  0.84  0.00 -0.20  0.00  0.00 -0.23  0.76 -0.65  0.00
 0.00  0.00  0.81 -0.09  0.78  0.00  0.00  1.00  0.02 -1.00  0.00  0.00  0.64 -0.04  0.07  0.00
-0.86  0.32  0.00  0.00  0.63 -0.31  0.21  0.99  0.00  0.00  0.00  0.51  0.57  0.00 -0.83  0.00
 0.00  0.00 -0.47  0.38 -0.14  0.35 -0.64  0.00  0.00  0.00 -0.76  0.00  0.00 -0.10  0.12  0.00

References

[1] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall, Upper Saddle River, NJ, 1999.
[2] C. Chiu, K. Mehrotra, et al., Robustness of feedforward neural networks, in: Proceedings of the IEEE International Conference on Neural Networks, vol. 2, 1993, pp. 783–788.
[3] C. Alippi, M. Milena, A poly-time analysis of robustness in feedforward neural networks, in: Proceedings of the IEEE International Workshop on Virtual and Intelligent Measurement Systems, Budapest, Hungary, May 19–20, vol. 1, 2001, pp. 76–80.
[4] C. Alippi, D. Sam, et al., A training-time analysis of robustness in feed-forward neural networks, in: Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 4, 2004, pp. 2853–2858.
[5] G. Dundar, K. Rose, The effects of quantization on multilayer neural networks, IEEE Transactions on Neural Networks 6 (6) (1995) 1446–1451.
[6] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, New York, 2000.
[7] B. Widrow, I. Kollár, Quantization Noise, Prentice-Hall, New Jersey, USA, 2002.
[8] R. Eickhoff, U. Rückert, Robustness of radial basis functions, Neurocomputing 70 (2007) 2758–2767.
[9] A. Assoum, M. Geagea, et al., Influence on ANNs fault tolerance of binary errors introduced during training, in: Proceedings of the International Conference on Information and Communication Technologies: From Theory to Applications, 2004, pp. 435–436.
[10] D.S. Phatak, I. Koren, Complete and partial fault tolerance of feedforward neural nets, IEEE Transactions on Neural Networks 6 (2) (1995) 446–456.
[11] S.G. Pierce, Y.B. Haim, et al., Evaluation of neural network robust reliability using information-gap theory, IEEE Transactions on Neural Networks 17 (6) (2006) 1349–1361.
[12] D. Liu, A.N. Michel, Robustness analysis of a class of neural networks, in: Circuits and Systems, Proceedings of the 36th Midwest Symposium, 1993, pp. 1077–1080.
[13] D. Liu, A.N. Michel, Robustness analysis and design of a class of neural networks with sparse interconnecting structure, Neurocomputing 12 (1996) 59–76.
[14] Z. Feng, A.N. Michel, Robustness analysis of a class of discrete-time recurrent neural networks under perturbations, IEEE Transactions on Circuits and Systems—I: Fundamental Theory and Applications 46 (1999) 1482–1486.
[15] S.O. Yee, M.Y. Chow, Robustness of an induction motor incipient fault detector neural network subject to small input perturbations, in: IEEE Proceedings of Southeastcon '91, vol. 1, 1991, pp. 365–369.
[16] K. KrishnaKumar, K. Nishta, Robustness analysis of neural networks with an application to system identification, Journal of Guidance, Control and Dynamics 22 (1999) 695–701.
[17] W.L. Loh, On Latin hypercube sampling, Annals of Statistics 24 (5) (1996) 2058–2080.
[18] J.F. Swidzinski, C. Kai, Nonlinear statistical modeling and yield estimation technique for use in Monte Carlo simulations, IEEE Transactions on Microwave Theory and Techniques 48 (12) (2000) 2316–2324.
[19] E.A. Wan, R. van der Merwe, The unscented Kalman filter, in: S. Haykin (Ed.), Kalman Filtering and Neural Networks, Wiley Publishing, 2001 (Chapter 7).
[20] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press, 1990.
[21] N. Kwak, Principal component analysis based on L1-norm maximization, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (9) (2008) 1672–1680.
[22] X. Wang, Y. Huang, Optimized recurrent neural network-based tool wear modeling in hard turning, Transactions of NAMRI/SME 37 (2009) 213–220.
[23] X. Wang, W. Wang, et al., Design of neural network-based estimator for tool wear modeling in hard turning, Journal of Intelligent Manufacturing 19 (4) (2008) 383–396.
[24] P.J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 (10) (1990) 1550–1560.
[25] S. Singhal, L. Wu, Training feedforward networks with extended Kalman filter algorithm, in: Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Glasgow, Scotland, 23–26 May, 1989, pp. 1187–1190.
[26] L. Zhang, Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method, IEEE Transactions on Power Systems 20 (1) (2005) 59–66.
[27] G.V. Puskorius, L.A. Feldkamp, Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks, IEEE Transactions on Neural Networks 5 (1994) 279–297.
[28] K. KrishnaKumar, Optimization of the neural net connectivity pattern using a backpropagation algorithm, Neurocomputing 5 (1993) 273–286.
[29] K.S. Narendra, Adaptive control of dynamical systems using neural networks, in: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, NY, 1992.
[30] J.K. Baksalary, R. Kala, A new bound for the Euclidean norm of the difference between the least squares and the best linear unbiased estimators, The Annals of Statistics 3 (1980) 679–681.

Xiaoyu Wang is a Ph.D. candidate in Mechanical Engineering at Clemson University, South Carolina. He earned his M.S. in Mechanical Engineering from Florida International University, Miami, Florida. His research interests encompass artificial intelligence, statistical process modeling, and nonlinear control.

Yong Huang received his B.S. degree in Mechatronics Engineering from Xidian University, China, in 1993; his M.S. degrees in Mechanical Engineering from Zhejiang University, China, and the University of Alabama in 1996 and 1999, respectively; and his M.S. in Electrical and Computer Engineering and Ph.D. in Mechanical Engineering from the Georgia Institute of Technology, Georgia, in 2002. As a professor of Mechanical Engineering at Clemson University, South Carolina since 2003, his research addresses various challenges in advanced manufacturing. His current research interests are additive manufacturing for biological and energy applications, precision engineering, and manufacturing process monitoring. He is the recipient of the 2005 ASME Blackall Machine Tool and Gage Award, the 2006 SME Outstanding Young Manufacturing Engineer Award, and the 2008 NSF CAREER Award.

Nhan Nguyen, Ph.D., has been a senior research scientist with the Intelligent Systems Division at NASA Ames Research Center since 1986. He is currently the Project Scientist for the NASA Integrated Resilient Aircraft Control Project. His research interests include intelligent adaptive control, distributed-parameter optimal control, flight dynamics, and flow control. He is the lead author of more than 53 papers, including 9 journal papers. He is an associate fellow of AIAA and the recipient of the NASA Exceptional Achievement Medal. He also holds a US patent.