[ieee 2013 sixth international conference on advanced computational intelligence (icaci) - hangzhou,...

2013 Sixth International Conference on Advanced Computational Intelligence October 19-21, 2013, Hangzhou, China

Neuro-Optimal Learning Control Scheme for Gasification Process with Unknown System Model

Qinglai Wei, Member, IEEE and Derong Liu, Fellow, IEEE

Abstract- In this paper, a new iterative optimal learning control scheme for discrete-time nonlinear systems is developed. It is based on the iterative adaptive dynamic programming (ADP) approach and is used to obtain the optimal control law for coal gasification control systems. Since the coal gasification process is unknown, neural networks (NNs) are introduced to reconstruct its dynamics, and the approximation errors of the reconstructed dynamics are taken into account. Via a system transformation, the optimal tracking control problem with approximation errors is transformed into a two-person zero-sum optimal control problem. A new iterative ADP algorithm is then developed, with convergence analysis, to obtain the optimal control law for the transformed system. Finally, numerical results are given to illustrate the performance of the present method.

I. INTRODUCTION

ADAPTIVE dynamic programming (ADP), proposed by Werbos [24], [25], has played an important role as a way to solve optimal control problems forward in time [7], [13], [16], [18], [26]. Iterative methods are widely used in ADP to obtain the solution of the Hamilton-Jacobi-Bellman (HJB) equation indirectly, and they have received much attention [4], [9], [20], [21]. There are two main classes of iterative ADP algorithms, based on policy iteration and value iteration, respectively [8]. Policy iteration algorithms start from an initial admissible control law to obtain the optimal solution of the HJB equation [1], [14]. Value iteration algorithms start from an initial performance index function to obtain the optimal control law [2], [27]. Iterative ADP algorithms attract more and more attention [5], [7], [10], [11], [22], [23], [28], but many of them require an accurate system model. For most real-world control systems, such as the coal gasification control system, an accurate system model is complex and generally cannot be obtained. In this situation, approximation structures such as neural networks can be used to approximate the system model. This inevitably introduces approximation errors between the approximating functions and the true ones. When an accurate system model cannot be obtained, the convergence properties established for iterative ADP algorithms with exact models may no longer hold. To the best of our knowledge, there has been no discussion of optimal control schemes based on iterative ADP algorithms in which the modeling errors are considered. This motivates our research.

The authors are with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (phone: +86-10-62557379; fax: +86-10-62650912; e-mail: [email protected]; [email protected]).

This work was supported in part by the National Natural Science Foundation of China under Grants 61034002, 61233001, and 61273140, in part by Beijing Natural Science Foundation under Grant 4132078, and in part by the Early Career Development Award of SKLMCCS.

978-1-4673-6343-3/13/$3\.00 ©2013 IEEE

In this paper, for the first time, a neuro-optimal learning control method is developed for the coal gasification system using iterative adaptive dynamic programming. First, an NN model is established for the coal gasification control system, so a mathematical expression of the coal gasification system is unnecessary. Second, the coal quality and the reference control are also modeled by NNs. Next, via a system transformation, the optimal tracking problem is transformed into a two-person zero-sum optimal control problem. The NN approximation errors of the system model, the coal quality, and the reference control (approximation errors for brevity) are treated as a control input of the system (disturbance control for brevity). Then the Hamilton-Jacobi-Isaacs (HJI) equation for the two-person zero-sum optimal control problem is derived. A new iterative ADP algorithm is developed to obtain the optimal control law. Finally, numerical results are given to show the effectiveness of the developed iterative ADP algorithm.

This paper is organized as follows. Section II presents the problem formulation. In Section III, NN modeling methods are established for the coal gasification system, the coal quality, and the reference control. In Section IV, the system transformation is presented, and the iterative ADP algorithm is developed to obtain the optimal control law with the approximation errors taken into account. In Section V, numerical results are analyzed to demonstrate the effectiveness of the developed optimal control scheme. Finally, Section VI draws the conclusions.

II. PROBLEM FORMULATION

A. Coal Gasification Chemistry

In coal gasification, coal water slurry (a mixture of coal and water) is fed into the gasifier together with oxygen. The gasification process in the gasifier operates at a high temperature, and its output includes synthesis gas and char. Suppose that the coal is composed of carbon (C), hydrogen (H), oxygen (O), and char (Char) with proportions $\theta_1(k)$, $\theta_2(k)$, $\theta_3(k)$, and $\theta_4(k)$, respectively, where $\sum_{i=1}^{4}\theta_i(k) = 1$. Let $\theta(k) = [\theta_1(k), \theta_2(k), \theta_3(k), \theta_4(k)]^T$ denote the coal quality vector. The coal gasification reaction can be classified into two phases [17]. One phase is the coal combustion reaction, whose chemical equations are

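$$\mathrm{C} + \tfrac{1}{2}\mathrm{O_2} = \mathrm{CO} - 123.1\ \mathrm{kJ/mol},\qquad \mathrm{CO} + \tfrac{1}{2}\mathrm{O_2} = \mathrm{CO_2} - 282.9\ \mathrm{kJ/mol}. \tag{1}$$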

The other phase is the water-gas shift reaction, which is reversible and mildly exothermic:

$$\mathrm{CO} + \mathrm{H_2O} = \mathrm{CO_2} + \mathrm{H_2} - 42.3\ \mathrm{kJ/mol}, \tag{2}$$

where CO is carbon monoxide, CO2 is carbon dioxide, and H2O is water.

The coal combustion reaction is instantaneous and irreversible. The water-gas shift reaction is reversible and strongly dependent on the reaction temperature. Let $x(k)$ be the reaction temperature and let $T(k)$ denote the reaction equilibrium coefficient. Then we have the following empirical formula [17]:

$$T(k) = \frac{n_{\mathrm{CO_2}}\, n_{\mathrm{H_2}}}{n_{\mathrm{CO}}\, n_{\mathrm{H_2O}}} = \left(\frac{202.362}{x(k) - 635.52}\right)^{0.921914}, \tag{3}$$

where $n_{(\cdot)}$ denotes the molar quantity in the synthesis gas. For the coal gasification process, the reaction temperature has been identified as a key parameter [6], [17]. Hence, in this paper, an optimal control scheme is established to make the reaction temperature effectively track a desired one.
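For a quick feel for (3), the sketch below evaluates the equilibrium coefficient as a function of temperature. It assumes that (3) reads $T = (202.362/(x - 635.52))^{0.921914}$ with $x$ in °C, which is only defined for $x > 635.52$ °C. The function name is illustrative. Because the shift reaction (2) is exothermic, $T$ should fall as the temperature rises.

```python
def equilibrium_coefficient(temp_c: float) -> float:
    """Water-gas shift equilibrium coefficient T(k) from the empirical relation (3).

    Assumes T = (202.362 / (x - 635.52)) ** 0.921914 with x in degrees Celsius;
    the relation is only meaningful for x > 635.52.
    """
    if temp_c <= 635.52:
        raise ValueError("relation (3) requires a temperature above 635.52 degC")
    return (202.362 / (temp_c - 635.52)) ** 0.921914


if __name__ == "__main__":
    # Initial and desired reaction temperatures used in Section V.
    for x in (1000.0, 1320.0):
        print(f"x = {x:.0f} degC -> T = {equilibrium_coefficient(x):.3f}")
```

At the initial temperature of 1000 °C and the target of 1320 °C used in Section V, this gives roughly 0.58 and 0.33, respectively.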

Remark 2.1: Coal generally contains further elements, such as nitrogen (N), sulphur (S), and chlorine (Cl). However, the reactions of these elements are not the main reactions and have almost no effect on the reaction temperature. Thus, for convenience of analysis, these reactions are omitted and only the main reactions are considered in this paper.

B. Control System Description

Let $P_{(\cdot)}(k)$ denote the flow rate of a control input (kg/h) and $R_{(\cdot)}(k)$ denote the flow rate of an output (kg/h). Then the control input can be defined as

$$u(k) = [u_1(k), u_2(k), u_3(k)]^T = [P_{\mathrm{coal}}(k), P_{\mathrm{H_2O}}(k), P_{\mathrm{O_2}}(k)]^T. \tag{4}$$

The system output can be defined as

$$y(k) = [y_1(k), y_2(k), y_3(k), y_4(k), y_5(k)]^T = [R_{\mathrm{CO}}(k), R_{\mathrm{CO_2}}(k), R_{\mathrm{H_2}}(k), R_{\mathrm{H_2O}}(k), R_{\mathrm{Char}}(k)]^T. \tag{5}$$

According to (1)-(5), the coal gasification control system can be expressed as

$$x(k+1) = F(x(k), u(k), \theta(k)),\qquad y(k) = G(x(k), u(k), \theta(k)), \tag{6}$$

where $F(\cdot)$ and $G(\cdot)$ are unknown system functions.

Let the desired state trajectory be $\eta$. Our objective is then to design an optimal state-feedback tracking control law $u^*(k) = u^*(x(k))$ that makes the system state track the desired state trajectory. However, it is nearly impossible to obtain a direct optimal tracking controller for system (6). First, the system functions $F(\cdot)$ and $G(\cdot)$ are unknown nonlinear functions. Second, for the desired trajectory $\eta$, the corresponding reference control is also difficult to obtain for the unknown system. Furthermore, the coal quality $\theta(k)$ is an unknown and uncontrollable parameter. Thus, new methods must be established to solve these problems.

III. DATA-BASED MODELING AND PROPERTIES

In this section, three-layer back-propagation (BP) NNs are introduced to approximate the system (6). NNs are also used to solve for the reference control and to obtain the coal quality. Let the number of hidden-layer neurons be denoted by $L$. Let $Y$ denote the weight matrix between the input layer and the hidden layer, $W$ the weight matrix between the hidden layer and the output layer, and $X$ the input vector of the NN. Then the output of the three-layer NN is

$$F_N(X, Y, W) = W^T\sigma(YX), \tag{7}$$

where $\sigma(YX) \in \mathbb{R}^L$ and $[\sigma(z)]_i = \dfrac{\exp(z_i) - \exp(-z_i)}{\exp(z_i) + \exp(-z_i)}$, $i = 1, \ldots, L$, are the activation functions. The function approximated by the NN can be expressed as $F(X) = W^{*T}\sigma(Y^{*}X) + \varepsilon(X)$, where $Y^{*}$ and $W^{*}$ are the ideal weight parameters and $\varepsilon(X)$ is the reconstruction error. For convenience of analysis, only the output weights $W$ are updated during training, while the hidden weights are kept fixed [4], [5], [19]. Hence, in what follows, the NN function (7) is simplified to $F_N(X, W) = W^T\sigma(X)$.
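With $Y$ frozen, the output of (7) is linear in $W$. The gradient of a squared output error with respect to $W$ is therefore the outer product of the hidden activations $\sigma(YX)$ and the output error. This is the structure behind the gradient-based update rules used in this section. The sketch below illustrates this in NumPy; the helper names, random weights, and sizes are illustrative assumptions. It computes (7) and that gradient, then compares one gradient entry with a central finite difference.

```python
import numpy as np


def nn_output(X, Y, W):
    """Three-layer NN of (7): F_N(X, Y, W) = W^T sigma(Y X), with sigma = tanh."""
    return W.T @ np.tanh(Y @ X)


def output_weight_gradient(X, Y, W, target):
    """Gradient of 0.5 * ||F_N(X, Y, W) - target||^2 with respect to W (Y frozen).

    The output is linear in W, so the gradient is outer(sigma(Y X), output error).
    """
    s = np.tanh(Y @ X)
    return np.outer(s, W.T @ s - target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.uniform(-1.0, 1.0, (20, 8))   # fixed input-to-hidden weights, L = 20
    W = rng.uniform(-1.0, 1.0, (20, 1))   # trainable hidden-to-output weights
    X = rng.standard_normal(8)
    target = np.array([0.3])

    def loss(weights):
        return 0.5 * float(np.sum((nn_output(X, Y, weights) - target) ** 2))

    # Central finite difference on one entry of W as a sanity check.
    h = 1e-6
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[3, 0] += h
    W_minus[3, 0] -= h
    G = output_weight_gradient(X, Y, W, target)
    print(G[3, 0], (loss(W_plus) - loss(W_minus)) / (2 * h))
```

The two printed numbers should agree to many digits. Since the loss is quadratic in $W$, the central difference is exact up to rounding.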

A. Control System Modeling and Properties

In this subsection, a BP NN model is established from input-state-output data to reconstruct the system (6). Let the numbers of hidden-layer neurons be $L_{m1}$ and $L_{m2}$, and let the ideal weights be $W_{m1}^{*}$ and $W_{m2}^{*}$, respectively. According to the universal approximation property of NNs, the NN representation of the system (6) can be written as

$$x(k+1) = W_{m1}^{*T}\sigma(z(k)) + \varepsilon_{m1}(k),\qquad y(k) = W_{m2}^{*T}\sigma(z(k)) + \varepsilon_{m2}(k),$$

where $z(k) = [x^T(k), u^T(k), \theta^T(k)]^T$ is the NN input and $\sigma(\cdot)$ is the NN activation function, selected as the hyperbolic tangent. Let the activation function satisfy $\|\sigma(\cdot)\| \le \sigma_M$ for a constant $\sigma_M$. Let $\varepsilon_m(k)$ be the bounded NN reconstruction error satisfying $\|\varepsilon_m(k)\| \le \varepsilon_M$. The NN model of the system is constructed as

$$\hat x(k+1) = \hat W_{m1}^T(k)\sigma(z(k)),\qquad \hat y(k) = \hat W_{m2}^T(k)\sigma(z(k)), \tag{8}$$

where $\hat x(k)$ is the estimated system state vector and $\hat y(k)$ is the estimated system output vector. Let $\hat W_{m1}(k)$ and $\hat W_{m2}(k)$ be the estimates of the ideal weight matrices $W_{m1}^{*}$ and $W_{m2}^{*}$, respectively. Then we define the system identification errors as

$$\tilde x(k+1) = \hat x(k+1) - x(k+1) = \tilde W_{m1}^T(k)\sigma(z(k)) - \varepsilon_{m1}(k),$$

$$\tilde y(k) = \hat y(k) - y(k) = \tilde W_{m2}^T(k)\sigma(z(k)) - \varepsilon_{m2}(k), \tag{9}$$

where $\tilde W_{m1}(k) = \hat W_{m1}(k) - W_{m1}^{*}$ and $\tilde W_{m2}(k) = \hat W_{m2}(k) - W_{m2}^{*}$. Let $\phi_{m1}(k) = \tilde W_{m1}^T(k)\sigma(z(k))$ and $\phi_{m2}(k) = \tilde W_{m2}^T(k)\sigma(z(k))$. Then we get

$$\tilde x(k+1) = \phi_{m1}(k) - \varepsilon_{m1}(k),\qquad \tilde y(k) = \phi_{m2}(k) - \varepsilon_{m2}(k).$$

The weights are adjusted to minimize the following error:

$$E_m(k) = \tfrac{1}{2}\tilde x^T(k+1)\tilde x(k+1) + \tfrac{1}{2}\tilde y^T(k)\tilde y(k).$$

By a gradient-based adaptation rule, the weights are updated as

$$\hat W_{m1}(k+1) = \hat W_{m1}(k) - l_{m1}\sigma(z(k))\tilde x^T(k+1),\qquad \hat W_{m2}(k+1) = \hat W_{m2}(k) - l_{m2}\sigma(z(k))\tilde y^T(k), \tag{10}$$

where $l_{m1} > 0$ and $l_{m2} > 0$ are learning rates. Before proceeding, the following assumption is necessary.

Assumption 1: The NN approximation errors $\varepsilon_{m1}(k)$ and $\varepsilon_{m2}(k)$ are upper bounded by a function of the estimation error such that

$$\varepsilon_{m1}^T(k)\varepsilon_{m1}(k) \le \lambda_M\tilde x^T(k)\tilde x(k),\qquad \phi_{m2}^T(k)\varepsilon_{m2}(k) \le \phi_{m2}^T(k)\phi_{m2}(k),$$

where $0 < \lambda_M < 1$ is a bounded constant. Then we have the following theorem.

Theorem 3.1: Let the identification scheme (8) be used to identify the nonlinear system (6), and let the NN weights be updated by (10). If Assumption 1 holds, then the system identification error $\tilde x(k)$ is asymptotically stable and the error matrices $\tilde W_{m1}(k)$ and $\tilde W_{m2}(k)$ both converge to zero as $k \to \infty$.

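Proof: Consider the following Lyapunov function candidate:

$$L(\tilde x(k), \tilde W_{m1}(k), \tilde W_{m2}(k)) = \tilde x^T(k)\tilde x(k) + \frac{1}{l_{m1}}\mathrm{tr}\{\tilde W_{m1}^T(k)\tilde W_{m1}(k)\} + \frac{1}{l_{m2}}\mathrm{tr}\{\tilde W_{m2}^T(k)\tilde W_{m2}(k)\}.$$

The difference of the Lyapunov function candidate is given by

$$\Delta L(\tilde x(k), \tilde W_{m1}(k), \tilde W_{m2}(k)) = \tilde x^T(k+1)\tilde x(k+1) - \tilde x^T(k)\tilde x(k) + \frac{1}{l_{m1}}\mathrm{tr}\{\tilde W_{m1}^T(k+1)\tilde W_{m1}(k+1) - \tilde W_{m1}^T(k)\tilde W_{m1}(k)\} + \frac{1}{l_{m2}}\mathrm{tr}\{\tilde W_{m2}^T(k+1)\tilde W_{m2}(k+1) - \tilde W_{m2}^T(k)\tilde W_{m2}(k)\}.$$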


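Using the identification error dynamics (9) and the weight tuning rules for $\hat W_{m1}(k+1)$ and $\hat W_{m2}(k+1)$ in (10), we obtain

$$\Delta L = \phi_{m1}^T(k)\phi_{m1}(k) - 2\phi_{m1}^T(k)\varepsilon_{m1}(k) + \varepsilon_{m1}^T(k)\varepsilon_{m1}(k) + l_{m1}\sigma^T(z(k))\sigma(z(k))\tilde x^T(k+1)\tilde x(k+1) - \tilde x^T(k)\tilde x(k) + l_{m2}\sigma^T(z(k))\sigma(z(k))\tilde y^T(k)\tilde y(k) - 2\phi_{m2}^T(k)\tilde y(k) - 2\phi_{m1}^T(k)\tilde x(k+1).$$

We substitute $\tilde x(k+1) = \phi_{m1}(k) - \varepsilon_{m1}(k)$ and $\tilde y(k) = \phi_{m2}(k) - \varepsilon_{m2}(k)$ and apply the Cauchy-Schwarz inequality. This gives

$$\Delta L \le -\phi_{m1}^T(k)\phi_{m1}(k) + \varepsilon_{m1}^T(k)\varepsilon_{m1}(k) - \tilde x^T(k)\tilde x(k) + 2l_{m1}\sigma^T(z(k))\sigma(z(k))\big(\phi_{m1}^T(k)\phi_{m1}(k) + \varepsilon_{m1}^T(k)\varepsilon_{m1}(k)\big) - 2\big(\phi_{m2}^T(k)\phi_{m2}(k) - \phi_{m2}^T(k)\varepsilon_{m2}(k)\big) + l_{m2}\sigma^T(z(k))\sigma(z(k))\big(\phi_{m2}(k) - \varepsilon_{m2}(k)\big)^T\big(\phi_{m2}(k) - \varepsilon_{m2}(k)\big).$$

Using $\|\sigma(z(k))\| \le \sigma_M$ together with Assumption 1, we get

$$\Delta L \le -(1 - 2l_{m1}\sigma_M^2)\|\phi_{m1}(k)\|^2 - (1 - \lambda_M - 2l_{m1}\lambda_M\sigma_M^2)\|\tilde x(k)\|^2 - 2\big(\|\phi_{m2}(k)\|^2 - \phi_{m2}^T(k)\varepsilon_{m2}(k)\big) + l_{m2}\sigma_M^2\|\phi_{m2}(k) - \varepsilon_{m2}(k)\|^2.$$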

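Let $l_{m1}$ be selected as

$$0 < l_{m1} < \min\left\{\frac{1}{2\sigma_M^2},\ \frac{1-\lambda_M}{2\lambda_M\sigma_M^2}\right\}$$

and $l_{m2}$ be selected as

$$0 < l_{m2} < \frac{2\big(\|\phi_{m2}(k)\|^2 - \phi_{m2}^T(k)\varepsilon_{m2}(k)\big)}{\sigma_M^2\|\phi_{m2}(k) - \varepsilon_{m2}(k)\|^2}.$$

Then we have $\Delta L(k) \le 0$. The proof is completed. ■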
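The identifier (8) with update rule (10) can be exercised on synthetic data. The sketch below trains only the state part $\hat W_{m1}$; the output part $\hat W_{m2}$ would be handled identically. It makes three assumptions. First, a hypothetical plant that the network represents exactly, so $\varepsilon_{m1} = 0$. Second, random frozen hidden weights. Third, a learning rate of 0.02. With $L = 20$ tanh units we have $\|\sigma\|^2 \le 20$, so the factor $1 - 2l_{m1}\sigma_M^2 = 0.2$ from the proof of Theorem 3.1 stays positive. None of these numbers come from the gasification data of Section V.

```python
import numpy as np


def train_identifier(Z, targets, Y_hidden, lr=0.02, epochs=50, seed=0):
    """Online gradient identifier in the spirit of (8)-(10) (state part only).

    Z:        (N, n) array of NN inputs z(k) = [x(k), u(k), theta(k)]
    targets:  (N, m) array of measured next states x(k+1)
    Y_hidden: (L, n) fixed input-to-hidden weights (not trained)
    Returns the trained output weights W_hat (L, m) and the mean
    training error E_m recorded in each epoch.
    """
    rng = np.random.default_rng(seed)
    W_hat = 0.1 * rng.standard_normal((Y_hidden.shape[0], targets.shape[1]))
    history = []
    for _ in range(epochs):
        total = 0.0
        for z, x_next in zip(Z, targets):
            s = np.tanh(Y_hidden @ z)            # sigma(z(k))
            x_tilde = W_hat.T @ s - x_next       # identification error, as in (9)
            W_hat -= lr * np.outer(s, x_tilde)   # gradient step, as in (10)
            total += 0.5 * float(x_tilde @ x_tilde)
        history.append(total / len(Z))
    return W_hat, history


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_in, n_hidden, n_out, N = 8, 20, 1, 200
    Y_hidden = rng.uniform(-1.0, 1.0, (n_hidden, n_in))
    W_true = rng.uniform(-1.0, 1.0, (n_hidden, n_out))   # hypothetical plant weights
    Z = rng.standard_normal((N, n_in))
    targets = np.tanh(Z @ Y_hidden.T) @ W_true           # exactly representable data
    W_hat, history = train_identifier(Z, targets, Y_hidden)
    print(f"mean error: first epoch {history[0]:.4f}, last epoch {history[-1]:.6f}")
```

On noise-free, representable data the mean error should shrink by orders of magnitude over the epochs. With real data, $\varepsilon_{m1} \ne 0$, and Assumption 1 is what keeps the analysis valid.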

B. Data-Based Identification of Coal Quality and Reference Control

In this subsection, NNs are used to identify the coal quality function $\theta(k)$ and to solve for the reference control law $u_f(k)$ from system data. Unlike the system data used for modeling, coal quality data generally cannot be measured or identified in real time during the coal gasification process. Coal quality data can therefore only be obtained offline. Given this feature, an iterative training method can be adopted for the neural networks.

According to (6), we can solve for $\theta(k)$, which is expressed as

$$\theta(k) = F_\theta(x(k), x(k+1), y(k), u(k)). \tag{11}$$

Usually, $F_\theta(\cdot)$ is highly nonlinear, and its analytical expression is nearly impossible to obtain. Thus, a BP NN (θ network for brevity) is established to identify the coal quality function $\theta(k)$.

Let the number of hidden-layer neurons be $L_\theta$ and let the ideal weights be $W_\theta^{*}$. The NN representation of (11) can be written as

$$\theta(k) = W_\theta^{*T}\sigma(z_\theta(k)) + \varepsilon_\theta(k), \tag{12}$$

where $z_\theta(k) = [x^T(k), x^T(k+1), y^T(k), u^T(k)]^T$ and $\varepsilon_\theta(k)$ is the reconstruction error. The NN coal quality function is constructed as

$$\hat\theta^j(k) = \hat W_\theta^{jT}(k)\sigma(z_\theta(k)), \tag{13}$$

where $\hat\theta^j(k)$ is the estimated coal quality function and $\hat W_\theta^j(k)$ is the estimated weight matrix at training iteration $j$. According to (11), solving for $\theta(k)$ requires the data $x(k+1)$. Because offline data are used to train the NN, these data are available. Define the identification error as

$$\tilde\theta^j(k) = \hat\theta^j(k) - \theta(k) = \phi_\theta^j(k) - \varepsilon_\theta(k),$$

where $\phi_\theta^j(k) = \tilde W_\theta^{jT}(k)\sigma(z_\theta(k))$ and $\tilde W_\theta^j(k) = \hat W_\theta^j(k) - W_\theta^{*}$. The weights are adjusted to minimize the following error:

$$E_\theta^j(k) = \tfrac{1}{2}\tilde\theta^{jT}(k)\tilde\theta^j(k).$$

By a gradient-based adaptation rule, the weights are updated as

$$\hat W_\theta^{j+1}(k) = \hat W_\theta^j(k) - l_\theta\sigma(z_\theta(k))\tilde\theta^{jT}(k), \tag{14}$$

where $l_\theta > 0$ is the learning rate. Next, we solve for the reference control using an NN (u_f network for brevity). We aim to design a state feedback controller that makes the system state track the desired one. Therefore, according to the state equation in (6), we use $x(k)$, $x(k+1)$, and $\theta(k)$ to approximate the reference control function $u_f(k)$, which is expressed as

$$u_f(k) = F_u(x(k), x(k+1), \theta(k)). \tag{15}$$

Let the number of hidden-layer neurons be $L_u$ and let the ideal weights be $W_u^{*}$. The NN representation of (15) can be written as

$$u_f(k) = W_u^{*T}\sigma(z_u(k)) + \varepsilon_u(k), \tag{16}$$

where $z_u(k) = [x^T(k), x^T(k+1), \theta^T(k)]^T$ and $\varepsilon_u(k)$ is the reconstruction error. The NN reference control is constructed as

$$\hat u_f^j(k) = \hat F_u(x(k), x(k+1), \theta(k)) = \hat W_u^{jT}(k)\sigma(z_u(k)), \tag{17}$$

where $\hat u_f^j(k)$ is the estimated reference control and $\hat W_u^j(k)$ is the estimated weight matrix. Define the identification error as

$$\tilde u_f^j(k) = \hat u_f^j(k) - u_f(k) = \phi_u^j(k) - \varepsilon_u(k), \tag{18}$$

where $\phi_u^j(k) = \tilde W_u^{jT}(k)\sigma(z_u(k))$ and $\tilde W_u^j(k) = \hat W_u^j(k) - W_u^{*}$. The weights are adjusted to minimize the following error:

$$E_{u_f}^j(k) = \tfrac{1}{2}\tilde u_f^{jT}(k)\tilde u_f^j(k). \tag{19}$$


By a gradient-based adaptation rule, the weights are updated as

$$\hat W_u^{j+1}(k) = \hat W_u^j(k) - l_u\sigma(z_u(k))\tilde u_f^{jT}(k), \tag{20}$$

where $l_u > 0$ is the learning rate. We now give the convergence properties of the θ network and the u_f network.

Theorem 3.2: Let the identification schemes (13) and (17) be used to identify $\theta(k)$ and $u_f(k)$ in (11) and (15), respectively, and let the NN weights be updated by (14) and (20), respectively. Suppose that for all $j = 1, 2, \ldots$ the inequalities

$$\phi_\theta^{jT}(k)\varepsilon_\theta(k) \le \phi_\theta^{jT}(k)\phi_\theta^j(k),\qquad \phi_u^{jT}(k)\varepsilon_u(k) \le \phi_u^{jT}(k)\phi_u^j(k) \tag{21}$$

hold. Then the error matrices $\tilde W_\theta^j(k)$ and $\tilde W_u^j(k)$ both converge to zero as $j \to \infty$.
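The θ network and the u_f network are trained offline, so their input vectors can be assembled once a whole operating record is available. This is what makes the future sample $x(k+1)$ usable in (11) and (15). The helper functions below sketch that bookkeeping in NumPy. The function names are illustrative. We also assume that the coal-quality targets $\theta(k)$ for the θ network come from offline analysis of the fed coal, which the text does not detail.

```python
import numpy as np


def _as_rows(a):
    """Return a 2-D float array with one row per time step."""
    a = np.asarray(a, dtype=float)
    return a.reshape(len(a), -1)


def build_theta_inputs(x, y, u):
    """Rows z_theta(k) = [x(k), x(k+1), y(k), u(k)] for k = 0, ..., N-2, as in (12)."""
    x, y, u = _as_rows(x), _as_rows(y), _as_rows(u)
    return np.hstack([x[:-1], x[1:], y[:-1], u[:-1]])


def build_uf_inputs(x, theta):
    """Rows z_u(k) = [x(k), x(k+1), theta(k)] for k = 0, ..., N-2, as in (16).

    The training target paired with row k is the applied control u(k).
    """
    x, theta = _as_rows(x), _as_rows(theta)
    return np.hstack([x[:-1], x[1:], theta[:-1]])


if __name__ == "__main__":
    N = 5
    x = np.linspace(1000.0, 1040.0, N)               # hypothetical temperatures (degC)
    y = np.ones((N, 5))                              # hypothetical output flow rates
    u = np.ones((N, 3))                              # hypothetical input flow rates
    theta = np.tile([0.7, 0.05, 0.1, 0.15], (N, 1))  # hypothetical coal quality
    print(build_theta_inputs(x, y, u).shape)         # (4, 10)
    print(build_uf_inputs(x, theta).shape)           # (4, 6)
```

For a scalar temperature, three inputs, five outputs, and four coal-quality components, the rows have 10 and 6 entries. This matches the input layers of the 10-20-4 and 6-20-3 networks in Section V.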

IV. DESIGN OF OPTIMAL TRACKING CONTROLLER BY ITERATIVE ADP ALGORITHM WITH APPROXIMATION ERRORS

A. The System Transformation With System Errors and Control Disturbance

To transform the system, a desired reference control (desired control for brevity) is obtained for the desired system state $\eta$. Substituting the desired state trajectory $\eta$ into (15), we obtain the reference control trajectory $u_d(k) = F_u(\eta, \eta, \theta(k))$, where $u_d(k)$ is defined as the desired control. Let $\hat F_u(\eta, \eta, \theta(k)) = \hat W_u^T(k)\sigma([\eta^T, \eta^T, \theta^T(k)]^T)$ be the neural network function that approximates the reference control $u_d(k)$. If the weights $\hat W_u(k)$ converge sufficiently to $W_u^{*}$, then we have

$$u_d(k) = \hat F_u(\eta, \eta, \theta(k)) + \varepsilon_u(k). \tag{22}$$

Since $\theta(k)$ cannot be obtained directly, the θ network is used to approximate it. According to (12) and (13), if the weights $\hat W_\theta(k)$ converge sufficiently to $W_\theta^{*}$, we have $\theta(k) = \hat\theta(k) + \varepsilon_\theta(k)$. Let $z_{du}(k) = [\eta^T, \eta^T, \theta^T(k)]^T$ and $\hat z_{du}(k) = [\eta^T, \eta^T, \hat\theta^T(k)]^T$. Since the activation function $\sigma(\cdot)$ is smooth, the mean value theorem gives

$$\hat F_u(\hat z_{du}(k)) = \hat F_u(z_{du}(k)) - \nabla(\xi_\theta)\varepsilon_\theta(k), \tag{23}$$

where $\nabla(\xi_\theta) = \partial \hat F_u(\eta, \eta, \xi_\theta)/\partial \xi_\theta$, $\xi_\theta = c_\theta\theta(k) + (1 - c_\theta)(\theta(k) - \varepsilon_\theta(k))$, and $0 \le c_\theta \le 1$ is a certain constant. Since $\varepsilon_\theta(k)$ is bounded and $\sigma(\cdot)$ is smooth, $\|\nabla(\xi_\theta)\varepsilon_\theta(k)\|$ is bounded. Let the neural-network reference control be expressed by

$$\hat u_d(k) = \hat F_u(\hat z_{du}(k)). \tag{24}$$

Then we get

$$u_d(k) = \hat u_d(k) + E_u(k), \tag{25}$$

where $E_u(k) = \nabla(\xi_\theta)\varepsilon_\theta(k) + \varepsilon_u(k)$. Let $u_\delta(k)$ be the error between the control $u(k)$ and the reference control $u_d(k)$. Then we obtain

$$u(k) = \hat u_d(k) + u_\delta(k) + E_u(k) = \hat u(k) + E_u(k), \tag{26}$$

where $\hat u(k) = \hat u_d(k) + u_\delta(k)$ is the estimated control. According to (8), let the weights of the neural networks converge sufficiently to $W_{m1}^{*}$ and $W_{m2}^{*}$. With $\hat F(z(k)) = \hat W_{m1}^T(k)\sigma(z(k))$, the system state equation can be written as

$$x(k+1) = \hat F(z(k)) + \varepsilon_{m1}(k). \tag{27}$$

Since $u(k)$ and $\theta(k)$ cannot be obtained directly, approximations are adopted. Let $\hat z(k) = [x^T(k), \hat u^T(k), \hat\theta^T(k)]^T$. Since $\sigma(\cdot)$ is smooth, the mean value theorem gives

$$\hat F(z(k)) = \hat F(\hat z(k)) + \nabla(\xi_u)s_u(k) + \nabla(\xi'_\theta)\varepsilon_\theta(k),$$

where $s_u(k) = u(k) - \hat u(k)$. Here $\nabla(\xi_u) = \partial \hat F(x(k), \xi_u, \theta(k))/\partial \xi_u$ with $\xi_u = c_u u(k) + (1 - c_u)(u(k) - s_u(k))$, where $0 \le c_u \le 1$ is a certain constant. Similarly, $\nabla(\xi'_\theta) = \partial \hat F(x(k), \hat u(k), \xi'_\theta)/\partial \xi'_\theta$ with $\xi'_\theta = c'_\theta\theta(k) + (1 - c'_\theta)(\theta(k) - \varepsilon_\theta(k))$, where $0 \le c'_\theta \le 1$ is a certain constant. Since $s_u(k)$ and $\varepsilon_\theta(k)$ are both bounded and $\sigma(\cdot)$ is smooth, $\|\nabla(\xi_u)s_u(k)\|$ and $\|\nabla(\xi'_\theta)\varepsilon_\theta(k)\|$ are both bounded. We therefore let $\|\nabla(\xi_u)s_u(k)\| \le \bar\varepsilon_u$ and $\|\nabla(\xi'_\theta)\varepsilon_\theta(k)\| \le \bar\varepsilon_\theta$. Then (27) can be written as

$$x(k+1) = \hat F(x(k), \hat u(k), \hat\theta(k)) + \nabla(\xi_u)s_u(k) + \nabla(\xi'_\theta)\varepsilon_\theta(k) + \varepsilon_{m1}(k). \tag{28}$$

Let the tracking error be defined as

$$e(k) = x(k) - \eta, \tag{29}$$

where $\eta$ is the desired state trajectory. Let

(30)

where $\hat u_d(k)$ is the neural-network reference control trajectory expressed by (24). According to (25) and (30), we get

$$\hat u(k) = u_e(k) + \hat u_d(k) + \bar E_u(k), \tag{31}$$

where $\bar E_u(k) = s_u(k) + \varepsilon_u(k)$. According to (29) and (31), we have

$$\hat F(x(k), \hat u(k), \hat\theta(k)) = \hat F\big(e(k) + \eta,\ u_e(k) + \hat u_d(k),\ \hat\theta(k)\big) + \nabla(\bar\xi_u)\bar E_u(k), \tag{32}$$

where $\nabla(\bar\xi_u) = \partial \hat F(e(k) + \eta, \bar\xi_u, \hat\theta(k))/\partial \bar\xi_u$ and $\bar\xi_u = c_u(u_e(k) + \hat u_d(k)) + (1 - c_u)\big(u_e(k) + \hat u_d(k) + (s_u(k) + \varepsilon_u(k))\big)$, with $0 \le c_u \le 1$. Thus (28) can be written as

$$e(k+1) = \hat F\big(e(k) + \eta,\ u_e(k) + \hat u_d(k),\ \hat\theta(k)\big) - \eta + w(k), \tag{33}$$

where

$$w(k) = \nabla(\xi_u)s_u(k) + \nabla(\xi'_\theta)\varepsilon_\theta(k) + \varepsilon_{m1}(k) + \nabla(\bar\xi_u)\bar E_u(k).$$

Since $\nabla(\xi_u)s_u(k)$, $\nabla(\xi'_\theta)\varepsilon_\theta(k)$, $\nabla(\bar\xi_u)\bar E_u(k)$, and $\varepsilon_{m1}(k)$ are all bounded, the system disturbance is bounded. Let $\|\varepsilon_{m1}(k)\| \le \bar\varepsilon_{m1}$ and $\|\nabla(\bar\xi_u)\bar E_u(k)\| \le \bar\varepsilon_E$. Then we get

$$\|w(k)\| \le \bar\varepsilon_u + \bar\varepsilon_\theta + \bar\varepsilon_{m1} + \bar\varepsilon_E.$$

On the other hand, according to (24), $\hat u_d(k)$ can also be seen as a constant vector. System (33) can then be transformed into the following regulation system:

$$e(k+1) = P(e(k), u_e(k), \hat\theta(k)) + w(k), \tag{34}$$

where $P(e(k), u_e(k), \hat\theta(k)) = \hat F\big(e(k) + \eta,\ u_e(k) + \hat u_d(k),\ \hat\theta(k)\big) - \eta$. From (34), we see that the nonlinear tracking control system (6) has been transformed into a regulation system. In this regulation system, the system errors and the control fluctuation are absorbed into an unknown bounded system disturbance.
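The change of coordinates in (29)-(34) can be mirrored in a few lines. Given any state-transition model and the pair ($\eta$, reference control), the regulation map $P$ is obtained by shifting the state by $\eta$ and the control by the reference control. The scalar model below is a hypothetical stand-in, chosen only so that the reference control can be computed by hand. In the paper, the model is the trained network and the reference control comes from the u_f network.

```python
from typing import Callable

Model = Callable[[float, float, float], float]


def make_regulation_system(F: Model, eta: float, u_ref: float) -> Model:
    """Return P(e, u_e, theta) = F(e + eta, u_e + u_ref, theta) - eta, as in (34)."""
    def P(e: float, u_e: float, theta: float) -> float:
        return F(e + eta, u_e + u_ref, theta) - eta
    return P


def toy_model(x: float, u: float, theta: float) -> float:
    """Hypothetical scalar stand-in for the identified model (theta is ignored)."""
    return 0.9 * x + 0.5 * u


if __name__ == "__main__":
    eta = 1320.0                        # desired temperature from Section V
    u_ref = (eta - 0.9 * eta) / 0.5     # control that holds the toy model at eta
    P = make_regulation_system(toy_model, eta, u_ref)
    print(P(0.0, 0.0, 0.0))    # approximately 0: e = 0 is an equilibrium
    print(P(10.0, 0.0, 0.0))   # approximately 9.0 = 0.9 * 10
```

If the reference control keeps the model at $\eta$, the origin $e = 0$, $u_e = 0$ becomes an equilibrium of the regulation system, so the tracking task reduces to driving $e(k)$ to zero against $w(k)$.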

B. Derivation of the Iterative ADP Algorithm With Approximation Errors

In this subsection, our objective is to obtain an optimal control that makes the tracking error $e(k)$ converge to zero under the system disturbance $w(k)$. Since $w(k)$ is unknown, designing the optimal controller is very difficult. In [3], optimal control problems of this kind were transformed into two-person zero-sum optimal control problems, in which the system disturbance $w(k)$ is treated as a system control. The optimal control law is then obtained under the worst-case disturbance, that is, the disturbance control maximizes the performance index function. Inspired by [3], we define $w(k)$ as a disturbance control of the system. The two controls $u_e(k)$ and $w(k)$ of system (34) are designed to optimize the following quadratic performance index function:

$$J(e(k), \underline{u}_e(k), \underline{w}(k)) = \sum_{l=k}^{\infty}\big(e^T(l)Ae(l) + u_e^T(l)Bu_e(l) - w^T(l)Cw(l)\big), \tag{35}$$

where $\underline{u}_e(k) = (u_e(k), u_e(k+1), \ldots)$, $\underline{w}(k) = (w(k), w(k+1), \ldots)$, and $A, B, C > 0$ are positive definite weighting matrices. The optimal performance index function can then be defined as

$$J^*(e(k)) = \min_{\underline{u}_e(k)}\max_{\underline{w}(k)} J(e(k), \underline{u}_e(k), \underline{w}(k)). \tag{36}$$

Let $U(e(k), u_e(k), w(k)) = e^T(k)Ae(k) + u_e^T(k)Bu_e(k) - w^T(k)Cw(k)$ be the utility function. In this paper, we assume that $U(e(k), u_e(k), w(k)) > 0$ for all $e(k), u_e(k), w(k) \ne 0$. The system errors are generally small, which makes the system disturbance $w(k)$ small and the utility function positive. If $w(k)$ is large, we can reduce the matrix $C$ or enlarge the matrices $A$ and $B$. Hence the assumption $U(e(k), u_e(k), w(k)) > 0$ can be guaranteed.
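The utility $U$ and a finite-horizon truncation of (35) are easy to evaluate. The sketch below is a minimal NumPy illustration with arbitrary dimensions and weights of our own choosing. It also reproduces the remark above: when the disturbance term dominates, shrinking $C$ restores $U > 0$.

```python
import numpy as np


def utility(e, u_e, w, A, B, C):
    """Zero-sum utility U(e, u_e, w) = e^T A e + u_e^T B u_e - w^T C w."""
    return float(e @ A @ e + u_e @ B @ u_e - w @ C @ w)


def truncated_performance_index(es, ues, ws, A, B, C):
    """Finite-horizon truncation of (35): sum of U along a recorded trajectory."""
    return sum(utility(e, u, w, A, B, C) for e, u, w in zip(es, ues, ws))


if __name__ == "__main__":
    A, B, C = np.eye(2), np.eye(1), np.eye(1)
    e, u_e, w = np.array([0.1, 0.0]), np.array([0.0]), np.array([1.0])
    print(utility(e, u_e, w, A, B, C))          # negative: the disturbance term dominates
    print(utility(e, u_e, w, A, B, 1e-3 * C))   # positive after shrinking C
```
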

According to the principle of optimality, $J^*(e(k))$ satisfies the discrete-time Hamilton-Jacobi-Isaacs (HJI) equation

$$J^*(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + J^*(e(k+1))\}. \tag{37}$$

Define the optimal control laws as

$$w^*(e(k)) = \arg\max_{w(k)}\{U(e(k), u_e(k), w(k)) + J^*(e(k+1))\},$$

$$u_e^*(e(k)) = \arg\min_{u_e(k)}\{U(e(k), u_e(k), w^*(e(k))) + J^*(e(k+1))\}.$$

Hence, the HJI equation (37) can be written as

$$J^*(e(k)) = U(e(k), u_e^*(e(k)), w^*(e(k))) + J^*(e(k+1)).$$

To obtain the optimal control laws $u_e^*(e(k))$ and $w^*(e(k))$, we must first obtain the optimal performance index function $J^*(e(k))$. However, $J^*(e(k))$ is generally unknown before all the controls $u_e(k)$ and $w(k)$ are considered, so the HJI equation is generally unsolvable. In this paper, a new iterative ADP algorithm with system and approximation errors is developed to overcome these difficulties. In this algorithm, the performance index function and the control laws are updated iteratively, with the iteration index $i$ increasing from 0 to infinity. Let the initial performance index function be $V_0(e(k)) \equiv 0$. For $i = 0, 1, \ldots$, the iterative control laws $w_i(e(k))$ and $v_i(e(k))$ are computed as

$$w_i(e(k)) = \arg\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_i(e(k+1))\}, \tag{38}$$

$$v_i(e(k)) = \arg\min_{u_e(k)}\{U(e(k), u_e(k), w_i(e(k))) + V_i(e(k+1))\}.$$

The iterative performance index function is updated by

$$V_{i+1}(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_i(e(k+1))\} \tag{39}$$

$$= U(e(k), v_i(e(k)), w_i(e(k))) + V_i(e(k+1)), \tag{40}$$

where $e(k+1)$ is given by (34). Next, we analyze the convergence property of the iterative ADP algorithm.

Theorem 4.1: For all $i = 0, 1, \ldots$, let $V_i(e(k))$ be expressed as (40) and let $J^*(e(k))$ be the optimal performance index function. Then we have

$$V_i(e(k)) \to J^*(e(k)) \tag{41}$$

as $i \to \infty$.

Proof: First, we show by mathematical induction that $\{V_i(e(k))\}$, $i = 0, 1, \ldots$, is a monotonically nondecreasing sequence. For $i = 0$, we have

$$V_1(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_0(e(k+1))\} = U(e(k), 0, 0) \ge V_0(e(k)). \tag{42}$$


Assume that the conclusion holds for $i = l - 1$, $l = 1, 2, \ldots$. Then for $i = l$, we have

$$V_{l+1}(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_l(e(k+1))\} \ge \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_{l-1}(e(k+1))\} = V_l(e(k)). \tag{43}$$

The mathematical induction is completed. Next, we show by mathematical induction that $V_i(e(k)) \le J^*(e(k))$ for all $i = 0, 1, \ldots$. Assume that there exist constants $\gamma$ and $\delta$ such that $J^*(e(k+1)) \le \gamma U(e(k), u_e(k), w(k))$ and $V_0(e(k)) \le \delta J^*(e(k))$. For $i = 1$, we have

$$V_1(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_0(e(k+1))\} \le \left(1 + \frac{\delta - 1}{1 + \gamma^{-1}}\right)J^*(e(k)).$$

Assume that the conclusion holds for $i = l - 1$, $l = 1, 2, \ldots$. Then for $i = l$, we have

$$V_l(e(k)) = \min_{u_e(k)}\max_{w(k)}\{U(e(k), u_e(k), w(k)) + V_{l-1}(e(k+1))\} \le \min_{u_e(k)}\max_{w(k)}\left\{U(e(k), u_e(k), w(k)) + \left(1 + \frac{\delta - 1}{(1 + \gamma^{-1})^{l-1}}\right)J^*(e(k+1))\right\} \le \left(1 + \frac{\delta - 1}{(1 + \gamma^{-1})^{l}}\right)J^*(e(k)).$$

The proof is completed.
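To make the iteration (38)-(40) and the monotonicity part of Theorem 4.1 concrete, the following sketch runs the min-max value iteration on a hypothetical scalar regulation system, $e(k+1) = 0.8e(k) + 0.5u_e(k) + w(k)$, with $U = e^2 + u_e^2 - 5w^2$. A lookup table on a state grid with linear interpolation stands in for the critic and action networks used in the paper. The grids, coefficients, and function name are illustrative assumptions. Starting from $V_0 \equiv 0$, each sweep should not decrease the table, mirroring (42)-(43).

```python
import numpy as np


def minmax_value_iteration(a=0.8, b=0.5, A=1.0, B=1.0, C=5.0, iters=30):
    """Grid-based sketch of the min-max value iteration (38)-(40).

    Hypothetical scalar regulation system: e(k+1) = a*e(k) + b*u_e(k) + w(k).
    Utility: U = A*e^2 + B*u_e^2 - C*w^2, and V_0 = 0.
    Returns the state grid and the list [V_0, V_1, ..., V_iters] on that grid.
    """
    e_grid = np.linspace(-2.0, 2.0, 41)
    u_grid = np.linspace(-2.0, 2.0, 41)
    w_grid = np.linspace(-0.5, 0.5, 11)
    ee, uu, ww = np.meshgrid(e_grid, u_grid, w_grid, indexing="ij")
    e_next = a * ee + b * uu + ww                 # successor states, shape (41, 41, 11)
    stage = A * ee**2 + B * uu**2 - C * ww**2     # utility U for every (e, u_e, w)
    V = np.zeros_like(e_grid)                     # V_0(e) = 0
    history = [V]
    for _ in range(iters):
        # V_i(e(k+1)) by linear interpolation; np.interp clamps outside the grid.
        q = stage + np.interp(e_next, e_grid, V)
        V = q.max(axis=2).min(axis=1)             # max over w, then min over u_e, as in (39)
        history.append(V)
    return e_grid, history


if __name__ == "__main__":
    e_grid, history = minmax_value_iteration()
    increments = [np.min(nxt - prv) for prv, nxt in zip(history, history[1:])]
    print("smallest increment over all sweeps:", min(increments))
    print("V_30 at e = 1:", history[-1][np.argmin(np.abs(e_grid - 1.0))])
```

The smallest increment should be nonnegative up to floating-point rounding. The clamping at the grid edges is a simplification of this sketch, not part of the method.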

V. NUMERICAL ANALYSIS


In this section, numerical experiments are presented to show the effectiveness of our iterative ADP algorithm. Let the coal gasification control system be expressed as (6). Let the current reaction temperature in the gasifier be $x(0) = 1000$ °C. The corresponding observed system input and output data (kg/h) are $u_0 = [60960, 47572, 44752]^T$ and $y_0 = [74690, 34381, 4265, 29653, 10295]^T$.

Let the desired reaction temperature be $\eta = 1320$ °C. To model the coal gasification control system (6), we collect 20,000 temperature data points from a real-world coal gasification operational system. The corresponding 20,000 system input data and 20,000 system output data are also recorded. A three-layer BP NN with the structure 8-20-1 is then established to approximate the state equation in (6); this NN is called the model network. The control input is given by (4). We also use a three-layer BP NN with the structure 8-20-5 to approximate the input-output equation in (6); this NN is called the input-output network. The learning rates of the model network and the input-output network are both $l_m = 0.002$. The neural networks are trained with the gradient-based weight update rule [18] for 20,000 iteration steps to reach the training precision $10^{-6}$. The converged weights are given by

Next, we adopt three-layer BP NNs to identify the coal quality equation (12) and the reference control equation (15). The structures of the θ network and the u_f network are chosen as 10-20-4 and 6-20-3, respectively. Using the gradient-based weight update rule, the two neural networks are trained for 20,000 iteration steps with the learning rate 0.002 to reach the training precision $10^{-6}$. The converged weights are given by

Substituting the current system data $x(0)$, $u_0$, and $y_0$ into the θ network, we obtain the coal quality $\theta(k) = [0.6789, 0.0373, 0.1149, 0.1689]^T$. Substituting the desired state $\eta = 1320$ and the coal quality $\theta(k)$ into the u_f network, we obtain the desired control input $u_d(k) = [61408.7452, 44430.69, 51200]^T$. The training precisions of the model network, the θ network, and the u_f network are set to $10^{-3}$, while the training precisions of the critic and action networks are kept at $10^{-6}$.
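The network sizes quoted in this section follow directly from the signal dimensions: a scalar temperature $x(k)$, three inputs in (4), five outputs in (5), and four coal-quality components. The short bookkeeping script below uses variable names of our own choosing. It rederives each input-hidden-output layout and checks that the reported coal-quality estimate respects $\sum_i \theta_i = 1$ from Section II.

```python
# Signal dimensions: state x(k), control (4), output (5), coal quality theta(k).
N_X, N_U, N_Y, N_THETA = 1, 3, 5, 4
N_HIDDEN = 20

layouts = {
    "model network": (N_X + N_U + N_THETA, N_HIDDEN, N_X),          # z(k) -> x(k+1)
    "input-output network": (N_X + N_U + N_THETA, N_HIDDEN, N_Y),   # z(k) -> y(k)
    "theta network": (2 * N_X + N_Y + N_U, N_HIDDEN, N_THETA),      # z_theta(k) -> theta(k)
    "u_f network": (2 * N_X + N_THETA, N_HIDDEN, N_U),              # z_u(k) -> u_f(k)
}

# Coal quality reported by the theta network; its entries should sum to 1.
theta_estimate = [0.6789, 0.0373, 0.1149, 0.1689]

if __name__ == "__main__":
    for name, (n_in, n_hid, n_out) in layouts.items():
        print(f"{name}: {n_in}-{n_hid}-{n_out}")
    print(f"sum of theta entries: {sum(theta_estimate):.4f}")
```

The script prints the four structures 8-20-1, 8-20-5, 10-20-4, and 6-20-3 reported above, followed by a coal-quality sum of 1.0000.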

The iterative ADP algorithm is run for $i = 200$ iterations. The convergent trajectory of the iterative performance index function is shown in Fig. 1. We apply the optimal control law to the system for $T_f = 100$ time steps. The optimal state trajectory is shown in Fig. 2, and the corresponding control and system output trajectories are shown in Fig. 3 and Fig. 4, respectively.

Fig. 1. The convergent trajectory of the iterative performance index function (horizontal axis: iteration steps, 0 to 200).

Fig. 2. The trajectory of state.

Fig. 3. The trajectories of control and system output (horizontal axes: time steps, 0 to 100). (a) Coal input trajectory. (b) H2O input trajectory. (c) O2 input trajectory. (d) CO output trajectory.

Fig. 4. The trajectories of system output (horizontal axes: time steps, 0 to 100). (a) CO2 output trajectory. (b) H2 output trajectory. (c) H2O output trajectory. (d) Char output trajectory.

VI. CONCLUSIONS

In this paper, an effective iterative ADP algorithm is established to solve optimal tracking control problems for coal gasification systems. NNs are used to approximate the system model, the coal quality, and the reference control, respectively, so a mathematical model of the coal gasification process is unnecessary. Taking the approximation errors of the NNs into account, the optimal tracking control problem is transformed into a two-person zero-sum optimal regulation problem. An iterative ADP algorithm is then established to obtain the optimal control law, with convergence proofs. Finally, numerical results illustrate the performance of the developed algorithm.

REFERENCES

[1] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, May 2005.

[2] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof," IEEE Trans. Syst., Man, and Cybern. B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.

[3] T. Basar and P. Bernhard, H∞ Optimal Control and Related Minimax Design Problems. Boston, MA: Birkhäuser, 1995.

[4] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, "A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems," Automatica, vol. 49, no. 1, pp. 82-92, Jan. 2013.

[5] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1118-1129, July 2012.

[6] N. Gopalsami and A. C. Raptis, "Acoustic velocity and attenuation measurements in thin rods with application to temperature profiling in coal gasification systems," IEEE Transactions on Sonics and Ultrasonics, vol. 31, no. 1, pp. 32-39, Jan. 1984.

[7] T. Huang and D. Liu, "A self-learning scheme for residential energy system control and management," Neural Computing and Applications, vol. 22, no. 2, pp. 259-269, Feb. 2013.

[8] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76-105, 2012.

[9] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. New Jersey: John Wiley & Sons, IEEE Press, 2012.

[10] D. Liu, H. Javaherian, O. Kovalenko, and T. Huang, "Adaptive critic learning techniques for engine torque and air-fuel ratio control," IEEE Trans. Syst., Man, and Cybern. B, Cybern., vol. 38, no. 4, pp. 988-993, Aug. 2008.

[11] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Transactions on Automation Science and Engineering, vol. 9, no. 3, pp. 628-634, July 2012.

[12] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 779-789, April 2013.

[13] D. Liu, Y. Zhang, and H. Zhang, "A self-learning call admission control scheme for CDMA cellular networks," IEEE Trans. Neural Networks, vol. 16, no. 5, pp. 1219-1228, Sept. 2005.

[14] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Syst., Man, and Cybern. C, Appl. Rev., vol. 32, no. 2, pp. 140-153, May 2002.

[15] Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 6, pp. 913-928, June 2013.

[16] D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Networks, vol. 8, no. 5, pp. 997-1007, Sept. 1997.

[17] P. Ruprecht, W. Schafer, and P. Wallace, "A computer model of entrained coal gasification," Fuel, vol. 67, no. 6, pp. 739-742, 1988.

[18] J. Si and Y.-T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Networks, vol. 12, no. 2, pp. 264-276, Mar. 2001.

[19] K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, "Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality," Automatica, vol. 48, no. 8, pp. 1598-1611, Aug. 2012.

[20] F. Y. Wang, N. Jin, D. Liu, and Q. Wei, "Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound," IEEE Trans. Neural Networks, vol. 22, no. 1, pp. 24-36, Jan. 2011.

[21] F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: an introduction," IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009.

[22] Q. Wei, H. Zhang, and J. Dai, "Model-free multiobjective approximate dynamic programming for discrete-time nonlinear systems with general performance index functions," Neurocomputing, vol. 72, no. 7-9, pp. 1839-1848, 2009.

[23] Q. Wei and D. Liu, "An iterative ε-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state," Neural Networks, vol. 32, pp. 236-244, 2012.

[24] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25-38, 1977.

[25] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge: MIT Press, 1991, pp. 67-95.

[26] X. Xu, Z. Hou, C. Lian, and H. He, "Online learning control using adaptive critic designs with sparse kernel machines," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 5, pp. 762-775, May 2013.

[27] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Trans. Syst., Man, and Cybern. B, Cybern., vol. 38, no. 4, pp. 937-942, July 2008.

[28] H. Zhang, Q. Wei, and D. Liu, "An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games," Automatica, vol. 47, no. 1, pp. 207-214, Jan. 2011.
