

Computational Nonlinear Stochastic Control based on the

Fokker-Planck-Kolmogorov Equation

Mrinal Kumar∗, S. Chakravorty†, John L. Junkins‡

February 13, 2008

Abstract

The optimal control of nonlinear stochastic systems is considered in this paper. The central role played by the Fokker-Planck-Kolmogorov equation in the stochastic control problem is shown under the assumption of asymptotic stability. A computational approach for the problem is devised based on policy iteration/successive approximations, and a finite dimensional approximation of the control-parametrized diffusion operator, i.e., the controlled Fokker-Planck operator. Several numerical examples are provided to show the efficacy of the proposed computational methodology.

1 Introduction

Numerous fields of science and engineering present the problem of uncertainty propagation through nonlinear dynamic systems.1 One may be interested in the response of engineering structures (beams, plates, entire buildings) under random excitation (in structural mechanics2), the motion of particles under the influence of stochastic force fields (in particle physics3), or the computation of the prediction step in the design of a Bayesian filter (in filtering theory),4,5 among others. All these applications require the study of the time evolution of the Probability Density Function (PDF), p(t, x), corresponding to the state, x, of the relevant dynamic system. The pdf is given by the solution of the Fokker-Planck-Kolmogorov equation (FPE), a PDE in the pdf of the system defined by the underlying dynamical system's parameters. In this paper, approximate solutions of the FPE are considered and subsequently leveraged for the design of controllers for nonlinear stochastic dynamical systems. In the past few years, the authors have developed a generalized multi-resolution meshless FEM methodology, the partition of unity FEM (PUFEM), utilizing the recently developed GLOMAP (Global Local Orthogonal MAPpings) methodology to provide the partition of unity functions6,7 and the orthogonal local basis functions for the solution of the FPE. The PUFEM is a Galerkin projection method, and the solution is characterized in terms of a finite dimensional representation of the Fokker-Planck operator underlying the problem. The methodology is also highly amenable to parallelization.8–12

Though the FPE is invaluable in quantifying uncertainty evolution through nonlinear systems, perhaps its greatest benefit lies in the stochastic analysis, design, and control of nonlinear systems. In the context of nonlinear stochastic control, Markov Decision Processes (MDPs) have long been one of the most widely used methods for discrete time stochastic control. However, the Dynamic Programming (DP) equations underlying MDPs suffer from the curse of dimensionality.13–15 Various approximate Dynamic Programming (ADP) methods have been proposed in the past several years for overcoming the curse of dimensionality,15–19 and broadly fall under the heading of functional reinforcement learning. These are essentially model-free methods for approximating the optimal control policy in stochastic optimal control problems. These methods generally fall under the category of value function approximation

∗Mrinal Kumar is a Ph.D. student in the Department of Aerospace Engineering at Texas A&M University, College Station, TX 77843, [email protected], http://people.tamu.edu/˜mrinal.
†Suman Chakravorty is an Assistant Professor with the Faculty of Aerospace Engineering at Texas A&M University, College Station.
‡John L. Junkins is the Royce E. Weisenbaker Distinguished Professor of Aerospace Engineering at Texas A&M University, College Station.


methods,15 policy gradient/approximation methods,17,18 and actor-critic methods.16,19 These methods attempt to reduce the dimensionality of the DP problem through a compact parametrization of the value function (with respect to a policy, or the optimal value function) and the policy function. The methods differ mainly in the parametrization employed to achieve this goal, ranging from nonlinear function approximators such as neural networks16 to linear approximation architectures.15,20 These methods learn optimal policies by repeated simulations of a dynamical system and thus can take a long time to converge to a good policy, especially when the problem has continuous state and control spaces. In contrast, the methodology proposed here is model-based, and uses the finite dimensional representation of the underlying parametrized diffusion operator to parametrize both the value function and the control policy in the stochastic optimal control problem. Considering the low order finite dimensional controlled diffusion operator allows us to significantly reduce the dimensionality of the planning problem, while providing a computationally efficient recursive method for obtaining progressively better control policies.

The literature on computational methods for solving continuous time stochastic control problems is relatively sparse compared to the discrete time problem. One approach is through the use of locally consistent Markov decision processes.21 In this approach, the continuous controlled diffusion operator is approximated by a finite dimensional Markov chain which satisfies certain local consistency conditions, namely that its drift and diffusion coefficients locally match those of the original process. The resulting finite state MDP is solved by standard DP techniques such as value iteration and policy iteration. The method relies on a finite difference discretization and thus can be computationally very intensive in higher dimensional spaces. In another approach,22,23 the diffusion process is approximated by a finite dimensional Markov chain through the application of generalized cell-to-cell mapping.24 However, even this method suffers from the curse of dimensionality because it involves discretizing the state space into a grid, which becomes increasingly infeasible as the dimension of the system grows. Also, finite difference and finite element methods have been applied directly to the nonlinear Hamilton-Jacobi-Bellman partial differential equation.25,26 The method proposed here differs in that it uses policy iteration in the original infinite dimensional function space, along with a finite dimensional representation of the controlled diffusion operator, to solve the problem. Considering a lower order approximation of the underlying operator results in a significant reduction in the dimensionality of the computational problem. Utilizing the policy iteration algorithm typically requires solving a sequence of a few linear equations (typically < 5) before practical convergence is obtained, as opposed to solving a high dimensional nonlinear equation if the original nonlinear HJB equation is attacked directly.

The literature for solving deterministic optimal control problems in continuous time is relatively mature compared to its stochastic counterpart. The method of successive approximations/policy iteration has been widely used to solve deterministic optimal control problems. Various methods have been proposed for solving the policy evaluation step in policy iteration algorithms, including Galerkin-based methods27,28 and neural network based methods,29–33 among others. The methodology outlined here can be viewed as an extension of this extensive body of work to the stochastic optimal control problem. However, it should be noted that the extension is far from trivial, as the stochastic optimal control problem, especially the control of degenerate diffusions, has pathologies that have to be treated carefully in order to devise effective computational methods for solving such problems. We would like to mention that we are interested here in the feedback control of dynamical systems, i.e., in the solution of the HJB equation, as opposed to solving the open loop optimal control problem based on the Pontryagin minimum principle,34 which results in a two-point boundary value problem involving the states and co-states of the dynamical system. Please see references 34–36 for more on this approach to solving the optimal control problem. Also, we would like to state that, to the best of our knowledge, there is no stochastic equivalent to the two-point boundary value problems that result from applying the Pontryagin minimum principle to the deterministic optimal control problem.

The rest of the paper is arranged as follows. Section 2 introduces the forward and backward Kolmogorov equations and shows their equivalence under the condition of asymptotic stability. Section 3 outlines the computational method used to obtain a finite dimensional representation of the infinite dimensional Fokker-Planck or forward Kolmogorov equation (FPE) and the backward Kolmogorov equation (BKE). In Section 4, the finite dimensional representations of the FPE and BKE operators are used to outline a recursive procedure, based on policy iteration, to obtain the solution of the stochastic nonlinear control problem. In Section 5, the methodology is applied to various nonlinear control problems.


2 The Forward and Backward Kolmogorov Equations

In this section, we shall introduce the forward Kolmogorov or Fokker-Planck equation (FPE) and the backward Kolmogorov equation. While the former is central to the propagation of uncertainty and nonlinear filtering of dynamical systems, the latter is paramount in the solution of stochastic control problems. It can be shown that if the FPE is asymptotically stable, then the forward and backward equations are equivalent in a sense to be made precise later in this section. For the sake of simplicity, we shall consider the case of a one-dimensional system in this section; the results are easily generalized to multidimensional systems. Consider the stochastic differential equation:

ẋ = f(x) + g(x)W, (1)

where W represents white noise. Associated with the above equation is the forward Kolmogorov or Fokker-Planck equation (FPE):

∂p/∂t = −∂(fp)/∂x + (1/2) ∂²(g²p)/∂x², (2)

and the Backward Kolmogorov Equation (BKE):

∂p/∂t = f ∂p/∂x + (g²/2) ∂²p/∂x². (3)

The FPE governs the evolution of the probability density function of the stochastic system of Eq. 1 forward in time, while the BKE governs the evolution of the uncertainty (pdf) in the system backward in time.
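As a quick sanity check of Eq. 2 (an illustration, not part of the paper), one can verify that a known transition density satisfies the FPE. The sketch below assumes the scalar Ornstein-Uhlenbeck process f(x) = −x, g = √2, whose transition density from a deterministic initial condition x0 is Gaussian with mean x0·e^{−t} and variance 1 − e^{−2t}:

```python
import numpy as np
import sympy as sp

x, x0 = sp.symbols('x x0')
t = sp.symbols('t', positive=True)

# OU process dx = -x dt + sqrt(2) dW: Gaussian transition density (standard result)
m = x0 * sp.exp(-t)                 # mean
s = 1 - sp.exp(-2 * t)              # variance
p = sp.exp(-(x - m)**2 / (2 * s)) / sp.sqrt(2 * sp.pi * s)

# FPE residual for f = -x, g^2 = 2:  p_t + (f p)_x - (1/2)(g^2 p)_xx
res = sp.diff(p, t) + sp.diff(-x * p, x) - sp.diff(p, x, 2)
r = sp.lambdify((x, x0, t), res, 'numpy')

# The residual vanishes (to round-off) at arbitrary test points
pts = [(0.3, 1.0, 0.7), (-1.2, 0.5, 0.2), (2.0, -1.0, 1.5)]
vals = [abs(r(*pt)) for pt in pts]
```

Evaluating the symbolic residual at sample points, rather than asking the CAS to simplify it to zero, keeps the check cheap and robust.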

The FPE is said to be asymptotically stable if there exists a unique pdf p∞ such that any initial condition, either deterministic or probabilistic, decays to p∞ as time goes to infinity. In the following, we show the equivalence of the BKE and the FPE under the condition of asymptotic stability of the FPE. Since the FPE is asymptotically stable, let the pdf at any time be represented as follows:

p(x, t) = p∞(x) p̃(x, t), (4)

i.e., as the product of the stationary pdf and a time varying part. Substituting into the FPE, we obtain:

∂(p∞ p̃)/∂t = −∂(f p∞ p̃)/∂x + (1/2) ∂²(g² p∞ p̃)/∂x², (5)

which after using the fact that p∞ is the stationary solution, i.e., that

∂p∞/∂t = −∂(f p∞)/∂x + (1/2) ∂²(g² p∞)/∂x² = 0, (6)

and expanding the various terms in Eq. 5 leads us to the following identity:

p∞ ∂p̃/∂t = −f p∞ ∂p̃/∂x + (1/2) g² p∞ ∂²p̃/∂x² + (∂p̃/∂x) ∂(g² p∞)/∂x. (7)

Consider the term −f p∞ ∂p̃/∂x + (∂p̃/∂x) ∂(g² p∞)/∂x in the above equation. Note from Eq. 6 that

∂/∂x [−f p∞ + (1/2) ∂(g² p∞)/∂x] = 0, (8)

and hence,

−f p∞ + (1/2) ∂(g² p∞)/∂x = const. (9)

Hence, if we assume that the invariant distribution is well behaved, such that f p∞ → 0 and ∂(g² p∞)/∂x → 0 as x → ±∞, i.e., if the stationary distribution is not heavy-tailed, then

−f p∞ + (1/2) ∂(g² p∞)/∂x = 0, (10)


and hence, substituting the above into Eq. 7, and dividing throughout by p∞(x), it follows that

∂p̃/∂t = f ∂p̃/∂x + (g²/2) ∂²p̃/∂x², (11)

i.e., the time varying part of the pdf follows the BKE. Denote by Lf(·) and Lb(·) the FPE and BKE operators respectively, i.e.,

Lf(p) = −∂(fp)/∂x + (1/2) ∂²(g²p)/∂x², (12)

Lb(p) = f ∂p/∂x + (g²/2) ∂²p/∂x². (13)

The BKE operator is the formal adjoint of the FPE operator. Under the condition of asymptotic stability of the FPE, and in view of the above development, the BKE and FPE are equivalent in the following sense: if ψ(x) is an eigenfunction of the BKE operator Lb(·) with corresponding eigenvalue λ, then p∞(x)ψ(x) is an eigenfunction of the FPE operator with the same eigenvalue λ. Hence, knowledge of the eigenfunctions of the BKE operator, along with knowledge of the stationary distribution of the FPE, is sufficient for identifying the eigenstructure of the FPE, and vice versa. Thus, under the condition of asymptotic stability, knowledge of the stationary solution along with any one of the FPE or BKE operators is sufficient to completely determine the characteristics of the other operator. In the following, we outline some other important properties of the FPE/BKE operators under the condition of asymptotic stability of the FPE. Consider the Hilbert space L2(p∞), i.e., the L2 space with inner product weighted by the stationary distribution:

⟨f, g⟩∞ = ∫ f(x) g(x) p∞(x) dx. (14)

Then, from the development above, it is straightforward to show that the BKE operator is symmetric in L2(p∞), i.e.,

⟨Lb(f), g⟩∞ = ⟨f, Lb(g)⟩∞, ∀ f, g ∈ L2(p∞). (15)
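The eigenfunction correspondence stated above can be verified symbolically for a simple scalar example. The sketch below (an illustration, not from the paper) uses the Ornstein-Uhlenbeck process f(x) = −x, g² = 2, whose stationary density is p∞ ∝ e^{−x²/2} and whose BKE eigenfunctions are the probabilists' Hermite polynomials Heₙ with eigenvalues λ = −n:

```python
import sympy as sp

x = sp.symbols('x')
f, g2 = -x, 2                         # OU process: dx = -x dt + sqrt(2) dW
p_inf = sp.exp(-x**2 / 2)             # stationary density (unnormalized)

def L_b(V):                           # BKE operator, Eq. (13)
    return f * sp.diff(V, x) + sp.Rational(1, 2) * g2 * sp.diff(V, x, 2)

def L_f(p):                           # FPE operator, Eq. (12)
    return -sp.diff(f * p, x) + sp.Rational(1, 2) * sp.diff(g2 * p, x, 2)

def He(n):                            # probabilists' Hermite polynomial He_n
    a, b = sp.Integer(1), x           # He_{k+1} = x He_k - k He_{k-1}
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, sp.expand(x * b - k * a)
    return b

n = 3
psi = He(n)
ok_b = sp.simplify(L_b(psi) + n * psi) == 0                    # L_b(psi) = -n psi
ok_f = sp.simplify(L_f(p_inf * psi) + n * p_inf * psi) == 0    # same eigenvalue for p_inf*psi
```

Both checks succeed for any n, illustrating that p∞ψ inherits the eigenvalue of ψ.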

Under the condition that the stationary distribution is not heavy-tailed, the semi-group associated with the BKE operator is compact as well as self-adjoint, and thus the operator has a discrete spectrum. Moreover, the eigenfunctions of the operator form a complete orthonormal basis for L2(p∞). However, to the best of the authors' knowledge, sufficient conditions for the existence of a discrete spectrum are not well known in the multidimensional case, especially for degenerate diffusions (please see below). Nevertheless, in obtaining the finite dimensional representations of the operators, the symmetry of the operator is enough to assure the well-posedness of the associated problems. In the case of multidimensional problems, the FPE and BKE operators are respectively given by the following expressions:

Lf(p) = −Σi ∂(fi p)/∂xi + (1/2) Σi,j ∂²[(GGᵀ)ij p]/∂xi∂xj, (16)

Lb(V) = Σi fi ∂V/∂xi + (1/2) Σi,j (GGᵀ)ij ∂²V/∂xi∂xj, (17)

given the nonlinear stochastic multidimensional system

Ẋ = F(X) + G(X)W, (18)
F(X) = [f1(X), · · · , fn(X)], (19)

where W is multidimensional white noise. Note that in most multidimensional systems, especially mechanical ones, the matrix G is rank deficient since the input enters the system only at the velocity-level variables, and hence so is the matrix GGᵀ. In general, a diffusion operator may be written as

D(V) ≡ Σi ai(X) ∂V/∂xi + (1/2) Σi,j dij(X) ∂²V/∂xi∂xj, (20)

where A(X) = [a1(X), · · · , aN(X)] is the drift vector and D(X) = [dij(X)] is the diffusion matrix. In the case when D(x) is not strictly positive definite for all x, the diffusion process is termed degenerate (it is also sometimes said that the


diffusion process violates the uniform ellipticity condition). For diffusions arising out of stochastic dynamical systems, i.e., the BKE operator, it follows that they are degenerate diffusions due to the rank deficiency of G. In general, the theoretical treatment of the solution of such equations is made very difficult by the degeneracy, because classical solutions to PDEs governed by such operators may not exist. In these cases, the generalized notion of viscosity solutions needs to be introduced.37–39

The FPE and the BKE are paramount in the propagation of uncertainty through, and the filtering and control of, stochastic nonlinear systems. Although, as mentioned above, the theoretical treatment of these operators in the multidimensional case is very difficult, we may nevertheless obtain a finite dimensional, i.e., matrix, representation of the above operators in order to practically solve these problems, at least in an approximate fashion within the resulting finite dimensional space. We shall show the application of the finite dimensional representation of the operators to uncertainty propagation, filtering, and control in the next two sections.

3 Finite Dimensional Representation of the FPE/ BKE operators

As mentioned in the previous section, the FPE and BKE operators are the key to the analysis, filtering, and control of stochastic dynamical systems. However, they are infinite dimensional operators, and for computations to be practically feasible, a finite dimensional representation of these operators needs to be obtained. The Galerkin residual projection method is one of the most powerful methods for obtaining such a representation and is outlined in the following.

The Galerkin projection method is a general method for solving partial differential equations and is often used to solve the FPE. Let p(t, x) represent the time varying solution of the FPE (Eq. 2). Let φ1(x), · · · , φn(x), · · · represent a set of basis functions on some compact domain D. The various choices of basis functions lead to different methodologies. The Galerkin method assumes that the pdf p(t, x) can be written as a linear combination of the basis functions φi(x) with time varying coefficients, i.e.,

p(t, x) = Σ_{i=1}^∞ ci(t) φi(x). (21)

The key to obtaining the pdf p(t, x) is to find the coefficients ci(t) in the above equation. The Galerkin method achieves this by truncating the expansion of p(t, x) at some finite N and projecting the equation error onto the set of basis functions φi(x), i.e.,

p(t, x) ≈ Σ_{i=1}^N ci(t) φi(x), (22)

and

⟨ ∂/∂t [Σ_{i=1}^N ci(t) φi(x)] − Lf(Σ_{i=1}^N ci(t) φi(x)), φj(x) ⟩ = 0, (23)

where ⟨·, ·⟩ represents the inner product on L2(D), the space of square integrable functions on the domain D, and Lf represents the FPE operator. Note that the above represents a finite set of differential equations in the coefficients ci(t), which can be solved for the coefficients:

Σ_{i=1}^N ċi ⟨φi, φj⟩ − Σ_{i=1}^N ci ⟨Lf(φi), φj⟩ = 0, ∀j. (24)

Let

M = [⟨φi, φj⟩], (25)
K = [⟨Lf(φi), φj⟩]. (26)

The pair (M, K) represents the finite dimensional approximation of the FPE operator. If the basis functions are mutually orthogonal, then the stiffness matrix K by itself represents the finite dimensional representation of the FPE operator. An eigendecomposition of the pair (M, K) yields the approximation of the eigenstructure of the infinite dimensional


FPE operator. The uncertainty propagation problem, as well as the prediction step of the filtering problem, involves the solution of the linear differential equation

M Ċ + K C = 0, (27)

over a finite or infinite time interval. Once the eigenstructure of the above LTI system has been found and stored a priori, the above equation can be trivially solved in real time. Moreover, if the FPE is asymptotically stable, it can be shown after some work that the eigenvalues of the FPE operator are all real and non-positive. In the following section, we shall show how the FPE/BKE operator may be used to solve the stochastic nonlinear control problem.

The Galerkin method is not the only way to obtain the finite dimensional representation of the FPE/BKE operator; the generalized Galerkin residual projection method can also be used. In the generalized procedure, the residual (or equation error) obtained by substituting the approximate solution into the equation of interest is projected onto a set of generalized functions instead of the basis functions themselves. For instance, collocation/pseudospectral methods involve projecting the equation error onto a set of delta functions located at a specially selected set of collocation points.35,36 A full comparison of these methods for the computational solution of the FPE/BKE is beyond the scope of this paper. However, we would like to mention that the generalized FEM based methods developed by the authors8–12 have consistently outperformed other competing Galerkin residual projection methods.
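To make the construction of the pair (M, K) concrete, the sketch below (an illustration with assumed parameters, not code from the paper) assembles it for the Ornstein-Uhlenbeck FPE, f(x) = −x, g² = 2, using Gaussian-weighted monomials as basis functions. Because this basis happens to span the first few exact eigenfunctions p∞·Hei, the generalized eigenvalues come out real and non-positive, as claimed above:

```python
import numpy as np
import sympy as sp

x = sp.symbols('x')
f, g2 = -x, 2                          # OU process: dx = -x dt + sqrt(2) dW

def L_f(p):                            # FPE operator, Eq. (12)
    return -sp.diff(f * p, x) + sp.Rational(1, 2) * sp.diff(g2 * p, x, 2)

N = 5
phi = [x**i * sp.exp(-x**2 / 2) for i in range(N)]   # Gaussian-weighted monomials

# Mass and "stiffness" matrices, Eqs. (25)-(26), via exact symbolic integration
M = np.array([[float(sp.integrate(phi[i] * phi[j], (x, -sp.oo, sp.oo)))
               for j in range(N)] for i in range(N)])
A = np.array([[float(sp.integrate(L_f(phi[i]) * phi[j], (x, -sp.oo, sp.oo)))
               for j in range(N)] for i in range(N)])

# Generalized eigenvalues of the pair: for this example exactly {0, -1, -2, -3, -4}
lam = np.sort(np.linalg.eigvals(np.linalg.solve(M, A)).real)
```

The zero eigenvalue corresponds to the stationary density p∞ itself; the rest are strictly negative, so Eq. 27 decays to the stationary mode.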

4 Nonlinear Stochastic Control

An iterative approach to solving the Hamilton-Jacobi-Bellman (HJB) equation is now presented for the optimal control of nonlinear stochastic dynamical systems. Consider the following dynamical system:

ẋ = f(x) + g(x)u + h(x)w(t); x(t = 0) = x0, (28)

where, u is the control input (vector) and w represents white noise excitation. We are interested in minimizing thefollowing discounted cost functional over the infinite horizon:

J(x0) = E[ ∫_0^∞ ( l(ϕ(t)) + ‖u(ϕ(t))‖²_R ) e^{−βt} dt ], (29)

where, ϕ(t) is a trajectory that solves Eq. 28, l is often referred to as the state-penalty function, and β is a given discount factor. The infinite horizon problem results in a stationary, i.e., time invariant, control law. However, for systems with additive noise, i.e., when h(x) is independent of x, as would be the case for instance in the LQG problem, the undiscounted cost-to-go is unbounded since the trajectories of the system never decay to zero. Hence, in that case, the discounting is a practical step to ensure a bounded cost-to-go function. The discount factor may also be interpreted as a finite horizon over which the control law tries to optimize the system performance. For this problem, the optimal control law u*(x) is known to be given in terms of a value function V*(x) as follows:

u*(x) = −(1/2) R⁻¹ gᵀ ∂V*(x)/∂x, (30)

where, the value function V*(x) solves the following well-known (stationary) Hamilton-Jacobi-Bellman equation:

(∂V*/∂x)ᵀ f + (1/2) hQhᵀ ∂²V*/∂x² + l − (1/4) (∂V*/∂x)ᵀ g R⁻¹ gᵀ (∂V*/∂x) − βV* = 0, V*(0) = 0. (31)

The above framework (solving for the optimal control law u*(x)) is known as the H2 control paradigm. Notice that designing the optimal control in this framework requires solving a nonlinear PDE (Eq. 31), which is in general a very difficult problem. It is, however, possible to restructure the above equations so that the central problem reduces to solving a sequence of linear PDEs, a much easier proposition. This is achieved via the substitution of Eq. 30 into the quadratic term involving the gradient of the value function in the HJB (Eq. 31). Doing so gives the following equivalent form of the HJB:

(∂V*/∂x)ᵀ (f + gu) + (1/2) hQhᵀ ∂²V*/∂x² + l + ‖u‖²_R − βV* = 0. (32)

The above form of the HJB is known as the generalized HJB. Notice that the substitution discussed above has converted the nonlinear PDE into a PDE that is linear in the value function V*(x) for a fixed control u. Eq. 32 forms the core of a policy iteration algorithm in which the optimal control law is obtained by iterating upon an initial stabilizing control policy as follows:


1. Let u(0) be an initial stabilizing control law (policy) for the dynamical system (Eq. 28), i.e., the FPE corresponding to the closed loop under u(0) is asymptotically stable.

2. For i = 0 to ∞

• Solve for V(i) from:

(∂V(i)/∂x)ᵀ (f + g u(i)) + (1/2) hQhᵀ ∂²V(i)/∂x² + l + ‖u(i)‖²_R − βV(i) = 0. (33)

• Update policy as:

u(i+1)(x) = −(1/2) R⁻¹ gᵀ ∂V(i)(x)/∂x. (34)

3. End [Policy Iteration]
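For intuition, the policy iteration above can be carried out in closed form for a scalar linear example, where a quadratic ansatz V(x) = p x² + c turns the policy evaluation PDE (Eq. 33) into an algebraic equation in p and c. The sketch below uses assumed values of a, b, h, Q, q, r, β (not taken from the paper); the fixed point of the iteration satisfies the corresponding discounted scalar Riccati equation:

```python
import numpy as np

# dx = (a x + b u) dt + h dW (noise intensity Q), cost E ∫ (q x^2 + r u^2) e^{-βt} dt
a, b, h, Q = 1.0, 1.0, 0.5, 1.0        # assumed plant parameters
q, r, beta = 1.0, 1.0, 0.1             # assumed cost parameters

k = 5.0                                 # initial stabilizing gain: u = -k x, so a - b k < 0
for _ in range(50):
    # Policy evaluation (Eq. 33), x^2 coefficients: 2 p (a - b k) + q + r k^2 - β p = 0
    p = (q + r * k**2) / (beta - 2.0 * (a - b * k))
    c = p * h**2 * Q / beta             # constant terms: p h^2 Q - β c = 0
    # Policy improvement (Eq. 34): u = -(1/2) r^{-1} b (2 p x)  =>  k <- b p / r
    k_new = b * p / r
    if abs(k_new - k) < 1e-12:
        break
    k = k_new

# The converged p solves the discounted scalar Riccati equation
riccati_residual = b**2 * p**2 / r + (beta - 2.0 * a) * p - q
```

Each iterate keeps the closed loop stable (a − bk < 0), and convergence takes only a handful of sweeps, consistent with the "very few steps" remark below.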

The convergence of the above algorithm has been proven for the deterministic case (Q = 0) in references 27, 28 and holds for the stochastic case as well.40 However, conditions for the existence of classical solutions to the linear elliptic PDE in the policy evaluation step (Eq. 33), and for the asymptotic stability of the closed loop systems under the resulting control policies (in the sense that the associated FPE is stable), have not been obtained, to the best knowledge of the authors. In this paper, we shall assume that the policy evaluation step admits a classical solution and that the sequence of control policies generated asymptotically stabilizes the closed loop system. In fact, these assumptions are borne out by the numerical examples considered in the next section. The great advantage of using the policy iteration algorithm over directly trying to solve the nonlinear HJB equation is that the algorithm typically converges in very few steps (≤ 5).

Let us take a closer look at Eq. 33. Writing it in operator form, we have:

(L − β)V = q (35)

where,

L = (f + gu)ᵀ ∂/∂x + (1/2) hQhᵀ ∂²/∂x², (36)
q = −(l + ‖u‖²_R). (37)

Notice that the operator L(·) is identical to the BKE operator of the closed loop under control policy u. The approximation for the value function is then written in terms of basis functions in the following manner:

V = Σ_{i=1}^N ci φi(x), (38)

where, φi(x) could be local or global basis functions, depending on the nature of the approximation being used. In the current paper, we use global polynomials (i.e., the φi(·) are polynomials supported on Ω). Then, following the Galerkin approach, we project the residual error resulting from plugging the approximation (Eq. 38) into the generalized HJB (Eq. 33) onto the space of polynomials on Ω as follows:

Σ_{i=1}^N ci ∫_Ω (L − β)[φi(x)] φj(x) dΩ = ∫_Ω q φj dΩ,  j = 1, 2, . . . , N. (39)

Eq. 39 is equivalent to a linear system of algebraic equations in ci:

Kc = f (40)

where

Kij = ∫_Ω (L − β)[φj] φi dΩ, (41)
fi = ∫_Ω q φi dΩ. (42)


Note that

K = Kb − βI, (43)

where Kb is the finite dimensional representation of the backward Kolmogorov operator. Some notes on the admissibility of the basis functions φi are due here. Notice that the BK operator involves only derivatives of the unknown function V(x). It is thus clear that when the discount factor β is zero, it is not possible to use a complete set of basis functions without making the stiffness matrix K singular (φ ≡ 1 causes a rank deficiency of 1). However, so far as solving Eq. 40 is concerned, it is not important to have the constant basis function in the approximation space. In theory, a nontrivial β allows us to include φ ≡ 1 in the basis set; but in practice it would require a large discount factor before K is reasonably well-conditioned for inversion, which in turn would take the solution far from optimal, since the control policy then becomes very short-sighted.
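The rank-deficiency remark can be illustrated numerically. The sketch below (an assumed one-dimensional example, not from the paper) assembles the Kij of Eq. 41 by Gauss-Legendre quadrature for a monomial basis on Ω = [−1, 1], with an assumed closed-loop drift −x and constant diffusion: with β = 0 the column corresponding to φ ≡ 1 vanishes, while a nonzero β restores full rank:

```python
import numpy as np

xq, wq = np.polynomial.legendre.leggauss(20)   # quadrature nodes/weights on Ω = [-1, 1]

def phi(j, x):   return x**j                              # monomial basis
def dphi(j, x):  return j * x**(j - 1) if j >= 1 else 0.0 * x
def d2phi(j, x): return j * (j - 1) * x**(j - 2) if j >= 2 else 0.0 * x

def L(j, x):     # assumed closed-loop BK operator: L[V] = -x V' + 0.5 V''
    return -x * dphi(j, x) + 0.5 * d2phi(j, x)

N, beta = 4, 0.1
M  = np.array([[np.sum(wq * phi(i, xq) * phi(j, xq)) for j in range(N)] for i in range(N)])
K0 = np.array([[np.sum(wq * L(j, xq) * phi(i, xq))   for j in range(N)] for i in range(N)])

col_zero = np.allclose(K0[:, 0], 0.0)              # L[1] = 0: zero column when beta = 0
rank0 = np.linalg.matrix_rank(K0)                  # < N: singular stiffness matrix
rank_beta = np.linalg.matrix_rank(K0 - beta * M)   # = N: discounting removes the deficiency
```

Here K0 − βM is exactly the matrix of Eq. 41, since (L − β)[φj] = L[φj] − βφj.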

The above formulation can also be carried out on local subdomains by choosing φi to be supported on subdomains forming a cover for the global domain Ω. The projection integrals (39) would then be evaluated over the local subdomains. This is the basic approach in finite element methods and meshless finite element methods, where the local subdomains overlap each other. The authors have developed a method based on the Partition of Unity Finite Element Method (PUFEM) to solve the forward (Fokker-Planck) and backward Kolmogorov equations.8–12

5 Numerical Examples

5.1 Two Dimensional System - Van der Pol Oscillator

Consider the following two dimensional stochastic Van der Pol oscillator:

ẍ + v1ẋ + v2(1 − x²)x = u + g w(t)   (44)

with v1 = 1, v2 = −1, g = 1. The objective is to minimize the following cost function:

J(x0, ẋ0) = E[ ∫0^∞ ½ (x² + ẋ² + u²) dt ]   (45)

The approximation for the value function was written using a complete set of global polynomials up to fourth order (using polynomials up to sixth order did not provide significant improvement in the approximation accuracy). It is notable that in the above formulation, no direct means of enforcing state or control input constraints has been used. In order to obtain reasonable bounds on the control input, the cost function (Eq. 45) was scaled so as to modify the generalized HJB (Eq. 33) as follows (in addition to including the discount factor β):

(∂V(i)/∂x)ᵀ (f + g u(i)) + ½ hQhᵀ ∂²V(i)/∂x² + βV = −e^γ (l + ‖u(i)‖R)   (46)

i.e.,

l(·) → e^γ l(·)   (47)

R → e^γ R,  γ < 0   (48)

Fig. 1 shows the converged results obtained for the above system. A linear starting policy was used to start the iteration process, i.e., u(0) = Kx, with K = −20. Tuning parameters β = 0.05 and γ = −0.15 were found to give acceptable results after about 13 iterations of the control policy. Fig. 2(i) shows the system response without a control input (open loop). Clearly, the states diverge. Fig. 2(ii) shows the system response with the initial stabilizing control. For both these results, nominal initial conditions were used inside the domain of operation, Ω = [−5, 5] × [−5, 5]. Fig. 3 shows the optimal state trajectories obtained and the optimal control law.
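For intuition, the closed-loop behavior under the initial stabilizing policy can be reproduced with a simple Euler-Maruyama integration. The sketch below assumes the form of Eq. 44 given above; the time step, horizon and initial condition are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

# Euler-Maruyama sketch of the stochastic Van der Pol oscillator of Eq. 44
# under the initial stabilizing policy u(0) = -20*x.  State z = (x, xdot).
v1, v2, g, Kfb = 1.0, -1.0, 1.0, -20.0
dt, T = 1e-3, 10.0
rng = np.random.default_rng(0)

def simulate(z0):
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(int(round(T / dt))):
        x, xd = z
        u = Kfb * x                                    # linear starting policy
        drift = np.array([xd, -v1*xd - v2*(1.0 - x**2)*x + u])
        diff = np.array([0.0, g * np.sqrt(dt) * rng.standard_normal()])
        z = z + drift * dt + diff
        traj.append(z.copy())
    return np.array(traj)

traj = simulate([2.0, 0.0])   # nominal start inside Omega = [-5, 5] x [-5, 5]
```

Under this policy the state settles into a small noise-driven neighborhood of the origin, consistent with the stabilized response of Fig. 2(ii).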

5.2 Two Dimensional System - Duffing Oscillator

Next, consider the following hard-spring stochastic Duffing oscillator:

ẍ − x + εx³ = u + g w(t)   (49)


Figure 1 Converged Results for the Stochastic Van der Pol Oscillator. (i) Converged value function surface. (ii) Converged control input surface; the maximum control required was modulated using the tuning parameter γ.

Figure 2 System Response of the Stochastic Van der Pol Oscillator I. (i) Open-loop system response for Eq. 44. (ii) System response to the initial stabilizing control, u(0) = −20x.


Figure 3 System Response of the Stochastic Van der Pol Oscillator II. (i) Optimal path obtained, shown alongside the response to u(0). (ii) Optimal control law converged upon, and u(0).

Figure 4 Converged Results for the Stochastic Hard Spring Duffing Oscillator. (i) Converged value function surface. (ii) Converged control input surface; the maximum control required was modulated using the tuning parameter γ.

The above system represents the hard-spring case (ε = +1.5) with an unstable open-loop equilibrium at (x, ẋ) = (0, 0) and stable equilibria at (1/√ε, 0) and (−1/√ε, 0). The cost function to be minimized is the same as for the previous system, and we use a linear stabilizing control given by u(0) = −30x. Convergence characteristics similar to those of the Van der Pol oscillator were obtained and are shown in Figs. 4-6.
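The equilibrium structure of Eq. 49 is easy to verify numerically. The sketch below finds the roots of the deterministic drift x − εx³ (the stable roots sit at ±1/√ε) and classifies them through the curvature of the potential U(x) = −x²/2 + εx⁴/4:

```python
import numpy as np

# Equilibria of the deterministic hard-spring Duffing drift of Eq. 49,
# xdd = x - eps*x^3 (u = 0, no noise): roots of x - eps*x^3 = 0.
eps = 1.5
roots = np.sort(np.roots([-eps, 0.0, 1.0, 0.0]).real)   # -eps*x^3 + x = 0

# Curvature of the potential U(x) = -x^2/2 + eps*x^4/4 decides stability:
# U''(x) < 0 -> unstable equilibrium, U''(x) > 0 -> stable equilibrium.
Upp = lambda x: -1.0 + 3.0 * eps * x**2
```

With ε = 1.5 the roots are 0 and ±1/√ε ≈ ±0.816; U''(0) < 0 marks the origin unstable, while U'' > 0 at the outer pair marks them stable.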

5.3 Four Dimensional System - Missile Pitch Control Autopilot

Finally, we consider a four-dimensional system that models the pitch motion control for a missile autopilot. This model is the same as that considered in references 27, 28, with a noise term added to all kinetic equations:

q̇ = My/Iy + gq w1(t),   My = Cm(α, δ) Q S d   (50)

α̇ = (cos²α/(mU)) Fz + q + gα w2(t),   Fz = Cn(α, δ) S d   (51)

δ̈ = −2ζωn δ̇ + ωn²(δc − δ) + gδ w3(t)   (52)


Figure 5 System Response of the Stochastic Duffing Oscillator I. (i) Open-loop system response for Eq. 49. (ii) System response to the initial stabilizing control, u(0) = −30x.

Figure 6 System Response of the Stochastic Duffing Oscillator II. (i) Optimal path obtained, shown alongside the response to u(0). (ii) Optimal control law converged upon, and u(0).


Figure 7 Converged Results for the Missile Pitch Controller, Showing Various Sections of the Four-D State Space. (i) Converged value function surface. (ii) Converged control input surface.

The control input in the above equations is δc(t), the commanded tail-fin deflection, while δ(t) denotes the actual tail-fin deflection. Pitch rate and angle of attack are denoted by q and α, respectively. w1(t), w2(t) and w3(t) are independent components of a 3-dimensional white noise process, with noise gains gq, gα and gδ. The aerodynamic coefficients are given by the following nonlinear functions:

Cm(α, δ) = b1α³ + b2α|α| + b3α + b4δ   (53)

Cn(α, δ) = a1α³ + a2α|α| + a3α + a4δ   (54)
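In code, the coefficient model of Eqs. 53-54 reads as follows; the numerical values of a1..a4 and b1..b4 below are placeholders for illustration only, not the constants from the cited reference:

```python
import numpy as np

# Aerodynamic coefficients of Eqs. 53-54: cubic + alpha*|alpha| + linear
# terms in alpha, linear in the fin deflection delta.  The coefficient
# values are PLACEHOLDERS, not those of the missile model in the paper.
b = np.array([0.1, -0.2, 0.3, 0.5])   # hypothetical b1..b4
a = np.array([0.2, -0.1, 0.4, 0.6])   # hypothetical a1..a4

def Cm(alpha, delta):
    return b[0]*alpha**3 + b[1]*alpha*abs(alpha) + b[2]*alpha + b[3]*delta

def Cn(alpha, delta):
    return a[0]*alpha**3 + a[1]*alpha*abs(alpha) + a[2]*alpha + a[3]*delta
```

Note that both coefficients are odd in α (the α|α| term preserves odd symmetry) and affine in δ, which is what makes δc an affine control channel in Eqs. 50-51.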

The values of the various constants can be found in Beard et al. (1998). The cost function to be minimized is given by:

J(x0) = E[ ∫0^∞ ( ½(q − qss)² + 25(α − αss)² + ((Fz − Fzss)/m)² + ½‖u − δss‖²R ) dt ]   (55)

where the ‘ss’-subscripted quantities denote steady-state values. We use the same starting stabilizing control as that used in Beard et al.:

u(0) = 0.08(q + qss) + 0.38(α + αss) + 0.37 Fz/m   (56)

A complete basis of global quadratic shape functions was used to approximate the value function in the policy iteration algorithm. Various sections of the converged value function and the optimal control surface are shown in Fig. 7. Fig. 8 shows the optimal trajectories and the system response to the optimal control law obtained. Notice that the pitch rate settles to zero and the angle of attack acquires the desired steady-state value.

5.4 Notes on Modal Analysis

It is easy to see that any constant function, ψ = c, is a stationary solution of the BK equation, and an eigenfunction of the BK operator with trivial eigenvalue. It was mentioned earlier that the constant basis function, φ ≡ 1, is not admissible while solving for the coefficients, ci. However, when considering the eigenvalue problem for the BK operator, Lψ = λψ, it is imperative that the constant basis function be included in the set, because the stationary solution of the BK equation is indeed the constant function. The modified BK operator (β > 0), however, admits neither a constant stationary solution nor an eigenfunction with a trivial eigenvalue (Fig. 9). We reiterate that, while solving the eigenvalue problem, it is important to utilize a complete basis set in the approximation space (i.e., including the constant function) in order to recover the stationary eigenfunction of the BK operator. On the other hand, it is possible to solve for the coefficients, ci, without using the constant basis function. Fig. 9 shows the absolute values of the eigenvalues of both the Fokker-Planck and the backward Kolmogorov operators.
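The role of the constant basis function can be seen on a hypothetical 1D Ornstein-Uhlenbeck example (not one of the paper's systems): including φ0 ≡ 1 in the Galerkin basis places the trivial eigenvalue in the discrete BK spectrum, while the modified operator (β > 0) shifts the entire spectrum away from zero:

```python
import numpy as np

# Modal-analysis sketch for a hypothetical 1D OU process dx = -a*x dt + sigma*dw,
# with BK operator L[phi] = -a*x*phi' + 0.5*sigma^2*phi''.  With a monomial
# basis that INCLUDES the constant phi_0 = 1, the Galerkin matrices give a
# generalized eigenproblem K v = lambda M v containing the trivial eigenvalue.
a, sigma, beta, N = 1.0, 0.5, 0.02, 5
xs, w = np.polynomial.legendre.leggauss(30)        # quadrature on [-1, 1]

def phi(i, x):
    return x**i

def Lphi(i, x):                                    # L[phi_i](x)
    d1 = i * x**(i - 1) if i >= 1 else 0.0*x
    d2 = i*(i - 1) * x**(i - 2) if i >= 2 else 0.0*x
    return -a*x*d1 + 0.5*sigma**2*d2

K = np.array([[np.sum(w * Lphi(j, xs) * phi(i, xs)) for j in range(N)]
              for i in range(N)])
M = np.array([[np.sum(w * phi(j, xs) * phi(i, xs)) for j in range(N)]
              for i in range(N)])                  # Gram (mass) matrix

lam = np.linalg.eigvals(np.linalg.solve(M, K))     # BK spectrum: contains 0
lam_mod = lam - beta                               # modified BK: shifted by -beta
```

Since L annihilates the constant function, the column of K associated with φ0 is identically zero, which is exactly the rank-1 deficiency noted earlier; subtracting β moves every eigenvalue away from the origin.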


Figure 8 System Response of the Missile Pitch Controller. (i) Optimal path obtained, shown alongside the open-loop trajectory. (ii) Optimal control law obtained.

Figure 9 Spectra of the BK, Modified BK and FP Operators for the Van der Pol Oscillator. (i) Comparison of the spectra of the BK and FP operators. (ii) The modified BK operator does not contain the trivial eigenvalue in its spectrum (β = 0.02).


6 Conclusion

In this paper, we have outlined a computational methodology for solving the nonlinear stochastic optimal control problem based on the policy iteration algorithm and a finite dimensional approximation of the controlled Fokker-Planck-Kolmogorov/diffusion operator. The computational methodology was tested on a number of test cases, in which it was shown to have satisfactory performance. The next step in our research will be to extend these methods to more complex stochastic systems such as jump diffusion processes and stochastic hybrid systems. We envisage using a hierarchical hybrid methodology to solve the abovementioned problems, wherein the methods outlined in this paper would form the lower, continuous level of the hierarchy.

REFERENCES

1. H. Risken, The Fokker Planck Equation: Methods of Solution and Applications, ser. Springer Series in Synergetics.

2. D. C. Polidori and J. L. Beck, “Approximate solutions for nonlinear vibration problems,” Prob. Engr. Mech., vol. 11, pp. 179–185, 1996.

3. M. C. Wang and G. Uhlenbeck, “On the theory of Brownian motion II,” Reviews of Modern Physics, vol. 17, no. 2-3, pp. 323–342, 1945.

4. A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York, NY: Academic Press, 1970.

5. A. Doucet, J. F. G. de Freitas, and N. J. Gordon, Eds., Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag, 2001.

6. J. L. Junkins, P. Singla, T. D. Griffith, and T. Henderson, “Orthogonal global/local approximation in n-dimensions: Applications to input/output approximation,” in 6th International Conference on Dynamics and Control of Systems and Structures in Space, Cinque-Terre, Italy, 2004.

7. J. L. Junkins and P. Singla, “Orthogonal global/local approximation in n-dimensions: Applications to input/output approximation,” Computer Methods in Engg. and Science, submitted.

8. M. Kumar, P. Singla, J. Junkins, and S. Chakravorty, “A multi-resolution meshless approach to steady state uncertainty determination in nonlinear dynamical systems,” in 38th IEEE Southeastern Symposium on Systems Theory, Cookeville, TN, Mar 5-7, 2006.

9. M. Kumar, P. Singla, S. Chakravorty, and J. L. Junkins, “The partition of unity finite element approach to the stationary Fokker-Planck equation,” in AIAA/AAS Astrodynamics Specialist Conference, Keystone, CO, USA, 21-24 August, 2006, AIAA Paper 2006-6285.

10. S. Chakravorty, “A homotopic Galerkin approach to the solution of the Fokker-Planck-Kolmogorov equation,” in Proceedings of the 2006 American Control Conference, 2006.

11. M. Kumar, S. Chakravorty, and J. L. Junkins, “An eigendecomposition approach for the transient Fokker-Planck equation,” in Proceedings of the 18th Engineering Mechanics Division Conference of the ASCE, Blacksburg, VA, 2007.

12. ——, “A semianalytic meshless approach to the Fokker-Planck equation with application to hybrid systems,” in Proc. IEEE Conf. Dec. and Control, to appear, 2007.

13. R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

14. D. P. Bertsekas, Dynamic Programming and Optimal Control, vols I and II. Cambridge: Athena Scientific, 2000.

15. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Boston, MA: Athena Scientific, 1996.

16. P. Werbos, “Approximate dynamic programming for real-time control and neural modeling,” in Handbook of Intelligent Control, White and Sofge, Eds. Van Nostrand Reinhold.

17. P. Marbach, Simulation based Optimization of Markov Reward Processes, PhD Thesis. Boston, MA: Massachusetts Institute of Technology, 1999.


18. J. Baxter and P. Bartlett, “Infinite horizon policy-gradient estimation,” Journal of Artificial Intelligence Research, vol. 15, pp. 319–350, 2001.

19. R. S. Sutton et al., “Policy gradient methods for reinforcement learning with function approximation,” in Proc. 1999 Neural Information Proc. Sys., 1999.

20. M. Lagoudakis and R. Parr, “Least squares policy iteration,” Journal of Machine Learning Research, vol. 4, pp. 1107–1149, 2003.

21. H. J. Kushner and P. Dupuis, Numerical Methods for Stochastic Control Problems in Continuous Time. New York, NY: Springer, 2001.

22. L. G. Crespo and J. Q. Sun, “Stochastic optimal control of nonlinear systems via short-time Gaussian approximation and cell mapping,” Nonlinear Dynamics, vol. 28, pp. 323–342, 2002.

23. ——, “Stochastic optimal control via Bellman’s principle,” Automatica, vol. 39, pp. 2109–2114, 2003.

24. C. S. Hsu, Cell to Cell Mapping: A Method for the Global Analysis of Nonlinear Systems. New York, NY: Springer-Verlag, 1987.

25. F. B. Hanson, “Techniques in computational stochastic dynamic programming,” in Control and Dynamical Systems: Advances in Theory and Applications, C. T. Leondes, Ed., vol. 76. New York, NY: Academic Press, 1996.

26. ——, “Computational dynamic programming on a vector multiprocessor,” IEEE Transactions on Automatic Control, vol. 36, pp. 507–511, 1991.

27. R. Beard, G. Saridis, and J. Wen, “Galerkin approximation of the generalized Hamilton-Jacobi-Bellman equation,” Automatica, vol. 33, pp. 2159–2177, 1997.

28. ——, “Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation,” Journal of Optimization Theory and Applications, 1998.

29. F. L. Lewis et al., Neural Network Control of Robotic Manipulators and Nonlinear Systems. Taylor and Francis, 1999.

30. M. Abu-Khalaf and F. L. Lewis, “Nearly optimal state feedback control of constrained nonlinear systems using a neural network HJB,” Annual Reviews in Control, vol. 28, pp. 239–251, 2004.

31. M. Abu-Khalaf, J. Huang, and F. L. Lewis, Nonlinear H2/H-∞ Constrained Feedback Control: A Practical Approach Using Neural Networks. New York: Springer-Verlag, June 2006.

32. R. Padhi and S. N. Balakrishnan, “Proper orthogonal decomposition based optimal neurocontrol synthesis of a chemical reactor process using approximate dynamic programming,” Neural Netw., vol. 16, no. 5-6, pp. 719–728, 2003.

33. S. Ferrari, Algebraic and Adaptive Learning in Neural Control Systems, PhD thesis. Princeton University, 2002.

34. A. E. Bryson and Y. Ho, Applied Optimal Control. Washington, D.C.: Hemisphere Publishing Co., 1975.

35. I. M. Ross and F. Fahroo, “Pseudospectral knotting methods for solving nonsmooth optimal control problems,” AIAA J. of Guid. Cont. Dyn., vol. 27, pp. 397–405, 2004.

36. ——, “Issues in real-time computation of optimal control,” Mathematical and Computer Modeling, vol. 43, pp. 1172–1188, 2006.

37. W. H. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions. New York, NY: Springer, 2001.

38. M. G. Crandall and P. L. Lions, “Viscosity solutions of Hamilton-Jacobi equations,” Transactions of the American Mathematical Society, vol. 277, pp. 1–42, 1983.

39. M. G. Crandall, H. Ishii, and P. L. Lions, “User’s guide to viscosity solutions of second order partial differential equations,” Bulletin of the American Mathematical Society, vol. 27, pp. 1–67, 1992.

40. M. Puterman, Markov Decision Processes. Hoboken, NJ: Wiley-Interscience, 1994.
