Athans Workshop 10-07


  • Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington

    F.L. Lewis, Moncrief-O'Donnell Endowed Chair

    Head, Controls & Sensors Group

    Talk available online at http://ARRI.uta.edu/acs

    Notes on Optimal Control

    Supported by: NSF - Paul Werbos; ARO - Randy Zachery

    Draguna Vrabie

  • Mike Athans

    Books: Optimal Control (McGraw Hill, 1966); Systems, Networks and Computation: Basic Concepts (McGraw Hill, 1972); Multivariable Methods (McGraw Hill, 1974)

    Third President of the IEEE Control Systems Society, 1972 to 1974

    PhD from Berkeley in 1961

    More than 40 PhD students (how many?)

  • Awards

    American Automatic Control Council's Donald P. Eckman Award "for outstanding contributions to the field of automatic control" (first recipient of the award, 1964)
    American Society for Engineering Education's Frederick E. Terman Award as "the outstanding young electrical engineering educator" (first recipient, 1969)
    American Control Council's Education Award for "outstanding contributions and distinguished leadership in automatic control education" (second recipient, 1980)
    Fellow of the IEEE (1973)
    Fellow of the AAAS (1977)
    Distinguished Member of the IEEE Control Systems Society (1983)
    IEEE Control Systems Society's 1993 H.W. Bode Prize, including the delivery of the Bode Plenary Lecture at the 1993 IEEE Conference on Decision and Control
    American Automatic Control Council's Richard E. Bellman Control Heritage Award (1995) "In Recognition of a Distinguished Career in Automatic Control; As a Leader and Champion of Innovative Research; As a Contributor to Fundamental Knowledge in Optimal, Adaptive, Robust, Decentralized and Distributed Control; and as a Mentor to his Students"

    Athans & Falb, Optimal Control (McGraw Hill, 1966) First book on OC?

  • 1. Use Neural Networks as Function Approximators in Adaptive Control

    2. Optimal Adaptive Control gives new adaptive control algorithms

  • Two-layer feedforward static neural network (NN)

    [Figure: inputs x_1, ..., x_n feed a hidden layer of L neurons with activation functions \sigma(\cdot); first-layer weights V^T, second-layer weights W^T; outputs y_1, ..., y_m.]

    Matrix equation:   y = W^T \sigma(V^T x)

    Summation equation:   y_i = \sum_{k=1}^{L} w_{ik}\, \sigma\Big( \sum_{j=1}^{n} v_{kj} x_j + v_{k0} \Big) + w_{i0}

    Two-layer NNs have the universal approximation property and overcome Barron's fundamental accuracy limitation of one-layer NNs. (A minimal forward-pass sketch follows this slide.)
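    The matrix equation above is easy to check numerically. Below is a minimal Python sketch of the forward pass y = W^T sigma(V^T x); the tanh activation, the bias handling, and all dimensions are illustrative assumptions, not part of the slide.

    import numpy as np

    def two_layer_nn(x, V, W):
        """Two-layer NN output y = W^T sigma(V^T x); bias terms are folded in
        by augmenting the input and the hidden output with a constant 1."""
        z = np.concatenate(([1.0], x))        # input with bias entry
        hidden = np.tanh(V.T @ z)             # sigma(V^T x), tanh chosen for illustration
        h = np.concatenate(([1.0], hidden))   # hidden output with bias entry
        return W.T @ h                        # y = W^T sigma(V^T x)

    # Example: n = 2 inputs, L = 5 hidden neurons, m = 1 output
    rng = np.random.default_rng(0)
    V = rng.standard_normal((3, 5))           # (n+1) x L first-layer weights
    W = rng.standard_normal((6, 1))           # (L+1) x m second-layer weights
    y = two_layer_nn(np.array([0.5, -1.0]), V, W)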

  • Neural Network Robot Controller

    [Figure: block diagram with desired trajectory q_d, tracking error e, filtered error r, PD tracking loop with gain K_v, a neural network in the nonlinear inner (feedforward) loop estimating \hat f(x), a robust control term v(t), and the robot system.]

    Universal Approximation Property

    Problem: the controller is nonlinear in the NN weights, so standard proof techniques do not work.

    Feedback linearization

    The NN universal basis property means no regression matrix is needed: an extension of adaptive control to nonlinear-in-the-parameters systems.

  • Theorem 1 (NN Weight Tuning for Stability)

    Let the desired trajectory q_d(t) and its derivatives be bounded. Let the initial tracking error be within a certain allowable set U. Let Z_M be a known upper bound on the Frobenius norm of the unknown ideal weights Z. Take the control input as

    \tau = \hat W^T \sigma(\hat V^T x) + K_v r - v,   with   v(t) = -K_Z (\|\hat Z\|_F + Z_M)\, r .

    Let weight tuning be provided by

    \dot{\hat W} = F \hat\sigma r^T - F \hat\sigma' \hat V^T x\, r^T - \kappa F \|r\| \hat W ,
    \dot{\hat V} = G x (\hat\sigma'^T \hat W r)^T - \kappa G \|r\| \hat V

    with any constant matrices F = F^T > 0, G = G^T > 0, and scalar tuning parameter \kappa > 0. Initialize the weight estimates as \hat W = 0, \hat V = random.

    Then the filtered tracking error r(t) and NN weight estimates \hat W, \hat V are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains K_v.

    Backprop terms (Werbos); robustifying terms (e-mod, sigma-mod, etc.)

    Extension of adaptive control to nonlinear-in-the-parameters (NLIP) systems; no regression matrix needed. Extra Jacobian terms arise from the NLIP network. (A one-step numerical sketch of these tuning laws follows this slide.)
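    A minimal numerical sketch of one Euler step of these tuning laws is given below; the tanh hidden units, the shapes, and the explicit-Euler discretization are illustrative assumptions rather than the implementation used in the work above.

    import numpy as np

    def nn_weight_tuning_step(W_hat, V_hat, x, r, F, G, kappa, dt):
        """One Euler step of the Theorem 1 tuning laws (sketch):
           W_hat' = F*sigma*r^T - F*sigma'*V_hat^T*x*r^T - kappa*F*||r||*W_hat
           V_hat' = G*x*(sigma'^T*W_hat*r)^T - kappa*G*||r||*V_hat."""
        z = V_hat.T @ x                                  # hidden-layer input
        sigma = np.tanh(z)                               # sigma(V_hat^T x)
        sigma_prime = np.diag(1.0 - np.tanh(z) ** 2)     # activation Jacobian (diagonal)
        r_norm = np.linalg.norm(r)
        W_dot = (F @ np.outer(sigma, r)
                 - F @ sigma_prime @ np.outer(V_hat.T @ x, r)
                 - kappa * r_norm * (F @ W_hat))
        V_dot = (G @ np.outer(x, sigma_prime.T @ W_hat @ r)
                 - kappa * r_norm * (G @ V_hat))
        return W_hat + dt * W_dot, V_hat + dt * V_dot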

  • Neural network backstepping controller for flexible-joint robot arm

    [Figure: block diagram with desired trajectory q_d, tracking error e, filtered error r, tracking loop with gain K_r, a nonlinear feedback-linearization loop with NN#1 estimating \hat F_1(x), a backstepping loop with NN#2 estimating \hat F_2(x), a robust control term v_i(t), an inner current loop with gain K_i and desired current i_d, and the robot system with motor dynamics 1/K_B.]

    Backstepping

    Advantages over traditional backstepping: no regression functions needed.

    Add an extra feedback loop; two NNs are needed; use passivity to show stability.

    Flexible & Vibratory Systems

  • NN in Feedforward Loop: Deadzone Compensation

    [Figure: mechanical system with desired trajectory q_d, tracking error e, filtered error r, gain K_v, an estimate \hat f(x) of the nonlinear function, and a two-NN deadzone precompensator (NN I and NN II) ahead of the actuator deadzone D(u).]

    Actuator nonlinearities: deadzone, saturation, backlash.

    Acts like a 2-layer NN with enhanced backprop tuning!

    Critic and actor NN weight-tuning laws use the enhanced backprop updates with robustifying terms.

  • Cell Homeostasis: the individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.

    Cellular Metabolism

    Permeability control of the cell membrane

    http://www.accessexcellence.org/RC/VL/GG/index.html

    Optimality in Biological Systems

  • Optimality in Control Systems Design (R. Kalman, 1960)

    Rocket Orbit Injection

    http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

    Dynamics (planar ascent in polar coordinates; thrust F, mass m, thrust angle \phi):

    \dot r = w
    \dot w = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m} \sin\phi
    \dot v = -\frac{v w}{r} + \frac{F}{m} \cos\phi

    Objectives: get to orbit in minimum time; use minimum fuel.

  • Adaptive Control is generally not Optimal

    Optimal Control is off-line, and needs to know the full system dynamics to solve the design equations.

    We want ONLINE ADAPTIVE OPTIMAL Control

  • Continuous-Time Optimal Control (Bellman)

    System:
    \dot x = f(x, u)

    Cost:
    V(x(t)) = \int_t^\infty r(x, u)\, d\tau = \int_t^\infty \big( Q(x) + u^T R u \big)\, d\tau

    Hamiltonian:
    H\Big(x, \frac{\partial V}{\partial x}, u\Big) = r(x, u) + \Big(\frac{\partial V}{\partial x}\Big)^T f(x, u)

    For a given control, the cost satisfies this equation (in LQR, this is a Lyapunov equation):
    0 = r(x, u) + \Big(\frac{\partial V}{\partial x}\Big)^T f(x, u), \qquad V(0) = 0

    Optimal cost (Bellman):
    0 = \min_{u(t)} \Big[ r(x, u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x, u) \Big]

    Optimal control:
    h(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}

    HJB equation (in LQR, this is a Riccati equation):
    0 = Q(x) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x) - \tfrac{1}{4} \Big(\frac{\partial V^*}{\partial x}\Big)^T g(x) R^{-1} g^T(x) \frac{\partial V^*}{\partial x}, \qquad V^*(0) = 0

  • Linear system, quadratic cost

    System:   \dot x = A x + B u
    Utility:   r(x, u) = x^T Q x + u^T R u,   Q \ge 0,  R > 0

    The cost is quadratic:
    V(x(t)) = \int_t^\infty r(x, u)\, d\tau = x^T(t) P x(t)

    Optimal control (state feedback):
    u(t) = -R^{-1} B^T P x(t) = -L x(t)

    HJB equation is the algebraic Riccati equation (ARE):
    0 = A^T P + P A + Q - P B R^{-1} B^T P

    Full system dynamics must be known; off-line solution. (A design sketch using a standard ARE solver follows this slide.)
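    Since the ARE design is a standard off-line computation, it fits in a few lines; this sketch uses SciPy's ARE solver, and the particular A, B, Q, R values are arbitrary illustrative numbers.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Offline LQR design: the full dynamics (A, B) must be known.
    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    P = solve_continuous_are(A, B, Q, R)   # 0 = A^T P + P A + Q - P B R^{-1} B^T P
    L = np.linalg.solve(R, B.T @ P)        # optimal gain L = R^{-1} B^T P
    # optimal control: u(t) = -L @ x(t)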

  • CT Policy Iteration (to avoid solving the HJB equation)

    Utility:   r(x, u) = Q(x) + u^T R u

    Cost for any given u(t) satisfies the Bellman equation:
    0 = r(x, u) + \Big(\frac{\partial V}{\partial x}\Big)^T f(x, u) = H\Big(x, \frac{\partial V}{\partial x}, u\Big)

    Iterative solution. Pick a stabilizing initial control h_0(x); then for k = 0, 1, ...

    Find cost (Lyapunov equation):
    0 = \Big(\frac{\partial V_k}{\partial x}\Big)^T f(x, h_k(x)) + r(x, h_k(x)), \qquad V_k(0) = 0

    Update control:
    h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_k}{\partial x}

    Convergence proved by Saridis (1979) if the Lyapunov equation is solved exactly. Beard & Saridis used complicated Galerkin integrals to solve the Lyapunov equation. Abu-Khalaf & Lewis used a NN to approximate V for nonlinear systems and proved convergence.

    Full system dynamics must be known; off-line solution.

  • LQR Policy Iteration = Kleinman algorithm (Kleinman 1968)

    1. For a given control policy u = -L_k x, solve for the cost (Lyapunov equation):
    0 = (A - B L_k)^T P_k + P_k (A - B L_k) + C^T C + L_k^T R L_k

    2. Improve the policy:
    L_{k+1} = R^{-1} B^T P_k

    If started with a stabilizing control policy L_0, the matrix P_k monotonically converges to the unique positive definite solution of the Riccati equation.

    Every iteration step returns a stabilizing controller. The system has to be known. (A minimal sketch follows this slide.)
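    A minimal sketch of the Kleinman iteration follows, using SciPy's Lyapunov solver; here Q stands in for C^T C, and the plant matrices are illustrative (the open loop is stable, so L_0 = 0 is a valid stabilizing initial policy).

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

    def kleinman(A, B, Q, R, L0, iters=15):
        """Kleinman's algorithm: solve (A - B L_k)^T P_k + P_k (A - B L_k) + Q + L_k^T R L_k = 0,
        then update L_{k+1} = R^{-1} B^T P_k.  Needs a stabilizing L0 and the full model."""
        L, P = L0, None
        for _ in range(iters):
            Ak = A - B @ L
            P = solve_continuous_lyapunov(Ak.T, -(Q + L.T @ R @ L))   # Lyapunov equation
            L = np.linalg.solve(R, B.T @ P)                           # policy improvement
        return P, L

    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    P, L = kleinman(A, B, Q, R, np.zeros((1, 2)))
    print(np.allclose(P, solve_continuous_are(A, B, Q, R)))   # matches the ARE solution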

  • Policy Iteration Solution

    Define the Riccati operator:
    Ric(P) = A^T P + P A + Q - P B B^T P

    Then Policy Iteration is
    (A - B B^T P_i)^T P_{i+1} + P_{i+1} (A - B B^T P_i) + P_i B B^T P_i + Q = 0

    with Frechet derivative
    Ric'_{P_i}(P) = (A - B B^T P_i)^T P + P (A - B B^T P_i)

    so Policy Iteration is in fact a Newton's method:
    P_{i+1} = P_i - (Ric'_{P_i})^{-1} Ric(P_i), \qquad i = 0, 1, \ldots

  • Policy Iterations without Lyapunov Equations

    Dynamic programming, built on Bellman's optimality principle, gives an alternative form for CT systems [Athans & Falb 1966, Lewis & Syrmos 1995]:

    V^*(x(t)) = \min_{u(t:t+T)} \Big[ \int_t^{t+T} r(x(\tau), u(\tau))\, d\tau + V^*(x(t+T)) \Big]

    r(x(\tau), u(\tau)) = x^T(\tau) Q x(\tau) + u^T(\tau) R u(\tau)

    f(x) and g(x) do not appear.

    Draguna Vrabie

  • Solving for the cost: our approach

    For a given control u = -L x, the cost satisfies

    V(x(t)) = \int_t^{t+T} (x^T Q x + u^T R u)\, d\tau + V(x(t+T))

    LQR case:
    x^T(t) P x(t) = \int_t^{t+T} (x^T Q x + u^T R u)\, d\tau + x^T(t+T) P x(t+T)

    Optimal gain is L = R^{-1} B^T P.

    f(x) and g(x) do not appear.

    Draguna Vrabie

  • 1. Policy Iteration (solving for the cost: our approach)

    Control policy:   u_k(t) = -L_k x(t)

    Cost update (A and B do not appear):
    V_{k+1}(x(t)) = \int_t^{t+T} (x^T Q x + u_k^T R u_k)\, d\tau + V_{k+1}(x(t+T))

    For the LQR case:
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)

    Control gain update (B needed for the control update):
    L_{k+1} = R^{-1} B^T P_{k+1}

    Initial stabilizing control is needed.

    Draguna Vrabie

  • Theorem (D. Vrabie). This algorithm converges and is equivalent to Kleinman's algorithm,
    P_{i+1} = P_i - (Ric'_{P_i})^{-1} Ric(P_i), \qquad i = 0, 1, \ldots

    Lemma 1. The cost update
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)

    is equivalent to the Lyapunov equation
    (A - B R^{-1} B^T P_k)^T P_{k+1} + P_{k+1} (A - B R^{-1} B^T P_k) + P_k B R^{-1} B^T P_k + Q = 0 .

    Proof: along the closed-loop trajectory, with K_i = L_i = R^{-1} B^T P_i and A_i = A - B K_i,
    \frac{d}{d\tau}\big( x^T P_i x \big) = x^T (A_i^T P_i + P_i A_i) x = -x^T (Q + K_i^T R K_i) x ,
    so integrating over [t, t+T],
    \int_t^{t+T} x^T (Q + K_i^T R K_i) x\, d\tau = -\int_t^{t+T} d(x^T P_i x) = x^T(t) P_i x(t) - x^T(t+T) P_i x(t+T) .

    Solves the Lyapunov equation without knowing A or B; only B is needed (for the gain update). (A numerical check of this identity follows this slide.)
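    Lemma 1 is easy to verify numerically for a fixed stabilizing gain: the interval cost plus the terminal quadratic term reproduces x^T(t) P x(t) without A appearing in the identity itself. The sketch below simulates the closed loop only to generate the measured data; all numbers are illustrative.

    import numpy as np
    from scipy.linalg import expm, solve_continuous_lyapunov

    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    L = np.array([[1.0, 1.0]])                          # any stabilizing gain
    Ac = A - B @ L
    P = solve_continuous_lyapunov(Ac.T, -(Q + L.T @ R @ L))

    x0, T, N = np.array([1.0, -0.5]), 0.5, 2000
    ts = np.linspace(0.0, T, N + 1)
    xs = np.array([expm(Ac * t) @ x0 for t in ts])      # closed-loop trajectory
    vals = np.einsum('ij,jk,ik->i', xs, Q + L.T @ R @ L, xs)
    d = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts)))   # measured reinforcement

    lhs = x0 @ P @ x0
    rhs = d + xs[-1] @ P @ xs[-1]
    print(lhs, rhs)    # the two sides agree to quadrature accuracy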

  • Algorithm Implementation

    Critic update:
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)

    Use the Kronecker product identity vec(ABC) = (C^T \otimes A) vec(B) to set this up as

    p_{k+1}^T \big[ \bar x(t) - \bar x(t+T) \big] = \int_t^{t+T} x^T (Q + K_i^T R K_i) x\, d\tau \equiv d(\bar x(t), K_i) = \rho(t, t+T)

    where \bar x(t) = x(t) \otimes x(t) is the quadratic basis set (quadratic regression vector), p_{k+1} = vec(P_{k+1}), and \rho(t, t+T) is the reinforcement on the time interval [t, t+T].

    Now use RLS along the trajectory to get the new weights p_{k+1}. Unpack the weights into the matrix P_{k+1}, then find the updated feedback gain
    L_{k+1} = R^{-1} B^T P_{k+1} .

    (A small check of the quadratic-basis identity follows this slide.)
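    The quadratic-basis identity x^T P x = p^T (x kron x) that turns the critic update into a linear regression can be checked directly; the small example below is only a sanity check of that bookkeeping (row-major vec is fine here because P is symmetric).

    import numpy as np

    def quad_basis(x):
        """Quadratic basis x_bar = x kron x, so that x^T P x = vec(P)^T x_bar."""
        return np.kron(x, x)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(3)
    P = rng.standard_normal((3, 3))
    P = (P + P.T) / 2                   # symmetric cost matrix
    p = P.reshape(-1)                   # weight vector vec(P)
    print(np.isclose(x @ P @ x, p @ quad_basis(x)))   # True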

  • 1. Select an initial stabilizing control policy.

    2. Find the associated cost:
    p_{k+1}^T \big[ \bar x(t) - \bar x(t+T) \big] = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau \equiv \rho(t, t+T)

    3. Improve the control:
    L_{k+1} = R^{-1} B^T P_{k+1}

    Solves the Lyapunov equation without knowing the dynamics.

    Over each interval [t, t+T]: apply u_k(t) = -L_k x(t); observe x(t) and x(t+T); observe the cost integral; update p; do RLS until convergence to P_{k+1}; then update the control gain to L_{k+1}.

    Measure the cost increment (reinforcement) by adding V as a state; then
    \dot V = x^T Q x + (u^k)^T R u^k .

    A is not needed anywhere. (A simulation sketch of the full loop follows this slide.)
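    A minimal simulation sketch of this loop for a small LQR example follows. The plant matrices, weights, and interval T are illustrative; batch least squares over several intervals stands in for the RLS described above, and the A matrix is used only inside the simulator that plays the role of the real plant.

    import numpy as np
    from scipy.linalg import expm, solve_continuous_are

    A = np.array([[0.0, 1.0], [-1.0, -2.0]])     # used only to simulate the plant
    B = np.array([[0.0], [1.0]])
    Q, R, T = np.eye(2), np.eye(1), 0.1

    def simulate(x0, L, T, steps=200):
        """Closed-loop trajectory under u = -L x (stands in for measuring the real plant)."""
        ts = np.linspace(0.0, T, steps + 1)
        xs = np.array([expm((A - B @ L) * t) @ x0 for t in ts])
        return ts, xs

    def reinforcement(ts, xs, L):
        """d = integral of x^T (Q + L^T R L) x over the interval (trapezoid rule)."""
        vals = np.einsum('ij,jk,ik->i', xs, Q + L.T @ R @ L, xs)
        return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts)))

    L = np.zeros((1, 2))                          # stabilizing initial policy (A is stable here)
    rng = np.random.default_rng(0)
    for _ in range(10):
        X, Y = [], []
        for _ in range(12):                       # several intervals give a least-squares problem
            x0 = rng.standard_normal(2)
            ts, xs = simulate(x0, L, T)
            X.append(np.kron(x0, x0) - np.kron(xs[-1], xs[-1]))   # x_bar(t) - x_bar(t+T)
            Y.append(reinforcement(ts, xs, L))
        p, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
        P = p.reshape(2, 2)
        P = (P + P.T) / 2                         # unpack weights into P_{k+1}
        L = np.linalg.solve(R, B.T @ P)           # L_{k+1} = R^{-1} B^T P_{k+1}

    print(np.allclose(P, solve_continuous_are(A, B, Q, R), atol=1e-4))   # True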

  • Algorithm Implementation

    The critic update
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)

    can be set up as
    p_{i+1}^T \big[ \bar x(t) - \bar x(t+T) \big] = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau \equiv d(\bar x(t), L_k)

    where \bar x(t) = x(t) \otimes x(t) is the quadratic basis set.

    Evaluating this at n(n+1)/2 trajectory points, one can set up a least-squares problem to solve for the weights:
    p_{i+1} = (X X^T)^{-1} X Y
    X = [\,\bar x_\Delta(t_1)\; \bar x_\Delta(t_2)\; \ldots\; \bar x_\Delta(t_N)\,], \qquad Y = [\,d(\bar x(t_1), K_i)\; d(\bar x(t_2), K_i)\; \ldots\; d(\bar x(t_N), K_i)\,]^T

    where \bar x_\Delta(t_j) = \bar x(t_j) - \bar x(t_j + T). Or use a batch least-squares solution along the trajectory.

  • Direct Optimal Adaptive Controller

    A hybrid continuous/discrete dynamic controller whose internal state is the observed cost over the interval.

    [Figure: system \dot x = A x + B u, x_0, with cost integrator \dot V = x^T Q x + u^T R u; a Critic that updates the cost weights from V sampled through a zero-order hold of period T; an Actor (feedback gain L_k) multiplying the state x to produce u; together forming a dynamic control system.]

    Draguna Vrabie

  • Continuous-time control with discrete gain updates

    Control:   u_k(t) = -L_k x(t)

    [Figure: piecewise-constant gain L_k (policy) versus time over sample periods k = 0, 1, 2, 3, 4, 5.]

    Sample periods need not be the same; they can be selected on-line in real time.

  • 2. CT ADP Greedy Iteration: Direct Optimal Adaptive Control for Partially Unknown CT Systems

    Control policy:   u_k(t) = -L_k x(t)

    Cost update (A and B do not appear):
    V_{k+1}(x(t)) = \int_t^{t+T} (x^T Q x + u_k^T R u_k)\, d\tau + V_k(x(t+T))

    LQR case:
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_k x(t+T)

    Control gain update (B needed for the control update):
    L_{k+1} = R^{-1} B^T P_{k+1}

    No initial stabilizing control needed. (A sketch of this critic update follows this slide.)

    Draguna Vrabie
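    In an implementation, the only change from the policy-iteration sketch earlier is the critic target, which reuses the previous weights p_k on the right-hand side, so no stabilizing initial gain is required. A hedged sketch of that one step (names and data layout are illustrative):

    import numpy as np

    def greedy_critic_update(samples, p_k):
        """Greedy (value-iteration) critic update:
           p_{k+1}^T x_bar(t) = d(t, t+T) + p_k^T x_bar(t+T)
        where samples is a list of (x_bar_t, x_bar_tT, d) measured on intervals [t, t+T]."""
        X = np.array([s[0] for s in samples])                 # regression vectors x_bar(t)
        Y = np.array([s[2] + p_k @ s[1] for s in samples])    # d + p_k^T x_bar(t+T)
        p_next, *_ = np.linalg.lstsq(X, Y, rcond=None)        # batch LS in place of RLS
        return p_next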

  • Algorithm Implementation

    The critic update
    x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + x^T(t+T) P_k x(t+T)

    is set up with the Kronecker product identity vec(ABC) = (C^T \otimes A) vec(B) as

    p_{k+1}^T \bar x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + p_k^T \bar x(t+T)

    where \bar x(t) = x(t) \otimes x(t) is the quadratic basis set (regression vector) and p_k are the previous weights.

    Now use RLS along the trajectory to get the new weights p_{k+1}. Unpack the weights into the matrix P_{k+1}, then find the updated feedback gain
    L_{k+1} = R^{-1} B^T P_{k+1} .

  • 1. Select a control policy (no initial stabilizing control needed).

    2. Find the associated cost:
    p_{k+1}^T \bar x(t) = \int_t^{t+T} x^T (Q + L_k^T R L_k) x\, d\tau + p_k^T \bar x(t+T)

    3. Improve the control:
    L_{k+1} = R^{-1} B^T P_{k+1}

    Solves for the cost update without knowing the dynamics.

    Over each interval [t, t+T]: apply u; observe x(t) and x(t+T); observe the cost integral; update p; do RLS until convergence to P_{k+1}; then update the control gain to L_{k+1}.

    Measure the cost increment by adding V as a state; then
    \dot V = x^T Q x + (u^i)^T R u^i .

    A is not needed anywhere.

  • Direct Optimal Adaptive Controller

    A hybrid continuous/discrete dynamic controller whose internal state is the observed value over the interval.

    [Figure: same actor-critic structure as before: system \dot x = A x + B u, x_0, cost integrator \dot V = x^T Q x + u^T R u, Critic with zero-order hold of period T, Actor with feedback gain L_k, forming a dynamic control system.]

    Has a different critic cost update; no initial stabilizing gain needed.

    Draguna Vrabie

  • Analysis of the Algorithm

    For a given control policy u_k(t) = -L_k x(t), with L_k = R^{-1} B^T P_k and A_k = A - B R^{-1} B^T P_k, the greedy update

    V_{k+1}(x(t)) = \int_t^{t+T} (x^T Q x + u_k^T R u_k)\, d\tau + V_k(x(t+T)), \qquad V_0 = 0

    is equivalent to

    P_{k+1} = \int_0^T e^{A_k^T \tau} (Q + L_k^T R L_k) e^{A_k \tau}\, d\tau + e^{A_k^T T} P_k e^{A_k T}

    a strange pseudo-discretized Riccati equation; c.f. the DT Riccati difference equation

    P_{k+1} = A^T P_k A + Q - A^T P_k B (B^T P_k B + R)^{-1} B^T P_k A = A^T P_k A + Q - L_k^T (B^T P_k B + R) L_k .

    Draguna Vrabie

  • Analysis of the Algorithm (Draguna Vrabie)

    Lemma 2. CT HDP is equivalent to

    P_{k+1} = P_k + \int_0^T e^{A_k^T t} \big( A_k^T P_k + P_k A_k + Q + L_k^T R L_k \big) e^{A_k t}\, dt, \qquad A_k = A - B R^{-1} B^T P_k

    This extra term means the initial control action need not be stabilizing.

    ADP solves the CT ARE without knowledge of the system dynamics A.

    When ADP converges, the resulting P satisfies the Continuous-Time ARE!! (A numerical sketch of this recursion follows this slide.)
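    The recursion of Lemma 2 can be iterated directly as a numerical check: starting from P_0 = 0 (no stabilizing initial gain) it reaches the CT ARE solution. The sketch below evaluates the integral by quadrature; the plant numbers, the interval T, and the iteration count are illustrative.

    import numpy as np
    from scipy.linalg import expm, solve_continuous_are

    A = np.array([[0.0, 1.0], [2.0, -1.0]])      # unstable open loop, illustrative numbers
    B = np.array([[0.0], [1.0]])
    Q, R, T = np.eye(2), np.eye(1), 0.05

    def interval_gramian(Ak, M, T, steps=50):
        """int_0^T exp(Ak^T t) M exp(Ak t) dt by the trapezoid rule."""
        ts = np.linspace(0.0, T, steps + 1)
        vals = np.array([expm(Ak.T * t) @ M @ expm(Ak * t) for t in ts])
        return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts)[:, None, None], axis=0)

    P = np.zeros((2, 2))                          # P_0 = 0: the initial gain L_0 = 0 is not stabilizing
    for _ in range(300):
        L = np.linalg.solve(R, B.T @ P)           # L_k = R^{-1} B^T P_k
        Ak = A - B @ L                            # A_k = A - B R^{-1} B^T P_k
        P = P + interval_gramian(Ak, Ak.T @ P + P @ Ak + Q + L.T @ R @ L, T)   # Lemma 2 update

    print(np.allclose(P, solve_continuous_are(A, B, Q, R), atol=1e-3))   # True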

  • Direct OPTIMAL ADAPTIVE CONTROL

    Solves the Riccati equation WITHOUT knowing the plant dynamics.

    Model-free ADP.

    Works for nonlinear systems: a neural network is used to approximate the cost.

    Robustness? Comparison with adaptive control methods?

  • Policy Evaluation (Critic update)

    Let K be any state feedback gain for the system (1). One can measure the associated cost over the infinite time horizon as

    V(t, x(t)) = \int_t^{t+T} x^T (Q + K^T R K) x\, d\tau + W(t+T, x(t+T))

    where W(t+T, x(t+T)) is an initial infinite-horizon cost to go.

    What to do about the tail: issues in Receding Horizon Control.

  • Continuous-time control with discrete gain updates

    Control:   u_k(t) = -L_k x(t)

    [Figure: piecewise-constant gain L_k (policy) versus time over sample periods k = 0, 1, 2, 3, 4, 5.]

    Sample periods need not be the same.

  • Simulation Results

    [Plots: control signal, controller parameters, and system states versus time (0 to 2 s); critic parameters P(1,1), P(1,2), P(2,2) versus time (0 to 6 s), converging to the optimal values P(1,1), P(1,2), P(2,2).]

    Simulations on: F-16 autopilot; load frequency control for a power system.

    A matrix not needed.

    Converges to the steady-state Riccati equation solution.

  • Nonlinear Case Works Too

    System:   \dot x = f(x) + g(x) u

    Cost:
    V(x(t)) = \int_t^\infty r(x, u)\, d\tau = \int_t^\infty \big( Q(x) + u^T R u \big)\, d\tau

    Standard Policy Iteration, which is known to converge (f(x) needed):
    0 = \Big(\frac{\partial V_k}{\partial x}\Big)^T f(x, h_k(x)) + r(x, h_k(x))
    u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_k}{\partial x}

    Policy Iteration using the interval cost (f(x) not needed):
    V_{k+1}(x(t)) = \int_t^{t+T} \big( Q(x) + u_k^T R u_k \big)\, d\tau + V_{k+1}(x(t+T))
    u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x}

    Prove this is equivalent to the standard Policy Iteration above, which is known to converge.

  • Algorithm Implementation (nonlinear case)

    Approximate the value by a neural network:   V = W^T \phi(x)

    Then the cost update
    V_{k+1}(x(t)) = \int_t^{t+T} \big( Q(x) + u_k^T R u_k \big)\, d\tau + V_{k+1}(x(t+T))

    becomes
    W_{k+1}^T \phi(x(t)) = \int_t^{t+T} \big( Q(x) + u_k^T R u_k \big)\, d\tau + W_{k+1}^T \phi(x(t+T))

    i.e.
    W_{k+1}^T \big[ \phi(x(t)) - \phi(x(t+T)) \big] = \int_t^{t+T} \big( Q(x) + u_k^T R u_k \big)\, d\tau

    where \phi(x(t)) - \phi(x(t+T)) is the regression vector and the right-hand side is the reinforcement on the time interval [t, t+T].

    Now use RLS along the trajectory to get the new weights W_{k+1}. Then find the updated feedback
    u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x} = -\tfrac{1}{2} R^{-1} g^T(x) \Big(\frac{\partial \phi(x)}{\partial x}\Big)^T W_{k+1} .

    (A least-squares sketch of this critic update follows this slide.)
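    A hedged sketch of that critic update with a batch least-squares solve in place of RLS is below; the polynomial basis phi(x) and the data layout are illustrative assumptions.

    import numpy as np

    def phi(x):
        """Illustrative polynomial basis for the value approximation V(x) ~ W^T phi(x)."""
        x1, x2 = x
        return np.array([x1**2, x1*x2, x2**2, x1**4, x1**2 * x2**2, x2**4])

    def critic_ls_update(samples):
        """Batch least-squares form of
           W_{k+1}^T [phi(x(t)) - phi(x(t+T))] = int_t^{t+T} (Q(x) + u_k^T R u_k) dtau,
        where samples is a list of (x_t, x_tT, d) measured on intervals [t, t+T]."""
        X = np.array([phi(xt) - phi(xT) for xt, xT, _ in samples])
        Y = np.array([d for _, _, d in samples])
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W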

  • Continuous-Time Optimal Control (Bellman)

    System:   \dot x = f(x, u)
    Cost:   V(x(t)) = \int_t^\infty \big( Q(x) + u^T R u \big)\, d\tau
    Hamiltonian:   H\Big(x, \frac{\partial V}{\partial x}, u\Big) = r(x, u) + \Big(\frac{\partial V}{\partial x}\Big)^T f(x, u)
    Optimal cost:   0 = \min_{u(t)} \Big[ r(x, u) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x, u) \Big]
    Optimal control:   h(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}
    HJB equation:   0 = Q(x) + \Big(\frac{\partial V^*}{\partial x}\Big)^T f(x) - \tfrac{1}{4} \Big(\frac{\partial V^*}{\partial x}\Big)^T g(x) R^{-1} g^T(x) \frac{\partial V^*}{\partial x}, \qquad V^*(0) = 0

    c.f. the DT value recursion, where f(.), g(.) do not appear:
    V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1})