
Optimal linear-quadratic control

Martin Ellison

1 Motivation

The lectures so far have described a general method, value function iteration, for solving dynamic programming problems. However, one problem alluded to at the end of the last lecture was that the method suffers from the "curse of dimensionality". If the number of state variables is large then there are many arguments in the value function and it becomes computationally very intensive to iterate the value function. In practice, a state space with dimension of about four or five already tests the limits of the power of the current generation of computers. In this lecture, we examine a particular class of dynamic programming problems that can be solved relatively easily. We focus on problems of linear-quadratic control, in which the payoff function is quadratic and the transition equation is linear. Many standard problems in economics can be cast in such a linear-quadratic framework. We will show how a combination of analytical and numerical analysis can be used to derive the solution to the linear-quadratic problem.


2 Key reading

The formal analysis for this lecture is taken from Dynamic Macroeconomic Theory by Tom Sargent, Harvard University Press, 1987.

3 Other reading

Optimal linear-quadratic control is discussed in most graduate macroeconomics textbooks, e.g. chapter 4 of Recursive Macroeconomic Theory, 2nd ed., by Lars Ljungqvist and Tom Sargent, MIT Press, 2000. The concepts are taken from the engineering theory of optimal control, so more sophisticated treatments can be found in books such as Analysis and Control of Dynamic Economic Systems by Gregory Chow, 1975. Gauss codes for matrix Riccati equation iterations in a dynamic general equilibrium context are available from Morten Ravn's homepage at http://faculty.london.edu/mravn/

4 Linear-quadratic control

The general framework we will analyse is one in which the agent chooses a vector of controls $u_t$ to influence a vector of state variables $x_t$. We do not limit the dimension of either of these vectors, although it is natural to consider cases where the number of state variables exceeds the number of controls; otherwise it may well be that there is a very simple, trivial solution which controls the states perfectly. In the context of linear-quadratic control, we assume that the transition equation governing the evolution of the state is linear in past values of the state variables and linear in current values of the control variables. We allow the problem to be stochastic by including random shocks $\varepsilon_{t+1}$ (with variance-covariance matrix $\Sigma$) to the state variables. The payoff function is assumed to be quadratic in the state and control variables,


giving quadratic forms in the objective. The fully-specified linear-quadratic control problem is stated below. The symmetric matrices $R$ and $Q$ are the weights of state and control variables in the payoff function. Matrices $A$ and $B$ govern the linear evolution of state variables in the transition equation.

$$\min_{\{u_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \left[ x_t' R x_t + u_t' Q u_t \right]$$

$$x_{t+1} = A x_t + B u_t + \varepsilon_{t+1}$$

In dynamic programming form, the value function is defined over the state variables $x_t$.

$$V(x_t) = \min_{u_t} \left[ x_t' R x_t + u_t' Q u_t + \beta E_t V(A x_t + B u_t + \varepsilon_{t+1}) \right] \quad (1)$$

It is certainly possible for us to proceed as before by discretising the state space for the value function and applying value function iteration to converge to the optimal policy. However, such a procedure is computationally very intensive and unnecessary in the special linear-quadratic case. Instead, we will use a different approach which combines analytical and numerical methods. The key to the method is that we know the general form of the policy and value functions for linear-quadratic control problems. Armed with this knowledge, it is much easier to proceed. We begin by postulating a quadratic form for the value function, in which $P$ is a symmetric matrix, so $P' = P$.

$$V(x) = x'Px + d$$

We proceed by substituting this form (with as yet undetermined matrix $P$ and constant $d$) into the value function (1). For convenience of notation, we drop the time subscripts. In all cases, $x$ and $u$ refer to time $t$ dated variables.


$$V(x) = \min_u \left[ x'Rx + u'Qu + \beta E\left( (Ax + Bu + \varepsilon)'P(Ax + Bu + \varepsilon) \right) + \beta d \right]$$

Expanding the quadratic terms in brackets, while remembering that $(MN)' = N'M'$, gives

$$V(x) = \min_u \left[ x'Rx + u'Qu + \beta E \left( \begin{array}{l} x'A'PAx + x'A'PBu + x'A'P\varepsilon \\ +\, u'B'PAx + u'B'PBu + u'B'P\varepsilon \\ +\, \varepsilon'PAx + \varepsilon'PBu + \varepsilon'P\varepsilon \end{array} \right) + \beta d \right]$$

The expected values of the stochastic shocks are zero, so terms of the form $x'A'P\varepsilon$, $u'B'P\varepsilon$, $\varepsilon'PAx$ and $\varepsilon'PBu$ drop out. We are left with

$$V(x) = \min_u \left[ x'Rx + u'Qu + \beta \left( x'A'PAx + x'A'PBu + u'B'PAx + u'B'PBu + E(\varepsilon'P\varepsilon) \right) + \beta d \right] \quad (2)$$

The first order condition with respect to $u$ can be used to derive optimal policy. Note that $\frac{\partial (u'Qu)}{\partial u} = 2Qu$, $\frac{\partial (x'A'PBu)}{\partial u} = B'P'Ax$ and $P' = P$.

$$\frac{\partial V(x)}{\partial u} = 2Qu + 2\beta B'PAx + 2\beta B'PBu = 0$$

Solving in terms of $x$ implies

$$u = -\beta(Q + \beta B'PB)^{-1}B'PA\,x$$

Or, more succinctly,

$$u = -Fx$$

$$F = \beta(Q + \beta B'PB)^{-1}B'PA$$

Several things are worthy of note at this stage. Firstly, optimal control requires the control vector to react linearly to the state variables. We have yet to confirm that this implies a quadratic value function as first postulated, but it already suggests that the policy function has a very simple form. Secondly, the coefficient matrix $F$ in the policy function is a non-linear function of the fundamental matrices $Q$, $R$, $A$ and $B$ and the matrix $P$ in the postulated value function. We can therefore approach the problem as one of determining either $P$ or $F$. Our choice is to calculate $P$, and then calculate the implied $F$, but other techniques take the opposite approach.

Economically, the policy reaction function is interesting because it is independent of the stochastic shocks $\varepsilon$. This is because certainty equivalence holds in a linear-quadratic framework. There is no effect on policy, unless shocks enter multiplicatively or payoffs are not quadratic.
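The first-order condition above is easy to verify numerically. Below is a minimal sketch in Python with numpy (our choice of language, not the lecture's; the matrices are randomly generated for illustration), checking that the gradient of the bracketed objective vanishes at the candidate optimum $u = -\beta(Q + \beta B'PB)^{-1}B'PAx$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2                      # dimensions of state and control vectors
beta = 0.95

# Randomly generated problem data; R, Q and P symmetric positive definite.
M = rng.standard_normal((n, n)); R = M @ M.T + n * np.eye(n)
M = rng.standard_normal((k, k)); Q = M @ M.T + k * np.eye(k)
M = rng.standard_normal((n, n)); P = M @ M.T + n * np.eye(n)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k))
x = rng.standard_normal(n)

def objective(u):
    """Bracketed term of the Bellman equation, with the shock terms dropped."""
    z = A @ x + B @ u
    return x @ R @ x + u @ Q @ u + beta * (z @ P @ z)

# Candidate optimum from the first-order condition.
u_star = -beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A @ x)

# Analytic gradient 2Qu + 2*beta*B'P(Ax + Bu): should vanish at u_star.
grad = 2 * Q @ u_star + 2 * beta * B.T @ P @ (A @ x + B @ u_star)
print(np.max(np.abs(grad)))      # numerically zero
```

Because $Q + \beta B'PB$ is positive definite here, the stationary point is a minimum: any perturbation of `u_star` raises the objective.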

We continue next to demonstrate that the linear policy function (derived from a postulated quadratic value function) does actually imply a quadratic value function. In the process, we will be able to determine the two unknowns $P$ and $d$. To do this, we substitute the policy function $u = -Fx$ back into the value function (2). Note that $x'A'PBFx$ is a scalar and so equal to $x'F'B'PAx$.

$$x'Px + d = x'Rx + x'F'QFx + \beta \left( x'A'PAx - 2x'A'PBFx + x'F'B'PBFx + E(\varepsilon'P\varepsilon) \right) + \beta d$$

Comparing coefficients on constant terms,

$$d = \beta E(\varepsilon'P\varepsilon) + \beta d$$

We simplify this equation by applying the result $E(\varepsilon'P\varepsilon) = E(\mathrm{tr}(P\varepsilon\varepsilon')) = \mathrm{tr}(P\,E(\varepsilon\varepsilon')) = \mathrm{tr}(P\Sigma)$.

$$d = \frac{\beta}{1-\beta}\,\mathrm{tr}(P\Sigma)$$
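The trace result used here can be illustrated with a quick Monte Carlo experiment. The following sketch in Python with numpy (our addition; the matrices $P$ and $\Sigma$ are arbitrary choices) compares a simulated $E(\varepsilon'P\varepsilon)$ against the analytical value $\mathrm{tr}(P\Sigma)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary symmetric P and variance-covariance matrix Sigma.
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma = np.array([[1.0, 0.4], [0.4, 0.5]])

# Monte Carlo estimate of E[e' P e] for e ~ N(0, Sigma) ...
e = rng.multivariate_normal(np.zeros(2), Sigma, size=400_000)
mc = np.mean(np.einsum('ij,jk,ik->i', e, P, e))

# ... against the analytical value tr(P Sigma).
exact = np.trace(P @ Sigma)
print(mc, exact)                 # the two agree to Monte Carlo accuracy
```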

This equation shows how the additive uncertainty caused by the stochastic element does have an effect on the value function, but that this effect is limited to the constant term, which is independent of policy. Hence, certainty equivalence holds in this respect. Comparing coefficients on the terms quadratic in $x$,

$$P = R + F'QF + \beta\left(A'PA - 2A'PBF + F'B'PBF\right)$$

Rearranging,

$$P = R + \beta A'PA - 2\beta A'PBF + F'(Q + \beta B'PB)F$$

We know that optimal policy defines $F$ as $\beta(Q + \beta B'PB)^{-1}B'PA$. Hence, we have

$$P = R + \beta A'PA - 2\beta^2 A'PB(Q + \beta B'PB)^{-1}B'PA$$
$$\qquad + \beta^2 A'PB\left((Q + \beta B'PB)^{-1}\right)'(Q + \beta B'PB)(Q + \beta B'PB)^{-1}B'PA$$

Using the fact that $(M^{-1})' = (M')^{-1}$ and $(Q + \beta B'PB)' = (Q + \beta B'PB)$, this reduces to

$$P = R + \beta A'PA - \beta^2 A'PB(Q + \beta B'PB)^{-1}B'PA$$

This equation confirms that a linear policy function does imply a quadratic value function. It is often known as the algebraic matrix Riccati equation. At present, it implicitly defines the matrix $P$ in the value function in terms of the structural matrices $Q$, $R$, $A$ and $B$. The matrix Riccati equation is as far as we can go analytically in linear-quadratic control. It does define $P$ as a function of $Q$, $R$, $A$ and $B$, but the relationship is not linear and potentially is highly non-linear. Fortunately, a relatively simple iterative technique based on a matrix Riccati difference equation can be applied. Instead of trying to solve the Riccati equation directly, we start from an initial guess $P_0$ of the matrix $P$ in the value function. The guess $P_k$ is updated to $P_{k+1}$ according to

$$P_{k+1} = R + \beta A'P_k A - \beta^2 A'P_k B(Q + \beta B'P_k B)^{-1}B'P_k A$$

This equation is iterated until convergence, which is guaranteed to be to a unique solution under very weak conditions. Specifically, all eigenvalues of $A$ having modulus less than unity is a sufficient condition. In fact, even explosive systems with eigenvalues greater than one in absolute value can be handled if some other weak conditions hold.

Iteration of the matrix Riccati equation is directly analogous to the value function iterations we discussed in previous lectures. In fact, what we are doing is actually to iterate over the value function, with each successive matrix $P_k$ equivalent to our earlier iterations over $V_k$. Once $P$ has converged, it is a simple matter to calculate $F$ in the optimal policy function.
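The iteration is equally short in languages other than Matlab or Gauss. As a sketch, a generic implementation in Python with numpy (the function name, tolerance and iteration cap are our choices) might look as follows; it returns both $P$ and the policy matrix $F$, with the optimal policy $u = -Fx$.

```python
import numpy as np

def riccati_iterate(R, Q, A, B, beta, tol=1e-10, max_iter=10_000):
    """Iterate P_{k+1} = R + beta A'P_k A
    - beta^2 A'P_k B (Q + beta B'P_k B)^{-1} B'P_k A until the largest
    absolute change between successive iterates falls below tol."""
    P = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        # G = (Q + beta B'PB)^{-1} beta B'PA, so beta A'PB G is the last term.
        G = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A)
        P_new = R + beta * A.T @ P @ A - beta * A.T @ P @ B @ G
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A)
    return P, F          # optimal policy: u = -F x
```

Note that starting from $P_0 = 0$ requires $Q$ to be invertible, echoing the caveat about starting values discussed later in the lecture.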

5 Numerical application

To illustrate the practicalities of matrix Riccati difference equation iterations, we discuss Matlab code to solve a simple example of linear-quadratic control. Our model is one in which a central bank is trying to simultaneously control inflation $\pi_t$ and output $y_t$ by choosing the interest rate $i_t$. The instantaneous payoff function for the central bank is assumed to be quadratic in inflation, output and the interest rate.

$$L_t = \pi_t^2 + y_t^2 + 0.1 i_t^2$$

We assume that the central bank places equal weight on inflation and output deviations from target (normalised to zero for convenience) and a smaller weight on deviations in the interest rate from target. The objective of the central bank is to minimise the present discounted value of expected losses, with discounting at the rate $\beta$. The structure of the economy is given by two equations.

$$\pi_{t+1} = 0.75\pi_t - 0.5 i_t + \varepsilon^{\pi}_{t+1}$$
$$y_{t+1} = 0.25 y_t - 0.5 i_t + \varepsilon^{y}_{t+1}$$

It is not intended that these equations be considered a serious representation of the structure of the economy. Rather, the purpose is to illustrate our technique. The first equation determines inflation, which is assumed to be highly persistent and negatively correlated with interest rates. The timing is such that current interest rate decisions only affect inflation with a lag, a timing convention favoured by Athanasios Orphanides amongst others. The second equation determines output in a similar fashion. High interest rates depress output, but output itself is not as persistent as inflation. The timing convention remains the same, so interest rate decisions only affect output with a lag. Both inflation and output are subject to (potentially correlated) random disturbances in the form of shocks $\varepsilon^{\pi}_{t+1}$ and $\varepsilon^{y}_{t+1}$. The full minimisation problem is

$$\min_{\{i_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \left[ \pi_t^2 + y_t^2 + 0.1 i_t^2 \right]$$

$$\pi_{t+1} = 0.75\pi_t - 0.5 i_t + \varepsilon^{\pi}_{t+1}$$
$$y_{t+1} = 0.25 y_t - 0.5 i_t + \varepsilon^{y}_{t+1}$$

The general form of optimal linear-quadratic control is

$$\min_{\{u_t\}} E_0 \sum_{t=0}^{\infty} \beta^t \left[ x_t' R x_t + u_t' Q u_t \right]$$

$$x_{t+1} = A x_t + B u_t + \varepsilon_{t+1}$$

To cast our model in this general form, we define the state variables as $x_t = (\pi_t \;\; y_t)'$, the control variable as $u_t = i_t$, and the disturbances as $\varepsilon_{t+1} = (\varepsilon^{\pi}_{t+1} \;\; \varepsilon^{y}_{t+1})'$. The matrices $R$, $Q$, $A$ and $B$ are given by

$$R = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q = 0.1, \quad A = \begin{pmatrix} 0.75 & 0 \\ 0 & 0.25 \end{pmatrix}, \quad B = \begin{pmatrix} -0.5 \\ -0.5 \end{pmatrix}$$

The theory discussed in the previous section implies that all we need to do is iterate the matrix Riccati equation to find $P$, then calculate the policy reaction coefficients $F$. The equations we will need are therefore

$$P_{k+1} = R + \beta A'P_k A - \beta^2 A'P_k B(Q + \beta B'P_k B)^{-1}B'P_k A$$

$$F = \beta(Q + \beta B'PB)^{-1}B'PA$$

The Matlab code to solve the optimal linear-quadratic control problem is discussed below. Firstly, a new program is started by clearing the workspace, and the discount factor is defined.

clear;
beta=0.99;

The matrices Q, R, A and B are first defined to be of the correct dimension and the non-zero elements are set.

Q=zeros(1,1);

    R=zeros(2,2);

    A=zeros(2,2);

    B=zeros(2,1);

    Q(1,1)=0.1;

    R(1,1)=1;

    R(1,2)=0;

    R(2,1)=0;

    R(2,2)=1;

    A(1,1)=0.75;

    A(1,2)=0;

    A(2,1)=0;

    A(2,2)=0.25;

    B(1,1)=-0.5;

    B(2,1)=-0.5;

The next section initialises the matrix Riccati equation iterations. The variable d is used to measure the largest absolute difference in the elements of P between successive iterations. The variable i is simply a count of how many iterations have been carried out. The initial guess of the matrix $P$ is contained in the matrix P0. As initial values, we use

$$P_0 = \begin{pmatrix} -0.000001 & 0 \\ 0 & -0.000001 \end{pmatrix}$$

These starting values are used rather than zero because, if $Q = 0$ and $P_0$ is zero, then the matrix $Q + \beta B'P_0B$ in the Riccati equation is not invertible. In our example, $Q \neq 0$ and we could just as easily have used zeros as starting values. In practice, the algorithm is not sensitive to starting values in the vast majority of cases.

d=1;

    i=0;

    P0=-0.000001*eye(2);

Begin matrix Riccati equation iterations. We continue iterations until the maximum absolute difference in the elements of P between iterations is less than 0.0000000001. The new value $P_{k+1}$ is stored in the matrix P1. After each iteration, the new value P1 is compared to the old value P0. The difference is contained in Pd, from which the maximum absolute value is extracted into d. If d is not sufficiently small then the guess P0 is updated and iterations continue. For each iteration, the iteration number and maximum absolute deviation are collected in I and D respectively in order to be printed at the end.

while d > 0.0000000001
    P1 = R + beta*A'*P0*A - (beta*A'*P0*B)*inv(Q + beta*B'*P0*B)*(beta*B'*P0*A);
    Pd = P1 - P0;
    d = max(abs(Pd));
    d = max(d);
    P0 = P1;
    i = i + 1;
    I(i,1) = i;    % record iteration number for printing
    D(i,1) = d;    % record maximum absolute deviation for printing
end

The matrix Riccati equation iterations are now complete. The policy function matrix F is calculated from the final iteration of the P matrix. Both the policy function matrix F and the value function matrix P are printed in the command window. Note that the code defines F with a minus sign, so that the optimal policy can be read directly as $u = Fx$; this F is the negative of the $F$ in the theory section.

P = P0;
F = -inv(Q + beta*B'*P*B)*(beta*B'*P*A);

ID = [I D];
disp('      i               d');
disp(ID);
disp('SOLUTIONS');
disp('F');
disp(F);
disp('P');
disp(P);

    The output of the computer code is as follows

    i d

    1.00000000000000 1.00000093812485

    2.00000000000000 0.32523416362466

    3.00000000000000 0.08055819178924

    4.00000000000000 0.01887570616174

    5.00000000000000 0.00433868656498

    6.00000000000000 0.00099273060023

    7.00000000000000 0.00022690780983

    8.00000000000000 0.00005185174817

    9.00000000000000 0.00001184823275

    10.00000000000000 0.00000270731205

    11.00000000000000 0.00000061861695

    12.00000000000000 0.00000014135300

    13.00000000000000 0.00000003229893

    14.00000000000000 0.00000000738025

    15.00000000000000 0.00000000168638

    16.00000000000000 0.00000000038533

    17.00000000000000 0.00000000008805


SOLUTIONS

    F

    0.74495417123607 0.17590987848800

    P

    1.43029303877617 -0.10618330436474

    -0.10618330436474 1.04418992871296
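As a cross-check on this output, the same fixed point can be reproduced in a few lines of Python with numpy (our translation, not part of the original lecture), following the Matlab iteration exactly.

```python
import numpy as np

beta = 0.99
Q = np.array([[0.1]])                      # weight on the interest rate
R = np.eye(2)                              # weights on inflation and output
A = np.array([[0.75, 0.0], [0.0, 0.25]])   # persistence of inflation and output
B = np.array([[-0.5], [-0.5]])             # interest rate effects on the states

P0 = -0.000001 * np.eye(2)                 # same starting values as the Matlab code
d = 1.0
while d > 1e-10:
    K = np.linalg.inv(Q + beta * B.T @ P0 @ B)
    P1 = R + beta * A.T @ P0 @ A - (beta * A.T @ P0 @ B) @ K @ (beta * B.T @ P0 @ A)
    d = np.max(np.abs(P1 - P0))
    P0 = P1

P = P0
F = -np.linalg.inv(Q + beta * B.T @ P @ B) @ (beta * B.T @ P @ A)
print(F)   # approximately [[0.745 0.176]]
print(P)   # approximately [[1.430 -0.106], [-0.106 1.044]]
```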

As can be seen from the low number of iterations, the matrix Riccati equation iterations converge quickly. Returning to the context of our numerical model, the results imply policy and value functions of the following form.

$$i_t = 0.745\pi_t + 0.176 y_t$$

$$V(\pi_t, y_t) = 1.43\pi_t^2 + 1.04 y_t^2 - 0.21\pi_t y_t$$

According to the policy function, the interest rate needs to rise whenever inflation or output is above target. The result is intuitively appealing, with the central bank deflating the economy when inflation and/or output is too high. The larger reaction to inflation than to output is due to our assumption that inflation is more persistent than output. Inflation is intrinsically more problematic in the model since, if inflation deviates from target in the current period, the deviation is likely to persist to the next period.

The value function can similarly be interpreted. The coefficient on the square of inflation exceeds that on the square of output precisely because the higher persistence of inflation makes it more problematic. The negative coefficient on the cross-product of inflation and output reflects the fact that it is easier to control inflation and output when they are deviating from target in the same direction. A rise in the interest rate depresses both inflation and output, so if inflation is above target and output is below target (i.e. stagflation) then it is very difficult to stabilise the economy.
