
Solution of Macroeconometric Models

Giorgio Pauletto

November 1995

Department of Econometrics

University of Geneva


Contents

1 Introduction
2 A Review of Solution Techniques
  2.1 LU Factorization
    2.1.1 Pivoting
    2.1.2 Computational Complexity
    2.1.3 Practical Implementation
  2.2 QR Factorization
    2.2.1 Computational Complexity
    2.2.2 Practical Implementation
  2.3 Direct Methods for Sparse Matrices
    2.3.1 Data Structures and Storage Schemes
    2.3.2 Fill-in in Sparse LU
    2.3.3 Computational Complexity
    2.3.4 Practical Implementation
  2.4 Stationary Iterative Methods
    2.4.1 Jacobi Method
    2.4.2 Gauss-Seidel Method
    2.4.3 Successive Overrelaxation Method
    2.4.4 Fast Gauss-Seidel Method
    2.4.5 Block Iterative Methods
    2.4.6 Convergence
    2.4.7 Computational Complexity
  2.5 Nonstationary Iterative Methods
    2.5.1 Conjugate Gradient
    2.5.2 Preconditioning
    2.5.3 Conjugate Gradient Normal Equations
    2.5.4 Generalized Minimal Residual
    2.5.5 BiConjugate Gradient Method
    2.5.6 BiConjugate Gradient Stabilized Method
    2.5.7 Implementation of Nonstationary Iterative Methods
  2.6 Newton Methods
    2.6.1 Computational Complexity
    2.6.2 Convergence
  2.7 Finite Difference Newton Method
    2.7.1 Convergence of the Finite Difference Newton Method
  2.8 Simplified Newton Method
    2.8.1 Convergence of the Simplified Newton Method
  2.9 Quasi-Newton Methods
  2.10 Nonlinear First-order Methods
    2.10.1 Convergence
  2.11 Solution by Minimization
  2.12 Globally Convergent Methods
    2.12.1 Line-search
    2.12.2 Model-trust Region
  2.13 Stopping Criteria and Scaling
3 Solution of Large Macroeconometric Models
  3.1 Blocktriangular Decomposition of the Jacobian Matrix
  3.2 Orderings of the Jacobian Matrix
    3.2.1 The Logical Framework of the Algorithm
    3.2.2 Practical Considerations
  3.3 Point Methods versus Block Methods
    3.3.1 The Problem
    3.3.2 Discussion of the Block Method
    3.3.3 Ordering and Convergence for First-order Iterations
  3.4 Essential Feedback Vertex Sets and the Newton Method
4 Model Simulation on Parallel Computers
  4.1 Introduction to Parallel Computing
    4.1.1 A Taxonomy for Parallel Computers
    4.1.2 Communication Tasks
    4.1.3 Synchronization Issues
    4.1.4 Speedup and Efficiency of an Algorithm
  4.2 Model Simulation Experiences
    4.2.1 Econometric Models and Solution Algorithms
    4.2.2 Parallelization Potential for Solution Algorithms
    4.2.3 Practical Results
5 Rational Expectations Models
  5.1 Introduction
    5.1.1 Formulation of RE Models
    5.1.2 Uniqueness and Stability Issues
  5.2 The Model MULTIMOD
    5.2.1 Overview of the Model
    5.2.2 Equations of a Country Model
    5.2.3 Structure of the Complete Model
  5.3 Solution Techniques for RE Models
    5.3.1 Extended Path
    5.3.2 Stacked-time Approach
    5.3.3 Block Iterative Methods
    5.3.4 Newton Methods
A Appendix
  A.1 Finite Precision Arithmetic
  A.2 Condition of a Problem
  A.3 Complexity of Algorithms


List of Tables

4.1 Complexity of communication tasks on a linear array and a hypercube with p processors.
4.2 Execution times of Gauss-Seidel and Jacobi algorithms.
4.3 Execution time on CM2 and Sun ELC.
4.4 Execution time on Sun ELC and CM2 for the Newton-like algorithm.
5.1 Labels for the zones/countries considered in MULTIMOD.
5.2 Spectral radii for point and block Gauss-Seidel.
5.3 Operation count in Mflops for Newton combined with SGE and MATLAB's sparse solver, and Gauss-Seidel.
5.4 Average number of Mflops for BiCGSTAB.
5.5 Average number of Mflops for QMR.
5.6 Average number of Mflops for GMRES(m).
5.7 Average number of Mflops for MATLAB's sparse LU.


List of Figures

2.1 A one dimensional function F(x) with a unique zero and its corresponding function f(x) with multiple local minima.
2.2 The quadratic model g(ω) built to determine the minimum ω.
3.1 Blockrecursive pattern of a Jacobian matrix.
3.2 Sparsity pattern of the reordered Jacobian matrix.
3.3 Situations considered for the transformations.
3.4 Tree T = (S, U).
3.5 Numerical example showing the structure is not sufficient.
4.1 Shared memory system.
4.2 Distributed memory system.
4.3 Linear Array.
4.4 Ring.
4.5 Mesh.
4.6 Torus.
4.7 Hypercubes.
4.8 Complete graph.
4.9 Long communication delays between two processors.
4.10 Large differences in the workload of two processors.
4.11 Original and ordered Jacobian matrix and corresponding DAG.
4.12 Blockrecursive pattern of the model's Jacobian matrix.
4.13 Matrix L for the Gauss-Seidel algorithm.
5.1 Linkages of the country models in the complete version of MULTIMOD.
5.2 Incidence matrix of D in MULTIMOD.
5.3 Incidence matrices E3 to E1, D and A1 to A5.
5.4 Alignment of data in memory.
5.5 Elapsed time for 4 processors and for a single processor.
5.6 Relation between r and κ2 in submodel for Japan for MULTIMOD.
5.7 Scheduling of operations for the solution of the linear system as computed on page 110.
5.8 Incidence matrix of the stacked system for T = 10.


Acknowledgements

This thesis is the result of my research at the Department of Econometrics of the University of Geneva, Switzerland.

First and foremost, I wish to express my deepest gratitude to Professor Manfred Gilli, my thesis supervisor, for his constant support and help. He has shown a great deal of patience, availability and humane qualities beyond his professional competence.

I would like to thank Professor Andrew Hughes-Hallett for accepting to read and evaluate this work. His research also made me discover and take interest in the field of simulation of large macroeconometric models.

I am also grateful to Professor Fabrizio Carlevaro for accepting the presidency of the jury, and also reading my thesis.

Moreover, I thank Professor Jean-Philippe Vial and Professor Gerhard Wanner for being part of the jury and evaluating my work.

I am happy to be able to show my gratitude to my colleagues and friends of the Department of Econometrics for creating a pleasant and enjoyable working environment. David Miceli provided constant help and kind understanding during all the stages of my research.

I am grateful to Pascale Mignon for helping me proofread my text.

Finally, I wish to thank my parents for their kindness and encouragement, without which I could never have achieved my goals.

Geneva, November 1995.


Chapter 1

Introduction

The purpose of this book is to present the available methodologies for the solution of large-scale macroeconometric models. This work reviews classical solution methods and introduces more recent techniques, such as parallel computing and nonstationary iterative algorithms.

The development of new and more efficient computational techniques has significantly influenced research and practice in macroeconometric modeling. Our aim here is to supply practitioners and researchers with both a general presentation of numerical solution methods and specific discussions about particular problems encountered in the field.

An econometric model is a simplified representation of actual economic phenomena. Real economic behavior is typically represented by an algebraic system of equations. The latter involves endogenous variables, which are determined by the system itself, and exogenous variables, which influence but are not determined by the system. The model also contains parameters that we will assume are already estimated by an adequate econometric technique.

We may express the econometric model in matrix form for a given period t as

F(y_t, z_t, β) = ε_t ,

where F is a vector of n functions f_i, y_t is a vector of n endogenous variables, z_t is a vector of m exogenous variables, β is a vector of k parameters and ε_t is a vector of n stochastic disturbances with zero mean.

In this work, we will concentrate on the solution of the model with respect to the endogenous variables y_t. Hence, we will solve a system such as

F(y_t, z_t) = 0 . (1.1)

Such a model will be solved period after period for some horizon, generally outside the sample range used for estimation. Therefore, we usually drop the index t.

A particular class of models, which contain anticipated variables, is described in Chapter 5. In this case, the solution has to be computed simultaneously for the periods considered.


Traditionally, in the practice of solving large macroeconometric models, two kinds of solution algorithms have been used. The most popular ones are probably first-order iterative techniques and related methods like Gauss-Seidel. One obvious reason for this is their ease of implementation. Another reason is that their computational complexity is in general quite low, mainly because Gauss-Seidel naturally exploits the sparse structure of the system of equations. The convergence of these methods depends on the particular quantification of the equations and their ordering. Convergence is not guaranteed and its speed is linear.

Newton-type methods constitute a second group of techniques commonly used to solve models. These methods use the information about the derivatives of the equations. The major advantages are then a quadratic convergence, the fact that the equations do not need to be normalized and that the ordering does not influence the convergence rate. The computational cost comprises the evaluation of the derivatives forming the Jacobian matrix and the solution of the linear system. If the linear system is solved using a classical direct method based on LU or QR decomposition, the complexity of the whole method is O(n^3). This promises interesting savings in computations if size n can be reduced. A common technique consists then in applying the Newton method only to a subset of equations, for instance the equations formed by the spike variables.

This leads to a block method, i.e. a first-order iterative method where only a subsystem of equations is solved with a Newton method. The first block constitutes a recursive system and the second block (in general much smaller) is solved by a Newton method.

However, such a method brings us back to the problem of convergence for the outer loop. Moreover, for macroeconometric models in most cases the block of spike variables is also recursive, which then results in carrying out unnecessary computations.

Thus, the block method tries to take advantage of both the sparse structure of the system under consideration and the desirable convergence properties of Newton-type algorithms. However, as explained above, this approach relapses into the convergence problem existing in the framework of a block method.

This suggests that the sparsity should be exploited when solving the linear system in the Newton method, which can be achieved by using appropriate sparse techniques.

This work presents methods for the solution of large macroeconometric models. The classical approaches mentioned above are presented with a particular emphasis on the problem of the ordering of the equations. We then look into more recent developments in numerical techniques.

The solution of a linear system is a basic task of most solution algorithms for systems of nonlinear equations. Therefore, we pay special attention to the solution of linear systems. A central characteristic of the linear systems arising in macroeconometric modeling is their sparsity. Hence, methods able to take advantage of a sparse structure are of crucial importance.

A more recent set of tools available for the solution of linear equations is nonstationary methods. We explore their performance for a particular class of models in economics.

The last decade has revealed that parallel computation is now practical and has a significant impact on how large scale computation is performed. This technology is therefore available to solve large numerical problems in economics. A consequence of this trend is that the efficient use of parallel machines may require new algorithm development. We therefore address some practical aspects concerning parallel computation.

A particular class of macroeconometric models are models containing forward-looking variables. Such models naturally give rise to very large systems of equations, the solution of which requires heavy computations. Thus such models constitute an interesting testing ground for the numerical methods addressed in this research.

This work is organized into five chapters. Chapter 2 reviews solution techniques for linear and nonlinear systems. First, we discuss direct methods with a particular stress on the sparse case. This is followed by the presentation of iterative methods for linear systems, displaying both stationary and nonstationary techniques.

For the nonlinear case, we concentrate on the Newton method and some of its principal variants. Then, we examine the nonlinear versions of first-order iterative techniques and quasi-Newton methods. The alternative approach of residual minimization and issues about global convergence are also analyzed.

The macroeconometric models we consider are large and sparse and therefore analyzing their logical structure is relevant. Chapter 3 introduces a graph-theoretical approach to perform this analysis. We first introduce the method to investigate the recursive structures. Later, original techniques are developed to analyze interdependent structures, in particular by an algorithm for computing minimal feedback sets. These techniques are then used to seek a block decomposition of a model and we conclude with a comparison of the computational complexity of point methods versus block methods.

Chapter 4 addresses the main issues concerning the type of computer and the solution technique used in parallel computation. Practical aspects are also examined through the application of parallel techniques to the simulation of a medium-sized macroeconometric model.

In Chapter 5, we present the theoretical framework of rational expectation models. In the first part, we discuss issues concerning the existence and uniqueness of the solution. In the second part, we present a multi-region econometric model with forward-looking variables. Then, different solution techniques are tested for solving this model.


Chapter 2

A Review of Solution Techniques

This chapter reviews classic and well implemented solution techniques for linear and nonlinear systems. First, we discuss direct and iterative methods for linear systems. Some of these methods are part of the fundamental building blocks for many techniques for solving nonlinear systems presented later. The topic has been extensively studied and many methods have been analyzed in the scientific computing literature, see e.g. Golub and Van Loan [56], Gill et al. [47], Barrett et al. [8] and Hageman and Young [60].

Second, the nonlinear case is addressed, essentially presenting methods based on Newton iterations.

First, direct methods for solving linear systems of equations are displayed. The first section presents the LU factorization—or Gaussian elimination technique—and the second section describes an orthogonalization decomposition leading to the QR factorization. The cases of dense and sparse systems are then addressed. Other direct methods also exist, such as the Singular Value Decomposition (SVD), which can be used to solve linear systems. Even though this can constitute an interesting and useful approach, we do not resort to it here.

Section 2.4 introduces stationary iterative methods such as Jacobi, Gauss-Seidel, SOR techniques and their convergence characteristics.

Nonstationary iterative methods—such as the conjugate gradient, generalized minimal residual and biconjugate gradient, for instance—a class of more recently developed techniques, constitute the topic of Section 2.5.

Section 2.10 presents nonlinear first-order methods that are quite popular in macroeconometric modeling. The topic of Section 2.11 is an alternative approach to the solution of a system of nonlinear equations: the minimization of the norm of the residuals.

To overcome the nonconvergent behavior of the Newton method in some circumstances, two globally convergent modifications are introduced in Section 2.12.


Finally, we discuss stopping criteria and scaling.

2.1 LU Factorization

For a linear model, finding a vector of solutions amounts to solving for x a system written in matrix form

Ax = b , (2.1)

where A is an n × n real matrix and b an n × 1 real vector.

System (2.1) can be solved by the Gaussian elimination method, which is a widely used algorithm, and here we present its application for a dense matrix A with no particular structure.

The basic idea of Gaussian elimination is to transform the original system into an equivalent triangular system. Then, we can easily find the solution of such a system. The method is based on the fact that replacing an equation by a linear combination of the others leaves the solution unchanged. First, this idea is applied to get an upper triangular equivalent system. This stage is called the forward elimination of the system. Then, the solution is found by solving the equations in reverse order. This is the back substitution phase.

To describe the process with matrix algebra, we need to define a transformation that will take care of zeroing the elements below the diagonal in a column of matrix A. Let x ∈ R^n be a column vector with x_k ≠ 0. We can define

\[
\tau^{(k)} = [\,0 \ \ldots \ 0 \ \ \tau^{(k)}_{k+1} \ \ldots \ \tau^{(k)}_{n}\,]' \quad \text{with} \quad \tau^{(k)}_{i} = x_i / x_k \ \text{ for } i = k+1, \ldots, n .
\]

Then, the matrix M_k = I − τ^{(k)} e_k', with e_k the k-th standard vector of R^n, represents a Gauss transformation. The vector τ^{(k)} is called a Gauss vector. By applying M_k to x, we check that we get

\[
M_k x =
\begin{bmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -\tau^{(k)}_{k+1} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -\tau^{(k)}_{n} & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n
\end{bmatrix}
=
\begin{bmatrix}
x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0
\end{bmatrix} .
\]

Practically, applying such a transformation is carried out without explicitly building M_k or resorting to matrix multiplications. For example, in order to multiply M_k by a matrix C of size n × r, we only need to perform an outer product and a matrix subtraction:

M_k C = (I − τ^{(k)} e_k') C = C − τ^{(k)} (e_k' C) . (2.2)

The product e_k' C selects the k-th row of C, and the outer product τ^{(k)}(e_k' C) is subtracted from C. However, only the rows from k + 1 to n of C have to be updated, as the first k elements in τ^{(k)} are zeros. We denote by A^{(k)} the matrix M_k · · · M_1 A, i.e. the matrix A after the k-th elimination step.


To triangularize the system, we need to apply n − 1 Gauss transformations, provided that the Gauss vector can be found. This is true if all the divisors a^{(k)}_{kk}—called pivots—used to build τ^{(k)} for k = 1, . . . , n are different from zero.

If for a real n × n matrix A the process of zeroing the elements below the diagonal is successful, we have

M_{n−1} M_{n−2} · · · M_1 A = U ,

where U is an n × n upper triangular matrix. Using the Sherman-Morrison-Woodbury formula, we can easily find that if M_k = I − τ^{(k)} e_k', then M_k^{-1} = I + τ^{(k)} e_k', and so defining L = M_1^{-1} M_2^{-1} · · · M_{n−1}^{-1} we can write

A = LU .

As each matrix M_k is unit lower triangular, each M_k^{-1} also has this property; therefore, L is unit lower triangular too. By developing the product defining L, we have

\[
L = (I + \tau^{(1)} e_1')(I + \tau^{(2)} e_2') \cdots (I + \tau^{(n-1)} e_{n-1}') = I + \sum_{k=1}^{n-1} \tau^{(k)} e_k' .
\]

So L contains ones on the main diagonal and the vector τ^{(k)} in the k-th column below the diagonal for k = 1, . . . , n − 1, and we have

\[
L = \begin{bmatrix}
1 & & & & \\
\tau^{(1)}_{2} & 1 & & & \\
\tau^{(1)}_{3} & \tau^{(2)}_{3} & 1 & & \\
\vdots & \vdots & \ddots & \ddots & \\
\tau^{(1)}_{n} & \tau^{(2)}_{n} & \cdots & \tau^{(n-1)}_{n} & 1
\end{bmatrix} .
\]

By applying the Gaussian elimination to A we found a factorization of A into a unit lower triangular matrix L and an upper triangular matrix U. The existence and uniqueness conditions as well as the result are summarized in the following theorem.

Theorem 1 A ∈ R^{n×n} has an LU factorization if the determinants of the first n − 1 principal minors are different from 0. If the LU factorization exists and A is nonsingular, then the LU factorization is unique and det(A) = u_{11} · · · u_{nn}.

The proof of this theorem can be found for instance in Golub and Van Loan [56, p. 96]. Once the factorization has been found, we obtain the solution for the system Ax = b by first solving Ly = b by forward substitution and then solving Ux = y by back substitution.

Forward substitution for a unit lower triangular matrix is easy to perform. The first equation gives y_1 = b_1 because L contains ones on the diagonal. Substituting y_1 in the second equation gives y_2. Continuing thus, the triangular system Ly = b is solved by substituting all the known y_j to get the next one.

Back substitution works similarly, but we start with x_n since U is upper triangular. Proceeding backwards, we get x_i by replacing all the known x_j (j > i) in the i-th equation of Ux = y.
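To make these steps concrete, the following sketch (Python with NumPy; the function name and the small test matrix are our own, and pivoting is deliberately omitted, so all leading principal minors are assumed nonsingular) mirrors the forward elimination and the two substitution phases just described:

    import numpy as np

    def lu_solve_no_pivoting(A, b):
        """Illustrative LU factorization without pivoting, then forward/back substitution."""
        A = A.astype(float).copy()
        n = A.shape[0]
        L = np.eye(n)
        for k in range(n - 1):                      # forward elimination
            tau = A[k+1:, k] / A[k, k]              # Gauss vector (pivot assumed nonzero)
            L[k+1:, k] = tau                        # multipliers stored below the diagonal of L
            A[k+1:, k:] -= np.outer(tau, A[k, k:])  # update the remaining submatrix
        U = np.triu(A)
        y = np.zeros(n)
        for i in range(n):                          # forward substitution on L y = b
            y[i] = b[i] - L[i, :i] @ y[:i]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):              # back substitution on U x = y
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 2.0], [1.0, 2.0, 6.0]])
    b = np.array([1.0, 2.0, 3.0])
    x = lu_solve_no_pivoting(A, b)
    print(x, np.allclose(A @ x, b))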


2.1.1 Pivoting

As described above, the Gaussian elimination breaks down when a pivot is equal to zero. In such a situation, a simple exchange of the equations leading to a nonzero pivot may get us round the problem. However, the condition that all the pivots have to be different from zero does not suffice to ensure a numerically reliable result. Moreover, at this stage, the Gaussian elimination method is still numerically unstable. This means that because of cancellation errors, the process described can lead to catastrophic results. The problem lies in the size of the elements of the Gauss vector τ. If they are too large compared to the elements from which they are subtracted in Equation (2.2), rounding errors may be magnified, thus destroying the numerical accuracy of the computation.

To overcome this difficulty, a good strategy is to exchange the rows of the matrix during the process of elimination to ensure that the elements of τ will always be smaller than or equal to one in magnitude. This is achieved by choosing the permutation of the rows so that

\[
|a^{(k)}_{kk}| = \max_{i \geq k} |a^{(k)}_{ik}| . \qquad (2.3)
\]

Such an exchange strategy is called partial pivoting and can be formalized in matrix language as follows.

Let P_i be a permutation matrix of order n, i.e. the identity matrix with its rows reordered. To ensure that no element in τ is larger than one in absolute value, we must permute the rows of A before applying the Gauss transformation. This is applied at each step of the Gaussian elimination process, which leads to the following theorem:

Theorem 2 If Gaussian elimination with partial pivoting is used to compute the upper triangularization

M_{n−1} P_{n−1} · · · M_1 P_1 A = U ,

then PA = LU where P = P_{n−1} · · · P_1 and L is a unit lower triangular matrix with |l_{ij}| ≤ 1.

Thus, when solving a linear system Ax = b, we first compute the vector y = M_{n−1} P_{n−1} · · · M_1 P_1 b and then solve Ux = y by back substitution. This method is much more stable and it is very unlikely to find catastrophic cancellation problems. The proof of Theorem 2 is given in Golub and Van Loan [56, p. 112].

Going one step further would imply permuting not only the rows but also the columns of A so that in the k-th step of the Gaussian elimination the largest element of the submatrix to be transformed is used as pivot. This strategy is called complete pivoting. However, applying complete pivoting is costly because one needs to search for the largest element in a matrix instead of a vector at each elimination step. This overhead does not justify the gain one may obtain in the stability of the method in practice. Therefore, the algorithm of choice for solving Ax = b, when A has no particular structure, is Gaussian elimination with partial pivoting.
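As an illustration of partial pivoting in practice (a sketch only; SciPy's LAPACK-based routines lu_factor and lu_solve are used here and are not mentioned in the thesis):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    # A matrix whose (1,1) entry would be a poor pivot without row exchanges.
    A = np.array([[1e-12, 1.0],
                  [1.0,   1.0]])
    b = np.array([1.0, 2.0])

    lu, piv = lu_factor(A)       # PA = LU with partial pivoting; row exchanges recorded in piv
    x = lu_solve((lu, piv), b)   # forward and back substitution on the permuted system
    print(x, np.allclose(A @ x, b))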


2.1.2 Computational Complexity

The number of elementary arithmetic operations (flops) for the Gaussian elimination is 2n^3/3 − n^2/2 − n/6, and therefore this method is O(n^3).

2.1.3 Practical Implementation

In the case where one is only interested in the solution vector, it is not necessary to explicitly build matrix L. It is possible to directly compute the y vector (solution of Ly = b) while transforming matrix A into an upper triangular matrix U.

Despite the fact that Gaussian elimination seems to be easy to code, it is certainly not advisable to write our own code. A judicious choice is to rely on carefully tested software such as the routines in the LAPACK library. These routines are publicly available on NETLIB [1] and are also used by the software MATLAB [2], which is our main computing environment for the experiments we carried out.

2.2 QR Factorization

The QR factorization is an orthogonalization method that can be applied to square or rectangular matrices. Usually this is a key algorithm for computing eigenvalues or least-squares solutions, and it is less applied to find the solution of a square linear system. Nevertheless, there are at least three reasons (see Golub and Van Loan [56]) why orthogonalization methods, such as QR, might be considered:

• The orthogonal methods have guaranteed numerical stability, which is not the case for Gaussian elimination.

• In case of ill-conditioning, orthogonal methods give an added measure of reliability.

• The flop count tends to exaggerate the Gaussian elimination advantage. [3] (Particularly for parallel computers, memory traffic and other overheads tend to reduce this advantage.)

Another advantage that might favor the QR factorization is the possibility of updating the factors Q and R corresponding to a rank one modification of matrix A in O(n^2) operations. This is also possible for the LU factorization; however, the implementation is much simpler with QR, see Gill et al. [47]. Updating techniques will prove particularly useful in the quasi-Newton algorithm presented in Section 2.9.

[1] NETLIB can be accessed through the World Wide Web at http://www.netlib.org/ and collects mathematical software, articles and databases useful for the scientific community. In Europe the URL is http://www.netlib.no/netlib/master/readme.html or http://elib.zib-berlin.de/netlib/master/readme.html .

[2] MATLAB High Performance Numeric Computation and Visualization Software is a product and registered trademark of The MathWorks, Inc., Cochituate Place, 24 Prime Park Way, Natick MA 01760, USA. URL: http://www.mathworks.com/ .

[3] In the application discussed in Section 4.2.2 we used the QR factorization available in the libraries of the CM2 parallel computer.

These reasons suggest that QR is probably, especially on parallel devices, a possible alternative to LU for solving square systems. The QR factorization can be applied to any rectangular matrix, but we will focus on the case of an n × n real matrix A.

The goal is to apply to A successive orthogonal transformation matrices H_i, i = 1, 2, . . . , r, to get an upper triangular matrix R, i.e.

H_r · · · H_1 A = R .

The orthogonal transformations presented in the literature are usually based upon Givens rotations or Householder reflections. The latter choice leads to algorithms involving fewer arithmetic operations and is therefore presented in the following.

A Householder transformation is a matrix of the form

H = I − 2ww' with w'w = 1 .

Such a matrix is symmetric, orthogonal and its determinant is −1. Geometrically, this matrix represents a reflection with respect to the hyperplane defined by {x | w'x = 0}. By properly choosing the reflection plane, it is possible to zero particular elements in a vector.

Let us partition our matrix A in n column vectors [a_1 · · · a_n]. We first look for a matrix H_1 such that all the elements of H_1 a_1 except the first one are zeros. We define

\[
\begin{aligned}
s_1 &= -\operatorname{sign}(a_{11})\, \|a_1\| \\
\mu_1 &= (2 s_1^2 - 2 a_{11} s_1)^{-1/2} \\
u_1 &= [\,(a_{11} - s_1) \ \ a_{21} \ \cdots \ a_{n1}\,]' \\
w_1 &= \mu_1 u_1 .
\end{aligned}
\]

Actually the sign of s_1 is free, but it is chosen to avoid the catastrophic cancellation that may otherwise appear in computing µ_1. As w_1'w_1 = 1, we can let H_1 = I − 2 w_1 w_1' and verify that H_1 a_1 = [s_1 0 · · · 0]'.

Computationally, it is more efficient to calculate the product H_1 A in the following manner

\[
H_1 A = A - 2 w_1 w_1' A = A - 2 w_1 \,[\, w_1'a_1 \ \ w_1'a_2 \ \cdots \ w_1'a_n \,] ,
\]

so the i-th column of H_1 A is a_i − 2(w_1'a_i) w_1 = a_i − (c_1 u_1'a_i) u_1 with c_1 = 2µ_1^2 = (s_1^2 − s_1 a_{11})^{-1}.

We continue this process in a similar way on the matrix A from which we have removed the first row and column. The vectors w_2 and u_2 will now be of dimension (n − 1) × 1, but we can complete them with zeros to build

\[
H_2 = I - 2 \begin{bmatrix} 0 \\ w_2 \end{bmatrix} \begin{bmatrix} 0 & w_2' \end{bmatrix} .
\]


After n − 1 steps, we have H_{n−1} · · · H_2 H_1 A = R. As all the matrices H_i are orthogonal, their product is orthogonal too and we get

A = QR ,

with Q = (H_{n−1} · · · H_1)' = H_1' · · · H_{n−1}'. In practice, one will neither form the vectors w_i nor calculate the Q matrix, as all the information is contained in the u_i vectors and the s_i scalars for i = 1, . . . , n.

The possibility to choose the sign of s_1 such that there never is a subtraction in the computation of µ_1 is the key to the good numerical behavior of the QR factorization. We notice that the computation of u_1 also involves a subtraction. It is possible to permute the column with the largest sum of squares below row i − 1 into column i during the i-th step in order to minimize the risk of digit cancellation. This then leads to a factorization

PA = QR ,

where P is a permutation matrix.

Using this factorization of matrix A, it is easy to find a solution for the system Ax = b. We first compute y = Q'b and then solve Rx = y by back substitution.
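A sketch of this solve step (Python/NumPy; np.linalg.qr, which uses Householder reflections internally, stands in for the hand-coded transformations, and the test data are invented):

    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 3.0, 2.0],
                  [1.0, 0.0, 0.0]])
    b = np.array([4.0, 5.0, 6.0])

    Q, R = np.linalg.qr(A)        # A = QR, Q orthogonal, R upper triangular
    y = Q.T @ b                   # y = Q'b
    x = np.zeros_like(b)
    for i in range(len(b) - 1, -1, -1):   # back substitution on R x = y
        x[i] = (y[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    print(x, np.allclose(A @ x, b))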

2.2.1 Computational Complexity

The computational complexity of the QR algorithm for a square matrix of order n is 4n^3/3 + O(n^2). Hence the method is of O(n^3) complexity.

2.2.2 Practical Implementation

Again, as for the LU decomposition, the explicit computation of matrix Q is not necessary, as we may build vector y during the triangularization process. Only the back substitution phase is needed to get the solution of the linear system Ax = b.

As has already been mentioned, the routines for computing a QR factorization (or solving a system via QR) are readily available in LAPACK and are implemented in MATLAB.

2.3 Direct Methods for Sparse Matrices

In many cases, matrix A of the linear system contains numerous zero entries. This is particularly true for linear systems derived from large macroeconometric models. Such a situation may be exploited in order to organize the computations in a way that involves only the nonzero elements. These techniques are known as sparse direct methods (see e.g. Duff et al. [30]) and are crucial for the efficient solution of linear systems in a wide class of practical applications.


2.3.1 Data Structures and Storage Schemes

The interest of considering sparse structures is twofold: first, the information can be stored in a much more compact way; second, the computations may be performed avoiding redundant arithmetic operations involving zeros. These two aspects are somewhat conflicting, as a compact storage scheme may involve more time consuming addressing operations for performing the computations. However, this conflict vanishes quickly when large problems are considered. In order to define our idea more clearly, let us define the density of a matrix as the ratio between its nonzero entries and its total number of entries. Generally, when the size of the system gets larger, the density of the corresponding matrix decreases. In other words, the larger the problem is, the sparser its structure becomes.

Several storage structures exist for a same sparse matrix. There is no single best data structure, since the choice depends both on the data manipulations the computations imply and on the computer architecture and/or language in which these are implemented.

The following three data structures are generally used:

• coordinate scheme,

• list of successors (collection of sparse vectors),

• linked list.

The following example best illustrates these storage schemes. We consider the 5 × 5 sparse matrix

\[
A = \begin{bmatrix}
0 & -2 & 0 & 0 & 0.5 \\
0 & 5 & 0 & 7 & 0 \\
0 & 0 & 1.7 & 0 & 6 \\
3.1 & 0 & 0 & -0.2 & 0 \\
0 & 0 & 1.2 & -3 & 0
\end{bmatrix} .
\]

Coordinate Scheme

In this case, three arrays are used: two integer arrays for the row and column indices—respectively r and c—and a real array x containing the elements.

For our example we have

r: 4   1   2   3   5   2   4    5   1   3
c: 1   2   2   3   3   4   4    4   5   5
x: 3.1 -2  5   1.7 1.2 7   -0.2 -3  0.5 6

Each entry of A is represented by a triplet and corresponds to a column in the table above. Such a storage scheme needs less memory than a full storage if the density of A is less than 1/3. The insertion and deletion of elements are easy to perform, whereas the direct access of elements is relatively complex. Many computations in linear algebra involve successive scans of the columns of a matrix, which is difficult to carry out using this representation.


List of successors (Collection of Sparse Vectors)

With this storage scheme, the sparse matrix A is stored as the concatenation of the sparse vectors representing its columns. Each sparse vector consists of a real array containing the nonzero entries and an integer array of corresponding row indices. A second integer array gives the locations in the other arrays of the first element in each column.

For our matrix A, this representation is

index: 1   2   3   4   5   6
h:     1   2   4   6   9   11

index: 1   2   3   4   5   6   7    8   9   10
l:     4   1   2   3   5   2   4    5   1   3
x:     3.1 -2  5   1.7 1.2 7   -0.2 -3  0.5 6

The integer array h contains the addresses of the list of row elements in l and x. For instance, the nonzero entries in column 4 of A are stored at positions h(4) = 6 to h(5) − 1 = 9 − 1 = 8 in x. Thus, the entries are x(6) = 7, x(7) = −0.2 and x(8) = −3. The row indices are given by the same locations in array l, i.e. l(6) = 2, l(7) = 4 and l(8) = 5.

MATLAB mainly uses this data structure to store its sparse matrices, see Gilbert et al. [44]. The main advantage is that columns can be easily accessed, which is very important for numerical linear algebra algorithms. The disadvantage of such a representation is the difficulty of inserting new entries. This arises for instance when adding a row to another.
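For illustration, the same 5 × 5 example can be built from coordinate triplets and converted to a compressed sparse column structure with SciPy (an assumption of this sketch, not a tool used in the thesis); the indptr, indices and data arrays play the roles of h, l and x above, with 0-based indexing:

    import numpy as np
    from scipy.sparse import coo_matrix

    # Coordinate scheme: one (row, column, value) triplet per nonzero entry (0-based).
    r = np.array([3, 0, 1, 2, 4, 1, 3, 4, 0, 2])
    c = np.array([0, 1, 1, 2, 2, 3, 3, 3, 4, 4])
    x = np.array([3.1, -2, 5, 1.7, 1.2, 7, -0.2, -3, 0.5, 6])
    A_coo = coo_matrix((x, (r, c)), shape=(5, 5))

    # List of successors (collection of sparse column vectors) = compressed sparse column.
    A_csc = A_coo.tocsc()
    print(A_csc.indptr)    # column start positions, the analogue of h
    print(A_csc.indices)   # row indices, the analogue of l
    print(A_csc.data)      # nonzero values, the analogue of x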

Linked List

The third alternative that is widely used for storing sparse matrices is the linked list. Its particularity is that we define a pointer (named head) to the first entry, and each entry is associated with a pointer pointing to the next entry or to the null pointer (named 0) for the last entry. If the matrix is stored by columns, we start a new linked list for each column and therefore we have as many head pointers as there are columns. Each entry is composed of two pieces: the row index and the value of the entry itself.

This is represented by the picture:

head 1 → (row 4, 3.1) → 0
head 5 → (row 1, 0.5) → (row 3, 6) → 0
...

The structure can be implemented as before with arrays and we get

index: 1   2    3   4   5
head:  4   5    9   1   7

index: 1   2    3   4   5   6   7   8   9   10
row:   2   4    5   4   1   2   1   3   3   5
entry: 7   -0.2 -3  3.1 -2  5   0.5 6   1.7 1.2
link:  2   3    0   0   6   0   8   0   10  0

For instance, to retrieve the elements of column 3, we begin by reading head(3) = 9. Then row(9) = 3 gives the row index, the entry value is entry(9) = 1.7 and the pointer link(9) = 10 gives the next index address. The values row(10) = 5, entry(10) = 1.2 and link(10) = 0 indicate that the element 1.2 is at row number 5 and is the last entry of the column.

The obvious advantage is the ease with which elements can be inserted and deleted: the pointers are simply updated to take care of the modification. This data structure is close to the list of successors representation, but does not necessitate contiguous storage locations for the entries of a same column.

In practice it is often necessary to switch from one representation to another. We can also note that the linked list and the list of successors can similarly be defined row-wise rather than column-wise.

2.3.2 Fill-in in Sparse LU

Given a storage scheme, one could think of executing a Gaussian elimination as described in Section 2.1. However, by doing so we may discover that the sparsity of our initial matrix A is lost and we may obtain relatively dense matrices L and U.

Indeed, depending on the choice of the pivots, the number of entries in L and U may vary. From Equation (2.2), we see that at step k of the Gaussian elimination algorithm, we subtract two matrices in order to zero the elements below the diagonal of the k-th column. Depending on the Gauss vector τ^{(k)}, the matrix τ^{(k)} e_k' C may contain nonzero elements which do not exist in matrix C. This creation of new elements is called fill-in.

A crucial problem is then to minimize the fill-in, as the number of operations is proportional to the density of the submatrix to be triangularized. Furthermore, a dense matrix U will result in an expensive back substitution phase.

A minimum fill-in may however conflict with the pivoting strategy, i.e. the pivot chosen to minimize the fill-in may not correspond to the element with maximum magnitude among the elements below the k-th diagonal as defined by Equation (2.3). A common tradeoff to limit the loss of numerical stability of the sparse Gaussian elimination is to accept a pivot element satisfying the following threshold inequality

\[
|a^{(k)}_{kk}| \geq u \, \max_{i \geq k} |a^{(k)}_{ik}| ,
\]

where u is the threshold parameter and belongs to (0, 1]. A choice for u suggested by Duff, Erisman and Reid [30] is u = 0.1. This parameter heavily influences the fill-in and hence the complexity of the method.


2.3.3 Computational Complexity

It is not easy to establish an exact operation count for the sparse LU. The count depends on the particular structure of matrix A and on the chosen pivoting strategy. For a good implementation, we may expect a complexity of O(c^2 n), where c is the average number of elements in a row and n is the order of matrix A.

2.3.4 Practical Implementation

A widely used code for the direct solution of sparse linear systems is the Harwell MA28 code available on NETLIB, see Duff [29]. A new version called MA48 is presented in Duff and Reid [31].

The software MATLAB has its own implementation using partial pivoting and minimum-degree ordering for the columns to reduce fill-in, see Gilbert et al. [44] and Gilbert and Peierls [45].

Other direct sparse solvers are also available through NETLIB (e.g. Y12MA, UMFPACK, SuperLU, SPARSE).
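As a further illustration (not from the thesis), SciPy wraps the SuperLU solver mentioned above; its diag_pivot_thresh option plays a role analogous to the threshold parameter u of the inequality given earlier, and the matrix below is simply the 5 × 5 example of Section 2.3.1:

    import numpy as np
    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import splu

    A = csc_matrix(np.array([[0.0, -2.0, 0.0,  0.0, 0.5],
                             [0.0,  5.0, 0.0,  7.0, 0.0],
                             [0.0,  0.0, 1.7,  0.0, 6.0],
                             [3.1,  0.0, 0.0, -0.2, 0.0],
                             [0.0,  0.0, 1.2, -3.0, 0.0]]))
    b = np.ones(5)

    # diag_pivot_thresh is analogous to u: 1.0 is classical partial pivoting,
    # smaller values trade some stability for less fill-in; COLAMD is a
    # fill-reducing column ordering.
    lu = splu(A, diag_pivot_thresh=0.1, permc_spec="COLAMD")
    x = lu.solve(b)
    print(x, np.allclose(A @ x, b))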

2.4 Stationary Iterative Methods

Iterative methods form an important class of solution techniques for solving large systems of equations. They can be an interesting alternative to direct methods because they take into account the sparsity of the system and are moreover easy to implement.

Iterative methods may be divided into two classes: stationary and nonstationary. The former rely on invariant information from one iteration to another, whereas the latter modify their search by using the results of previous iterations.

In this section, we present stationary iterative methods such as Jacobi, Gauss-Seidel and SOR techniques.

The solution x* of the system Ax = b can be approximated by replacing A by a simpler nonsingular matrix M and by rewriting the system as

Mx = (M − A)x + b .

In order to solve this equivalent system, we may use the following recurrence formula from a chosen starting point x^(0),

M x^(k+1) = (M − A) x^(k) + b , k = 0, 1, 2, . . . . (2.4)

At each step k the system (2.4) has to be solved, but this task can be easy depending on the choice of M.

The convergence of the iterates to the solution is not guaranteed. However, if the sequence of iterates {x^(k)}_{k=0,1,2,...} converges to a limit x^(∞), then we have x^(∞) = x*, since relation (2.4) becomes M x^(∞) = (M − A) x^(∞) + b, that is A x^(∞) = b.


The iterations should be carried out an infinite number of times to reach the solution, but we usually obtain a good approximation of x* after a fairly small number of iterations.

There is a tradeoff between the ease in computing x^(k+1) from (2.4) and the speed of convergence of the stationary iterative method. The simplest choice for M would be to take M = I, and the fastest convergence would be obtained by setting M = A. Of course, the choices of M that are of interest to us lie between these two extreme cases.

Let us split the original system matrix A into

A = L + D + U ,

where D is the diagonal of matrix A and L and U are the strictly lower and upper triangular parts of A, defined respectively by d_ii = a_ii for all i, l_ij = a_ij for i > j and u_ij = a_ij for i < j.

2.4.1 Jacobi Method

One of the simplest iterative procedures is the Jacobi method, which is found by setting M = D. If we assume that the diagonal elements of A are nonzero, then solving the system D x^(k+1) = c for x^(k+1) is easy; otherwise, we need to permute the equations to find such a matrix D. We can note that when the model is normalized, we have D = I and the iterations are further simplified.

The sequence of Jacobi's iterates is defined in matrix form by

D x^(k+1) = −(L + U) x^(k) + b , k = 0, 1, 2, . . . ,

or by

Algorithm 1 Jacobi Method

Given a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  for i = 1, . . . , n
    x_i^(k+1) = ( b_i − Σ_{j≠i} a_ij x_j^(k) ) / a_ii
  end
end

In this method, all the entries of the vector x^(k+1) are computed using only the entries of x^(k). Hence, two separate vectors must be stored to carry out the iterations.
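A minimal component-wise sketch of Algorithm 1 (Python/NumPy; the convergence test and the small diagonally dominant test matrix are our own choices, not taken from the thesis):

    import numpy as np

    def jacobi(A, b, x0, tol=1e-8, max_iter=500):
        """Jacobi iterations: every component of x(k+1) is computed from x(k) only."""
        x = x0.astype(float).copy()
        n = len(b)
        for _ in range(max_iter):
            x_new = np.empty(n)
            for i in range(n):
                x_new[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
            if np.max(np.abs(x_new - x)) < tol:
                return x_new
            x = x_new
        return x

    # Strictly diagonally dominant matrix, so the Jacobi iterations converge.
    A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(jacobi(A, b, np.zeros(3)))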

2.4.2 Gauss-Seidel Method

In the Gauss-Seidel method (GS), we use the most recently available information to update the iterates. In this case, the i-th component of x^(k+1) is computed using the (i − 1) first entries of x^(k+1) that have already been obtained and the (n − i) other entries from x^(k).


This process amounts to using M = L + D and leads to the formula

(L + D) x^(k+1) = −U x^(k) + b ,

or to the following algorithm:

Algorithm 2 Gauss-Seidel Method

Given a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  for i = 1, . . . , n
    x_i^(k+1) = ( b_i − Σ_{j<i} a_ij x_j^(k+1) − Σ_{j>i} a_ij x_j^(k) ) / a_ii
  end
end

The matrix formulation of the iterations is useful for theoretical purposes, but the actual computation will generally be implemented component-wise as in Algorithm 1 and Algorithm 2.
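A corresponding component-wise Gauss-Seidel sketch (again illustrative Python/NumPy, not code from the thesis), where each update of x_i immediately reuses the components already computed in the current sweep:

    import numpy as np

    def gauss_seidel(A, b, x0, tol=1e-8, max_iter=500):
        """Gauss-Seidel sweeps: x_i is updated in place with the newest values."""
        x = x0.astype(float).copy()
        n = len(b)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
            if np.max(np.abs(x - x_old)) < tol:
                break
        return x

    A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(gauss_seidel(A, b, np.zeros(3)))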

2.4.3 Successive Overrelaxation Method

A third useful technique, called SOR for Successive Overrelaxation method, is very closely related to the Gauss-Seidel method. The update is computed as an extrapolation of the Gauss-Seidel step as follows: let x_GS^(k+1) denote the (k + 1) iterate for the GS method; the new iterates can then be written as in the next algorithm.

Algorithm 3 Successive Overrelaxation Method

Given a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  Compute x_GS^(k+1) by Algorithm 2
  for i = 1, . . . , n
    x_i^(k+1) = x_i^(k) + ω ( x_GS,i^(k+1) − x_i^(k) )
  end
end

The scalar ω is called the relaxation parameter and its optimal value, in order to achieve the fastest convergence, depends on the characteristics of the problem in question. A necessary condition for the method to converge is that ω lies in the interval (0, 2). When ω < 1, the GS step is dampened and this is sometimes referred to as under-relaxation.

In matrix form, the SOR iteration is defined by

(ωL + D)x(k+1) = ((1 − ω)D − ωU)x(k) + ωb , k = 0, 1, 2, . . . . (2.5)

When ω is unity, the SOR method collapses to GS.
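A sketch of the SOR update as an extrapolation of the Gauss-Seidel step (Python/NumPy; the value ω = 1.2 is an arbitrary illustration, not an optimal choice):

    import numpy as np

    def sor(A, b, x0, omega=1.2, tol=1e-8, max_iter=500):
        """SOR: each Gauss-Seidel component update is extrapolated by the factor omega."""
        x = x0.astype(float).copy()
        n = len(b)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                x_gs = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
                x[i] = x[i] + omega * (x_gs - x[i])
            if np.max(np.abs(x - x_old)) < tol:
                break
        return x

    A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(sor(A, b, np.zeros(3)))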


2.4.4 Fast Gauss-Seidel Method

The idea of extrapolating the step size to improve the speed of convergence can also be applied to SOR iterates and gives rise to the Fast Gauss-Seidel method (FGS) or Accelerated Over Relaxation method, see Hughes Hallett [68] and Hadjidimos [59].

Let us denote by x_SOR^(k+1) the (k + 1) iterate obtained by Equation (2.5); then the FGS iterates are defined by

Algorithm 4 FGS Method

Given a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  Compute x_SOR^(k+1) by Algorithm 3
  for i = 1, . . . , n
    x_i^(k+1) = x_i^(k) + γ ( x_SOR,i^(k+1) − x_i^(k) )
  end
end

This method may be seen as a second-order method, since it uses a SOR iterate as an intermediate step to compute its next guess, and the SOR already uses the information from a GS step. It is easy to see that when γ = 1, we find the SOR method.

Like ω in the SOR part, the choice of the value for γ is not straightforward. For some problems, the optimal choice of ω can be explicitly found (this is discussed in Hageman and Young [60]). However, it cannot be determined a priori for general matrices. There is no way of computing the optimal value for γ cheaply, and some authors (e.g. Hughes Hallett [69], Yeyios [103]) offered approximations of γ. However, numerical tests produced variable outcomes: sometimes the approximation gave good convergence rates, sometimes poor ones, see Hughes-Hallett [69]. As for the ω parameter, the value of γ is usually chosen by experimentation on the characteristics of the system at stake.

2.4.5 Block Iterative Methods

Certain problems can naturally be decomposed into a set of subproblems with more or less tight linkages. [4] In economic analysis, this is particularly true for multi-country macroeconometric models where the different country models are linked together by a relatively small number of trade relations, for example (see Faust and Tryon [35]). Another such situation is the case of disaggregated multi-sectorial models where the links between the sectors are relatively weak.

In other problems where such a decomposition does not follow from the construction of the system, one may resort to a partition where the subsystems are easier to solve.

A block iterative method is then a technique where one iterates over the subsystems. The technique to solve the subsystem is free and not relevant for the discussion.

[4] The original problem is supposed to be indecomposable in the sense described in Section 3.1.

Let us suppose the matrix of our system is partitioned in the form

\[
A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1N} \\
A_{21} & A_{22} & \cdots & A_{2N} \\
\vdots & \vdots & & \vdots \\
A_{N1} & A_{N2} & \cdots & A_{NN}
\end{bmatrix} ,
\]

where the diagonal blocks A_ii, i = 1, 2, . . . , N, are square. We define the block diagonal matrix D, the block lower triangular matrix L and the block upper triangular matrix U such that A = D + L + U:

\[
D = \begin{bmatrix}
A_{11} & 0 & \cdots & 0 \\
0 & A_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_{NN}
\end{bmatrix} , \quad
L = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
A_{21} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
A_{N1} & A_{N2} & \cdots & 0
\end{bmatrix} , \quad
U = \begin{bmatrix}
0 & A_{12} & \cdots & A_{1N} \\
0 & 0 & \cdots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} .
\]

If we write the problem Ay = b under the same partitioned form, we have

\[
\begin{bmatrix}
A_{11} & \cdots & A_{1N} \\
\vdots & & \vdots \\
A_{N1} & \cdots & A_{NN}
\end{bmatrix}
\begin{bmatrix}
y_1 \\ \vdots \\ y_N
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\ \vdots \\ b_N
\end{bmatrix}
\quad \text{or else} \quad
\sum_{j=1}^{N} A_{ij}\, y_j = b_i , \quad i = 1, 2, \ldots, N .
\]

Suppose the A_ii, i = 1, 2, . . . , N, are nonsingular; then the following solution scheme may be applied:

Algorithm 5 Block Jacobi method (BJ)

Given a starting point y^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  Solve for y_i^(k+1):
    A_ii y_i^(k+1) = b_i − Σ_{j=1, j≠i}^{N} A_ij y_j^(k) ,   i = 1, 2, . . . , N
end

As we only use the information of step k to compute y_i^(k+1), this scheme is called a block iterative Jacobi method (BJ).

We can certainly use the most recent available information on the y's for updating y^(k+1), and this leads to the block Gauss-Seidel method (BGS):


Algorithm 6 Block Gauss-Seidel method (BGS)

Given a starting point y^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  Solve for y_i^(k+1):
    A_ii y_i^(k+1) = b_i − Σ_{j=1}^{i−1} A_ij y_j^(k+1) − Σ_{j=i+1}^{N} A_ij y_j^(k) ,   i = 1, 2, . . . , N
end

Similarly to the presentation in Section 2.4.3, the SOR option can also be applied as follows:

Algorithm 7 Block successive overrelaxation method (BSOR)

Given a starting point y^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
  Solve for y_i^(k+1):
    A_ii y_i^(k+1) = A_ii y_i^(k) + ω ( b_i − Σ_{j=1}^{i−1} A_ij y_j^(k+1) − Σ_{j=i+1}^{N} A_ij y_j^(k) − A_ii y_i^(k) ) ,   i = 1, 2, . . . , N
end

We assume that the systems A_ii y_i = c_i can be solved by either direct or iterative methods.

The interest of such block methods is to offer possibilities of splitting the problem in order to solve one piece at a time. This is useful when the size of the problem is such that it cannot entirely fit in the memory of the computer. Parallel computing also allows taking advantage of a block Jacobi implementation, since different processors can simultaneously take care of different subproblems and thus speed up the solution process, see Faust and Tryon [35].
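As an illustration of Algorithm 6, the following sketch (Python/NumPy; the two-block partition and the test matrix are invented for the example) iterates over the diagonal blocks and solves each subsystem with a direct method:

    import numpy as np

    def block_gauss_seidel(A, b, blocks, tol=1e-8, max_iter=200):
        """Block Gauss-Seidel: each diagonal block A_ii is solved directly at every sweep."""
        y = np.zeros(len(b))
        for _ in range(max_iter):
            y_old = y.copy()
            for idx in blocks:
                others = np.setdiff1d(np.arange(len(b)), idx)
                # Blocks already visited in this sweep contribute their new values,
                # the remaining blocks still hold the values of the previous sweep.
                rhs = b[idx] - A[np.ix_(idx, others)] @ y[others]
                y[idx] = np.linalg.solve(A[np.ix_(idx, idx)], rhs)
            if np.max(np.abs(y - y_old)) < tol:
                break
        return y

    A = np.array([[4.0, 1.0, 0.5, 0.0],
                  [1.0, 3.0, 0.0, 0.5],
                  [0.5, 0.0, 5.0, 1.0],
                  [0.0, 0.5, 1.0, 4.0]])
    b = np.array([1.0, 2.0, 3.0, 4.0])
    blocks = [np.array([0, 1]), np.array([2, 3])]   # an invented 2-block partition
    y = block_gauss_seidel(A, b, blocks)
    print(y, np.allclose(A @ y, b))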

2.4.6 Convergence

Let us now study the convergence of the stationary iterative techniques introduced in the last section.

The error at iteration k is defined by e^(k) = x^(k) − x*, and subtracting Equation (2.4) evaluated at x* from the same evaluated at x^(k), we get

M e^(k) = (M − A) e^(k−1) .

We can now relate e^(k) to e^(0) by writing

e^(k) = B e^(k−1) = B^2 e^(k−2) = · · · = B^k e^(0) ,

where B is a matrix defined to be M^{-1}(M − A). Clearly, the convergence of {x^(k)}_{k=0,1,2,...} to x* depends on the powers of matrix B: if lim_{k→∞} B^k = 0, then lim_{k→∞} x^(k) = x*. It is not difficult to show that

lim_{k→∞} B^k = 0 ⇐⇒ |λ_i| < 1 ∀i .

Indeed, if B = P J P^{-1}, where J is the Jordan canonical form of B, then B^k = P J^k P^{-1} and lim_{k→∞} B^k = 0 if and only if lim_{k→∞} J^k = 0. The matrix J is formed of Jordan blocks J_i, and we see that the k-th power (for k larger than the size of the block) of J_i is

\[
(J_i)^k =
\begin{bmatrix}
\lambda_i^k & k\lambda_i^{k-1} & \binom{k}{2}\lambda_i^{k-2} & \cdots & \binom{k}{n-1}\lambda_i^{k-n+1} \\
 & \lambda_i^k & k\lambda_i^{k-1} & \ddots & \vdots \\
 & & \ddots & \ddots & \binom{k}{2}\lambda_i^{k-2} \\
 & & & \lambda_i^k & k\lambda_i^{k-1} \\
 & & & & \lambda_i^k
\end{bmatrix} ,
\]

and therefore that the powers of J tend to zero if and only if |λ_i| < 1 for all i.

We can write the different matrices governing the convergence for each stationary iterative method as follows:

B_J = −D^{-1}(L + U)                  for Jacobi's method,
B_GS = −(L + D)^{-1} U                for Gauss-Seidel,
B_ω = (ωL + D)^{-1}((1 − ω)D − ωU)    for SOR.

Therefore, the speed of convergence of such methods depends on the spectral radius of B, denoted by ρ(B) = max_i |λ_i|, where λ_i stands for the i-th eigenvalue of matrix B. The FGS method converges for some γ > 0 if the real part of the eigenvalues of the matrix B_ω is less than unity.

Given that the method converges, i.e. that ρ(B) < 1, the number of iterations is approximately

\[
\frac{\log \varepsilon}{\log \rho(B)} ,
\]

with a convergence criterion [5] expressed as

\[
\max_i \frac{|x_i^{(k)} - x_i^{(k-1)}|}{|x_i^{(k-1)}|} < \varepsilon .
\]

Hence, to minimize the number of iterations, we seek a splitting of matrix A and parameters that yield a matrix B with the lowest possible spectral radius.

Different row-column permutations of A influence ρ(B) when the GS, SOR and FGS methods are applied, whereas the Jacobi method is invariant to such permutations. These issues are discussed in more detail in Section 3.3. For matrices without special structure, these problems do not have a practical solution so far.
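These quantities are easy to inspect numerically; the small sketch below (Python/NumPy, test matrix chosen arbitrarily) forms B_J and B_GS from the splitting A = L + D + U and compares their spectral radii:

    import numpy as np

    A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)          # strictly lower triangular part
    U = np.triu(A, 1)           # strictly upper triangular part

    B_J  = -np.linalg.solve(D, L + U)       # Jacobi iteration matrix  -D^{-1}(L+U)
    B_GS = -np.linalg.solve(L + D, U)       # Gauss-Seidel iteration matrix -(L+D)^{-1}U

    rho = lambda B: np.max(np.abs(np.linalg.eigvals(B)))
    print(rho(B_J), rho(B_GS))  # both below 1 for this matrix, Gauss-Seidel smaller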

2.4.7 Computational Complexity

The number of elementary operations for an iteration of Jacobi or Gauss-Seidelis (2c + 1)n where c is the average number of elements in a row of A. For SOR,the count is (2c + 4)n and for FGS (2c + 7)n.

Therefore, iterative methods become competitive with sparse direct methods if the number of iterations K needed to converge is of order c or less.

5See Section 2.13 for a discussion of stopping criteria.


2.5 Nonstationary Iterative Methods

Nonstationary methods have been developed more recently. Unlike the stationary methods discussed in Section 2.4, they use information that changes from iteration to iteration. These methods are computationally attractive as the operations involved can easily be executed on sparse matrices and also require little storage. They also generally show a better convergence speed than stationary iterative methods. Presentations of nonstationary iterative methods can be found for instance in Freund et al. [39], Barrett et al. [8], Axelsson [7] and Kelley [73].

First, we present some algorithms that solve particular systems, such as symmetric positive definite ones, from which the nonstationary iterative methods for solving the general linear systems we are interested in were derived.

2.5.1 Conjugate Gradient

The first and perhaps best known of the nonstationary methods is the Conjugate Gradient (CG) method proposed by Hestenes and Stiefel [64]. This technique solves symmetric positive definite systems Ax = b by using only matrix-vector products, inner products and vector updates. The method may also be interpreted as arising from the minimization of the quadratic function q(x) = ½ x′Ax − x′b, where A is the symmetric positive definite matrix and b the right-hand side of the system. As the first-order conditions for the minimization of q(x) give the original system, the two approaches are equivalent.

The idea of the CG method is to update the iterates x^(i) in the direction p^(i) and to compute the residuals r^(i) = b − Ax^(i) in such a way as to ensure that we achieve the largest decrease in terms of the objective function q and, furthermore, that the direction vectors p^(i) are A-orthogonal.

The largest decrease in q at x^(0) is obtained by choosing an update in the direction −Dq(x^(0)) = b − Ax^(0). We see that the direction of maximum decrease is the residual of x^(0) defined by r^(0) = b − Ax^(0). We can look for the optimum step length in the direction r^(0) by solving the line search problem

min_α q(x^(0) + α r^(0)) .

As the derivative with respect to α is

D_α q(x^(0) + α r^(0)) = x^(0)′A r^(0) + α r^(0)′A r^(0) − b′r^(0)
                       = (x^(0)′A − b′) r^(0) + α r^(0)′A r^(0)
                       = −r^(0)′r^(0) + α r^(0)′A r^(0) ,

the optimal α is

α_0 = r^(0)′r^(0) / (r^(0)′A r^(0)) .

The method described up to now is just a steepest descent algorithm with exact line search on q. To avoid the convergence problems which are likely to arise with this technique, it is further imposed that the update directions p^(i) be


A-orthogonal (or conjugate with respect to A), in other words, that we have

p^(i)′A p^(j) = 0    for i ≠ j .    (2.6)

It is therefore natural to choose a direction p^(i) that is closest to r^(i−1) and satisfies Equation (2.6). It is possible to show that explicit formulas for such a p^(i) can be found, see e.g. Golub and Van Loan [56, pp. 520–523]. These solutions can be expressed in a computationally efficient way involving only one matrix-vector multiplication per iteration.

The CG method can be formalized as follows:

Algorithm 8 Conjugate Gradient

Compute r^(0) = b − A x^(0) for some initial guess x^(0)
for i = 1, 2, . . . until convergence
    ρ_{i−1} = r^(i−1)′ r^(i−1)
    if i = 1 then
        p^(1) = r^(0)
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        p^(i) = r^(i−1) + β_{i−1} p^(i−1)
    end
    q^(i) = A p^(i)
    α_i = ρ_{i−1}/(p^(i)′ q^(i))
    x^(i) = x^(i−1) + α_i p^(i)
    r^(i) = r^(i−1) − α_i q^(i)
end

In the conjugate gradient method, the i-th iterate x^(i) can be shown to be the vector minimizing (x^(i) − x*)′A(x^(i) − x*) among all x^(i) in the affine subspace x^(0) + span{r^(0), Ar^(0), . . . , A^{i−1}r^(0)}. This subspace is called the Krylov subspace.
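A NumPy transcription of Algorithm 8 might look as follows (a minimal sketch for a dense symmetric positive definite matrix; the tolerance and the small test system are arbitrary choices, and the update of p is an equivalent reorganization of the algorithm):

import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxiter=1000):
    """Conjugate Gradient for a symmetric positive definite system Ax = b:
    one matrix-vector product, two inner products and three vector updates
    per iteration, as in Algorithm 8."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    for _ in range(maxiter):
        if np.sqrt(rho) <= tol:
            break
        q = A @ p
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

# Example on a small SPD system; the result agrees with np.linalg.solve(A, b).
A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
print(cg(A, b))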

Convergence of the CG Method

In exact arithmetic, the CG method yields the solution in at most n iterations, see Luenberger [78, p. 248, Theorem 2]. In particular, we have the following relation for the error in the k-th CG iteration:

‖x^(k) − x*‖_2 ≤ 2 √κ ( (√κ − 1)/(√κ + 1) )^k ‖x^(0) − x*‖_2 ,

where κ = κ_2(A) is the condition number of A in the two norm. However, in finite precision and with a large κ, the method may fail to converge.

2.5.2 Preconditioning

As explained above, the convergence speed of the CG method is linked to the condition number of the matrix A. To improve the convergence speed of CG-type methods, the matrix A is often preconditioned, that is, transformed into Ã = SAS′, where S is a nonsingular matrix. The system solved is then Ãx̃ = b̃,


where x̃ = (S′)^{−1}x and b̃ = Sb. The matrix S is chosen so that the condition number of matrix Ã is smaller than the condition number of the original matrix A, which speeds up the convergence.

To avoid the explicit computation of Ã and the destruction of the sparsity pattern of A, the methods are usually formalized so as to use the original matrix A directly. We can build a preconditioner

M = (S′S)−1

and apply the preconditioning step by solving the system M r̃ = r. Since κ_2(S′A(S′)^{−1}) = κ_2(S′SA) = κ_2(MA), we do not actually form M from S but rather directly choose a matrix M. The matrix M is constrained to be symmetric positive definite.

The preconditioned version of the CG is described in the following algorithm.

Algorithm 9 Preconditioned Conjugate Gradient

Compute r^(0) = b − A x^(0) for some initial guess x^(0)
for i = 1, 2, . . . until convergence
    Solve M r̃^(i−1) = r^(i−1)
    ρ_{i−1} = r̃^(i−1)′ r^(i−1)
    if i = 1 then
        p^(1) = r̃^(0)
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        p^(i) = r̃^(i−1) + β_{i−1} p^(i−1)
    end
    q^(i) = A p^(i)
    α_i = ρ_{i−1}/(p^(i)′ q^(i))
    x^(i) = x^(i−1) + α_i p^(i)
    r^(i) = r^(i−1) − α_i q^(i)
end

As the preconditioning speeds up the convergence, the question of how to choose a good preconditioner naturally arises. There are two conflicting goals in the choice of M. First, M should reduce the condition number of the system solved as much as possible; to achieve this, we would like to choose an M as close to matrix A as possible. Second, since the system M r̃ = r has to be solved at each iteration of the algorithm, this system should be as easy as possible to solve. Clearly, the preconditioner will be chosen between the two extreme cases M = A and M = I. When M = I, we obtain the unpreconditioned version of the method, and when M = A, the complete system is solved in the preconditioning step. One possibility is to take M = diag(a_11, . . . , a_nn). This is not useful if the system is normalized, as is sometimes the case for macroeconometric systems.

Other preconditioning methods do not explicitly construct M. Some authors, for instance Dubois et al. [28] and Adams [1], suggest taking a given number of steps of an iterative method such as Jacobi. We can note that taking one step of Jacobi amounts to a diagonal scaling M = diag(a_11, . . . , a_nn), as mentioned above.

Another common approach is to perform an incomplete LU factorization (ILU)


of matrix A. This method is similar to the LU factorization except that it respects the pattern of nonzero elements of A in the lower triangular part of L and the upper triangular part of U. In other words, we apply the following algorithm:

Algorithm 10 Incomplete LU factorization

Set L = I_n                                  The identity matrix of order n
for k = 1, . . . , n
    for i = k + 1, . . . , n
        if a_ik = 0 then                     Respect the sparsity pattern of A
            ℓ_ik = 0
        else
            ℓ_ik = a_ik / a_kk
            for j = k + 1, . . . , n
                if a_ij ≠ 0 then             Respect the sparsity pattern of A
                    a_ij = a_ij − ℓ_ik a_kj  Gaussian elimination
                end
            end
        end
    end
end
Set U = upper triangular part of A

This factorization can be written as A = LU + R, where R is a matrix containing the elements that would fill in L and U and is not actually computed. The approximate system LU r̃ = r is then solved using forward and backward substitution in the preconditioning step of the nonstationary method used.

A more detailed analysis of preconditioning and other incomplete factorizations may be found in Axelsson [7].
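A dense-matrix transcription of Algorithm 10 is sketched below for illustration only; a practical implementation would work on a sparse storage scheme as discussed in Section 2.3:

import numpy as np

def ilu0(A):
    """Incomplete LU factorization keeping the sparsity pattern of A
    (Algorithm 10, written on dense copies for clarity)."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n):
        for i in range(k + 1, n):
            if A[i, k] == 0:              # respect the sparsity pattern of A
                continue
            L[i, k] = U[i, k] / U[k, k]
            for j in range(k + 1, n):
                if A[i, j] != 0:          # only update positions nonzero in A
                    U[i, j] -= L[i, k] * U[k, j]
            U[i, k] = 0.0
    return L, np.triu(U)
    # The approximate system L U r = rhs is then solved by forward and
    # backward substitution (e.g. scipy.linalg.solve_triangular) in the
    # preconditioning step of the nonstationary method.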

2.5.3 Conjugate Gradient Normal Equations

In order to deal with nonsymmetric systems, it is necessary either to convert the original system into a symmetric positive definite equivalent one, or to generalize the CG method. The next sections discuss these possibilities.

The first approach, and perhaps the easiest, is to transform Ax = b into a symmetric positive definite system by multiplying the original system by A′. As A is assumed to be nonsingular, A′A is symmetric positive definite and the CG algorithm can be applied to A′Ax = A′b. This method is known as the Conjugate Gradient Normal Equation (CGNE) method.

A somewhat similar approach is to solve AA′y = b by the CG method and then to compute x = A′y. The difference between the two approaches is discussed in Golub and Ortega [55, pp. 397ff].

Besides the computation of the matrix-matrix and matrix-vector products, these methods have the disadvantage of increasing the condition number of the system solved, since κ_2(A′A) = (κ_2(A))². This in turn increases the number of iterations of the method, see Barrett et al. [8, p. 16] and Golub and Van Loan [56]. However, since the transformation and the coding are easy to implement, the


method might be appealing in certain circumstances.
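The squaring of the condition number is easy to observe numerically; the snippet below, an illustration with a random test matrix rather than a model Jacobian, compares κ_2(A) with κ_2(A′A):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

print(np.linalg.cond(A))         # kappa_2(A)
print(np.linalg.cond(A.T @ A))   # roughly kappa_2(A) squared

# CGNE then simply applies the CG method of Section 2.5.1 to the SPD
# system (A'A) x = A'b; forming A.T @ A explicitly is done here only
# for illustration.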

2.5.4 Generalized Minimal Residual

Paige and Saunders [86] proposed a variant of the CG method that minimizes the residual r = b − Ax in the 2-norm; it only requires the system to be symmetric, not positive definite. The approach can also be extended to unsymmetric systems if some more information is kept from step to step. This method is called GMRES (Generalized Minimal Residual) and was introduced by Saad and Schultz [90].

The difficulty is not to lose the orthogonality property of the direction vectors p^(i). To achieve this goal, all previously generated vectors have to be kept in order to build a set of orthogonal directions, using for instance a modified Gram-Schmidt orthogonalization process.

However, this method requires the storage and computation of an increasing amount of information. Thus, in practice, the use of the algorithm is very limited because of its prohibitive cost.

To overcome these difficulties, the method may be restarted after a chosen number of iterations m; the information is erased and the current intermediate results are used as a new starting point. The choice of m is critically important for the restarted version of the method, usually referred to as GMRES(m).

The pseudo-code for this method is given hereafter.


Algorithm 11 Preconditioned GMRES(m)

Choose an initial guess x^(0) and initialize the (m + 1) × m matrix H_m to h_ij = 0
for k = 1, 2, . . . until convergence
    Solve for r^(k−1):  M r^(k−1) = b − A x^(k−1)
    β = ‖r^(k−1)‖_2 ;  v^(1) = r^(k−1)/β ;  q = m
    for j = 1, . . . , m
        Solve for w:  M w = A v^(j)
        for i = 1, . . . , j                         Orthonormal basis by modified Gram-Schmidt
            h_ij = w′v^(i)
            w = w − h_ij v^(i)
        end
        h_{j+1,j} = ‖w‖_2
        if h_{j+1,j} is sufficiently small then
            q = j
            exit from loop on j
        end
        v^(j+1) = w / h_{j+1,j}
    end
    V_m = [v^(1) . . . v^(q)]
    y_m = argmin_y ‖β e_1 − H_m y‖_2                 Use the method given below to compute y_m
    x^(k) = x^(k−1) + V_m y_m                        Update the approximate solution
end

Apply Givens rotations to triangularize H_m in order to solve the least-squares problem involving the upper Hessenberg matrix H_m:

d = β e_1                                            e_1 is [1 0 . . . 0]′
for i = 1, . . . , q                                 Compute the sine and cosine values of the rotation
    if h_ii = 0 then
        c = 1 ;  s = 0
    else
        if |h_{i+1,i}| > |h_ii| then
            t = −h_ii / h_{i+1,i} ;  s = 1/√(1 + t²) ;  c = s t
        else
            t = −h_{i+1,i} / h_ii ;  c = 1/√(1 + t²) ;  s = c t
        end
    end
    t = c d_i ;  d_{i+1} = −s d_i ;  d_i = t
    h_ii = c h_ii − s h_{i+1,i} ;  h_{i+1,i} = 0
    for j = i + 1, . . . , m                         Apply rotation to zero the subdiagonal of H_m
        t1 = h_ij ;  t2 = h_{i+1,j}
        h_ij = c t1 − s t2 ;  h_{i+1,j} = s t1 + c t2
    end
end
Solve the triangular system H_m y_m = d by back substitution.

Another issue with GMRES is the use of the modified Gram-Schmidt method, which is fast but not very reliable, see Golub and Van Loan [56, p. 219]. For ill-conditioned systems, a Householder orthogonalization process is certainly a better alternative, even if it leads to an increase in the complexity of the algorithm.


Convergence of GMRES

The convergence properties of GMRES(m) are given in the original paper which introduces the method, see Saad and Schultz [90].

A necessary and sufficient condition for GMRES(m) to converge appears in the results of recent research, see Strikwerda and Stodder [95]:

Theorem 3 A necessary and sufficient condition for GMRES(m) to converge is that the set of vectors

V_m = { v | v′A^j v = 0 for 1 ≤ j ≤ m }

contains only the vector 0.

Specifically, it follows that for a symmetric or skew-symmetric matrix A, GMRES(2) converges.

Another important result stated in [95] is that, if GMRES(m) converges, it does so with a geometric rate of convergence:

Theorem 4 If r^(k) is the residual after k steps of GMRES(m), then

‖r^(k)‖²_2 ≤ (1 − ρ_m)^k ‖r^(0)‖²_2 ,

where

ρ_m = min_{‖v‖=1} ( Σ_{j=1}^{m} (v^(1)′A v^(j))² / Σ_{j=1}^{m} ‖A v^(j)‖²_2 )

and the vectors v^(j) are the unit vectors generated by GMRES(m).

Similar conditions and rate of convergence estimates are also given for the preconditioned version of GMRES(m).

2.5.5 BiConjugate Gradient Method

The BiConjugate Gradient method (BiCG) takes a different approach, based upon generating two mutually orthogonal sequences of residual vectors {r^(i)} and {r̃^(j)}, and A-orthogonal sequences of direction vectors {p^(i)} and {p̃^(j)}. The interpretation in terms of the minimization of the residuals r^(i) is lost. The updates for the residuals and for the direction vectors are similar to those of the CG method, but are performed not only using A but also A′. The scalars α_i and β_i ensure the bi-orthogonality conditions r̃^(i)′r^(j) = p̃^(i)′A p^(j) = 0 if i ≠ j.

The algorithm for the Preconditioned BiConjugate Gradient method is given hereafter.


Algorithm 12 Preconditioned BiConjugate Gradient

Compute r^(0) = b − A x^(0) for some initial guess x^(0)
Set r̃^(0) = r^(0)
for i = 1, 2, . . . until convergence
    Solve M z^(i−1) = r^(i−1)
    Solve M′ z̃^(i−1) = r̃^(i−1)
    ρ_{i−1} = z^(i−1)′ r̃^(i−1)
    if ρ_{i−1} = 0 then the method fails
    if i = 1 then
        p^(i) = z^(i−1)
        p̃^(i) = z̃^(i−1)
    else
        β_{i−1} = ρ_{i−1}/ρ_{i−2}
        p^(i) = z^(i−1) + β_{i−1} p^(i−1)
        p̃^(i) = z̃^(i−1) + β_{i−1} p̃^(i−1)
    end
    q^(i) = A p^(i)
    q̃^(i) = A′ p̃^(i)
    α_i = ρ_{i−1}/(p̃^(i)′ q^(i))
    x^(i) = x^(i−1) + α_i p^(i)
    r^(i) = r^(i−1) − α_i q^(i)
    r̃^(i) = r̃^(i−1) − α_i q̃^(i)
end

The disadvantages of the method are the potential erratic behavior of the norm of the residuals r^(i) and unstable behavior if ρ_i is very small, i.e. when the vectors r^(i) and r̃^(i) are nearly orthogonal. Another potential breakdown situation is when p̃^(i)′q^(i) is zero or close to zero.

Convergence of BiCG

The convergence of BiCG may be irregular, but when the norm of the residual is significantly reduced, the method is expected to be comparable to GMRES. The breakdown cases may be avoided by sophisticated strategies, see Barrett et al. [8] and references therein. Few other convergence results are known for this method.

2.5.6 BiConjugate Gradient Stabilized Method

A version of the BiCG method which tries to smooth the convergence was introduced by van der Vorst [99]. This more sophisticated method is called the BiConjugate Gradient Stabilized method (BiCGSTAB) and its algorithm is formalized in the following.


Algorithm 13 BiConjugate Gradient Stabilized

Compute r^(0) = b − A x^(0) for some initial guess x^(0)
Set r̃ = r^(0)
for i = 1, 2, . . . until convergence
    ρ_{i−1} = r̃′ r^(i−1)
    if ρ_{i−1} = 0 then the method fails
    if i = 1 then
        p^(i) = r^(i−1)
    else
        β_{i−1} = (ρ_{i−1}/ρ_{i−2})(α_{i−1}/ω_{i−1})
        p^(i) = r^(i−1) + β_{i−1}(p^(i−1) − ω_{i−1} v^(i−1))
    end
    Solve M p̂ = p^(i)
    v^(i) = A p̂
    α_i = ρ_{i−1}/(r̃′ v^(i))
    s = r^(i−1) − α_i v^(i)
    if ‖s‖ is small enough then
        x^(i) = x^(i−1) + α_i p̂
        stop
    end
    Solve M ŝ = s
    t = A ŝ
    ω_i = (t′s)/(t′t)
    x^(i) = x^(i−1) + α_i p̂ + ω_i ŝ
    r^(i) = s − ω_i t
    For continuation it is necessary that ω_i ≠ 0
end

The method is more costly in terms of required operations than BiCG, but does not involve the transpose of matrix A in the computations; this can sometimes be an advantage. The other main advantages of BiCGSTAB are to avoid the irregular convergence pattern of BiCG and usually to show a better convergence speed.

2.5.7 Implementation of Nonstationary Iterative Methods

The codes for conjugate gradient type methods are easy to implement, but the interested user should first check the NETLIB repository. It contains the package SLAP 2.0 that solves sparse and large linear systems using preconditioned iterative methods.

We used the MATLAB programs distributed with the Templates book by Barrett et al. [8] as a basis and modified this code for our experiments.

2.6 Newton Methods

In this section and the following ones, we present classical methods for the solution of systems of nonlinear equations. The following notation will be used for nonlinear systems. Let F : R^n → R^n represent a multivariable function.


The solution of the nonlinear system then amounts to finding the vector x* ∈ R^n such that

F(x*) = 0  ⟷  { f_1(x*) = 0, f_2(x*) = 0, . . . , f_n(x*) = 0 } .    (2.7)

We assume F to be continuously differentiable in an open convex set U ⊂ Rn.

In the next section, we discuss the classical Newton method, recalling the main results about convergence. We then turn to modifications of the classical method where the exact Jacobian matrix is replaced by some approximation.

The classical Newton method proceeds by iteratively approximating x* by a sequence {x^(k)}_{k=0,1,2,...}.

Given a point x^(k) ∈ R^n and an evaluation of F(x^(k)) and of the Jacobian matrix DF(x^(k)), we can construct a better approximation to x*, called x^(k+1).

We may approximate F(x) in the neighborhood of x^(k) by an affine function and get

F (x) ≈ F (x(k)) + DF (x(k))(x− x(k)) . (2.8)

We can solve this local model to obtain the value of x that satisfies

F (x(k)) + DF (x(k))(x− x(k)) = 0 ,

i.e. the point

x = x^(k) − (DF(x^(k)))^{−1} F(x^(k)) .    (2.9)

The value for x computed in (2.9) is again used to define a local model such as (2.8). This leads to building the iterates

x^(k+1) = x^(k) − (DF(x^(k)))^{−1} F(x^(k)) .    (2.10)

The algorithm that implements these iterations is called the classical Newton method and is formalized as follows:

Algorithm 14 Classical Newton Method

Given F : R^n → R^n continuously differentiable and a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
    Compute DF(x^(k))
    Check that DF(x^(k)) is sufficiently well conditioned
    Solve for s^(k):  DF(x^(k)) s^(k) = −F(x^(k))
    x^(k+1) = x^(k) + s^(k)
end

Geometrically, x^(k+1) may be interpreted as the intersection of the n hyperplanes tangent to the functions f_1, f_2, . . . , f_n at x^(k) with the zero hyperplane. This local solution of the linearized problem is then used as a new starting point for the next guess.
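As a small illustration of Algorithm 14, the following sketch solves an arbitrary two-equation example with an analytical Jacobian (the system, tolerance and iteration limit are choices made for the example):

import numpy as np

def newton(F, DF, x0, tol=1e-10, maxiter=50):
    """Classical Newton method: solve DF(x) s = -F(x) and update x at each step."""
    x = x0.astype(float)
    for _ in range(maxiter):
        Fx = F(x)
        if np.max(np.abs(Fx)) < tol:
            break
        s = np.linalg.solve(DF(x), -Fx)
        x = x + s
    return x

# Example: f1 = x0^2 + x1^2 - 2 = 0,  f2 = x0 - x1 = 0   (solution (1, 1)).
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
DF = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton(F, DF, np.array([2.0, 0.5])))    # converges to [1. 1.]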

The main advantage of the Newton method is its quadratic convergence behavior, if appropriate conditions stated later are satisfied. However, this technique


may not converge to a solution for starting points outside some neighborhood of the solution.

The classical Newton algorithm is also computationally intensive since it requires at every iteration the evaluation of the Jacobian matrix DF(x^(k)) and the solution of a linear system.

2.6.1 Computational Complexity

The computationally expensive steps in the Newton algorithm are the evaluation of the Jacobian matrix and the solution of the linear system. Hence, for a dense Jacobian matrix, the complexity of the latter task determines an order of O(n³) arithmetic operations.

If the system of equations is sparse, as is the case in large macroeconometric models, we obtain an O(c²n) complexity (see Section 2.3) for the linear system.

An analytical evaluation of the Jacobian matrix will automatically exploit the sparsity of the problem. Particular attention must be paid in the case of a numerical evaluation of DF, as this could introduce an O(n²) operation count.

An iterative technique may also be used to approximate the solution of the linear system arising at each step. Such techniques save computational effort, but the number of iterations needed to satisfy a given convergence criterion is not known in advance. Alternatively, the number of iterations of the iterative technique may be fixed beforehand.

2.6.2 Convergence

To discuss the convergence of the Newton method, we need the following definition and theorem.

Definition 1 A function G : R^n → R^{n×m} is said to be Lipschitz continuous on an open set U ⊂ R^n if for all x, y ∈ U there exists a constant γ such that ‖G(y) − G(x)‖_a ≤ γ ‖y − x‖_b, where ‖·‖_a is a norm on R^{n×m} and ‖·‖_b on R^n.

The value of the constant γ depends on the norms chosen and the scale of DF .

Theorem 5 Let F : R^n → R^m be continuously differentiable in the open convex set U ⊂ R^n, x ∈ U, and let DF be Lipschitz continuous at x in the neighborhood U. Then, for any x + p ∈ U,

‖F(x + p) − F(x) − DF(x) p‖ ≤ (γ/2) ‖p‖² ,

where γ is the Lipschitz constant.

This theorem gives a bound on how close the affine model F(x) + DF(x)p is to F(x + p). This bound contains the Lipschitz constant γ, which measures the degree of nonlinearity of F. A proof of Theorem 5 can be found in Dennis and Schnabel [26, p. 75] for instance.


The conditions for the convergence of the classical Newton method are then stated in the following theorem.

Theorem 6 If F is continuously differentiable in an open convex set U ⊂ R^n containing x* with F(x*) = 0, DF is Lipschitz continuous in a neighborhood of x*, and DF(x*) is nonsingular and such that ‖DF(x*)^{−1}‖ ≤ β > 0, then the iterates of the classical Newton method satisfy

‖x^(k+1) − x*‖ ≤ βγ ‖x^(k) − x*‖² ,   k = 0, 1, 2, . . . ,

for a starting guess x^(0) in a neighborhood of x*.

This theorem calls for two remarks. First, the method converges fast, as the error of step k + 1 is guaranteed to be less than some proportion of the square of the error of step k, provided all the assumptions are satisfied. For this reason, the method is said to be quadratically convergent. We refer to Dennis and Schnabel [26, p. 90] for the proof of the theorem. The original works and further references are cited in Ortega and Rheinboldt [85, p. 316].

The constant βγ gives a bound for the relative nonlinearity of F and is a scale-free measure, since β is an upper bound for the norm of (DF(x*))^{−1}. Theorem 6 therefore tells us that the smaller this measure of relative nonlinearity, the faster the Newton method converges.

The second remark concerns the conditions needed to obtain quadratic convergence. Even if the Lipschitz continuity of DF is verified, the choice of a starting point x^(0) lying in a convergent neighborhood of the solution x* may be a difficult problem a priori.

For macroeconometric models, the starting values can naturally be chosen as the last period solution, which in many cases is a point not too far from the current solution. Macroeconometric models do not generally show a high level of nonlinearity and, therefore, the Newton method is generally suitable to solve them.

2.7 Finite Difference Newton Method

An alternative to an analytical Jacobian matrix is to replace the exact derivatives by finite difference approximations. Even though software for symbolic derivation is nowadays readily available, there are situations where one might prefer to, or have to, resort to an approach which is easy to implement and only requires function evaluations.

A circumstance where finite differences are certainly attractive occurs if the Newton algorithm is implemented on a SIMD computer. Such an example is discussed in Section 4.2.1.

We may approximate the partial derivatives in DF(x) by the forward difference formula

(DF(x))_{·j} ≈ (F(x + h_j e_j) − F(x)) / h_j = J_{·j} ,   j = 1, . . . , n .    (2.11)


The discretization error introduced by this approximation satisfies the following bound

‖J_{·j} − (DF(x))_{·j}‖_2 ≤ (γ/2) max_j |h_j| ,

where F is a function satisfying Theorem 5. This suggests taking h_j as small as possible to minimize the discretization error.

A central difference approximation for DF(x) can also be used,

J_{·j} = (F(x + h_j e_j) − F(x − h_j e_j)) / (2 h_j) .    (2.12)

The bound on the discretization error is then lowered to (γ/6) max_j h_j², at the cost of twice as many function evaluations.

Finally, the choice of h_j also has to be discussed in the framework of the numerical accuracy one can obtain on a digital computer. The approximation theory suggests taking h_j as small as possible to reduce the discretization error in the approximation of DF. However, since the numerator of (2.11) involves the difference of function values that are close, a cancellation error might occur, so that the elements J_ij may have very few or even no significant digits.

According to the theory, e.g. Dennis and Schnabel [26, p. 97], one may choose h_j so that F(x + h_j e_j) differs from F(x) in at least the leftmost half of its significant digits. Assuming that the relative error in computing F(x) is u, defined as in Section A.1, we would then like to have

|f_i(x + h_j e_j) − f_i(x)| / |f_i(x)| ≥ √u    ∀ i, j .

The best guess is then h_j = √u x_j in order to cope with the different sizes of the elements of x, the discretization error and the cancellation error. In the case of central difference approximations, the choice for h_j is modified to h_j = u^{1/3} x_j.
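A sketch of the forward difference approximation (2.11) with the step choice h_j = √u x_j follows; the fallback used when a component of x is zero is an assumption added for the example:

import numpy as np

def fd_jacobian(F, x, typical_x=None):
    """Forward difference approximation of DF(x), column by column,
    with h_j = sqrt(u) * x_j where u is the unit roundoff; a typical
    magnitude is used when supplied to cope with small components."""
    u = np.finfo(float).eps
    n = x.size
    Fx = F(x)
    J = np.empty((Fx.size, n))
    scale = np.abs(x) if typical_x is None else np.maximum(np.abs(x), typical_x)
    for j in range(n):
        h = np.sqrt(u) * (scale[j] if scale[j] > 0 else 1.0)   # assumed safeguard against h = 0
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(x + e) - Fx) / h
    return J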

The finite difference Newton algorithm can then be expressed as follows.

Algorithm 15 Finite Difference Newton Method

Given F : R^n → R^n continuously differentiable and a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
    Evaluate J^(k) according to (2.11) or (2.12)
    Solve J^(k) s^(k) = −F(x^(k))
    x^(k+1) = x^(k) + s^(k)
end

2.7.1 Convergence of the Finite Difference Newton Method

When replacing the analytically evaluated Jacobian matrix by a finite difference approximation, it can be shown that the convergence of the Newton iterative process remains quadratic if the finite difference step size is chosen to satisfy the conditions specified in the following. (Proofs can be found, for instance, in Dennis and Schnabel [26, p. 95] or Ortega and Rheinboldt [85, p. 360].)


If the finite difference step size h^(k) is invariant with respect to the iterations k, then the discretized Newton method shows only a linear rate of convergence. (We drop the subscript j for convenience.)

If a decreasing sequence h^(k) is imposed, i.e. lim_{k→∞} h^(k) = 0, the method achieves a superlinear rate of convergence.

Furthermore, if one of the following conditions is verified,

    there exist constants c_1 and k_1 such that |h^(k)| ≤ c_1 ‖x^(k) − x*‖  ∀ k ≥ k_1 ,
    there exist constants c_2 and k_2 such that |h^(k)| ≤ c_2 ‖F(x^(k))‖   ∀ k ≥ k_2 ,    (2.13)

then the convergence is quadratic, as is the case in the classical Newton method.

The limit condition on the sequence h^(k) may be interpreted as an improvement in the accuracy of the approximations of DF as we approach x*. Conditions (2.13) ensure a tight approximation of DF and therefore lead to the quadratic convergence of the method. In practice, however, none of the conditions (2.13) can be tested, as neither x* nor c_1 and c_2 are known. They are nevertheless important from a theoretical point of view since they show that for good enough approximations of the Jacobian matrix, the finite difference Newton method will behave as well as the classical Newton method.

2.8 Simplified Newton Method

To avoid the repeated evaluation of the Jacobian matrix DF(x^(k)) at each step k, one may reuse the first evaluation DF(x^(0)) for all subsequent steps k = 1, 2, . . . .

This method is called the simplified Newton method, and it is attractive when the level of nonlinearity of F is not too high, since then the Jacobian matrix does not vary too much.

Another advantage of this simplification is that the linear system to be solved at each step has the same coefficient matrix and only the right-hand side changes, leading to significant savings in the computational work.

As discussed before, the computationally expensive steps in the Newton method are the evaluation of the Jacobian matrix and the solution of the corresponding linear system. If a direct method is applied in the simplified method, these two steps are carried out only once and, for subsequent iterations, only the forward and back substitution phases are needed.
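The saving can be made explicit by factorizing DF(x^(0)) once and reusing the factors at every step; the following is a minimal sketch assuming dense matrices and SciPy's lu_factor/lu_solve routines:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def simplified_newton(F, DF, x0, tol=1e-10, maxiter=100):
    """Simplified Newton: DF(x^(0)) is factorized once; every subsequent
    step only requires a forward and a back substitution."""
    x = x0.astype(float)
    lu_piv = lu_factor(DF(x))           # O(n^3) work, done only once
    for _ in range(maxiter):
        Fx = F(x)
        if np.max(np.abs(Fx)) < tol:
            break
        x = x + lu_solve(lu_piv, -Fx)   # O(n^2) work per iteration
    return x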

In the one-dimensional case, this technique corresponds to a parallel-chord method. The first chord is taken to be the tangent at the point with coordinates (x^(0), F(x^(0))), and for the next iterations this chord is simply shifted in a parallel way.

To improve the convergence of this method, DF may occasionally be reevaluated by choosing an increasing integer function p(k) with values in the interval [0, k] and solving the linear system DF(x^(p(k))) s^(k) = −F(x^(k)).

In the extreme case where p(k) = 0 ∀k, we have the simplified Newton method and, at the other end, when p(k) = k ∀k, we have the classical Newton


method. The choice of the function p(k), i.e. the reevaluation scheme, has to be determined experimentally.

Algorithm 16 Simplified Newton Method

Given F : R^n → R^n continuously differentiable and a starting point x^(0) ∈ R^n
for k = 0, 1, 2, . . . until convergence
    Compute DF(x^(p(k))) if needed
    Solve for s^(k):  DF(x^(p(k))) s^(k) = −F(x^(k))
    x^(k+1) = x^(k) + s^(k)
end

2.8.1 Convergence of the Simplified Newton Method

The kind of simplification presented leads to a degradation of the speed of convergence, as the Jacobian matrix is not updated at each step.

However, one may note that for some macroeconometric models the nonlinearities are often such that this type of technique may prove advantageous compared to the classical Newton iterations because of the computational savings that can be made.

In the classical Newton method, the direction s^(k) = −(DF(x^(k)))^{−1} F(x^(k)) is a guaranteed descent direction for the function f(x) = ½ F(x)′F(x) = ½ ‖F(x)‖²_2 since

Df(x) = DF(x)′F(x)

and

Df(x^(k))′s^(k) = −F(x^(k))′DF(x^(k))(DF(x^(k)))^{−1}F(x^(k))    (2.14)
                = −F(x^(k))′F(x^(k)) < 0   for all F(x^(k)) ≠ 0 .    (2.15)

In the simplified Newton method the direction of update is

s^(k) = −(DF(x^(0)))^{−1} F(x^(k)) ,

which is a descent direction for the function f(x) as long as the matrix

(DF (x(0)))−1DF (x(k)) ,

is positive definite. If s^(k) is not a descent direction, then the Jacobian matrix has to be reevaluated at x^(k) and the method restarted from this point.

2.9 Quasi-Newton Methods

The methods discussed previously did not use the exact evaluation of the Jacobian matrix but resorted to approximations. We will limit our presentation in this section to Broyden's method, which belongs to the class of so-called Quasi-Newton methods.

Quasi-Newton methods start either with an analytical or with a finite difference evaluation of the Jacobian matrix at the starting point x^(0), and therefore


compute x^(1) like the classical Newton method does. For the successive steps, DF(x^(0)), or an approximation J^(0) to it, is updated using (x^(0), F(x^(0))) and (x^(1), F(x^(1))). The matrix DF(x^(1)) can then be approximated at little additional cost by a secant method.

The secant approximation A(1) satisfies the equation

A(1)(x(1) − x(0)) = F (x(1))− F (x(0)) . (2.16)

Matrix A(1) is obviously not uniquely defined by relation (2.16).

Broyden [20] introduced a criterion which leads to choosing, at the generic step k, a matrix A^(k+1) defined as

A^(k+1) = A^(k) + ((y^(k) − A^(k) s^(k)) s^(k)′) / (s^(k)′ s^(k))    (2.17)

where y^(k) = F(x^(k+1)) − F(x^(k)) and s^(k) = x^(k+1) − x^(k).

Broyden's method updates matrix A^(k) by a rank-one matrix computed only from the information of the current step and the preceding step.

Algorithm 17 Quasi-Newton Method using Broyden's Update

Given F : R^n → R^n continuously differentiable and a starting point x^(0) ∈ R^n
Evaluate A^(0) by DF(x^(0)) or J^(0)
for k = 0, 1, 2, . . . until convergence
    Solve for s^(k):  A^(k) s^(k) = −F(x^(k))
    x^(k+1) = x^(k) + s^(k)
    y^(k) = F(x^(k+1)) − F(x^(k))
    A^(k+1) = A^(k) + ((y^(k) − A^(k) s^(k)) s^(k)′) / (s^(k)′ s^(k))
end
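A direct transcription of Algorithm 17 might read as follows (a minimal sketch that, for simplicity, solves each linear system from scratch instead of updating the initial factorization of A^(0)):

import numpy as np

def broyden(F, x0, A0, tol=1e-10, maxiter=100):
    """Quasi-Newton iteration with Broyden's rank-one update (Algorithm 17).
    A0 is an initial Jacobian approximation, e.g. DF(x0) or a finite
    difference estimate J^(0)."""
    x, A = x0.astype(float), A0.astype(float)
    Fx = F(x)
    for _ in range(maxiter):
        if np.max(np.abs(Fx)) < tol:
            break
        s = np.linalg.solve(A, -Fx)
        x_new = x + s
        Fx_new = F(x_new)
        y = Fx_new - Fx
        A = A + np.outer(y - A @ s, s) / (s @ s)   # rank-one Broyden update
        x, Fx = x_new, Fx_new
    return x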

Broyden's method may generate sequences of matrices {A^(k)}_{k=0,1,...} which do not converge to the Jacobian matrix DF(x*), even though the method produces a sequence {x^(k)}_{k=0,1,...} converging to x*.

Dennis and Moré [25] have shown that the convergence behavior of the method is superlinear under the same conditions as for Newton-type techniques. The underlying reason that enables this favorable behavior is that ‖A^(k) − DF(x^(k))‖ stays sufficiently small.

From a computational standpoint, Broyden's method is particularly attractive since the solution of the successive linear systems A^(k) s^(k) = −F(x^(k)) can be determined by updating the initial factorization of A^(0) for k = 1, 2, . . . . Such an update necessitates O(n²) operations, therefore reducing the original O(n³) cost of a complete refactorization.

Practically, the QR factorization update is easier to implement than the LU update, see Gill et al. [47, pp. 125–150]. For sparse systems, however, the advantage of the updating process vanishes.

A software reference for Broyden's method is MINPACK by Moré, Garbow and Hillstrom, available on NETLIB.


2.10 Nonlinear First-order Methods

The iterative techniques for the solution of linear systems described in Sections 2.4.1 to 2.4.4 can be extended to nonlinear equations.

If we interpret the stationary iterations in Algorithms 1 to 4 in terms of obtaining x_i^(k+1) as the solution of the i-th equation with the other (n − 1) variables held fixed, we may immediately apply the same idea to the nonlinear case.

The first issue is then the existence of a one-to-one mapping between the set of equations {f_i , i = 1, . . . , n} and the set of variables {x_i , i = 1, . . . , n}. This mapping is also called a matching, and it can be shown that its existence is a necessary condition for the solution to exist, see Gilli [50] or Gilli and Garbely [51].

A matching m must be provided in order to define the variable m(i) that has to be solved from equation i. For the method to make sense, the solution of the i-th equation with respect to x_{m(i)} must exist and be unique. This solution can then be computed using a one-dimensional solution algorithm, e.g. a one-dimensional Newton method.

We can then formulate the nonlinear Jacobi algorithm.

Algorithm 18 Nonlinear Jacobi Method

Given a matching m and a starting point x^(0) ∈ R^n
Set up the equations so that m(i) = i for i = 1, . . . , n
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , n
        Solve for x_i:  f_i(x_1^(k), . . . , x_i, . . . , x_n^(k)) = 0
        and set x_i^(k+1) = x_i
    end
end

The nonlinear Gauss-Seidel is obtained by modifying the “solve” statement.

Algorithm 19 Nonlinear Gauss-Seidel Method

Given a matching m and a starting point x^(0) ∈ R^n
Set up the equations so that m(i) = i for i = 1, . . . , n
for k = 0, 1, 2, . . . until convergence
    for i = 1, . . . , n
        Solve for x_i:  f_i(x_1^(k+1), . . . , x_{i−1}^(k+1), x_i, x_{i+1}^(k), . . . , x_n^(k)) = 0
        and set x_i^(k+1) = x_i
    end
end

In order to keep notation simple, we will assume from now on that the equations and variables have been set up so that we have m(i) = i for i = 1, . . . , n.

The nonlinear SOR and FGS6 algorithms are obtained by a straightforward


modification of the corresponding linear versions.

If it is possible to isolate x_i from f_i(x_1, . . . , x_n) for all i, then we have a normalized system of equations. This is often the case in systems of equations arising in macroeconometric modeling. In such a situation, each variable is isolated as follows,

xi = gi(x1, . . . , xi−1, xi+1, . . . , xn), i = 1, 2, . . . , n . (2.18)

The “solve” statement in Algorithm 18 and Algorithm 19 is now dropped since the solution is given in an explicit form.
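For a normalized system such as (2.18), the nonlinear Gauss-Seidel sweep reduces to evaluating each g_i in turn with the latest available values; a minimal sketch with an arbitrary two-equation contraction as the example:

import numpy as np

def gauss_seidel_normalized(g_list, x0, tol=1e-10, maxiter=500):
    """Nonlinear Gauss-Seidel for a normalized system x_i = g_i(x):
    each component is updated in place using the latest available values."""
    x = x0.astype(float)
    for _ in range(maxiter):
        x_old = x.copy()
        for i, g in enumerate(g_list):
            x[i] = g(x)
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

# Example: x0 = 0.5*cos(x1), x1 = 0.5*sin(x0); this mapping is a contraction,
# so the Gauss-Seidel iterations converge from the zero starting point.
g = [lambda x: 0.5 * np.cos(x[1]),
     lambda x: 0.5 * np.sin(x[0])]
print(gauss_seidel_normalized(g, np.zeros(2)))

The nonlinear Jacobi variant would evaluate all g_i at x_old instead of at the partially updated x.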

2.10.1 Convergence

The matrix form of these nonlinear iterations can be found by linearizing the equations around x^(k), which yields

A(k)x(k) = b(k) ,

where A^(k) = DF(x^(k)) and b^(k) denotes the constant part of the linearization of F. As the path of the iterates {x^(k)}_{k=0,1,2,...} leads to different matrices A^(k) and vectors b^(k), the nonlinear versions of the iterative methods can no longer be considered stationary methods. Each system A^(k)x^(k) = b^(k) will have a different convergence behavior, not only according to the splitting of A^(k) and the updating technique chosen, but also because the values of the elements in matrix A^(k) and vector b^(k) change from one iteration to another.

It follows that convergence criteria can only be stated for starting points x^(0) within a neighborhood of the solution x*. Similarly to what has been presented in Section 2.4.6, we can evaluate the matrix B that governs the convergence at x* and state that, if ρ(B) < 1, then the method is likely to converge. The difficulty is that now the eigenvalues, and hence the spectral radius, vary with the solution path. The same is also true for the optimal values of the parameters ω and γ of the SOR and FGS methods.

In such a framework, Hughes Hallett [69] suggests several ways of computing approximate optimal values for γ during the iterations. The simplest form is to take

γ_k = (1 ± |x_i^(k) / x_i^(k−2)|)^{−1} ,

the sign being positive if the iterations are cycling and negative if the iterations are monotonic; x_i is the element which most violates the convergence criterion.

One should also constrain γ_k to lie in the interval [0, 2], which is a necessary condition for the FGS to converge. To avoid large fluctuations of γ_k, one may smooth the sequence by the formula

γ̄_k = α_k γ_k + (1 − α_k) γ̄_{k−1} ,

where α_k is chosen in the interval [0, 1]. We may note that such strategies can also be applied in the linear case to automatically set the value for γ.

6As already mentioned in the linear case, the FGS method should be considered as a second-order method.


2.11 Solution by Minimization

In the preceding sections, methods for the solution of nonlinear systems of equations have been considered. An alternative way to compute a solution of F(x) = 0 is to minimize the following objective function

f(x) = ‖F (x)‖a , (2.19)

where ‖ · ‖a denotes a norm in Rn.

A reason that motivates such an alternative is that it introduces a criterion to decide whether x^(k+1) is a better approximation to x* than x^(k). As at the solution F(x*) = 0, we would like to compare the vectors F(x^(k+1)) and F(x^(k)), and to do so we compare their respective norms. What is required7 is that

‖F (x(k+1))‖a < ‖F (x(k))‖a ,

which then leads us to the minimization of the objective function (2.19).

A convenient choice is the standard Euclidean norm, since it permits an analytical development of the problem. The minimization problem then reads

min_x f(x) = ½ F(x)′F(x) ,    (2.20)

where the factor 1/2 is added for algebraic convenience.

Thus, methods for nonlinear least-squares problems, such as Gauss-Newton or Levenberg-Marquardt, can immediately be applied in this framework. Since the system is square and has a solution, we expect to have a zero residual function f at x*.

In general, it is advisable to take advantage of the structure of F to directly approach the solution of F(x) = 0. However, in some circumstances, resorting to the minimization of f(x) constitutes an interesting alternative. This is the case when the nonlinear equations contain numerical inaccuracies preventing F(x) = 0 from having a solution. If the residual f(x) is small, then the minimization approach is certainly preferable.

To devise a minimization algorithm for f(x), we need the gradient Df(x) and the Hessian matrix D²f(x), that is

Df(x) = DF(x)′F(x)                         (2.21)
D²f(x) = DF(x)′DF(x) + Q(x)                (2.22)

with Q(x) = Σ_{i=1}^{n} f_i(x) D²f_i(x) .

We recall that F(x) = [f_1(x) . . . f_n(x)]′, that each f_i(x) is a function from R^n into R, and that each D²f_i(x) is therefore the n × n Hessian matrix of f_i(x).

The Gauss-Newton method approaches the solution by computing a Newton step for the first-order conditions of (2.20), Df(x) = 0. At step k, the Newton

7The iterate x^(k+1) computed for instance by a classical Newton step does not necessarily satisfy this requirement.


direction s(k) is determined by

D2f(x(k)) s(k) = −Df(x(k)) .

Replacing (2.21) and (2.22) in the former expression we get

(DF (x(k))′DF (x(k)) + Q(x(k)))s(k) = −DF (x(k))′F (x(k)) . (2.23)

For x^(k) sufficiently close to the solution x*, the term Q(x^(k)) in the Hessian matrix is negligible and we may obtain the approximation s^(k)_GN, called the Gauss-Newton step, from

DF(x^(k))′DF(x^(k)) s^(k)_GN = −DF(x^(k))′F(x^(k)) .    (2.24)

Computing s^(k)_GN using (2.24) explicitly would require calculating the solution of a symmetric positive definite linear system. With such an approach, the condition of the linear system involving DF(x^(k))′DF(x^(k)) is squared compared to the following alternative.

The system (2.24) constitutes the set of normal equations, and its solution can be obtained by solving

DF(x^(k)) s^(k)_GN = −F(x^(k))

via a QR factorization.

We notice that this development leads to the same step as in the classical Newton method, see Algorithm 14. It is worth mentioning that the Gauss-Newton method does not yield the same iterates as the Newton method for a direct minimization of f(x).
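A sketch of this computation follows; the use of NumPy's least-squares solver is an implementation choice made for the example, standing in for an explicit QR factorization:

import numpy as np

def gauss_newton_step(F, DF, x):
    """Gauss-Newton step: solve DF(x) s = -F(x) in the least-squares sense,
    which avoids forming DF(x)'DF(x) and squaring its condition number."""
    s, *_ = np.linalg.lstsq(DF(x), -F(x), rcond=None)
    return s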

The Levenberg-Marquardt method is closely related to Gauss-Newton and to a modification of Newton's method for nonlinear equations that is globally convergent, see Section 2.12. The Levenberg-Marquardt step s^(k)_LM is computed as the solution of

(DF(x^(k))′DF(x^(k)) + λ_k I) s^(k)_LM = −DF(x^(k))′F(x^(k)) .

It can be shown that solving this equation for s^(k)_LM is equivalent to computing

s^(k)_LM = argmin_s ‖F(x^(k)) + DF(x^(k)) s‖_2    subject to ‖s‖_2 ≤ δ .

This method is therefore a trust-region technique, which is presented in Section 2.12.

We immediately see that if λ_k is zero, then s^(k)_LM = s^(k)_GN; whereas when λ_k becomes very large, s^(k)_LM tends toward the steepest descent update for minimizing f at x^(k), with direction −Df(x^(k)) = −DF(x^(k))′F(x^(k)).

We may also note that every solution of F(x) = 0 is a solution to problem (2.20). However, the converse is not true, since there may be local minimizers of f(x). Such a situation is illustrated in Figure 2.1 and can be explained by recalling that the gradient of f(x), given by Equation (2.21), may vanish either when F(x) = 0 or when DF(x) is singular.


[Figure 2.1 about here: left panel F(x), right panel f(x) = F(x)′F(x)/2.]

Figure 2.1: A one-dimensional function F(x) with a unique zero and its corresponding function f(x) with multiple local minima.

2.12 Globally Convergent Methods

We saw in Section 2.6 that the Newton method is quadratically convergent when the starting guess x^(0) is close enough to x*. An issue with the Newton method is that when x^(0) is not in a convergent neighborhood of x*, the method may not converge at all.

There are different strategies to overcome this difficulty. All of them first consider the Newton step and modify it only if it proves unsatisfactory. This ensures that the quadratic behavior of the Newton method is maintained near the solution.

Some of the methods proposed resort to a hybrid strategy, i.e. they switch to a more robust search technique when the iterate is not close enough to the solution.

It is possible to devise a hybrid method by combining a Newton-like method and a quasirandom search (see e.g. Hickernell and Fang [65]), whereas an alternative is to expand the radius of convergence and try to diminish the computational cost by switching between a Gauss-Seidel and a Newton-like method, as in Hughes Hallett et al. [71]. Other such hybrid methods can be imagined by turning to a more robust, though less rapid, technique when the Newton method does not provide a satisfactory step.

The second modification amounts to building a model-trust region, in which we take some combination of the Newton direction for F(x) = 0 and the steepest descent direction for minimizing f(x).

2.12.1 Line-search

As already mentioned, a criterion to decide whether a Newton step is acceptable is to impose a decrease in f(x). We also know that

s^(k) = −(DF(x^(k)))^{−1} F(x^(k))


is a descent direction for f(x) since Df(x^(k))′s^(k) < 0, see Equation (2.14).

The idea is now to adjust the length of s^(k) by a factor ω_k to provide a step ω_k s^(k) that leads to a decrease in f. The simple condition f(x^(k+1)) < f(x^(k)) is not sufficient to ensure that the sequence of iterates {x^(k)}_{k=0,1,2,...} will lead to x*. The issue is that either the decreases of f could be too small compared to the length of the steps, or the steps could be too small compared to the decrease of f.

The problem can be fixed by imposing bounds on the decrease of ω. We recall that ω = 1 first has to be tried to retain the quadratic convergence of the method. The value of ω is chosen in such a way that it minimizes a model built on the information available. Let us define

g(ω) = f(x(k) + ωs(k)) ,

where s^(k) is the usual Newton step for solving F(x) = 0. We know the values of

g(0) = f(x^(k)) = (1/2) F(x^(k))′F(x^(k))

and

Dg(0) = Df(x^(k))′s^(k) = F(x^(k))′DF(x^(k)) s^(k) .

Since a Newton step is tried first we also have the value of g(1) = f(x(k) +s(k)).

If the inequality

g(1) > g(0) + αDg(0) , α ∈ (0, 0.5) (2.25)

is satisfied, then the decrease in f is too small and a backtrack along s^(k) is introduced by diminishing ω. A quadratic model of g is built, using the information g(0), g(1) and Dg(0), to find the best approximation ω̂. The parabola is defined by

ĝ(ω) = (g(1) − g(0) − Dg(0)) ω² + Dg(0) ω + g(0) ,

and is illustrated in Figure 2.2.

The minimizer of ĝ(ω), denoted ω̂, is determined by

Dĝ(ω̂) = 0  ⟹  ω̂ = −Dg(0) / (2(g(1) − g(0) − Dg(0))) .

Lower and upper bounds for ω are usually set to constrain ω ∈ [0.1, 0.5], so that very small or too large step values are avoided.

If the new trial point x^(k) + ω̂ s^(k) still does not satisfy (2.25), a further backtrack is performed. As we now have a new item of information about g, namely g(ω̂), a cubic fit is carried out.

The line-search can therefore be formalized as follows.


[Figure 2.2 about here: plot of the quadratic model against ω.]

Figure 2.2: The quadratic model ĝ(ω) built to determine the minimizer ω̂.

Algorithm 20 Line-search with backtrack

Choose 0 < α < 0.5 (e.g. α = 10^{−4}) and 0 < l < u < 1 (e.g. l = 0.1, u = 0.5)
ω_k = 1
while f(x^(k) + ω_k s^(k)) > f(x^(k)) + α ω_k Df(x^(k))′s^(k)
    Compute ω_k by cubic interpolation (or quadratic interpolation the first time)
end
x^(k+1) = x^(k) + ω_k s^(k)

A detailed version of this algorithm is given in Dennis and Schnabel [26, Algorithm A6.3.1].
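A compact version of Algorithm 20, restricted for brevity to the quadratic backtrack at every trial (Algorithm 20 switches to a cubic fit after the first backtrack), might be sketched as:

import numpy as np

def line_search_backtrack(f, gradf, x, s, alpha=1e-4, l=0.1, u=0.5, max_backtracks=20):
    """Backtracking line search on f along the Newton step s.
    Each backtrack minimizes the quadratic model of g(w) = f(x + w*s)."""
    g0 = f(x)
    dg0 = gradf(x) @ s                # must be negative: s is a descent direction
    w = 1.0
    for _ in range(max_backtracks):
        if f(x + w * s) <= g0 + alpha * w * dg0:
            break
        # Minimizer of the quadratic model built from g(0), Dg(0) and g(w).
        w_hat = -dg0 * w**2 / (2.0 * (f(x + w * s) - g0 - dg0 * w))
        w = min(max(w_hat, l * w), u * w)   # keep the new w within [l, u] of the old one
    return x + w * s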

2.12.2 Model-trust Region

The second alternative to modify the Newton step is to change not only its length but also its direction.

The Newton step comes from a local model of the nonlinear function f around x^(k). The model-trust region explicitly limits the step length to a region where this local model is reliable. We therefore require s^(k) to lie in such a region by solving

s^(k) = argmin_s ½ ‖DF(x^(k)) s + F(x^(k))‖²_2    subject to ‖s‖_2 ≤ δ_k    (2.26)

for some δk > 0. The objective function (2.26) may also be written

½ s′DF(x^(k))′DF(x^(k)) s + F(x^(k))′DF(x^(k)) s + ½ F(x^(k))′F(x^(k))    (2.27)

and problems arise when the matrix DF(x^(k))′DF(x^(k)) is not safely positive definite, since the Newton step that minimizes (2.27) is

s(k) = −(DF (x(k))′DF (x(k)))−1DF (x(k))′F (x(k)) .


We can detect that this matrix becomes close to singularity, for instance by checking whether κ_2 ≥ u^{−1/2}, where u is defined as the unit roundoff of the computer, see Section A.1.

In such a circumstance, we decide to perturb the matrix by adding a diagonal matrix to it and get

DF(x^(k))′DF(x^(k)) + λ_k I    where    λ_k = √(n u) ‖DF(x^(k))′DF(x^(k))‖_1 .    (2.28)

This choice of λ_k can be shown to satisfy

1/(n√u) ≤ κ_2(DF(x^(k))′DF(x^(k)) + λ_k I) − 1 ≤ u^{−1/2} ,

when κ_2(DF(x^(k))′DF(x^(k))) ≥ u^{−1/2}.

Another theoretical motivation for a perturbation such as (2.28) is the following result:

lim_{λ→0+} (J′J + λI)^{−1} J′ = J⁺ ,

where J⁺ denotes the Moore-Penrose pseudoinverse. This can be shown using the SVD decomposition.

2.13 Stopping Criteria and Scaling

In all the algorithms presented in the preceding sections, we did not specify a precise termination criterion. Almost all the methods would theoretically require an infinite number of iterations to reach the limit of the sequence {x^(k)}_{k=0,1,2,...}. Moreover, even the techniques that should converge in a finite number of steps in exact arithmetic may need a stopping criterion in a finite precision environment.

The decision to terminate the algorithm is of crucial importance, since it determines which approximation to x* the chosen method will ultimately produce.

To devise a first stopping criterion, we recall that the solution x* of our problem must satisfy F(x*) = 0. As an algorithm produces approximations to x* and as we use a finite precision representation of numbers, we should test whether F(x^(k)) is sufficiently close to the zero vector. A second way of deciding to stop is to test that two consecutive approximations, for example x^(k) and x^(k−1), are close enough. This leads to considering two kinds of possible stopping criteria.

The first idea is to test ‖F(x^(k))‖ < ε_F for a given tolerance ε_F > 0, but this test will prove inappropriate. The differences in the scale of both F and x largely influence such a test. If ε_F = 10^{−5} and if any x yields an evaluation of F in the range [10^{−8}, 10^{−6}], then the method may stop at an arbitrary point. However, if F yields values in the interval [10^{−1}, 10²], the algorithm will never satisfy the convergence criterion. The test ‖F(x^(k))‖ ≤ ε_F will also very probably weigh the components of x differently when x is badly scaled, i.e. when the elements in x vary widely.

The remedy to some of these problems is then to scale F and to use the infinity norm. The scaling is done by dividing each component of F by a value d_i


selected so that f_i(x)/d_i is of magnitude 1 for values of x not too close to x*. Hence, the test ‖S_F F‖_∞ ≤ ε_F, where S_F = diag(1/d_1, . . . , 1/d_n), should be safe.

The second criterion tests whether the sequence {x^(k)}_{k=0,1,2,...} stabilizes itself sufficiently to stop the algorithm. We might want to perform a test on the relative change between two consecutive iterations, such as

‖x^(k) − x^(k−1)‖ / ‖x^(k−1)‖ ≤ ε_x .

To avoid problems when the iterates converge to zero, it is recommended to use the criterion

max_i r_i ≤ ε_x    with    r_i = |x_i^(k) − x_i^(k−1)| / max{|x_i^(k)|, x̄_i} ,

where x̄_i > 0 is an estimate of the typical magnitude of x_i. The number of expected correct digits in the final value of x^(k) is approximately −log_10(ε_x).
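The relative-change test can be coded in a few lines; in the sketch below the typical magnitudes x̄_i are supplied by the caller:

import numpy as np

def has_converged(x_new, x_old, typical_x, eps_x=1e-6):
    """Relative change test: max_i |x_i^(k) - x_i^(k-1)| / max(|x_i^(k)|, xbar_i) <= eps_x.
    Roughly -log10(eps_x) correct digits are expected in x_new when it fires."""
    denom = np.maximum(np.abs(x_new), typical_x)
    return np.max(np.abs(x_new - x_old) / denom) <= eps_x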

In the case where a minimization approach is used, one can also devise a test on the gradient of f, see e.g. Dennis and Schnabel [26, p. 160] and Gill et al. [46, p. 306].

The problem of scaling in macroeconometric models is certainly an important issue, and it may be necessary to rescale the model to prevent computational problems. The scale is usually chosen so that all the scaled variables have magnitude 1. We may therefore replace x by x̃ = S_x x, where S_x = diag(1/x̄_1, . . . , 1/x̄_n) is a positive diagonal matrix.

To analyze the impact of this change of variables, let us define F̃(x̃) = F(S_x^{−1} x̃) so that we get

DF̃(x̃) = (S_x^{−1})′ DF(S_x^{−1} x̃) = S_x^{−1} DF(x) ,

and the classical Newton step becomes

s̃ = −(DF̃(x̃))^{−1} F̃(x̃) = −(S_x^{−1} DF(x))^{−1} F(x) .

This therefore results in scaling the rows of the Jacobian matrix. We typically expect that such a modification will allow a better numerical behavior by avoiding some of the issues due to large differences of magnitude in the numbers we manipulate.

We also know that an appropriate row scaling will improve the condition number of the Jacobian matrix. This is linked to the problem of finding a preconditioner, since we could replace matrix S_x in the previous development by a general nonsingular matrix approximating the inverse of (DF(x))′.


Chapter 3

Solution of Large Macroeconometric Models

As already introduced in Chapter 1, we are interested in the solution of a nonlinear macroeconometric model represented by a system of equations of the form

F (y, z) = 0 ,

where y represents the endogenous variables of the model at hand.

The macroeconometric models we study are essentially large and sparse. This allows us to investigate interesting properties that follow from sparse structures.

First, we can take advantage of the information given in the structure to solve the model efficiently. An obvious task is to seek the blocktriangular decomposition of a model and to take this information into account in the solution process. The result is both a more efficient solution and a significant contribution to a better understanding of the model's functioning. We essentially derive orderings of the equations for first-order iterative solution methods.

The chapter is organized in three sections. In the first section, the model's structure will be analyzed using a graph-theoretic approach, which has the advantage of allowing an efficient algorithmic implementation.

The second section presents an original algorithm for computing minimal essential sets, which are used for the decomposition of the interdependent blocks of the model.

The results of the two previous sections provide the basis for the analysis of a popular technique used to solve large macroeconometric models.


3.1 Blocktriangular Decomposition of the Jacobian Matrix

The logical structure of a system of equations is already defined if we know which variable appears in which equation. Hence, the logical structure is not connected to a particular quantification of the model. Its analysis will provide important insights into the functioning of a model, revealing robust properties which are invariant with respect to different quantifications.

The first task consists in seeking whether the system of equations can be solved by decomposing the original system of equations into a sequence of interdependent subsystems. In other words, we are looking for a permutation of the model's Jacobian matrix in order to get a blocktriangular form. This step is particularly important as macroeconometric models almost always allow for such a decomposition.

Many authors have analyzed such sparse structures: some of them, e.g. Duff et al. [30], approach the problem using incidence matrices and permutations, while others, for instance Gilbert et al. [44], Pothen and Fahn [88] and Gilli [50], use graph theory. This presentation follows the lines of the latter papers, since we believe that the structural properties often rely on concepts better handled and analyzed with graph theory. A clear discussion about the uses of graph theory in macromodeling can be found in Gilli [49] and Gilli [50].

We first formalize the logical structure as a graph and then use a methodology based on graphs to analyse its properties. Graphs are used to formalize relations existing between elements of a set. The standard notation for a graph G is

G = (X, A) ,

where X denotes the set of vertices and A is the set of arcs of the graph. An arc is a couple of vertices (x_i, x_j) defining an existing relation between the vertex x_i and the vertex x_j.

What is needed to define the logical structure of a model is its deterministic part, formally represented by the set of n equations

F (y, z) = 0 .

In order to keep the presentation simpler, we will assume that the model has been normalized, i.e. a different left-hand side variable has been assigned to each equation, so that we can write

yi = gi(y1, . . . , yi−1, yi+1, . . . , yn, z), i = 1, . . . , n .

We then see that there is a link going from the variables in the right-handside to the variable on the left-hand side. Such a link can be very naturallyrepresented by a graph G = (Y, U) where the vertices represent the variables yi,i = 1, . . . , n and an arc (yj, yi) represents the link. The graph of the completemodel is obtained by putting together the partial graphs corresponding to allsingle equations.

We now need to define the adjacency matrix AG = (aij) of our graph G = (Y, U).We have

aij = 1 if and only if the arc (yj, yi) exists, and aij = 0 otherwise.


We may easily verify that the adjacency matrix AG has the same nonzero pattern as the Jacobian matrix of our model. Thus, the adjacency matrix contains all the information about the existing links between the variables in all equations defining the logical structure of the model.
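To make this construction concrete, the following small sketch (Python; the three-equation model is invented for illustration) builds the adjacency matrix directly from the lists of right-hand side variables of the normalized equations:

# Build the adjacency matrix A_G of the logical structure from the
# right-hand side variables of each normalized equation (illustrative model).
n = 3
rhs = {
    1: [2, 3],   # y1 = g1(y2, y3, z)
    2: [1],      # y2 = g2(y1, z)
    3: [1, 2],   # y3 = g3(y1, y2, z)
}
# a[i][j] = 1 iff the arc (y_j, y_i) exists, i.e. y_j appears in equation i
a = [[0] * (n + 1) for _ in range(n + 1)]
for i, variables in rhs.items():
    for j in variables:
        a[i][j] = 1
for i in range(1, n + 1):
    print(a[i][1:])   # each row has the same nonzero pattern as the Jacobian row

The printed nonzero pattern is exactly the information encoded by the graph G = (Y, U).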

We already noticed that an arc in the graph corresponds to a link in the model. A sequence of arcs from a vertex yi to a vertex yj is a path, which corresponds to an indirect link in the model, i.e. there is a variable yi which has an effect on a variable yj through the interaction of a set of equations.

A first task now consists in finding the sets of simultaneous equations of the model, which correspond to the irreducible diagonal blocks in the blockrecursive decomposition; in the graph, this corresponds to the strong components. A strong component is the largest possible set of interdependent vertices, i.e. the set where any ordered pair of vertices (yi, yj) is connected by a path from yi to yj and a path from yj to yi. The strong components define a unique partition of the equations of the model into sets of simultaneous equations. The algorithms that find the strong components of a graph are standard and can be found in textbooks about algorithmic graph theory or computer algorithms, see e.g. Sedgewick [93, p. 482] or Aho et al. [2, p. 193].

Once the strong components are identified, we need to know in what order they appear in the blocktriangular Jacobian matrix. A technique consists in resorting to the reduced graph, the vertices of which are the strong components of the original graph, and where there is an arc between two new vertices if there is at least one arc between the vertices of the corresponding strong components. The reduced graph is without circuits, i.e. there are no interdependencies, and therefore it is possible to number its vertices in such a way that no arc goes from a higher numbered vertex to a lower numbered vertex. This ordering of the corresponding strong components in the Jacobian matrix then exhibits a blocktriangular pattern. We may mention that Tarjan's algorithm already identifies the strong components in the order described above.
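A minimal sketch of this step, assuming the networkx library is available (the small 6-variable graph is invented), computes the strong components and a compatible blocktriangular order through the reduced graph:

# Strong components and blocktriangular ordering via the reduced graph.
import networkx as nx

G = nx.DiGraph()
# an arc (j, i) means: variable y_j appears in equation i
G.add_edges_from([(1, 2), (2, 3), (3, 2), (3, 4), (4, 5), (5, 4), (5, 6)])

reduced = nx.condensation(G)                 # one vertex per strong component
order = list(nx.topological_sort(reduced))   # the reduced graph has no circuits

blocks = [sorted(reduced.nodes[c]["members"]) for c in order]
print(blocks)   # e.g. [[1], [2, 3], [4, 5], [6]]

Solving the blocks in this order, each block depends only on variables determined in the preceding blocks, which is precisely the blocktriangular pattern sought.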

The solution of the complete model can now be performed by considering the sequence of submodels corresponding to the strong components. The effort put into finding the blocktriangular form is negligible compared to the complexity of solving the model. Moreover, the increased knowledge of which parts of the model are dependent on or independent of others is very helpful in simulation exercises.

The usual pattern of the blocktriangular Jacobian matrix corresponding to macroeconometric models exhibits a single large interdependent block, which is both preceded and followed by recursive equations. This pattern is illustrated in Figure 3.1.

3.2 Orderings of the Jacobian Matrix

Having found the blocktriangular decomposition, we now switch our attention to the analysis of the structure of an indecomposable submodel. Therefore, the Jacobian matrices considered in the following are indecomposable and the corresponding graph is a strong component.

[Figure 3.1: Blockrecursive pattern of a Jacobian matrix.]

Let us consider a directed graph G = (V, A) with n vertices and the corresponding set C = {c1, . . . , cp} of all elementary circuits of G. An essential set S of vertices is a subset of V which covers the set C, where each circuit is considered as the union set of vertices it contains. According to Guardabassi [57], a minimal essential set is an essential set of minimum cardinality, and it can be seen as a problem of minimum cover, or as a problem of minimum transversal of a hypergraph with edges defined by the set C.

For our Jacobian matrix, the essential set will enable us to find orderings such that the sets S and F in the matrices shown in Figure 3.2 are minimal. The set S corresponds to a subset of variables and the set F to a subset of feedbacks (entries above the diagonal of the Jacobian matrix). The set S is also called the essential feedback vertex set and the set F is called the essential feedback arc set.

[Figure 3.2: Sparsity pattern of the reordered Jacobian matrix. Panel (a): ordering isolating the feedback vertex set S; panel (b): ordering with the feedback arc set F (entries above the diagonal).]

Such minimum covers are an important tool in the study of large scale interdependent systems. A technique often used to understand and to solve such complex systems consists in representing them as a directed graph which can be made feedback-free by removing the vertices belonging to an essential set.1

1 See, for instance, Garbely and Gilli [42] and Gilli and Rossier [54], where some aspects of the algorithm presented here have been discussed.
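As a toy illustration of the notion, a minimal essential set of a small graph can be found by brute force, checking vertex subsets of increasing cardinality (sketch in Python, networkx assumed available, graph invented):

# Brute-force minimal essential set (minimum feedback vertex set).
from itertools import combinations
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 4), (4, 2), (4, 1)])

def is_essential(graph, S):
    # True if removing the vertices of S leaves a circuit-free graph
    H = graph.copy()
    H.remove_nodes_from(S)
    return nx.is_directed_acyclic_graph(H)

for k in range(G.number_of_nodes() + 1):
    found = [set(S) for S in combinations(G.nodes, k) if is_essential(G, S)]
    if found:
        print("minimal essential sets of cardinality", k, ":", found)
        break

This enumeration is exponential in the number of vertices, which is exactly why a dedicated algorithm is needed for realistic model sizes.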


However, in the theory of complexity, this problem of finding minimal essential sets (also referred to as the feedback-vertex-set problem) is known to be NP-complete2 and we cannot hope to obtain a solution for all graphs.

Hence, the problem is not a new one in graph theory and system analysis, where several heuristic and non-heuristic methods have been suggested in the literature. See, for instance, Steward [94], Van der Giessen [98], Reid [89], Bodin [16], Nepomiastchy et al. [83], Don and Gallo [27] for heuristic algorithms, and Guardabassi [58], Cheung and Kuh [22] and Bhat and Kinariwala [14] for nonheuristic methods.

The main feature of the algorithm presented here is to give all optimal solutions, in reasonable time, for graphs corresponding in size and complexity to the commonly used large-scale macroeconomic models.

The efficiency of the algorithm is obtained mainly by the generation of only a subset of the set of all elementary circuits, from which the minimal covers are then computed iteratively by considering one circuit at a time. Section 3.2.1 first describes the iterative procedure, which uses Boolean properties of minimal monomials, then introduces the appropriate transformations necessary to reduce the size of the graph, and finally presents an algorithm for the generation of the subset of circuits necessary to compute the covers.

Since any minimal essential set of G is given by the union of minimal essential sets of its strong components, we will assume, without loss of generality, that G has only one strong component.

3.2.1 The Logical Framework of the Algorithm

The particularity of the algorithm consists in the combination of three points: a procedure which computes covers iteratively by considering one circuit at a time, transformations likely to reduce the size of the graph, and an algorithm which generates only a particular small subset of elementary circuits.

Iterative Construction of Covers

Let us first consider an elementary circuit ci of G as the following set of vertices

ci = ⋃_{j∈Ci} vj ,   i = 1, . . . , p ,    (3.1)

where Ci is the index set for the vertices belonging to circuit ci. Such a circuit ci can also be written as a sum of symbols representing the vertices, i.e.

∑_{j∈Ci} vj ,   i = 1, . . . , p .    (3.2)

What we are looking for are covers, i.e. sets of vertices selected such that at least one vertex is in each circuit ci, i = 1, . . . , p. Therefore, we introduce the product

2 Proofs of NP-completeness are given in Karp [72], Aho et al. [2, pp. 378-384], Garey and Johnson [43, p. 192] and Even [32, pp. 223-224].


of all circuits represented symbolically in (3.2):

∏_{i=1}^{p} ( ∑_{j∈Ci} vj ) ,    (3.3)

which can be developed in a sum of K monomials of the form:

∑_{k=1}^{K} ( ∏_{j∈Mk} vj ) ,    (3.4)

where Mk is the set of indices for vertices in the k-th monomial. To each monomial ∏_{j∈Mk} vj corresponds the set ⋃_{j∈Mk} vj of vertices, which covers the set C of all elementary circuits.

Minimal covers are then obtained by considering the vertices vi as Boolean variables and applying the following two properties as simplification rules, where a and b are Boolean variables:

• Idempotence: a + a = a and a · a = a

• Absorption: a + a · b = a and a · (a + b) = a .

After using idempotence for simplification of all monomials, the minimal covers will be given by the sets of vertices corresponding to the monomials with minimum cardinality |Mk|.

We will now use the fact that the development of the expression (3.3) can be carried out iteratively, by considering one circuit at a time. Step r in this development is then:

∏_{j=1}^{r} ( ∑_{i∈Cj} vi ) · ∑_{i∈Cr+1} vi ,   r = 1, . . . , p − 1 .    (3.5)

Considering the set of covers E = {e} obtained in step r − 1, we will construct the set of covers E∗ which also accounts for the new circuit cr+1. Denoting now by Cr+1 the set of vertices forming circuit cr+1, we partition the set of vertices V and the set of covers E as follows:

V1 = {v | v ∈ Cr+1 and v ∈ e for some e ∈ E}    (3.6)
V2 = Cr+1 − V1    (3.7)
E1 = {e | e ∈ E and e ∩ Cr+1 ≠ ∅}    (3.8)
E2 = E − E1    (3.9)

with V1 as the vertices of the new circuit cr+1 that are already covered, V2 as those which are not covered, E1 as the satisfactory covers and E2 as those covers which must be extended. This partition is illustrated by means of a graph where vertices represent the sets and where there is an edge if the corresponding sets have common elements.


[Diagram: the vertices of circuit Cr+1 are partitioned into V1 and V2, the covers E into E1 and E2; V3 denotes the remaining vertices, and dotted edges indicate sets that may have common elements.]

V3 is the set given by V − V1 − V2 and the dotted edge means that those sets may have common elements.

Let us now discuss the four possibilities of combining an element of V1, V2 with an element of E1, E2, respectively:

1. for v ∈ V1 and ek ∈ E1, we have by definition ek ∪ {v} = ek, which implies E1 ⊆ E∗;

2. for v ∈ V1 and ei ∈ E2, we have ei ∪ {v} ∈ E∗ under the constraint ek ⊈ ei ∪ {v} for all ek ∈ E1;

3. for v ∈ V2 and ek ∈ E1, we have ek ⊂ ek ∪ {v}, implying ek ∪ {v} ∉ E∗;

4. for v ∈ V2 and ei ∈ E2, we have ei ∪ {v} ∈ E∗, since such covers cannot be eliminated by idempotence or absorption.

The new set of covers E∗ we seek is then

E∗ = E1 ∪ A ∪ B ,    (3.10)

where the covers of set A are those defined in point 4 and the covers of set B are those defined in point 2. Set A can be computed automatically, whereas the construction of the elements of set B necessitates the verification of the condition

ek ⊈ ei ∪ {v}   for all ek ∈ E1, ei ∈ E2, v ∈ V1 .    (3.11)

For a given vertex v ∈ V1, the verification of (3.11) can be limited to sets ek verifying

v ∈ ek and 1 < card(ek) ≤ card(ei) .

Indeed, since ek ∈ E1 and ei ∈ E2 are covers of the simplified set E, they verify ek ⊈ ei, and it follows that

v ∉ ek ⇒ ek ⊈ ei ∪ {v} .

And for sets ek with cardinality 1, we verify directly

ek = {v} ⇒ ek ⊂ ei ∪ {v} .
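A compact, unoptimized sketch of this iterative construction (Python; the circuits below are an invented example) computes the sets E∗ through a single absorption pass instead of the explicit bookkeeping of (3.6)–(3.11):

# Iterative construction of the minimal covers, one circuit at a time.
def absorb(candidates):
    # drop duplicates and any cover strictly containing another one
    unique = set(map(frozenset, candidates))
    return [s for s in unique if not any(o < s for o in unique)]

def add_circuit(covers, circuit):
    e1 = [e for e in covers if e & circuit]          # satisfactory covers
    e2 = [e for e in covers if not (e & circuit)]    # covers to be extended
    return absorb(e1 + [e | {v} for e in e2 for v in circuit])

circuits = [{1, 2}, {2, 3, 4}, {3, 5}]               # invented example
covers = [frozenset()]
for c in circuits:
    covers = add_circuit(covers, c)

smallest = min(len(e) for e in covers)
print([set(e) for e in covers if len(e) == smallest])   # the minimal covers

With these circuits the minimal covers have cardinality 2, for instance {2, 3}.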

Condensation of the Graph

In many cases, the size of the original graph can be reduced by appropriate transformations. Subsequently, we present transformations which reduce the size of G while preserving the minimal covers. Considering the graph G = (V, A), we define three transformations of G:

• Transformation 1: Let vi ∈ V be a vertex with a single outgoing arc (vi, vj). Transform the predecessors of vi into predecessors of vj and remove vertex vi from G.

• Transformation 2: Let vi ∈ V be a vertex with a single ingoing arc (vj, vi). Transform the successors of vi into successors of vj and remove vertex vi from G.

• Transformation 3: Let vi ∈ V be a vertex with an arc of the form (vi, vi). Store vertex vi and remove vertex vi from G.

Repeat these transformations in any order as long as possible. The transformed graph will then be called the condensed graph of G. Such a condensed graph is not necessarily connected.

The situations described in the transformations 1 to 3 are illustrated in Figure 3.3.

[Figure 3.3: Situations considered for the transformations: case 1 (vertex vi with a single outgoing arc to vj), case 2 (vertex vi with a single ingoing arc from vj), case 3 (vertex vi with a self-loop).]
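The three transformations can be sketched as follows (Python; the arcs of the invented example form two 2-circuits sharing vertex 2, plus a tail):

# Condensation of the graph: apply transformations 1-3 as long as possible.
def condense(arc_list):
    arcs, stored = set(arc_list), set()
    changed = True
    while changed:
        changed = False
        for v in {x for a in arcs for x in a}:
            ins = [a for a in arcs if a[1] == v and a[0] != v]
            outs = [a for a in arcs if a[0] == v and a[1] != v]
            if (v, v) in arcs:                       # transformation 3: self-loop
                stored.add(v)
                arcs = {a for a in arcs if v not in a}
            elif len(outs) == 1:                     # transformation 1
                w = outs[0][1]
                arcs = {a for a in arcs if v not in a} | {(p, w) for (p, _) in ins}
            elif len(ins) == 1:                      # transformation 2
                w = ins[0][0]
                arcs = {a for a in arcs if v not in a} | {(w, s) for (_, s) in outs}
            else:
                continue
            changed = True
            break                                    # a vertex was removed: restart the scan
    return arcs, stored

print(condense([(1, 2), (2, 1), (2, 3), (3, 2), (3, 4)]))
# should print (set(), {2}): the graph condenses completely and vertex 2,
# stored by transformation 3, covers every circuit

Note that transformation 1 may create a self-loop (when vj is also a predecessor of vi), which is precisely how 2-circuits end up stored by transformation 3.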

Proposition 1 The union of the set of vertices stored in transformation 3, with the set of vertices constituting a minimal cover for the circuits of the condensed graph, is also a minimal cover for the original graph G.

Proof. In case 1 and in case 2, every circuit containing vertex vi must also contain vertex vj, and therefore vertex vi can be excluded from covers. Obviously, vertices eliminated from the graph in transformation 3 must belong to every cover. The loop of any such given vertex vi absorbs all circuits containing vi, and therefore vertex vi can be removed.

Generation of Circuits

According to idempotence and absorption rules, it is obvious that it is unnecessary to generate all the elementary circuits of G, since a great number of them will be absorbed by smaller subcircuits.

Given a circuit ci as defined in (3.1), we will say that circuit ci absorbs circuit cj if and only if ci ⊂ cj.

Proposition 2 Given the set of covers E corresponding to r circuits cj, j = 1, . . . , r, let us consider an additional circuit cr+1 in such a way that there exists a circuit cj, j ∈ {1, . . . , r}, verifying cj ⊂ cr+1. Then, the set of covers E∗ corresponding to the r + 1 circuits cj, j = 1, . . . , r + 1, is E.

Proof. By definition, any element ei ∈ E will contain a vertex of circuit cj. Therefore, as cj ⊂ cr+1, the partition defined in (3.6)–(3.9) will verify cj ⊂ V1 ⇒ E1 = E ⇒ E2 = ∅ and E∗ = E1 = E.

We will now discuss an efficient algorithm for the enumeration of only those circuits which are not absorbed. This point is the most important, as we can only expect efficiency if we avoid the generation of many circuits from the set of all elementary circuits, which, of course, is of explosive cardinality.

In order to explore the circuits of the condensed graph systematically, we first consider the circuits containing a given vertex v1, then consider the circuits containing vertex v2 in the subgraph with vertex set V − {v1}, and so on. This corresponds to the following partition of the set of circuits C:

C = ⋃_{i=1}^{n−1} Cvi ,    (3.12)

where Cvi denotes the set of all elementary circuits containing vertex vi in the subgraph with vertex set V − {v1, . . . , vi−1}. It is obvious that Cvi ∩ Cvj = ∅ for i ≠ j and that some sets Cvi will be empty.

Without loss of generality, let us start with the set of circuits Cv1. The following definition characterizes a subset of circuits of Cv1 which are not absorbed.

Definition 2 The circuit of length k + 1 defined by the sequence of vertices [v1, x1, . . . , xk, v1] is a chordless circuit if G contains neither arcs of the form (xi, xj) for j − i > 1, nor the arc (xi, v1) for i ≠ k. Such arcs, if they exist, are called chords.

In order to seek the chordless circuits containing the arc (v1, vk), let us consider the directed tree T = (S, U) with root v1 and vertex set S = Adj(v1) ∪ Adj(vk). The tree T enables the definition of a subset

AT = A^V_T ∪ A^S_T    (3.13)

of arcs of the graph G = (V, A). The set

A^V_T = {(xi, xj) | xi ∈ V − S and xj ∈ S}    (3.14)

contains arcs going from vertices not in the tree to vertices in the tree. The set

A^S_T = {(xi, xj) | xi, xj ∈ S and (xi, xj) ∉ U}    (3.15)

contains the cross, back and forward arcs in the tree. The tree T is shown in Figure 3.4, where the arcs belonging to AT are drawn in dotted lines. The set Rvi denotes the adjacency set Adj(vi) restricted to vertices not yet in the tree.

Proposition 3 A chordless circuit containing the arc (v1, vk) cannot contain arcs (vi, vj) ∈ AT.

Proof. The arc (v1, vk) constitutes a chord for all paths [v1, . . . , vk]. The arc (vk, vi), vi ∈ Rvk, constitutes a chord for all paths [vk, . . . , vi].


[Figure 3.4: Tree T = (S, U) rooted at v1, with vk among the vertices at level 1 and the restricted successors Rvk at level 2; arcs belonging to AT are drawn in dotted lines.]

From Proposition 3, it follows that all ingoing arcs to the set of vertices adjacent to vertex v1 can be ignored in the search algorithm for the circuits in Cv1. For the circuits containing the arc (v1, vk), the same reasoning can be repeated, i.e. all ingoing arcs to the set of vertices adjacent to vertex vk can be ignored. Continuing this procedure leads to the recursive algorithm given hereafter.

Algorithm 21 Chordless circuits

Input: The adjacency sets Adj(vi) of graph G = (V, A).
Output: All chordless circuits containing vertex v1.
begin
   initialize: k = 1; circuit(1) = v1; Rv1 = Adj(v1); S = Adj(v1);
   chordless(Rv1);
end

chordless(Rv):
1. for all i ∈ Rv
      k = k + 1; circuit(k) = i; Ri = ∅;
2.    if any j ∈ Adj(i) and j ∈ circuit(2 : k) then goto 5
3.    for all j ∈ Adj(i) and j ∉ S do
         if j = circuit(1) then
            output circuit(n), n = 1 : k;
            goto 5
         end
         S = S ∪ {j}; Ri = Ri ∪ {j};
      end
4.    chordless(Ri);
5.    k = k − 1;
   end
6. S = S − Rv;
end

Algorithm 21 then constructs recursively the directed tree T = (S, U), which obviously evolves continuously.

The loop in line 1 goes over all vertices of a given level. The test in line 2 detects a chordless circuit not containing vertex v1. Such a circuit is not reported, as it will be detected while searching circuits belonging to some Cvi, vi ≠ v1. For vertex vk, the loop in line 3 constructs Rvk, the set of restricted successors, and expands the tree. The recursive call in line 4 explores the next level. In line 5, we replace the last vertex in the explored path by the next vertex vk+1 in the level. Finally, in line 6, all vertices of a given level have been explored and we remove the vertices of the set Rvk from the tree.

3.2.2 Practical Considerations

The algorithm which generates the chordless circuits certainly remains of non-polynomial complexity, and the maximum size of a graph one can handle depends on its particular structure. For nonreducible graphs, i.e. graphs to which none of the transformations described in Section 3.2.1 apply, we found experimentally that an arc density of about 0.2 characterizes the structures that are the most difficult to explore. The algorithm handles such nonreducible graphs with up to about 100 vertices. For applications such as those encountered in large scale systems with feedback, this corresponds to much larger problems, i.e. models with 200 to 400 interdependent variables, because, in practice, the corresponding graphs are always condensable. This, at least, is the case for almost all macroeconomic models and, to the best of our knowledge, it has not been possible before to compute minimal essential sets for such large models.

3.3 Point Methods versus Block Methods

Two types of methods are commonly used for the numerical solution of macroeconometric models: nonlinear first-order iterative techniques and Newton-like methods. To be efficient, both methods have to take into account the sparsity of the Jacobian matrix. For first-order iterations, one tries to put the Jacobian matrix into a quasi-triangular form, whereas for Newton-like methods, it is interesting to reorder the equations so as to minimize the dimension of the simultaneous block, i.e. the essential set S, to which the Newton algorithm is then applied. In practice, this involves the computation of a feedback arc set for the first method and of a set of spike variables for the latter method, as discussed in the previous section.

The second method can be considered as a block method, as it combines the use of a Newton technique for the set of spike variables with the use of first-order iterations for the remaining variables. Whereas Newton methods applied to the complete system are insensitive to different orderings of the equations, the performance of the block method will vary for different sets of spike variables with the same cardinality.

Block methods solve subsets of equations with an inner loop and execute an outer loop for the complete system. If the size of the subsystems reduces to a single equation, we have a point method. The Gauss-Seidel method, as explained in Algorithm 2, is a point method.

We will show that, due to the particular structure of most macroeconomic models, the block method is not likely to constitute an optimal strategy. We also discuss the convergence of first-order iterative techniques with respect to orderings corresponding to different feedback arc sets, leaving, however, the question of the optimal ordering open.

3.3.1 The Problem

We have seen in Section 2.10 that for a normalized system of equations the generic iteration k for the point Gauss-Seidel method is written as

xi^(k+1) = gi(x1^(k+1), . . . , x(i−1)^(k+1), x(i+1)^(k), . . . , xn^(k)) ,   i = 1, . . . , n .    (3.16)

It is then obvious that (3.16) could be solved within a single iteration if the entries in the Jacobian matrix of the normalized equations corresponding to the x(i+1)^(k), . . . , xn^(k), for each equation i = 1, . . . , n, were zero.3 Therefore, it is often argued that an optimal ordering for first-order iterative methods should yield a quasi-triangular Jacobian matrix, i.e. one where the number of nonzero entries in the upper triangular part is minimum. Such a set of entries corresponds to a minimum feedback arc set discussed earlier. For a given set F, the Jacobian matrix can then be ordered as shown on panel (b) of Figure 3.2 already given before.

The complexity of Newton-like algorithms is O(n^3), which promises interesting savings in computation if n can be reduced. Therefore, various authors, e.g. Becker and Rustem [11], Don and Gallo [27] and Nepomiastchy and Ravelli [82], suggest a reordering of the Jacobian matrix as shown on panel (a) of Figure 3.2.

The equations can therefore be partitioned into two sets

xR = gR(xR; xS) ,    (3.17)
fS(xS; xR) = 0 ,    (3.18)

where xS are the variables defining the feedback vertex set S (spike variables).

Given an initial value for the variables xS, the solution for the variables xR is obtained by solving the equations gR recursively. The variables xR are then exogenous for the much smaller subsystem fS, which is solved by means of a Newton-like method. These two steps of the block method in question are then repeated until they achieve convergence.
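A minimal sketch of these two steps (Python; the two-spike, three-recursive model is invented, and scipy's fsolve stands in for the embedded Newton-like solver):

# Block method: the recursive block g_R is solved by substitution, the spike
# block f_S by a Newton-like method, and the two steps are repeated to convergence.
import numpy as np
from scipy.optimize import fsolve

def g_R(x_S):
    s1, s2 = x_S
    x1 = 0.5 * s1 + 1.0
    x2 = 0.3 * x1 - 0.2 * s2
    x3 = 0.1 * x1 + 0.4 * x2
    return np.array([x1, x2, x3])

def f_S(x_S, x_R):
    x1, x2, x3 = x_R
    s1, s2 = x_S
    return np.array([s1 - 0.8 * x3 - 0.5,
                     s2 - 0.6 * x2 + 0.1 * s1])

x_S = np.zeros(2)
for _ in range(100):
    x_R = g_R(x_S)                             # step 1: recursive equations
    x_S_new = fsolve(f_S, x_S, args=(x_R,))    # step 2: Newton on the spike block
    if np.max(np.abs(x_S_new - x_S)) < 1e-10:
        x_S = x_S_new
        break
    x_S = x_S_new

print(x_S, g_R(x_S))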

3.3.2 Discussion of the Block Method

The advantages and inconveniences of first-order iterative methods and Newton-like techniques have been extensively discussed in the literature. Recently, it has been shown clearly by Hughes Hallett [70] that a comparison of the theoretical performance between these two types of solution techniques is not possible.

As already mentioned, the solution of the original system, after the introduction of the decomposition into the subsystems (3.17) and (3.18), is obtained by means of a first-order iterative method, combined with an embedded Newton-like technique for the subsystem fS. Thus, the solution not only requires convergence for the subsystem fS, but also convergence for the successive steps over gR and fS. Nevertheless, compared with the complexity of the solution of the original system by means of Newton-like methods, such a decomposition will almost always be preferable.

3 Which then corresponds to a lower triangular matrix.

The following question then arises: would it be interesting to solve the subsystem fS with a first-order iterative technique? In order to discuss this question, we establish the operation count to solve the system in both cases. Using Newton methods for the subsystem fS, the approximate operation count is

k^F_gR ( p nR + k^N_fS (2/3) nS^3 ) ,    (3.19)

where k^F_gR is the number of iterations over gR and fS, and k^N_fS is the number of iterations needed to solve the embedded subsystem fS. Clearly, the values for k^F_gR and k^N_fS are unknown prior to solution, but we know that k^N_fS can be, at best, equal to 2. By solving subsystem fS with a first-order iterative technique, we will obtain the following operation count

k^F_gR ( p nR + k^F_fS p nS ) ,    (3.20)

where k^F_fS is the number of iterations needed to solve the embedded subsystem fS. Taking k^N_fS equal to 2 will enable us to establish from (3.19) and (3.20) the following inequality

k^F_fS < (4/3) nS^2 / p ,    (3.21)

which characterizes situations where first-order iterative techniques are always preferable.

It might now be interesting to investigate whether a decomposition of the Jacobian matrix is still preferable in such a case. The operation count for solving f with a first-order method in k^F_f iterations is obviously

k^F_f p n .    (3.22)

Then, using (3.22) and (3.20), we obtain the following inequality

k^F_f < k^F_gR ( nR + k^F_fS nS ) / n ,    (3.23)

which characterizes situations for which a first-order method for solving f involves less computation than the resolution of the decomposed system (3.17) and (3.18).

The fraction in expression (3.23) is obviously greater or equal to one. The analysis of the structure of Jacobian matrices corresponding to most of the commonly used macroeconomic models shows that the subsystem fS, corresponding to a minimum feedback vertex set, is almost always recursive. Thus, k^F_fS is equal to one and k^F_gR, k^F_f are identical, and therefore it is not necessary to formalize the solution of subsystem fS into a separate step.
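A back-of-the-envelope comparison of the three operation counts for invented model dimensions illustrates inequality (3.21); the iteration counts, and the reading of p as the average cost of evaluating one equation, are assumptions made only for this sketch:

# Compare the operation counts (3.19), (3.20) and (3.22).
n_S, n_R = 10, 390                    # spike and recursive variables
n = n_S + n_R
p = 5                                 # average cost of evaluating one equation (assumed)
k_gR, k_N, k_fS, k_f = 30, 2, 1, 30   # iteration counts (assumed)

newton_block      = k_gR * (p * n_R + k_N * (2 / 3) * n_S**3)   # (3.19)
first_order_block = k_gR * (p * n_R + k_fS * p * n_S)           # (3.20)
first_order_full  = k_f * p * n                                 # (3.22)

print(newton_block, first_order_block, first_order_full)
print("bound (3.21):", 4 / 3 * n_S**2 / p)   # about 26.7 here

With k^F_fS = 1 and k^F_f = k^F_gR, the last two counts coincide, which is exactly the point made above: the separate treatment of fS brings no gain when fS is recursive.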

Page 67: Solution of Macroeconometric Models

3.3 Point Methods versus Block Methods 59

3.3.3 Ordering and Convergence for First-order Iterations

The previous section clearly shows the interest of first-order iterative methods for solving macroeconomic models. It is well known that the ordering of the equations is crucial for the convergence of first-order iterations. The ordering we considered so far is the result of a minimization of the cardinality nS of the feedback vertex set S. However, such an ordering is, in general, not optimal. Hereafter, we will introduce the notation used to study the convergence of first-order iterative methods in relation with the ordering of the equations.

A linear approximation for the normalized system is given by

x = Ax + b , (3.24)

with A = ∂g/∂x′, the Jacobian matrix evaluated at the solution x∗, and b representing exogenous and lagged variables. Splitting A into A = L + U with L and U, respectively a lower and an upper triangular matrix, system (3.24) can then be written as (I − L)x = Ux + b. Choosing Gauss-Seidel's first-order iterations (see Section 2.10), the k-th step in the solution process of our system is

(I − L)x(k+1) = Ux(k) + b , k = 0, 1, 2, . . . (3.25)

where x(0) is an initial guess for x. It is well known that the convergence of (3.25) to the solution x∗ can be investigated on the error equation

(I − L)e(k+1) = Ue(k) ,

with e(k) = x∗ − x(k) as the error for iteration k. Setting B = (I − L)−1U and relating the error e(k) to the original error e(0), we get

e(k) = B^k e(0) ,    (3.26)

which shows that the necessary and sufficient condition for the error to converge to zero is that B^k converges to a zero matrix, as presented in Section 2.4.6. This is guaranteed if all eigenvalues λi of matrix B verify |λi| < 1.

The convergence then clearly depends upon a particular ordering of the equations. If the Jacobian matrix A is a lower triangular matrix, then the algorithm will converge within one iteration. This suggests choosing a permutation of A so that U is as “small” as possible.

Usually, the “magnitude” of matrix U is defined in terms of the number of nonzero elements. The essential feedback arc set, defined in Section 3.2, defines such matrices U which are “small”. However, we will show that such a criterion for the choice of matrix U is not necessarily optimal for convergence. In general, there are several feedback arc sets with minimum cardinality and the question of which one to choose arises.

Without a theoretical framework from which to decide about such a choice, we resorted to an empirical investigation of a small macroeconomic model.4 The size n of the Jacobian matrix for this model is 28 and there exist 76 minimal feedback arc sets5 ranging from cardinality 9 to cardinality 15. We solved the CUBS model using all the orderings corresponding to these 76 essential feedback arc sets. We observed that convergence has been achieved, for a given period, in less than 20 iterations by 70% of the orderings. One ordering needs 250 iterations to converge and, surprisingly, 8 orderings do not converge at all. Moreover, we did not observe any relation between the cardinality of the essential feedback arc set and the number of iterations necessary to converge to the solution. Among others, we tried to weight matrix U by means of the sum of the squared elements. Once again, we came to the conclusion that there is no relationship between the weight of U and the number of iterations.

Trying to characterize the matrix U which achieves the fastest convergence, we chose to systematically explore all possible orderings for a four-equation linear system. We calculated λmax = maxi |λi| for each matrix B corresponding to the orderings. We observed that the n! possible orderings produce at most (n − 1)! distinct values for λmax. Again, we verified that neither the number of nonzero elements in matrix U nor their value is related to the values for λmax.

4 The City University Business School (CUBS) model of the UK economy [13].
5 These sets have been computed with the program Causor [48].

The graph links the four variables through the arcs 1→2 (.5), 1→3 (−.4), 2→3 (−1.3), 1→4 (.6), 2→4 (2), 3→4 (a) and 4→1 (−.65). The corresponding matrices A − I for the two orderings are:

Ordering [1, 2, 3, 4]:

        1     2     3     4
  1    −1     0     0   −.65
  2    .5    −1     0     0
  3   −.4  −1.3    −1     0
  4    .6     2     a    −1

Ordering [3, 1, 2, 4]:

        3     1     2     4
  3    −1   −.4  −1.3     0
  1     0    −1     0   −.65
  2     0    .5    −1     0
  4     a    .6     2    −1

Figure 3.5: Numerical example showing the structure is not sufficient.

Figure 3.5 displays the graph representing the four linear equations and the matrices A − I corresponding to two particular orderings, [1, 2, 3, 4] and [3, 1, 2, 4], of these equations.

If the value of the coefficient a is .7, then the ordering [1, 2, 3, 4], corresponding to a matrix U with a single element, has the smallest possible λmax = .56. If we take the equations in the order [3, 1, 2, 4], we get λmax = 1.38.

Setting the value of coefficient a at −.7 produces the opposite. The ordering [3, 1, 2, 4], which does not minimize the number of nonzero elements in matrix U, gives λmax = .69, whereas the ordering [1, 2, 3, 4] gives λmax = 1.52.

This example clearly shows that the structure of the equations in itself is not sufficient information to decide on a good ordering of the equations.
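The λmax values quoted above can be checked numerically; the following sketch (Python with numpy) computes the spectral radius of B = (I − L)−1U for both orderings and both values of a:

# Spectral radius of B for the two orderings of the example of Figure 3.5.
import numpy as np

def lambda_max(A_minus_I):
    A = A_minus_I + np.eye(4)      # Jacobian of the normalized system
    L = np.tril(A, -1)             # strictly lower triangular part
    U = np.triu(A)                 # upper part (the diagonal of A is zero)
    B = np.linalg.solve(np.eye(4) - L, U)
    return max(abs(np.linalg.eigvals(B)))

for a in (0.7, -0.7):
    M1234 = np.array([[-1.0,  0.0,  0.0, -0.65],
                      [ 0.5, -1.0,  0.0,  0.0 ],
                      [-0.4, -1.3, -1.0,  0.0 ],
                      [ 0.6,  2.0,    a, -1.0 ]])
    M3124 = np.array([[-1.0, -0.4, -1.3,  0.0 ],
                      [ 0.0, -1.0,  0.0, -0.65],
                      [ 0.0,  0.5, -1.0,  0.0 ],
                      [   a,  0.6,  2.0, -1.0 ]])
    print(a, lambda_max(M1234), lambda_max(M3124))
# should reproduce approximately .56 and 1.38 for a = .7,
# and 1.52 and .69 for a = -.7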

Having analyzed the causal structure of many macroeconomic models, we observed that the subset of equations corresponding to a feedback vertex set (spike variables) happens to be almost always recursive. For models of this type, it is certainly not advisable to use a Newton-type technique to solve the submodel within the block method. Obviously, in such a case, the block method is equivalent to a first-order iterative technique applied to the complete system. The crucial question is then whether a better ordering than the one derived from the block method exists.

Page 69: Solution of Macroeconometric Models

3.4 Essential Feedback Vertex Sets and the Newton Method 61

3.4 Essential Feedback Vertex Sets and the Newton Method

In general, the convergence of Newton-type methods is not sensitive to different orderings of the Jacobian matrix. However, from a computational point of view it can be interesting to reorder the model so that the Jacobian matrix shows the pattern displayed in Figure 3.2 panel (a). This pattern is obtained by computing an essential feedback vertex set as explained in Section 3.2.

One way to exploit this ordering is to condense the system into one with the same size as S. This has been suggested by Don and Gallo [27], who approximate the Jacobian matrix of the condensed system numerically and apply a Newton method to the latter. The advantage comes from the reduction of the size of the system to be solved. We note that this amounts to computing the solution of a different system, which has the same solution as the original one.

Another way to take advantage of the particular pattern generated by the set S is that the LU factorization of the Jacobian matrix needed in the Newton step can easily be computed. Since the model is assumed to be normalized, the entries on the diagonal of the Jacobian matrix are ones. The columns not belonging to S remain unchanged in the matrices L and U, and only the columns of L and U corresponding to the set S must be computed.

We may note that neither approach requires sets S that are essential feedback vertex sets. However, the smaller the cardinality of S, the larger the computational savings.

Page 70: Solution of Macroeconometric Models

Chapter 4

Model Simulation on Parallel Computers

The simulation of large macroeconometric models may be considered as a solved problem, given the performance of the computers available at present. This is probably true if we run single simulations on a model. However, when it comes to solving a model repeatedly a large number of times, as is the case for stochastic simulation, optimal control or the evaluation of forecast errors, the time necessary to execute this task on a computer can become excessively large. Another situation in which we need even more efficient solution procedures arises when we want to explore the behavior of a model with respect to continuous changes in the parameters or exogenous variables.

Parallel computers may be a way of achieving this goal. However, in order to be efficient, these computing devices require the code to be specifically structured. Examples of the use of vector and parallel computers to solve economic problems can be found in Nagurney [81], Amman [3], Ando et al. [4], Bianchi et al. [15] and Petersen and Cividini [87].

In the first section, the fundamental terminology and concepts of parallel computing that are generally accepted are presented. Among these, we find a taxonomy of hardware, the interconnection network, synchronization and communication issues, and some performance measures such as speed-up and efficiency.

In the second section, we report simulation experiences with a medium-sized macroeconometric model. Practical issues of the implementation of parallel algorithms on a CM2 massively parallel computer are also presented.

4.1 Introduction to Parallel Computing

Parallel computation can be defined as the situation where several processors simultaneously execute programs and cooperate to solve a given problem. There are many issues that arise when considering parallel numerical computation.

In our discussion, we focus on the case where processors are located in one computer and communicate reliably and predictably. The distributed computation scheme where, for instance, several workstations are linked through a network is not considered, even though there are similar issues regarding the implementation of the methods in such a framework.

There are important distinctions to be made and discussed when considering parallel computing, the first being the number of processors and the type of processors of the parallel machine. Some parallel computers contain several thousand processors and are called massively parallel. Some others contain fewer processing elements (up to a few hundred) that are more powerful and can execute more complicated tasks. This kind of computing system is usually called a coarse-grained parallel computer.

Second, the global control on the processors' behavior can be more or less tight. For instance, certain parallel computers are controlled at each step by a front-end computer which sends the instructions and/or data to every processor of the system. In other cases, there may not be such a precise global control, but just a processor distributing tasks to every processor at the beginning of the computation and gathering results at the end.

A third distinction is made at the execution level. The operations can be synchronous or asynchronous. In a synchronous model, there usually are phases during which the processors carry out instructions independently from the others. The communication can only take place at the end of a phase, therefore ensuring that the next phase starts at each processor with updated information. Clearly, the time a processor waits for its data, as well as the time for synchronizing the process, may create an overhead.

An alternative is to allow an asynchronous execution where there is no constraint to wait for information at a given point. The exchange of information can be done at any point in time and the old information is purged if it has not been used before new information is made available. In this model of execution, it is much harder to ensure the convergence of numerical algorithms. The development of algorithms is a difficult task since a priori we have no clear way of checking precisely what will happen during the execution.

4.1.1 A Taxonomy for Parallel Computers

Parallel computers can be classified using Flynn's taxonomy, which is based upon levels of parallelization in the data and the instructions. These different classes are presented hereafter.

A typical serial computer is a Single Instruction Single Data (SISD) machine, since it processes one item of data and one operation at a time.

When several processors are present, we can carry out the same instruction on different data sets. Thus, we get the Single Instruction Multiple Data (SIMD) category, also called data parallel. In this case, the control mechanism is present at each step since every processor is told to execute the same instruction. We are also in the presence of a synchronous execution because all the processors carry out their job in unison and none of them can execute more instructions than another.

Page 72: Solution of Macroeconometric Models

4.1 Introduction to Parallel Computing 64

Symmetrically, Multiple Instruction Single Data (MISD) computers would process a single stream of data by performing different instructions simultaneously.

Finally, the Multiple Instruction Multiple Data (MIMD) category seems the most general as it allows an asynchronous execution of different instructions on different data. It is of course possible to constrain the execution to be synchronous by blocking the processes and by letting them wait for all others to reach the same stage.

Nowadays, the MIMD and SIMD categories seem to give way to a new class named SPMD or Same Program Multiple Data. In this scheme, the system is controlled by a single program and combines the ease of use of SIMD with the flexibility of MIMD. Each processor executes a SIMD program on its data but is not constrained by a totally synchronous execution with the other processors. The synchronization can actually take place only when processors need to communicate some information to each other. We may note that this can be achieved by programming a MIMD machine in a specific way. Thus, SPMD is not usually considered as a new category of parallel machines but may be viewed as a programming style.

Network Structures

We essentially find two kinds of memory systems in parallel computers: first a shared memory system, where the main memory is a global resource available to all processors. This is illustrated by Figure 4.1, where M stands for memory modules and P for processors. A drawback of such a system is the difficulty in managing simultaneous write instructions given by distinct processors.

[Figure 4.1: Shared memory system: the processors P access the memory modules M through the interconnection network.]

A second memory system is a local memory, where each processor has its own memory; it is also called a distributed memory system and is illustrated in Figure 4.2. In the following, we will focus our discussion on distributed memory systems.

The topology of the interconnection between the processing units is a hardware characteristic.

[Figure 4.2: Distributed memory system: each processor P has its own memory module M, and the processors communicate through the interconnection network.]

The various ways the processors can be connected determine how the communication takes place in the system. This in turn may influence the choices for particular algorithms.

The interconnection network linking the processors also plays a critical part in the communication time. We will now describe some possible architectures for connecting the processing elements together.

The communication network must try to balance two conflicting goals: on one hand, a short distance between any pair of processors in order to minimize the communication costs and, on the other hand, a low number of connections because of physical and construction cost constraints.

The following three points can be put forward:

• A first characteristic of networks is their diameter, defined as the maximum distance between any pair of processors. The distance between two processors is measured as the minimum number of links between them. This determines the time an item of information takes to travel from one processor to another.

• A second characteristic is the flexibility of the network, i.e. the possibility of mapping a given topology into another that may better fit the problem or the algorithm.

• The network topology should also be scalable in order to allow other processing units to be added to extend the processing capabilities.

The simplest network is the linear array where the processors are aligned and connected only with their immediate neighbors. This scheme tends to be used in distributed computing rather than in parallel computing, but it is useful as a worst-case benchmark for evaluating a parallel algorithm. Figure 4.3 illustrates such a linear array.

[Figure 4.3: Linear array.]


This network can be improved by connecting the first and last processors of the linear array, thus creating a ring, see Figure 4.4. For a ring, the times for communication are at best halved compared to the linear array.

[Figure 4.4: Ring.]

Another possibility is to arrange the processors on a mesh, that is, a grid in dimension 2. The definition of the mesh can be extended to dimension d by stating that only neighbor processors along the axes are connected. Figure 4.5 shows a mesh of 6 processors in dimension 2.


Figure 4.5: Mesh.

In a torus network, the maximum distance is approximately cut in half compared to a mesh by adding connections between the processors along the edges to wrap the mesh around. Figure 4.6 illustrates a torus based on the mesh of Figure 4.5.


Figure 4.6: Torus.

Another important way of interconnecting the system is the hypercube topology. A hypercube of dimension d can be constructed recursively from 2 hypercubes of dimension d−1 by connecting their corresponding processors. A hypercube of dimension 0 is a single processor. Figure 4.7 shows hypercubes up to dimension 4. Each of the p = 2^d processors in a d-cube therefore has d = log_2 p connections, and the maximum distance between two processors is also log_2 p.
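To make the recursive construction concrete, the following sketch (written in Python purely for illustration; it is not part of the original text) builds the neighbor lists of a d-cube by flipping one bit of the processor label at a time:

    def hypercube_neighbors(d):
        """Return, for each of the 2**d processors, the list of its d neighbors.

        Processor i is connected to every processor whose binary label differs
        from i in exactly one bit, which reproduces the recursive construction
        of a d-cube from two (d-1)-cubes.
        """
        p = 2 ** d
        return {i: [i ^ (1 << k) for k in range(d)] for i in range(p)}

    # Example: in a 3-cube, processor 0 is linked to processors 1, 2 and 4, and
    # the maximum distance (number of differing bits) between labels is d = 3.
    print(hypercube_neighbors(3)[0])   # [1, 2, 4]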

The last topology we consider here is the complete graph, where every processor is connected to every other one, see Figure 4.8. Even if this topology is the best in terms of maximum distance, it is unfortunately not feasible for a large number of processors since the number of connections grows quadratically.


Figure 4.7: Hypercubes.


Figure 4.8: Complete graph.

4.1.2 Communication Tasks

The new problem introduced by multiple processing elements is communication. The total time used to solve a problem is now the sum of the time spent computing and the time spent communicating:

T = T_comp + T_comm .

We cannot approach parallel computing without taking the communication issues into account, since these may prevent us from achieving improvements over serial computing. When analyzing and developing parallel algorithms, we would like to keep the computation to communication ratio, i.e. T_comp/T_comm, as large as possible. This seems quite obvious, but it is not easy to achieve since a larger number of processors tends to reduce the computation time while increasing the communication time (this is also known as the min-max problem). This trade-off challenges the programmer to find a balance between these two conflicting goals.

The complexity measures—defined in Appendix A.3—allow us to evaluate the time and communication complexities of an algorithm without caring about the exact time spent on a particular computer in a particular environment. These measures are functions of the size of the problem that the algorithm solves.

We now describe a small set of communication tasks that play an important role in a large number of algorithms. The basic communication functions implement send and receive operations, but one quickly realizes that some higher-level standard communication tasks are needed. These are usually optimized for a given architecture and are provided by the compiler manufacturer.

Single Node Broadcast

The first standard communication task is the single node broadcast. In this case, we want to send the same packet of information from a given processor, also called node, to all other processors.

Multinode Broadcast

An immediate generalization is the multinode broadcast, where each node simultaneously sends the same information to all others. Typically, this operation takes place in iterative methods when a part of the problem is solved by a node, which then sends the results of its computation to all other processors before starting the next iteration.

Single Node and Multinode Accumulation

The dual problems of the single node and multinode broadcast are respectively the single node and multinode accumulation. Here, the packets are sent from every node to a given node and these packets can be combined along the path of the communication. The simplest example of such a task is to think of it as the addition of numbers computed at each processor. The partial sums of these numbers are combined at the various nodes to finally lead to the total sum at the given accumulation node. The multinode accumulation task is performed by carrying out a single node accumulation at each node simultaneously.

Single Node Scatter

The single node scatter operation is accomplished by sending different packets from a node to every other node. This is not to be considered as a single node broadcast since different information is dispatched to different processors.

Single Node Gather

As in the previous cases, there is a dual communication task, called the single node gather, that collects different items of information at a given node from every other node.

Problem                   Linear Array   Hypercube
Single Node Broadcast     Θ(p)           Θ(log p)
Single Node Scatter       Θ(p)           Θ(p / log p)
Multinode Broadcast       Θ(p)           Θ(p / log p)
Total Exchange            Θ(p²)          Θ(p)

Table 4.1: Complexity of communication tasks on a linear array and a hypercube with p processors.

Total Exchange

Finally, the total exchange communication task involves sending different packets from each node to every other node.

Complexity of Communication Tasks

All these communication tasks are linked through a hierarchy. The most general and most difficult problem is the total exchange. The multinode accumulation and multinode broadcast tasks are special cases of the total exchange. They are simpler to carry out and are ranked second in the hierarchy. Then come the single node scatter and gather, which are again simpler to perform. Finally, the single node broadcast and accumulation tasks rank fourth in terms of communication complexity.

The ranking in the hierarchy remains the same whatever topology is used for interconnecting the processors. Of course, the complexity of the communication operations changes according to the connections between the processors.

Table 4.1 gives the complexity of the communication tasks with respect to the interconnection network used.
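As an illustration of the Θ(log p) entry for the hypercube in Table 4.1, the following sketch (in Python, added here for illustration only) simulates a single node broadcast by recursive doubling: at each step every node that already holds the packet forwards it across one hypercube link, so all p = 2^d nodes are reached in d = log_2 p steps.

    def hypercube_broadcast_steps(d, source=0):
        """Simulate a single node broadcast on a d-cube and count the steps."""
        p = 2 ** d
        has_packet = {source}
        steps = 0
        while len(has_packet) < p:
            # Every informed node forwards the packet along the current dimension.
            has_packet |= {node ^ (1 << steps) for node in has_packet}
            steps += 1
        return steps

    # With 8 processors (d = 3) the broadcast completes in 3 = log2(8) steps,
    # whereas on a linear array it would take p - 1 = 7 steps.
    print(hypercube_broadcast_steps(3))   # 3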

4.1.3 Synchronization Issues

As has already been mentioned, a parallel algorithm can carry out tasks synchronously or asynchronously. A synchronous behavior is obtained by setting synchronization points in the code, i.e. statements that instruct the processors to wait until all of them have reached this point. In the case where there exists a global control unit, the synchronization is done automatically. Another technique is local synchronization. If the kind of information a processor needs at a certain point in its execution is known in advance, the processor can continue its execution as soon as it has received this information. It is not necessary for a processor to know whether any information it has sent has been received, so there is no need to wait for confirmation from other processors.

The problem of a synchronous algorithm is that slow communication between the processors can be detrimental to the whole method. As depicted in Figure 4.9, long communication delays may lead to excessively large total execution times. The idle periods are dashed in the figure.

Figure 4.9: Long communication delays between two processors.

Another frequent problem is that a heavier workload for one processor may degrade the whole execution. Figure 4.10 shows that this happens even though communication speed is fast enough.

Figure 4.10: Large differences in the workload of two processors.

The communication penalty and the overall execution time of many algorithms can often be substantially reduced by means of an asynchronous implementation. For synchronous algorithms we know a priori in which sequence the statements are executed. In contrast, for an asynchronous algorithm the sequence of computations can differ from one execution to another, thus leading to differences in the execution.

4.1.4 Speedup and Efficiency of an Algorithm

In order to compare serial algorithms with parallel algorithms, we need to recall a few definitions. The speedup for a parallel implementation of an algorithm using p processors and solving a problem of size n is generally defined as

S_p(n) = T^*(n) / T_p(n) ,

where T_p(n) is the time needed for parallel execution with p processors and T^*(n) is the optimal time for serial execution. As this optimal time is generally unknown, T^*(n) is replaced by T_1(n), the time required by a single processor to execute the particular parallel algorithm. An ideal situation is characterized by S_p(n) = p. The efficiency of an algorithm is then defined by the ratio

E_p(n) = S_p(n) / p = T_1(n) / (p T_p(n)) ,

which ranges from 0 to 1 and measures the fraction of time a processor is not standing idle in the execution process.
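A minimal sketch of these two definitions (in Python, added for illustration; the timings used below are hypothetical) is:

    def speedup(t_serial, t_parallel):
        """S_p(n) = T_1(n) / T_p(n), using the serial time of the parallel algorithm."""
        return t_serial / t_parallel

    def efficiency(t_serial, t_parallel, p):
        """E_p(n) = S_p(n) / p, the fraction of time the processors are kept busy."""
        return speedup(t_serial, t_parallel) / p

    # Hypothetical timings: 120 s on one processor, 2 s on 77 processors.
    s = speedup(120.0, 2.0)          # 60.0
    e = efficiency(120.0, 2.0, 77)   # about 0.78
    print(s, e)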


4.2 Model Simulation Experiences

In this section, we will present a practical experience with solution algorithms executed in a SIMD environment. These results have been published in Gilli and Pauletto [53] and the presentation closely follows the original paper.

4.2.1 Econometric Models and Solution Algorithms

The econometric models we consider here for solution are represented by a system of n linear and nonlinear equations

F(y, z) = 0 ⟺ f_i(y, z) = 0 ,   i = 1, 2, . . . , n ,     (4.1)

where F : R^n × R^m → R^n is differentiable in the neighborhood of the solution y^* ∈ R^n, and z ∈ R^m are the lagged and the exogenous variables. In practice, the Jacobian matrix DF = ∂F/∂y′ of an econometric model can often be put into a blockrecursive form, as shown in Figure 3.1, where the dark shadings indicate interdependent blocks and the light shadings the existence of nonzero elements.

The solution of the model then concerns only the interdependent submodels. The approach presented hereafter applies both to the solution of macroeconometric models and to any large system of nonlinear equations having a structure similar to the one just described.

Essentially, two types of well-known methods are commonly used for the numerical solution of such systems: first-order iterative techniques and Newton-like methods. These algorithms have already been introduced as Algorithm 18 and Algorithm 19 for Jacobi and Gauss-Seidel, and Algorithm 14 for Newton. Hereafter, these algorithms are presented again with slightly different layouts as we will deal with a normalized system of equations.

First-order Iterative Techniques

The main first-order iterative techniques are the Gauss-Seidel and the Jacobi iterations.

For these methods, we consider the normalized model as shown in Equation (2.18), i.e.

y_i = g_i(y_1, . . . , y_{i−1}, y_{i+1}, . . . , y_n, z) ,   i = 1, . . . , n .

In the Jacobi case, the generic iteration k can be written

y_i^{(k+1)} = g_i(y_1^{(k)}, . . . , y_{i−1}^{(k)}, y_{i+1}^{(k)}, . . . , y_n^{(k)}, z) ,   i = 1, . . . , n     (4.2)

and in the Gauss-Seidel case the generic iteration k uses the i − 1 updated components of the vector y^{(k+1)} as soon as they are available, i.e.

y_i^{(k+1)} = g_i(y_1^{(k+1)}, . . . , y_{i−1}^{(k+1)}, y_{i+1}^{(k)}, . . . , y_n^{(k)}, z) ,   i = 1, . . . , n .     (4.3)

In order to be operational, the algorithm must also specify a termination criterion for the iterations. These are stopped if the changes in the solution vector y^{(k+1)} become small enough, for instance when the following condition is verified:

ε_i = |y_i^{(k)} − y_i^{(k−1)}| / (|y_i^{(k−1)}| + 1) < η ,   i = 1, 2, . . . , n ,

where η is a given tolerance. This criterion is similar to the one introduced in Section 2.13 for appropriately scaled variables.

First-order iterative algorithms can then be summarized in the three statements given in Algorithm 22. The only difference in the code between the Jacobi and the Gauss-Seidel algorithms appears in Statement 2.

Algorithm 22 First-order Iterative Method

do while ( not converged )
  1. y0 = y1
  2. Evaluate all equations
  3. not converged = any( |y1 − y0| / (|y0| + 1) > η )
end do

Jacobi uses different arrays for y1 and for y0, i.e.

y1_i = g_i(y0_1, . . . , y0_{i−1}, y0_{i+1}, . . . , y0_n, z) ,

whereas Gauss-Seidel overwrites y0 with the computed values for y1 and therefore the same array is used in the expression

y1_i = g_i(y1_1, . . . , y1_{i−1}, y0_{i+1}, . . . , y0_n, z) .

Obviously, the logical variable “not converged” and the array y1 have to be initialized before entering the loop of the algorithm.
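A compact sketch of Algorithm 22 covering both variants (written in Python for illustration; the normalized equations g and the tolerance eta are placeholders to be supplied by the user) could look as follows:

    import numpy as np

    def first_order_solve(g, y, z, eta=1e-6, max_iter=1000, jacobi=False):
        """Solve y_i = g_i(y, z), i = 1..n, by Jacobi or Gauss-Seidel iterations.

        g(i, y, z) returns the right-hand side of normalized equation i.
        With jacobi=True every equation is evaluated from the previous iterate
        y0; otherwise (Gauss-Seidel) updated components are reused immediately.
        """
        y1 = np.asarray(y, dtype=float).copy()
        for _ in range(max_iter):
            y0 = y1.copy()
            src = y0 if jacobi else y1            # Statement 2 of Algorithm 22
            for i in range(len(y1)):
                y1[i] = g(i, src, z)
            if np.all(np.abs(y1 - y0) / (np.abs(y0) + 1) <= eta):   # Statement 3
                return y1
        raise RuntimeError("no convergence within max_iter iterations")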

Newton-like Methods

When solving the system with Newton-like methods, the general step k in the iterative process can be written as in Equation (2.10), i.e.

y^{(k+1)} = y^{(k)} − (DF(y^{(k)}, z))^{−1} F(y^{(k)}, z) .     (4.4)

Since the Jacobian matrix DF is large but very sparse, as far as econometric models are concerned, Newton-like and iterative methods are often combined into a hybrid method (see for instance Section 3.3). This consists in applying the Newton algorithm to a subset of variables only, namely the feedback or spike variables.

Two types of problems occur at each iteration (4.4) when solving a model with a Newton-like method:

(a) the evaluation of the Jacobian matrix DF(y^{(k)}, z);

(b) the solution of the linear system involving (DF(y^{(k)}, z))^{−1} F(y^{(k)}, z).


The Jacobian matrix does not have to be evaluated analytically; in most algorithms, the partial derivatives of h_i(y) are approximated by a quotient of differences as given in Equation (2.11). The seven statements given in Algorithm 23 schematize the Newton-like method, and we use the same initialization as for Algorithm 22.

Algorithm 23 Newton Method

do while ( not converged )
  1. y0 = y1
  2. X = h I + ι y0′        (matrix X contains elements x_ij)
  3. for i = 1 : n, evaluate a_ij = f_i(x_.j , z), j = 1 : n, and f_i(y0, z)
  4. J = (A − ι F(y0, z)′) / h
  5. solve J s = F(y0, z)
  6. y1 = y0 + s
  7. not converged = any( |y1 − y0| / (|y0| + 1) > η )
end do
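The corresponding computation, written out in Python as an illustrative sketch (not the original CM Fortran code; F is a user-supplied function returning the residuals f_i(y, z)), is roughly:

    import numpy as np

    def newton_solve(F, y, z, h=1e-6, eta=1e-6, max_iter=50):
        """Newton iterations with a forward-difference approximation of the Jacobian."""
        y1 = np.asarray(y, dtype=float).copy()
        for _ in range(max_iter):
            y0 = y1.copy()
            f0 = F(y0, z)
            n = len(y0)
            J = np.empty((n, n))
            for j in range(n):                 # Statements 2-4: one column at a time
                x_j = y0.copy()
                x_j[j] += h
                J[:, j] = (F(x_j, z) - f0) / h
            s = np.linalg.solve(J, f0)         # Statement 5
            y1 = y0 - s                        # Newton step, following Equation (4.4)
            if np.all(np.abs(y1 - y0) / (np.abs(y0) + 1) <= eta):
                return y1
        raise RuntimeError("no convergence within max_iter iterations")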

4.2.2 Parallelization Potential for Solution Algorithms

The opportunities existing for a parallelization of solution algorithms depend upon the type of computer used, the particular algorithm selected to solve the model and the kind of use of the model. We essentially distinguish between situations where we produce one solution at a time and situations where we want to solve the same model for a large number of different data sets.

In order to compare serial algorithms with parallel algorithms, we will use the concepts of speedup and efficiency introduced in Section 4.1.4.

MIMD Computer

A MIMD (Multiple Instructions Multiple Data) computer possesses up to several hundred fairly powerful processors, which communicate efficiently and each have the ability to execute a different program.

The Jacobi algorithm. First-order iterative methods present a natural setup for parallel execution. This is particularly the case for the Jacobi method, where the computation of the n equations in Statement 2 of Algorithm 22 consists of n different and independent tasks. They can be executed on n processors at the same time. If we consider that the solution of one equation necessitates one time unit, the speedup for a parallel execution of the Jacobi method is T_1(n)/T_n(n) = n and the efficiency is 1, provided that we have n processors at our disposal.

This potential certainly looks very appealing if the parallel execution of the Jacobi algorithm is compared to a serial execution. In practice, however, Gauss-Seidel iterations are often much more attractive than the Jacobi method, which, in general, converges very slowly. On the other hand, the advantage of the Jacobi method is that its convergence does not depend on particular orderings of the equations.

The Gauss-Seidel algorithm. In the case of Gauss-Seidel iterations, the system of equations (4.3) defines a causal structure among the variables y_i^{(k+1)}, i = 1, . . . , n. Indeed, each equation i defines a set of causal relations going from the right-hand side variables y_j^{(k+1)}, j = 1, . . . , i−1, to the left-hand side variable y_i^{(k+1)}. This causal structure can be formalized by means of a graph G = (Y^{(k+1)}, A), where the set of vertices Y^{(k+1)} = {y_1^{(k+1)}, . . . , y_n^{(k+1)}} represents the variables and A is the set of arcs. An arc y_j^{(k+1)} → y_i^{(k+1)} exists if the variable y_j^{(k+1)} appears in the right-hand side of the equation explaining y_i^{(k+1)}. The way the equations (4.3) are written¹ results in a graph G without circuits, i.e. a directed acyclic graph (DAG). This implies the existence of a hierarchy among the vertices, which means that they can be partitioned into a sequence of sets, called levels, where arcs go only from lower numbered levels to higher numbered levels and where there are no arcs between vertices in a same level. As a consequence, all variables in a level can be updated in parallel. If we denote by q the number of levels existing in the DAG, the speedup for a parallel execution of a single iteration is S_p(n) = n/q and the efficiency is n/(p q).

Different orderings can yield different DAGs and one might look for an ordering which minimizes the number of levels² in order to achieve the highest possible speedup. However, such an ordering can result in a slower convergence (a larger number of iterations) and the question of an optimal ordering certainly remains open.

The construction of the DAG can best be illustrated by means of a small system of 12 equations, the incidence matrix of the Jacobian matrix g′ of which is shown on the left-hand side of Figure 4.11. We then consider a decomposition L + U of this incidence matrix, where L is lower triangular and U is upper triangular. The matrix on the right-hand side in Figure 4.11 has an ordering which minimizes³ the elements in U.

Matrix L then corresponds to the incidence matrix of the directed acyclic graph G presented before, and the hierarchical ordering of the vertices into levels is also shown in Figure 4.11. We can notice the absence of arcs between vertices in a same level; therefore, the updating of the equations corresponding to these vertices constitutes different tasks which can all be executed in parallel. According to the number of vertices in the largest level, which is level 3, the speedup for this example, when using 5 processors, is S_5(12) = T_1(12)/T_5(12) = 12/4 = 3 with efficiency 12/(5 · 4) = 0.6. This definition neglects the time needed to communicate the results from the processors which update the equations in a level to the processors which need this information to update the equations in the next level. Therefore, the speedup one can expect in practice will be lower.

¹ A variable y_j^{(k+1)} in the right-hand side of the equation explaining y_i^{(k+1)} always verifies i > j.

² This would correspond to a minimum coloring of G.

³ In general, such an ordering achieves a good convergence of the Gauss-Seidel iterations, see Section 3.3.


[Figure 4.11 shows the 12 × 12 incidence matrix in its original ordering, the reordered incidence matrix with its L + U decomposition, and the corresponding DAG partitioned into four levels.]

Figure 4.11: Original and ordered Jacobian matrix and corresponding DAG.
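The partition of the vertices into levels can be computed directly from the incidence matrix of L. The following sketch (in Python, added for illustration; the small incidence matrix below is a made-up example, not the one of Figure 4.11) assigns to each vertex one level above the highest level of its predecessors:

    import numpy as np

    def dag_levels(L):
        """Assign levels to the vertices of the DAG whose incidence matrix is L.

        L[i, j] = 1 (with j < i) means there is an arc j -> i, i.e. variable j
        enters equation i.  All vertices in a same level can be updated in
        parallel during one Gauss-Seidel sweep.
        """
        n = L.shape[0]
        level = np.zeros(n, dtype=int)
        for i in range(n):
            preds = [j for j in range(i) if L[i, j]]
            level[i] = 1 + max((level[j] for j in preds), default=0)
        return level

    # Made-up 5-equation example: equations 0 and 1 form level 1, equations 2
    # and 3 depend on them (level 2), and equation 4 depends on 2 (level 3).
    L = np.array([[1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1]])
    print(dag_levels(L))   # [1 1 2 2 3]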

SIMD Computer

A SIMD (Single Instruction Multiple Data) computer usually has a large number of processors (several thousand), which all execute the same code on different data sets stored in the processors' memory. SIMDs are therefore data parallel processing machines. The central locus of control is a serial computer called the front end. Neighboring processors can communicate data efficiently, but general interprocessor communication is associated with a significant loss of performance.

Single Solution. When solving a model for a single period with the Jacobi or the Gauss-Seidel algorithm, there are no possibilities of executing the same code with different data sets. Only the Newton-like method offers opportunities for data parallel processing in this situation. This concerns the evaluation of the Jacobian matrix and the solution of the linear system.⁴ We notice that the computations involved in approximating the Jacobian matrix are all independent. In particular, the elements of the matrices of order n in Statements 2 and 4 of Algorithm 23 can be evaluated in a single step with a speedup of n². For a given row i of the Jacobian matrix, we need to evaluate the function h_i for n + 1 different data sets (Statement 3 of Algorithm 23). Such a row can then be processed in parallel with a speedup of n + 1.

⁴ We do not discuss here the fine grain parallelization for the solution of the linear system, for which we used code from the CMSSL library for QR factorization, see [97].


Repeated Solution of the Same Model. For large models, the parallel implementation of the solution of a model can produce considerable speedup. However, these techniques really become attractive if the same model has to be solved repeatedly for different data sets. In econometric analysis, this is the case for stochastic simulation, optimal control, evaluation of forecast errors and linear analysis of nonlinear models. The extension of the solution techniques to the case where we solve the same model many times is immediate. For first-order iterative methods, the equations (4.3) of the Gauss-Seidel method, for instance, become

y_{ir}^{(k+1)} = g_i(y_{1r}^{(k+1)}, . . . , y_{i−1,r}^{(k+1)}, y_{i+1,r}^{(k)}, . . . , y_{nr}^{(k)}, Z) ,   i = 1, . . . , n ,   r = 1, . . . , p ,     (4.5)

where the subscript r accounts for the different data sets. One equation g_i can then be computed at the same time for all p data sets.⁵ For the Jacobi method, we proceed in the same way.

In a similar fashion, the Newton-like solution can be carried out simultaneously for all p different data sets. The p Jacobian matrices are represented by a three-dimensional array J, where the element J_ijr represents the derivative of equation h_i with respect to variable y_j, evaluated for the r-th data set. Once again, the matrix J(i, 1 : n, 1 : p) can be computed at the same time for all elements. For the solution of the linear system, the aforementioned software from the CMSSL library can also exploit the situation where all the p linear problems are presented at once.
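In array notation, the idea can be sketched as follows (Python/NumPy is used here purely for illustration; the sizes, the array Y holding one column per data set and the toy right-hand side standing in for g_i are all hypothetical):

    import numpy as np

    # Hypothetical sizes: n endogenous variables, p data sets solved simultaneously.
    n, p = 4, 3
    Y = np.ones((n, p))          # Y[:, r] is the current iterate for data set r
    Z = np.ones((5, p))          # exogenous variables for the p data sets

    # Updating equation i for all p data sets at once (one data parallel statement):
    i = 2
    Y[i, :] = 0.5 * Y[i - 1, :] + 0.1 * Z[0, :]    # stands for g_i(Y, Z), all r at once

    # The p finite-difference Jacobians can likewise be stored in a 3-D array,
    # where J[i, j, r] approximates the derivative of equation i with respect to
    # variable j for data set r; each slice J[i, :, :] is filled in parallel.
    J = np.zeros((n, n, p))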

4.2.3 Practical Results

In order to evaluate the interest of parallel computing in the econometric practice of model simulation, we decided to solve a real-world medium-sized nonlinear model. Our evaluation not only focuses on speedup but also comments on the programming needed to implement the solution procedures on a parallel computer. We used the Connection Machine CM2, a so-called massively parallel computer, equipped with 8192 processors. The programming language used is CM Fortran. The performances are compared against serial execution on a Sun ELC Sparcstation.

The model we solved is a macroeconometric model of the Japanese economy, developed at the University of Tsukuba. It is a dynamic annual model, constituted by 98 linear and nonlinear equations and 53 exogenous variables. The dynamics are generated by the presence of endogenous variables lagged over one and two periods. The Jacobian matrix of the model, put into its blockrecursive form, has the pattern common to most macroeconometric models, i.e. a large fraction of the variables are in one interdependent block preceded and followed by recursive equations. In our case, the interdependent block contains 77 variables. Six variables are defined recursively before the block, followed by 15 variables which do not feed back on the block. Figure 4.12 shows the blockrecursive pattern of the Jacobian matrix, where big dots represent the nonzero elements and small dots correspond to zero entries.

⁵ Best performance will be obtained if the number of data sets equals the number of available processors.



Figure 4.12: Blockrecursive pattern of the model’s Jacobian matrix.

The model's parameters have been estimated on data going back to 1972 and we solved the model for the 10 periods from 1973 to 1982, for which the Jacobi and Gauss-Seidel methods did converge. The average number of iterations needed for the interdependent block to converge and the total execution time on a Sun ELC Sparcstation, which is rated at 24 Mips and 3.5 MFlops, are reported in Table 4.2.

Method Average iterations Execution time (seconds)

Jacobi 511 4.5

Gauss-Seidel 20 0.18

Table 4.2: Execution times of Gauss-Seidel and Jacobi algorithms.

Execution on a MIMD Computer

In the following, the theoretical performance of a parallelization of the solution algorithms is presented. According to the definition given above, we assume that every step in the algorithms needs one time unit, independently of whether it is executed on a single or on a multiple processor machine. This, of course, neglects not only the difference existing between the performance of processors but also the communication time between the processors.

From the Jacobian matrix in Figure 4.12 we can count respectively 5, 1, 77, 10, 2, 2, 1 variables in the seven levels from top to bottom. As the average number of iterations for the interdependent block is 511 for the Jacobi algorithm, a one-period solution of the model necessitates on average 5 + 1 + 511 · 77 + 10 + 2 + 2 + 1 = 39368 equation evaluations, which, on a serial machine, are executed in 39368 time units. On a MIMD computer, this task could be executed with 77 processors in 1 + 1 + 511 + 1 + 1 + 1 + 1 = 517 time units, which results in a speedup of 39368/517 = 76.1 and an efficiency of 76.1/77 = 0.99.

For the Gauss-Seidel algorithm, the average number of equations to solve for one period is 5 + 1 + 20 · 77 + 10 + 2 + 2 + 1 = 1561, which then necessitates 1561 time units for serial execution. The order in which the equations are solved has been established by choosing a decomposition L + U of the Jacobian matrix which minimizes the number of nonzero columns in matrix U. Figure 4.13 reproduces the incidence matrix of matrix L, which shows 18 levels and where big and small dots have the same meaning as in Figure 4.12.

Figure 4.13: Matrix L for the Gauss-Seidel algorithm.

The maximum number of variables in a level of this incidence matrix is 8. The solution of the model on a MIMD computer then necessitates 1 + 1 + 20 · 18 + 1 + 1 + 1 + 1 = 366 time units, giving a speedup of 1561/366 = 4.2 and an efficiency of 4.2/8 = 0.52.

In serial execution, the Gauss-Seidel is 25 times faster than the Jacobi, but when these two algorithms are executed on a MIMD computer, this factor reduces to 1.4. We also see that, for this case, the parallel Jacobi method is 3 times faster than the serial Gauss-Seidel method, which means that we might be able to solve problems with the Jacobi method in approximately the same amount of time as with a serial Gauss-Seidel method.

If the model's size increases, the solution on MIMD computers becomes more and more attractive. For the Jacobi method, the optimum is obviously reached if the number of equations in a level equals the number of available processors. For the Gauss-Seidel method, we observe that the ratio between the size of matrix L and the number of its levels has a tendency to increase for larger models. We computed this ratio for two large macroeconometric models, MPS and RDX2.⁶ For the MPS model, the largest interdependent block is of size 268 and has 28 levels, giving a ratio of 268/28 = 9.6. For the RDX2 model, this ratio is 252/40 = 6.3.

Execution on a SIMD Computer

The situation where we solve the same model repeatedly is ideal for the CM2 SIMD computer. Due to the array-processing features implemented in CM FORTRAN, which naturally map onto the data parallel architecture⁷ of the CM2, it is very easy to get an efficient parallelization for the steps in our algorithms which concern the evaluation of the model's equations, as is the case for Statement 2 of Algorithm 22 for the first-order algorithms.

Let us consider one of the equations of the model, which, coded in FORTRAN for a serial execution, looks like

y(8) = exp(b0 + b1*y(15) + b3*log(y(12)) + b4*log(y(14)))

If we want to perform p independent solutions of our equation, we define the vector y to be a two-dimensional array y(n,p), where the second dimension corresponds to the p different data sets. As CM FORTRAN treats arrays as objects, the p evaluations of the equation given above are simply coded as follows:

y(8,:) = exp(b0 + b1*y(15,:) + b3*log(y(12,:)) + b4*log(y(14,:)))

and the computations to evaluate the p components of y(8,:) are then executed on p different processors at once.⁸ To instruct the compiler that we want the p different data sets, corresponding to the columns of matrix y, in the memories of p different processors, we use a compiler directive.⁹

Repeated solutions of the model have then been carried out in a sensitivity analysis, where we shocked the values of some of the exogenous variables and observed the different forecasts. The empirical distribution of the forecasts then provides an estimate for their standard deviation. The best results, in terms of speedup, will be obtained if the number of repeated solutions equals the number of processors available in the computer.¹⁰ We therefore generated 8192 different sets of exogenous variables and produced the corresponding forecasts.

⁶ See Helliwell et al. [63] and Brayton and Mauskopf [19].

⁷ The essence of the CM system is that it stores array elements in the memories of separate processors and operates on multiple elements at once. This is called data parallel processing. For instance, consider an n × m array B and the statement B=exp(B). A serial computer would execute this statement in nm steps. The CM machine, in contrast, provides a virtual processor for each of the nm data elements and each processor needs to perform only one computation.

⁸ The colon indicates that the second dimension runs over all columns.

⁹ The statement layout y(:serial,:news) instructs the compiler to lay out the second dimension of array y across the processors.

The time needed to solve the model 8192 times for the ten time periods with the Gauss-Seidel algorithm is reported in Table 4.3, where we also give the time spent in executing the different statements.

Statements                                          CM2 (seconds)   Sun ELC (seconds)   Speedup
1. y0 = y1                                               1.22              100
2. Evaluate all equations                                5.44              599
3. not converged = any(|y1 − y0|/(|y0| + 1) > η)        13.14              366
Total time                                              22.2              1109              50
Modified algorithm                                      12.7               863              68

Table 4.3: Execution time on CM2 and Sun ELC.

At this point, the ratio of the total execution times gives us a speedup of approximately 50, which, as we will see, can be further improved.

We observe that the execution of Statement 3 on the CM2 takes more than twice the time needed to evaluate all equations. One of the reasons is that this statement involves communication between the processors. Statements 1 and 3 together, which in the algorithm serve exclusively to check for convergence, need approximately as much time as two iterations over all equations. We therefore suggest a modified algorithm, where the first test for convergence takes place after k1 iterations and all subsequent tests every k2 iterations.¹¹

Algorithm 24 Modified First-order Iterative Methods

do i = 1 : k1, Evaluate all equations, enddo
do while ( not converged )
  1. y0 = y1
  2. do i = 1 : k2, Evaluate all equations, enddo
  3. not converged = any( |y1 − y0| / (|y0| + 1) > η )
enddo

According to the results reported in the last row of Table 4.3, such a strategy increases the performance of the execution on both machines, but not in equal proportions, which then leads to a speedup of 68.

¹⁰ If the data set is larger than the set of physical processors, each processor processes more than one data set consecutively.

¹¹ The parameters k1 and k2 depend, of course, on the particular problem and can be guessed in preliminary executions. For our application, we chose k1 = 20 and k2 = 2 for the execution on the CM2, and k1 = 20 and k2 = 1 for the serial execution.

We also solved the model with the Newton-like algorithm on the Sun and the CM2. We recall, as already mentioned in Section 4.2.1, that for sparse Jacobian matrices like ours the Newton-like method is, in general, only applied to the subset of spike variables. However, the cardinality of the set of spike variables for the interdependent block is five, and a comparison of the execution times for such a small problem would be meaningless. Therefore, we solved the complete interdependent block without any decomposition, which certainly is not a good strategy.¹² The execution time for the main steps of the Newton-like algorithm needed to solve the model for ten periods is reported in Table 4.4.

Statements                                                                 Sun (seconds)   CM2 (seconds)
2. X = h I + ι y0′                                                              0.27            0.23
3. for i = 1 : n, evaluate a_ij = f_i(x_.j, z), j = 1 : n, and f_i(y0, z)       2.1           122
4. J = (A − ι F(y0, z)′)/h                                                      1.1             0.6
5. solve J s = F(y0, z)                                                        19.7            12

Table 4.4: Execution time on Sun ELC and CM2 for the Newton-like algorithm.

In order to achieve the numerical stability of the evaluation of the Jacobian ma-trix, Statements 2, 3 and 4 have to be computed in double precision. Unfortu-nately, the CM2 hardware is not designed to execute such operations efficiently,as the evaluation of an arithmetic expression in double precision is about 60times slower as the same evaluation in single precision. This inconvenience doesnot apply to the CM200 model. By dividing the execution time for Statement 3by a factor of 60, we get approximatively the execution time for the Sun. Sincethe Sun processor is about 80 times faster than a processor on the CM2 andsince we have to compute 78 columns in parallel, we therefore just reached thepoint from which the CM200 would be faster. Statements 2 and 4 operate onn2 elements in parallel and therefore their execution is faster on the CM2. Fromthese results, we conclude that the CM2 is certainly not suited for data paral-lel processing in double precision. However, with the CM200 model significantspeedup will be obtained if the size of the model to be solved becomes superiorto 80.

12This problem has, for instance, been discussed in Gilli et al. [52]. The execution time for the Newton method can therefore not be compared with the execution time for the first-order iterative algorithms.


Chapter 5

Rational Expectations Models

Nowadays, many large-scale models explicitly include forward expectation variables that allow for a better accordance with the underlying economic theory and also provide a response to the Lucas critique. These ongoing efforts gave rise to numerous models currently in use in various countries. Among others, we may mention MULTIMOD (Masson et al. [79]) used by the International Monetary Fund and MX-3 (Gagnon [41]) used by the Federal Reserve Board in Washington; model QPM (Armstrong et al. [5]) from the Bank of Canada; model Quest (Brandsma [18]) constructed and maintained by the European Commission; and models MSG and NIGEM analyzed by the Macro Modelling Bureau at the University of Warwick.

A major technical issue introduced by forward-looking variables is that the system of equations to be solved becomes much larger than in the case of conventional models. Solving rational expectation models often constitutes a challenge and therefore provides an ideal testing ground for the solution techniques discussed earlier.

5.1 Introduction

Before the more recent efforts to explicitly model expectations, macroeconomic model builders have taken into account the forward looking behavior of agents by including distributed lags of the variables in their models. This actually comes down to supposing that individuals predict a future or current variable by only looking at past values of the same variable. Practically, this explains economic behavior relatively well if the way people form their expectations does not change.

These ideas have been explored by many economists, which has led to the so-called “adaptive hypothesis.” This theory states that the individual’s expectations react to the difference between actual values and expected values of the variable in question. If the level of the variable is not the same as the forecast,


the agents use their forecasting error to revise their expectation of the next period’s forecast. This setting can be summarized in the following equation1

x^e_t − x^e_{t−1} = λ (x_{t−1} − x^e_{t−1})     with 0 < λ ≤ 1 ,

where x^e_t is the forecasted and non-observable value of variable x at period t. By rearranging terms we get

x^e_t = λ x_{t−1} + (1 − λ) x^e_{t−1} ,

i.e. the forecasted value for the current period is a weighted average of the true value of x at period t − 1 and the forecasted value at period t − 1. Substituting lagged values of x^e_t repeatedly in this expression yields

x^e_t = λ ∑_{i=1}^{∞} (1 − λ)^{i−1} x_{t−i} .

The speed at which the agents adjust is determined by the parameter λ.
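As a quick numerical illustration (the series and the value of λ below are arbitrary placeholders), the recursive form and the distributed-lag form produce the same forecast once the weight given to the initial expectation is accounted for:

```python
import numpy as np

lam = 0.3
x = np.array([1.0, 1.2, 0.9, 1.1, 1.4, 1.3, 1.5])   # observed series x_1..x_7
xe = x[0]                                            # arbitrary initial expectation

# Recursive form: x^e_t = lam*x_{t-1} + (1-lam)*x^e_{t-1}
for t in range(1, len(x)):
    xe = lam * x[t - 1] + (1 - lam) * xe

# Distributed-lag form, truncated at the first observation, plus the weight
# (1-lam)^(T-1) carried by the initial expectation.
weights = lam * (1 - lam) ** np.arange(len(x) - 1)
xe_dl = np.sum(weights * x[-2::-1]) + (1 - lam) ** (len(x) - 1) * x[0]

print(xe, xe_dl)   # both give the same forecast
```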

In this framework, individuals constantly underpredict the value of the variable whenever the variable constantly increases. This behavior is labelled “not rational” since agents systematically make errors in their expected level of x. It is therefore argued that individuals would not behave that way since they can do better. Like many other economists, Lucas [77] criticized this assumption and proposed models consistent with the underlying idea that agents do their best with the information they have. The “rational expectations hypothesis” is an alternative which takes such criticisms into account. We assume that individuals use all information efficiently and do not systematically make errors in their predictions. Agents use the knowledge they have to form views about the variables entering their model and produce expectations of the variables they want to use to make their decisions.

This role of expectations in economic behavior has been recognized for a long time. The explicit use of the term “rational expectations” goes back to the work of Muth in 1961 as reprinted in Lucas and Sargent [75]. The rational expectations are the true mathematical expectations of future and current variables conditional on the information set available to the agents. This information set at date t − 1 contains all the data on the variables of the model up to the end of period t − 1, as well as the relevant economic model itself.

The implication of the rational expectations hypothesis is that the policies the government may try to impose are ineffective, since agents forecast the change in the variables using the same information basis. Even in the short term, systematic interventions of monetary or fiscal policies only affect the general level of prices, but not real output or employment.

A criticism often made against this hypothesis is that obtaining the information is costly and therefore not all information may be used by the agents. Moreover, not all agents may build a good model to reach their forecasts. An answer to these points is that even if not all individuals are well informed and produce good predictions, some individuals will. An arbitrage process can therefore

1The model is sometimes expressed as x^e_t − x^e_{t−1} = λ(x_t − x^e_{t−1}). In this case, the expectation formation is somewhat different but the conclusions are similar for our purposes.


take place and the agents who have the correct information can make profits by selling the information to the ill-informed ones. This process will lead the economy to the rational expectations equilibrium.

Fair [33] summarizes the main characteristics of rational expectations (RE) models such as those of Lucas [76], Sargent [91] and Barro [9] in the three following points:

• the assumption that expectations are rational, i.e. consistent with the model,

• the assumption that information is imperfect regarding the current state of the economy,

• the postulation of an aggregate supply equation in which aggregate supply is a function of exogenous terms plus the difference between the actual and the expected price level.

These characteristics imply that government actions have an influence on real output only if these actions are unanticipated. The second assumption allows for an effect of government actions on aggregate supply, since the difference between the actual and expected price levels may be influenced. A systematic policy is, however, anticipated by the agents and cannot affect aggregate supply.

One of the key conclusions of new-classical authors is that existing macroeconometric models were not able to provide guidance for economic policy.

This critique has led economists to incorporate expectations in the relationships of small macroeconomic models, as in Sargent [92] and Taylor [96]. By now this idea has been adopted as a part of new-classical economics.

The introduction of the rational expectations hypothesis opened new research topics in econometrics that have given rise to important contributions. According to Beenstock [12, p. 141], the main issues of the RE hypothesis are the following:

• Positive economics, i.e. what are the effects of policy interventions if rational expectations are assumed?

• Normative economics, i.e. given that expectations are rational, ought the government to take policy actions?

• Hypothesis testing, i.e. does evidence in empirical studies support the rational expectations hypothesis? Are such models superior to others which do not include this hypothesis?

• Techniques, i.e. how does one solve a model with rational expectations? What numerical issues are faced? How can we efficiently solve large models with forward-looking variables?

These problems, particularly the first three, have been addressed by many authors in the literature.

New procedures for estimation of the model’s parameters were set forth in McCallum [80], Wallis [101], Hansen and Sargent [62], Wickens [102], Fair and Taylor [34] and Nijman and Palm [84]. The presence of forward-looking variables also creates a need for new solution techniques for simulating such models; see for example Fair and Taylor [34], Hall [61], Fisher and Hughes Hallett [37], Laffargue [74] and Boucekkine [17].

In the next section, we introduce a formulation of RE models that will be used to analyze the structure of such models. A second section discusses issues of uniqueness and stability of the solutions.

5.1.1 Formulation of RE Models

Using the conventional notation (see for instance Fisher [36]), a dynamic model with rational expectations is written

fi(yt, yt−1, . . . , yt−r, yt+1|t−1, . . . , yt+h|t−1, zt) = 0 , i = 1, . . . , n , (5.1)

where yt+j|t−1 is the expectation of yt+j conditional on the information available at the end of period t − 1, and zt represents the exogenous and random variables. For consistent expectations, the forward expectations yt+j|t−1 have to coincide with the next period’s forecast when solving the model conditional on the information available at the end of period t − 1. These expectations are therefore linked forward in time and solving model (5.1) for each yt conditional on some start period 0 requires each yt+j|0, for j = 1, 2, . . . , T − t, and terminal conditions yT+j|0 , j = 1, . . . , h.

We will now consider that the system of equations contains one lag and one lead, that is, we set r = 1 and h = 1 in (5.1) so as to simplify notation. In order to discuss the structure of our system of equations, it is also convenient to resort to a linearization. System (5.1) therefore becomes

Dt yt + Et−1 yt−1 + At+1 yt+1|t−1 = zt     (5.2)

where we have Dt = ∂f/∂y′_t , Et−1 = −∂f/∂y′_{t−1} and At+1 = −∂f/∂y′_{t+1|t−1}. Stacking up the system for period t + 1 to period t + T we get

\begin{bmatrix}
E_{t+0} & D_{t+1} & A_{t+2} & & & & \\
 & E_{t+1} & D_{t+2} & A_{t+3} & & & \\
 & & \ddots & \ddots & \ddots & & \\
 & & & E_{t+T-2} & D_{t+T-1} & A_{t+T} & \\
 & & & & E_{t+T-1} & D_{t+T} & A_{t+T+1}
\end{bmatrix}
\begin{bmatrix}
y_{t+0} \\ y_{t+1} \\ y_{t+2} \\ \vdots \\ y_{t+T-1} \\ y_{t+T} \\ y_{t+T+1}
\end{bmatrix}
=
\begin{bmatrix}
z_{t+1} \\ z_{t+2} \\ \vdots \\ z_{t+T-1} \\ z_{t+T}
\end{bmatrix}     (5.3)

and a consistent expectations solution to (5.3) is then obtained by solving

J y = b ,


where J is the square block tridiagonal matrix in Equation (5.3) (i.e. the matrix without its first and last block columns), y = [y_{t+1} . . . y_{t+T}]′ and

b = \begin{bmatrix} z_{t+1} \\ \vdots \\ z_{t+T} \end{bmatrix}
  + \begin{bmatrix} -E_{t+0} \\ \vdots \\ 0 \end{bmatrix} y_{t+0}
  + \begin{bmatrix} 0 \\ \vdots \\ -A_{t+T+1} \end{bmatrix} y_{t+T+1}

are the stacked vectors of endogenous, respectively exogenous, variables, the latter containing the initial condition y_{t+0} and the terminal condition y_{t+T+1}.
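As an illustration, the stacked system J y = b can be assembled and solved with standard sparse-matrix tools; in the sketch below the blocks D, E, A and the boundary conditions are random placeholders standing in for the linearized model:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n, T = 4, 5                          # equations per period, stacked periods
rng = np.random.default_rng(0)
D = [np.eye(n) * 3 + rng.normal(scale=0.1, size=(n, n)) for _ in range(T)]
E = [rng.normal(scale=0.1, size=(n, n)) for _ in range(T)]   # lag blocks
A = [rng.normal(scale=0.1, size=(n, n)) for _ in range(T)]   # lead blocks
z = [rng.normal(size=n) for _ in range(T)]
y0, yT1 = np.zeros(n), np.zeros(n)   # initial and terminal conditions

# J is block tridiagonal: E below the diagonal, D on it, A above it.
J = sp.lil_matrix((n * T, n * T))
b = np.concatenate(z)
for k in range(T):
    J[k*n:(k+1)*n, k*n:(k+1)*n] = D[k]
    if k > 0:
        J[k*n:(k+1)*n, (k-1)*n:k*n] = E[k]
    if k < T - 1:
        J[k*n:(k+1)*n, (k+1)*n:(k+2)*n] = A[k]
b[:n] -= E[0] @ y0                   # move the boundary terms to the rhs
b[-n:] -= A[-1] @ yT1

y = spsolve(J.tocsr(), b)            # consistent-expectations solution
```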

5.1.2 Uniqueness and Stability Issues

To study the dynamic properties of RE models, we begin with the standard presentation of conventional models containing only lagged variables. The structural form of the general linear dynamic model is usually written as

B(L)yt + C(L)xt = ut ,

where B(L) and C(L) are matrices of polynomials in the lag operator L and ut a vector of error terms. The matrices are respectively of size n × n and n × g, where n is the number of endogenous variables and g the number of exogenous variables, and are defined as

B(L) = B0 + B1 L + · · · + Br L^r ,     C(L) = C0 + C1 L + · · · + Cs L^s .

The model is generally subject to some normalization rule, i.e. diag(B0) = [1 1 . . . 1]. The dynamics of the model are stable if the determinantal polynomial |B(z)| = det(∑_{j=0}^{r} Bj z^j) has all its roots λi , i = 1, 2, . . . , nr outside the unit circle. The endogenous variables yt can be expressed as

yt = −B(L)^{−1} C(L) xt + B(L)^{−1} ut .     (5.4)

Assuming the stability of the system, the distributed lag on the exogenous variables xt is non-explosive. If there exists at least one root with modulus less than unity, the solution for yt is explosive. Wallis [100] writes Equation (5.4) using |B(L)|, the determinant of the matrix polynomial B(L), and b(L), the adjoint matrix of B(L):

yt = − (b(L)/|B(L)|) C(L) xt + (b(L)/|B(L)|) ut .     (5.5)
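For small systems, this stability condition can be checked numerically. A sketch with a single lag, so that the roots of det(B0 + B1 z) are the generalized eigenvalues of the pencil (−B0, B1); the two matrices below are arbitrary placeholders:

```python
import numpy as np
from scipy.linalg import eig

# Arbitrary example: B(L) = B0 + B1*L with two endogenous variables.
B0 = np.array([[1.0, 0.2],
               [0.1, 1.0]])
B1 = np.array([[-0.5, 0.0],
               [0.3, -0.4]])

# det(B0 + B1*z) = 0  <=>  -B0 v = z B1 v, a generalized eigenvalue problem.
roots, _ = eig(-B0, B1)
roots = roots[np.isfinite(roots)]      # infinite eigenvalues appear if B1 is singular
stable = np.all(np.abs(roots) > 1)     # all roots outside the unit circle
print(roots, "stable" if stable else "unstable")
```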

To study stability, let us write the following polynomial in factored form

|B(z)| = ∑_{i=0}^{nr} βi z^i = β_{nr} ∏_{i=1}^{nr} (z − λi) ,     (5.6)

where λi , i = 1, 2, . . . , nr are the roots of |B(z)| and β_{nr} is the coefficient of the term z^{nr}. Assuming β_{nr} ≠ 0, the polynomial defined in (5.6) has the same roots as the polynomial

∏_{i=1}^{nr} (z − λi) .     (5.7)


When β0 ≠ 0, none of the λi , i = 1, . . . , nr can be zero. Therefore, we may express a polynomial with the same roots as (5.7) by the following expression

∏_{i=1}^{nr} (1 − z/λi) = ∏_{i=1}^{nr} (1 − µi z)     with µi = 1/λi .     (5.8)

Assuming that the model is stable, i.e. that |λi| > 1 , i = 1, 2, . . . , nr, we have that |µi| < 1 , i = 1, 2, . . . , nr.

We now consider the expression 1/|B(z)| that arises in (5.5) and develop the expansion in partial fractions of the ratio of polynomials 1/∏(1 − µi z) (the numerator being the polynomial ‘1’), which has the same roots as 1/|B(z)|:

1 / ∏_{i=1}^{nr} (1 − µi z) = ∑_{i=1}^{nr} ci / (1 − µi z) .     (5.9)

For a stable root, say |λ| > 1 (|µ| < 1), the corresponding term of the right-hand side of (5.9) may be expanded into a polynomial of infinite length

1/(1 − µz) = 1 + µz + µ^2 z^2 + · · ·     (5.10)
           = 1 + z/λ + z^2/λ^2 + · · · .     (5.11)

This corresponds to an infinite series in the lags of xt in expression (5.5). In the case of an unstable root, |λ| < 1 (|µ| > 1), the expansion (5.11) is not defined, but we can use an alternative expansion as in [36, p. 75]

1/(1 − µz) = −µ^{−1} z^{−1} / (1 − µ^{−1} z^{−1})     (5.12)
           = −(λ z^{−1} + λ^2 z^{−2} + · · ·) .     (5.13)

In this formula, we get an infinite series of distributed leads in terms of Equation (5.5) by defining L^{−1} xt = F xt = xt+1. Therefore, the expansion depends on the future path of xt. This expansion will allow us to find conditions for stability in models with forward-looking variables.

Consider the model

B0yt + A(F )yt+1|t−1 = C0xt + ut , (5.14)

where for simplicity no polynomial lag is applied to the current endogenous and exogenous variables. As previously, yt+1|t−1 denotes the conditional expectation of variable yt+1 given the information set available at the end of period t − 1. The matrix polynomial in the forward operator F is defined as

A(F) = A0 + A1 F + · · · + Ah F^h ,

and has dimensions n × n. The forward operator affects the dating of the variable but not the dating of the information set, so that

F yt+1|t−1 = yt+2|t−1 .


When solving the model for consistent expectations, we set yt+1|t−1 to the solution of the model, i.e. yt+1. In this case, we may rewrite model (5.14) as

Ā(F) yt = C0 xt + ut ,     (5.15)

with Ā(F) = B0 + A(F). The stability of the model is therefore governed by the roots γi of the polynomial |Ā(z)|. When for all i we have |γi| > 1, the model is stable and we can use an expansion similar to (5.11) to get an infinite distributed lead on the exogenous term xt. On the other hand, if there exist unstable roots |γi| < 1 for some i, we may resort to expansion (5.13), which yields an infinite distributed lag.

As exposed in Fisher [36, p. 76], we will need to have |γi| > 1 , ∀i in order to get a unique stable solution. To solve (5.15) for yt+i , i = 1, 2, . . . , T, we must choose values for the terminal conditions yt+T+j , j = 1, 2, . . . , h. The solution path selected for yt+i , i = 1, 2, . . . , T depends on the values of these terminal conditions. Different conditions on the yt+T+j , j = 1, 2, . . . , h generate different solution paths. If there exists an unstable |γi| < 1 for some i, then part of the solution only depends on lagged values of xt. One may therefore freely choose some terminal conditions by selecting values of xt+T+j , j = 1, . . . , h, without changing the solution path. This would hence allow for multiple stable trajectories. We will therefore usually require the model to be stable in order to obtain a unique stable path.

The general model includes lags and leads of the endogenous variables and may be written

B(L) yt + A(F) yt+1|t−1 + C(L) xt = ut ,

or equivalently

D(L, F) yt + C(L) xt = ut ,

where D(L, F) = B(L) + A(F) and we suppose the expectations to be consistent with the model solution. Recalling that L = F^{−1} in our notation, the stability conditions depend on the roots of the determinantal polynomial |D(z, z^{−1})|. To obtain these roots, we resort to a factorization of matrix D(z, z^{−1}). When there is no zero root, we may write

D(z, z^{−1}) = U(z) W V(z^{−1}) ,

where U(z) is a matrix polynomial in z, W is a nonsingular matrix and V(z^{−1}) is a matrix polynomial in z^{−1}. The roots of |D(z, z^{−1})| are those of |U(z)| and |V(z^{−1})|. Since we assumed that there exists no zero root, we know that if |V(z^{−1})| = v0 + v1 z^{−1} + · · · + vh z^{−h} has roots δi , i = 1, . . . , h, then |V(z)| = vh + vh−1 z + · · · + v0 z^h has roots 1/δi , i = 1, . . . , h. The usual stability condition is that |U(z)| and |V(z)| must have roots outside the unit circle. This, therefore, is equivalent to saying that |U(z)| has roots in modulus greater than unity, whereas |V(z^{−1})| has roots less than unity.

With these conditions, the model must have as many unstable roots as there are expectation terms, i.e. h. In this case, we can define infinite distributed lags and leads that allow the selection of a unique stable path of yt+j , j = 1, 2, . . . , T given the terminal conditions.


5.2 The Model MULTIMOD

In this section, we present the model we used in our solution experiments. First, a general overview of the model is given in Section 5.2.1; then, the equations that compose an industrialized country model are presented in Section 5.2.2. Section 5.2.3 briefly introduces the structure of the complete model.

5.2.1 Overview of the Model

MULTIMOD (Masson et al. [79]) is a multi-region econometric model developed by the International Monetary Fund in Washington. The model is available upon request and is therefore widely used for academic research.

MULTIMOD is a forward-looking dynamic annual model and describes the economic behavior of the whole world, decomposed into eight industrial zones and the rest of the world. The industrial zones correspond to the G7 and a zone called “small industrial countries” (SI), which collects the rest of the OECD. The rest of the world comprises two developing zones, i.e. high-income oil exporters (HO) and other developing countries (DC). The model mainly distinguishes three goods: oil, primary commodities and manufactures. The short-run behavior of the agents is described by error correction mechanisms with respect to their long-run theory-based equilibrium. Forward-looking behavior is modeled in real wealth, interest rates and in the price level determination.

The specification of the models for industrialized countries is the same and consists of 51 equations each. These equations explain aggregate demand, taxes and government expenditures, money and interest rates, prices and supply, and international balances and accounts.

Consumption is based on the assumption that households maximize the discounted utility of current and future consumption subject to their budget constraint. It is assumed that the consumption function is influenced by changes in disposable income, wealth and real interest rates. The demand for capital follows Tobin’s theory, i.e. the change in the net capital stock is determined by the gap between the value of existing capital and its replacement cost. Conventional import and export functions depending upon price and demand are used to model trade. This makes the simulation of a single country model meaningful. Trade flows for industrial countries are disaggregated into oil, manufactured goods and primary commodities. Government intervention is represented by public expenditures, money supply, bonds supply and taxes. Like most macroeconomic models, MULTIMOD describes the LM curve by a conventional demand for money balances and a money supply consisting in a reaction function of the short-term interest rate to a nominal money target, except for France, Italy and the smaller industrial countries, where it is assumed that central banks move short-term interest rates to limit movements of their exchange rates with respect to the Deutsche Mark. The aggregate supply side of the model is represented by an inflation equation containing the expected inflation rate. A set of equations covers the current account balance, the net international asset or liability position and the determination of exchange rates.

As already mentioned, the rest of the world is aggregated into two regions, i.e.


capital exporting countries (mainly represented by high-income oil exporters) and other developing countries. The main difference between the developing country model and the industrial country models is that the former is finance constrained, and therefore its imports and its domestic investment depend on its ability to service additional debt. The output of the developing region is disaggregated into one composite manufactured good, oil, primary commodities and one nontradable good. The group of high-income oil exporters does not face constraints on its balance of payments financing and has a different structure, as oil exports constitute a large fraction of its GNP.

Many equations are estimated on pooled data from 1965 to 1987. As a result, the parameters in many behavioral equations are constrained to be the same across the regions, except for the constant term. The MULTIMOD model is designed for evaluating effects of economic policies and other changes in the economic environment around a reference path given exogenously. Thus, the model is not designed to provide so-called baseline forecasts. The model is calibrated through residuals in regression equations to satisfy a baseline solution.

5.2.2 Equations of a Country Model

Hereafter, we reproduce the set of 51 equations defining the country model of Japan. The country models for the other industrial zones (CA, FR, GR, IT, UK, US, SI) all have the same structure. Variables are preceded by a two-letter label indicating the country they belong to. These labels are listed in Table 5.1. The model for capital exporting countries (HO) contains 10 equations and the one for the other developing countries 33. A final set of 14 equations describes the links between the country models.

Label   Country              Label   Country
CA      Canada               FR      France
GR      Germany              IT      Italy
JA      Japan                UK      United Kingdom
US      United States        SI      Small Industrial
HO      Capital Exporting    DC      Other Developing

Table 5.1: Labels for the zones/countries considered in MULTIMOD.

Each country’s coefficients are identified by a one-letter prefix that is the first letter of the country name (except for the United Kingdom, where the symbol E is used). The notation DEL(n:y) defines the n-th difference of variable y, e.g. DEL(1:y) = y − y(−1).

JC: DEL(1:LOG(JA_C)) = JC0 + JC1*LOG(JA_W(-1)/JA_C(-1))+JC2*JA_RLR +JC3*DEL(1:LOG(JA_YD)) +JC4*JA_DEM3 +JC5*DUM80 + RES_JA_C

JCOIL: DEL(1:LOG(JA_COIL)) = JCOIL0 + JCOIL1*DEL(1:LOG(JA_GDP)) +JCOIL2*DEL(1:LOG(POIL/JA_PGNP)) +JCOIL3*LOG(POIL(-1)/JA_PGNP(-1)) +JCOIL4*LOG(JA_GDP(-1)/JA_COIL(-1)) + RES_JA_COIL

JK: DEL(1:LOG(JA_K)) = JK0 + JK1*LOG(JA_WK/JA_K(-1)) +JK2*LOG(JA_WK(-1)/JA_K(-2)) + RES_JA_K

JINV: JA_INVEST = DEL(1:JA_K) + JA_DELTA*JA_K(-1) + RES_JA_INVEST


JXM: DEL(1:LOG(JA_XM)) = JXM0 + JXM1*DEL(1:JA_REER) +JXM2*DEL(1:LOG(JA_FA)) +JXM3*LOG(JA_XM(-1)/JA_FA(-1)) +JXM4*JA_REER(-1) +JXM5*TME +JXM6*TME**2 + RES_JA_XM

JXA: JA_XMA = JA_XM + T1*(WTRADER-TRDER)
JXT: JA_XT = JA_XMA + JA_XOIL
JIM: DEL(1:LOG(JA_IM)) = JIM0 + JIM1*DEL(1:JIM7*LOG(JA_A) + (1 - JIM7)*LOG(JA_XMA)) + JIM2*DEL(1:LOG(JA_PIMA/JA_PGNPNO)) + JIM3*LOG(JA_PIMA(-1)/JA_PGNPNO(-1)) + JIM4*(UIM7*LOG(JA_A(-1)) + (1 - UIM7)*LOG(JA_XMA(-1)) - LOG(JA_IM(-1))) + JIM5*TME + JIM6*TME**2 + RES_JA_IM
JIOIL: JA_IOIL = JA_COIL + JA_XOIL - JA_PRODOIL + RES_JA_IOIL
JIC: DEL(1:LOG(JA_ICOM)) = JIC0 + JIC2*DEL(1:LOG(PCOM/JA_ER/JA_PGNP)) + JIC1*DEL(1:LOG(JA_GDP)) + JIC3*LOG(PCOM(-1)/JA_ER(-1)/JA_PGNP(-1)) + JIC4*LOG(JA_GDP(-1)) + JIC5*LOG(JA_ICOM(-1)) + RES_JA_ICOM
JIT: JA_IT = JA_IM + JA_IOIL + JA_ICOM
JA: JA_A = JA_C + JA_INVEST + JA_G + RES_JA_A
JGDP: JA_GDP = JA_A + JA_XT - JA_IT + RES_JA_GDP
JGNP: JA_GNP = JA_GDP + JA_R*(JA_NFA(-1)+JA_NFAADJ(-1))/JA_PGNP + RES_JA_GNP
JW: JA_W = JA_WH + JA_WK + (JA_M + JA_B + JA_NFA/JA_ER)/JA_P
JWH: JA_WH = JA_WH(1)/(1+URBAR+0.035) + ((1-JA_BETA)*JA_GDP*JA_PGNP - JA_TAXH)/JA_P + URPREM*JA_WK
JWK: JA_WK = JA_WK(1)/(1 + JA_RSR + (JA_K/JA_K(-1) - 1)) + (JA_BETA*JA_GDP*JA_PGNP - JA_TAXK)/JA_P - (JA_DELTA+URPREM)*JA_WK

JYD: JA_YD = (JA_GDP*JA_PGNP - JA_TAX)/JA_P - JA_DELTA*JA_K(-1)
JGE: JA_GE = JA_P*JA_G + JA_R*JA_B(-1) + JA_GEXOG
JTAX: JA_TAX = JA_TRATE*(JA_PGNP*JA_GNP - JA_DELTA*JA_K(-1)*JA_P + JA_R*JA_B(-1) + RES_JA_TAX*JA_PGNP)
JTAXK: JA_TAXK = UDUMCT*JA_BETA*JA_TAX + (1-UDUMCT)*JA_CTREFF*JA_BETA*JA_GDP*JA_PGNP
JTAXH: JA_TAXH = JA_TAX - JA_TAXK
JTRATE: DEL(1:JA_TRATE) = JA_DUM*(TAU1*((JA_B - JA_BT)/(JA_GNP*JA_PGNP)) + TAU2*(DEL(1:JA_B - JA_BT)/(JA_GNP*JA_PGNP))) + TAU3*(JA_TRATEBAR(-1) - JA_TRATE(-1)) + RES_JA_TRATE
JB: DEL(1:JA_B) + DEL(1:JA_M) = JA_R*JA_B(-1) + (JA_P*JA_G - JA_TAX + JA_GEXOG) + RES_JA_B*JA_P
JGDEF: JA_GDEF = DEL(1:JA_B + JA_M)
JM: LOG(JA_M/JA_P) = JM0 + JM1*LOG(JA_A) + JM2*JA_RS + JM4*LOG(JA_M(-1)/JA_P(-1)) + RES_JA_M
JRS: DEL(1:JA_RS) - JR3*(JA_RS(1)-JA_RS(-1)) = JR1*(LOG(JA_MT/JA_M)/JM2) + RES_JA_RS
JRL: JA_RL/100 = ((1 + JA_RS/100)*(1 + JA_RS(1)/100)*(1 + JA_RS(2)/100)*(1 + JA_RS(3)/100)*(1 + JA_RS(4)/100))**0.2 - 1 + RES_JA_RL
JR: JA_R = (0.5*JA_RS(-1)/100 + 0.5*(JA_RL(-3)+JA_RL(-2)+JA_RL(-1))/3/100)
JRLR: JA_RLR = (1 + JA_RL/100)/(JA_P(5)/JA_P)**0.2 - 1
JRSR: JA_RSR = (1 + JA_RS/100)/(JA_P(1)/JA_P) - 1
JRR: JA_RR = (0.8*JA_RS(-1) + 0.2*(JA_RL(-3) + JA_RL(-2) + JA_RL(-1))/3)/100
JPGNP: DEL(1:LOG(JA_PGNPNO)) = DEL(1:LOG(JA_PGNPNO(-1))) + JP1*(JA_CU/100-1) + JP2*DEL(1:LOG(JA_P/JA_PGNPNO)) - JP3*DEL(1:LOG(JA_PGNPNO(-1)/JA_PGNPNO(1))) + RES_JA_PGNP
JPNO: JA_PGNPNO = (JA_GDP*JA_PGNP - JA_PRODOIL*POIL)/(JA_GDP - JA_PRODOIL)
JP: JA_PGNP = (JA_P*JA_A + JA_XT*JA_PXT - JA_IT*JA_PIT)/JA_GDP + RES_JA_P*JA_PGNP
JPFM: LOG(JA_PFM) = 0.5*(W11*LOG(JA_ER/UE87) - LOG(JA_ER/UE87) + W12*LOG(JA_PXM*JA_ER/JE87) + L12*LOG(JA_PGNPNO*JA_ER/JE87) + W13*LOG(GR_PXM*GR_ER/GE87) + L13*LOG(GR_PGNPNO*GR_ER/GE87) + W14*LOG(CA_PXM*CA_ER/CE87) + L14*LOG(CA_PGNPNO*CA_ER/CE87) + W15*LOG(FR_PXM*FR_ER/FE87) + L15*LOG(FR_PGNPNO*FR_ER/FE87) + W16*LOG(IT_PXM*IT_ER/IE87) + L16*LOG(IT_PGNPNO*IT_ER/IE87) + W17*LOG(UK_PXM*UK_ER/EE87) + L17*LOG(UK_PGNPNO*UK_ER/EE87) + W18*LOG(SI_PXM*SI_ER/SE87) + L18*LOG(SI_PGNPNO*SI_ER/SE87) + W19*LOG(RW_PXM*RW_ER/RE87) + L19*LOG(DC_PGNP*RW_ER/RE87))
JPXM: DEL(1:LOG(JA_PXM)) = JPXM0 + JPXM1*DEL(1:LOG(JA_PGNPNO)) + (1-JPXM1)*DEL(1:LOG(JA_PFM)) + JPXM2*LOG(JA_PGNPNO(-1)/JA_PXM(-1)) + RES_JA_PXM
JPXT: JA_PXT = (JA_XMA*JA_PXM + POIL*JA_XOIL)/JA_XT
JPIM: JA_PIM = (S11*JA_PXM + S21*JA_PXM*JA_ER/JE87 + S31*GR_PXM*GR_ER/GE87 + S41*CA_PXM*CA_ER/CE87 + S51*FR_PXM*FR_ER/FE87 + S61*IT_PXM*IT_ER/IE87 + S71*UK_PXM*UK_ER/EE87 + S81*SI_PXM*SI_ER/SE87 + S91*RW_PXM*RW_ER/RE87)/(JA_ER/UE87)*(1 + RES_JA_PIM)

JPIMA: JA_PIMA = JA_PIM + T1*(WTRADE - TRDE)/JA_IM
JPIT: JA_PIT = (JA_IM*JA_PIMA + JA_IOIL*POIL + JA_ICOM*PCOM)/JA_IT
JYCAP: JA_YCAP = JY87*(UBETA*(JA_K/JK87)**(-JRHO) + (1 - JBETA)*((1 + JPROD)**(TME - 21)*(1 + RES_JA_YCAP/(1-JBETA))*JA_LF/JL87)**(-JRHO))**((-1)/JRHO)
JBETA: JA_BETA = JBETA*(JA_YCAP/JA_K/(JY87/JK87))**JRHO
JLF: JA_LF = JA_POP*JA_PART/(1 + JA_DEM3)
JCU: JA_CU = 100*JA_GDP/JA_YCAP
JNFA: DEL(1:JA_NFA) = (JA_XT*JA_PXT - JA_IT*JA_PIT)*JA_ER + JA_R*(JA_NFA(-1) + JA_NFAADJ(-1)) + RES_JA_NFA
JTB: JA_TB = JA_XT*JA_PXT - JA_IT*JA_PIT
JCAB: JA_CURBAL = DEL(1:JA_NFA)
JER: 1 + US_RS/100 = (1 + JA_RS/100)*(JA_ER(1)/JA_ER) + RES_JA_ER
JREER: JA_REER = LOG(JA_PXM) - LOG(JA_PFM)
JMERM: JA_MERM = EXP(-(V12*LOG(JA_ER/JE87) + V13*LOG(GR_ER/GE87) + V14*LOG(CA_ER/CE87) + V15*LOG(FR_ER/FE87) + V16*LOG(IT_ER/IE87) + V17*LOG(UK_ER/EE87) + V18*LOG(SI_ER/SE87)))
JFA: JA_FA = (JA_A*UE87)**L11*(JA_A*JE87)**L12*(GR_A*GE87)**L13*(CA_A*CE87)**L14*(FR_A*FE87)**L15*(IT_A*IE87)**L16*(UK_A*EE87)**L17*(SI_A*SE87)**L18*((HO_A+DC_A)*RE87)**L19/UE87

5.2.3 Structure of the Complete Model

As already described in Section 5.2.1, the country models are linked together by trade equations. This results in a 466-equation model. The linkages between the country models are schematized in Figure 5.1.

Figure 5.1: Linkages of the country models in the complete version of MULTIMOD. [Diagram showing the links among the zones US, JA, CA, UK, FR, GR, IT, SI, HO and DC; not reproduced here.]

The incidence matrix of matrix D, corresponding to the Jacobian matrix of current endogenous variables, is given in Figure 5.2.

5.3 Solution Techniques for RE Models

In the following section, we first present the classical approach to solve models with forward-looking variables, which essentially amounts to the Fair-Taylor extended path algorithm given in Section 5.3.1.


Figure 5.2: Incidence matrix of D in MULTIMOD (nz = 2260). [Spy plot not reproduced.]

An alternative approach, presented in Section 5.3.2, consists in stacking up the equations for a given number of time periods. In such a case, it is necessary to explicitly specify terminal conditions for the system. This system can then be solved either with a block iterative method or with a Newton-like method. The computation of the Newton step requires the solution of a linear system, for which we consider different solution techniques in Section 5.3.4. Section 5.3.3 introduces block iterative schemes and possibilities for parallelization of the algorithms.

5.3.1 Extended Path

Fair and Taylor’s extended path method (EP) [34] constitutes the first operational approach to the estimation and solution of dynamic nonlinear rational expectations models.

The method does not require the setting of terminal conditions as explained in Section 5.1.1. In the case where terminal conditions are known, the method does not need type III iterations and the correct answer is found once the type II iterations have converged.

The method can be described as follows:

1. Choose an integer k as an initial guess for the number of periods beyond the horizon h for which expectations need to be computed to obtain a solution within a prescribed tolerance. Choose initial values for yt+1|t−1, . . . , yt+2h+k|t−1.

2. Solve the model dynamically to obtain new values for the variables yt+1|t−1, . . . , yt+h+k|t−1. Fair and Taylor suggest using Gauss-Seidel iterations to solve the model. (Type I iterations.)

3. Check the vectors yt+1|t−1, . . . , yt+h+k|t−1 for convergence. If any of these values have not converged, go to step 2. (Type II iterations.)


4. Check the set of vectors yt+1|t−1, . . . , yt+h+k|t−1 for convergence with the same set that most recently reached this step. If the convergence criterion is not satisfied, then extend the solution horizon by setting k to k + 1 and go to step 2. (Type III iterations.)

5. Use the computed values of yt+1|t−1, . . . , yt+h+k|t−1 to solve the model for period t.

The method treats endogenous leads as predetermined, using initial guesses, and solves the model period by period over some given horizon. This solution produces new values for the forward-looking variables (leads). This process is repeated until convergence of the system. Fair and Taylor call these iterations “type II iterations” to distinguish them from the standard “type I iterations” needed to solve the nonlinear model within each time period. The type II iterations generate future paths of the expected endogenous variables. Finally, in a “type III iteration”, the horizon is augmented until this extension does not affect the solution within the time range of interest.

Algorithm 25 Fair-Taylor Extended Path Method

Choose k and initial values y^i_{t+r} , r = 1, 2, . . . , k + 2h , i = I, II, III
repeat until [y^III_t y^III_{t+1} . . . y^III_{t+h}] converged
   repeat until [y^II_t y^II_{t+1} . . . y^II_{t+h+k}] converged
      for i = t, t + 1, . . . , t + h + k
         repeat until y^I_i converged
         set y^II_i = y^I_i
      end
   end
   set [y^III_t . . . y^III_{t+h}] = [y^II_t . . . y^II_{t+h}]
   k = k + 1
end
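A compact serial sketch of the extended path strategy in Python (the function solve_period, representing the type I solution of the model for a single period given the current guesses of the paths, is a placeholder for the model-specific Gauss-Seidel sweep):

```python
import numpy as np

def extended_path(y_init, solve_period, T, h, k=5, tol=1e-6, max_extend=50):
    """Fair-Taylor extended path: type I/II/III iterations, serial sketch.

    y_init       : array (T + h + k_max, n) of initial guesses for the paths
    solve_period : function(t, y) -> y_t solving period t given the current
                   guesses for all other periods (type I iterations inside)
    """
    prev_window = None
    for _ in range(max_extend):                        # type III iterations
        horizon = T + h + k
        y = y_init[:horizon].copy()
        while True:                                    # type II iterations
            y_old = y.copy()
            for t in range(horizon):                   # dynamic solution
                y[t] = solve_period(t, y)              # type I inside
            if np.max(np.abs(y - y_old)) < tol:
                break
        window = y[:T + h]
        if prev_window is not None and np.max(np.abs(window - prev_window)) < tol:
            return window                              # horizon long enough
        prev_window = window
        k += 1                                         # extend the horizon
    raise RuntimeError("horizon extension did not stabilize")
```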

5.3.2 Stacked-time Approach

An alternative approach consists in stacking up the equations for successive time periods and considering the solution of the system written in Equation (5.3).

According to what has been said in Section 3.1, we first begin by analyzing the structure of the incidence matrix of J, which is

J = \begin{bmatrix}
D & A & 0 & \cdots & 0 \\
E & D & A & \cdots & 0 \\
0 & E & D & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & & E & D
\end{bmatrix} ,

where we have dropped the time indices, as the incidence matrices of Et+j , Dt+j and At+j , j = 1, . . . , T are invariant with respect to j.

As matrix D, and particularly matrices E and A, are very sparse, it is likely that matrix J can be rearranged into a blocktriangular form J∗. We know that, as a consequence, it would then be possible to solve parts of the model recursively.


When rearranging matrix J, we do not want to destroy the regular pattern of J, where the same set of equations is repeated in the same order for the successive periods. We therefore consider the incidence matrix D∗ of the sum of the three matrices E + D + A and then seek its blocktriangular decomposition

D^* = \begin{bmatrix}
D^*_{11} & & \\
\vdots & \ddots & \\
D^*_{p1} & \cdots & D^*_{pp}
\end{bmatrix} .     (5.16)

Matrix D∗ corresponds to the structure of a system where all endogenous variables have been transformed into current variables by removing lags and leads. Therefore, if matrix D∗ can be put into a blocktriangular form by a permutation of its rows and columns as shown in (5.16), where all submatrices D∗_ii are indecomposable, there also exists a blockrecursive decomposition J∗ of matrix J and the solution of system (5.3) can be obtained by solving the sequence of p subproblems in

J∗ y∗ = b∗ ,

where y∗ and b∗ have been rearranged according to J∗. The variables solved in subproblem i − 1 are then exogenous in subproblem i, for i = 2, . . . , p.
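A sketch of one way such a blockrecursive ordering can be obtained numerically, using the strongly connected components of the directed graph of E + D + A (the incidence matrices below are arbitrary placeholders):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

# Arbitrary 0/1 incidence matrices standing in for E, D and A.
n = 6
rng = np.random.default_rng(0)
D = sp.csr_matrix(np.eye(n))
E = sp.csr_matrix((rng.random((n, n)) < 0.15).astype(float))
A = sp.csr_matrix((rng.random((n, n)) < 0.15).astype(float))

Dstar = ((D + E + A) != 0).astype(int)    # incidence matrix of E + D + A

# The strongly connected components of the graph of D* are the indecomposable
# diagonal blocks D*_ii; a topological ordering of the condensed block graph
# then yields the blocktriangular permutation.
p, labels = connected_components(Dstar, directed=True, connection="strong")
blocks = [np.flatnonzero(labels == c) for c in range(p)]
print("number of blocks:", p)
print("blocks:", blocks)
```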

Let us illustrate this with a small system of 13 equations containing forward-looking variables, for which we give the matrices E, D and A already partitioned according to the blocktriangular pattern of matrix D.

[Incidence patterns of E, D and A for the 13-equation example, with rows and columns ordered 11, 4, 7, 6, 1, 12, 5, 9, 10, 13, 3, 2, 8; the detailed patterns are not reproduced here.]

The sum of these three matrices defines D∗, which has the following blocktriangular pattern

[Blocktriangular incidence pattern of D∗ = E + D + A, with the same ordering of rows and columns; not reproduced here.]

With respect to D, the change in the structure of D∗ consists in variables 6 and 2 now being in an interdependent block. The pattern of D∗ shows that the solution of our system of equations can be decomposed into a sequence of three subproblems.


Matrix D∗ defines matrices E11, D11, E22, D22 and A22. Matrices A11, E33 and A33 are empty matrices and matrix D33 consists of a single element.

[Incidence patterns of the submatrices E11, D11, E22, D22 and A22 corresponding to this partition; not reproduced here.]

Stacking the system, for instance, for three periods gives J∗, which is block lower triangular with the diagonal blocks J∗_{11}, J∗_{22} and J∗_{33}:

J^* = \begin{bmatrix}
J^*_{11} & & \\
* & J^*_{22} & \\
* & * & J^*_{33}
\end{bmatrix} .

[The detailed incidence pattern of the stacked matrix J∗ is not reproduced here.]

As we have seen in the illustration, the subproblems one has to consider in the p successive steps are defined by the partition of the original matrices D, E and A according to the p blocks of matrix D∗

E = \begin{bmatrix}
E_{11} & & \\
\vdots & \ddots & \\
E_{p1} & \cdots & E_{pp}
\end{bmatrix}
\qquad
D = \begin{bmatrix}
D_{11} & & \\
\vdots & \ddots & \\
D_{p1} & \cdots & D_{pp}
\end{bmatrix}
\qquad
A = \begin{bmatrix}
A_{11} & & \\
\vdots & \ddots & \\
A_{p1} & \cdots & A_{pp}
\end{bmatrix}     (5.17)

and the subsystem to solve at step i is defined by the matrix J∗ii

J^*_{ii} = \begin{bmatrix}
D_{ii} & -A_{ii} & & & \\
-E_{ii} & D_{ii} & -A_{ii} & & \\
 & \ddots & \ddots & \ddots & \\
 & & -E_{ii} & D_{ii} & -A_{ii} \\
 & & & -E_{ii} & D_{ii}
\end{bmatrix} .     (5.18)


It is clear that for the partition given in (5.17), some of the submatrices Dii are likely to be decomposable and some of the matrices Eii and Aii will be zero. As a consequence, matrix J∗_ii can have different kinds of patterns according to whether the matrices Eii and Aii are zero or not. These situations are commented on hereafter.

If matrix Aii ≡ 0, we have to solve a blockrecursive system, which corresponds to the solution of a conventional dynamic model (the case of J∗_11 in our example). In practice, this corresponds to a separable submodel for which there do not exist any forward-looking mechanisms. As a special case, the system can be completely recursive if Dii is of order one, or a sequence of independent equations or blocks if Eii ≡ 0. This is the case of J∗_33 in our example.

In the case where matrices Aii and Eii are not identically zero, as for J∗_22 in our example, we have to solve an interdependent system. In practice, we often observe that a large part of the equations are contained in one block Dii for which the corresponding matrix Aii is nonzero, and one might think that little is gained with such a decomposition. However, even if the size of the blockrecursive part is relatively small compared to the rest of the model, this decomposition is nevertheless useful, as in some problems the number of periods T for which we have to stack the model can go into the hundreds. The savings in terms of the size of the interdependent system to be solved may therefore not be negligible.

From this point onwards in our work, we will consider the solution of an indecomposable system of equations defined by J∗_ii.

5.3.3 Block Iterative Methods

We now consider the solution of a given system J∗_ii and denote by y = [y1 . . . yT ] the array corresponding to the stacked vectors for T periods. In the following, we will use dots to indicate subarrays. For instance, y·t designates the t-th column of array y.

A block iterative method for solving an indecomposable J∗_ii consists in executing K loops over the equations of each period. For K = 1, we have a point method and, for an arbitrary K, the algorithm is formalized in Algorithm 26.

Algorithm 26 Incomplete Inner Loops

repeat until convergence
   1. for t = 1 to T
   2.    for k = 1 to K
   3.       y0·t = y1·t
   4.       for i = 1 to n
   5.          Evaluate y1_it
            end
         end
      end
end
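A serial sketch of this scheme in Python (evaluate_equation(i, t, y), the number of equations n and the tolerance are placeholders for the normalized model equations and the convergence criterion):

```python
import numpy as np

def incomplete_inner_loops(y, evaluate_equation, n, T, K=2, eta=1e-6, max_iter=500):
    """Block Gauss-Seidel over periods with K incomplete inner sweeps.

    y is an (n, T) array of stacked endogenous variables; each outer pass
    runs K Gauss-Seidel sweeps over the n equations of every period t.
    """
    for _ in range(max_iter):
        y_prev = y.copy()
        for t in range(T):                 # loop over the stacked periods
            for _ in range(K):             # K incomplete inner sweeps
                for i in range(n):
                    y[i, t] = evaluate_equation(i, t, y)
        if np.max(np.abs(y - y_prev) / (np.abs(y_prev) + 1)) <= eta:
            return y
    raise RuntimeError("no convergence")
```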

Completely solving the equations for each period constitutes a particular case of this algorithm. In such a situation, we execute the loop over k until the


convergence of the equations of the particular period is reached, or equivalently set K to a correspondingly large enough value. We may mention that for block methods, algorithms other than Gauss-Seidel can be used for solving a given block t.

We notice that this technique is identical to the Fair-Taylor method without type III iterations, i.e. if the size of the system is kept fixed and the solution horizon is not extended.

As already discussed in Section 2.4.6, the convergence of nonlinear first-order iterations has to be investigated on the linearized equations. To present the analysis of the convergence of these block iterative methods, it is convenient to introduce the notation hereafter.

When solving Cx = b, the k-th iteration of a first-order iterative method is generally written as

M x^{k+1} = N x^k + b ,

where matrices M and N correspond to a splitting of matrix C as introduced in Section 2.4.

The iteration k of the inner loop for period t is then defined by

M y^{k+1}_t = −N y^k_t − E y^{k+1}_{t−1} − A y^k_{t+1} ,     (5.19)

where matrices M and N correspond to the chosen splitting of matrix Dii. When solving for a single period, variables related to periods other than t are considered exogenous, and the matrix which governs the convergence in this case is M^{−1}N.

Let us now write the matrix that governs the convergence when solving the system for all T periods and where we iterate exactly K times for each of the systems concerning the different periods. To do this, we write y^{k+1}_t in (5.19) as a function of y^0_t, which gives

y^{(K)}_t = (M^{−1}N)^K y^0_t − H^{(K)} M^{−1} E y^{(K)}_{t−1} − H^{(K)} M^{−1} A y^{(K)}_{t+1} ,

where

H^{(K)} = I + ∑_{i=1}^{K−1} (M^{−1}N)^i .

Putting these equations together for all periods, we obtain the following matrix

\begin{bmatrix}
(M^{-1}N)^K & H^{(K)}M^{-1}A & & & \\
H^{(K)}M^{-1}E & (M^{-1}N)^K & H^{(K)}M^{-1}A & & \\
 & \ddots & \ddots & \ddots & \\
 & & H^{(K)}M^{-1}E & (M^{-1}N)^K & H^{(K)}M^{-1}A \\
 & & & H^{(K)}M^{-1}E & (M^{-1}N)^K
\end{bmatrix} .     (5.20)

In the special case where K = 1, we have H^{(1)} = I and we iterate over the stacked model. In the case where K is extended until each inner loop is completely solved, we have (M^{−1}N)^K ≈ 0 and H^{(K)} reaches its limit.
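For small examples, the spectral radius of the matrix in (5.20) can be computed directly; the sketch below builds the stacked matrix from given splitting and lag/lead blocks (M, N, E, A are placeholders to be supplied from the splitting of D_ii):

```python
import numpy as np

def stacked_iteration_matrix(M, N, E, A, T, K):
    """Build the matrix (5.20) governing convergence of the block iterations."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    G = np.linalg.matrix_power(Minv @ N, K)                          # (M^-1 N)^K
    H = sum(np.linalg.matrix_power(Minv @ N, i) for i in range(K))   # H^(K)
    B = np.zeros((n * T, n * T))
    for t in range(T):
        B[t*n:(t+1)*n, t*n:(t+1)*n] = G
        if t > 0:
            B[t*n:(t+1)*n, (t-1)*n:t*n] = H @ Minv @ E
        if t < T - 1:
            B[t*n:(t+1)*n, (t+1)*n:(t+2)*n] = H @ Minv @ A
    return B

# Convergence requires a spectral radius below one, e.g.:
# rho = max(abs(np.linalg.eigvals(stacked_iteration_matrix(M, N, E, A, T=10, K=2))))
```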


Serial Implementation

Fundamentally, our problem consists in solving a system as given in (5.3). The extended path method increases the system period by period until the solution stabilizes within a given range of periods. The other strategy fixes the size of the stacked model as well as the terminal conditions in (5.3) and solves this system at once.

As has already been mentioned, system (5.3) has a particular pattern since the logical structure of matrices Et+j , Dt+j and At+j , j = 1, . . . , T is invariant with respect to j. In the following, we will take advantage of this particularity of the system.

A first and obvious task is to analyze the structure of system (5.3) in order to rearrange the equations into a blockrecursive form. This can be done without destroying the regular pattern of matrix J, as shown in Section 5.3.2.

We will now solve the country model for Japan in MULTIMOD. We recall that all country models for the seven industrialized countries and the model covering the small industrial countries are identical in their structure. The same analysis therefore holds for all these submodels.

The model of our experiment has a maximum lag of three and lead variables up to five periods. We therefore have to solve a system whose structure is given by the stacked matrix S, in which the row corresponding to period t + k contains the blocks E_3^{t+k−3}, E_2^{t+k−2}, E_1^{t+k−1}, D^{t+k}, A_1^{t+k+1}, . . . , A_5^{t+k+5}, the blocks falling outside the horizon t + 1, . . . , t + T being dropped. Dropping the time superscripts, since the incidence patterns do not depend on the period, S has the banded structure

S = \begin{bmatrix}
D   & A_1 & A_2 & A_3 & A_4 & A_5 &     &     \\
E_1 & D   & A_1 & A_2 & A_3 & A_4 & A_5 &     \\
E_2 & E_1 & D   & A_1 & A_2 & A_3 & A_4 & \ddots \\
E_3 & E_2 & E_1 & D   & A_1 & A_2 & A_3 & \ddots \\
    & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & A_5 \\
    &     & E_3 & E_2 & E_1 & D   & A_1 & \vdots \\
    &     &     & E_3 & E_2 & E_1 & D   & A_1 \\
    &     &     &     & E_3 & E_2 & E_1 & D
\end{bmatrix} .

The size of the submatrices is 45 × 45 and the corresponding incidence matrices are given in Figure 5.3.

As already mentioned in the introduction of this section, we consider two fundamental types of solution algorithms: first-order iterative techniques and Newton-like methods. For first-order iterative techniques, we distinguish point methods and block methods. The Fair-Taylor algorithm typically suggests a block iterative Gauss-Seidel method.


Figure 5.3: Incidence matrices E3 to E1, D and A1 to A5. [Spy plots not reproduced.]

We first tried to execute the Fair-Taylor method using Gauss-Seidel for the inner loop. For this particular model, this method immediately fails, as Gauss-Seidel does not converge for the inner loop. The model is provided in a form such that the spectral radius governing the convergence is around 64000. We know that different normalizations and orderings of the equations change this spectral radius. However, it is very difficult to find the changes that make this radius smaller than unity (see Section 3.3.3).

We used heuristic methods such as threshold accepting to find an ordering with a spectral radius of 0.62, which is excellent. This then allowed us to apply Gauss-Seidel for the inner loop in the Fair-Taylor method.

For the reordered and renormalized model, it might now be interesting to look at the spectral radii of the matrices which govern the convergence of the first-order iterative point and block methods. Table 5.2 gives these values for different T and for the matrices evaluated at the solution.

T     point GS   block GS
1     0.6221     0.6221
2     0.7903     0.2336
10    1.2810     0.7437
15    1.3706     0.8483
20    1.4334     0.8910

Table 5.2: Spectral radii for point and block Gauss-Seidel.

First, these results reveal that a point Gauss-Seidel method will not converge for relevant values of T. Block Gauss-Seidel is likely to converge; however, we


observe that this convergence slows down as T increases. Using the SOR or FGS method with ω < 1, which amounts to damping the iterates, would provide a possibility of overcoming this problem.
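An FGS-type damping of the iterates can be sketched as follows (gauss_seidel_sweep is a placeholder for one sweep over all stacked equations; SOR would instead apply the relaxation equation by equation inside the sweep):

```python
def damped_sweep(y, gauss_seidel_sweep, omega=0.8):
    """One damped iteration: y <- (1 - omega)*y + omega*GS(y), with omega < 1."""
    y_new = gauss_seidel_sweep(y.copy())
    return (1 - omega) * y + omega * y_new
```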

Parallel Implementation

The opportunities existing for a parallelization of solution algorithms depend on the type of computer used and the particular algorithm selected to solve the model. We distinguish between two types of parallel computers: single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) computers.

We will first discuss the parallelization potential for data parallel processing andthen the possibilities of executing different tasks in parallel. In this section, thetasks are split at the level of the execution of a single equation.

Data Parallelism. The solution of RE models with first-order iterative tech-niques offer large opportunities for data parallel processing. These opportunitiesare of two types.

In the case of a single solution of a model, the same equations have to be solvedfor all the stacked periods. This means that one equation has to be computedfor different data, a task which can be performed in parallel as in Algorithm 27.

Algorithm 27 Period Parallel
while not converged
  1. y^0 = y^1
  2. for i = 1 to n
  3.   for t = 1 to T in parallel do
  4.     Evaluate y^1_{it}
       end
     end
end
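Purely as an illustration of Statements 3 and 4 (the equation, its coefficients and the variable indices below are invented for the example and are not part of MULTIMOD), one equation can be evaluated for all stacked periods in a single vectorized operation:

import numpy as np

n, T = 45, 50
y0 = np.random.rand(n, T)      # iterate from the previous sweep (y^0)
y1 = y0.copy()                 # iterate being updated (y^1)

# hypothetical equation i: variable i depends on variables j and k of the previous iterate
i, j, k = 0, 3, 7
alpha, beta, gamma = 0.1, 0.6, 0.3
y1[i, :] = alpha + beta * y0[j, :] + gamma * y0[k, :]   # all T periods updated at once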

To find a rational solution which uses the mathematical expectation of the variables, or in the case of stochastic simulation or sensitivity analysis, we have to solve the same model repeatedly for different data sets. This is a situation where we have an additional possibility for data parallel processing. In such a case, an equation is computed simultaneously for a data set which constitutes a three-dimensional array. The first dimension is for the n variables, the second for the T different time periods and the third dimension goes over S different simulations. Algorithm 28 presents such a possibility.


Algorithm 28 Parallel Simulations
while not converged
  1. y^0 = y^1
  2. for i = 1 to n
  3.   for t = 1 to T and s = 1 to S in parallel do
  4.     Evaluate y^1_{its}
       end
     end
end

With the array-processing features, as implemented for instance in High Performance Fortran [66], it is very easy to get an efficient parallelization for the steps that concern the evaluation of the model's equations in the above algorithms. According to the definitions given in Section 4.1.4, the speedup for such an implementation is theoretically TS with an efficiency of one, provided that TS processors are available.

Statement 1 and the computation of the stopping criteria can also be executed in parallel.

Task Parallelism. We already presented in Section 4.2.2 that all the equations can be solved in parallel with the Jacobi algorithm and that for the Gauss-Seidel algorithm it is possible to execute sets of equations in parallel. Clearly, as mentioned above, in a stacked model the same equation is repeated for all periods. Of course, such equations also fit the data parallel execution model described above.

It therefore appears that both kinds of parallelism are present in the solution of an RE model. To take advantage of the data parallelism and of the control or task parallelism, one needs a MIMD computer.

In the following, we present execution models that solve different equations in parallel. For Jacobi, this parallelism applies to all equations, whereas for Gauss-Seidel it applies only to subsets of equations. Contrary to what has been done before, the two methods will now be presented separately.

For the Jacobi method, we then have Algorithm 29.

Algorithm 29 Equation Parallel Jacobi
while not converged
  1. y^0 = y^1
  2. for i = 1 to n in parallel do
  3.   for t = 1 to T and s = 1 to S in parallel do
  4.     Evaluate y^1_{its} using y^0_{··s}
       end
     end
end

Statement 2 corresponds to a control parallelism and Statement 3 to a data parallelism. The theoretical speedup for this algorithm on a machine with nTS processors is nTS.

For a decomposition of the equations into q levels L1, . . . , Lq, the Gauss-Seidel method is shown in Algorithm 30.

Algorithm 30 Equation Parallel Gauss-Seidel
while not converged
  1. y^0 = y^1
  2. for j = 1 to q
  3.   for i ∈ Lj in parallel do
  4.     for t = 1 to T and s = 1 to S in parallel do
  5.       Evaluate y^1_{its} using y^1_{··s}
         end
       end
     end
end

Statement 3 corresponds to a control parallelism and Statement 4 to a data parallelism. The theoretical speedup for the Gauss-Seidel algorithm using max_j |Lj| × TS processors is nTS/q.

In the Jacobi algorithm, Statement 1 updates the array for the computed values which are then used in Statement 4. In the Gauss-Seidel algorithm, Statement 5 overwrites the same array with the computed values. The update in Statement 1 is only needed in order to be able to check for convergence.

Incomplete Inner Loops. When executing an incomplete inner loop, we have to make sure we update the variables for a period t in separate arrays because, during the execution of the K iterations for a given period t, variables belonging to periods other than t must remain unchanged. As the computations are made in parallel for all T periods, the data has to be replicated in separate arrays. Figure 5.4 illustrates how data has to be aligned in the computer's memory.

The incomplete inner loop technique with Jacobi is shown in Algorithm 31.

Algorithm 31 Equation Parallel Jacobi with Incomplete Inner Loops
while not converged
  1. y^0 = y^1
  2. for t = 1 to T and s = 1 to S in parallel do y_{··ts} = y^1_{··s} end
  3. for k = 1 to K
  4.   for i = 1 to n in parallel do
  5.     for t = 1 to T and s = 1 to S in parallel do
  6.       Evaluate y^1_{its} using y_{··ts}
         end
       end
  7.   for t = 1 to T and s = 1 to S in parallel do y_{·tts} = y^1_{·ts} end
     end
end

In this algorithm, y^1 and y are different arrays. The former contains the updates for the n × T × S computed variables and the latter is an array with an additional dimension.

Figure 5.4: Alignment of data in memory. (The figure shows the array y^1_{··s}, the replicated cube y_{··ts} and its diagonal y_{·tts}, indexed by the equations i = 1, . . . , n and the periods t = 1, . . . , T.)

Statement 2 performs the replication of data. If we hold constant the fourth dimension, which stands for the independent repetitions of the solution of the model, Statement 2 builds a cube by spreading a two-dimensional data array into the third dimension.

In Statement 6, we update the equations using these separate data sets. Statement 7 updates the diagonal y_{·tt}, t = 1, . . . , T, in the cube.
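As an illustration only (array names and shapes are chosen for the example, not taken from the original implementation), the replication of Statement 2 and the diagonal update of Statement 7 can be expressed with NumPy broadcasting:

import numpy as np

n, T, S = 45, 50, 1
y1 = np.random.rand(n, T, S)                                  # current iterate y^1

# Statement 2: one copy of y^1 per stacked period t, stored along a new axis
y = np.broadcast_to(y1[:, :, None, :], (n, T, T, S)).copy()   # y[i, tau, t, s]

# Statement 7: write the newly computed values back onto the diagonal tau = t
idx = np.arange(T)
y[:, idx, idx, :] = y1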

Algorithm 32 Equation Parallel Gauss-Seidel with Incomplete Inner Loops
while not converged
  1. y^0 = y^1
  2. for t = 1 to T and s = 1 to S in parallel do y_{··ts} = y^1_{··s} end
  3. for k = 1 to K
  4.   for j = 1 to q
  5.     for i ∈ Lj in parallel do
  6.       for t = 1 to T and s = 1 to S in parallel do
  7.         Evaluate y_{itts} using y_{··ts}
           end
         end
       end
     end
  8. for t = 1 to T and s = 1 to S in parallel do y^1_{·ts} = y_{·tts} end
end

Apart from the fact that only equations of the same level are updated in parallel, the only difference between the Jacobi and the Gauss-Seidel algorithm resides in the fact that we use the same array for the updated variables and for the variables of the previous iteration (Statement 7).

The parallel version of Jacobi (Algorithm 29) and the algorithm which executes incomplete inner loops (Algorithm 31) have the same convergence as the corresponding serial algorithm (Algorithm 26). This is because Jacobi updates the variables at the end of the iteration and therefore the order in which the equations are evaluated does not matter. Clearly, the spectral radius varies with the number T of stacked models.

The algorithm with the incomplete inner loops updates the data after having executed the iterations for all the stacked models. This update between the stacked models corresponds to a Jacobi scheme. The convergence of this algorithm is then defined by the spectral radius of a matrix similar to the one given in (5.20).

Application to MULTIMOD

For this experiment, we only considered the Japan country model in isolation and solved it for 50 periods, which results in a system of 51 × 50 = 2550 equations. According to what has been discussed in Section 5.3.2 and Section 3.1, it is possible to decompose this system into three recursive systems. The first contains 1 × 50 = 50 equations and the last 5 × 50 = 250 equations that are all independent. The nontrivial stacked system to solve has 45 × 50 = 2250 equations.

In the previous section, we discussed in detail several possibilities for solving such a system in parallel at the equation level. It rapidly became obvious that such a parallelization does not suit the SPMD execution model of the IBM SP1, since this approach seems not to work efficiently in such an environment for our "medium grain" parallelization.

We therefore chose to parallelize at model level, which means that we consider finding the solution of the different models related to the different periods t = 1, . . . , T as the basic tasks that have to be executed in parallel. These tasks correspond to the solution of systems of equations for which the structure of the Jacobian matrices is given in Figure 5.3.

As a consequence, the algorithm becomes

Algorithm 33 Model Parallel Gauss-Seidel
while not converged
  1. y^0 = y^1
  2. for t = 1 to T and s = 1 to S in parallel do y_{··ts} = y^1_{··s} end
  3. for t = 1 to T in parallel do
  4.   for s = 1 to S solve model for y_{·tts} end
     end
  5. for t = 1 to T and s = 1 to S in parallel do y^1_{·ts} = y_{·tts} end
end

where each model in Statement 4 is solved on a particular processor with a conventional serial point Gauss-Seidel method.

Solving the system for T = 50 and S = 1 on 4 processors, we obtained a surprising result: the execution time with four processors was larger than the execution time with one processor. To understand this result, we have to analyze in more detail the time needed to execute the different statements in Algorithm 33.

Figure 5.5 monitors the elapsed time to solve the problem for S = 1 and S = 100 with four processors and with a single processor. The sequence of solid segments represents the time spent executing Statement 4 in Algorithm 33. The intervals between these segments correspond to the time spent executing the other statements, which essentially concerns the communication between processors. In particular, Statement 2 broadcasts the new initial values to the processors and Statement 5 gathers the computed results.

Figure 5.5: Elapsed time for 4 processors and for a single processor. (Two panels; horizontal axis: elapsed time in seconds; vertical axis: processor.)

From Figure 5.5, it appears that, in the case S = 1, the overhead for communication with four processors is roughly more than three times the time spent for computation. For S = 100, the communication time remains constant, while the computing time increases proportionally. Hence, as the communication time is relatively small compared to the computing time, in this case we achieve a speedup of almost four.

This experiment confirms the crucial tradeoff existing between the level of parallelization and the corresponding intensity of communication. In our case, a parallelization of the solution algorithm at the equation level appeared to be by far too fine-grained. Gains over a single processor execution start only at the model level, where the repetitive solutions are executed serially on the different processors.


5.3.4 Newton Methods

As already mentioned in Section 2.6.1, the computationally expensive steps for the Newton method are the evaluation of the Jacobian matrix and the solution of the linear system. However, if we look at these problems more closely, we may discover that interesting savings can be obtained.

First, if we resort to an analytical calculation of the Jacobian matrix, we automatically exploit the sparse structure since only the nonzero derivatives of the matrices Ei, D and Aj, i = 1, . . . , r, j = 1, . . . , h, will be computed.

Second, the linear systems in successive Newton steps all have an identical logical structure. This fact can be exploited to avoid a complete refactorization from scratch of the system's matrix.

Third, macroeconometric models are in general used for policy simulations or forecasting, which can be done in a deterministic way or, better, by means of stochastic simulations. This implies that one has to solve many times a system of equations whose logical structure does not change. As a consequence, some intermediate results of the computations in the solution algorithm remain identical for all simulations. These results are therefore computed only once and their computational cost can be considered as an overhead.

The following three sections present methodologies and applications of techniques that make the most of the features just discussed.

In the next section, we discuss a sparse elimination algorithm using a graph theoretical framework. The linear system shows a multiple block diagonal pattern that is then utilized to set up a block LU factorization. Nonstationary iterative methods are suited for the large and sparse systems we deal with, and some experiments with our application are also given.

The numerical experiments are again carried out on the MULTIMOD model we presented in Section 5.2.

Sparse Elimination

Although efficient methods for the solution of sparse linear systems are by now available (see Section 2.3.4), these methods are not widely used for economic modeling. In the following, we will present a method for solving linear systems which only considers nonzero entries in the coefficient matrix.

The linear system is solved by substituting one variable after another in the equations. This technique is equivalent to a Gaussian elimination where the order of the pivots is the order in which the variables are substituted. We will refer to this technique as sparse Gaussian elimination (SGE). In order to efficiently exploit the sparsity of the system of equations in this substitution process, we represent the linear system of the Newton algorithm, say Ay = b, by means of an oriented graph. The adjacency matrix of this graph corresponds to the incidence matrix of our Jacobian matrix J. The arcs of the graph are given values: for an arc going from vertex i to vertex j, the value is given by the element Jji of the Jacobian matrix. The substitution of variables in the equations corresponds to an elimination of vertices in the graph.


Similar to what happens when substituting variables in a system of equations, the sum of direct and indirect links between the remaining vertices has to remain unchanged. In other words, this means that the flow existing between the vertices before and after the elimination has to remain the same.

Elimination of Vertices. Let us now formalize the elimination of a vertex k. We denote by Pk the set of predecessors and by Sk the set of successors of vertex k. For all vertices i ∈ Pk and j ∈ Sk, we create a new arc i → j with value us = ur uv, which corresponds to the path formed by the arc i → k with value ur and the arc k → j with value uv. Then vertex k and all its ingoing and outgoing arcs are dropped.

If, before the elimination of vertex k, the graph contains an arc i → j with value uw, i ∈ Pk and j ∈ Sk, then the new graph will contain parallel arcs. We replace these parallel arcs by a single arc, the value of which corresponds to the sum of the values of the two parallel arcs.

Elimination of Loops. If the sets Pk and Sk contain the same vertex i, the elimination of vertex k will create an arc which appears as a loop over vertex i in the new graph.

To illustrate how loops can be removed from a vertex, let us consider the following simple case: an arc i → k with value ur, an arc k → j with value uv and a loop on vertex k with value ul.

In terms of our equations this corresponds to

  y_k = u_r y_i + u_l y_k ,
  y_j = u_v y_k ,

and substituting y_k gives us

  y_j = (u_r u_v / (1 − u_l)) y_i ,

from which we deduce that the elimination of the loop corresponds to one of the following modifications in the graph: either the arc k → j takes the value u_v/(1 − u_l), or the arc i → k takes the value u_r/(1 − u_l).

We now describe the complete elimination process for the linear system Ay = b, where A is a real matrix of order n and GA the associated graph.

The elimination proceeds in n − 1 steps and, at step k, we denote by A^(k) the coefficient matrix of the linear system after substitution of k variables and by G_{A^(k)} the corresponding graph.

In G_{A^(n−1)}, the variable y_n depends only upon exogenous elements and the solution can therefore be obtained immediately. Similarly, given y_n, we can solve for variable y_{n−1} in G_{A^(n−2)}. This is continued until all variables are known. Hence, before eliminating variable y_k, we store the equation defining y_k. In the graph G_{A^(k−1)}, this corresponds to the partial graph defined by the arcs ingoing vertex k.
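The vertex elimination just described is easy to express in code; the following Python sketch (written for this presentation, not the implementation used in the thesis) stores the graph as a dictionary of dictionaries, merges parallel arcs, handles self-loops and also provides the pivot-selection cost discussed further below:

def markowitz_cost(arcs, k):
    # upper bound on the fill-in created by eliminating k: indegree times outdegree
    indeg = sum(1 for i in arcs if k in arcs[i])
    outdeg = len(arcs.get(k, {}))
    return indeg * outdeg

def eliminate_vertex(arcs, k):
    # arcs[i][j] is the value of arc i -> j (variable j depends on variable i)
    loop = arcs.get(k, {}).pop(k, 0.0)                               # self-loop u_l, if any
    preds = {i: arcs[i].pop(k) for i in list(arcs) if k in arcs[i]}  # ingoing arcs i -> k
    succs = arcs.pop(k, {})                                          # outgoing arcs k -> j
    scale = 1.0 / (1.0 - loop)
    for i, u_r in preds.items():
        row = arcs.setdefault(i, {})
        for j, u_v in succs.items():
            # new arc i -> j; parallel arcs are merged by summing their values
            row[j] = row.get(j, 0.0) + u_r * u_v * scale
    # stored equation: y_k = (sum_i preds[i] * y_i) / (1 - loop), used in the back substitution
    return {'var': k, 'coeffs': preds, 'loop': loop}

Choosing at each step a vertex k that minimizes markowitz_cost(arcs, k) reproduces the ordering effect illustrated by the two versions of the example that follows.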

At this point, it might be helpful to illustrate the procedure by means of a small example constituted by a system of four equations,

  f1 : f1(y1, y3, y4) = 0
  f2 : y2 = g2(y1, z1)
  f3 : y3 = g3(y1, y4, z2)
  f4 : f4(y2, y4) = 0 ,

\frac{\partial F}{\partial y'} =
\begin{bmatrix}
-1  & 0   & u_8 & u_6 \\
u_5 & -1  & 0   & 0   \\
u_7 & 0   & -1  & u_3 \\
0   & u_2 & 0   & -1
\end{bmatrix},
\qquad
\frac{\partial F}{\partial z'} =
\begin{bmatrix}
0   & 0   \\
u_1 & 0   \\
0   & u_4 \\
0   & 0
\end{bmatrix},

which is supposed to represent the mix of explicit and implicit relations one frequently encounters in real macroeconometric models.

GA : the graph contains the vertices y1, y2, y3, y4 and the exogenous vertices z1, z2, with the arcs z1 → y2 (u1), y2 → y4 (u2), y4 → y3 (u3), z2 → y3 (u4), y1 → y2 (u5), y4 → y1 (u6), y1 → y3 (u7) and y3 → y1 (u8).

• Elimination of vertex y1: P_{y1} = {y3, y4} and S_{y1} = {y2, y3}. We store

    y1 = u6 y4 + u8 y3

  and create the new values

    u9 = u8 u5 ,   u10 = u8 u7 ,   u11 = u6 u5 ,   u12 = u3 + u6 u7 .

  In G_{A^(1)} the new arcs are y3 → y2 (u9) and y4 → y2 (u11), the arc y4 → y3 now carries the value u12, and a loop u10 appears on y3.

• Elimination of vertex y2: P_{y2} = {z1, y3, y4} and S_{y2} = {y4}. We store

    y2 = u1 z1 + u11 y4 + u9 y3

  and create the new values

    u13 = u1 u2 ,   u14 = u9 u2 ,   u15 = u11 u2 .

  In G_{A^(2)} the new arcs are z1 → y4 (u13) and y3 → y4 (u14), and a loop u15 appears on y4.

• Elimination of vertex y3: P_{y3} = {z2, y4} and S_{y3} = {y4}. The loop u10 on y3 is first removed by setting

    c1 = 1 − u10 ,   u16 = u14/c1 ,

  so that the outgoing arc y3 → y4 now carries the value u16. We then store

    y3 = u4 z2 + u12 y4

  and create the new values

    u17 = u4 u16 ,   u18 = u15 + u12 u16 .

  In G_{A^(3)} the remaining arcs are z1 → y4 (u13) and z2 → y4 (u17), together with the loop u18 on y4.

• Solution for y4:

    c2 = 1 − u18 ,
    y4 = (u13 z1 + u17 z2)/c2 .

Given the expressions for the forward substitutions and the backward substitutions, the linear system of our example can be solved executing 29 elementary arithmetic operations.

Minimizing the Fill-in. It is well known that, for a sparse Gaussian elimination, the choice of the variables to be substituted (the pivots) must also avoid an excessive loss of sparsity.

From the illustration describing the procedure of vertex elimination, we see that an upper bound for the number of new arcs generated is given by d−_k d+_k, the product of the indegree and the outdegree of vertex k. We therefore select a vertex k such that d−_k d+_k is minimum over all remaining vertices in the graph.²

² This is known as the Markowitz criterion; see Duff et al. [30, p. 128].

We reconsider the previous example in order to illustrate the effect of the choice of the order in which the variables are eliminated.

• Elimination of vertex y2: P_{y2} = {z1, y1} and S_{y2} = {y4}. We store

    y2 = u5 y1 + u1 z1

  and create the new values

    u9 = u1 u2 ,   u10 = u5 u2 .

  In G_{A^(1)} the new arcs are z1 → y4 (u9) and y1 → y4 (u10).

• Elimination of vertex y3: P_{y3} = {z2, y1, y4} and S_{y3} = {y1}. We store

    y3 = u7 y1 + u3 y4 + u4 z2

  and create the new values

    u11 = u4 u8 ,   u12 = u7 u8 ,   u13 = u3 u8 + u6 .

  In G_{A^(2)} the new arcs are z2 → y1 (u11) and y4 → y1 (u13, merging with the existing arc u6), and a loop u12 appears on y1.

• Elimination of vertex y4: P_{y4} = {z1, y1} and S_{y4} = {y1}. We store

    y4 = u10 y1 + u9 z1

  and create the new values

    u14 = u9 u13 ,   u15 = u10 u13 + u12 .

  In G_{A^(3)} the remaining arcs are z1 → y1 (u14) and z2 → y1 (u11), together with the loop u15 on y1.

• Solution for y1:

    c1 = 1 − u15 ,
    y1 = (u14 z1 + u11 z2)/c1 .

For this order of vertices, the forward substitutions and the backward substitutions necessitate only 24 elementary arithmetic operations.

Condition of the Linear System. In order to achieve a good numerical solution for a linear system, two aspects are of crucial importance. The first is the condition of the problem: if the problem is not reasonably well conditioned, there is little hope of obtaining a satisfactory solution. The second concerns the method used, which should be numerically stable. This means, roughly, that the errors due to the floating point computations are not excessively magnified. In the following, we will suggest practical guidelines which may prove helpful to enhance the condition of a linear system.

Let us recall that the linear system we want to solve is given by J^(k) s = b. We choose to associate a graph to this linear system. In order to do this, it is necessary to select a normalization of the matrix J^(k). Such a normalization corresponds to a particular row scaling of the original linear system. This transformation modifies the condition of the linear system. Our goal is to find a normalization for which the condition is likely to be good.

We recall that the condition κ of a matrix A in the frame of the solution of the linear system Ax = b is defined as

  κ_p(A) = ‖A‖_p ‖A^{-1}‖_p ≥ 1 ,

and when κ(A) is large the matrix is said to be ill-conditioned. We know that κ_p varies with the choice of the norm p; however, this does not significantly affect the order of magnitude of the condition number. In the following, p is assumed to be ∞ unless stated otherwise. Hence, we have

  κ(A) = ‖A‖ ‖A^{-1}‖ ,

with

  ‖A‖ = max_{i=1,...,n} ∑_{j=1}^{n} |a_{ij}| .

Row scaling can significantly reduce the condition of a matrix. We therefore look for a scaling matrix D for which

  κ(D^{-1} A) ≪ κ(A) .

From a theoretical point of view, the problem of finding D minimizing κ(D^{-1}A) has been solved for the infinity norm, see Bauer [10]; however, the result cannot be used in practice. Our concern is to suggest a practical solution to this problem by providing an efficient heuristic.

We are certainly aware that the problem of row scaling cannot so far be solved automatically for any given matrix. A technique consists in choosing D such that each row in D^{-1}A has approximately the same ∞-norm, see Golub and Van Loan [56, p. 125].

As we want to represent our system using a graph, for which we need a normalization, the set of possible scaling matrices is finite. The idea of rows having approximately the same ∞-norm will guide us in choosing a particular normalization.

We therefore introduce a measure for the dispersion of the ∞-norms of the rows of a matrix. Recall that, for a vector x ∈ R^n, the ∞-norm is defined as

  ‖x‖_∞ = max_{i=1,...,n} |x_i| .

We compute a vector m with

  m_i = ‖a_{i·}‖_∞ ,   i = 1, . . . , n ,

the ∞-norm of row i of matrix A. A measure for the relative range of the elements of m is given by the ratio max(m)/min(m). Since we are only interested in the order of magnitude of this ratio, we take the logarithm to obtain

  r = log(max(m)) − log(min(m)) .

Due to the normalization, matrix A is such that each row contains an element of value one and therefore min(m) ≥ 1 and r ≥ 0.
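The quantities involved are straightforward to compute; the following NumPy lines (an illustration added here, with base-10 logarithms since only the order of magnitude matters) evaluate the spread measure r and compare the condition number before and after scaling each row by its ∞-norm:

import numpy as np

def row_norm_spread(A):
    # r = log(max_i ||a_i.||_inf) - log(min_i ||a_i.||_inf)
    m = np.abs(A).max(axis=1)
    return np.log10(m.max()) - np.log10(m.min())

def scaled_condition(A):
    # condition number of A and of the row-equilibrated matrix D^{-1} A
    d = np.abs(A).max(axis=1)
    return np.linalg.cond(A, np.inf), np.linalg.cond(A / d[:, None], np.inf)

Enumerating the admissible normalizations and retaining one with a small r is the heuristic explored in the experiment below.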

In order to explore the relation existing between κ and r in a practical situation, we took the Jacobian matrix of the MULTIMOD country model for Japan. The Jacobian matrix is of size 40 with 134 nonzero elements, admitting 44707 different normalizations. For each normalization, r and κ2 have been computed. In Figure 5.6, r is plotted against the log of κ2 for each normalization.

Figure 5.6: Relation between r and κ2 for the Japan submodel of MULTIMOD.

This figure shows a clear relationship between r and κ2. This suggests a criterion for the selection of a normalization corresponding to an appropriate scaling of our matrix.

The selection of the normalization is then performed by seeking a matching in a bipartite graph³ so that it optimizes a criterion built upon the values of the edges, which in turn correspond to the entries of the Jacobian matrix of the model.

³ The association of a bipartite graph to the Jacobian matrix can be found in Gilli [50, p. 100].

Selection of Pivots. We know that the choice of the pivots in a Gaussian elimination is determinant for the numerical stability. For the SGE method, a partial pivoting strategy would require resorting once more to the bipartite graph of the Jacobian matrix. In this case, for a given vertex, we choose new adjacent matchings such that the edge belonging to the new matching is of maximum magnitude.

A strategy for achieving a reasonable fill-in is to apply a threshold as explained in Section 2.3.2. The Markowitz criterion is easy to implement in order to select candidate vertices defining the edge in the new matching, since this criterion only depends on the degrees of the vertices.

Table 5.3 summarizes the operation counts for a Gauss-Seidel method and a Newton method using two different sparse linear solvers. The Japan model is solved for a single period and the counts correspond to the total number of operations for the respective methods to converge. We recall that for the GS method, we needed to find a new normalization and ordering of the equations, which constitutes a difficult task. The two sparse Newton methods are similar in their computational complexity; the advantage of the SGE method is that it reuses the symbolic factorization of the Jacobian matrix between successive iterations.

                    Newton
Statement     MATLAB      SGE       GS
2             1.5         1.5       9.7
3             11.5        11.5      -
4             3.3         1.7       -
total         16.3        14.7      9.7

Table 5.3: Operation count in Mflops for Newton combined with SGE and MATLAB's sparse solver, and Gauss-Seidel.

The total count of operations clearly favors the sparse Newton method, as the difficult task of appropriately renormalizing and reordering the equations is then not required.

Multiple Blockdiagonal LU

Matrix S in (5.3.3) is a block partitioned matrix and the LU factorization of such a matrix can be performed at block level. Such a factorization is called a block LU factorization. To take advantage of the blockdiagonal structure of matrix S, the block LU factorization can be adapted to a multiple blockdiagonal matrix with r lower blocks and h upper blocks (see Golub and Van Loan [56, p. 171]).

The algorithm consists in factorizing matrix S given in (5.3.3) on page 99, where r = 3 and h = 5, into S = LU:

L = \begin{bmatrix}
I         &           &           &           &        &        \\
F_1^{t+1} & I         &           &           &        &        \\
F_2^{t+1} & F_1^{t+2} & I         &           &        &        \\
F_3^{t+1} & F_2^{t+2} & F_1^{t+3} & I         &        &        \\
          & F_3^{t+2} & F_2^{t+3} & F_1^{t+4} & I      &        \\
          &           & \ddots    & \ddots    & \ddots & \ddots
\end{bmatrix},

U = \begin{bmatrix}
U^{t+1} & G_1^{t+2} & G_2^{t+3} & G_3^{t+4} & G_4^{t+5} & G_5^{t+6} &           &           \\
        & U^{t+2}   & G_1^{t+3} & G_2^{t+4} & G_3^{t+5} & G_4^{t+6} & G_5^{t+7} &           \\
        &           & U^{t+3}   & G_1^{t+4} & G_2^{t+5} & G_3^{t+6} & G_4^{t+7} & G_5^{t+8} \\
        &           &           & \ddots    & \ddots    & \ddots    & \ddots    & \ddots
\end{bmatrix}.

The submatrices in L and U can be determined recursively as done in Algorithm 34.

Algorithm 34 Block LU for Multiple Blockdiagonal Matrices
1. for k = 1 to T
2.   for j = min(k − 1, r) down to 1
3.     solve for F_j^{t+k−j} :
         F_j^{t+k−j} U^{t+k−j} = E_j^{t+k−j} − Σ_{i=j+1}^{min(k−1, r, j+h)} F_i^{t+k−i} G_{i−j}^{t+k−j}
     end
4.   U^{t+k} = D^{t+k} − Σ_{i=1}^{min(k−1, r, h)} F_i^{t+k−i} G_i^{t+k}
5.   for j = 1 to min(T − k, h)
6.     G_j^{t+k+j} = A_j^{t+k+j} − Σ_{i=1}^{min(k−1, r, h−j)} F_i^{t+k−i} G_{i+j}^{t+k+j}
     end
   end

In Algorithm 34, the loops in Statements 2 and 5 limit the computations to the bandwidth formed by the blockdiagonals. The same goes for the limits of the sums in Statements 3, 4 and 6.

In Statement 3, matrix F_j^{t+k−j} is the solution of a linear system, and the matrices U^{t+k} and G_j^{t+k+j}, respectively in Statements 4 and 6, are computed as sums and products of known matrices.

After the computation of matrices L and U, the solution y can be obtained via block forward and back substitution; this is done in Algorithm 35.


Algorithm 35 Block Forward and Back Substitution
1. c^{t+1} = b^{t+1}
2. for k = 2 to T
3.   c^{t+k} = b^{t+k} − Σ_{i=1}^{min(k−1, r)} F_i^{t+k−i} c^{t+k−i}
   end
4. solve U^{t+T} y^{t+T} = c^{t+T} for y^{t+T}
5. for k = T − 1 down to 1
6.   solve for y^{t+k} :
       U^{t+k} y^{t+k} = c^{t+k} − Σ_{i=1}^{min(T−k, h)} G_i^{t+k+i} y^{t+k+i}
   end

Statements 1, 2 and 3 concern the forward substitution and Statements 4, 5 and 6 perform the back substitution. The F matrices in Statement 3 are already available after computation of the loop in Statement 2 of Algorithm 34. Therefore, the forward substitution could be done during the block LU factorization.
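To fix ideas, the following NumPy sketch (written for this exposition, not taken from the thesis) carries out the block LU factorization and the block forward and back substitutions for the simplest special case r = h = 1, i.e. a block tridiagonal matrix; the general multiple blockdiagonal case of Algorithms 34 and 35 only adds the band limits shown above:

import numpy as np

def block_tridiag_solve(D, E, A, b):
    # D[0..T-1]: diagonal blocks, E[0..T-2]: subdiagonal blocks, A[0..T-2]: superdiagonal blocks
    # b: list of T right-hand-side vectors
    T = len(D)
    U = [None] * T           # diagonal blocks of the U factor
    F = [None] * (T - 1)     # subdiagonal blocks of the L factor (here G_k = A[k])
    U[0] = D[0]
    for k in range(1, T):
        # solve F[k-1] U[k-1] = E[k-1] for F[k-1]
        F[k - 1] = np.linalg.solve(U[k - 1].T, E[k - 1].T).T
        U[k] = D[k] - F[k - 1] @ A[k - 1]
    c = [b[0]]                                  # forward substitution L c = b
    for k in range(1, T):
        c.append(b[k] - F[k - 1] @ c[k - 1])
    y = [None] * T                              # back substitution U y = c
    y[T - 1] = np.linalg.solve(U[T - 1], c[T - 1])
    for k in range(T - 2, -1, -1):
        y[k] = np.linalg.solve(U[k], c[k] - A[k] @ y[k + 1])
    return np.concatenate(y)

In the stacked model each D, E and A block would be of order 45 and T would be the number of stacked periods.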

From a numerical point of view, block LU does not guarantee the numerical stability of the method, even if Gaussian elimination with pivoting is used to solve the subsystems. The safest method consists in solving the Newton step in the stacked system with a sparse linear solver using pivoting. This procedure is implemented in portable TROLL [67], which uses MA28, see Duff et al. [30].

Structure of the Submatrices in L and U. To a certain extent, block LU already takes advantage of the sparsity of the system as the zero submatrices are not considered in the computations. However, we want to go further and exploit the invariance of the incidence matrices, i.e. their structure. As the matrices E_i^{t+k}, D^{t+k} and A_j^{t+k} have respectively the same sparse structure for all k, it follows that the structure of the matrices F_i^{t+k}, U^{t+k} and G_j^{t+k} is also sparse and predictable.

Indeed, if we execute the block LU Algorithm 34, we observe that the matrices F_i^{t+k}, U^{t+k} and G_j^{t+k} involve identical computations for min(r, h) < k < T − min(r, h). These computations involve sparse matrices and may therefore be expressed as sums of paths in the graphs corresponding to these matrices.

The computations involving structurally identical matrices can be performed without repeating the steps used to analyze the structure of these matrices.

Parallelization. The block LU algorithm proceeds sequentially to compute the different submatrices in L and U. The same goes for the block forward and back substitution. In these procedures, the information necessary to execute a given statement is always linked to the result of immediately preceding statements, which means that there are no immediate and obvious possibilities for parallelizing the algorithm.

On the contrary, the matrix computations within a given statement offer appealing opportunities for parallel computations. If, for instance, one uses the SGE method suggested in Section 5.3.4 for the solution of the linear system, the implementation of a data parallel execution model to perform efficiently repeated independent solutions of the model turns out to be straightforward.

To identify task parallelism, we can analyze the structure of the operations defined by the algebraic expressions discussed above. In order to do this, we seek sets of expressions which can be evaluated independently. We illustrate this for the solution of the linear system presented on page 110. The identification of the parallel tasks can be performed efficiently by resorting to a representation of the expressions by means of a graph, as shown in Figure 5.7.

Figure 5.7: Scheduling of operations for the solution of the linear system as computed on page 110. (Level 1: u9, u10, u11, u12, u13; level 2: u14, u15; level 3: y1; level 4: y4, y2; level 5: y3.)

We see that the solution can be computed in five steps, each of which consists of one to five parallel tasks.⁴

⁴ This same approach has been applied for the parallel execution of Gauss-Seidel iterations in Section 4.2.2.
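Such a schedule can be recovered by computing, for every expression, the length of the longest dependency chain leading to it; the short sketch below (illustrative only, with the dependencies of the page-110 example hard-coded) groups the expressions into the five parallel levels of Figure 5.7:

from functools import lru_cache

deps = {
    'u9': [], 'u10': [], 'u11': [], 'u12': [], 'u13': [],
    'u14': ['u9', 'u13'], 'u15': ['u10', 'u13', 'u12'],
    'y1': ['u14', 'u11', 'u15'],
    'y4': ['u10', 'y1', 'u9'], 'y2': ['y1'],
    'y3': ['y1', 'y4'],
}

@lru_cache(maxsize=None)
def level(node):
    # level = 1 + longest chain of dependencies below the node
    return 1 + max((level(d) for d in deps[node]), default=0)

levels = {}
for node in deps:
    levels.setdefault(level(node), []).append(node)
for step in sorted(levels):
    print(step, levels[step])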

The elimination of a vertex y_k^t (corresponding to variable y_k^t) in the SGE algorithm described in Section 5.3.4 allows interesting developments in a stacked model. Let us consider a vertex y_k^t for which there exists no vertex y_j^s ∈ P_{y_k^t} ∪ S_{y_k^t} with s ≠ t. In other words, all the predecessors and successors of y_k^t belong to the same period t. The operations for the elimination of such a vertex are independent and identical for all the T variables y_k^t, t = 1, . . . , T. Hence, the computation of the arcs defining the new graph resulting from the elimination of such a vertex is executed only once and then evaluated (possibly in parallel) for the different T periods. For the elimination of vertices y_k^t with predecessors and successors in period t + 1, it is again possible to split the model into independent pieces concerning periods [t, t + 1], [t + 2, t + 3], etc., for which the elimination is again performed independently. This process can be continued until the elimination of all vertices.

Nonstationary Iterative Methods

This section reports results on numerical experiments with different nonstationary iterative solvers that are applied to find the step in the Newton method.

We recall that the system is now stacked and has been decomposed into J* according to the decomposition technique explained earlier. The size of the nontrivial system to be solved is T × 413, where T is the number of times the model is stacked. Figure 5.8 shows the pattern of the stacked model for T = 10.

Figure 5.8: Incidence matrix of the stacked system for T = 10.

The nonstationary solvers suited for nonsymmetric problems suggested in the literature that we chose to experiment with are BiCGSTAB, QMR and GMRES(m). The QMR method (Quasi-Minimal Residual) introduced by Freund and Nachtigal [40] has not been presented in Section 2.5 since the method presents potential for failures if implemented without sophisticated look-ahead procedures. We tried a version of QMR without look-ahead strategies for our application. BiCGSTAB, proposed by van der Vorst [99], is also designed to solve large and sparse nonsymmetric linear systems and usually displays robust features and small computational cost. Finally, we chose to experiment with the behavior in our framework of GMRES(m), originally presented by Saad and Schultz [90].

For all these methods, it is known that preconditioning can greatly influence the convergence. Therefore, following some authors (e.g. Concus et al. [24], Axelsson [6] and Bruaset [21]), we applied a preconditioner based upon the block structure of our problem.

The block preconditioner we used is built on the LU factorization of the first block of our stacked system. If we dropped the leads and lags of the model, the Jacobian matrix would be block diagonal, i.e.

\begin{bmatrix}
D^{t+1} &         &        &         \\
        & D^{t+2} &        &         \\
        &         & \ddots &         \\
        &         &        & D^{t+T}
\end{bmatrix} .

The tradeoff between the cost of applying the preconditioner and the expected gain in convergence speed can be improved by using the same matrix D along the diagonal. This simplification is reasonable when the matrices display little change in their structure and values. We therefore selected D^{t+1} for the diagonal block, computed its sparse LU factorization with partial pivoting and used it to perform the preconditioning steps in the iterative methods. Since the factorization is stored, only the forward and back substitutions are carried out for applying this block preconditioner.

The values in the Jacobian matrix change at each step of the Newton method and therefore a new LU factorization of D^{t+1} is computed at each iteration. Another possibility, involving a cheaper cost, would have been to keep the factorization fixed during the whole solution process.
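As an illustration of this construction (a SciPy sketch written for this presentation, not the original code; the matrix names and sizes are placeholders), the block preconditioner can be wrapped as a linear operator that applies the stored factorization of D^{t+1} to each of the T blocks of a vector and passed to the Krylov solvers:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_preconditioner(D1, T):
    # block diagonal preconditioner built from the sparse LU factors of D1
    lu = spla.splu(sp.csc_matrix(D1))     # sparse LU with partial pivoting
    n = D1.shape[0]

    def apply(x):
        # only forward and back substitutions are performed here
        return np.concatenate([lu.solve(x[k * n:(k + 1) * n]) for k in range(T)])

    return spla.LinearOperator((n * T, n * T), matvec=apply)

# usage, with J the stacked sparse Jacobian and rhs the Newton right-hand side:
# M = block_preconditioner(D1, T)
# step, info = spla.bicgstab(J, rhs, M=M)
# step, info = spla.gmres(J, rhs, M=M, restart=30)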

For our experiment, we shocked the variable of Canada's government expenditures by 1% of the Canadian GDP for the first year of simulation.

We report the results of the linear solvers in the classical Newton method. The figures reported are the average number of Mflops (millions of floating point operations) used to solve the linear system of size T × 413 arising in the Newton iteration. The number of Newton steps needed to converge is 2 for T less than 20 and 3 for T = 30.

Table 5.4 presents the figures for the solver BiCGSTAB. The column labeled "size" contains the number of equations in the systems solved and the one named "nnz" shows the number of nonzero entries in the corresponding matrices. In order to keep them more compact, this information is not repeated in the other tables.

                              tolerance
T     size     nnz        10−4    10−8    10−12
7     2891     17201      53      71      85
8     3304     19750      67      90      105
10    4130     24848      100     140     165
15    6195     37593      230     320     385
20    8260     50338      400     555     670
30    12390    75821      970     1366    1600
50    20650    126783     *       *       *

The symbol * indicates a failure to converge.

Table 5.4: Average number of Mflops for BiCGSTAB.

We remark that the increase in the number of flops is less than the increase in the logarithm of the tolerance criterion. This seems to indicate that, for our case, the rate of convergence of BiCGSTAB is, as usually expected, more than linear. The work to solve the linear system increases with the size of the set of equations; doubling the number of equations leads to approximately a fourfold increase in the number of flops.

Table 5.5 summarizes the results obtained with the QMR method. We chose to report only the flop count corresponding to a solution with a tolerance of 10−4. As for BiCGSTAB, the numbers reported are the average Mflops counts of the successive linear solutions arising in the Newton steps.

The increase of floating point operations is again linear in the size of the problem.


T     tol = 10−4
7     91
8     120
10    190
15    460
20    855
30    2100
50    9900*

* Average of the first two Newton steps; failure to converge in the third step.

Table 5.5: Average number of Mflops for QMR.

The QMR method seems, however, about twice as expensive as BiCGSTAB for the same tolerance level. The computational burden of QMR consists of about 14 level-1 BLAS and 4 level-2 BLAS operations, whereas BiCGSTAB uses 10 level-1 BLAS and 4 level-2 BLAS operations. This apparently indicates a better convergence behavior of BiCGSTAB than of QMR.

Table 5.6 presents a summary of the results obtained with the GMRES(m) technique.

Like in the previous methods, the convergence displays the expected superlinear behavior. Another interesting feature of GMRES(m) is the possibility of tuning the restart parameter m. We know that the storage requirements increase with m and that the larger m becomes, the more likely the method converges, see [90, p. 867]. Each iteration uses approximately 2m + 2 level-1 BLAS and 2 level-2 BLAS operations.

To confirm the fact that the convergence will take place for sufficiently large m, we ran a simulation of our model with T = 50, tol = 10−4 and m = 50. In this case, the solver converged with an average count of 9900 Mflops.

It is also interesting to notice that long restarts, i.e. large values of m, do not in general generate much heavier computations and that the improvement in convergence may even reduce the global computational cost. An operation count per iteration is given in [90], which clearly shows this feature of GMRES(m).

Even though this last method is not cheaper than BiCGSTAB in terms of flops, the possibility of overcoming nonconvergent cases by using larger values of m certainly favors GMRES(m).

Finally, we used the sparse LU solver provided in MATLAB. For general nonsymmetric matrices, a strategy that this method implements is to reorder the columns according to their minimum degree in order to minimize the fill-in. On the other hand, a sparse partial pivoting technique proposed by Gilbert and Peierls [45] is used to prevent losses in the stability of the method. In Table 5.7, we present some results obtained with the method we have just described.


          m = 10                    m = 20                    m = 30
          tolerance                 tolerance                 tolerance
T         10−4   10−8   10−12       10−4   10−8   10−12       10−4   10−8   10−12
7         76     130    170         74     107    135         76     109    145
8         117    170    220         103    125    190         110    150    200
10        185    275    355         175    250    320         165    240    310
15        415    600    785         430    620    810         460    660    855
20        725    1060   1350        705    990    1300        770    1085   1450
30        2566   3466   4666        2100   2900   3766        2166   3000   3833
50        *      *      *           *      *      *           *      *      *

The symbol * indicates a failure to converge.

Table 5.6: Average number of Mflops for GMRES(m).

The direct solver obtains an excellent error norm for the computed solution, which in general is less than the machine precision for our hardware (i.e. about 2·10−16). A drawback, however, is the steep increase in the number of arithmetic operations when the size of the system increases. In our situation, this result favors the nonstationary iterative solvers for systems larger than T = 10, i.e. 4130 equations with about 25000 nonzero elements. Another major disadvantage of the sparse solver is that memory requirements became a constraint so rapidly that we were not able to experiment with matrices of order larger than approximately 38000 with 40 Mbytes of memory. We may mention, however, that no special tuning of the parameters of the sparse method was performed for our experiments. Careful control of such parameters would probably allow better performance to be obtained.

The recent nonstationary iterative methods proposed in the scientific computing literature are certainly an alternative to sparse direct methods for solving large and sparse linear systems such as the ones arising with forward looking macroeconometric models. Sparse direct methods, however, have the advantage of allowing the monitoring of the stability of the process and the reuse of the structural information, for instance in several Newton steps.


T     Mflops    MBytes
7     91
8     160
10    295       13.1
15    730       19.8
20    1800      40.2
30    5133      89.3
50    *         *

The symbol * indicates that the memory capacity has been exceeded.

Table 5.7: Average number of Mflops for MATLAB’s sparse LU.


Appendix A

The following pages provide background material on computation in finite precision and an introduction to computational complexity, two issues directly relevant to the discussions of numerical algorithms.

A.1 Finite Precision Arithmetic

The representation of numbers on a digital computer is very different from the one we usually deal with. Modern computer hardware can only represent a finite subset of the real numbers. Therefore, when a real number is entered in a computer, a representation error generally appears. The effects of finite precision arithmetic are thoroughly discussed in Forsythe et al. [38] and Gill et al. [47], among others.

We may explain what occurs by first stating that the internal representation of a real number is a floating point number. This representation is characterized by the number base β, the precision t and the exponent range [L, U], all four being integers.

The set of all floating point numbers F has the form

  F = { f | f = ± .d1 d2 . . . dt × β^e , 0 ≤ di < β, i = 1, . . . , t, d1 ≠ 0, L ≤ e ≤ U } ∪ {0} .

Nowadays, the standard base is β = 2, whereas the other integers t, L, U vary according to the hardware; the number .d1 d2 . . . dt is called the mantissa. The magnitudes of the largest and smallest representable numbers are

  M = β^U (1 − β^{−t})   for the largest,
  m = β^{L−1}            for the smallest.

Therefore, when we input x ∈ R in a computer, it is replaced by fl(x), which is the closest number to x in F. The term "closest" means the nearest number in F (rounded away from zero if there is a tie) when rounded arithmetic is used, and the nearest number in F such that |fl(x)| ≤ |x| when chopped arithmetic is used.


If |x| > M or 0 < |x| < m, then an arithmetic fault occurs, which, most of the time, implies the termination of the program.

When computations are made, further errors are introduced. If we denote by op one of the four arithmetic operations +, −, ×, ÷, then (a op b) is represented internally as fl(a op b).

To show which relative error is introduced, we first note that fl(x) = x(1 + ε), where |ε| < u and u is the unit roundoff defined as

  u = (1/2) β^{1−t}   for rounded arithmetic,
  u = β^{1−t}         for chopped arithmetic.

Hence, fl(x op y) = (x op y)(1 + ε) with |ε| < u and therefore the relative error, if (x op y) ≠ 0, is

  |fl(x op y) − (x op y)| / |x op y| ≤ u .

Thus we see that an error corresponds to each arithmetic operation. This error is not only the result of a rounding in the computation itself, but also of the inexact representation of the arguments. Even if one does one's best not to accumulate such errors, this is the most likely source of problems when doing arithmetic computations with a computer.

The most important danger is catastrophic cancellation, which leads to a complete loss of correct digits. When close floating point numbers are subtracted, the number of significant digits may be small or even nonexistent. This is due to close numbers carrying many identical digits in the first left positions of the mantissa. The difference then cancels these digits and the renormalized mantissa contains very few significant digits.

For instance, if we have a computer with β = 10 and t = 4, we may find that

fl(fl(10^(-4) + 1) - 1) = fl(1 - 1) = 0 .

We may notice that the exact answer can be found by associating the terms differently,

fl(10^(-4) + fl(1 - 1)) = fl(10^(-4) + 0) = 10^(-4) ,

which shows that floating point computations are, in general, not associative.

Without careful control, such situations can lead to a disastrous degradation of the result.
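Both phenomena, the non-associativity above and the loss of significant digits through cancellation, can be reproduced with a few lines of Python, assuming IEEE double precision (about 16 significant decimal digits):

    eps = 1e-17                       # below the unit roundoff relative to 1.0

    print((eps + 1.0) - 1.0)          # 0.0    : the contribution of eps is lost
    print(eps + (1.0 - 1.0))          # 1e-17  : associating differently preserves it

    # Cancellation: (1 + x) is correct to ~16 digits, but after subtracting 1
    # only about half of the digits of x survive.
    x = 1.23456789e-8
    print((1.0 + x) - 1.0)            # 1.2345678...e-08, accurate to roughly 8 digits
    print(x)                          # full accuracy, for comparison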

The main goal is to build algorithms that are not only fast, but above all reliable in their numerical accuracy.

A.2 Condition of a Problem

The condition of a problem reflects the sensitivity of its exact solution with respect to changes in the data. If small changes in the data lead to large changes in the solution, the problem is said to be ill-conditioned.


The condition number of the problem measures the maximum possible change in the solution relative to the change in the data.

In our context of finite precision computations, the condition of a problem becomes important since, when we input data in a computer, representation errors generally lead to storing slightly different numbers than the original ones. Moreover, the linear systems we deal with most of the time arise from approximate or linearized problems, and we would like to ensure that small approximation errors will not lead to drastic changes in the solution.

For the problem of solving a linear system of equations, the condition can be formalized as follows. Let us consider a linear system Ax = b with a nonsingular matrix A. We want to determine the change in the solution x∗ given a change in b or in A. If b is perturbed by ∆b and the corresponding perturbation of x∗ is ∆x_b so that the equation

A(x∗ + ∆x_b) = b + ∆b

is satisfied, we then have

A ∆x_b = ∆b .

Taking the norm of ∆x_b = A^(-1) ∆b and of Ax∗ = b, we get

‖∆x_b‖ ≤ ‖A^(-1)‖ ‖∆b‖   and   ‖b‖ ≤ ‖A‖ ‖x∗‖ ,

so that

‖∆x_b‖ / ‖x∗‖ ≤ ‖A^(-1)‖ ‖A‖ ‖∆b‖ / ‖b‖ .   (A.1)

However, perturbing A by ∆A and letting ∆xA be such that

(A + ∆A)(x∗ + ∆x_A) = b ,

we find

∆x_A = -A^(-1) ∆A (x∗ + ∆x_A) .

Taking norms and rewriting the expression, we finally get

‖∆x_A‖ / ‖x∗ + ∆x_A‖ ≤ ‖A^(-1)‖ ‖A‖ ‖∆A‖ / ‖A‖ .   (A.2)

We see that both expressions (A.1) and (A.2) bound the relative change in the solution given a relative change in the data. Both contain the factor

κ(A) = ‖A^(-1)‖ ‖A‖ ,

called the condition number of A. This number can be interpreted as the ratio of the maximum stretch the linear operator A produces on vectors to the minimum stretch of A. This follows from the definition of matrix norms. The formulas defining the matrix norms of A and A^(-1) are

‖A‖ = max_{v≠0} ‖Av‖ / ‖v‖ ,
‖A^(-1)‖ = 1 / ( min_{v≠0} ‖Av‖ / ‖v‖ ) .


We note that κ(A) depends on the norm used. With this interpretation, it is clear that κ(A) must be greater than or equal to 1, and the closer matrix A is to singularity, the greater κ(A) becomes. In the limiting case where A is singular, the minimum stretch is zero and the condition number is defined to be infinite.
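As an illustration, a minimal sketch using numpy and the 2-norm, with an arbitrarily chosen nearly singular matrix, computes κ(A) and verifies the bound (A.1) for a small perturbation of b:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])          # nearly singular, hence ill-conditioned
    b = np.array([2.0, 2.0001])            # exact solution is x* = [1, 1]

    kappa = np.linalg.cond(A)               # ||A^(-1)|| ||A|| in the 2-norm
    x = np.linalg.solve(A, b)

    db = np.array([1e-6, 0.0])              # small perturbation of the right-hand side
    x_pert = np.linalg.solve(A, b + db)

    lhs = np.linalg.norm(x_pert - x) / np.linalg.norm(x)    # relative change in x
    rhs = kappa * np.linalg.norm(db) / np.linalg.norm(b)    # bound (A.1)
    print(kappa)                 # about 4.0e4
    print(lhs, rhs, lhs <= rhs)  # a 3.5e-7 relative change in b moves x by about 1e-2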

It is generally not practical to compute the condition number of A through the formula ‖A^(-1)‖ ‖A‖. This number can, however, be estimated by other procedures when the 1-norm is used. Classical references are Cline et al. [23] and the LAPACK library.

A.3 Complexity of Algorithms

The analysis of the computational complexity of algorithms is a very sophisticated and difficult topic in computer science. Our goal is simply to present some terminology and distinctions that are of interest in our work.

The solution of most problems can be approached using different algorithms. Therefore, it is natural to try to compare their performance in order to find the most efficient method. In its broad sense, the efficiency of an algorithm takes into account all the computing resources needed for carrying out its execution. Usually, for our purposes, the crucial resource will be the computing time. However, there are other aspects of importance such as the amount of memory needed (space complexity) and the reliability of an algorithm. Sometimes a simpler implementation can be preferred to a more sophisticated one for which it becomes difficult to assess the correctness of the code.

Keeping these caveats in mind, it is nonetheless very informative to calculate the time requirement of an algorithm. The techniques presented in the following deal with numerical algorithms, i.e. algorithms where the largest part of the time is devoted to arithmetic operations. With serial computers, there is an almost proportional relationship between the number of floating point operations (additions, subtractions, multiplications and divisions) and the running time of an algorithm. Since this time is very specific to the computer used, the quantity of interest is the number of flops (floating point operations) used.

The time requirements of an algorithm are conveniently expressed in terms of the size of the problem. In a broad sense, this size usually represents the number of items describing the problem or a quantity that reflects it. For a general square matrix A with n rows and n columns, a natural size would be its order n.

We will focus on the time complexity function, which expresses, for a given algorithm, the largest amount of time needed to solve a problem as a function of its size. We are interested in the leading terms of the complexity function, so it is useful to define a notation.

Definition 3 Let f and g be two functions f, g : D → R, D ⊆ R.

1. f(x) = O(g(x)) if and only if there exists a constant a > 0 and x_a such that for every x ≥ x_a we have |f(x)| ≤ a g(x),

2. f(x) = Ω(g(x)) if and only if there exists a constant b > 0 and x_b such that for every x ≥ x_b we have f(x) ≥ b g(x),


3. f(x) = Θ(g(x)) if and only if f(x) = O(g(x)) and f(x) = Ω(g(x)).

Hence, when an algorithm is said to be O(g(n)), it means that the running time to solve a problem of size n ≥ n_0 is less than c_1 g(n) + c_2, where c_1 > 0 and c_2 are constants depending on the computer and the problem, but not on n.

Some simple examples of time complexity using the O(·) notation are useful for subsequent developments (a small timing sketch illustrating these growth rates follows the list):

O(n) Linear complexity. For instance, computing a dot product of two vectors of size n is O(n).

O(n^2) Quadratic complexity generally arises in algorithms processing all pairs of data items; the code typically shows two nested loops. Adding two n × n matrices or multiplying an n × n matrix by an n × 1 vector is O(n^2).

O(n^3) Cubic complexity may appear in triple nested loops. For instance, the product of two n × n full matrices involves n^2 dot products and therefore is of cubic complexity.

O(b^n) Exponential complexity (b > 1). Algorithms performing exhaustive searches usually have this exploding time complexity.
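As a rough and machine-dependent illustration, the following sketch (the sizes and the use of numpy are arbitrary choices) times an O(n) dot product, an O(n^2) matrix-vector product and an O(n^3) matrix-matrix product; doubling n should multiply the three timings by roughly 2, 4 and 8 respectively:

    import time
    import numpy as np

    def elapsed(f, repeat=3):
        """Best wall-clock time of f() over a few repetitions, in seconds."""
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            f()
            best = min(best, time.perf_counter() - start)
        return best

    for n in (500, 1000, 2000):
        x = np.random.rand(n)
        A = np.random.rand(n, n)
        B = np.random.rand(n, n)
        print(n,
              elapsed(lambda: x @ x),    # O(n):   dot product
              elapsed(lambda: A @ x),    # O(n^2): matrix-vector product
              elapsed(lambda: A @ B))    # O(n^3): matrix-matrix product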

Common low-level tasks of numerical linear algebra are extensively used in many higher-level packages. The efficiency of these basic tasks is essential to provide good performance to numerical linear algebra routines. These components are called BLAS, for Basic Linear Algebra Subroutines. They have been grouped in categories according to the computational and space complexity they use:

• Level 1 BLAS are vector-vector operations involving O(n) operations on O(n) data,

• Level 2 BLAS are matrix-vector operations involving O(n^2) operations on O(n^2) data,

• Level 3 BLAS are matrix-matrix operations involving O(n^3) operations on O(n^2) data.

Levels in BLAS routines are independent in the sense that Level 3 routines do not make calls to Level 2 routines, and Level 2 routines do not make calls to Level 1 routines.

Since these operations should be as efficient as possible, different versions are optimized for different computers. This leads to a high degree of portability of the code using such routines, without losing performance on each particular machine.
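A minimal sketch, assuming that numpy delegates these operations to an optimized BLAS (the usual configuration), shows one representative operation per level together with its ratio of floating point operations to data items, the quantity that explains the superior Mflops rates of Level 3 kernels:

    import numpy as np

    n = 1000
    x = np.random.rand(n)
    y = np.random.rand(n)
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    np.dot(x, y)    # Level 1: about 2*n    flops on O(n)   data (e.g. BLAS ddot)
    A @ x           # Level 2: about 2*n**2 flops on O(n^2) data (e.g. BLAS dgemv)
    A @ B           # Level 3: about 2*n**3 flops on O(n^2) data (e.g. BLAS dgemm)

    for level, flops, data in (("1", 2 * n,      2 * n),          # x, y
                               ("2", 2 * n ** 2, n ** 2 + 2 * n), # A, x and the result
                               ("3", 2 * n ** 3, 3 * n ** 2)):    # A, B and the result
        print("Level", level, ":", round(flops / data, 1), "flops per data item")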

There are other measures for the number of operations involved in an algorithm. We presented here the worst-case analysis, i.e. the maximum number of operations to be performed to execute the algorithm. Another approach would be to determine the average number of operations after having assumed a probability distribution for the characteristics of the input data. This kind of analysis is not relevant for the class of algorithms presented later and therefore is not further developed.


An important distinction is made between polynomial time algorithms, the time complexity of which is O(p(n)), where p(n) is some polynomial function in n, and nondeterministic polynomial time algorithms. The class of polynomial time algorithms is denoted by P; nondeterministic polynomial time algorithms fall in the class NP. For an exact definition and a clear exposition of the P and NP classes, see Even [32] and Garey and Johnson [43]. Clearly, we cannot expect nonpolynomial algorithms to solve problems efficiently. However, they are sometimes applicable for small values of n. In a few cases, since the bound found is a worst-case complexity, some nonpolynomial algorithms behave quite well in an average case complexity analysis (as, for instance, the simplex method or the branch-and-bound method).


Bibliography

[1] L. Adams. M-Step Preconditioned Conjugate Gradient Methods. SIAM J. Sci. Stat. Comput., 6:452–463, 1985.
[2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA, 1974.
[3] H. M. Amman. Nonlinear Control Simulation on a Vector Machine. Parallel Computing, 10:123–127, 1989.
[4] A. Ando, P. Beaumont, and M. Ando. Efficiency of the CYBER 205 for Stochastic Simulation of a Simultaneous, Nonlinear, Dynamic Econometric Model. Internat. J. Supercomput. Appl., 1(4):54–81, 1987.
[5] J. Armstrong, R. Black, D. Laxton, and D. Rose. A Robust Method for Simulating Forward-Looking Models. The Bank of Canada's New Quarterly Projection Model, Part 2. Technical Report 73, Bank of Canada, Ottawa, Canada, 1995.
[6] O. Axelsson. Incomplete Block Matrix Factorization Preconditioning Methods. The Ultimate Answer? J. Comput. Appl. Math., 12:3–18, 1985.
[7] O. Axelsson. Iterative Solution Methods. Oxford University Press, Oxford, UK, 1994.
[8] R. Barrett et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA, 1994.
[9] R. J. Barro. Rational Expectations and the Role of Monetary Policy. Journal of Monetary Economics, 2:1–32, 1976.
[10] F. L. Bauer. Optimally Scaled Matrices. Numer. Math., 5:73–87, 1963.
[11] R. Becker and B. Rustem. Algorithms for Solving Nonlinear Models. PROPE Discussion Paper 119, Imperial College, London, 1991.
[12] M. Beenstock. A Neoclassical Analysis of Macroeconomic Policy. Cambridge University Press, London, 1980.
[13] M. Beenstock, A. Dalziel, P. Lewington, and P. Warburton. A Macroeconomic Model of Aggregate Supply and Demand for the UK. Economic Modelling, 3:242–268, 1986.
[14] K. V. Bhat and B. Kinariwala. Optimum Tearing in Large Systems and Minimum Feedback Cutsets of a Digraph. Journal of the Franklin Institute, 307(2):71–154, 1979.
[15] C. Bianchi, G. Bruno, and A. Cividini. Analysis of Large Scale Econometric Models Using Supercomputer Techniques. Comput. Sci. Econ. Management, 5:271–281, 1992.
[16] L. Bodin. Recursive Fix-Point Estimation, Theory and Applications. Selected Publications of the Department of Statistics. University of Uppsala, Uppsala, Sweden, 1974.


[17] R. Boucekkine. An Alternative Methodology for Solving Nonlinear Forward-looking Models. Journal of Economic Dynamics and Control, 19:711–734, 1995.
[18] A. S. Brandsma. The Quest Model of the European Community. In S. Ichimura, editor, Econometric Models of Asian-Pacific Countries. Springer-Verlag, Tokyo, 1994.
[19] F. Brayton and E. Mauskopf. The Federal Reserve Board MPS Quarterly Econometric Model of the U.S. Economy. Econom. Modelling, 2(3):170–292, 1985.
[20] C. G. Broyden. A Class of Methods for Solving Nonlinear Simultaneous Equations. Mathematics of Computation, 19:577–593, 1965.
[21] A. M. Bruaset. Efficient Solutions of Linear Equations Arising in a Nonlinear Economic Model. In M. Gilli, editor, Computational Economics: Models, Methods and Econometrics, Advances in Computational Economics. Kluwer Academic Press, Boston, MA, 1995.
[22] L. K. Cheung and E. S. Kuh. The Bordered Triangular Matrix and Minimum Essential Sets of a Digraph. IEEE Transactions on Circuits and Systems, 21(1):633–639, 1974.
[23] A. K. Cline, C. B. Moler, G. W. Stewart, and J. H. Wilkinson. An Estimate for the Condition Number of a Matrix. SIAM J. Numer. Anal., 16:368–375, 1979.
[24] P. Concus, G. Golub, and G. Meurant. Block Preconditioning for the Conjugate Gradient Method. SIAM J. Sci. Stat. Comput., 6:220–252, 1985.
[25] J. E. Dennis, Jr. and J. J. More. A Characterization of Superlinear Convergence and its Application to Quasi-Newton Methods. Mathematics of Computation, 28:549–560, 1974.
[26] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Series in Computational Mathematics. Prentice-Hall, Englewood Cliffs, NJ, 1983.
[27] H. Don and G. M. Gallo. Solving Large Sparse Systems of Equations in Econometric Models. Journal of Forecasting, 6:167–180, 1987.
[28] P. Dubois, A. Greenbaum, and G. Rodrigue. Approximating the Inverse of a Matrix for Use in Iterative Algorithms on Vector Processors. Computing, 22:257–268, 1979.
[29] I. S. Duff. MA28 – A Set of FORTRAN Subroutines for Sparse Unsymmetric Linear Equations. Technical Report AERE R8730, HMSO, London, 1977.
[30] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Oxford Science Publications, New York, 1986.
[31] I. S. Duff and J. K. Reid. The Design of MA48, a Code for the Direct Solution of Sparse Unsymmetric Linear Systems of Equations. Technical Report RAL-TR-95-039, Computer and Information Systems Department, Rutherford Appleton Laboratory, Oxfordshire, August 1995.
[32] S. Even. Graph Algorithms. Computer Science Press, Rockville, MD, 1979.
[33] R. C. Fair. Specification, Estimation and Analysis of Macroeconometric Models. Harvard University Press, Cambridge, MA, 1984.
[34] R. C. Fair and J. B. Taylor. Solution and Maximum Likelihood Estimation of Dynamic Nonlinear Rational Expectations Models. Econometrica, 51(4):1169–1185, 1983.
[35] J. Faust and R. Tryon. A Distributed Block Approach to Solving Near-Block-Diagonal Systems with an Application to a Large Macroeconometric Model. In M. Gilli, editor, Computational Economics: Models, Methods and Econometrics, Advances in Computational Economics. Kluwer Academic Press, Boston, MA, 1995.


[36] P. Fisher. Rational Expectations in Macroeconomic Models. Kluwer Academic Publishers, Dordrecht, 1992.
[37] P. G. Fisher and A. J. Hughes-Hallett. An Efficient Solution Strategy for Solving Dynamic Nonlinear Rational Expectations Models. Journal of Economic Dynamics and Control, 12:635–657, 1988.
[38] G. E. Forsythe, M. A. Malcolm, and C. B. Moler. Computer Methods for Mathematical Computations. Prentice-Hall, Englewood Cliffs, NJ, 1977.
[39] R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative Solution of Linear Systems. Acta Numerica, pages 1–44, 1991.
[40] R. W. Freund and N. M. Nachtigal. QMR: A Quasi-Minimal Residual Method for Non-Hermitian Linear Systems. Numer. Math., 60:315–339, 1991.
[41] J. Gagnon. A Forward-Looking Multicountry Model for Policy Analysis: MX3. Journal of Economic and Financial Computing, 1:331–361, 1991.
[42] M. Garbely and M. Gilli. Two Approaches in Reading Model Interdependencies. In J.-P. Ancot, editor, Analysing the Structure of Econometric Models, pages 15–33. Martinus Nijhoff, The Hague, 1984.
[43] M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., San Francisco, 1979.
[44] J. R. Gilbert, C. B. Moler, and R. Schreiber. Sparse Matrices in MATLAB: Design and Implementation. SIAM J. Matrix Anal. Appl., 13:333–356, 1992.
[45] J. R. Gilbert and T. Peierls. Sparse Partial Pivoting in Time Proportional to Arithmetic Operations. SIAM J. Sci. Statist. Comput., 9:862–874, 1988.
[46] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981.
[47] P. E. Gill, W. Murray, and M. H. Wright. Numerical Linear Algebra and Optimization. Advanced Book Program. Addison-Wesley, Redwood City, CA, 1991.
[48] M. Gilli. CAUSOR — A Program for the Analysis of Recursive and Interdependent Causal Structures. Technical Report 84.03, Department of Econometrics, University of Geneva, 1984.
[49] M. Gilli. Causal Ordering and Beyond. International Economic Review, 33(4):957–971, 1992.
[50] M. Gilli. Graph-Theory Based Tools in the Practice of Macroeconometric Modeling. In S. K. Kuipers, L. Schoonbeek, and E. Sterken, editors, Methods and Applications of Economic Dynamics, Contributions to Economic Analysis. North Holland, Amsterdam, 1995.
[51] M. Gilli and M. Garbely. Matching, Covers, and Jacobian Matrices. Journal of Economic Dynamics and Control, 20:1541–1556, 1996.
[52] M. Gilli, M. Garbely, and G. Pauletto. Equation Reordering for Iterative Processes — A Comment. Computer Science in Economics and Management, 5:147–153, 1992.
[53] M. Gilli and G. Pauletto. Econometric Model Simulation on Parallel Computers. International Journal of Supercomputer Applications, 7:254–264, 1993.
[54] M. Gilli and E. Rossier. Understanding Complex Systems. Automatica, 17(4):647–652, 1981.
[55] G. H. Golub and J. M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, San Diego, CA, 1993.
[56] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins, Baltimore, 1989.


[57] G. Guardabassi. A Note on Minimal Essential Sets. IEEE Transactions on Circuit Theory, 18:557–560, 1971.
[58] G. Guardabassi. An Indirect Method for Minimal Essential Sets. IEEE Transactions on Circuits and Systems, 21(1):14–17, 1974.
[59] A. Hadjidimos. Accelerated Overrelaxation Method. Mathematics of Computation, 32:149–157, 1978.
[60] L. A. Hageman and D. M. Young. Applied Iterative Methods. Computer Science and Applied Mathematics. Academic Press, Orlando, FL, 1981.
[61] S. G. Hall. On the Solution of Large Economic Models with Consistent Expectations. Bulletin of Economic Research, 37:157–161, 1985.
[62] L. P. Hansen and T. J. Sargent. Formulating and Estimating Dynamic Linear Rational Expectations Models. Journal of Economic Dynamics and Control, 2:7–46, 1980.
[63] J. Helliwell et al. The Structure of RDX2—Part 1 and 2. Staff Research Studies 7, Bank of Canada, Ottawa, Canada, 1971.
[64] M. R. Hestenes and E. Stiefel. Methods of Conjugate Gradients for Solving Linear Systems. J. Res. Nat. Bur. Stand., 49:409–436, 1952.
[65] F. J. Hickernell and K. T. Fang. Combining Quasirandom Search and Newton-Like Methods for Nonlinear Equations. Technical Report MATH–037, Department of Mathematics, Hong Kong Baptist College, 1993.
[66] High Performance Fortran Forum, Houston, TX. High Performance Fortran Language Specification. Version 0.4, 1992.
[67] P. Hollinger and L. Spivakovsky. Portable TROLL 0.95. Intex Solutions, Inc., 35 Highland Circle, Needham, MA 02194, Preliminary Draft edition, May 1995.
[68] A. J. Hughes Hallett. Multiparameter Extrapolation and Deflation Methods for Solving Equation Systems. International Journal of Mathematics and Mathematical Sciences, 7:793–802, 1984.
[69] A. J. Hughes Hallett. Techniques Which Accelerate the Convergence of First Order Iterations Automatically. Linear Algebra and Applications, 68:115–130, 1985.
[70] A. J. Hughes Hallett. A Note on the Difficulty of Comparing Iterative Processes with Differing Rates of Convergence. Comput. Sci. Econ. Management, 3:273–279, 1990.
[71] A. J. Hughes Hallett, Y. Ma, and Y. P. Ying. Hybrid Algorithms with Automatic Switching for Solving Nonlinear Equations Systems in Economics. Computational Economics, forthcoming 1995.
[72] R. M. Karp. Reducibility Among Combinatorial Problems. In R. E. Miller and J. W. Thatcher, editors, Complexity of Computer Computations, pages 85–104. Plenum Press, New York, 1972.
[73] C. T. Kelley. Iterative Methods for Linear and Nonlinear Systems of Equations. Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 1995.
[74] J.-P. Laffargue. Resolution d'un modele macroeconometrique avec anticipations rationnelles. Annales d'Economie et Statistique, 17:97–119, 1990.
[75] R. E. Lucas and T. J. Sargent, editors. Rational Expectations and Econometric Practice. George Allen & Unwin, London, 1981.
[76] R. E. Lucas, Jr. Some International Evidence on Output-Inflation Tradeoffs. American Economic Review, 63:326–334, 1973.


[77] R. E. Lucas, Jr. Econometric Policy Evaluation: A Critique. In K. Brunner and A. H. Meltzer, editors, The Phillips Curve and Labor Markets, volume 1 of Supplementary Series to the Journal of Monetary Economics, pages 19–46. North Holland, 1976.
[78] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, second edition, 1989.
[79] P. Masson, S. Symanski, and G. Meredith. MULTIMOD Mark II: A Revised and Extended Model. Occasional Paper 71, International Monetary Fund, Washington D.C., July 1990.
[80] B. T. McCallum. Rational Expectations and the Estimation of Econometric Models: An Alternative Procedure. International Economic Review, 17:484–490, 1976.
[81] A. Nagurney. Parallel Computation. In H. M. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics. North Holland, Amsterdam, forthcoming 1995.
[82] P. Nepomiastchy and A. Ravelli. Adapted Methods for Solving and Optimizing Quasi-Triangular Econometric Models. Annals of Economics and Social Measurement, 6:555–562, 1978.
[83] P. Nepomiastchy, A. Ravelli, and F. Rechenmann. An Automatic Method to Get an Econometric Model in Quasi-triangular Form. Technical Report 313, INRIA, 1978.
[84] T. Nijman and F. Palm. Generalized Least Square Estimation of Linear Models Containing Rational Future Expectations. International Economic Review, 32:383–389, 1991.
[85] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.
[86] C. C. Paige and M. Saunders. Solution of Sparse Indefinite Systems of Linear Equations. SIAM J. Numer. Anal., 12:617–629, 1975.
[87] C. E. Petersen and A. Cividini. Vectorization and Econometric Model Simulation. Comput. Sci. Econ. Management, 2:103–117, 1989.
[88] A. Pothen and C. Fan. Computing the Block Triangular Form of a Sparse Matrix. ACM Trans. Math. Softw., 16(4):303–324, 1990.
[89] J. K. Reid, editor. Large Sparse Sets of Linear Equations. Academic Press, London, 1971.
[90] Y. Saad and M. Schultz. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
[91] T. J. Sargent. Rational Expectations, the Real Rate of Interest, and the Natural Rate of Unemployment. Brookings Papers on Economic Activity, 2:429–480, 1973.
[92] T. J. Sargent. A Classical Macroeconometric Model for the United States. Journal of Political Economy, 84(2):207–237, 1976.
[93] R. Sedgewick. Algorithms. Addison-Wesley, Reading, MA, 2nd edition, 1983.
[94] D. Steward. Partitioning and Tearing Systems of Equations. SIAM J. Numer. Anal., 7:856–869, 1965.
[95] J. C. Strikwerda and S. C. Stodder. Convergence Results for GMRES(m). Department of Computer Sciences, University of Wisconsin, August 1995.
[96] J. B. Taylor. Estimation and Control of a Macroeconometric Model with Rational Expectations. Econometrica, 47(5):1267–1286, 1979.


[97] Thinking Machines Corporation, Cambridge, MA. CMSSL release notes for the CM-200. Version 3.00, 1992.
[98] A. A. Van der Giessen. Solving Nonlinear Systems by Computer; A New Method. Statistica Neerlandica, 24(1), 1970.
[99] H. van der Vorst. BiCGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems. SIAM J. Sci. Stat. Comput., 13:631–644, 1992.
[100] K. F. Wallis. Multiple Time Series Analysis and the Final Form of Econometric Models. Econometrica, 45(6):1481–1497, 1977.
[101] K. F. Wallis. Econometric Implications of the Rational Expectations Hypothesis. Econometrica, 48(1):49–73, 1980.
[102] M. R. Wickens. The Estimation of Econometric Models with Rational Expectations. Review of Economic Studies, 49:55–67, 1982.
[103] A. Yeyios. On the Optimisation of an Extrapolation Method. Linear Algebra and Applications, 57:191–203, 1983.