loureiro et al,2010pvar effet sptial .pdf

8/14/2019 Loureiro et al,2010pvar effet sptial .pdf

1/41

THE DYNAMICS OF LAND-USE IN BRAZILIANAMAZON

Paulo R. A. LoureiroAdolfo Sachsida

Mrio Jorge Cardoso de Mendona

Universidade de Braslia

Departamento de Economia

Srie Textos para Discusso

Department of Economics Working Paper 342University of Brasilia, November 2010

Texto No 342Braslia, Novembro de 2010


2/41

T HE DYNAMICS OF L AND -USE IN BRAZILIAN AMAZON

Mrio Jorge [email protected]

IPEA - Instituto de Pesquisa Econmica Aplicada (Brazil)

Paulo R. A. Loureiro

[email protected]

University of Brasilia (UnB)

Adolfo Sachsida

[email protected]

IPEA - Instituto de Pesquisa Econmica Aplicada (Brazil)

Abstract : This paper studies the dynamics of land-use in the Brazilian Amazon using a

structural vector autoregressive (SVAR) model. The heterogeneity in the data is controlled

by mean of fixed effect panel specification. Meanwhile, spatial autocorrelation is also

diagnosed by a statistical methodology that allows to us to split the model in subsamples(clusters) of more homogenous municipalities in order to re-estimate the model on separate

clusters. The clustering analysis shows that there are three clusters whose land-use patterns

are strongly different in an economical point of view. The first cluster identifies the pioneer

fronts; dedicated to logging, natural resources exploitation and slash-and-burn cultures, the

second cluster have grown a more diversified agriculture while the third cluster presents

most developed, intensive agriculture oriented municipalities.

Another distinctive feature of this article pertains to the assessment of the

contemporaneous causal order that exists among distinct land-uses. This permits to evaluate

the succession dynamics that derive from unexpected innovations in the process of soil

occupation by means of impulse response functions (IRFs). The IRFs applied for cluster 1

lead to the following results: (1) the new demand for cropping requires to clear new areas

of forest. This extra cleared land will be transformed in pasture land or fallow in the long
mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]


3/41

term. This process can be considered a necessary outcome of the slash-and-burn

agriculture, a common practice in the Amazon; (2) Contrary to other studies we do not find

evidence that cattle ranching is the primary driver of deforestation; (3) the impact of a

shock of pasture land on itself is virtually null at the beginning but it augments substantially

over time not requiring to clear extra areas of forest land but rather competing with crop

land, and (4) it seems that if not for all the Amazon Basin, at least in this cluster, cattle

ranching and cropping could be competitive activities.

Key words : Brazilian Amazon, Vector autoregressive, Panel data, Spatial autocorrelation,

Impulse response functions.

JEL : Q24; Q34; Q56; C33; C38.

1. INTRODUCTION

Over the past two decades the international community has become aware of the global

and regional environmental risks associated to possible massive forest losses in the

Brazilian Amazon. The impacts on global carbon cycle, regional climate and the loss of

biodiversity are among the main consequences of extensive land-use change processes in

this region. Therefore a good understanding of land-use dynamics in the Brazilian Amazon

is a fundamental step before action whether controlling the problem or even change its

course towards a more sustainable path.

Several studies have already contributed to a better understanding of the underlying

processes that drive land-use change in the Amazon. One of the main difficulties of land-

use studies lies in the fact that relevant economic and natural processes are fundamentally

defined (and face constraints) at a fine geographic scale. This calls for an effort to bridgemethodological gaps within a triangle of three approaches: classical economic approach

(e.g. Angelsen, A., 1996), agent-based models (e.g. Soares-Filho & alli, 2006) and

econometric modelling [(Reis and Blanco, 1997); (Pfaff, 1999), (Reis and Guzmn, 1994),

inter allia]. Each of these approaches has advantages and drawbacks relative to their

respective research objectives. Here, we focus on the econometric approach. This


4/41

methodology, applied to land-use, has been criticized for its lack of economic consistency

(in comparison for example to sectoral or general equilibrium models), but it has

definitively the advantage to unveil the most out of existing data when the primary research

issue is to describe land-use dynamics in the whole Amazonian perimeter, including non-

optimal succession of uses from a strict economic standpoint.

Econometric and statistical methods have been used for example with the objective to

determine the main dynamical features between succeeding anthropogenic land-use

sequence, like, e.g., cropland, pastures, and regeneration, which constitute a typical pattern

of successive land-uses following land clearing. Indeed, in order to seek for deforestation

drivers, it is important to know up to what extent, in past deforestation trends, cropland or

pastures uses are more or less keen to immediately follow deforestation. Such trends mayalso differ in different frontier patterns, for example the eastern versus southwestern

frontiers of the Amazonian Basin.

In one of the first attempts to determine a succession profile for the whole Amazon,

Andersen et al. (2002, 1997) estimated a reduced form vector autoregressive (VAR) model

involving the municipalities of Brazilian Amazon over the period 1970 through 1985.

Concerning the use of this model to formulate environmental policies some remarks need

be stated. A first difficulty lies in the fact that spatial heterogeneity among cross-section

units (municipalities) has to be taken into account in order to contemplate the geographic,

environmental and economic diversity among those different units and the effect of their

mutual interactions. A second difficulty is related to the fact that land-use census data in the

Brazilian Amazon are only available in a few 5-years time steps.

The environment in the Amazon is subject to very strong dynamic forces that are

capable to change the patterns of land-use even in a moderately short period of five years,

that is, the period in which the data are collected. Combined with the fact the Andersen et

al. model is a reduced form VAR and not a structural model, it implies that the

contemporaneous (i.e. at each studied time step) relationships among distinct land-uses

occuring during a given period are not clearly treated. In this case, the model fails to


5/41

adequately estimate the succession dynamics that derives from contemporaneous causal

orders of land-use change. In other words, such an approach is not adequate to uncover

stylized facts on the short-run impacts of the identified exogenous sources in the land-use

process. For instance, an exogenous demand shock on agricultural product or meat prices

generates both a current and subsequent effect on soil occupation. In the same way a

technological shock or demographic shock can induce a new unexpected evolution of land-

use. However, the proper understanding of this kind of diffusion processes in Amazonian

land-occupation can be improved by using a model that captures the structural relations

among different land-uses .

In this paper we adopt the structural VAR (SVAR) model with panel data to assess

impacts of the identified exogenous sources. We expand and refine previous works totackle these metodological issues. First, we take into account the heterogeneity among land

units using the panel data 1 model [(Hsiao, 1995), (Baltagi, 1995), (Arellano, 2003), etc.],

mixing information concerning variation among individual units with variations taking

place over time. Second, in the structural VAR literature, the contemporaneous relationship

is identified on the basis of prior information supported by a-theoretical considerations.

Notwithstanding, another distinctive feature of this article is the employment of Directed

Acyclic Graphs (DAGs) to assess the contemporaneous causal order of the SVAR. Using

the selected orderings to identify the SVAR models we obtained their impulse response

functions (IRFs). The IR functions allow to model the dynamics of distinct land-uses

derived from an unexpected shock on the occupation of the Amazon Basin. Finally,

because the object of this study is clearly the spatial process that results from the complex

interplay of many phenomena occurring in a much extended spatial domain, it is obviously

a spatial phenomenon and we may expect spatial autocorrelation to be present in the data.

The methodology developed here aims to diagnose the extend of the problem so that in a

first step, the model may be re-estimated on subsamples of more homogenous groups of

municipalities. Moreover, this methodology allows us to propose a comprehensive

interpretation of the meaning of the clusters in an economic point of view.

1. Important sources of variation may be left out if the data is only pooled in a single (temporal or spatial)dimension, and more precise parameter estimates can be obtained in panel approaches that explore thevariability present in the data both across counties and within counties over time.


6/41

The article is organized as follows. The section 2 presents the methodology used to

analyse the dynamics of land-use introducing the distinct subjects involved in our analysis

such as structural and reduced form VAR, panel data method, spatial autocorrelation and

identification by DAGs. The database is described in section 3. The econometric results are

presented in sections 4 and 5. In section 4 we present many partial but important results

derived from estimation, spatial analysis and the identification procedure undertaken by

DAGs methodology. In section 5 we present the clustering and the IRFs analyses. Some

concluding remarks are stated in section 6.

2. M ETHODOLOGY

2.1. Overview

The need for a specific methodology allowing to deal with the contemporaneous

relationship existing among distinct land-uses was soon perceived by many authors.

Andersen et al. (1997, 2002) and Weinbold (1999) estimated a model of land-use using a

vector autoregressive (VAR) specification on the municipalities of Brazilian Amazon. This

model is given by the system of equations (1):

it it it it it it

it it it it it it

it it it it it it

fallow pasturecrop Dclear fallow

fallow pasturecrop Dclear pasture

fallow pasturecrop Dclear crop

31413121

21413121

11413121

(1) 2

The indexes i and t are associated, respectively, to each municipality and each time

period. Our point here is to reformulate this model in order to contemplate some interesting

issues about land-use in the Brazilian Amazon. The first issue is how to identify the

contemporaneous causal order underlying the land-use process. Any model that does not

2 Here it it it it fallow pasturecropclear and 1 it it it clear clear Dclear .


7/41

take this point into account is inappropriate in a policy perspective concerning land-use in

the Amazon. The second issue is that equation system (1) does not deal explicitly with the

heterogeneity in climate, soil quality, etc among the municipalities of the Brazilian

Amazon. As a third issue, we question the possibility to analyze adequately deforestation in

this dynamic framework. Finally, considering that the land-use change is a spatial

phenomenon the treatment of spatial autocorrelation in the data is an open issue.

In this paper, these questions are tackled in the following way. The two first problems

are treated simultaneously using an econometric model that contemplates spatial

heterogeneity among counties and contemporaneous causal order in the same framework.

This is performed in a structural vector autoregressive (SVAR) with panel data

specification. In order to model deforestation we include to system (1) an extra endogenousvariable associated explicitly to deforestation and hereafter designated by forest 3. Finally,

considering the difficulty to incorporate directly spatial autocorrelation in a SVAR model

with a panel structure, we chose a more progressive route where in a first step we carefully

study the spatial correlation in the residuals after a first estimation step. Indeed, the

literature either implements SVAR panel models (Favero, 2002) or spatial SVAR models

(Di Giacinto, 2003). The direct estimation of the model with a spatial component would

require further theoretical developments. Thus we rely on spatial autocorrelation diagnosis

tests and propose to split the sample into homogeneous sub-samples for re-estimation with

the idea to lower spatial autocorrelation.

2.2. VAR analysis: Reduced x Structural form and Identification

Considering that natural forest can be seen as another category of land-use, the

endogenous variables of our model can be represented by the following vector

),,,(),,,( 4321 it it it it it it it it forest fallow pasturecrop y y y y . Under this notation we propose to

use the following model to formalize the dynamics of land-use in Brazilian Amazon:

3 This variable is associated to natural land and considers both natural forest and natural pasture. More comments about the

variables used in this research appear in Section 4.


8/41


9/41

with ),0(~ N u where u is the reduced form of disturbance covariance matrix and it

is also assumed that ),0(~ , with a diagonal matrix. The relationship between

structural form and reduced form is based on the following identities, providing 0 A is

invertible, c Ab 10 , 11

01 A A B , t

10t Au and:

'', )())(( 101

01

0t t 1

0 A A A E A

(5).

Note that this representation does not allow to identify the effects of exogenous

independent shocks onto the variables, since reduced form residuals are contemporaneously

correlated (the matrix is not diagonal) 6. That is, the reduced form residuals t u can be

interpreted as the result of linear combinations of exogenous shocks that are notcontemporaneously (in the same instant of time) correlated. In evaluations of the model

(and economic policies) it only makes sense to measure exogenous independent shocks.

Therefore, it is necessary to present the model in another form where the residuals are not

contemporaneously correlated.

Thus, without additional restrictions on 0 A we cannot recover the structural form from

the reduced form because does not have enough estimated coefficients to identify an

unrestricted 0 A matrix. Therefore, we need to set up restrictions in order to identify and

estimate 0 A7. This procedure is named identification. It is possible to estimate the reduced

form parameters b , 1 B and consistently, but except for forecasting t Y given 1t Y , these

matrices are not the parameters of interest.

6 These shocks are primitive and exogenous forces, with no common causes, that affect the variables of the model.

7 The matrix 0 A cannot have, together, a number of free parameters bigger than the number of free parameters in the symmetricmatrix . If n is the number of endogenous variables of the model then, to satisfy the order condition for identification of 0 A ,it is necessary that the number of free parameters to be estimated in 0 A be no bigger than n(n-1)/2. When n is smaller thann(n-1)/2 the model is over-identified. There exists no simple general condition for local identification of the parameters of A.However, as has been shown by Rothenberg (1971), a necessary and sufficient condition for local identification of any regular

point in R n is that the determinant of the information matrix be different from zero. In practice, evaluations of the determinantof the information matrix at some points, randomly chosen in the parameter space, is enough to establish the identification of amodel.


10/41

Spirtes, Glymour, and Scheines (1993, 2000) [hereafter SGS] and Pearl and Verma

(1991) claimed that it is possible to make causal inferences based on associations observed

in non-experimental data without previous knowledge. These restrictions follow from

directed acyclic graphs (DAGs) estimated by the TETRAD software developed by SGS

using as input the covariance matrix of the reduced form disturbances. Moreover, if the

causal relations can be represented by DAGs, SGS have shown that under some weak

conditions the Markov Condition and distribution of random variables faithful to the

causal graph there exist methods for identification of causal relations that are

asymptotically (in sample size) correct 8 . The results of SGS are discussed in several

articles 9 In other words, this methodology enables us to read from the data the

contemporaneous relationships existing among distinct land-uses in the Amazon.

2.3. Spatial Autocorrelation

In order to assess spatial autocorrelation directly in system (3) some new elements

must be introduced in our analysis. A standard choice is to include in the mean process a

spatial autoregressive component that takes the spatial locations into account. This can be

done by the use of a contiguity or neighborhood matrix )( ijwW with ijw representing the

neighborhood between the sites i and j, such that 0ijw if the sites i and j are neighbors,

and 0ijw otherwise. This spatial weight matrix is a square matrix representing the spatial

context. It encodes the neighborhood relationship among the spatial units. The literature on

this subject is vast, we shall only retain here that the neighborhood relationships chosen by

the analyst may change the results 10. In this sense equation (3) may be rewritten in the

following way:

t t t t Y AY W cY A

1110 (6)

t t t W 2

8 Demiralp and Hoover (2003) evaluate the PC algorithm employed by TETRAD in a Monte Carlo study and conclude that it isan effective tool for the selection the contemporaneous causal order of SVARs.

9 Swanson and Granger (1997) were the first to apply graphical models to identify contemporaneous causal order of a SVAR,although they restrict the admissible structures to causal chains. Bessler and Lee (2002) use error correction and DAGs tostudy both lagged and contemporaneous relations in late 19 th and early 20 th century U.S. data.

10 See Anselin at al. (1988) for a detailed discussion.


11/41

where )( ij and )( ij are matrices and ),0(~ I t .

In this study we assume that0

which means that we focus our attention in random

spatial effect or spatial correlation in errors. The development of a spatial SVAR panel data

model in (6) is left for further research. We implement here the spatial analysis as a non

stationarity and we propose to split the sample to estimate separate models. In doing so we

expect to capture the land-use dynamics in homogeneous areas. The spatial autocorrelation

analysis is conducted on the mean residuals of the model, that is, the arithmetic mean of the

reduced form VAR model residuals 11 , because we are mainly interested in the splitting of

the sample over the whole period.

The methodology is implemented in four steps. First, spatial autocorrelation in the

mean residuals of the model is tested by the Moran scatterplot (Anselin, 1995). We check

two distinct contiguity structures (see below) to assess the stability of the results. That is, if

the results are coherent for those different neighborhood structures we assume enough

confidence to proceed. Second, the residuals are spatially smoothed meaning that current

values are replaced by the average of the neighbors conditional on the neighborhood

structure. Third, a principal component analysis (PCA) on the smoothed residuals is

conducted. This method is motivated in Benali and Escoffier (1990) who name it Local

Principal Component Analysis as a way to extract the systematic spatial features present in

the data by local smoothing and dimension reduction 12 . As a byproduct, this tends to

produce spatially connected clusters which is a desirable property of clusters for spatial

data. Fourth, from the LPCA results we compute by clustering homogenous areas to split

the model for re-estimation.

Because one may have some difficulty to grasp all the different steps of the analysis we

present a summary of the methodology we used in this paper. It can be described in the

following way

11 Formal tests on the whole residuals are left to posterior work.

12 Note that in their original paper, these authors develop the dual of this analysis which consists in substracting from the originaldata the average of the neighbors. This method is not discussed here.


12/41

a) In the first step we estimate the reduced form VAR for unrestricted sample;

b) With the residuals of VAR we analyse spatial autocorrelation throughout the

method described in section 2.3. Based on this analysis the sub-samples of

homogeneous areas (clusters) are selected;

c) Next we re-estimate the VAR on selected sub-samples. In this step the structural

VAR is recovered using the DAGs method to identify the contemporaneous causal

orders; and

d) Finally, the impulse response functions are computed.

3. Database

The main original source of the data available for this study comes from Brazilian National Agriculture Census elaborated by the Brazilian Institute for Geography and

Statistics (IBGE) which is usually conducted every five years. Others original data sources

used are the Industrial and Commercial Census that were also elaborated by the IBGE for

the same periods. The data were collected for the following for years 1970, 1975, 1980,

1985 and 1995 at the municipality level. The data were cleaned, harmonized and merged

with data of other sources by the Institute for Applied Economic Research (IPEA) managed

by the team of IPEADATA 13 . The original database includes data on economic,

demographic, ecological and agriculture variables.

In the Brazilian Amazon Basin a county can be subjected to ongoing change in its size

mainly during the expansion of the agricultural frontier in Amazon. This fact obstructs the

comparison between periods at county level. That is why the concept of Minimum

Comparable Area (MCA) 14 was introduced, which is the smallest stable spatial unity

during these five censuses that accommodates the changing county boundaries over the

panel. The aggregation of counties in the later census years, in order to match the county

area in 1970, is greatest in the more recently populated and sub-divided regions found in

the legal Amazon.

13 http://www.ipeadata.gov.br

14 For further details about the concept of MCA see Reis at al. (2007)


13/41

The agricultural censuses group all land into private land and public land. Private land

is stratified into eight categories according to agricultural use. These are (i) annual crops,

(ii) perennial crops, (iii) planted forest, (iv) planted pasture, (v) short fallow and (vi) long

fallow are classified in cleared land, while (vii) natural forest and natural pasture are

considered non-cleared land. A small category of private non-usable land (rivers,

mountains, etc.) is also considered non-cleared land. Finally, all land that is not claimed by

anyone is considered public land and by definition non-cleared.

Based on these definitions the dependent variables used in our land-use model fall into

one of the following four categories: cropland ( crop ), pasture ( pasture ), fallow ( fallow ) and

natural land ( forest ). Cropland covers annual crops, perennial crops and planted forest.Pasture is planted pasture only. Fallow land includes short fallow, long fallow and non-

usable land like roads, dams, etc. Natural land considers natural forest and natural pasture.

4. Econometric Results

4. 1. Regression Analysis

We propose in this section a consistent method to estimate the reduced form VAR

with panel data (4). Before derivation of the estimator, we must introduce notations to

accommodate the data sample. The observations related to each equation j can be

represented in the following way. For J j ,...,1 , '1111 ),...,,...,,...,( jNT jN T j j j y y y y y where

j y is a 1 NT vector of dependent variables associated to equation j . Let

'1 ),...,( jN j j the 1 N vector of individual effects and 1 NT the vector of noise

given by '1111 ),...,,...,,...,( jNT jN T j j j . We assume that individual effects are fixed. In

this sense the individual effect can be included into the set of parameters. Thus the error

covariance matrix is defined so that NT j jj j j I E 2' )( .


14/41


15/41

The residuals of the model estimated in the preceding section are analyzed as follows.

First, the mean residuals over the all periods are computed. Second, these mean residuals

are further centered and standardized. In short, the residuals are transformed into:

ie

ii s

i

eee

(8)

with s standing for standardized. This transformation is performed on each of the four

equations of the SVAR, that is residuals of the crop , pasture , fallow and forest equations, in

that order. In other words, the present analysis do not use the NJ-K observations of the

estimated model but the mean of the residual vector on the N spatial units: the 257 AMCs

of the Amazon Basin.

4.2.2 Spatial weights

The spatial weight matrix is the square N by N matrix representing the spatial context.

In this context we intend to test several neighborhood schemes to check the stability of our

computations. Note that in the literature it is generally admitted that the analyst may test

several neighborhood matrices and retain the most stable or meaningful results. This is why

we use two common neighborhood structures to compute our tests.

The most common way to introduce spatial weights is the contiguity matrix, which is

defined by the presence of a common frontier between two spatial units. We have then w ij =

1 if i and j share a common border, zero if not. The second form of neighborhood retained

here is the k-nearest neighbors matrix, (with k = 4 and k = 8). One easily sees that

contiguity is a symmetric relation, but not nearest neighbors. Consequently, the contiguity

matrix is symmetric but not the k-nearest neighbors matrix. The row sums are the degrees

of the incidence matrix of the graph associated to the neighborhood relation; in the

contiguity case, the degree is variable but in the k-nearest neighbors case it is constant and

equal to k. Symmetry is a desirable property in spatial analysis but generally it is lost via

normalization. Effectively, it is often more practical to standardize the neighborhood

matrices, and this is done by dividing each row by the degree, that is, the number of

neighbors.


16/41

4.2.3. Moran test

The Moran spatial autocorrelation coefficient is defined as the ratio of the local

autocovariances to the total variance of the series:

2 x x

x x x xw

S n

I i

jiij (9)

In other words it is the ratio of the covariance between neighbors to the variance of the

whole series (complete graph): it is the fraction of total variance accounted by spatial

neighborhood relationship. For the neighbors the normalization constant is S = ijw i,j, i. e.,

the total number of links in the neighborhood matrix. The computation and visualization of

the statistic can be greatly enhanced by the Moran scatterplot where the Moran I statistic is

computed on the standardized residuals. From equation (8), let us call S ii e z ; given a row

standardized neighborhood matrix W, then each row sums to one, and as there are exactly

N rows, S=n, thus the first term of equation (8) is equal to one, to give in matrix form:

zz

zWz I t

t

(10)

Then Moran I statistic can be easily expressed as a regression coefficient, as one can

verify that if we pose: Wz y , and noting that the denominator of (10) may be written as

1 zz t , we obtain:

zy zz I t t 1 (11)

This offers an easy way to compute Moran I, which is simply the slope coefficient of

the regression equation (11) on the variable Wz, that is, one simply regresses y=Wz on z

without a constant. The dependent is the spatially smoothed variable Wz and the


17/41

independent is the original (standardized) variable. The standardization allows a clear

interpretation of the regression graphs (see below). This phenomenon is easily understood

in this context as Anselin (2003) puts it: he quotes deforestation studies as an exemplary

case for spatial autocorrelation left in the residuals of an econometric model because of non

observed variables.

4.2.4. Spatial autocorrelation analysis

The spatial correlation coefficients are summed up in Table 1 by contiguity structure.

The computations were performed as explained in section 2.5, equation 3 and 4. Thus,

neighborhood matrices are row standardized so that direct estimation by regression is

possible. Computations where performed in GEODA (Anselin & al, 2005) and GRETL for

comparison purposes and also because of a minor technical problem 17.

Table 1: Moran coefficients by contiguity structure

Neighborhood

matrixVariable

GRETL GEODA

Moran

I

Bootstrap p-value

for t

Moran

I

Bootstrap p-value

for t

First order

CROP 0.066 0.023** 0.066 0.007***

PASTURE 0.303 0.000*** 0.303 0.002***

FALLOW 0.306 0.000*** 0.307 0.001***

FOREST 0.266 0.000*** 0.266 0.003***

4NN

CROP 0.035 0.153 0.035 0.020**

PASTURE 0.124 0.000*** 0.124 0.004***

FALLOW 0.239 0.000*** 0.239 0.001***

FOREST 0.123 0.000*** 0.123 0.006***

8NN

CROP 0.026 0.113 0.026 0.030**

PASTURE 0.102 0.000*** 0.102 0.001***

FALLOW 0.239 0.000*** 0.239 0.001***

17 In the first version of this paper we could not use GEODA to compute the Moran Is for the contiguity matrix becauseGEODA could not built properly the matrix because of defects in the shapefile of the Amazon Basin. These has now beencorrected and the full results presented here.


18/41

FOREST 0.063 0.003*** 0.063 0.014**

** significant at 5 % ; *** significant at 1 % or less. All tests

performed with 999 permutations.

Spatial autocorrelation is present and significant for most variables. For example,considering the contiguity neighborhood, it is lower for the crop equation (~ 0.07); but

quite higher for the three other variables: forest (~ 0.27), pasture (~ 0.30) and fallow (~

0.31) which is the highest of all four spatial autocorrelation coefficients computed for both

software application and neighborhood structure. The magnitudes of these autocorrelation

coefficients may seem modest, but because they relate to residuals estimates, they are in

fact rather high. We can expect these to decrease for both neighborhood structures (higher

order of contiguity and number of near neighbors).

There are slight differences between neighborhood structures, but the conclusions are

convergent between the two software implementations except for the crop residuals which

are not significant under the 4 and 8 nearest neighbors in GRETL. These differences are

due to the different implementations of the permutation tests between the softwares.

According to GEODA, all neighborhood structures show significant spatial autocorrelation

of the mean residuals for all equations but not in the case of GRETL. However, the orders

of magnitude of the computed coefficients are identical between software implementationsat least up to the second decimal place and often to higher places.

There is thus a relatively good stability of the estimates among neighborhood structures

and softwares, because the magnitude of the computed coefficients leads to a similar

ordering of spatial autocorrelation present in the residuals. Both for contiguity, and nearest

neighbors, the crop is the lowest followed by forest , then pasture and fallow which is

always highest.

The magnitude of the autocorrelation coefficients for the nearest neighbors matrices

decreases slowly for all variables except for fallow . This may be explained by the

increasing smoothing effect induced by the growing number of neighbors which lower the

difference to the average at each location and thus the Moran I. However, this effect does

not seem to affect the fallow equation mean residuals. For this variable, it appears that


19/41

spatial autocorrelation may be both stronger than for the other variables (at a given distance

or contiguity order) but also that it may be felt up farther away 18.

To sum up, there is significant autocorrelation for all residuals, with fairly robust

conclusions. The fallow equation residuals show the highest and strongest spatial

autocorrelation among the four series, this may indicate that the SVAR specification may

lack explanatory power for this variable, but seem better for the others variables.

4.2.5. Clustering Analysis

This step allows splitting the panel of MCAs in the Amazon Basin as a way to reduce

spatial autocorrelation that may be present in the residuals. The same spatially smoothed

data are used to perform the PCA on the 4 vectors of spatially smoothed residuals. The

smoothed mean residuals are defined by:

si

jii

si eW k

e ~

* 1 (11)

which is technically the same as variable y=Wz above. That is, the stared (spatially

smoothed) residuals are the average of the neighboring residuals. The computations are

done on the non symmetric k-nearest neighbors, with k=4,8 and for the order one contiguity

matrix. The results for k=8 do not differ from those relative to k=4, so they are not

presented here at this stage. The reader shall note that the splitted samples are exactly the

same for both these orders of neighborhood.

From the 4 vectors, the first two principal components explain 85 % of the variability in

the data. Now on we apply the k-means algorithm on the two first principal components.

Three clusters are found to be optimal. The clusters are depicted on map 1 below. Table 3

in Appendix 3 gives the bootstrapped Moran I computations for cluster I and II of this map.

Here cluster I is composed of 156 municipalities and cluster II 75 municipalities. The 26

18 We note this fact for further investigation.


20/41


21/41


22/41

use in Brazilian Amazon. This why we restrict the use of prior theoretical restrictions in

order to identify the matrix of contemporaneous relationship 0 A .

4. 3. Identification

4.3.1. The Identification Problem

Structural inference and policy analysis employing VAR model require

differentiating between correlation and causation, an issue known as the ide ntification

problem. The practice in the literature has been the use of identifying assumptions based

on economic theory or institutional knowledge to sort out the contemporaneous links

among the variables in order to allow correlations to be interpreted causally. Given an

exogenous impact on a certain category of land, it dictates the subsequent constraints by

which the mechanism of land-use takes place during the period of five years after this

exogenous shock. In short, it detects how the other categories of land-use react given a

primary impulse on a certain type of land-use. In this paper it can be associated to the

drivers or prime forcers of the soil occupation in this region.

For example, credit and fiscal subsidies to the agriculture jointly with the expansion

of the road network pushed the agricultural frontier in the northwestern part of the Amazon

Basin while the colonization programs aided to fix people in the interior. Strong incentives

were created to clear land for pasture. In fact, the growth of cattle herds has consistently

been cited as one of the primary factors behind the land clearing in Amazon. Regarding the

problem of the identification this analysis implies that the subsequent effect of an

enlargement of the area of pasture has a contemporaneous effect on the other land-uses,

most likely on forest land where new areas are required to be opened. Notwithstanding we

need to check if it really happen.

4.3.2. The Identification using DAGs


23/41

Based on the results of the spatial autocorrelation analysis we re-estimate the reduced

form VAR using the data related only for the biggest cluster (156 MCAs) because we doubt

that the other clusters have enough information to warrant the computations of the results.

Applying TETRAD at the 0.5% significance level 19 on the estimated disturbance

covariance matrix

and assuming that the variables selected for the model are causally

sufficient 20 , we obtain what is known as a pattern 21 . The pattern is a graphical

representation of the set of observationally equivalent DAGs containing the

contemporaneous causal ordering of the variables.

The DGAs detected four valid representations of the contemporaneous causal ordering.

In accordance with this pattern three of the contemporaneous causal ordering display

causality in one direction while one indicates causal ordering exists but it does not allow to

know which one is the true 22. In this case we produced the IRFs of both identifications

separately. Fortunately we did not note any relevant difference between them 23 . The

relations derived from DGAs enter in matrix 0 A as restrictions that will help to identify the

contemporaneous relationships 24 . The figure 1 shows the matrix 0 A identified using the

causal ordering obtained from DGAs 25 . In this matrix one observes seven identified

conditions represented by zeros. Because more the VAR has four variables the matrix 0 A

requires six conditions for identification. Hence this matrix is over-identified 26.

19 The significance level cannot be interpreted as the probability of type I error for the pattern output, but merely as a parameterof the search. Based on simulation tests with random DAGs, SGS suggests setting the significance level at 20% for samplesize smaller than 100; at 10% for sample size between 100 and 300; and at 0.5% (or smaller) for larger samples. Here wefollowed their suggestion. However, slight changes in the significance level can produce large variations in TETRADs output.

20 A set of variables V is said to be causally sufficient if every common cause of any two or more variables in V is in V .TETRAD has a bias towards excluding causal relations present in the data, to overcome this problem it is suggested that a20% significance level be used.

21 A pattern is a partially oriented DAG, where the directed edges represent arrows that are common to every member in theclass, while the undirected edges are directed one way in some DAGs and another way in others. Undirected edges ( ) meanthat there is causality in one of the two directions but not on both, while double oriented edges ( ) mean causality on bothdirections.

22 The DAGs display the following pattern : pastures crops, fallow crops, forest crops and pasture forest .23

The results can be obtained under request.

24 In appendix 1 we show how to apply the causal ordering obtained by DGAs to identify matrix 0 A .

25 We assume that pasture affects forest contemporaneous but the reverse is not true.

26 See footnote 9.


24/41

FIGURE 1

10001000010

1

42

141312

0

A forest fallow past

A A Acrops

forest fallow past crops

A

5. Analysis of Empirical Results

5.2. An economic interpretation of the partition in two clusters

Given a partition built by a clustering on the mean residuals of the unrestricted

VAR model, i.e. the estimates on the whole sample of 257 Amazon municipalities 27 we

would like to know the most salient features of the partitions. In this aim, we use the notion

of valeurs -test proposed in the French school of Exploratory Data Analysis. A valeur -

test is the e quivalent of a t-stat computed on the means of the clusters, provided the that

these variables are external to the computations (Lebart & al., 1995). It is then a test of

significance of the difference of means in the clusters to the mean on the whole sample; the

test is done in the context of sampling without replacement. In this work we consider avariable mean in a cluster to significantly differ from the population mean if the valeur-test

is greater than 2.0 in absolute value. This means that a variable for which a test -score is

greater than 2.0 conveys significant information on the feature pertaining to the given

variable in the cluster. The usual and natural interpretation of these statistics carries over: a

negative score indicates a significantly inferior mean in the cluster and conversely for a

positive value.

We computed different statistics for each separate year of census and also from all

the years. This gives an interesting perspective about the evolution of the clusters

27 The data form a panel consisting in the 257 municipalities of Legal Amazon for the space domain and 5 dates in the timedomain. The dates are separated by about five years which where years of censuses. Let us stress that the partitions where builtfrom the mean residuals of the SVAR model estimated on all the panel, thus the partitions are generated on 257 meanresiduals.


25/41

characteristics through time: the partitions are fixed (fixed zoning) but test-scores may vary

at different dates. This would allow to sort out effects that may be present in the long run

(30 years) from effects that are felt differently in time or more from more recent times.

The t-scores on the illustrative attributes are used to describe the most interesting

features of the clusters in each of the two partitions. To ease the exposition, the different

attributes have been regrouped under the different topics: Demography, Agricultural

Economy, General Economy, Education, Infrastructure, Natural Environment and Land-

Use. The complete table is reported in Appendix 3. The reader is warned that not all

variables may be present for all the years, we report and comment statistics for separate

years on the available variables; note that some attributes are present on all years. For each

partition a t-score was computed on all the attributes and every cluster in the partition. Thenumber of municipalities of each cluster is indicated as a number in parentheses under the

cluster number. The t-scores have been computed on a set of 39 variables in the panel

database, on all the years. In those computations, means and standard errors were computed

using total population as a weighting variable 28.

It is interesting to present a brief description of the clusters by topic and then compare

the two partitions and comment those results. In accordance with the table of t-scores in

Appendix 3 the demographic topic t-scores show that population and consequently

dwellings are significantly higher than the average in group 1, lower and group 2, and not

significantly different from the average in group 3. Thus, the biggest cluster of 156

municipalities has a higher population, while the second is significantly less populated. We

shall see below that this is also true for the size of municipalities. For this we would have to

compute t-scores for the shares of dwellings with sanitation and electric lighting per group.

The agricultural economy profile of t-scores shows that, in cluster one, even as we haveseen average population is higher than cluster two, agricultural investment is the lowest,

while it is still lower than the average in cluster two and far higher in cluster three. The

same is true for bovine livestock, total irrigated land area and number of tractors. On the

28 More detailed results by year are available from the authors.


26/41

contrary, labor force in the agricultural sector is higher than the average in cluster three,

lower in cluster two and insignificant in cluster three. The value of wood production differs

only in cluster two, and the average rental value of land is higher in the municipalities of

cluster one, lower in those of cluster two, and not different from the average in cluster

three. However, the volume of commercialized wood is the most significantly different

from the average in cluster one, with a t-score of +27.31, lowest in cluster two (-8.75), and

significantly under the average in cluster 3. However, the average wood production

potential (cubic meters per hectare) is essentially the same in clusters one and two and it is

far above the average. Thus, it is quite clear that municipalities of cluster one are largely

devoted to logging production, having a higher population and total employment in the

agricultural sector, while those of cluster two have a more diversified agriculture that is less

reliant on logging: these have even a larger than average surface of irrigated land while being less populated on the average but still retaining a similar logging potential. Finally,

the average purchase price of live beef is significantly higher than average in cluster one,

and lower in cluster two.

The general economy profile of the clusters shows significant differences: higher share

of wages in group one, lower in group 2; significantly lower average wages in group one,

average in group three and far higher in group 3; lower industrial wages in group two but

not significantly different from the Amazon mean in groups 1 and 3; Interestingly, a very

significantly lower share of government/Banco do Brasil financing in cluster 1 but both

higher in clusters 2 and 3 and the lowest SUDAM financing for cluster one, lower than

average in group 2 and of course, significantly higher in group 3. These differences in

economic profile illustrate differences of economic structures.

Consider now the education and infrastructure profiles. Surprisingly, the share of

persons with five years of schooling is far above the Amazon average in group 1, and

significantly lower in group 2; the same is observed with the average number of years of

education in the population. On the paved roads length, cluster one shows no difference to

the average, but significantly less unpaved roads, and both lower for cluster two. There are

slight differences in infrastructure between clusters one and two, and cluster three has

significantly higher level of infrastructure. The contrast is the greatest between cluster one


27/41


28/41

Group 2: the margins; older pioneer fronts that have grown a more diversified

agriculture;

Group 3: the oldest, most developed, intensive agriculture oriented municipalities.

5.2. Analysis of IRFs for the biggest cluster (156 MCAs)

In this section we use the selected causal ordering obtained by DAGs to identify the

structural model on the biggest cluster of 156 MCA in order to build up the impulse

response functions. This procedure allows assessing the dynamic effect on different types

of land-use that derived from an unexpected innovation in a specific land-use. It can be

used to evaluate the impacts on areas of pasture, fallow and forest due to an unexpected

event on crop land. The dynamic effect includes not only the future impacts but also the

contemporaneous responses.

An important point refers to the fact that economic analysis can be indirectly performed

by means of the proper interpretation of the innovation. For instance, an unexpected

expansion of credit and subsidies implemented by a government policy aimed at promoting

agricultural activities can be seen as an innovation on the variable crop. Unexpected

movements of price can be also incorporated in this analysis like, for example, a rise in the

price of t imber may cause a growth of deforestation. In the same sense, an elevation of the

price of meat derived from a demand shock can be linked in the current context like a shock

on pasture land.

The IRFs are showed in figure 2. This figure must be viewed like a matrix in which

each graph represents the effect of a shock of the land-use of column j on the land-use ofrow i. We start the analysis of IR functions checking the consequences of a shock on crops.

In this case there exists an immediate effect on forest land. One can still see that there is lag

effect on fallow and pasture about six periods after. The new demand for cropping requires

to open new areas of forest and due to the poor quality of Amazons soil, crop land will be

transformed in the future in pasture land or left as fallow. This process is in accordance to


29/41

Smith and alli (1997) and can be considered a necessary outcome for the slash-and-burn

agriculture, a common practice in the Amazon 30

Figure 2

Impulse Response Functions

C r o p s

P a s t

P o u s

i o

F o r e s t

2 4 6 8 10 12

-2

-1

0

1

x 104Crops

2 4 6 8 10 12

-2

-1

0

1

x 104Past

2 4 6 8 10 12

-2

-1

0

1

x 104Pousio

2 4 6 8 10 12

-2

-1

0

1

x 104Forest

2 4 6 8 10 12

0

1

2

3

4

x 106

2 4 6 8 10 12

0

1

2

3

4

x 106

2 4 6 8 10 12

0

1

2

3

4

x 106

2 4 6 8 10 12

0

1

2

3

4

x 106

2 4 6 8 10 12

0

2

4

6x 10

4

2 4 6 8 10 12

0

2

4

6x 10

4

2 4 6 8 10 12

0

2

4

6x 10

4

2 4 6 8 10 12

0

2

4

6x 10

4

2 4 6 8 10 12

0

0.5

1

1.5

2

x 105

2 4 6 8 10 12

0

0.5

1

1.5

2

x 105

2 4 6 8 10 12

0

0.5

1

1.5

2

x 105

2 4 6 8 10 12

0

0.5

1

1.5

2

x 105

Contrary from the other studies, we do not find evidence that cattle ranching is the

primary driver of deforestation. The contemporaneous response of the variable forest to animpulse on pasture land is in fact positive but small compared to the response of this same

variable to a shock on crops land. As a matter of fact, it can be seen that the area of forest

30 The cycle of slash-and burn agriculture can be briefly described in the following way. In the first place, it begins withdeforestation targeting the implementation of agricultural activities involving many perennials crops. In the following stage,when the soil fertility declines, the site is used as pasture land. The cycle finishes when the soil is entirely exhausted and

pasture area is converted into fallow land.


30/41

grows in the future (figure 2, graph at row 4, column 2). As we expected the effect on the

area of fallow in response to the impulse on pasture land occurs fundamentally many

periods after the shock (graph at row 3, column 2). The figure also shows that the impact of

a shock on pasture land on itself is almost null initially but augments substantially over

time (graph at row 2, column 2) ; it do not require extra areas of forest land to clear but

rather competes with crop land (compare, graphs at row 1 and 2, column 2). In this sense

pasture affects crops not only in the current period but also in the longer term while this

long run impact appears considerably stronger than its contemporaneous counterpart. Our

results suggest evidence that, at least in this cluster, if not for all the Amazon Basin, that

cattle ranching and cropping could be competitive activities.

6. Concluding Remarks.

This paper studies the dynamics of land-use in the Brazilian Amazon using a

structural vector autoregressive (SVAR) model. The heterogeneity in the data is controlled

by the mean of fixed effect panel data. This method allows taking into account the specific

features like geographic, environmental, economic diversity present among the different

sites. Considering that the land-use is also determined by the interaction of the counties

among themselves, possible spatial autocorrelation in the data must be taken into account.

In this case spatial autocorrelation is diagnosed by a statistical methodology that allows to

us to split the model in sub-samples of more homogenous municipalities so as to re-

estimate the model on these. In doing so, cluster analysis shows that there are three clusters

whose land-use patterns are strongly different.

Furthermore, the illustrative attributes are used to describe the most interesting

features in each of the three clusters. To ease the exposition, the attributes have been

regrouped under the distinct topics such like demography, agriculture, education,

infrastructure, environment, etc. In this way it can be said that the cluster 1 identifies the

pioneer fronts; dedicated to logging, natural resources exploitation and slash-and-burn


31/41


32/41

Arellano, M. (2003) . Panel Data Econometrics. Oxford University Press.

Arellano, M. And Bond, S. (1991) . Some Tests of Specification for Panel Data: Monte

Carlo Evidence and Application to Employment and Equations. Review of Economic

Studies, 58: 277-297.

Angelsen, A. (1996). Deforestation: Population or Market Driven? Different

approaches in modelling agricultural expansion . CMI Working Papers 1996:9, Michelsen

Institute, Bergen, Norway.

Awokuse, T. and Bessler, D. (2003). Vector Autoregressions, Policy Analysis, and

Directed Acyclic Graphs: an Application to the U.S. Economy . Journal of Applied

Economics VI(1): 1-24.

Baltagi, B. H. (1995). Econometric Analysis of Panel Data. John Wiley and Sons.

Bessler, D. and Lee, S. (2002). Money and Prices: U.S. Data 1869-1914 (A Study with Directed Graphs) . Empirical Economics 27: 427-446.

Binswanger, H.P. (1991). Brazilian Policies that Encourage Deforestation in the

Amazon. World Development, 19(7): 821-829.

Browder, J. O. (1988). The Social Cost of Rainforest Destruction: a critique of the

hamburger debate . Intercience, 13: 115-120.

Case, A. C. (1991). Spatial Patterns in Household Demand . Econometrica, 59 (4): 953-

965.

Demiralp, S. and Hoover, K. (2003). Searching for the Causal Structure of a Vector

Autoregression. Oxford Bulletin of Economics and Statistics 65 (Supplement): 745-767.

Enders, W. (1995). Applied Econometric Time Series, John Wiley and Sons.

Faminow, M. D. (1998). Cattle, Deflorestation and Development in Latin American.

CAB International.

Fearnside, P. M. (2006). Desmatamento na Amaznia: dinmica, impactos e controle .

Acta Amaznica, 36(3): 395-400.

Hamilton, J. (1993). Time Series Analysis. Princeton University Press.

Hsiao, C. 1995. Analysis of Panel Data . Cambridge University Press.

Jacobs, R. Leamer, E. E. and Ward, M. P. (1979). Difficulties with Testing for

Causation. Economic Inquiry, 17: 401-413.


33/41

Lebart, L., Morineau, A., Tabart, N. (1995). Statistique Exploratoire

Multidimensionnelle . Dunod, Paris.

Mahar, D. (1989). Government Policies and Deforestation in Brazil's Amazon Region .

Washington, DC: World Bank

Myers, N. (1994). Tropical Deforestation Rates and Patterns . In: Brown, K and Pearce,

D. (eds.) The causes of tropical deforestation, the economic and statistical analysis of

factors giving rise to the loss of tropical forests, 172-91. University College London Press.

Nepstad, D. C., Moreira, A. G. e Alencar, A. A. 1999 A Floresta em Chamas: Origens,

Impactos e Preveno de Fogo na Amaznia. Braslia: Programa Piloto para a Proteo

das Florestas Tropicais do Brasil .

Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University

Press.Pearl, J. and Verma, T. (1991). A Theory of Inferred Causation. In J. Allen, Fikes, R.

and

Pfaff, A.S. 1999. What drives deforestation in the Brazilian Amazon? Evidence from

Satellite and Socioeconomic Data. Journal of Environmental Economics and Management,

37(1): 26-43.

Reis, E. J. and Blanco, F. A. (1997). The Causes of Brazilian Amazon Deforestation .

Working Paper, IPEA.

Reis, E and Guzmn, R. 1994. An Econometric Model of Amazon Deforestation . In:

Brown,K and Pearce, D. (eds.) The causes of tropical deforestation, the economic and

statistical analysis of factors giving rise to the loss of tropical forests, 172-91. University

College London Press, London.

Reis, E. J., Pimentel and Alvarenga, A. I. (2007). Areas Minimas Comparaveis entre os

Anos de 1920 a 2000. WP NEMESIS.

Robins, J., Sheines, R., Spirtes, P. and Wasserman, L. (2003). Uniform Consitency in

Causal Inference . Biometrika 90(3): 491-515.

Rothenberg, T.J. (1971) Identification in Parametric Models. Econometrica, VOL. 39,

NO. 3-May, 1971. pp. 577-592.


34/41

Smith, J., Winograd, M., Gallopin, G. and Pachico, D. (1998). Dynamics of the

Agricutural Frontier in the Amazon: analysing the impact of policy and tecnology.

Environmental Modelling and Assessment, 3:31-46.

Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search .

Lecture Notes in Statistics 81. Springer-Verlag.

48(4): 555-568. 2000). Causation, Prediction, and Search. MIT Press.

Soares-Filho, B. S.; Nepstad, D.; Curran, L.; Voll, E.; Cerqueira, G.; Garcia, R. A.;

Ramos, C.; McDonald, A.; Lefebvre, P.; Schilesinger, P. Modeling Conservation in the

Amazon Basin . Nature, 440: 520-523, 2006.

Scheneider, R. R. (1992). An Economic Analysis of the Environmental Problem in the

Amazon . Draft, World Bank.

Swanson, N. and Granger, C. (1997). Impulse Response Functions Based on a Causal Approach to Residual Orthogonalization in Vector Autoregressions . Journal of the

American Statistical Association 92(437): 357-367.

Toiolo, A. and Uhl, C. (1995). Economic and Ecological Perspectives on Agriculture in

Eastern Amazon . World Development, 23: 959-973.

Walker, R. and Homma, A. K. O. (1996). Land-use and land Cover Dynamics in

Brazilian Amazon: an overview. Ecological Economics, 18: 67-80.

Wassenaar, T., Gerber, P. Verburg, P.H., Rosales, M., Ibrahim M. and Steinfeld, H.

(2007). Projecting land-use changes in the Neotropics: The geography of pasture

expansion into forest., Global Environmental Change, 17(1): 86-104.

Weinhold, D. (1999). Estimating the Loss of Agricultural Productivity in the Amazon.

Ecological Economics, 31: 63-76.

Appendix 1. DAGS and the Identification of Structural Form VAR

The SGS procedures allow us to establish the conditional independence relations that

are equivalent to determining whose coefficients of matrix A 0 are equal to zero. The

framework developed by SGS does not rule out the possibility of finding alternative

sets of conditional independence relations for a given data set. In this case we arrive at a


35/41

set of matrices A 0 that are observationally equivalent. It may be the case that the found

conditional independence relations are not enough to allow for the identification of the

matrices. In this case, additional restrictions are needed in order to identify the model.

Next we show how DAGs can be used to impose restrictions that allows identification

of Structural VARS (SVARs). In the example below we assume that the VAR has 4

endogenous variables.

The relationship between reduced form and structural form residuals is given by the

following equation:

t = [I -A 0 ] t + t

where: t column vector, with dimension 4x1, with reduced form VAR residuals at

period t;

t column vector, with dimension 4x1, with structural form VAR residual at period t;

A0 full rank matrix with the relationship between the two types of residuals.

The above equations form a system of linear equations, where each variable (reduced

form residual) is a linear function of its direct causes and an error term (structuralresidual), with error terms independent of each other. If the graph G that represents the

model has no cycles (is a DAG) then the variables are generated by a Markovian model.

Therefore, the model satisfy the property that guarantees the compatibility between its

distribution function and graph G 31 . Because conditional independence implies zero

partial correlation, Proposition 2 translates into a graphical test for identifying the

partial correlations that must vanish in the model 32 . Therefore, equations (3) can be

structured according to a DAG G, and the partial correlation coefficient 3V.2V1V

vanishes whenever the vertices corresponding to the variables in V 3 d-separate vertex

V1 from vertex V 2 in G.

31 For a sketch of the proof see Pearl (2000).

32 The partial correlation coefficient of X and Y, controlling for Z is given

by 2/12/1. )1()1/()( YZ XZ YZ XZ XY Z XY .


36/41


37/41

Appendix 2. Clustering Analysis using Bootstrapped Moran I computations

The following table gives the bootstrapped Moran I computations for cluster 1 and 2of map 1. Here cluster 1 is composed of 156 municipalities and cluster 2 of 75

municipalities. The 26 municipalities cluster is not included as the model could not be

estimated because of insufficient degrees of freedom. All the computations were performed

in GRETL because GEODA is lacking functionalities to select subsamples and compute

associated stats.

Here we present for comparison both final results for the truncated matrices and the

correctly re-indexed matrix but only for the nearest neighbors structure. To be more

precise, it is not possible to re-index the contiguity because when a neighbor is in another

group, there are no new neighbor to replace it. This is why the border effect must be

stronger, and subsequently the bias for the contiguity neighborhood. Table 3 presents the

results for both computations of neighborhood, this allows to compare results and

interestingly provide a qualitative assessment of border effects biases.

Table 3: Moran test and bootstrap p-values for clusters I and II


38/41

*significant at 10%; ** significant at 5 % ; *** significant at 1 % ; All tests performedwith 999 permutations; NA: Not Applicable.

Zone 1: spatial autocorrelation is not significant for the 4NN, except for the fallow

residuals. We know from initial computations on the whole model that fallow residuals

display the strongest spatial autocorrelation, here it is significant at 5 % but largely

reduced. For the 8NN, spatial autocorrelation is significant for crop and pasture . Note that

it is negative for crop which is an interesting result meaning crop residuals values are

significantly dissimilar at farther distance. Note that it is the only variable for which we

find negative spatial autocorrelation. Then, both pasture and forest residuals do not display

any significant spatial autocorrelation in the residuals of zone 1.

Neighborhood

structureVariables

Moran I test sample

Trucated matrices Correct matrices

Zone 1 Zone 2 Zone 1 Zone 2

MoranI

Bootstrap

pvalue

for t

MoranI

Bootstrap

pvalue

for t

MoranI

Bootstrap

pvalue

for t

MoranI

Bootstrap

pvalue

for t

First Order

Contiguity

CROP -0.091 0.072* 0.176 0.004***

NAPASTURE 0.256 0.003*** 0.638 0.000***

FALLOW -0.010 0.842 0.088 0.120

FOREST 0.062 0.219 0.252 0.001***

4NN

CROP -0.007 0.691 0.186 0.005*** -0.004 0.802 0.219 0.120

PASTURE 0.023 0.243 0.259 0.000*** 0.019 0.310 0.388 0.233

FALLOW 0.032 0.089* 0.058 0.203 0.064 0.041** 0.095 0.048**

FOREST 0.058 0.131 0.173 0.000*** 0.016 0.617 0.221 0.125

8NN

CROP -0.044 0.006*** 0.120 0.005*** -0.044 0.011** 0.004 0.004***

PASTURE 0.025 0.144 0.201 0.000*** 0.026 0.118 0.000 0.000***

FALLOW 0.020 0.221 0.067 0.055* 0.051 0.014** 0.049 0.243

FOREST 0.031 0.281 0.132 0.001 0.025 0.388 0.000 0.002***


39/41

Zone 2: At the 4NN we find the same results as for zone 1: significant spatial

autocorrelation for fallow residuals. At the 8NN the picture is quite different, as we see

significant autocorrelation for crop, pasture and forest residuals. The comparison of the

results for truncated and correct neighborhood matrices show differences in values that are

substantial for zone 2 (large differences) but smaller for zone 2. Also, the computed values

for zone 1 at 8NN are nearly the same but significance levels differ strongly. These

findings point to substantial biases due the boundary (border) effects and open interesting

further research on this subject.

Appendix 3.

Table 4: t-scores for selected variables for partitions in two clusters

whole panel, years 1970-1995

Topics

Variables Cluster

1

(156)

Cluster

2

(75)

Cluster

3

(26)

Demograp

hy

Total population 11.43 -11.54 -0.92

Urban population 10.79 -11.04 -0.81

Total dwellings 9.70 -11.07 -0.35

Dwellings with water 8.66 -10.13 -0.21

Dwellings with sanitation 6.59 -7.84 -0.10

Dwellings with electric lighting 9.54 -10.59 -0.46

Agricultur

al

Economy

Agricultural sector investments -9.13 -4.05 6.39

Total employement in agricultural

sector

4.46 -6.18 0.17

Bovine livestock -8.78 -3.47 5.97

Irrigated land area -7.96 2.15 3.28


40/41

Total number of tractors -11.31 -4.07 7.53

Value of wood production -0.19 -3.15 1.40

Average rental value of land (per

hectare)

4.10 -3.08 -1.04

Total volume of commercialized

wood

27.31 -8.75 -2.79

Average volume of wood in

munipicality

10.72 11.39 -7.15

Average purchase price of live beef 6.41 -3.75 -1.39

General

Economy

Share of wages in total expenses 4.09 -8.73 1.26

Average wage per employed -8.49 -0.91 4.78

Average wage in industry 1.97 -3.18 0.20

Share of direct government / Banco

do Brasil aids

-13.84 5.56 5.32

Surface of SUDAM financed projects -7.88 -5.67 5.94

Value of SUDAM financing -13.59 -3.35 7.78

Education

Share of population w. 5 yrs. or more

of education

10.51 -12.25 -1.41

Average number of years ofschooling

4.45 -9.40 1.57

Infrastruct

ure

Lenght of paved roads -0.08 -7.14 2.81

Lenght of unpaved roads -9.40 -6.80 7.07

Natural

Environment

Surface in 1997 2.94 -8.93 2.07

Land carbon content per hectare -4.39 -7.30 5.53

Share of water masses 11.25 -8.25 -2.87

Land nitrogen content per hectare -2.81 -7.98 4.95

Deforested area -8.50 -6.92 2.91

Land-use

Surface devoted to agriculture -10.28 -5.12 7.42

Artificial forest area -0.79 -4.25 2.12

Natural forests area -6.27 -6.75 5.99


41/41

Permanent plowing area 4.22 -5.67 0.09

Temporary plowing area -10.59 -4.39 7.28

Artificial pastures area -10.93 -3.07 6.93

Natural pastures area -13.77 -2.21 8.06

Resting land area -7.69 -2.46 5.00

loureiro et al,2010pvar effet sptial .pdf

Documents