
TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 11/21 pp344-353 Volume 10, Number 3, June 2005

Influence of Spatial Features on Land and Housing Prices*

GAO Xiaolu ( )**, ASAMI Yasushi

Japan Society for the Promotion of Science; Department of Urban Research, National Institute for Land and Infrastructure Management, Japan;

Center for Spatial Information Science, the University of Tokyo, Tokyo 113-8654, Japan

Abstract: The analysis of hidden spatial features is crucial for improving hedonic regression models of the structure of land and housing prices. If critical variables representing the influence of spatial features are omitted from the models, the residuals and the estimated coefficients usually exhibit some kind of spatial pattern. Hence, exploring the relationship between the spatial patterns and the spatial features can lead to the discovery of omitted variables. The analyses in this paper were based on two exploratory approaches: one based on the residuals of a global regression model and the other on the geographically weighted regression (GWR) technique. In the GWR model, the regression coefficients are allowed to differ by location, so more spatial patterns can be revealed. Comparison of the two approaches shows that they play complementary roles in the detection of lot-associated variables and area-associated variables.

Key words: spatial features; spatial variation; regression model; residual; geographically weighted regression (GWR)

Introduction

When the hedonic regression approach is applied to analyze the structure of land and housing prices, we often seek hidden variables to improve the models. If critical variables representing the influence of spatial features are omitted from the models, the residuals and the estimated coefficients usually exhibit some kind of spatial pattern. Analyzing the relationships between these spatial patterns and the spatial features can therefore lead to the discovery of omitted variables, which is why the analysis of spatial patterns is important.

In recent years, a variety of approaches have been proposed for exploring spatial patterns. This work analyzes two of them. The first method studies the patterns of the residuals of a standard regression model to investigate the influence of spatial features on the patterns. Since the regression coefficients are constant over space by definition of the model, this is a global regression method. The second method uses the geographically weighted regression (GWR) approach, which belongs to the family of local regression methods. A GWR model allows the regression coefficients to differ with location, so more spatial patterns are revealed, which can be used to explore the impacts of spatial features. The two approaches are compared to show how they play complementary roles in improving hedonic regression models.

Received: 2003-07-10; revised: 2003-12-08

* Supported by the Special Coordination Funds for Promoting Science and Technology, and the Research Grant-In-Aid provided by the Ministry of Education, Culture, Sports, Science, and Technology, Japan

** To whom correspondence should be addressed.

E-mail: [email protected]; Tel: 81-29-864-3839

1 Analysis of the Residuals of the Global Regression Model

The sample of Gao and Asami[1], which includes the transaction prices and detailed attributes of 190 detached housing lots in Setagaya Ward, western Tokyo, was used for the analyses. We start with the hedonic regression model given by Gao and Asami[1], which models the unit price of the 190 properties with a stepwise regression method using 16 independent variables. The estimated coefficients, all significant at the 5% level, are listed as Model A in Table 1.

Table 1 Unit price regression model results

Regression coefficient (million Yen/m2 per unit); t-statistic in brackets

| Variable | Definition of variable | Model A | Model B | Model C |
|---|---|---|---|---|
| Constant | | 0.9115** (9.17) | 0.8600** (8.81) | 0.9127** (9.20) |
| Actual_FAR | Building floor area / lot area | 0.1276** (3.22) | 0.1360** (3.54) | 0.1122** (2.81) |
| Train | Time to the nearest train station (min) | -0.0157** (-9.61) | -0.0154** (-9.72) | -0.0166** (-9.98) |
| Road_width | Width of road fronting on lot (m) | 0.0209** (2.86) | 0.0212** (3.00) | 0.0162* (2.28) |
| Bldg_duration/S | Remaining age of house (a) / lot area | 0.5686** (6.42) | 0.5300** (6.11) | 0.3899** (4.02) |
| Landscaping | Within landscape zones(1), 1; otherwise, 0 | -0.1726** (-8.46) | -0.1660** (-8.33) | -0.1860** (-8.77) |
| Shinjuku | Time to Shinjuku CBD by train (min) | -0.0168** (-6.60) | -0.0164** (-6.61) | -0.0146** (-5.92) |
| Frontage | Lot frontage (m) | 0.0058* (2.38) | 0.0054* (2.26) | 0.0049* (2.15) |
| Good_pavement | Front road pavement is good, 1; otherwise, 0 | 0.0420** (2.80) | 0.0417** (2.87) | 0.0339* (2.33) |
| Parking_lot | Number of parking lots | 0.0382** (3.54) | 0.0421** (4.00) | 0.0482** (4.60) |
| Bldg_quality | Building quality in the district is good, 1; otherwise, 0 | 0.0575** (3.51) | 0.0627** (3.93) | 0.0610** (3.89) |
| Sunshine/S | Sunshine duration of house (h) / lot area | 0.9476** (2.67) | 0.9750** (2.83) | 0.8488* (2.51) |
| Adj_park | Adjacent to park, 1; otherwise, 0 | -0.1956** (-2.97) | -0.2410** (-3.70) | -0.2468** (-3.90) |
| Adj_park/S | Adj_park / lot area | 21.4547** (3.14) | 25.2110** (3.75) | 26.2212** (4.01) |
| Mixed_use | Non-residential uses are more than 1/3 in the district, 1; otherwise, 0 | 0.2384** (2.64) | 0.2290** (2.61) | 0.2973** (3.38) |
| Mixed_use/S | Mixed_use / lot area | -17.4766* (-2.44) | -17.1450* (-2.47) | -23.1857** (-3.31) |
| Tree | Greenery is good in the district, 1; otherwise, 0 | 0.0335* (1.99) | 0.0353* (2.16) | 0.0368* (2.23) |
| 500 m_large_parks | Within 500 m to parks over 5000 m2, 1; otherwise, 0 | | 0.0502** (3.46) | 0.0571** (4.00) |
| Pop_den | Density of population in block (100 person/hm2) | | | -5.9931* (-2.42) |
| % of road area/S | Road coverage ratio in block | | | 47.9868** (2.82) |
| R-square | | 0.756 | 0.772 | 0.789 |
| Adjusted R-square | | 0.734 | 0.749 | 0.766 |
| AIC | | -876.11 | -887.12 | -897.02 |

*: Significant at 5% level. **: Significant at 1% level. T-statistic is given inside brackets.

(1): Planning controls in landscape zones are stricter than in other districts. This helps to explain the negative coefficient of this variable.


Because this model assumes that the regression coefficients are constant over space, it is called a global regression model.

One way to improve a hedonic regression model is to identify critical spatial features that have been neglected in the model. Our strategy is to investigate the spatial pattern of the residuals of the global regression model, find spatial features related to that pattern, and then build an improved model that includes these features. The 190 sample properties were plotted on a contour map of the regression residuals (Fig. 1), which was generated with a spline method. The following features were then examined.

Fig. 1 Contour map of regression residuals

1.1 Public facilities

According to survey results, poor accessibility to public facilities, such as parks, libraries, and community centers, is among the items most often cited by residents as causes of dissatisfaction with their living environments (1993 National Housing Survey of Japan).

The distribution of public facilities, including schools, hospitals, community centers, parks, sports facilities, and water treatment plants, can be roughly related to the residual map in Fig. 1.

Therefore, the distances from the housing lots to these facilities were included in the regression analysis to indicate the effects of these facilities. The new distance variables that correlated relatively strongly with the residuals were added to the regression function specified in Eq. (1), which is the same as that of Model A, and their significance levels were examined.

$$P/S = \text{constant} + \sum_{i=1}^{k} (c_i X_i / S) + \sum_{i=1}^{k} d_i X_i + \varepsilon \qquad (1)$$

where P is the price of the land lot and house; S is the lot area; constant is the intercept; Xi is the independent variable representing attribute i; ci and di are parameters for i ∈ {1,…,k}, indicating hedonic prices; and ε is the error term.

This specification distinguishes the variables that affect the price from those that affect the unit price. If di is significantly different from 0, Xi has a significant effect on the unit price (P/S). If ci is significantly different from 0, Xi has a significant effect on the price (P). If ci and di for the same variable are both significant, the effect of Xi varies with the lot size.
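The roles of ci and di can be made explicit by differentiating Eq. (1) with respect to Xi. The short derivation below is only a sketch: it treats Xi as continuous and holds the lot area S fixed, which is an assumption made here for illustration and not a step taken in the original analysis.

```latex
\frac{\partial (P/S)}{\partial X_i} = \frac{c_i}{S} + d_i ,
\qquad
\frac{\partial P}{\partial X_i} = c_i + d_i S .
```

Under these assumptions, di shifts the unit price by the same amount on every lot, ci adds a fixed amount to the total price regardless of lot size, and when both are nonzero the unit-price effect ci/S + di changes with S (changing sign at S = -ci/di if ci and di have opposite signs).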

The results show that, among the new variables, proximity to a large park has a significant positive effect on the unit price of the land and house. The threshold size of a large park (1000, 2000, 3000, 4000, 5000, 6000, 8000, 10 000, and 20 000 m2) and the distance from the large park (400, 450, 500, 600, 800, and 1000 m) were varied, and the dummy variable for being within 500 m of a park over 5000 m2 gave the best fit. With this variable, the R-square of the regression model increased from 0.756 to 0.772 and Akaike's information criterion (AIC) decreased from -876 to -887. The second model in Table 1, Model B, lists the new results.
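The threshold search described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `large_park_dummy`, the per-lot list of (distance, area) pairs, and the use of inclusive comparisons are all assumptions made here. Only the candidate sizes and distances come from the text.

```python
from itertools import product

# Candidate thresholds quoted in the text.
SIZES_M2 = [1000, 2000, 3000, 4000, 5000, 6000, 8000, 10000, 20000]
DISTANCES_M = [400, 450, 500, 600, 800, 1000]


def large_park_dummy(parks, min_area_m2, max_dist_m):
    """Return 1 if any park of at least min_area_m2 lies within max_dist_m.

    `parks` is a hypothetical list of (distance_m, area_m2) pairs for one lot.
    Inclusive comparisons are an assumption of this sketch.
    """
    return int(any(d <= max_dist_m and a >= min_area_m2 for d, a in parks))


# Every (size, distance) pair defines one candidate dummy variable to be
# tried in the regression; the pair giving the best fit would be kept.
candidates = list(product(SIZES_M2, DISTANCES_M))

# Hypothetical lot: a 6000 m2 park 450 m away and a 2000 m2 park 300 m away.
lot_parks = [(450.0, 6000.0), (300.0, 2000.0)]
dummy_500m_5000m2 = large_park_dummy(lot_parks, 5000, 500)  # large park in range
dummy_400m_5000m2 = large_park_dummy(lot_parks, 5000, 400)  # large park too far
```

Each candidate dummy would then be added to the regression of Eq. (1) in turn and the resulting fits compared, for example by R-square or AIC.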

In practice, parks over 5000 m2 are designated as regional parks. The results suggest that the unit price of land and houses within 500 m (about 7-8 min walking distance) of large parks is approximately 0.05 million Yen per square meter higher than in other areas. This estimate is quite reasonable and agrees well with other studies[2]. The results also imply that providing large parks is an effective way to improve the environment of areas that lack them.

1.2 District environmental indices

The residual map was also compared with a variety of environmental indicators for the blocks. The correlations between these indicators and the residuals were not as strong as expected, but the regression analysis showed that the population density (pop_den) and the road coverage ratio (% of road area) were significant. The unit prices of land and houses were higher in areas with greater road coverage ratios or lower population densities.

Model C in Table 1 shows the results with the road coverage ratio and population density included. Judging from the values of R-square and AIC, Model C gives the best fit among the three models. The estimated coefficients of the variables shared with the other two models were similar.

The residual map and the correlations between the residuals and the spatial features were used to analyze the effects of various variables. The correlations of the spatial features with the residuals do not immediately reflect their significance in Model C. For example, the correlation coefficients of population density and of the ratio of road area / lot size with the residuals were fairly weak, but these two variables were significant in Model C. This is because the new variables indicating spatial features may be correlated with the variables originally included in Model A. To identify the significant variables, regressions were run with each new spatial feature (called Z) as the dependent variable and the original variables of Model A as independent variables. The residuals of these regressions were called residual_Z. The residual_Z were then correlated with the residuals of Model A. Some examples are given in Table 2, where the values in the second column are the correlations of Z with the residual of Model A, and the values in the third column are the correlations of residual_Z with the residual of Model A. The variables with high correlation coefficients in the third column correspond well to those that are significant in Model C.

Table 2 Correlation of spatial features with residuals of Model A

| Spatial feature variable (Z) | Correlation of Z with residual | Correlation of residual_Z with residual |
|---|---|---|
| 500 m_large_parks | 0.243** | 0.255** |
| Pop_den | 0.068 | 0.114* |
| % of road area/S | 0.099 | 0.181** |
| % of public open space(1)/S | 0.013 | 0.016 |
| % of vacant land(2)/S | 0.063 | 0.098 |

*: Significant at 5% level (one-tail);

**: Significant at 1% level (one-tail).

(1): Coverage ratio of parks and playgrounds in block;

(2): Coverage ratio of vacant parcels in block.
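The residual_Z procedure can be illustrated with a small sketch. It assumes a single original regressor x in place of the 16 variables of Model A, and uses made-up data in which the candidate feature z is partly explained by x. The helper names `ols_residuals` and `pearson` are introduced here for illustration only.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def ols_residuals(y, x):
    """Residuals from a one-regressor OLS fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Hypothetical data: z depends on x plus a component that x cannot explain.
x = [float(i) for i in range(8)]
unexplained = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
z = [-a + b for a, b in zip(x, unexplained)]
y = [1.0 + 2.0 * a + 3.0 * c for a, c in zip(x, z)]

resid_model_a = ols_residuals(y, x)   # residual of the model that omits z
resid_z = ols_residuals(z, x)         # residual_Z: part of z not explained by x

corr_raw = pearson(z, resid_model_a)            # second column of Table 2
corr_partial = pearson(resid_z, resid_model_a)  # third column of Table 2
```

In this constructed case the raw correlation is modest while the correlation based on residual_Z is much stronger, which mirrors the pattern the text describes for Pop_den and % of road area/S.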

However, the residual analysis method has a serious shortcoming: while a residual map can reveal the effects of the most significant features, it cannot as easily reveal weak effects or many co-existing spatial effects. Sometimes, the spatial patterns of the residuals may be too complicated to be described with one or several global features. In such cases, a more formal framework is needed to explore the influences of the spatial features. Techniques that focus on localized estimates of the coefficients are very useful for this purpose.

2 Local Concerns and Geographically Weighted Regression

In the field of spatial analysis, there has been increasing interest in local forms and local modeling in recent years. A variety of new techniques have been developed that focus on identifying spatial variations in relationships rather than on establishing global statements of spatial behavior, e.g., local point pattern analyses, local graphical approaches, local measures of spatial dependency, the spatial expansion method, adaptive filtering, multilevel modeling, GWR, random coefficient models, autoregressive models, and local forms of spatial interaction models[3-5].

Among these techniques, GWR is thought to be a particularly good exploratory method to assist modeling. The GWR theory is based on the model given by Fotheringham et al.[6] Consider the global regression model given by

$$y_i = a_0 + \sum_k a_k x_{ik} + \varepsilon_i \qquad (2)$$

The GWR technique extends the traditional regression framework of Eq. (2) by allowing local rather than global parameters to be estimated, with the model rewritten as

$$y_i = a_{0i} + \sum_k a_{ki} x_{ik} + \varepsilon_i \qquad (3)$$

where aki represents the value of ak at point i. With Eq. (3), regressions run at various locations give different estimates.

To estimate the parameters in Eq. (3), an observa-

tion is weighted in accordance with its proximity to

point i, so the weighting of the observation is no longer

constant but varies with i. Data from observations

close to i are weighted more than data from observa-

tions far away. In vector form, a GWR model can be

written as

$$W_i^{1/2}\, y = W_i^{1/2} X a_i + \varepsilon \qquad (4)$$

Then, Eq. (5) gives the estimate of ai as

$$\hat{a}_i = (X^{T} W_i X)^{-1} X^{T} W_i\, y \qquad (5)$$

where X is the independent variable matrix, y is the dependent variable vector, n is the number of samples, and Wi is an n × n matrix whose diagonal elements (wij) denote the geographical weighting of the observed data for point i and whose off-diagonal elements are zero.

There may be various definitions for wij, for exam-

ple, the reciprocal of the Euclidean distance between i

and j. A global regression can be seen as a specific

case of Eq. (4) where the weightings are unity.
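The estimator in Eq. (5) and its reduction to ordinary least squares under unit weights can be sketched in a few lines. This is an illustrative implementation only: the solver, the toy data, and the hand-picked local weights are assumptions made here, and a real analysis would use a numerical library.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def weighted_least_squares(X, y, w):
    """Eq. (5): (X^T W X)^{-1} X^T W y with W = diag(w)."""
    k = len(X[0])
    xtwx = [[sum(wi * row[a] * row[b] for wi, row in zip(w, X))
             for b in range(k)] for a in range(k)]
    xtwy = [sum(wi * row[a] * yi for wi, row, yi in zip(w, X, y))
            for a in range(k)]
    return solve(xtwx, xtwy)


# Toy data: the slope is 2 for the first three points and steeper afterwards.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]          # intercept column plus one regressor
y = [1.0, 3.0, 5.0, 8.0, 11.0]

# Unit weights reproduce the global regression.
global_fit = weighted_least_squares(X, y, [1.0] * len(y))

# Hypothetical weights that emphasise observations near the first point.
local_fit = weighted_least_squares(X, y, [1.0, 0.5, 0.1, 0.01, 0.001])
```

With unit weights the function returns the global intercept and slope, while the local weights pull the estimated slope toward the value that holds near the first point.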

To be more adaptable, Fotheringham et al.[6] defined wij as

$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\beta^{2}}\right) \qquad (6)$$

where dij represents the Euclidean distance between points i and j, and β is a bandwidth. With Eq. (6), if j

coincides with i, the weighting of the data at that point

will be unity, and the weighting of other data will de-

crease as dij increases. For data far from i, the weight-

ing will be essentially zero, effectively excluding these

observations from the estimates of parameters for loca-

tion i.
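The behavior of the Gaussian weighting in Eq. (6) is easy to check numerically. The sketch below writes the bandwidth as `beta` and uses the 1.243 km value reported for this data set in Section 3. The sample distances are arbitrary choices made here.

```python
from math import exp


def gaussian_weight(d_ij, beta):
    """Eq. (6): weight of observation j in the regression at point i."""
    return exp(-(d_ij ** 2) / (beta ** 2))


BETA_KM = 1.243  # bandwidth reported for this data set in Section 3

# Weights at a few illustrative distances (km) from the regression point.
weights = {d: gaussian_weight(d, BETA_KM) for d in (0.0, 0.5, 1.243, 2.5, 5.0)}
```

The weight is exactly 1 at zero distance, falls to exp(-1) at a distance equal to the bandwidth, and is negligible a few bandwidths away.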

The choice of bandwidth is related to the trade-off

between bias and variance. The greater the local sam-

ple size, the lower the standard errors of the coefficient

estimates are. But this must be offset against the fact

that enlarging the subset increases the chance that

coefficient drift introduces bias. Therefore, the selection of an appropriate value for β is critical.

One way to select β is by minimizing

$$\sum_i [y_i - \hat{y}_i(\beta)]^{2} \qquad (7)$$

where ŷi(β) is the fitted value of yi with a bandwidth of β. However, if β becomes so small that the weightings of all points except for i itself become negligible, the value of Formula (7) becomes zero, but β = 0 is meaningless. A cross-validation approach was proposed by Fotheringham et al.[6] to solve this problem, where the estimates at point i were calibrated with samples near to i but excluding i. Accordingly, a cross-validation (CV) score defined by

$$\text{CV score} = \sum_i [y_i - \hat{y}_{\neq i}(\beta)]^{2} \qquad (8)$$

where ŷ≠i(β) is the fitted value of yi computed without observation i, was computed and the optimal value of β was derived by minimizing the CV score.
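The difference between Formula (7) and the CV score of Eq. (8) can be seen in a small sketch. To keep it short, it replaces the full local regression with a kernel-weighted local mean (an intercept-only local model), which is a simplification made here. The data and candidate bandwidths are made up.

```python
from math import exp


def kernel(d, beta):
    """Gaussian weight of Eq. (6)."""
    return exp(-(d * d) / (beta * beta))


def local_mean(x, y, i, beta, exclude_self):
    """Kernel-weighted mean of y around x[i]; a stand-in for the local fit."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        if exclude_self and j == i:
            continue  # leave observation i out, as in Eq. (8)
        w = kernel(x[i] - xj, beta)
        num += w * yj
        den += w
    return num / den


def sse(x, y, beta, exclude_self):
    """Formula (7) if exclude_self is False, CV score of Eq. (8) if True."""
    return sum((y[i] - local_mean(x, y, i, beta, exclude_self)) ** 2
               for i in range(len(x)))


# Hypothetical one-dimensional sample.
x = [float(i) for i in range(10)]
y = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0, -0.3, 0.7, 1.0, 0.4]

candidates = [0.5, 1.0, 2.0, 4.0, 8.0]
cv_scores = {b: sse(x, y, b, exclude_self=True) for b in candidates}
best_beta = min(cv_scores, key=cv_scores.get)
```

With a very small bandwidth the in-sample sum of squares collapses to zero because each point is fitted by itself, whereas the leave-one-out score stays informative, which is why the bandwidth is chosen by minimizing the latter.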

The localized parameter estimates obtained from

GWR exhibit a high degree of variability over space.

These spatial patterns reveal the spatial nature of rela-

tionships and the spatial consequences of modeling

such relationships, which can be used for model

improvement.

3 Analysis of Housing and Land Prices with GWR

The GWR method was used to analyze the 190-sample

data set. The same independent variables as in Model

A were used, but the parameters to be estimated varied

with location.

$$(P/S)_i = \text{constant}_i + \sum_k c_{ki}\,(X_k/S) + \sum_k d_{ki}\,X_k + \varepsilon_i \qquad (9)$$

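The localized parameter matrix was estimated with Eq. (5), where the weighting matrix Wi at point i was defined as

$$W_i = \begin{pmatrix} w_{i1} & 0 & \cdots & 0 \\ 0 & w_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{in} \end{pmatrix}, \quad \text{with } w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\beta^{2}}\right).$$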

A cross-validation method was used to estimate β. As Fig. 2 shows, the optimal value was 1.243 km. This value of β was used in the regression analysis with Eq. (9) to estimate the regression coefficients at each sample point.
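The step of estimating a separate set of coefficients at every sample point can be sketched as a loop over locations. The example below assumes a single regressor (so the weighted fit has a closed form), made-up coordinates, and a hypothetical bandwidth of 2 distance units. It is not the 16-variable model of Eq. (9).

```python
from math import exp, hypot


def local_fit(coords, x, y, i, beta):
    """Weighted simple regression of y on x centred on sample point i."""
    xi, yi = coords[i]
    # Gaussian weights from the Euclidean distance to every sample point.
    w = [exp(-hypot(xi - cx, yi - cy) ** 2 / beta ** 2) for cx, cy in coords]
    sw = sum(w)
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope  # (local intercept, local slope)


# Hypothetical sample: a western and an eastern cluster of four lots each,
# with the attribute priced at 1 per unit in the west and 3 per unit in the east.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 6.0, 9.0, 12.0]
beta = 2.0  # hypothetical bandwidth, in the same units as the coordinates

local = [local_fit(coords, x, y, i, beta) for i in range(len(coords))]
west_slopes = [slope for _, slope in local[:4]]
east_slopes = [slope for _, slope in local[4:]]
```

Mapping the resulting local slopes would show two distinct regimes, which is the kind of spatial pattern the coefficient maps in this section are meant to reveal.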

Fig. 2 CV score variation with bandwidth

Then, the coefficients for each of the 190 sample

points were plotted over the study area. Three exam-

ples will be given whose spatial patterns are relatively

evident: the constant term, actual_FAR (the marginal

effect of increasing the ratio of the actual building

floor area to the lot size), and tree (the marginal effect

of having many trees in the neighborhood). The graphs

of the three variables are shown in Figs. 3-5. In all

three examples, the estimates are positive. The extents

of the spatial variations are represented by the shadings

of the Voronoi polygons generated from the sample

points. The regression coefficients in the darker areas

are higher.

Fig. 3 Regression coefficients for the constant term

Fig. 4 Regression coefficients for the ratio of the building floor area to the lot size (actual_FAR)

Fig. 5 Regression coefficients for the abundance of trees (tree)

The large fluctuations of the coefficients estimated by GWR indicate that it is unreasonable to assume that they are constant and suggest the need for new variables. Conversely, if the spatial pattern of a coefficient is quite uniform, then the variable and its functional form are appropriate.

Consider the spatial distribution of the constant term

in Fig. 3 as an example. The variation of the unit price

of land and houses is evident, even though 16 inde-

pendent variables have already been included. The

dark areas are concentrated in the Seijo area, a residen-

tial area known for expensive housing. This might ei-

ther be caused by a so-called “brand” effect, or be re-

lated to the fact that most lots in Seijo area are rea-

sonably large and the population density of this area is

lower than that of other areas. The constant terms are

lower in the north central and southeastern districts.

The distribution maps of actual_FAR (the ratio of the

actual building floor area to the lot size) in Fig. 4, of

tree (abundance of trees in the neighborhood) in Fig. 5,

and of the other variables help visually identify spatial


features that may affect the spatial patterns.

The GWR model provides much more information than a global model. For example, Model A gives 17 parameters, while GWR yields 17 × 190 parameters, which suggest more linkages to omitted spatial features that may affect the unit price of land and houses. As an exploratory method, the GWR method is especially useful in that maps of the results can be used directly to find underlying relationships.

The weighting method used in the GWR method

gives higher weights to geographically adjacent obser-

vations, so the effect of geographically related features

such as the effects in the Seijo area are easily found,

since the model estimates around such areas are likely

to differ from those of other areas. However, for fea-

tures that lack geographical correlations, the GWR

method will not reveal their effects. For example, if

observations belonging to a certain income class or a

special interest are loosely distributed, their impact on

the land and house prices will not be captured by the

GWR method, because the number of observations in

each localized regression is limited. In such cases,

however, the global regression model is a more power-

ful method for revealing their effects on the land and

house prices.

4 Comparison of Two Approaches

The performance of the GWR analysis was compared with that of the residual analysis by examining the Pearson's correlation coefficients between the residuals and the spatial features excluded from the global regression model, and the correlation coefficients between the geographically weighted estimates and the same group of spatial features. The new variables are arranged into two groups. Lot-associated variables, such as the unevenness degree (uneven) and the distance to parks, are directly associated with each lot (Table 3). Area-associated variables, such as the block population density, are related to an area containing the lot (Table 4).

Table 3 Correlation coefficients for lot-associated variables

The columns 50 m to 500 m_large_parks are the distance-to-park variables.

| | Uneven(2) | 50 m(3) | 500 m(3) | 600 m(3) | 800 m(3) | 500 m_large_parks(4) |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.025 | 0.029 | 0.164* | 0.122* | 0.004 | 0.255** |
| GWR estimates | | | | | | |
| Constant | 0.006 | 0.202** | 0.089 | 0.065 | 0.000 | 0.056 |
| Actual_FAR | 0.064 | 0.075 | 0.152* | 0.088 | 0.067 | 0.014 |
| Train | 0.071 | 0.054 | 0.113 | 0.110 | 0.058 | 0.159* |
| Road_width | 0.047 | 0.142* | 0.013 | 0.012 | 0.073 | 0.012 |
| Bldg_duration/S | 0.064 | 0.072 | 0.088 | 0.026 | 0.048 | 0.077 |
| Landscaping | 0.032 | 0.052 | 0.032 | 0.064 | 0.038 | 0.048 |
| Shinjuku | 0.027 | 0.210** | 0.052 | 0.044 | 0.004 | 0.037 |
| Frontage | 0.075 | 0.124* | 0.196** | 0.126* | 0.073 | 0.026 |
| Good_pavement | 0.028 | 0.158* | 0.039 | 0.007 | 0.008 | 0.032 |
| Parking_lot | 0.003 | 0.051 | 0.027 | 0.065 | 0.073 | 0.002 |
| Bldg_quality | 0.083 | 0.053 | 0.009 | 0.070 | 0.177** | 0.090 |
| Sunshine/S | 0.115 | 0.169** | 0.037 | 0.021 | 0.009 | 0.050 |
| Adj_park/S | 0.023 | 0.051 | 0.019 | 0.012 | 0.031 | 0.051 |
| Adj_park | 0.012 | 0.055 | 0.018 | 0.008 | 0.028 | 0.031 |
| Mixed_use/S | 0.005 | 0.057 | 0.072 | 0.034 | 0.108 | 0.077 |
| Mixed_use | 0.005 | 0.049 | 0.063 | 0.027 | 0.112 | 0.087 |
| Tree | 0.031 | 0.137* | 0.074 | 0.043 | 0.022 | 0.051 |

(1): This row shows the correlations of residual_Z with the residual of Model A.

(2): Variable indicating the unevenness degree of land around lot.

(3): Within 50 m to parks, 1; otherwise, 0. The definitions for 500 m (600 m, 800 m) to parks are similar.

(4): Within 500 m to large parks. Large parks indicate the parks larger than 5000 m2.


Table 4 Correlation coefficients for area-associated variables

| | Seijo | % of road area | % of road area/S | % of public open space | % of public open space/S | % of vacant land |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.162* | 0.180** | 0.181* | 0.01 | 0.016 | 0.07 |
| GWR estimates | | | | | | |
| Constant | 0.447** | 0.178** | 0.369** | 0.09 | 0.009 | 0.025 |
| Actual_FAR | 0.506** | 0.146* | 0.209** | 0.022 | 0.161* | 0.126* |
| Train | 0.368** | 0.047 | 0.076 | 0.201** | 0.256** | 0.133* |
| Road_width | 0.472** | 0.210** | 0.354** | 0.052 | 0.002 | 0.222** |
| Bldg_duration/S | 0.389** | 0.147* | 0.172** | 0.018 | 0.136* | 0.112 |
| Landscaping | 0.095 | 0.136 | 0.040 | 0.059 | 0.089 | 0.089 |
| Shinjuku | 0.290** | 0.234** | 0.342** | 0.138* | 0.085 | 0.054 |
| Frontage | 0.411** | 0.057 | 0.224** | 0.039 | 0.050 | 0.099 |
| Good_pavement | 0.146* | 0.297** | 0.269** | 0.113 | 0.109 | 0.203** |
| Parking_lot | 0.451** | 0.098 | 0.238** | 0.117 | 0.008 | 0.125* |
| Bldg_quality | 0.341** | 0.114 | 0.192** | 0.122* | 0.102 | 0.180** |
| Sunshine/S | 0.294** | 0.274** | 0.320** | 0.052 | 0.005 | 0.022 |
| Adj_park/S | 0.230** | 0.099 | 0.271** | 0.146* | 0.086 | 0.017 |
| Adj_park | 0.221** | 0.110 | 0.275** | 0.138* | 0.086 | 0.021 |
| Mixed_use/S | 0.526** | 0.045 | 0.245** | 0.177** | 0.069 | 0.132* |
| Mixed_use | 0.493** | 0.033 | 0.222** | 0.180** | 0.077 | 0.105 |
| Tree | 0.391** | 0.097 | 0.298** | 0.072 | 0.014 | 0.008 |

| | % of vacant land/S | Pop_den | C/R(2) | I/R(2) | (C+I)/R(2) |
|---|---|---|---|---|---|
| Residuals of global model(1) | 0.098 | 0.114 | 0.044 | 0.068 | 0.068 |
| GWR estimates | | | | | |
| Constant | 0.224** | 0.458** | 0.194** | 0.057 | 0.136* |
| Actual_FAR | 0.296** | 0.481** | 0.336** | 0.289** | 0.367** |
| Train | 0.133* | 0.015 | 0.296** | 0.078 | 0.093 |
| Road_width | 0.323** | 0.438** | 0.044 | 0.065 | 0.022 |
| Bldg_duration/S | 0.275** | 0.549** | 0.278** | 0.393** | 0.410** |
| Landscaping | 0.060 | 0.282** | 0.157 | 0.136 | 0.022 |
| Shinjuku | 0.138* | 0.365** | 0.124* | 0.025 | 0.079 |
| Frontage | 0.132* | 0.736** | 0.187** | 0.349** | 0.334** |
| Good_pavement | 0.030 | 0.493** | 0.048 | 0.204** | 0.165* |
| Parking_lot | 0.215** | 0.482** | 0.199** | 0.171** | 0.217** |
| Bldg_quality | 0.202** | 0.373** | 0.109 | 0.087 | 0.114 |
| Sunshine/S | 0.157* | 0.161* | 0.103 | 0.088 | 0.009 |
| Adj_park/S | 0.214** | 0.382** | 0.042 | 0.177** | 0.143* |
| Adj_park | 0.215** | 0.346** | 0.027 | 0.156* | 0.122* |
| Mixed_use/S | 0.248** | 0.511** | 0.201** | 0.023 | 0.116 |
| Mixed_use | 0.221** | 0.510** | 0.200** | 0.048 | 0.133* |
| Tree | 0.204** | 0.620** | 0.093 | 0.174** | 0.167* |

(1): This row shows the correlations of residual_Z with the residual of Model A;

(2): C, I, and R indicate the area of commercial, industrial, and residential land used in block. For instance, I/R is the ratio of industrial to residential used land in block, and (C+I)/R is the ratio of total commercial and industrial used land to residential used land in block.
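How large a correlation coefficient must be to count as significant with 190 samples can be gauged with the usual t statistic for a Pearson correlation. The paper does not state which test was used, so the sketch below assumes the standard test with n - 2 degrees of freedom. The function name is introduced here for illustration.

```python
from math import sqrt


def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, given a Pearson correlation r from n pairs.

    Assumes the standard test with n - 2 degrees of freedom.
    """
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


N_SAMPLES = 190  # number of housing lots in the data set

# Two entries from the residual row of Table 3.
t_large_parks = correlation_t_stat(0.255, N_SAMPLES)  # 500 m_large_parks
t_uneven = correlation_t_stat(0.025, N_SAMPLES)       # Uneven
```

Under this assumption, the coefficient for 500 m_large_parks gives a t statistic well above 3, while the coefficient for Uneven gives a value far below any conventional critical value, in line with the significance marks in the table.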


4.1 The GWR method effectively characterized the area-associated variables

The results in Tables 3 and 4 imply that the residual analysis and GWR methods differ greatly in their ability to detect the significance of lot-associated and area-associated variables. The GWR estimates exhibit significant correlations with such area-associated variables as the Seijo area indicator, population density, road coverage ratio, and land use mixture. However, the GWR estimates are not strongly correlated with the lot-associated variables.

For instance, the population density (pop_den) had strong positive correlations with the GWR estimates of frontage and mixed_use, but strong negative correlations with the spatial variations of bldg_duration/S and tree. This implies that people living in densely populated blocks tend to value lot frontage and mixed land use more highly, and the remaining building age and abundant trees less highly. Thus, pop_den is a proper variable in the regression model. Likewise, % of road area/S is also significantly correlated with the GWR estimates.

In addition, the correlations between the GWR estimates and other area-associated variables, such as Seijo (within Seijo area, 1; otherwise, 0), are highly significant, which implies that the hedonic prices for actual_FAR, road_width, landscaping, frontage, and parking_lot are significantly lower in the Seijo area, while the prices for bldg_duration, bldg_quality, tree, etc. are higher. Therefore, including the Seijo indicator in the regression model would likely smooth the spatial variations, since some omitted physical features or socio-economic characteristics of the people living in this area might have caused them.

However, when Seijo was added to the regression model in Eq. (1), the resulting coefficient was not significant, perhaps due to multicollinearity between Seijo and other variables, especially road_width, Shinjuku, tree, and pop_den. In fact, when Seijo was included, the estimated coefficients for these other variables fluctuated wildly.

Similarly, mixed land use indicators such as ((C+I)/R)/S, where (C+I)/R is the ratio of commercial and industrial land to residential land in a block, were significantly correlated with the GWR estimates but were not significant in the global regression model.

4.2 Residual analysis characterized lot-associated variables

Tables 3 and 4 also show that the residual analysis ef-

fectively characterizes the lot-associated variables in

Model A, but does not as effectively reveal the influ-

ences of area-associated variables.

5 Conclusions

Residual analysis and GWR methods were used to ana-

lyze the influence of spatial features on housing and

land prices. Comparison of the two methods shows that,

for the data set in this paper, both methods are useful

for improving the hedonic regression models, with the

GWR method more effectively revealing the influence

of area-associated spatial features, while the residual

analysis more effectively identifies the influence of lot-

associated spatial features.

These results are probably related to the aggregation

effects of area-associated variables. Because they indi-

cate the features of the entire block, the multiple ob-

servations in the same block often had the same value,

so their values were geographically more uniform than

the lot-associated variables and the spatial variations

caused by the omitted area-associated variables were

easily captured by GWR. As a result, the correlations

of the area-associated variables with the GWR esti-

mates were stronger. This conclusion ought to be ex-

amined further, for example, by changing the aggrega-

tion of area-associated variables or by applying the

methods described in this paper to other data sets.

The localized analysis techniques not only facili-

tated the discovery of omitted spatial features but also

helped identify the effects of variables that tend to con-

flict with each other. Global regression models some-

times include variables having strong spatial variations,

such as the distance to the central city areas, which

tend to mask the effects of other more important fea-

tures. This problem was avoided by restricting the

sample area. With local modeling techniques like

GWR, even if the sample area is very large, the models

can give good estimates because the results vary with

location. In addition, the omitted spatial features are


not the only reasons for the variations of the regression

parameters. Market segmentation may also result in

different bidding prices for the same attribute. In such

situations, localized regression techniques will help

distinguish geographically separated markets.

A crucial issue underlying the analyses of local regression models is the validation of the models. If, for example, a global regression model outperforms a GWR model, the rationale for including the spatial variations revealed by GWR may be lost. For the data set used in this paper, cross-validation tests were used to validate the models: the price of each sample was predicted with a model calibrated on all the remaining samples, and the deviations of the predicted prices from the observed prices were analyzed. The analyses demonstrate that the global model is reasonable for this data set, but the GWR model is slightly better[7].

The development of spatial analysis techniques is providing more investigative tools for improving hedonic regression models. Experience will yield more know-how on which technique is better, in which situations certain techniques should be used, how to assess different methods, and so on. The comparison of the residual analysis and GWR methods provides useful information for evaluating and using these methods.

Acknowledgements

The authors gratefully acknowledge the valuable comments

from Prof. Atsuyuki Okabe, Prof. Yukio Sadahiro, Dr. Takaya

Kojima, Prof. Hongyu Liu, Dr. Chang-Jo Chung, and the mem-

bers of the Housing Economics Research Group, Japan.

References

[1] Gao Xiaolu, Asami Y. The external effects of local attrib-

utes on living environment in detached residential blocks

in Tokyo. Urban Studies, 2001, 38(3): 487-505.

[2] Yazawa N, Kanemoto Y. The choice of variables in he-

donic approaches. Proceedings of Environmental Science,

1992, 5(1): 45-56.

[3] Can A. Specification and estimation of hedonic housing

price models. Regional Science and Urban Economics,

1992, 22: 453-474.

[4] Fotheringham A S, Brunsdon C. Local forms of spatial

analysis. Geographical Analysis, 1999, 31(4): 341-358.

[5] Orford S. Modeling spatial structures in local housing mar-

ket dynamics: A multilevel perspective. Urban Studies,

2000, 37(9): 1643-1671.

[6] Fotheringham A S, Charlton M E, Brunsdon C. Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis. Environment and Planning A, 1998, 30: 1905-1927.

[7] Gao Xiaolu, Asami Y, Chung C F. An empirical evaluation

of hedonic regression models. In: Proceedings of Joint In-

ternational Symposium on Geospatial Theory, Processing

and Applications. Ottawa, Canada, 2002.
