TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 11/21 pp344-353
Volume 10, Number 3, June 2005
Influence of Spatial Features on Land and Housing Prices*
GAO Xiaolu ( )**, ASAMI Yasushi
Japan Society for the Promotion of Science; Department of Urban Research, National Institute for Land and Infrastructure Management, Japan;
Center for Spatial Information Science, the University of Tokyo, Tokyo 113-8654, Japan
Abstract: The analysis of hidden spatial features is crucial for the improvement of hedonic regression
models for analyzing the structure of land and housing prices. If critical variables representing the influence
of spatial features are omitted in the models, the residuals and the coefficients estimated usually exhibit
some kind of spatial pattern. Hence, exploration of the relationship between the spatial patterns and the
spatial features essentially leads to the discovery of omitted variables. The analyses in this paper were
based on two exploratory approaches: one on the residual of a global regression model and the other on the
geographically weighted regression (GWR) technique. In the GWR model, the regression coefficients are
allowed to differ by location so more spatial patterns can be revealed. Comparison of the two approaches shows
that they play complementary roles in the detection of lot-associated variables and area-associated variables.
Key words: spatial features; spatial variation; regression model; residual; geographically weighted
regression (GWR)
Introduction
When the hedonic regression approach is applied to
analyze the structure of land and housing prices, we
often seek hidden variables to improve the models. If
critical variables representing the influence of spatial
features are omitted in the models, the residuals and
the coefficients being estimated usually exhibit some
kind of spatial pattern. Analysis of the relationships
between the spatial patterns and the spatial features will
lead to the discovery of omitted variables. For this
reason, analysis of the spatial patterns is very important.
In recent years, a variety of approaches have been
proposed for exploring spatial patterns. This work
analyzes two methods. The first method studies the
patterns of the residuals with a standard regression
model to investigate the influence of spatial features on
the patterns. Since the regression coefficients are
constant over space by definition of the model, this
method is a global regression method. The second
method uses the geographically weighted regression
(GWR) approach, which belongs to the family of local
regression methods. A GWR model allows the
regression coefficients to differ with location, and thus
more spatial patterns are revealed, which can be used
to explore the impacts of spatial features. The two
approaches are compared to show how they play
complementary roles in improving hedonic regression
models.
Received: 2003-07-10; revised: 2003-12-08
* Supported by the Special Coordination Funds for Promoting Science
and Technology, and the Research Grant-In-Aid provided by
the Ministry of Education, Culture, Sports, Science, and Technology,
Japan
** To whom correspondence should be addressed.
E-mail: [email protected]; Tel: 81-29-864-3839
1 Analysis of the Residuals of the Global Regression Model
The Gao and Asami[1] sample, which includes the
transaction price and the detailed attributes of 190
detached housing lots in Setagaya Ward, western
Tokyo, was used for the analyses. We start with the
hedonic regression model given by Gao and Asami[1],
which models the unit price of the 190 properties with
a stepwise regression method using 16 independent
variables. All 16 coefficients are significant at the 5%
level or better. Model A in Table 1 lists the estimated
coefficients for this model.
Table 1 Unit price regression model results
Regression coefficient (million Yen/m² per unit)

| Variable | Definition of variable | Model A | Model B | Model C |
|---|---|---|---|---|
| Constant | | 0.9115** (9.17) | 0.8600** (8.81) | 0.9127** (9.20) |
| Actual_FAR | Building floor area / lot area | 0.1276** (3.22) | 0.1360** (3.54) | 0.1122** (2.81) |
| Train | Time to the nearest train station (min) | -0.0157** (-9.61) | -0.0154** (-9.72) | -0.0166** (-9.98) |
| Road_width | Width of road fronting on lot (m) | 0.0209** (2.86) | 0.0212** (3.00) | 0.0162* (2.28) |
| Bldg_duration/S | Remaining age of house (a) / lot area | 0.5686** (6.42) | 0.5300** (6.11) | 0.3899** (4.02) |
| Landscaping | Within landscape zones(1), 1; otherwise, 0 | -0.1726** (-8.46) | -0.1660** (-8.33) | -0.1860** (-8.77) |
| Shinjuku | Time to Shinjuku CBD by train (min) | -0.0168** (-6.60) | -0.0164** (-6.61) | -0.0146** (-5.92) |
| Frontage | Lot frontage (m) | 0.0058* (2.38) | 0.0054* (2.26) | 0.0049* (2.15) |
| Good_pavement | Front road pavement is good, 1; otherwise, 0 | 0.0420** (2.80) | 0.0417** (2.87) | 0.0339* (2.33) |
| Parking_lot | Number of parking lots | 0.0382** (3.54) | 0.0421** (4.00) | 0.0482** (4.60) |
| Bldg_quality | Building quality in the district is good, 1; otherwise, 0 | 0.0575** (3.51) | 0.0627** (3.93) | 0.0610** (3.89) |
| Sunshine/S | Sunshine duration of house (h) / lot area | 0.9476** (2.67) | 0.9750** (2.83) | 0.8488* (2.51) |
| Adj_park | Adjacent to park, 1; otherwise, 0 | -0.1956** (-2.97) | -0.2410** (-3.70) | -0.2468** (-3.90) |
| Adj_park/S | Adj_park / lot area | 21.4547** (3.14) | 25.2110** (3.75) | 26.2212** (4.01) |
| Mixed_use | Non-residential uses are more than 1/3 in the district, 1; otherwise, 0 | 0.2384** (2.64) | 0.2290** (2.61) | 0.2973** (3.38) |
| Mixed_use/S | Mixed_use / lot area | -17.4766* (-2.44) | -17.1450* (-2.47) | -23.1857** (-3.31) |
| Tree | Greenery is good in the district, 1; otherwise, 0 | 0.0335* (1.99) | 0.0353* (2.16) | 0.0368* (2.23) |
| 500 m_large_parks | Within 500 m to parks over 5000 m², 1; otherwise, 0 | | 0.0502** (3.46) | 0.0571** (4.00) |
| Pop_den | Density of population in block (100 person/hm²) | | | -5.9931* (-2.42) |
| % of road area/S | Road coverage ratio in block | | | 47.9868** (2.82) |
| R-square | | 0.756 | 0.772 | 0.789 |
| Adjusted R-square | | 0.734 | 0.749 | 0.766 |
| AIC | | -876.11 | -887.12 | -897.02 |

*: Significant at 5% level. **: Significant at 1% level. T-statistic is given inside brackets.
(1): Planning controls in landscape zones are stricter than in other districts. This helps to explain the negative coefficient of this variable.
Because this model assumed that the estimated re-
gression coefficients are constant over space, it is
called a global regression model.
One way to improve a hedonic regression model is
to identify critical spatial features that have been ne-
glected in the model. Our strategy is to investigate the
spatial pattern of the residuals of the global regression
model to find spatial features that have some relation-
ship with the spatial pattern, then to build an improved
model including these features. The 190 sample prop-
erties were plotted on a contour map, Fig. 1, with the
regression residuals associated with each sample point.
This contour map was generated with a spline method.
The following features were then examined.
Fig. 1 Contour map of regression residuals
1.1 Public facilities
According to survey results, poor accessibility to public
facilities, such as parks, libraries, and community
centers, is among the items most often cited by residents
as causes of dissatisfaction with their living environments
(1993 National Housing Survey of Japan).
The distribution of public facilities, including
schools, hospitals, community centers, parks, sports
facilities, and water treatment plants, can be roughly
related to the residual map in Fig. 1.
Therefore, the distances from the housing lots to
these facilities were included in the regression analysis
to indicate the effects of these facilities. The new dis-
tance variables that correlated relatively strongly to the
residuals were added to the regression function speci-
fied in Eq. (1), which is the same as that of Model A,
and their significance levels were examined.
$$P/S = \text{constant} + \sum_{i=1}^{k} c_i (X_i/S) + \sum_{i=1}^{k} d_i X_i + \varepsilon \tag{1}$$
where P is the price of land lot and house; S is lot area;
constant is the intercept; Xi is the independent variable
representing attribute i; ci and di are parameters for i ∈
{1,…,k}, indicating hedonic prices; and ε is the
error term.
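As a rough illustration of how a specification like Eq. (1) can be estimated, the sketch below stacks an intercept, the attributes divided by lot area, and the raw attributes into one design matrix and fits it by ordinary least squares. It is only a sketch under stated assumptions: the data are synthetic, and the names `fit_unit_price_model`, `P`, `S`, and `X` are hypothetical rather than taken from the original study, which used a stepwise procedure.

```python
import numpy as np

def fit_unit_price_model(P, S, X):
    """Least-squares fit of P/S = constant + sum_i c_i*X_i/S + sum_i d_i*X_i + error.

    P: (n,) prices; S: (n,) lot areas; X: (n, k) attributes.
    Returns (constant, c, d), where c and d each have length k.
    """
    P, S, X = np.asarray(P, float), np.asarray(S, float), np.asarray(X, float)
    n, k = X.shape
    # Columns: intercept, X_i / S (price effects), X_i (unit-price effects).
    design = np.column_stack([np.ones(n), X / S[:, None], X])
    coef, *_ = np.linalg.lstsq(design, P / S, rcond=None)
    return coef[0], coef[1:1 + k], coef[1 + k:]

# Synthetic check with hypothetical numbers (not the Setagaya sample).
rng = np.random.default_rng(0)
n = 200
S = rng.uniform(80.0, 300.0, n)
X = rng.normal(size=(n, 2))
true_c = np.array([20.0, 0.0])   # attribute 1 acts on the total price P
true_d = np.array([0.0, 0.05])   # attribute 2 acts on the unit price P/S
unit_price = 0.9 + (X / S[:, None]) @ true_c + X @ true_d
const, c_hat, d_hat = fit_unit_price_model(unit_price * S, S, X)
```

Because the synthetic data contain no error term, the fit recovers the assumed parameters; with real data each ci and di would be judged by its t-statistic, as in Table 1.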
This specification identifies the variables that affect
the price and that affect the unit price. If di is signifi-
cantly different from 0, Xi has a significant effect on
the unit price (P/S). If ci is significantly different from
0, Xi has a significant effect on the price (P). If ci and di
for the same variable are both significant, Xi has a
variable effect as the lot size changes.
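To see why ci and di separate the two kinds of effect, it can help to multiply Eq. (1) through by S. The short derivation below only restates Eq. (1), under the assumption that the attributes Xi do not themselves depend on S.

```latex
\begin{aligned}
P &= \text{constant}\cdot S + \sum_{i=1}^{k} c_i X_i + S\sum_{i=1}^{k} d_i X_i + \varepsilon S, \\
\frac{\partial P}{\partial X_i} &= c_i + d_i S, \qquad
\frac{\partial (P/S)}{\partial X_i} = \frac{c_i}{S} + d_i .
\end{aligned}
```

Read this way, ci shifts the total price by a fixed amount regardless of lot size, di shifts the unit price by a fixed amount, and when both are nonzero the marginal effect on the unit price, di + ci/S, changes with S.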
The results show that, among the new variables, the
proximity to a large park has a significant positive effect
on the unit prices of the land and house. Different
threshold sizes for a large park (1000,
2000, 3000, 4000, 5000, 6000, 8000, 10 000, and
20 000 m²) and different distances from the large park
(400, 450, 500, 600, 800, and 1000 m) were tested; the
dummy variable for lots within 500 m of parks over 5000
m² gave the best fit. With this variable, the R-square of
the regression model increased from 0.756 to 0.772
and the Akaike information criterion (AIC) decreased
from -876 to -887. The second model in Table 1,
Model B, lists the new results.
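The threshold search described above can be sketched as a grid search over park size and distance, refitting the regression for each candidate dummy and keeping the best fit. This is a minimal sketch, not the authors' procedure: the coordinates and prices are synthetic, R-square stands in for whatever fit criterion was actually compared, and `park_dummy`, `r_squared`, and `search_thresholds` are hypothetical helper names.

```python
import numpy as np
from itertools import product

def park_dummy(lots, parks, areas, min_area, max_dist):
    """1 if a lot lies within max_dist of any park of at least min_area, else 0."""
    big = parks[areas >= min_area]
    if len(big) == 0:
        return np.zeros(len(lots))
    dist = np.linalg.norm(lots[:, None, :] - big[None, :, :], axis=2)
    return (dist.min(axis=1) <= max_dist).astype(float)

def r_squared(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def search_thresholds(base, y, lots, parks, areas, sizes, dists):
    """Return (R-square, min_area, max_dist) for the best-fitting dummy."""
    best = None
    for min_area, max_dist in product(sizes, dists):
        dummy = park_dummy(lots, parks, areas, min_area, max_dist)
        r2 = r_squared(np.column_stack([base, dummy]), y)
        if best is None or r2 > best[0]:
            best = (r2, min_area, max_dist)
    return best

# Synthetic illustration (hypothetical coordinates in metres).
rng = np.random.default_rng(1)
lots = rng.uniform(0.0, 3000.0, size=(150, 2))
parks = rng.uniform(0.0, 3000.0, size=(12, 2))
areas = rng.uniform(500.0, 25000.0, size=12)
x1 = rng.normal(size=150)
true_dummy = park_dummy(lots, parks, areas, 5000, 500)
y = 0.9 + 0.1 * x1 + 0.05 * true_dummy + rng.normal(scale=0.001, size=150)
base = np.column_stack([np.ones(150), x1])   # stands in for the Model A variables
sizes = [1000, 2000, 3000, 4000, 5000, 6000, 8000, 10000, 20000]
dists = [400, 450, 500, 600, 800, 1000]
best_r2, best_area, best_dist = search_thresholds(base, y, lots, parks, areas, sizes, dists)
```

Several threshold pairs can produce the same dummy vector when no park falls between two size thresholds, so the search identifies the best dummy rather than a unique pair of thresholds.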
In practice, parks over 5000 m² are designated as regional
parks. The results suggest that the unit price of
lands and houses within 500 m (about 7-8 min walking
distance) of large parks is approximately 0.05 million
Yen per square meter higher than in other areas. This
estimate is quite reasonable and agrees well with other
studies[2]. The results also imply that providing large parks is an
effective way to improve the environment of areas
that lack them.
1.2 District environmental indices
The residual map was also compared to a variety of
environmental indicators in the blocks. The correlation
between these indicators and the residuals was not as
strong as expected, but the regression analysis showed
that the population density (pop_den) and the road
coverage ratio (% of road area) were significant. The
unit prices of lands and houses were higher in the areas
with greater road coverage ratios or lower population
densities.
Model C in Table 1 shows the results with the road
coverage ratio and population density included. Model
C gives the best fit among the three models from the
values of R-square and AIC. The estimated coefficients
for this model were similar to those of the other two
models.
The residual map and the correlations between the
residuals and the spatial features were used to analyze
the effects of various variables. The correlations of the
spatial features with the residuals do not immediately
reflect their significance in Model C. For example, the
correlation coefficients for population density and the
ratio of road area / lot size with the residuals were fairly
weak but these two variables were significant in Model
C. This is because the new variables indicating spatial
features may be correlated with the variables originally
included in Model A. To identify the significant vari-
ables, the regressions were run with the new spatial
features (to be called Z) as dependent variables and the
original variables of Model A as independent variables.
The residuals of these regression analyses were called
residual_Z. The residual_Z were then correlated with
the residuals of Model A. Some examples are given in
Table 2, where the values in the second column are the
correlation of Z with the residual of Model A, and the
values in the third column are the correlation of residual_Z
with the residual of Model A. The variables having
high correlation coefficients in the third column
correspond well to those found significant in Model C.
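The residual_Z procedure can be sketched in a few lines: regress the price on the original variables, regress the candidate feature Z on the same variables, and correlate the two sets of residuals. The example is a sketch on synthetic data; `ols_residuals` and `residual_correlations` are hypothetical names, and a single regressor `x1` stands in for the 16 variables of Model A.

```python
import numpy as np

def ols_residuals(design, target):
    """Residuals of an ordinary least-squares regression of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def residual_correlations(design, y, z):
    """Return (corr(Z, residual), corr(residual_Z, residual)) for one feature Z."""
    resid = ols_residuals(design, y)      # residual of the original model
    resid_z = ols_residuals(design, z)    # part of Z not explained by the model variables
    return np.corrcoef(z, resid)[0, 1], np.corrcoef(resid_z, resid)[0, 1]

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(2)
n = 190
x1 = rng.normal(size=n)
z = 0.9 * x1 + 0.4 * rng.normal(size=n)       # new feature, correlated with x1
y = 1.0 + 0.5 * x1 + 0.3 * z + 0.2 * rng.normal(size=n)
design = np.column_stack([np.ones(n), x1])    # intercept plus the included variable
r_z, r_resid_z = residual_correlations(design, y, z)
```

When the design matrix contains an intercept, the second correlation is never smaller in absolute value than the first, because removing the part of Z explained by the included variables leaves the covariance with the residual unchanged while shrinking the variance of Z. This matches the pattern of the two columns in Table 2.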
Table 2 Correlation of spatial features with residuals of Model A

| Spatial feature variables (Z) | Correlation coefficient: Z and residual | Correlation coefficient: Residual_Z and residual |
|---|---|---|
| 500 m_large_parks | 0.243** | 0.255** |
| Pop_den | 0.068 | 0.114* |
| % of road area/S | 0.099 | 0.181** |
| % of public open space(1)/S | 0.013 | 0.016 |
| % of vacant land(2)/S | 0.063 | 0.098 |

*: Significant at 5% level (one-tail);
**: Significant at 1% level (one-tail).
(1): Coverage ratio of parks and playgrounds in block;
(2): Coverage ratio of vacant parcels in block.
However, the residual analysis method has a serious
shortcoming: while a residual map can reveal the ef-
fects of the most significant features, it cannot as easily
reveal the presence of weak effects or the presence of
many co-existing spatial effects. Sometimes, the spa-
tial patterns of the residuals may be just too compli-
cated to be described with one or several global fea-
tures. In such cases, a more formal framework is
needed to explore the influences of the spatial features.
Techniques that focus on the localized estimates of co-
efficients are very useful for this purpose.
2 Local Concerns and Geographically Weighted Regression
In the field of spatial analysis, there has been increasing
interest in local forms and local modeling in
recent years. A variety of new techniques have been
developed that focus on identifying spatial variations
in relationships rather than on establishing
global statements of spatial behavior, e.g., local point
pattern analyses, local graphical approaches, local
measures of spatial dependency, the spatial expansion
method, adaptive filtering, multilevel modeling, GWR,
random coefficient models, autoregressive models, and
local forms of spatial interaction models[3-5].
Among these techniques, GWR is thought to be a
particularly good exploratory method to assist model-
ing. The GWR theory is based on the model given by
Fotheringham et al.[6].
Consider the global regression model given by
$$y_i = a_0 + \sum_k a_k x_{ik} + \varepsilon_i \tag{2}$$
The GWR technique extends the traditional regres-
sion framework of Eq. (2) by allowing local rather than
global parameters to be estimated with the model re-
written as
$$y_i = a_{0i} + \sum_k a_{ki} x_{ik} + \varepsilon_i \tag{3}$$
where aki represents the value of ak at point i. With Eq.
(3), regressions run at various locations give different
estimates.
To estimate the parameters in Eq. (3), an observa-
tion is weighted in accordance with its proximity to
point i, so the weighting of the observation is no longer
constant but varies with i. Data from observations
close to i are weighted more than data from observa-
tions far away. In vector form, a GWR model can be
written as
$$y = X a_i + W_i^{-1/2}\varepsilon \tag{4}$$
Then, Eq. (5) gives the estimate of ai as
$$\hat{a}_i = (X^{T} W_i X)^{-1} X^{T} W_i y \tag{5}$$
where X is the independent variable matrix, y is the
dependent variable vector, and Wi is an n × n matrix
whose diagonal elements (wij) denote the geographical
weighting of the observed data for point i and whose
off-diagonal elements are zero. n is the number of
samples.
There may be various definitions for wij, for exam-
ple, the reciprocal of the Euclidean distance between i
and j. A global regression can be seen as a specific
case of Eq. (4) where the weightings are unity.
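The weighted estimator of Eq. (5) can be sketched for a single location as below. This is a minimal illustration on synthetic data; the function name `gwr_local_estimate` is hypothetical, and the diagonal of Wi is passed as a plain vector so that the n × n matrix never has to be formed. The last two lines check the remark above that unit weights reproduce the global regression.

```python
import numpy as np

def gwr_local_estimate(X, y, weights):
    """Estimate a_i = (X^T W_i X)^{-1} X^T W_i y for one location i.

    X: (n, p) design matrix; y: (n,) responses;
    weights: (n,) diagonal elements w_ij of W_i.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    XtW = X.T * np.asarray(weights, float)   # same as X.T @ np.diag(weights)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)
a_unit = gwr_local_estimate(X, y, np.ones(n))     # all weightings equal to unity
a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)     # ordinary global regression
```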
To be more adaptable, Fotheringham et al.[6] defined
wij as
$$w_{ij} = \exp\left(-d_{ij}^{2}/\beta^{2}\right) \tag{6}$$
where dij represents the Euclidean distance between
points i and j, and β is a bandwidth. With Eq. (6), if j
coincides with i, the weighting of the data at that point
will be unity, and the weighting of other data will decrease
as dij increases. For data far from i, the weighting
will be essentially zero, effectively excluding these
observations from the estimates of parameters for location
i.
The choice of bandwidth is related to the trade-off
between bias and variance. The greater the local sample
size, the lower the standard errors of the coefficient
estimates are. But this must be offset against the fact
that enlarging the subset increases the chance that
coefficient drift introduces bias. Therefore, the
selection of an appropriate value for β is critical.
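The distance-decay weighting of Eq. (6) can be illustrated with a few lines of code. The sketch assumes the form wij = exp(-dij²/β²), with `beta` standing for the bandwidth; the three coordinates are hypothetical points in kilometres, and the bandwidth of 1.243 km is the value reported later for this data set.

```python
import numpy as np

def gaussian_weights(coords, i, beta):
    """w_ij = exp(-d_ij^2 / beta^2) for all j, relative to location i."""
    d2 = np.sum((coords - coords[i]) ** 2, axis=1)   # squared Euclidean distances
    return np.exp(-d2 / beta ** 2)

# Hypothetical locations: the point itself, one 1 km away, one 5 km away.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]])
w = gaussian_weights(coords, 0, beta=1.243)
```

The point itself receives a weight of exactly one, the neighbor at 1 km a weight of roughly one half, and the observation at 5 km a weight that is practically zero, which is how distant data drop out of the local estimate.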
One way to select β is by minimizing
$$\sum_i [y_i - \hat{y}_i(\beta)]^{2} \tag{7}$$
where ŷi(β) is the fitted value of yi with a bandwidth
of β. However, if β becomes so small that the weightings
of all points except for i itself become negligible,
the value of Formula (7) becomes zero, but β = 0 is
meaningless. A cross-validation approach was proposed
by Fotheringham et al.[6] to solve this problem,
where the estimates at point i were calibrated with
samples near to i but excluding i. Accordingly, a cross-validation
(CV) score defined by
$$\text{CV score} = \sum_i [y_i - \hat{y}_{\neq i}(\beta)]^{2} \tag{8}$$
where ŷ≠i(β) is the fitted value of yi computed without observation i,
was computed and the optimal value of β was derived
by minimizing the CV score.
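The leave-one-out calibration behind the CV score can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes the Gaussian weighting wij = exp(-dij²/β²), searches a small hypothetical list of candidate bandwidths instead of a continuous optimizer, and uses synthetic data in which one coefficient drifts across space. The names `cv_score` and `select_bandwidth` are hypothetical.

```python
import numpy as np

def cv_score(X, y, coords, beta):
    """Sum of squared leave-one-out prediction errors for bandwidth beta."""
    score = 0.0
    for i in range(len(y)):
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)
        w = np.exp(-d2 / beta ** 2)
        w[i] = 0.0                                   # exclude observation i itself
        XtW = X.T * w
        a_i = np.linalg.solve(XtW @ X, XtW @ y)      # local estimate without i
        score += (y[i] - X[i] @ a_i) ** 2
    return score

def select_bandwidth(X, y, coords, candidates):
    """Return the candidate with the smallest CV score, plus all scores."""
    scores = [cv_score(X, y, coords, b) for b in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic illustration (hypothetical data, coordinates in km).
rng = np.random.default_rng(4)
n = 150
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x = rng.normal(size=n)
slope = 1.0 + 0.3 * coords[:, 0]                     # coefficient drifts from west to east
y = 1.0 + slope * x + 0.1 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
candidates = [1.0, 2.0, 4.0, 1000.0]                 # 1000 km behaves like a global model
best_beta, scores = select_bandwidth(X, y, coords, candidates)
```

Because the slope varies over the study area, a local bandwidth gives a lower CV score than the very large one, which is effectively the global regression.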
The localized parameter estimates obtained from
GWR exhibit a high degree of variability over space.
These spatial patterns reveal the spatial nature of rela-
tionships and the spatial consequences of modeling
such relationships, which can be used for model
improvement.
3 Analysis of Housing and Land Prices with GWR
The GWR method was used to analyze the 190-sample
data set. The same independent variables as in Model
A were used, but the parameters to be estimated varied
with location.
$$(P/S)_i = \text{constant}_i + \sum_k c_{ki}(X_k/S) + \sum_k d_{ki} X_k + \varepsilon_i \tag{9}$$
The localized parameter matrix was estimated with Eq.
(5), where the weighting matrix Wi at point i was defined
as
$$W_i = \begin{pmatrix} w_{i1} & 0 & \cdots & 0 \\ 0 & w_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{in} \end{pmatrix},$$
with wij = exp(-dij²/β²).
A cross validation method was used to estimate the bandwidth β.
As Fig. 2 shows, the optimal value was 1.243 km. This
value of β was used with the regression analysis using
Eq. (9) to estimate the regression coefficients at each
sample point.
Fig. 2 CV score variation with bandwidth
Then, the coefficients for each of the 190 sample
points were plotted over the study area. Three exam-
ples will be given whose spatial patterns are relatively
evident: the constant term, actual_FAR (the marginal
effect of increasing the ratio of the actual building
floor area to the lot size), and tree (the marginal effect
of having many trees in the neighborhood). The graphs
of the three variables are shown in Figs. 3-5. In all
three examples, the estimates are positive. The extents
of the spatial variations are represented by the shadings
of the Voronoi polygons generated from the sample
points. The regression coefficients in the darker areas
are higher.
Fig. 3 Regression coefficients for the constant term
Fig. 4 Regression coefficients for the ratio of the building floor area to the lot size (actual_FAR)
Fig. 5 Regression coefficients for the abundance of trees (tree)
The large fluctuations of the coefficients estimated
by GWR indicate that it is unreasonable to assume
that they are constant and suggest the need for new variables.
If the spatial pattern of a coefficient is quite
uniform, then the variable and its form are appropriate.
Consider the spatial distribution of the constant term
in Fig. 3 as an example. The variation of the unit price
of land and houses is evident, even though 16 inde-
pendent variables have already been included. The
dark areas are concentrated in the Seijo area, a residen-
tial area known for expensive housing. This might ei-
ther be caused by a so-called “brand” effect, or be re-
lated to the fact that most lots in Seijo area are rea-
sonably large and the population density of this area is
lower than that of other areas. The constant terms are
lower in the north central and southeastern districts.
The distribution maps of actual_FAR (the ratio of the
actual building floor area to the lot size) in Fig. 4, of
tree (abundance of trees in the neighborhood) in Fig. 5,
and of the other variables help visually identify spatial
features that may affect the spatial patterns.
The GWR model provides much more information
than a global model. For example, Model A gives 17
parameters, while GWR yields 17 × 190 parameters that
suggest more linkages to omitted spatial features that
may affect the unit price of land and houses. As an ex-
ploratory method, the GWR method is especially use-
ful in that the mappings of results can be directly used
to find underlying relationships.
The weighting method used in the GWR method
gives higher weights to geographically adjacent obser-
vations, so the effect of geographically related features
such as the effects in the Seijo area are easily found,
since the model estimates around such areas are likely
to differ from those of other areas. However, for fea-
tures that lack geographical correlations, the GWR
method will not reveal their effects. For example, if
observations belonging to a certain income class or a
special interest are loosely distributed, their impact on
the land and house prices will not be captured by the
GWR method, because the number of observations in
each localized regression is limited. In such cases,
the global regression model is a more powerful
method for revealing their effects on the land and
house prices.
4 Comparison of Two Approaches
The performance of the GWR analysis was compared
to that of the residual analysis by examining the
Pearson’s correlation coefficients between the residual
and the spatial features excluded in the global
regression model, and the correlation coefficients
between the geographically weighted estimates and the
same group of spatial features. The new variables are
arranged into two groups. Lot-associated variables are
variables such as the unevenness degree (uneven) and
the distance to parks which are directly associated with
each lot (Table 3). Area-associated variables are
variables related to an area containing the lot, such as
the block population density (Table 4).
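The significance marks attached to such correlation coefficients can be checked with the usual t-test for a Pearson correlation. The sketch below assumes that test with n - 2 degrees of freedom and n = 190, which the text does not state explicitly; the two critical values are approximate one-tailed values for 188 degrees of freedom taken from standard t tables, and `correlation_t` and `significance_mark` are hypothetical names.

```python
import math

# Approximate one-tailed critical values of Student's t for 188 degrees of
# freedom (n = 190). These constants are assumptions from standard tables.
T_CRIT_5PCT = 1.653
T_CRIT_1PCT = 2.346

def correlation_t(r, n):
    """t-statistic for testing a Pearson correlation r from n observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def significance_mark(r, n=190):
    """Return '**', '*', or '' for the 1% and 5% one-tailed levels."""
    t = abs(correlation_t(r, n))
    if t >= T_CRIT_1PCT:
        return "**"
    if t >= T_CRIT_5PCT:
        return "*"
    return ""

marks = {r: significance_mark(r) for r in (0.255, 0.164, 0.115)}
```

Under these assumptions a correlation needs to exceed roughly 0.12 in absolute value to be significant at the 5% level and roughly 0.17 at the 1% level.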
Table 3 Correlation coefficients for lot-associated variables

| | Uneven(2) | Distance to park: 50 m(3) | Distance to park: 500 m(3) | Distance to park: 600 m(3) | Distance to park: 800 m(3) | 500 m_large_parks(4) |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.025 | 0.029 | 0.164* | 0.122* | 0.004 | 0.255** |
| GWR estimates: | | | | | | |
| Constant | 0.006 | 0.202** | 0.089 | 0.065 | 0.000 | 0.056 |
| Actual_FAR | 0.064 | 0.075 | 0.152* | 0.088 | 0.067 | 0.014 |
| Train | 0.071 | 0.054 | 0.113 | 0.110 | 0.058 | 0.159* |
| Road_width | 0.047 | 0.142* | 0.013 | 0.012 | 0.073 | 0.012 |
| Bldg_duration/S | 0.064 | 0.072 | 0.088 | 0.026 | 0.048 | 0.077 |
| Landscaping | 0.032 | 0.052 | 0.032 | 0.064 | 0.038 | 0.048 |
| Shinjuku | 0.027 | 0.210** | 0.052 | 0.044 | 0.004 | 0.037 |
| Frontage | 0.075 | 0.124* | 0.196** | 0.126* | 0.073 | 0.026 |
| Good_pavement | 0.028 | 0.158* | 0.039 | 0.007 | 0.008 | 0.032 |
| Parking_lot | 0.003 | 0.051 | 0.027 | 0.065 | 0.073 | 0.002 |
| Bldg_quality | 0.083 | 0.053 | 0.009 | 0.070 | 0.177** | 0.090 |
| Sunshine/S | 0.115 | 0.169** | 0.037 | 0.021 | 0.009 | 0.050 |
| Adj_park/S | 0.023 | 0.051 | 0.019 | 0.012 | 0.031 | 0.051 |
| Adj_park | 0.012 | 0.055 | 0.018 | 0.008 | 0.028 | 0.031 |
| Mixed_use/S | 0.005 | 0.057 | 0.072 | 0.034 | 0.108 | 0.077 |
| Mixed_use | 0.005 | 0.049 | 0.063 | 0.027 | 0.112 | 0.087 |
| Tree | 0.031 | 0.137* | 0.074 | 0.043 | 0.022 | 0.051 |

*: Significant at 5% level (one-tail);
**: Significant at 1% level (one-tail).
(1): This row shows the correlations of residual_Z with the residual of Model A.
(2): Variable indicating the unevenness degree of land around lot.
(3): Within 50 m to parks, 1; otherwise, 0. The definitions for 500 m (600 m, 800 m) to parks are similar.
(4): Within 500 m to large parks. Large parks indicate the parks larger than 5000 m².
Table 4 Correlation coefficients for area-associated variables

| | Seijo | % of road area | % of road area/S | % of public open space | % of public open space/S | % of vacant land |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.162* | 0.180** | 0.181* | 0.01 | 0.016 | 0.07 |
| GWR estimates: | | | | | | |
| Constant | 0.447** | 0.178** | 0.369** | 0.09 | 0.009 | 0.025 |
| Actual_FAR | 0.506** | 0.146* | 0.209** | 0.022 | 0.161* | 0.126* |
| Train | 0.368** | 0.047 | 0.076 | 0.201** | 0.256** | 0.133* |
| Road_width | 0.472** | 0.210** | 0.354** | 0.052 | 0.002 | 0.222** |
| Bldg_duration/S | 0.389** | 0.147* | 0.172** | 0.018 | 0.136* | 0.112 |
| Landscaping | 0.095 | 0.136 | 0.040 | 0.059 | 0.089 | 0.089 |
| Shinjuku | 0.290** | 0.234** | 0.342** | 0.138* | 0.085 | 0.054 |
| Frontage | 0.411** | 0.057 | 0.224** | 0.039 | 0.050 | 0.099 |
| Good_pavement | 0.146* | 0.297** | 0.269** | 0.113 | 0.109 | 0.203** |
| Parking_lot | 0.451** | 0.098 | 0.238** | 0.117 | 0.008 | 0.125* |
| Bldg_quality | 0.341** | 0.114 | 0.192** | 0.122* | 0.102 | 0.180** |
| Sunshine/S | 0.294** | 0.274** | 0.320** | 0.052 | 0.005 | 0.022 |
| Adj_park/S | 0.230** | 0.099 | 0.271** | 0.146* | 0.086 | 0.017 |
| Adj_park | 0.221** | 0.110 | 0.275** | 0.138* | 0.086 | 0.021 |
| Mixed_use/S | 0.526** | 0.045 | 0.245** | 0.177** | 0.069 | 0.132* |
| Mixed_use | 0.493** | 0.033 | 0.222** | 0.180** | 0.077 | 0.105 |
| Tree | 0.391** | 0.097 | 0.298** | 0.072 | 0.014 | 0.008 |

| | % of vacant land/S | Pop_den | C/R(2) | I/R(2) | (C+I)/R(2) |
|---|---|---|---|---|---|
| Residuals of global model(1) | 0.098 | 0.114 | 0.044 | 0.068 | 0.068 |
| GWR estimates: | | | | | |
| Constant | 0.224** | 0.458** | 0.194** | 0.057 | 0.136* |
| Actual_FAR | 0.296** | 0.481** | 0.336** | 0.289** | 0.367** |
| Train | 0.133* | 0.015 | 0.296** | 0.078 | 0.093 |
| Road_width | 0.323** | 0.438** | 0.044 | 0.065 | 0.022 |
| Bldg_duration/S | 0.275** | 0.549** | 0.278** | 0.393** | 0.410** |
| Landscaping | 0.060 | 0.282** | 0.157 | 0.136 | 0.022 |
| Shinjuku | 0.138* | 0.365** | 0.124* | 0.025 | 0.079 |
| Frontage | 0.132* | 0.736** | 0.187** | 0.349** | 0.334** |
| Good_pavement | 0.030 | 0.493** | 0.048 | 0.204** | 0.165* |
| Parking_lot | 0.215** | 0.482** | 0.199** | 0.171** | 0.217** |
| Bldg_quality | 0.202** | 0.373** | 0.109 | 0.087 | 0.114 |
| Sunshine/S | 0.157* | 0.161* | 0.103 | 0.088 | 0.009 |
| Adj_park/S | 0.214** | 0.382** | 0.042 | 0.177** | 0.143* |
| Adj_park | 0.215** | 0.346** | 0.027 | 0.156* | 0.122* |
| Mixed_use/S | 0.248** | 0.511** | 0.201** | 0.023 | 0.116 |
| Mixed_use | 0.221** | 0.510** | 0.200** | 0.048 | 0.133* |
| Tree | 0.204** | 0.620** | 0.093 | 0.174** | 0.167* |

*: Significant at 5% level (one-tail);
**: Significant at 1% level (one-tail).
(1): This row shows the correlations of residual_Z with the residual of Model A;
(2): C, I, and R indicate the area of commercial, industrial, and residential land used in block. For instance, I/R is the ratio of industrial to residential
used land in block, and (C+I)/R is the ratio of total commercial and industrial used land to residential used land in block.
4.1 The GWR method effectively characterized the area-associated variables
The results in Tables 3 and 4 imply that the residual
analysis and GWR methods differ greatly in their abil-
ity to detect the significance of lot-associated and area-
associated variables. The GWR estimates exhibit sig-
nificant correlations with such area-associated vari-
ables as the Seijo area indicator, population density,
road coverage ratio, and land use mixture. However,
the GWR estimates correlate much less strongly with the
lot-associated variables.
For instance, the population density (pop_den) had
strong positive correlations with the GWR estimates of
frontage and mixed_use, but strong negative correla-
tions with the spatial variations of bldg_duration/S and
tree. This implies that people living in densely populated
blocks tend to value lot frontage and mixed land use
more highly, and the remaining building age and
abundant trees less. Thus, pop_den is a proper variable
in the regression model. Likewise, % of road
area/S is also significantly correlated to the GWR
estimates.
In addition, the correlation between the GWR estimates
and another area-associated variable, Seijo
(within Seijo area, 1; otherwise, 0), is highly significant,
which implies that the hedonic prices for
actual_FAR, road_width, landscaping, frontage, and
parking_lot are significantly lower in the Seijo area,
while the prices for bldg_duration, bldg_quality, tree,
etc. are higher. Therefore, including the Seijo indicator
in the regression model is likely to smooth the spatial
variations, since some omitted physical features or
socio-economic characteristics of the people living in
this area might have caused the spatial variations.
However, when Seijo was added to the regression
model in Eq. (1), the resulting coefficient was not significant,
perhaps due to multicollinearity between
Seijo and other variables, especially road_width,
Shinjuku, tree, and pop_den. In fact, when Seijo was
included, the estimated coefficients for these other
variables fluctuated wildly.
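One common way to check for this kind of collinearity is the variance inflation factor (VIF), which the paper itself does not report; the sketch below is offered only as an illustration of the diagnostic. Each regressor is regressed on the others, and VIF = 1/(1 - R²) measures how much its coefficient variance is inflated. The data are synthetic, and the variable names merely echo the paper's regressors.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(5)
n = 190
road_width = rng.normal(size=n)
area_indicator = road_width + 0.1 * rng.normal(size=n)   # nearly collinear with road_width
frontage = rng.normal(size=n)                             # unrelated regressor
vif = variance_inflation_factors(np.column_stack([road_width, area_indicator, frontage]))
```

In this synthetic case the two nearly collinear columns receive very large VIFs while the unrelated one stays close to 1; large values for an added indicator would be consistent with unstable coefficients of the kind described above.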
Similarly, mixed land use indicators such as
((C+I)/R)/S, where (C+I)/R is the ratio of
commercial and industrial land to residential
land in a block, were significantly correlated with the
GWR estimates, but were not significant in the global
regression model.
4.2 Residual analysis characterized lot-associated variables
Tables 3 and 4 also show that the residual analysis ef-
fectively characterizes the lot-associated variables in
Model A, but does not as effectively reveal the influ-
ences of area-associated variables.
5 Conclusions
Residual analysis and GWR methods were used to ana-
lyze the influence of spatial features on housing and
land prices. Comparison of the two methods shows that,
for the data set in this paper, both methods are useful
for improving the hedonic regression models, with the
GWR method more effectively revealing the influence
of area-associated spatial features, while the residual
analysis more effectively identifies the influence of lot-
associated spatial features.
These results are probably related to the aggregation
effects of area-associated variables. Because they indi-
cate the features of the entire block, the multiple ob-
servations in the same block often had the same value,
so their values were geographically more uniform than
the lot-associated variables and the spatial variations
caused by the omitted area-associated variables were
easily captured by GWR. As a result, the correlations
of the area-associated variables with the GWR esti-
mates were stronger. This conclusion ought to be ex-
amined further, for example, by changing the aggrega-
tion of area-associated variables or by applying the
methods described in this paper to other data sets.
The localized analysis techniques not only facili-
tated the discovery of omitted spatial features but also
helped identify the effects of variables that tend to con-
flict with each other. Global regression models some-
times include variables having strong spatial variations,
such as the distance to the central city areas, which
tend to mask the effects of other more important fea-
tures. This problem was avoided by restricting the
sample area. With local modeling techniques like
GWR, even if the sample area is very large, the models
can give good estimates because the results vary with
location. In addition, the omitted spatial features are
not the only reasons for the variations of the regression
parameters. Market segmentation may also result in
different bidding prices for the same attribute. In such
situations, localized regression techniques will help
distinguish geographically separated markets.
A crucial issue underlying the analyses of local regression
models is the validation of the models. If, for
example, a global regression model outperforms a
GWR model, the rationale for including the spatial
variations revealed by GWR may be lost. For the data
set used in this paper, cross validation tests were used
to validate the models: the price of each sample was
predicted with a model fitted to all the remaining samples,
and the deviations of the predicted prices
from the observed prices were analyzed. The analyses demonstrate
that the global model is reasonable for this data set, but
the GWR model is slightly better[7].
The development of spatial analysis techniques is
providing more investigative tools for improving
hedonic regression models. Experience will give more
know-how on which technique is better: in which
situation certain techniques should be utilized, how to
assess different methods, etc. The comparison of the
residual analysis and GWR methods provides useful
information for evaluating and using these methods.
Acknowledgements
The authors gratefully acknowledge the valuable comments
from Prof. Atsuyuki Okabe, Prof. Yukio Sadahiro, Dr. Takaya
Kojima, Prof. Hongyu Liu, Dr. Chang-Jo Chung, and the mem-
bers of the Housing Economics Research Group, Japan.
References
[1] Gao Xiaolu, Asami Y. The external effects of local attributes on living environment in detached residential blocks in Tokyo. Urban Studies, 2001, 38(3): 487-505.
[2] Yazawa N, Kanemoto Y. The choice of variables in hedonic approaches. Proceedings of Environmental Science, 1992, 5(1): 45-56.
[3] Can A. Specification and estimation of hedonic housing price models. Regional Science and Urban Economics, 1992, 22: 453-474.
[4] Fotheringham A S, Brunsdon C. Local forms of spatial analysis. Geographical Analysis, 1999, 31(4): 341-358.
[5] Orford S. Modeling spatial structures in local housing market dynamics: A multilevel perspective. Urban Studies, 2000, 37(9): 1643-1671.
[6] Fotheringham A S, Charlton M E, Brunsdon C. Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis. Environment and Planning A, 1998, 30: 1905-1927.
[7] Gao Xiaolu, Asami Y, Chung C F. An empirical evaluation of hedonic regression models. In: Proceedings of Joint International Symposium on Geospatial Theory, Processing and Applications. Ottawa, Canada, 2002.