Structural and spatial determinants of London house prices, Vishal Kumar © 2013
GY240 Assessed Project 1

Part 1: Introduction and Literature Review

The history of land value economics stretches back to Alfred Marshall (1890), whose theory examines the question of land rent and land value at length. More prominent is the spatial model presented by von Thünen (1826). Alonso (1964) adopted von Thünen's theory of agricultural land use and applied it to urban regions (Chiaradia et al, 2009), describing cities as a circular area of residential properties surrounding a central business district (CBD) of a certain radius. The influences on a dwelling unit's price can be broken down into its accessibility to work, transport and amenities, and its structural characteristics, neighborhood and environmental quality (Muth, 1969).

The varied empirical work on this subject has produced an extensive list of attributes that researchers use when specifying their models. Different authors divide the attributes of a house into categories in different ways. Malpezzi (2002) identifies structural, locational and neighborhood, contract-dependent and time-specific attributes, whereas Sirmans et al (2009) mention the internal features of a home, external features, natural environment features and public services. With that in mind, this study must be aware of all of these aspects when making the analysis.

To that end, this project will firstly consider how specific structural and spatial characteristics affect the sale price of London houses, using GIS spatial mapping software and a hedonic regression model constructed in SPSS Statistics. Secondly, once the mapping and modeling are complete, this project will analyze the impact of spatial characteristics relative to structural characteristics of the house on the eventual sale price.
Structural determinants

Sirmans et al (2009) explain that a substantial part of real estate price variance can be explained by the variables age and floor area. The Nationwide house price index report (2011) states that a '10% increase in floor space adds 5% to the price of a typical home'. The idea that total floor size is the most influential determinant of house price is supported by the hedonic results of So et al (1997) and Wabe (1971), and by the conclusions of Lehner (2011).

Increasing the quality of life offered by a dwelling unit makes it more desirable to potential buyers. Adding a bedroom and bathroom through a loft conversion can add up to 23% to the property value (Nationwide, 2011). Further to this point, adding one more toilet and central heating to a house increases the price by up to 40.4% and 4.9% respectively (Selim, 2008: 74). Figure 1, from the Nationwide (2010) report, shows that in London adding a second bathroom and full central heating increases the gross value by 15.6% and 8.3% respectively. 'The jump from all houses without central heating to those with full central heating can increase the average value by £572' (Wabe, 1971: 254).

Age is normally used as a proxy for residential depreciation in terms of deterioration and has a negative impact on the price in all cases (Ong and Ho, 2003). This is supported by the analysis of So et al (1997), who find a negative relationship between the age of a house and its price, because older properties tend to be inferior in quality compared to newly completed units (So et al, 1997: 44). However, in the case of London, 'Older properties are also typically worth more, with a Jacobean property worth 30% more than one built in the fifties', as reported in the Nationwide (2010) report. 'The premium will reflect the rarity of such properties whose supply is obviously fixed, as well as the prestige of owning a piece of history in the form of a listed building' (Nationwide, 2010). So the literature has mixed views on this determinant.
Spatial determinants

Since the number and nature of influences on house prices are large, one cannot determine house prices solely from the individual characteristics of the dwelling itself (So et al, 1997: 41). Traditional location theory looks at how house prices are linked to accessibility to central locations, with land prices declining with distance from the CBD (Alonso, 1964). Accessibility to employment centers is jointly purchased with the dwelling, in that higher prices are compensated by lower costs of commuting to the CBD (So et al, 1997). The most important points of interest seem to be public transport access points and working areas. Large distances from these kinds of areas turned out to have a significant negative impact on property prices (Lehner, 2011: 22). This study will explore the role of accessibility and the impact of open space, in the form of population density, on house prices in London.

Bajic (1983) records the impact of a new subway line on house prices. Accessibility to a station reduces commuting time and produces a direct saving in commuting costs, with the effect of a subway on market value put at $2237 in the Spadina area (Bajic, 1983: 156). Moreover, research by Baum-Snow and Kahn (2000) shows that a decrease in distance to a railway connection from 3km to 1km increases the mean price of houses by $4972, as tested in 5 US cities between 1980 and 1990 (Gibbons and Machin, 2008: 110). Finally, every kilometer increase in distance between a home and the nearest London Underground or National Rail station leads to a 1-4% decrease in the house price (Gibbons and Machin, 2005).

Irwin (2002) observes the impact of open space attributes on property prices. Population density will be used as a proxy for diminished open space in this study.
It is clear from the literature that residential sale prices are expected to decrease as population density rises, with a recorded decrease of up to 2% visible in the regression results (Irwin, 2002: 470). The literature review showed that floor area has a strong impact on a housing unit's price. It is furthermore expected that older flats yield lower prices than newer ones; however, properties in London that are considered rare or are in fixed supply bear a higher premium. Negative price impacts are also expected from the distance to the CBD, from the distance to other points of interest such as tube stations, and from increased population density.
Figure 1: Nationwide (2010)
Part 2: Description of the Dataset

The data set consists of 2444 recent London property sales transactions recorded over a single 3-month period. This section uses tables and graphs to describe both the dependent and independent variables, and explains any important information that needs to be considered.

Description of Dependent Variable: Sale price

The positive skewness of 2.330 of the dependent variable, sale price, stands out in Table 1; this is graphically depicted in Figure 2. The regression model must be linear for OLS estimation (Studenmund, 2011), therefore the raw 'sales price' variable cannot be used. Taking the natural log of the sale price transforms the variable so that it is approximately normally distributed about the mean, as depicted on the right-hand side of Figure 2. Table 1 shows that the skewness of the log of the price is 0.508, close enough to normal, so the log of the price will be used in the regression model.
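The effect of the log transformation on skewness can be illustrated with a short sketch. It uses hypothetical log-normal prices rather than the project's data, and the plain moment-based skewness formula; statistical packages often apply a small-sample adjustment, so their figures may differ slightly.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed prices (log-normal draws), NOT the project's data.
random.seed(0)
prices = [math.exp(random.gauss(11.86, 0.46)) for _ in range(2444)]
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)
print(f"skewness of price:     {raw_skew:.3f}")
print(f"skewness of log price: {log_skew:.3f}")
```

On data of this shape the raw prices have a long right tail, while the logged prices are close to symmetric, which is the pattern Table 1 reports.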
Table 1: Description of the Statistics - Dependent

                         Sale price     Log of the price
Valid                    2444           2444
Missing                  0              0
Mean                     159127.3733    11.8625271
Median                   135000.0000    11.8130300
Std. Deviation           87968.19503    0.46061353
Skewness                 2.330          .508
Std. Error of Skewness   .050           .050

Figure 2: Sale price and log-sale price histograms

Description of the Independent Variables

Table 2 describes the characteristics of the independent variables in the data, with the structural variables (Floorm2, Bathrooms, Age, Chnone) listed first and the spatial variables (DistCBD, DistTube, PopDens) listed last. The range of each independent variable shows the depth and spread of the data. A correlation matrix using Pearson's correlation coefficient is also used to show the relationship between each independent variable and the dependent variable, indicating the relative impact of each variable on sales price. Table 3 shows that floor size has the biggest impact, which is in line with the literature.

Table 2: Description of the Statistics - Independent

Variable    Mean        Std. Deviation   Maximum     Minimum   Range
Floorm2     87.87       34.507           270         25        245
Bathrooms   1.09        .307             3           1         2
Age         71.67       36.169           401         0         401
Chnone      .08         .276             1           0         1
DistCBD     12549.579   5281.823         25939.974   765.935   25174.039
DistTube    2813.04     2798.593         18351.401   0.00      18351.402
PopDens     43.161      3777.844         717.527     0.118     717.527

Table 3: Correlation Matrix using Pearson's Correlation Coefficient

Variables   Lnprice   Floorm2   Bathrooms   Age    chnone   distCBD   distTube   popdens
Lnprice     1         .696      .438        .301   -.210    -.254     -.209      -.036

Description of the Independent Spatial Variables

In order to visualize the independent spatial variables, GIS mapping was undertaken. The area enclosed by Holborn, Tottenham Court Road, Bank and Piccadilly tube stations defines the CBD; this is visualized in Figure 4. The Kriging maps below visually capture the relationship between the 3 spatial variables and house prices in London.

Figure 3: Kriging map showing the relationship between distance to the CBD and property price

Figure 3 shows the relationship between distance to the CBD and its effect on house prices. The colours ranging from yellow to red show the properties on the map, where yellow is the bottom quartile of all house prices and red is the upper quartile. Figure 3 shows that the majority of the red properties are located within the innermost circles around the CBD, whereas the majority of the yellow properties are located within the outermost layers. Figure 3 tells us that the majority of the most expensive houses are clustered around the CBD.
Figure 4: Map showing the CBD of London. Source: Google Maps

Figure 5 shows the relationship between the distance from a tube station and its effect on property prices. The same colour code is again used for properties in the Kriging map. The colours ranging from green to white show the distance of a property from its nearest tube station, where green indicates closer. Figure 5 shows that nearly all of the red properties are in the green area, which indicates that the majority of the expensive houses are very close to a tube station. However, there are some deviations from both of these conclusions: a small number of highly priced properties in the North West and South East of London lie on the outskirts, far away from both the CBD and tube stations.

Figure 5: Kriging map showing the relationship between distance to a tube station and property price

Figure 6 shows that only in the most densely populated area of London, west of the CBD, do we see the most expensive houses and no cheap houses. This is the opposite of what the literature suggests. However, expensive houses are also found in areas with lower population density, namely north of the CBD. We can also see that the majority of the lower quartile properties are in the least populated areas. Figure 6 has a complicated pattern, and thus a conclusive relationship cannot be drawn purely from mapping the data.

The Kriging maps are useful when analyzing the data visually; however, they don't provide any information about the causal effect of these variables on sales price, as is apparent in Figure 6. For this we need to construct a linear regression model including all the other relevant independent variables to show their relative effect on property price.

Figure 6: Kriging map showing the relationship between population density and property price
Part 3: Hedonic Regression

Dwelling units are unique, with varying qualities as described by the literature; hedonic theory can be applied to incorporate this heterogeneity into price estimations. The hedonic pricing approach values a good's specific characteristics according to their utility for potential buyers (Lehner, 2011), and is typically used to estimate the contribution of these individual characteristics to the price. Lancaster (1966) applied hedonic theory in the field of real estate for the first time in the sixties. Because residential properties are multidimensional commodities (So et al, 1997), assessing multiple factors is essential to the success of their analysis. The final regression model will include 1) total floor area, 2) number of bathrooms, 3) age of property, 4) whether or not it has central heating, 5) distance from the CBD, 6) distance from the nearest tube station and 7) local population density. The regression models take the following form:

Yi = β0 + β1X1i + β2X2i + … + βkXki + ϵi

Y = dependent variable
β0 = constant term
β1, …, βk = regression coefficients
X1, …, Xk = independent variables
ϵ = the error term

The following hypotheses will also be tested, and p-values will be used to test significance:

H0: The independent variables do not affect the dependent variable.
H1: The independent variables do affect the dependent variable.

The classical assumptions are the basic assumptions required to hold in order for OLS to be considered the 'best' estimator available for regression models (Studenmund, 2011: 93). If one or more of these assumptions do not hold, other estimation techniques may sometimes be better than OLS. These 7 assumptions are listed below:

1. The regression model is linear in the coefficients, is correctly specified and has an additive error term.
2. The error term has a zero population mean.
3. All explanatory variables are uncorrelated with the error term.
4. Observations of the error term are uncorrelated with each other (no serial correlation).
5. The error term has a constant variance (no heteroscedasticity).
6. No explanatory variable is a perfect linear function of any other explanatory variable (no perfect multicollinearity).
7. The error term is normally distributed.
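As a rough illustration of how a regression of the form above is estimated, the sketch below fits a log-price model by OLS using the normal equations. The data are hypothetical (two regressors with made-up coefficients), the helper names are illustrative, and this is not the SPSS procedure used in the project.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Return OLS estimates [b0, b1, ..., bk] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: log price driven by floor area (m2) and distance to the CBD (km).
random.seed(1)
true_beta = [11.0, 0.008, -0.015]  # assumed values for illustration only
rows, y = [], []
for _ in range(500):
    floor = random.uniform(25, 270)
    dist_km = random.uniform(0.8, 26)
    noise = random.gauss(0, 0.3)  # the error term
    rows.append((floor, dist_km))
    y.append(true_beta[0] + true_beta[1] * floor + true_beta[2] * dist_km + noise)

beta_hat = ols(rows, y)
print([round(b, 4) for b in beta_hat])
```

The estimates should land close to the assumed coefficients, with the remaining gap reflecting the random error term.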
The baseline model

The baseline model includes the structural variables considered the most important determinants of house prices in London. Tables 4 and 5 show the results obtained from the regression analysis.
Table 4: Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .741a   .550       .549                .30933958

a. Predictors: (Constant), None central heating system, Number of bathrooms, Age of the house, Total house area in square metres

Table 5: Coefficients (a)

Model 1                             Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)                          10.771             .027                             402.029   .000
Total house area in square metres   .008               .000         .582                36.882    .000
Number of bathrooms                 .219               .023         .146                9.416     .000
Age of the house                    .003               .000         .200                14.492    .000
None central heating system         -.158              .023         -.095               -6.870    .000

a. Dependent Variable: Log of the price
The standardized coefficients are used to depict the baseline model and to compare the influence of each variable on the sale price.

Analysis of the baseline model: Table 4 shows that the structural determinants explain 54.9% of the variation in house prices across the London area (adjusted R-squared of 0.549). When comparing the individual independent variables we look to the standardized coefficients rather than the unstandardized coefficients. This is because the unstandardized coefficients don't take into account the differences in the units of measurement of the independent variables. The standardized coefficients give the relative weighting of the independent variables, and Table 5 shows that the greatest contributor to house price is the total house area, with a standardized coefficient of 0.582. Increases in the number of bathrooms and in age have positive effects on sale price, while the absence of a central heating system has a negative coefficient, which means it decreases the value of a property.

Structural model (standardized slope coefficients): Lnprice = 10.771 + 0.582 floorarea + 0.146 bathrooms + 0.200 age - 0.095 chnone
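The link between the two kinds of coefficient can be sketched as follows. A standardized beta is the unstandardized coefficient rescaled by the ratio of the regressor's standard deviation to that of the dependent variable, and in a log-price model exp(B) - 1 gives the approximate proportional price change for a one-unit change in a regressor. The inputs are the unstandardized coefficients of the structural model in Table 8 and the standard deviations in Tables 1 and 2; the pairing of the 0.307 and 36.169 standard deviations with bathrooms and age is an assumption based on the ranges of those variables, and the function names are illustrative.

```python
import math

SD_LNPRICE = 0.46061353  # std. deviation of the log of the price (Table 1)

# name: (unstandardized B, std. deviation of the regressor)
# Assumed pairing: 0.307 belongs to bathrooms and 36.169 to age (see lead-in).
baseline = {
    "floorarea": (0.0078, 34.507),
    "bathrooms": (0.2191, 0.307),
    "age": (0.0025, 36.169),
    "chnone": (-0.158, 0.276),
}


def standardized_beta(b, sd_x, sd_y=SD_LNPRICE):
    """Rescale an unstandardized coefficient into standard-deviation units."""
    return b * sd_x / sd_y


def percent_effect(b, delta=1.0):
    """Approximate % change in price for a change `delta` in the regressor."""
    return (math.exp(b * delta) - 1.0) * 100.0


betas = {name: standardized_beta(b, sd) for name, (b, sd) in baseline.items()}
effects = {name: percent_effect(b) for name, (b, _) in baseline.items()}
for name in baseline:
    print(f"{name:10s} beta={betas[name]:+.3f}  effect per unit={effects[name]:+.2f}%")
```

Under these assumptions the rescaled values come out close to the standardized betas in Table 5, with small gaps explained by rounding of the reported coefficients.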
Table 5 shows that all the variables have a high t-statistic and p-values of 0.000, which indicates that all the independent variables are significant at the 99% confidence level. Moreover, the calculated F-value of 744.397 shows that the overall equation is statistically significant at the 99% confidence level. Putting the two together, the null hypothesis can be rejected.

The extended model: includes the spatial variables

The regression analysis provided the following results:
Table 6: Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .779a   .607       .605                .28931951

a. Predictors: (Constant), Population density of the local area, None central heating system, Number of bathrooms, Age of the house, Distance to tube stations, Total house area in square metres, Distance to CBD

Table 7: Coefficients (a)

Model 1                                Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)                             11.057             .036                             309.819   .000
Total house area in square metres      .008               .0002        .633                41.466    .000
Number of bathrooms                    .162               .022         .108                7.329     .000
Age of the house                       .0014              .0002        .114                8.188     .000
None central heating system            -.1475             .0227        -.088               -6.851    .000
Distance to CBD                        -1.359E-005        .000         -.156               -9.382    .000
Distance to tube stations              -1.959E-005        .000         -.119               -8.143    .000
Population density of the local area   .0005              .0003        .038                2.647     .008

a. Dependent Variable: Log of the price

Including the remaining spatial variables leads to the extended model. The adjusted R-squared value of 0.605 shows that 60.5% of the variation in sales price is accounted for by the 7 independent variables used in this study. The standardized coefficients are used again to illustrate the model.
Analysis of the extended model: The most significant fact to take away from the extended model is that total floor area remains the most influential independent variable on sales price. Including the spatial variables raises the adjusted R-squared from 0.549 to 0.605, so they add explanatory power beyond the structural determinants. Population density of the local area has the lowest effect on sales price; however, unlike the literature presumed, the coefficient is marginally positive, which may suggest that in the London area increased population density raises the house price. All of the variables, as shown by the p-values in Table 7, are statistically significant at the 99% confidence level, backed up by an F-statistic of 536.593 as shown in Table 8.

Extended model (standardized slope coefficients): Lnprice = 11.0572 + 0.633 floorarea + 0.108 bathrooms + 0.114 age - 0.088 chnone - 0.156 distCBD - 0.119 distTube + 0.038 popdens

Table 8: Results of the regression. Unstandardized coefficients are reported. Numbers in parentheses are robust standard errors. *** Significantly different from zero with 99% confidence

                                        Structural Model        Spatial Model           Fully extended model
Floor Area of Dwelling (m2)             0.0078*** (0.0002)                              0.0084*** (0.0002)
Number of Bathrooms                     0.2191*** (0.0259)                              0.1616** (0.241)
Age of Property                         0.0025*** (0.0002)                              0.0014*** (0.0002)
No Central Heating (dummy)              -0.158*** (0.0238)                              -0.1475*** (0.00227)
Distance to the CBD (meters)                                    -1.806E-5*** (0.0000)   -1.739E-5*** (0.0000)
Distance to the nearest Tube (meters)                           -3.291E-5*** (0.0000)   -1.442E-5*** (0.0000)
Population Density                                              -0.0012** (0.0004)      0.0005** (0.0003)
Constant                                10.7707 (0.0296)        12.2279 (0.0398)        11.1630 (0.0385)
Adjusted R-squared                      0.5489                  0.0809                  0.605
F Statistic                             744.397***              71.627***               536.593***
Number of observations                  2444                    2444                    2444
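The overall F-statistic and the adjusted R-squared both follow from R-squared, the number of observations and the number of regressors. The sketch below applies the standard formulas to the rounded R Square values reported in Tables 4 and 6, so small differences from the reported figures are to be expected; the function names are illustrative.

```python
def f_statistic(r_squared, n_obs, n_regressors):
    """Overall F-statistic of an OLS regression that includes a constant term."""
    explained = r_squared / n_regressors
    unexplained = (1.0 - r_squared) / (n_obs - n_regressors - 1)
    return explained / unexplained


def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """R-squared penalised for the number of regressors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


N_OBS = 2444
# name: (rounded R Square from Tables 4 and 6, number of regressors)
models = {"structural": (0.550, 4), "extended": (0.607, 7)}

results = {
    name: (f_statistic(r2, N_OBS, k), adjusted_r_squared(r2, N_OBS, k))
    for name, (r2, k) in models.items()
}
for name, (f_val, adj) in results.items():
    print(f"{name:10s} F={f_val:8.1f}  adjusted R2={adj:.3f}")
```

Both F values computed this way are close to those in Table 8, which suggests the reported statistics are internally consistent with the model summaries.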
Classical Assumptions:

Classical Assumption 1: Taking the natural log of the sale price gives a regression model that is linear in the coefficients. This is depicted in Figure 7; therefore the assumption is satisfied.

Figure 7: P-P plot showing linearity

Classical Assumption 2: Econometricians add a stochastic (random) error term to regression equations to account for variation in the dependent variable that is not explained by the model (Studenmund, 2011: 95). These error terms are assumed to be drawn from a random variable distribution with a mean of zero. Table 9 shows that the mean of the residual distribution is 0.000, therefore satisfying this assumption.

Table 9: Statistics, Unstandardized Residual

N Valid     2444
N Missing   0
Mean        .0000000

Classical Assumption 3: It is assumed that the observed values of the explanatory variables are independent of the values of the error term, and are not correlated with them (Studenmund, 2011: 97). Scatter plots of each explanatory variable against the unstandardized residual are provided below (Figure 8). All 7 graphs show no correlation; therefore the 3rd assumption may be satisfied, suggesting that the model is not biased and that the OLS estimates are influenced only by the explanatory variables.

Figure 8
Classical Assumption 4: Assumption 4 says that an increase in the error term in one time period does not show up in another time period (Studenmund, 2011: 97). This assumption is only important in the context of time-series data. Because the data for this study were collected in a single time period of 3 months, this assumption can be disregarded.

Classical Assumption 5: The variance, or dispersion, of the distribution from which the observations of the error term are drawn must be constant; that is to say, the observations are drawn continually from identical distributions (Studenmund, 2011: 98). This is known as homoscedasticity. If the variance were not constant, for example increasing, this would be known as heteroscedasticity. Figure 9 shows that the dispersion of the observations is even and random; this satisfies the assumption that the error term has a constant variance.

Figure 9

Classical Assumption 6: Perfect collinearity between 2 independent variables implies that one is a perfect linear function of the other. Multicollinearity is where more than 2 variables are involved. If this were the case with the data, the regression could not distinguish the effect of one independent variable on the dependent variable from that of another. The variance inflation factor (VIF) is used to detect multicollinearity. With all VIFs in Table 10 below 2 (the largest is 1.709, for distance to CBD), the assumption is satisfied: no variable is a perfect linear function of another.

Table 10: Coefficients (a), Collinearity Statistics

Model 1                                VIF
Total house area in square metres      1.442
Number of bathrooms                    1.340
Age of the house                       1.195
None central heating system            1.031
Distance to CBD                        1.709
Distance to tube stations              1.323
Population density of the local area   1.283

a. Dependent Variable: Log of the price
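The VIF values in Table 10 can be read through the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one explanatory variable on all the others. The sketch below inverts that formula to show how much of each regressor's variation the others would explain; the short variable names are illustrative labels for the Table 10 rows.

```python
def vif_from_r_squared(r_squared_j):
    """VIF of regressor j, given the R-squared of regressing it on the other regressors."""
    return 1.0 / (1.0 - r_squared_j)


def r_squared_from_vif(vif):
    """Invert the VIF formula to recover the auxiliary R-squared."""
    return 1.0 - 1.0 / vif


# VIF values as reported in Table 10.
table_10_vif = {
    "floorarea": 1.442,
    "bathrooms": 1.340,
    "age": 1.195,
    "chnone": 1.031,
    "distCBD": 1.709,
    "distTube": 1.323,
    "popdens": 1.283,
}

implied_r2 = {name: r_squared_from_vif(v) for name, v in table_10_vif.items()}
for name, r2 in implied_r2.items():
    print(f"{name:10s} VIF={table_10_vif[name]:.3f}  implied auxiliary R2={r2:.3f}")
```

Even the largest VIF implies that well under half of that regressor's variation is explained by the other regressors, far from the perfect linear relationship that would violate the assumption.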
Classical Assumption 7: The histogram in Figure 10 shows the normal distribution of the error term, as it is symmetrical about the mean. Normality of the error term is not required for OLS estimation but is needed for hypothesis testing (Studenmund, 2011: 100), to ensure that the t- and F-statistics are applicable. This assumption is satisfied.

Figure 10

Conclusion

The results indicate that the structural variables account for 54.9% of the variation in sale price, compared with 8.09% for the spatial variables. Table 11 shows the influence of each individual variable on the variation: total floor area is undoubtedly the biggest contributor to sales price, accounting for 48.5%, with the number of bathrooms accounting for 19.2%. Access to the CBD is the strongest spatial variable, with an influence of 6.5% on house prices. The importance of the accessibility variables must not be diminished by this result, as the standardized beta coefficients of -0.156 for distance to the CBD and -0.119 for distance to a tube station show a relatively large negative effect. However, population density seems to be a fairly irrelevant determinant of house prices throughout this study, as shown by the Kriging map (Figure 6), the standardized beta coefficient of 0.038 and the R-squared change of 0.001. Perhaps another spatial determinant, such as neighborhood quality or school quality, may have had a more direct effect, as captured in the work of Brasington (1999).

Table 11: R-squared change by variable

Variables   R-squared change
floorarea   0.485
Bathroom    0.192
Age         0.091
chnone      0.044
DistCBD     0.065
DistTube    0.044
popdens     0.001

The hedonic regression captured 60.5% of the total variation in property price, which leaves 39.5% unexplained and attributable to other factors. Drawing upon the literature, it is possible to identify omitted variables that could have been included to enhance this study and the fit of the model. Firstly, Irwin (2002) looks at the effect of open space, where 'open space is valued most for not being developed on' (Irwin, 2002: 465), which may explain why very expensive properties remain in the North West and South East of London (Figure 5). Secondly, as mentioned above, Brasington (1999) and Gibbons and Machin (2008) look closely at the effect of schools on property prices; this is an important factor to consider, with a reported '3.8% value increase with an increase in performance of target grades in primary schools in London' (Gibbons and Machin, 2008: 109). Finally, crime is also an influential factor on property prices, with a 10% decrease in values for a one standard deviation increase in the local density of criminal damage (Gibbons and Machin, 2008; Gibbons, 2004). Including these omitted variables would allow for a greater level of research and accountability.

Although the results of the model have been shown to be statistically sound and to satisfy the 7 classical assumptions (Studenmund, 2011), some light needs to be shed on the data and the way it was collected, because the data only capture one moment in time. If the data had been collected over other regions and times, the results may have been different, more accurate and more up to date with future preferences. Also, the majority of houses in the data were within proximity of the CBD. Thus, the reliability of the conclusions, especially for the spatial variables, is questionable.
Hamnett (2003) describes that 'where people live isn't a free choice, people are forced to live within proximity due to employment opportunities'. These preferences or limitations between accessibility and space could lead to a differing housing pattern in London. Thus, there are many opportunities to expand this study further and to provide more explanation for house prices in London.

Word Count: 3991
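A short arithmetic check suggests one way to read the Table 11 values used in the conclusion: each matches, to rounding, the square of the corresponding Pearson correlation with Lnprice in Table 3, which is the R-squared that a regression of log price on that variable alone would give. This is an interpretation offered as a sketch, not something the analysis itself states; the short variable names are illustrative labels.

```python
# Pearson correlations with Lnprice, as reported in Table 3.
table_3_r = {
    "floorarea": 0.696,
    "bathrooms": 0.438,
    "age": 0.301,
    "chnone": -0.210,
    "distCBD": -0.254,
    "distTube": -0.209,
    "popdens": -0.036,
}

# R-squared change values, as reported in Table 11.
table_11 = {
    "floorarea": 0.485,
    "bathrooms": 0.192,
    "age": 0.091,
    "chnone": 0.044,
    "distCBD": 0.065,
    "distTube": 0.044,
    "popdens": 0.001,
}

squared = {name: r ** 2 for name, r in table_3_r.items()}
max_gap = max(abs(squared[name] - table_11[name]) for name in table_3_r)
total = sum(table_11.values())

for name in table_3_r:
    print(f"{name:10s} r^2={squared[name]:.3f}  Table 11={table_11[name]:.3f}")
print(f"largest gap: {max_gap:.4f}")
print(f"sum of Table 11 values: {total:.3f}")
```

On this reading the seven values need not add up to the extended model's R-squared of 0.607, because the regressors are correlated with one another and their individual contributions overlap.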
Bibliography

Alonso, W. (1964) The Historic and the Structural Theories of Urban Form: Their Implications for Urban Renewal, Land Economics, Vol. 40 No. 2, 227-231
Bajic, V. (1983) The effects of a new subway line on housing prices in Metropolitan Toronto, Urban Studies, Vol. 20 No. 2, 147-158
Ball, M. (1973) Recent Empirical Work on the Determinants of Relative House Prices, Urban Studies, Vol. 10, 213-233
Brasington, D. M. (1999) Which measures of school quality does the housing market value?, Journal of Real Estate Research, 18 (3), 395-413
Chiaradia et al (2009) Residential Property Value Patterns in London, UCL
Gibbons, S. and Machin, S. (2005) Valuing rail access using transport innovations, Journal of Urban Economics, Vol. 57 (1), 148-169
Gibbons, S. and Machin, S. (2008) Valuing School Quality, Better Transport, and Lower Crime: Evidence from House Prices, Oxford Review of Economic Policy, Vol. 24 No. 1, 99-119
Hamnett, C. (2003) Unequal City: London in the Global Arena
Irwin, E. G. (2002) The Effects of Open Space on Residential Property Values, Land Economics, Vol. 78 No. 4, 465-480
Lancaster, K. J. (1966) A new approach to consumer theory, Journal of Political Economy, 74 (3), 132-157
Lehner, M. (2011) Modelling housing prices in Singapore applying spatial hedonic regression, Master of Science thesis, Institute for Transport Planning and Systems, ETH Zurich
Löchl, M. (2010) Application of spatial analysis methods for understanding geographic variation of prices, Ph.D. Thesis, ETH Zurich
Malpezzi, S. (2002) Hedonic pricing models: A selective and applied review, in T. O'Sullivan and K. Gibb (eds.) Housing Economics and Public Policy, Blackwell Science, Oxford
Marshall, A. (1890) Principles of economics: an introductory volume
Muth, R. F. (1969) Cities and Housing, Chicago, IL: University of Chicago Press
Nationwide (2011) House Price Index, Retrieved January 7th 2014 from http://www.nationwide.co.uk/hpi/historical/WhataddsvaluespecialreportOct11.pdf
Nationwide (2010) What Adds Value, Retrieved January 7th 2014 from http://www.regenerate.co.uk/House%20prices_what_adds_value.pdf
Ong, S. E. and K. H. D. a. Ho (2003) A constant-quality price index for resale public housing flats in Singapore, Urban Studies, 40 (13), 2705-2729
Selim, S. (2008) Determinants of House prices in Turkey: A Hedonic Regression Model, Doğuş Üniversitesi Dergisi, Vol. 9 No. 1, 65-76
Sirmans, S. G., D. A. Macpherson and E. N. Zietz (2009) The composition of hedonic pricing models, 13 (1), 1-44
So, H. M., Tse, R. and Ganesan, S. (1997) Estimating the influence of transport on house prices: evidence from Hong Kong, Journal of Property Valuation and Investment, Vol. 15 No. 1, 40-47
Studenmund, A. H. (2011) Using Econometrics: A Practical Guide, 4th Ed, Longman
Wabe, S. (1971) A Study of House Prices as a means of Establishing the Value of Journey Time, the Rate of Time Preference and the Valuation of some Aspects of Environment in the London Metropolitan Region, Applied Economics, Vol. 3, 247-255