using machine learning and google street view to estimate ...thamilt2/...using machine learning and...

22
Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department of Economics University of Richmond [email protected] Erik B. Johnson Department of Economics University of Alabama [email protected] May 31, 2018 Abstract In this paper, we estimate nonmarket values for public parks and open space, distin- guishing between proximity-based use value and visual amenity value. We use machine learning to conduct visual analysis and classify the view of an individual household. In particular, we identify visual characteristics that are similar to those of publicly provided park space and thus provide a similar a visual amenity. This technique al- lows us to estimate the value of proximity to open space and separately identify values associated with visual amenities. We find positive capitalization rates associated with household views of park-like properties. From a policy perspective, such identification could indicate the optimal size, location, and shape of open space. Furthermore, ma- chine learning methods used in construction of our view variable provide a potentially powerful tool for nonmarket valuation studies.

Upload: others

Post on 31-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Using Machine Learning and Google Street View to

Estimate Visual Amenity Values

Timothy L. Hamilton

Department of Economics

University of Richmond

[email protected]

Erik B. Johnson

Department of Economics

University of Alabama

[email protected]

May 31, 2018

Abstract

In this paper, we estimate nonmarket values for public parks and open space, distin-

guishing between proximity-based use value and visual amenity value. We use machine

learning to conduct visual analysis and classify the view of an individual household.

In particular, we identify visual characteristics that are similar to those of publicly

provided park space and thus provide a similar a visual amenity. This technique al-

lows us to estimate the value of proximity to open space and separately identify values

associated with visual amenities. We find positive capitalization rates associated with

household views of park-like properties. From a policy perspective, such identification

could indicate the optimal size, location, and shape of open space. Furthermore, ma-

chine learning methods used in construction of our view variable provide a potentially

powerful tool for nonmarket valuation studies.

Page 2: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

1 Introduction

Hedonic housing price analysis has emerged as a common tool for nonmarket valuation in

urban and environmental economics. Derived primarily derived from Rosen (1974), a large

literature explores the degree to which local amenities are capitalized into housing markets.

The advent of geographic information system (GIS) software provided researchers with the

ability to observe and measure local amenities in a spatial context1. Thus, location attributes

could be defined based on continuous measure of distance and spatial overlap, rather than

based on local political boundaries. This allowed for a richer characterization of location-

specific goods and arguably more accurate valuation estimates. The recent development

of machine learning may offer economists another step forward in being able to properly

measure local amenities. More specifically, the combination of deep learning and image

recognition gives researchers the ability to account for nuanced characteristics of an amenity

that may not appear in a discrete set of descriptors. Similarly, these techniques may be able

to categorize observations with more accuracy than standard classification, in such a way as

to better represent the consumption perspective. Machine learning in this context improves

our characterization of inputs to the hedonic price function, resulting in improved valuation

estimates.

The nonmarket valuation literature has extensively examined open space as an envi-

ronmental amenity. Studies often identify physical attributes of the open space, including

land-cover designation, size, biodiversity measures, or recreational opportunities. In addi-

tion, analyses also measure the magnitude of a household’s consumption of such open space

through several different channels, including distance to open space, the percent of nearby

land that exists as open space, or simply the presence of open space within some distance.

Each of these metrics implies a slightly different consumption good.

An important attribute of open space that may provide considerable benefits to nearby

households is that it serves as a visual amenity. Defining the degree of visual amenity is

a difficult task. Several studies (Paterson and Boyle (2002); Walls et al. (2013)) use a

combination of digital elevation models and land-cover data to characterize a household’s

view. In this paper, we develop a machine learning approach to measuring the visual appeal of

properties to account for an important location attribute. In particular, we train and utilize

an image recognition algorithm to estimate the visual amenity that households consume.

This measure becomes a component of the hedonic price function with an associated marginal

price.

Our analysis rests on the notion that open space conveys value in several ways, including

1See Bateman et al. (2002) for an overview of GIS in environmental economics.

2

Page 3: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

recreational value, ecosystem services, and visual amenity value. Research in nonmarket

valuation of open space tends to focus primarily on the first two aspects. First, the omitted

visual amenity of open space may generate a bias in attempts to estimate recreation values.

For example, proper measurement of the visual characteristics of open space ensures unbi-

ased estimation of recreation demand and correctly informs policies that alter recreational

opportunities. Second, estimates of these visual amenity values that are distinct from recre-

ational value play an important role in determining the optimal provision of open space. In

the context of recreation value, it may be important to consider attributes such as the size

and layout of open space to maximize benefits to the surrounding community. To maximize

visual amenity value, however, such attributes become less important as only access to a

view of open space matters. Given a fixed quantity of land dedicated to open space (rat-

her than residential or commercial development), the spatial allocation of that space could

such that it offers an area and services for recreation. Alternatively, it could be allocated in

a way to expand the perimeter and thus maximize household exposure to a view amenity

through the use of, for example, boulevards, local squares, or small parks. Determining the

optimal balance between visual and recreation amenities requires the relative values of such

amenities.

This paper is organized as follows. First, we review the literature on hedonic valuation of

open space, including analyses that consider visual amenities. We then introduce a method

to define a visual amenity and discuss the few places such a technique has been used in

the economics literature. After reviewing the current state of the literature, we present our

empirical model and describe our data. Then, we present our image recognition algorithm

in detail and its role in our hedonic analysis. Finally, we present results of a housing price

hedonic model that accounts for visual amenity values and conclude with a discussion of our

estimates.

2 Open Space Hedonic Analyses

While there exists an extensive hedonic literature that attempts to value open space, we limit

our review to focus on studies that estimate the value of visual amenities. To begin with,

empirical evidence indicates that type of open space matters. Neumann et al. (2001) and

Liu et al. (2013) find positive housing price premiums associated with proximity to National

Wildlife Refuges, a form of open space with a natural aesthetic. These studies also find

that effects are highly localized around wildlife refuges, potentially driven by houses with

a view of the refuges. These results suggests that the corresponding visual amenity could

be an important component. Shultz and King (2001) also report positive amenity values

3

Page 4: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

associated with natural areas and wildlife habitat, but find that housing prices decrease

with proximity to city/county parks that may be used primarily for recreation. While this

could be the result of aversion to congested areas, the visual difference in the type of open

space may also be a contributing factor.

Several studies have attempted to directly value a view amenity. Weicher and Zerbst

(1973) estimate a hedonic housing price model that includes indicator variables for homes

that are adjacent to a municipal park, further distinguishing between homes that face a

park, back into a park, or face a park with heavy recreational use. They find positive effects

associated with facing a park that does not have heavy recreational use and insignificant or

negative effects in other cases of close proximity.

While Weicher and Zerbst (1973) define the visual amenity as a discrete variable, other

studies use GIS software to define a magnitude of visual amenity. Paterson and Boyle

(2002) employ a digital elevation model to determine how much is visible from a household’s

location. This is combined with land cover data to measure the percent of a household’s

view defined by different land use categories. The study finds that visible developed land

generates a negative price impact, but other visual measures are insignificant. However, the

inclusion of visual measures does have substantial effects on other model estimates. Walls

et al. (2013) take a similar approach and using ArcGIS Viewshed tool to characterize a

household’s view of different land covers. They find negative price effects of forest views,

but positive price effects of farmland and grassy area views.

3 Machine Learning for Image Classification

The method in Paterson and Boyle (2002) and Walls et al. (2013) relies on classifying land

cover into a specific category, such as residential, forest, or grassland. A view amenity is then

defined based on the amount of each category in a household’s viewshed. We take a different

approach, relying on a machine learning image classification algorithm. This approach has

two advantages. First, we are able to avoid potentially restrictive land-use categories and

instead determine the elements of a visual amenity in the process of estimating our an

image classification model. This allows us to measure heterogeneity in residential space. For

example, residential areas may vary considerably on other land use attributes. Second, by

separately measuring view and land-use type we can separately identify the visual amenity

value of open space from the recreation value.

We define the view amenity in our model based on its similarity to a pristine natural

view. We discuss the required data and details of our technique later in the paper, but

here we briefly outline our approach to measuring the view amenity. First, we use Google

4

Page 5: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Streetview to capture images that replicate the view from each household in our dataset: i.e.

looking outwards from the property. We then use Streetview images of parks that include

natural amenities (trees, shrubs, grass) to define a “green” view. Using these park images,

we estimate the parameters of an artificial neural network to build a predictive model that

can score any image based on its similarity to the “green” view. This method is similar

to commonly used discrete choice models in economics, with chosen park images serving as

observations with a dependent variable equal to 1.2 The model then allows us to predict the

score of each household’s view amenity from 0-100.

To our knowledge, no studies have used this type of image classification to define en-

vironmental attributes. Several studies, however, use machine learning image classification

methods in the context of empirical economics. Glaeser et al. (2015) demonstrate the use of

support vector regression to address a lack of economic statistics. Using data from Google

Streetview data, the authors predict median income in a census block group based on pixel

data from images of the neighborhood. The model not only fits the data well, but is able

to predict out of sample median income in a separate city with a high degree of accuracy.

Additional analysis shows that predicted income is strongly correlated with housing prices,

indirectly linking image data to housing prices. In Naik et al. (2014), the authors employ a

model that predicts the perceived safety of a city from Streetview images. The study gat-

hers data on perceived safety based on location images via crowdsourcing, which can then

be used to estimate a support vector regression. Results show a strong positive correlation

between a city’s median family income and perceived safety. Finally, Naik et al. (2017) use

crowdsourced data to determine the preceived safety of a location and estimate a predictive

model. The authors find positive correlation between perceived safety and education levels

in a neighborhood. In general, this line of research establishes a connection between visual

data and economic outcomes but stops short of formally estimating a specific economic value,

such as a capitalization value or willingness to pay in a hedonic analysis.

4 Visual Amenity in the Hedonic Analysis

We estimate a hedonic housing price function with the objective of identifying the marginal

price of the visual amenity associated with open space. To accomplish this, we build a

standard hedonic housing price model which includes a view amenity variable as an additional

2Our model more closely resembles a multivariate discrete choice model in which we define a park categoryin addition to several other image classifications.

5

Page 6: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

attribute. Specifically, we estimate

lnPijt = β0 + βXX + τt + γj + β1Proxi + β2Prox2i + βAV iew Parki + εi, (1)

where Xi is a vector of housing attributes, including number of bedrooms, gross living square

feet, lot size, and dummies for the year the home was built. We also include year-of-sale

fixed effects, τt, and spatial fixed effects, γj, based on US Census block groups.

The variable Proxi measures the inverse of distance to the nearest city/county designated

park. This follows a conventional approach in the hedonic literature that treats open space

as an amenity consumed via proximity. The variable V iew Park denotes the visual amenity

for house i. This measure represents the quality of the view that household i has when

looking across the street. We define a high-quality view as a view that is similar to a

forested/landscaped park. This includes the view of a designated city/county park, but

may also the view of undeveloped open space, well-manicured private land, or any other

land that looks similar to a park. We use several different definitions for V iew Park that

include both continuous and discrete designations. Our objective is to estimate the value

of the visual amenity. We are able to separately identify this value by directly estimating

the proximity value to designated parks. While very close proximity to a park will often

imply a high-quality view, the visual amenity will quickly dissipate as distance to the park

increases. Furthermore, many non-park areas provide a visual amenity but do not provide

the use-value through proximity since they are not designated parks. Thus we are able to

separately identify proximity and visual amenity values. As additional controls, we also

include visual scores for residential and commercial property in Equation 3, V iew Res and

V iew Com, respectively.

Since the view score is a somewhat indeterminate measure, we also estimate a dummy

variable specification in which V iew Park Indi is assigned a value 1 if the household’s view

score is above a particular threshold. Several specifications define this threshold as a percen-

tile in the observed distribution of V iew Park, while one specification defines the dummy

variable equal to 1 if the park score is the highest score among all categories. This appro-

ach fits better with the interpretation of the artificial neural network as a predictive model.

We discuss construction and interpretation of this variable in more detail in a subsequent

section, following a discussion of our data. We then estimate

lnPijt = β0 + βXX + τt + γj + β1Proxi + β2Prox2i + βAV iew Park Indi + εi. (2)

In addition to the above regressions, we estimate alternative specifications to test for

other price impacts. For example, we include the variable Own Parki as an additional re-

6

Page 7: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

gressor to capture the visual amenity of a household’s own property that may be capitalized

into its price. Also, interaction effects may suggest more complex consumption patterns. For

example, by including V iew Parki ∗ Proxi, we can test for whether view and use amenities

are complements or substitutes. A similar interpretation follows for Own Parki∗Proxi. The

interaction V iew Parki ∗ Own Parki indicates potential substitution between view ameni-

ties. Further discussion of these effects follows with estimation results.

5 Data

The first data component of our analysis is housing transaction data. This includes 121,045

observations of sales transactions in Denver, CO, from 1999-2017. Summary statistics for

housing data are shown in the top panel of Table I. In addition to the previously mentioned

structural housing variables and transaction prices, we observe the exact address of each

home. This allows us to obtain images of the home and the view across the street and to

accurately measure distance to the nearest designated park. Next, the exact location of

city/county designated parks is obtained from the Denver Department of Parks and Recrea-

tion. Figure 1 shows the location of designated parks in the study area. GIS software allows

us to overlay housing transaction locations and park locations to measure the distance from

the edge of each housing parcel to the edge of the nearest park.

The final data component is the “park view score” for each housing observation. We use

Google Streetview to obtain pictures of each property in our sample. These pictures define

the view across the street for each household, as well as the image of the property itself. We

also obtain parcel maps from Denver Open Data Catalog to accurately measure the view

angle for each household. In a later section, we describe in detail the process of using these

images to calculate V iew Park and Own Park, as well as scores for other categories.

In important component of this analysis is separately estimating the amenity value of

proximity, presumable a recreation-based value, from the amenity of a view. For econome-

tric identification, we need to observe properties with park-like views that are not in close

proximity to designated parks: i.e. properties with high view scores but low proximity. The

correlation coefficient for these two variables is 0.013. In Table II we report the mean and

standard deviation of our park view score for deciles of the proximity distribution. Mean

and standard deviation of predicted park score are calculated for all observations that are

bounded below and above by proximity values. It is clear that park view score is relatively

independent of our proximity measure. Along with a simple correlation coefficient, these

statistics demonstrate a joint distribution of variables that supports empirical identification.

7

Page 8: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

5.1 Preliminary Regressions

Before discussing our approach to measuring the view amenity, we estimate a hedonic price

function in which a house is determined to have a park view if it is facing a designated

park, similar to Weicher and Zerbst (1973). Here, we simply hope to motivate the role

of a view in the hedonic price function. This approach considers only designated parks.

Therefore, estimates cannot capture the value of any other park-like land that may convey

the same view amenity. In addition, all properties with a view of a park will also be in very

close proximity to a usable park, so that this model does not have the same identification

power in separately estimating values derived from a view amenity versus those derived from

proximity (i.e. recreation value).

We limit our sample to roughly 17,000 observations of only those homes that are within

100 meters of an officially designated park. Using residential parcel maps, we determine

which houses have a park view based on the angle of the property’s frontage and it’s proximity

to a park. Specifically, we assign Park V iew = 1 if an orthogonal line drawn from the house

parcel boundardy nearest to the park intersects the park boundary prior to intersecting

another parcel. We than estimate

lnPijt = β0 + βXX + τt + γj + β1Proxi + β2Prox2i + βV Park V iewi + εi. (3)

Results show a positive and statistically significant effect of Park V iew. The estimated

coefficient is .0197, with a standard error of .0061. This coefficient can be estimated as

approximately a 1.97% increase in the price of a home when it is facing a park. Based on a

sample mean home price of $444,834, this translates to an increase of $8,763.

We treat these results, along with results from the literature, as preliminary evidence

that view matters as an amenity. Given the shortcomings of this approach, we now move on

to our image classification technique to better construct the view amenity.

6 Image Classification

To create the variables V iew Park and Own Park, we re-train Google’s image recognition

program to detect an array of visual property-type attributes. Our objective is to score

each property in our dataset based on its visual amenity. We define this visual amenity as

similarity to a designated park. This technique involves two steps: 1) Train the model on

a subset of data to understand what is a park-like view, and 2) Apply the model to the

rest of the sample to classify view types for each house observation in our dataset. This

classification involves predicting a view score for different view types.

8

Page 9: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

6.1 Training

We use Google’s implementation of the Inception V3 algorithm, a deep convolutional neural

network. This is an open source software library that is trained for image recognition. The

algorithm is trained to identify 1000 different images. However, our analysis is focused on

identifying views that look like parks. Therefore, we re-train (re-estimate) the final layer of

the neural network, effectively teaching the algorithm to distinguish park-like images from

other urban land uses.3

To train the algorithm to propertly classify view types, we obtain pictures of parks and

other categories from Google Streetview. These images serve as input data to estimate pa-

rameters of the algorithm. Figure 2 shows images designated as parks for training purposes.

In addition to training the model on what a park is, we also train to model on other types of

views to avoid false positives. Figure 3 shows four other classifications used in the training

stage. We define four alternative categories: residential, commercial, intersection, and park

usage. While the first three are fairly obvious, the final category, park usage, indicates the

view of a park that is made up of parking lot and/or facilities: i.e. attributes that are not

associated with a high-quality visual amenity.

6.2 Prediction

The image classification algorithm is trained to classify images based on how much the image

resembles each of the categories. Classification assigns a score for each category to an image.

We use the park category score as a measure of the visual amenity. We obtain images of

the view from each house in our dataset and an image of the house itself so that we can

then calculate V iew Park and Own Park. Figure 4 shows four observations of view images

along with the predicted V iew Park. Figure 5 displays two histograms of predicted park

scores. Panel 5a shows the full distribution of predicted park views scores. Note the high

concentrations of predictions at very low end. In Panel 5b, we show only the distribution of

predicted scores above 0.05 to better illustrate variation in predictions.

Identification of a visual amenity separate from the proximity good requires variation

in the form of 1) properties that have high park scores but are not in close proximity to

parks and 2) properties that have low park scores and are in close proximity to parks. The

correlation between between parks scores and our proximity measure is 0.030. Similarly, the

Spearman rank correlation is −.033. These low correlations indicate variation in the data

due to view amenities that are due to properties other than designated parks. In addition,

3The advantage of using a pre-trained algorithm and estimating the final layer is that 1) less data and lessvariation in the data is required for identification and 2) the computational burden is reduced considerably.For more on implementation of this alogorithm: https://www.tensorflow.org/tutorials/image retraining

9

Page 10: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

proximity to a park does not ensure a high-quality view amenity. Overall, such variation

suggests strong potential for identification.

7 Results

We now turn to estimation results from the housing price hedonic in Equation 3. Table III

shows estimates and standard errors for our baseline specification that includes continuous

scores corresponding to park, commercial, and residential views. We also report results from

a specification that includes only proximity to the nearest designated park to illustrate the

role of distance-based consumption and replicate the conventional approach in the hedonic

literature. Across specifications, the quadratic effect of proximity to parks is initially negative

but turns positive at a proximity of approximately 0.52. This value is in the 96th percentile

of the proximity distribution. The impact is consistent when view scores are accounted for.

Estimates indicate a positive and significant impact of a park-like view, evidenced by

the coefficient on V iew Park. The score variable is measured on a 0 to 1 scale, so this

coefficient can be interpreted approximately as a 9% price premium for the highest-quality

park-like view relative to no park-like view. Alternatively, this coefficient can be interpreted

as approximately a 1.1% price premium for a one standard deviation increase in park score. A

1.1% increasing measured at the mean housing value of $444,834 is roughly $4,893. The view

score associated with residential and commercial properties are statistically insignificant.

Overall, we see evidence that the view amenity, separate from proximity to parks, associated

with park-like properties conveys a considerable capitalization effects.

In Table IV, we show estimates in which our view variables are defined as dummy variables

equal to 1 if the view score is above a particular threshold. Various specifications explore

different thresholds used to define “park-like”. We define this dummy as equal to 1 if the

park view score is greater than the qth quantile in the score distribution or, as in the last

specification, if the park view score is the largest among all view categories. Again, we see

positive and significant price impacts of a park-like view. The coefficient on park score is

generally consistent across thresholds, indicating a price increase ranging from 3.2% to 5.4%

for having a park-like view. A pattern emerges in these estimates, where the estimated

effect of a park view increases when the is definition of such a park view becomes more

stringent. When assigning a park view only if park score is the highest of all the view

categories, however, the estimate falls, perhaps since some view amenities get lost in this

definition. Views of commercial properties carry a negative price premium, while the effect

of residential score is generally insignificant.

In the next set of specifications, we use Own Park to control for the attributes of the

10

Page 11: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

home itself. Results are shown in Table VI. The estimated effect shows a considerable

amount of variation, ranging from 1.5−10%. These impacts are consistently positive and sig-

nificant, indicating a price premium for aesthetically nice properties. Inclusion of Own Park

has no discernible impact on the estimated effect of the park-like view across the street.

Finally, we look at specifications that include interaction effects among our proximity

and view variables. These results are presented in Table VII. In column 1, we report

earlier estimates for comparison. In column 2, we interact V iew Park with both proximity

and Own Park. The former tests for substitutability/complementarity between proximity

(recreational amenity) and the view amenity. The coefficient on the interaction is statistically

insignificant. The interaction of V iew Park and Own Park is positive and significant,

suggesting that the positive value of a park-like view across the street is amplified by a park-

like appearance of one’s own home, and vice versa. This effect is possibly capturing a highly

localized effect of well-kept properties. In column 3, we again see that the coefficient on

V iew ParkxOwn Park is insignificant. A negative significant coefficient on the interaction

between proximity and Own Park suggests that a household considers a park-like score of

its own property as a substitute for proximity to a designated park. This effect may arise as

residential properties with high park view scores may also offer private use/recreation value.

8 Discussion

Our results provide strong evidence for the capitalization of a view amenity in housing prices.

While the continuous magnitude of the view score variable is somewhat difficult to interpret,

the monetary value of having a park-like view, relative to a purely residential or commercial

view, is several percentage points of a home’s value. Interestingly, the lower-bound on our

range of estimates is approximately equal to our preliminary regression results in which

only houses that are facing a designated park are considered to have the park-like view

amenity. Thus failing to properly account for the view amenity across the city leads to an

underestimate.

We conduct a back-of-the-envelope calculation to determine the total capitalized value of

the view amenity in the study area. We take the average observed view amenity and housing

value in our sample and extrapolate to the total number of homes in entire City and County

of Denver. We are therefore assuming that our observed transactions are a random sample

of the locality. The average home has a view amenity of 0.0426 and the 2010 census reports

622,900 owner-occupied homes in the city. We use a coefficient estimate of .0822, coming

from the specification in which view score is treated as a continuous variable and we control

for score of the home itself. An average capitalization value of $1,557 leads to an aggregate

11

Page 12: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

value of $970.3 million. Using 95% confidence intervals on the coefficient estimate, the total

capitalization value is between $665.5 - $1,275.1 million.

While a valuable view amenity creates potential for welfare gains through spatial al-

location of open space that maximizes total view to the area, there are tradeoffs to such

allocation. For example, while the perimeter of an open space parcel is important for the

view amenity, total size or depth of open space may generate larger values linked to recre-

ational opportunities or ecosystem services. Functional aspects, such as parking lots and

facilities may increase recreational value but diminish quality of the view. Our results sug-

gest, however, an important additional component of open space, namely the quality and

quantity of households’ view of such space, that should be considered in valuation.

9 Conclusion

In this analysis, we provide evidence for the amenity capitalization of a household’s view.

Using machine learning techniques, we develop a new approach to quantitatively measuring

view. This allows us to identify the view amenity separately from any amenity value that

works through proximity. This second channel, proximity to open space, has dominated the

literature on valuation of open space. Instead, our results indicate significant effect of view

and suggest that value is created through properties other than publicly designated parks.

From a policy perspective, such identification could indicate the optimal size and location

of open space for welfare maximization. In addition to illustrating an important component

of the nonmarket valuation of open space, our approach to defining a new variable is widely

applicable in hedonic analysis.

12

Page 13: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

References

Ian. J. Bateman, Andy. P. Jones, Andrew. A. Lovett, Iain. R. Lake, and Brett H. Day.

Applying geographic information systems (gis) to environmental and resource economics.

Environmental and Resource Economics, 22(2202):219–269, 2002.

Edward L. Glaeser, Scott Duke Kominers, Michael Luca, and Nikhil Naik. Big data and big

cities: The primises and limitations of improved measures of urban life. NBER Working

Paper, 21778, 2015. NBER Working Paper.

Xiangping Liu, Laura Taylor, Timothy Hamilton, and Peter Grigelis. Amenity values of

proximity to national wildlife refuges: An analysis of urban residential property values.

Ecological Economics, 94:37–43, 2013.

Nihil Naik, Jade Philipoom, Ramesh Raskar, and Cesar A. Hidalgo. Streetscore – predicting

the perceived safety of one million streetscapes. Proceedings of the 2014 IEEE Conference

on Computer Vision and Pattern Recognition Workshops, Washington, DC, June 2014.

Nihil Naik, Scott Duke Kominers, Ramesh Raskar, Edward L. Glaeser, and Cesar A. Hidalgo.

Computer vision uncovers predictors of physical urban change. Proceedings of the National

Academy of Sciences, 114(29):7571–7576, July 2017.

Bradley C. Neumann, Kevin J. Boyle, and Kathleen P. Bell. Property price effects of a

national wildlife refuge: Great meadows national wildlife refuge in massachusetts. Land

Use Policy, (26):1011–1019, 2001.

Robert W. Paterson and Kevin J. Boyle. Out of sight, out of mind? using gis to incorporate

visibility in hedonic property value models. Land Economics, 78(3):417–425, August 2002.

Sherwin Rosen. Hedonic prices and implicit markets: Product differentiation in pure com-

petition. The Journal of Political Economy, 82(1):34–55, January-February 1974.

Steven D. Shultz and David A. King. The use of census data for hedonic price estimates of

open-space amenities and land use. Journal of Real Estate Finance and Economics, 22(2):

239–252, 2001.

Margaret Walls, Carolyn Kousky, and Ziyan Chu. Is what you see what you get? the value

of natural landscape views. Resources for the Future Discussion Paper, RFF DP 13-25,

2013.

John C. Weicher and Robert H. Zerbst. The externalities of neighborhood parks: An empi-

rical investigation. Land Economics, 49(1):99–105, February 1973.

13

Page 14: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Table I: Summary Statistics

Mean Standard Deviation 0.05 Percentile Median 0.95 Percentile

Housing

# of Bedrooms 2.48 0.88 1 2 4

# of Bathrooms 2.14 0.94 1 2 3.5

Square Feet 1437.23 673.90 685 1234 2851

Lot Area 4552.51 3013.79 432 4627 9460

# of Stories 1.45 0.61 1 1 3

Year Built 1964.88 37.38 1898 1965 2015

Year of Sale 2014 1.62 2011 2015 2017

Park Proximity 0.0283 0.1472 0.0004 0.0030 0.0358

View Scores

park 0.0426 0.1218 0 0.0009 0.2357

residential 0.5537 0.3288 0.0145 0.5971 0.9782

commercial 0.1262 0.2517 0 0.0049 0.8439

park (own) 0.0688 0.1846 0 0.0009 0.5201

Table II: Park Summary Conditional on Proximity

Proximity Quantile

(upper bound)Park Score Mean Park Score S.D.

0.1 0.0406 0.1299

0.2 0.0720 0.1810

0.3 0.0447 0.1277

0.4 0.0372 0.1076

0.5 0.0376 0.1120

0.6 0.0359 0.1020

0.7 0.0328 0.1030

0.8 0.0333 0.1062

0.9 0.0387 0.1079

1.0 0.0289 0.0970

14

Page 15: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Table III: Baseline Estimates

Dependent Variable = ln(pricei)

(1) (2) (3)

Proxi -0.9603*** -1.2393*** -1.2495***

(0.2778) (0.2805) (0.2806)

Prox2i 0.9289*** 1.2021*** 1.2126***

(0.2749) (0.2775) (0.2776)

V iew Park 0.0978*** 0.0919***

(0.0141) (0.0160)

V iew Res -0.0100

(0.0092)

V iew Com 0.0088

(0.0133)

n 31290 31290 31290

r.squared 0.6952 0.6957 0.6957

Standard errors in parentheses. All specifications include year-of-sale and block group fixed effects, in addition to additional housing attributes.

Table IV: View as Discrete Variable

Dependent Variable = ln(pricei)

quantile threshold Park Score Commercial Score Residential Score

0.4 0.032 -0.0288 -0.0042

(0.0051) (0.0082) (0.0128)

0.5 0.0353 -0.0241 -0.0001

(0.0052) (0.0083) (0.0128)

0.6 0.0321 -0.0194 0.0016

(0.0054) (0.0085) (0.0130)

0.7 0.0455 -0.0103 0.0107

(0.0059) (0.0087) (0.0131)

0.8 0.0535 -0.0067 0.014

(0.0070) (0.0089) (0.0132)

0.9 0.0544 -0.0127 0.0057

(0.0095) (0.0090) (0.0132)

max 0.0348 -0.0238 -0.0053

(0.0117) (0.0089) (0.0131)

Standard errors in parentheses.

15

Page 16: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Table V: View as Discrete Variable: Park Score Coefficient

Dependent Variable = ln(pricei)

quantile threshold Coefficient Standard Error

0.4 0.0320 (0.0051)

0.5 0.0353 (0.0052)

0.6 0.0321 (0.0054)

0.7 0.0455 (0.0059)

0.8 0.0535 (0.0070)

0.9 0.0544 (0.0095)

max 0.0348 (0.0117)

Standard errors in parentheses.

Table VI: Include Own Park Score

Dependent Variable = ln(pricei)

View Park Own Park

Continuous 0.0822 0.1604

(0.0157) (0.0207)

0.4 0.0317 0.0149

(0.0050) (0.0050)

0.5 0.035 0.0174

(0.0051) (0.0050)

0.6 0.0307 0.0228

(0.0053) (0.0052)

0.7 0.0431 0.0291

(0.0058) (0.0057)

0.8 0.0502 0.0333

(0.0068) (0.0068)

0.9 0.0534 0.076

(0.0093) (0.0103)

max 0.0425 0.1046

(0.0111) (0.0179)

Standard errors in parentheses.

16

Page 17: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Table VII: Interaction Effects

Dependent Variable = ln(pricei)

(1) (2) (3)

Proxi -1.1438 -1.1106 -1.1174

(0.2768) (0.2770) (0.2772)

Prox2i 1.1046 1.0767 1.0996

(0.2739) (0.2739) (0.2739)

V iew Park 0.0822 0.0624 0.0843

(0.0157) (0.0170) (0.0160)

Own Park 0.1604 0.1211 0.1662

(0.0207) (0.0231) (0.0210)

Proxi xV iew Park

-0.0477 -0.0684

(0.0795) (0.0796)

Proxi xOwn Park

-0.2056

(0.1278)

V iew Park xOwn Park

0.2889

(0.0765)

Standard errors in parentheses.

17

Page 18: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Census Block GroupsParks

Figure 1: Study Area

18

Page 19: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

Figure 2: Park images for training

19

Page 20: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

(a) Residential (b) Commercial

(c) Intersection (d) Usage

Figure 3: Images for training on other categories

20

Page 21: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

park 0.99962

com 0.00000

res 0.00016

usage 0.00021

int 0.00000

park 0.99976

com 0.00000

res 0.00001

usage 0.00023

int 0.00000

park 0.63219

com 0.00013

res 0.33452

usage 0.03316

int 0.00000

park 0.00003

com 0.00001

res 0.99970

usage 0.00023

int 0.00000

Figure 4: Predicted View Scores

21

Page 22: Using Machine Learning and Google Street View to Estimate ...thamilt2/...Using Machine Learning and Google Street View to Estimate Visual Amenity Values Timothy L. Hamilton Department

0

3000

6000

9000

0.00 0.25 0.50 0.75 1.00

Park Score

coun

t

(a) Full Distribution

0

300

600

900

0.00 0.25 0.50 0.75 1.00

Park Score

coun

t

(b) park score > 0.05

Figure 5: Distribution of Park Scores

22