

Energy Policy 60 (2013) 378–385

Contents lists available at SciVerse ScienceDirect

Energy Policy


journal homepage: www.elsevier.com/locate/enpol

Statistical analysis of installed wind capacity in the United States

Andrea Staid *, Seth D. Guikema

Johns Hopkins University, 3400 N. Charles St., Ames Hall 313, Baltimore, MD 21218, USA

Highlights

• We conduct a statistical analysis of factors influencing wind capacity in the U.S.
• We find that state policies do not strongly influence the differences among states.
• Driving factors are wind resources, cropland area, and available percentage of land.

Article info

Article history:
Received 23 August 2012
Accepted 21 May 2013
Available online 14 June 2013

Keywords:
Wind power
Policy impacts
Statistical analysis

0301-4215/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.enpol.2013.05.076

* Corresponding author. Tel.: +14105166024.
E-mail address: [email protected] (A. Staid).

Abstract

There is a large disparity in the amount of wind power capacity installed in each of the states in the U.S. It is often thought that the different policies of individual state governments are the main reason for these differences, but this may not necessarily be the case. The aim of this paper is to use statistical methods to study the factors that have the most influence on the amount of installed wind capacity in each state. From this analysis, we were able to use these variables to accurately predict the installed wind capacity and to gain insight into the driving factors for wind power development and the reasons behind the differences among states. Using our best model, we find that the most important variables for explaining the amount of wind capacity have to do with the physical and geographic characteristics of the state as opposed to policies in place that favor renewable energy.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

The United States had over 40,000 MW of installed wind power capacity at the end of 2010 (U.S. DOE, 2011). However, there is a great disparity among the states as to where this wind capacity exists. Many states use specific policies to encourage renewable energy development, or even wind energy development specifically. These policies can vary widely among states both in scope and implementation. Many states have chosen to implement renewable portfolio standards (RPS) in order to incentivize renewable energy. There is no federal goal for renewable energy, which allows the states to set targets at a level that they choose, even if it is none at all. While the RPS is one of the most popular forms of renewable energy policy, it does not always include requirements for specific sources of renewable energy (Berry and Jaccard, 2001).

The effectiveness of these and other policies that have been put in place is not always reflected in the actual energy makeup of individual states. The installed wind capacity has been growing steadily over the past decade, as shown in Fig. 1. This growth is not consistent across all states, and this paper examines some of the possible reasons for this inconsistency. Wind development is heavily dependent on federal policies that incentivize renewable energy investment, but these apply to all states and do not account for the differences in wind power across states. For example, the Renewable Energy Production Tax Credit (PTC) has helped to make wind power much more cost effective. Under this policy, producers receive 2.2 cents per kilowatt-hour (adjusted for inflation) of qualified renewable energy that is produced and sold, including wind power (IRS, 2011). However, the inconsistent history of the PTC is clearly represented in the amount of wind capacity that has been built. The PTC was initially instituted in 1992 and expired in 1999. It was then extended through the end of 2001, when it was allowed to expire. Another brief extension carried it through to the end of 2003, when it again expired and was not renewed until late in 2004. It has continued to be extended since 2005, but only for a couple of years at a time, which does not allow for long-term financial planning (U.S. DOE, 2012). Of particular note in Fig. 1 are the sharp drops in new capacity being built from 2001 to 2002 and from 2003 to 2004. These correspond to the years when the PTC was allowed to expire. This shows that federal policies can have significant effects on energy investment and development, and it is expected that state policies also influence these same decisions. The reasons behind decisions to build wind capacity are complicated and the results have proven difficult to model.


Fig. 1. Installed wind power capacity in the United States, 1999–2010.


2. Background

There have not been many attempts at using statistical methods to explain or predict differences in wind power (or any other renewable energy sources) across regions. Much of the research into the discrepancies between renewable investments among countries, regions, or states has been qualitative in nature, and the results present more of a discussion of the reasons behind any differences that were found. There have been a number of case studies that focused on European countries or regions, but there are also a few attempts at studying these issues in the United States.

Breukers and Wolsink (2007) performed a case study to evaluate the large differences in wind power development in the Netherlands, England, and the German state of North Rhine-Westphalia. They compare a number of parameters such as policies and policy stability, government involvement and ownership, public opinion and involvement, and institutional capacity building. Their results are discussion-based and do not include quantitative analysis of what worked or suggestions for future successes. Similarly, Toke et al. (2008) compare the wind development in six European countries: Denmark, Spain, Germany, Scotland, the Netherlands, and England/Wales. They acknowledge that wind resources in a region are not always the driving factor, and they study other factors in addition to the wind potential of a country. They find that one of the most important factors is the level at which investment and siting decisions are made and who is involved in those decisions. They also find that most opposition to wind power is extremely localized, and supportive national policies are not always enough to overcome local opposition. Both of these studies were highly qualitative, and much of the data for their comparisons came from policy research and interviews with those involved in wind development.

In a related study, Toke uses regression analysis to look at local factors that influence the outcomes of wind power projects in England and Wales (Toke, 2005). He focuses on all of the players at the local level, and looks at the influence that support of various groups has on the outcome of a project. Toke's findings show that developers can achieve much better outcomes by engaging local groups in the decision-making process. Economics also seem to play a big role, as local communities seek to benefit directly from wind development and are unsatisfied if policies are not laid out to do so. Toke's findings are insightful, but the studies were conducted on a small and local scale, as the analysis was conducted for individual wind projects in England and Wales. The data was focused on individual opinions of various local parish councils, planning officers, and environmental groups. Policies play a much smaller role here, as all of the sites in question fall under the same national policies and regulations of the United Kingdom.

It is useful to take note of what has worked for certain countries in Europe, but the politics, cultures, and economies of the United States differ substantially from Europe and many of the results of these case studies will not apply in the U.S. Bird et al. (2005) studied this issue as applied to the United States, and they looked at the policies, financial incentives, gas price volatility, and market rules to see which were influential in driving wind development in specific states. They studied the states in which wind power has been successfully developed, and attempted to propose lessons learned that could be used to encourage wind projects in other states. They find that the two most important factors are state financial incentives and renewable portfolio standards (RPS) and that these work most effectively in states with high wind resources to begin with. They also find that high natural gas prices are a particularly strong driver for wind development, since they make wind power much more cost-competitive. It should be noted that this paper was published in 2005, when gas prices were still rising quickly. The drastic drop in price in 2008 makes this finding much less relevant in today's markets.

Menz and Vachon (2006) were the first to attempt regression analysis for factors affecting wind power in the United States. They used four different response variables with the same set of covariates. They looked at the fits of wind capacity in 2003, capacity growth from 2000 to 2003, capacity growth from 1998 to 2003, and the number of “large” projects using data that included wind energy potential and the following policy aspects: renewable portfolio standards, generation disclosure requirements, mandatory green power offering, public benefits funds, and retail choice. They also looked at the duration that these policies had been in effect in order to capture the time-dependence and lasting effects of certain policies. Their analysis was performed for only a subset of states. They left out states with little or no wind development or with poor quality wind resources available. After some initial regression analysis, they also chose to leave out California and Texas, with the reason being that these two states had significantly higher wind capacity than the other states being evaluated and they were throwing off the results. This left them with an analysis of 37 states, all with a moderate amount of wind capacity. This introduces a selection bias in the results, since the records were chosen to be fairly similar to each other and the very high and very low capacity states were purposely left out. In addition, since this paper was published in 2006, Texas has become even more of a wind power leader, and is more of an outlier today. California, on the other hand, has fallen to third in terms of wind capacity, having been passed by Iowa, with several other states following closely behind. Menz and Vachon only used linear models to fit their data, and they looked only at R-squared values of fit when forming their conclusions. They did not attempt to predict values based on their dataset. This makes it hard to judge whether the factors they analyzed are the true reasons behind a given state's wind capacity, especially given the number of states that were left out of the analysis.

On a much more detailed scale, another analytical study was done by Mann et al. (2012) for the state of Iowa. They divided the state up into one square kilometer blocks and used a logistic regression model to predict the locations where wind power developments have been built. The covariates in their analysis included wind energy density at two different heights, population density, cropland, and distance from power lines, highways, and airports. Their results show that analysis on an extremely small and local scale can result in fairly accurate predictions. Unfortunately, many of the covariates do not apply when looking at the state level, since there are significant differences in factors such as policies and economics across states.

3. Data

The data used in this analysis consist of variables that are thought to have a possible influence on the amount of wind power capacity built in each state in the United States. The response variable of interest is the built capacity, in megawatts, as of the end of the year 2010 in each of the fifty states. The data have been gathered from multiple sources in an attempt to collect information that can be used to describe the reasons behind the amount of wind capacity that has been built in this country. The data were chosen with the hope that these factors will be able to give a reasonably accurate picture of the causes that affect the vast differences in wind power capacity among the states and to inform decisions as to which factors are most effective in encouraging investments in wind power.

The variables that were used in this analysis cover a broad range of categories, and they include factors such as the political leaning of the individual state's government and the level of the state's renewable energy portfolio standard. The response variable is the amount of built wind power capacity in megawatts that each state had as of the end of 2010. Similarly, one of the covariates is the installed wind capacity for the year 2000, because it was thought that a state's past energy decisions could have a potential impact on the installed capacity ten years later. This capacity information is from the U.S. Department of Energy (2011). Two of the covariates deal with the wind potential for each state, since small and highly developed states are less likely to be able to install widespread wind farms than large, rural states. For the dataset, we included the amount of land available for wind development (in square kilometers) and the percentage of each state that is available for wind development. Available land is defined as areas with a capacity factor of at least 30% at an 80-meter height, while accounting for excluded regions such as National Parks, wetlands, and urban areas. The National Renewable Energy Laboratory (NREL) performed the wind-potential analysis, and the data was accessed through the Department of Energy (NREL, 2012). Next, we included a few demographic indices. One of these indices is the median income in each state averaged between 2008 and 2010 (in 2009 dollars), and this data came from the U.S. Census Bureau (2011). In addition, we included the percentage of the state legislature that identifies as Democrat in the year 2006, with the thinking that any wind project would take several years to be built, so a lag of several years was used. This information was obtained from the National Conference of State Legislatures (2011). We also chose to use the amount of cropland in each state (in acres) as a covariate. It has been suggested that agricultural landscapes and the people living there are more accepting of wind development, and by using it as a covariate, the analysis will show whether the data supports this claim (Sowers, 2006). The crop data came from the US Department of Agriculture's Economic Research Service (2012). Lastly, we included variables that are related to the electricity market and incentives in place for renewable energy. The first of these variables is the average price paid for electricity in each state, and the data were obtained from the U.S. Energy Information Administration (2011). Also included is the renewable portfolio standard data for each state, in terms of the target percentage of their energy that each state is aiming to get from renewables. This information comes from the Federal Energy Regulatory Commission (2011). The final pieces of information come in the form of any financial incentives for renewable energy that are in place in each state. These demonstrate which types of incentives each state has in place, and the incentive programs are broken down based on the form that the incentive takes such as tax, rebate, loan, or other. The tax incentives include personal tax, corporate tax, sales tax, and property tax incentives. The incentive programs that fall into the “other” category include things such as grants, bonds, and performance-based incentives. This information was available from the Database of State Incentives for Renewables and Efficiency that is funded through the U.S. Department of Energy (2010).

Table 1
Wind power dataset variable information.

Variable                           Mean          Standard deviation  Minimum  Maximum
Wind capacity in 2010 (MW)         805.34        1606.64             0        10,089.43
Available land (sq. km)            43,827.65     76,569.58           0        380,305.9
Available percentage of land (%)   18.2          28.2                0        92.0
RPS mandate (%)                    15.35         11.60               0        50
Electricity rate (cents per kWh)   10.04         3.55                6.2      25.12
Median income (2009 $)             $50,647       $7,607              $36,850  $66,303
Democratic portion of Gov't        0.50          0.15                0.19     0.87
Tax incentives (binary)            0.94          0.24                0        1
Rebate incentives (binary)         0.38          0.49                0        1
Loan incentives (binary)           0.88          0.33                0        1
Other incentives (binary)          0.74          0.44                0        1
Wind capacity in 2000 (MW)         50.79         233.43              0        1615.99
Amount of cropland (acres)         8,128,498.18  8,868,885.29        24,457   33,667,177

Table 1 shows the general characteristics of each variable. Many of the variables have relatively high standard deviations, which shows that there is very high variability among the states in many of these areas. This may be obvious to anyone who is familiar with the United States and renewable energy, but it is a useful reminder as to the wide ranges that we dealt with when analyzing this dataset.

4. Statistical models

Several different types of models were compared in order to find the best performer in terms of predictive accuracy. The most basic model used was a generalized linear model, or GLM. This uses a linear function to model the relationship between the covariates and the response variable by assigning a coefficient to each covariate (Hastie et al., 2009). This type of model tends to work well when the relationship between the covariates and the response variable is linear, or close enough to be well approximated by a linear function. In addition to a GLM, a generalized additive model, or GAM, was also used in the analysis. In contrast to a linear model, a GAM does not assume a linear relationship between the response variable and the covariates. Instead, it uses a smooth, continuous function that can take on any form (Hastie et al., 2009). GAMs can work well when the data is highly nonlinear, but they can often over-fit the data due to their ability to pick up even slight fluctuations in the data. It is essential that GAMs be compared on their predictive accuracy and not their fit for this reason. A multivariate adaptive regression spline model, or MARS, was also tested on this dataset. This type of model uses piecewise-linear functions to build up an estimate of the relationship between the covariates and the response. These local linear approximations are added together to create the full basis function, after which individual terms are deleted in order to simplify the model and obtain better predictive accuracy (Friedman, 1991). It uses generalized cross validation to decrease the size of the model, thus avoiding the problem of over-fitting the data.
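The piecewise-linear building blocks described above for MARS can be illustrated with a minimal sketch. This is not the authors' code: the function names, the knot at 1.0, and the coefficients are hypothetical values chosen only to show how hinge terms add up to a piecewise-linear response.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_like_prediction(x, intercept, terms):
    """Sum an intercept and weighted hinge terms, each given as (coef, knot, direction)."""
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: flat below the knot at 1.0, rising with slope 2 above it.
terms = [(2.0, 1.0, +1), (0.0, 1.0, -1)]
print(mars_like_prediction(0.5, 10.0, terms))  # 10.0
print(mars_like_prediction(3.0, 10.0, terms))  # 14.0
```

A fitted MARS model would choose the knots and coefficients from the data and then prune terms; the sketch only evaluates a fixed set of terms.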

Lastly, a number of tree-based models were tested. Classification and Regression Trees (CART), bagged CART, Bayesian Additive Regression Trees (BART), and random forest models were applied to this dataset. The CART model grows a large tree by splitting the data recursively into smaller and smaller partitions and then prunes back the large tree into a smaller and simpler tree by penalizing complexity in the model (Breiman et al., 1984). The bagged CART model is similar, but it applies bootstrap aggregation in order to improve predictive accuracy. The BART model uses Bayesian priors and builds a large number of very simple, weak learner trees, and then aggregates the predictions from each tree using a Markov chain Monte Carlo method to sample from the posterior distribution until convergence is reached (Chipman et al., 2010). This method allows for each relatively simple tree to be used to explain only a small portion of the response variable.
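Bootstrap aggregation can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the bagged CART implementation used in the paper: each "tree" is a one-split regression stump on a single covariate, the toy data are made up, and the names `fit_stump` and `bagged_stumps` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; the split is x <= t
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean([tree(x) for tree in trees])


# Made-up step-shaped data: low response for small x, high response for large x.
xs = list(range(8))
ys = [0.0] * 4 + [10.0] * 4
predict = bagged_stumps(xs, ys)
print(predict(0), predict(7))
```

Averaging over bootstrap samples is what reduces the variance of a single tree; a random forest additionally restricts each split to a random subset of covariates.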

The last tree-based method that was applied to this dataset is random forest. This method again uses bootstrapping, but it grows a tree from each bootstrap sample. This results in relatively uncorrelated trees. By averaging predictions across a large number of these trees, the variance of the predictions is generally reduced (Breiman, 2001). The trees in random forest use a randomly selected subset of variables and choose optimal splitting points until a selected minimum node size is reached. These trees are not pruned, as they are in a CART model, for example. Instead, the minimum node size and random bootstrap samples usually keep the individual trees from being overly complex, and the averaging across all of the trees can often result in accurate predictions for a dataset.

Table 2
Holdout analysis results: mean absolute error and mean squared error.

                                GLM1      GLM2      GAM       BART
Mean absolute error, mean       1048      1033      1171      600
Mean absolute error, std. dev.  882       949       1339      290
Mean squared error, mean        10049311  10643473  21379770  1510463
Mean squared error, std. dev.   36187916  43562519  82762292  2439893

In order to compare the predictive accuracy of these models, a holdout cross-validation analysis was performed. This method trains the models on a portion of the data, setting aside a test sample to be used later. For this analysis, we held out a randomly assigned 20% of the data for each of 100 holdout runs and compared the results of different model types for each run. We included two linear models, since they are relatively simple and a good point of comparison. One used variables based on some initial studies of GLM fit, and the second GLM used a step-based form of variable selection that allowed the model to choose the best variable combination within the cross-validation. We included one GAM for comparison as well, although some earlier attempts at using GAMs for this data did not produce very good results, and the GAM was not expected to perform well in the holdout analysis. We included a MARS model, even though this type of model is fairly similar to a GAM and is again not expected to predict well for this dataset. Lastly, we included the tree models discussed previously.
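The holdout procedure described above (100 runs, each holding out a random 20% of the records) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic 50-record dataset, the single covariate, and the two stand-in models (`fit_mean_only` and `fit_line`) are assumptions made only so the example runs on its own.

```python
import random
from statistics import mean, stdev


def holdout_errors(xs, ys, fit, n_runs=100, test_frac=0.2, seed=0):
    """Repeated random holdout: return per-run MAE and MSE on the held-out records."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, round(test_frac * n))
    maes, mses = [], []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errs = [ys[i] - predict(xs[i]) for i in test]
        maes.append(mean([abs(e) for e in errs]))
        mses.append(mean([e * e for e in errs]))
    return maes, mses


def fit_mean_only(train_x, train_y):
    """Baseline: always predict the training mean."""
    m = mean(train_y)
    return lambda x: m


def fit_line(train_x, train_y):
    """Ordinary least squares with one covariate."""
    mx, my = mean(train_x), mean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    return lambda x: my + slope * (x - mx)


# Synthetic data standing in for 50 states (values are made up).
xs = [float(i) for i in range(50)]
ys = [3.0 * x + (5.0 if i % 2 else -5.0) for i, x in enumerate(xs)]

line_mae, line_mse = holdout_errors(xs, ys, fit_line)
base_mae, base_mse = holdout_errors(xs, ys, fit_mean_only)
print("line:      MAE mean", mean(line_mae), "std. dev.", stdev(line_mae))
print("mean only: MAE mean", mean(base_mae), "std. dev.", stdev(base_mae))
```

Summarizing each model by the mean and standard deviation of its per-run errors mirrors the layout of Table 2, and the per-run error lists are what a paired comparison between models would be computed from.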

The variables in the dataset did not have high correlation factors, and the variance inflation factors were low for the linear models, so a principal component analysis transformation of the data was deemed to be unnecessary. The data was standardized before the models were trained. Many of the covariates are on different orders of magnitude and standardizing the data allows for a more intuitive comparison of variable influence. Many variable combinations were tried for the GLMs and GAMs. Although it is easy to evaluate for fit using methods such as the likelihood ratio test, assessing a model's performance for prediction is only possible in the context of a holdout analysis. Even so, some of the variables could be removed from the models due to extremely high p-values. One of the linear models used in the holdout analysis contains a subset of the covariates based on the best fitting model. The other linear model allows for automatic variable selection by adding and subtracting covariates in order to end up with the best predictive model. Similarly, it is easy to choose variables for a GAM based on fit, but the GAM also allows for automatic variable elimination in order to end up at a simpler model.
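Standardization of a covariate can be sketched as a z-score transformation. The paper does not state which standard deviation estimator was used, so the sample standard deviation below is an assumption, and the input values are made up for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation (z-scores)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up covariate values on a large scale (for example, land area in sq. km).
values = [0.0, 1200.0, 43000.0, 90000.0, 380000.0]
z = standardize(values)
print(z)
```

After this transformation a value of zero corresponds to the covariate's mean, which is how the standardized x-axes of partial dependence plots are read.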

5. Results and discussion

The holdout analysis compared 8 different models in addition to the mean-only prediction, resulting in 9 total predictions to compare. The prediction error results are shown in Table 2. The random forest model has the lowest overall mean absolute error (MAE) at 514. It also had the lowest mean squared error (MSE).

Table 2 (continued)

                                CART     MARS      Bagged CART  Random forest  Mean only
Mean absolute error, mean       636      1140      555          514            924
Mean absolute error, std. dev.  388      965       322          299            286
Mean squared error, mean        1705590  11051960  1456377      1280061        2174501
Mean squared error, std. dev.   2415892  43083895  2474729      2277692        2993220

Table 3
Corrected t-test values for model comparison using Bonferroni correction.

               GLM1     GLM2     GAM      BART     CART     MARS  Bagged CART  Random forest
GLM2           1.00
GAM            1.00     1.00
BART           0.00015  0.00098  0.00222
CART           0.00129  0.00603  0.00729  1.00
MARS           1.00     1.00     1.00     0.00002  0.00012
Bagged CART    0.00002  0.00018  0.00067  1.00     1.00     0.00
Random forest  0.00     0.00003  0.00019  1.00     0.47602  0.00  1.00
Mean           1.00     1.00     1.00     0.00     0.00     1.00  0.00         0.00

Fig. 2. Random forest predictions vs. actual wind capacity. (Axes: wind capacity in 2010 (actual) on the x-axis; predicted wind capacity on the y-axis.)

Fig. 3. Variable importance plot for random forest model. (Variables in the order listed in the plot: Available Land, Amount of Cropland, Wind Capacity in 2000, Available % of Land, Electricity Rate, Median Income, RPS Mandate, Loan Incentives, Other Incentives, Rebate Incentives, Tax Incentives.)

In order to accurately compare the results of the 9 different models, a Bonferroni correction was applied to the results of t-tests between all of the model combinations. A Bonferroni correction accounts for the multiple, simultaneous hypothesis tests being conducted on the same data. It is necessary in this case since there are 9 models being compared. The corrected t-test values are shown in Table 3. The random forest model is significantly better than the mean-only model in terms of predictive accuracy. It also outperforms many of the other models tested, and we identified it as the best model for this dataset.
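As a rough illustration of the Bonferroni correction, the sketch below counts the pairwise comparisons among 9 models and scales a set of raw p-values accordingly. It assumes the common form of the correction (multiply each p-value by the number of tests and cap at 1); the paper does not spell out the exact procedure, and the raw p-values here are invented for the example.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


models = ["GLM1", "GLM2", "GAM", "BART", "CART", "MARS",
          "Bagged CART", "Random forest", "Mean only"]
pairs = list(combinations(models, 2))
print(len(pairs))  # 36 pairwise comparisons among 9 models

# Invented raw p-values for two comparisons, just to show the scaling.
print(bonferroni([0.001, 0.5]))  # [0.002, 1.0]
```

The cap at 1 is consistent with the many entries of exactly 1.00 in Table 3, although that reading is an inference rather than something the text states.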

The predictions for the random forest model are shown in Fig. 2. As expected, the plot for random forest shows low variance in the predictions. The data points are closely grouped along a recognizable line, and the predictions match up well with the actual values. It had the lowest absolute errors overall, although it appears that it tends to underestimate capacity slightly across the board.

The random forest model does not assume linearity. Tree models in general are very flexible and can handle high variance in the data very well (Breiman, 2001). The wind data that we are using for these models is, for the most part, non-linear. This is one of the reasons that the random forest model outperforms some of the other models. Some of the other tree-based models also performed reasonably well, probably for the same reason of accounting well for non-linearity in the data. The GLM, GAM, and MARS models have errors that are higher than the mean-only model. It is likely that the GAM and MARS models tended to over-fit the data, which resulted in poor predictions. The GLM models may have been too simplistic to represent the relationships in the data.

The importance of variables in a random forest model is determined by assessing the degree to which removing a variable from the model degrades out-of-bag predictive accuracy (that is, increases the out-of-bag error). Removing the most important variables will account for the largest reductions in predictive accuracy, since the model is then left to use a subset of less influential variables. This importance measure, which can be interpreted as the relative influence of the covariates, is shown in a Variable Importance Plot in Fig. 3. The top four most influential variables are consistent with the most significant variables identified in many of the other models. The amount of available land is identified as the most important in terms of error reduction, followed by the amount of cropland, the built wind capacity in 2000, and the available percentage of land.
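One common way to approximate "removing" a variable without refitting is to shuffle its values and measure how much the prediction error grows. The sketch below uses that permutation approach as a stand-in; it is not necessarily the measure plotted in Fig. 3, it scores a fixed toy model on the same rows rather than using out-of-bag samples, and all names and data are hypothetical.

```python
import random
from statistics import mean


def mse(predict, rows, ys):
    """Mean squared error of a prediction function over rows of covariates."""
    return mean([(y - predict(r)) ** 2 for r, y in zip(rows, ys)])


def permutation_importance(predict, rows, ys, n_repeats=20, seed=0):
    """Average increase in MSE when each covariate column is shuffled in turn."""
    rng = random.Random(seed)
    base = mse(predict, rows, ys)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)  # break the link between covariate j and the response
            shuffled = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, col)]
            increases.append(mse(predict, shuffled, ys) - base)
        importances.append(mean(increases))
    return importances


# Toy model that depends only on the first covariate and ignores the second.
rows = [[float(i), float((i * 7) % 10)] for i in range(10)]
ys = [5.0 * r[0] for r in rows]
toy_model = lambda r: 5.0 * r[0]
importance = permutation_importance(toy_model, rows, ys)
print(importance)
```

In this toy setting the second covariate receives an importance of exactly zero because the model never uses it, while the first receives a positive score.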

In order to visualize the level of influence of each of these variables, partial dependence plots are shown in Fig. 4. Each plot shows the marginal influence that that particular variable has on the response. This is done by varying one variable at a time over its range and then evaluating how the response variable changes with respect to that variable alone. It should be noted that the ranges of each variable have been standardized for these plots, so a value of zero on the x-axis represents the mean value for each variable. These partial dependence plots are highly nonlinear for most of the variables of interest. This helps to explain why the random forest model generally outperforms many of the models that assume linear relationships between the covariates and the response, since it is able to deal with highly nonlinear data. Each individual plot depicts the nature of that variable's influence on the response variable, which is the amount of installed wind capacity in the year 2010. The partial plot for the amount of available land for wind development shows a positive influence on the response, and there seems to be a threshold at which the response takes a big jump up, showing that the states with extremely high wind resources are more likely to invest in wind power. A similar relationship can be seen for the amount of cropland.


[Fig. 4 panels: Partial Dependence on Available Land; Partial Dependence on Cropland; Partial Dependence on Capacity in 2000; Partial Dependence on Available Percent of Land; Partial Dependence on Median Income; Partial Dependence on RPS Mandate.]

Fig. 4. Partial dependence of selected variables in random forest model.


Cropland has a steadily increasing positive influence on the wind capacity in 2010, and the plot also jumps up towards the end for states with extremely large areas of cropland. The plot for the wind capacity in 2000 shows a positive influence, as expected, but it stays level after the initial jump. This seems to show that there is a barrier to a state's initial investment in wind power, but after the first project, states are more likely to continue investing in wind. The available percentage of land in each state has a positive influence on the response variable for the most part, up until the higher values, where it starts to drop off slightly.

In addition to the top four variables, the partial plots for median income and RPS mandate are also included. The median income of a state seems to have a slight positive influence, but only up to the point of the initial jump, and the plot stays fairly steady from there on. The shape of the RPS mandate plot seems to be counterintuitive. Higher renewable portfolio standards would be expected to drive more investment in wind projects, but that does not necessarily seem to be the case. After the large initial spike and subsequent drop, the latter part of the plot levels out. The reason for the initial spike is unknown, but we hypothesize that it is greatly influenced by a few individual states, such as Texas and Iowa. Texas has, by far, the most installed wind capacity, but it also has one of the lowest RPS mandates. Similarly, Iowa has the second most wind capacity and a very low RPS mandate. These states show that the relationship between the RPS mandate and wind capacity is not as well defined as expected. It should be noted that the importance of these last few variables (available percentage, mandate, and income) is much lower than that of the top three mentioned previously. Looking again at Fig. 3, there is a large gap between the importance of the capacity in 2000 and the rest of the variables below that.
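The computation behind the partial dependence plots in Fig. 4 can be sketched as follows. This is a simplified illustration and not the code used in this study. The predictor `toy_predict`, its threshold, and the four data rows are hypothetical, chosen only to reproduce the kind of threshold jump described for the available land plot.

```python
def partial_dependence(predict, X, feature, grid):
    """For each grid value, fix one feature at that value in every row
    and average the model's predictions over all rows."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(X))
    return curve


def standardize(column):
    """Z-score a column so that 0 corresponds to its mean."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


# Hypothetical stand-in model with a threshold-style jump in feature 0.
def toy_predict(row):
    jump = 1000.0 if row[0] > 1.0 else 0.0
    return 500.0 + jump + 100.0 * row[1]


# Hypothetical raw data: two covariates for four observations.
raw = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 8.0]]
columns = [standardize([row[j] for row in raw]) for j in range(2)]
X = [list(values) for values in zip(*columns)]

# Evaluate the averaged response at three standardized values of feature 0.
grid = [-1.0, 0.0, 1.5]
curve = partial_dependence(toy_predict, X, feature=0, grid=grid)
print(curve)
```

Because the other covariate is averaged out, the curve isolates the marginal effect of the chosen variable. With standardized inputs, a grid value of zero corresponds to that variable's mean.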

6. Conclusion

As expected, the amount of wind resources in a given state has a strong positive influence on the amount of built wind capacity in that state, and it proves to be the most influential parameter in making accurate predictions. This result is not surprising, nor is the finding that the amount of cropland also has a strong, positive influence. Previous studies have looked at the reasons for this and noted that agricultural areas are receptive to wind power. The results of our analysis support the idea proposed by Sowers that cropland would be a positive predictor for the amount of wind development (Sowers, 2006). The most surprising finding is that state policies seem to play a very small role in predicting wind power development. From the qualitative studies discussed earlier, policies should be very influential in driving state renewable energy investments. However, this turned out not to be the case in the models tested. Although these policies are specifically developed to incentivize investments in renewable energy (and sometimes even in wind energy explicitly), they do not seem to be as effective as intended. A state's renewable portfolio standard is designed to encourage investment in and development of renewable energy sources, and yet it is only the fifth most important variable in the random forest model. Of course, the RPS is not specific to wind energy, and a state can meet its requirements by using any number of renewable sources. States that have a large solar resource, for example, would be more likely to meet the RPS using solar power as opposed to wind, and this flexibility could account for part of the reason behind the RPS not showing up as significant in the model. Our study also does not account for renewable energy contracts between states or the trading of renewable energy credits. Renewable energy credits (RECs) allow states to meet their RPS mandates by purchasing credits for energy generated elsewhere. This leads to a complicated system of trading clean energy, which is not captured in the models presented. There were also financial incentives included in the dataset, and these did not show up as being significant at all. One reason for this could be that federal policies take precedence when deciding to invest in wind energy, and something like the production tax credit drives development far more than any local incentives that are put in place by individual states. The production tax credit has been shown to be very effective in encouraging wind energy development nationwide, but the state-based incentives do not seem to have much impact at all on wind power expansion in a given state.

It makes sense that the amount of wind resources available in a region would be important in terms of predicting the wind power development in that region. High quality wind resources will increase the likelihood that a project is profitable, since the farm will be producing more power than a farm located in an area of poor wind resources. The states that had previously made investments in wind power have been shown to be more likely to continue building their capacity. Although this study does not look into the reasons for the investment in wind prior to 2000, it is safe to assume that the same major factors, such as wind resource and cropland, were driving those decisions as well.

One of the drawbacks to this study is the small dataset. With only 50 states, it is difficult to deal with any outliers that may exist. States with extremely high or extremely low values of installed wind capacity can throw off the results of the trained model. Texas, for example, has almost three times more installed wind capacity than Iowa, which is second in capacity values. More accurate results could be possible by breaking the country up into smaller regions. Since state policies do not seem to have a strong influence on wind development, the geographic and demographic parameters could easily be captured on a smaller scale, and may prove to give even more accurate predictions for each region. This approach would be similar to the study conducted by Mann et al. discussed earlier (Mann et al., 2012).

The interactions of policies and incentives for developing renewable energy in general, and wind power specifically, are complex. The decisions made by investors looking to build new wind farms are surely based on a number of considerations that are not captured in these models. Even so, this study has shown that there are some factors that are easy to measure and track, which can help to explain many of the differences in wind development among states. The policies currently in place do not provide enough of an incentive for significant investment in wind development. If individual states are looking to take advantage of wind power, the current policies need to be evaluated for effectiveness. If the United States is committed to developing renewable energy, it is likely that new policies will need to be instituted that will be more effective than the ones currently in place.

Acknowledgements

The research was conducted with the support of the Lee & Albert H. Halff Doctoral Student Award, the Gordon Croft Fellowship as part of the Environment, Energy, Sustainability & Health Institute (E2SHI) of Johns Hopkins University, and the National Science Foundation Grant CMMI 0968711. The views in this paper are those of the authors and not necessarily those of the funding organizations.

References

Berry, T., Jaccard, M., 2001. The renewable portfolio standard: design considerations and an implementation survey. Energy Policy 29, 263–277.

Bird, L., Bolinger, M., Gagliano, T., Wiser, R., Brown, M., Parsons, B., 2005. Policies and market factors driving wind power development in the United States. Energy Policy 33, 1397–1407.

Breiman, L., 2001. Random forests. Machine Learning 45, 5–32.

Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and Regression Trees. Wadsworth International Group, New York.

Breukers, S., Wolsink, M., 2007. Wind power implementation in changing institutional landscapes: an international comparison. Energy Policy 37, 2737–2750.

Chipman, H.A., George, E.I., McCulloch, R.E., 2010. BART: Bayesian Additive Regression Trees. The Annals of Applied Statistics 4, 266–298.

Federal Energy Regulatory Commission (FERC). ⟨http://www.ferc.gov/market-oversight/othr-mkts/renew.asp⟩ 2011.

Friedman, J., 1991. Multivariate adaptive regression splines. The Annals of Statistics 19, 1–141.

Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed. Springer Series in Statistics.

Internal Revenue Service (IRS). Form 8835. Renewable Electricity, Refined Coal, and Indian Coal Production Credit. 2011.

Mann, D., Lant, C., Schoof, J., 2012. Using map algebra to explain and project spatial patterns of wind energy development in Iowa. Applied Geography 34, 219–229.

Menz, F.C., Vachon, S., 2006. The effectiveness of different policy regimes for promoting wind power: experiences from the states. Energy Policy 34, 1786–1796.

National Conference of State Legislatures (NCSL). State Legislatures Composition. ⟨http://www.ncsl.org/⟩ 2011.

National Renewable Energy Lab (NREL) (via U.S. Department of Energy). ⟨http://www.windpoweringamerica.gov/windmaps/resource_potential.asp⟩ 2012.

Sowers, J., 2006. Fields of opportunity: wind machines return to the plains. Great Plains Quarterly. 131.

Toke, D., 2005. Explaining wind power planning outcomes: some findings from a study in England and Wales. Energy Policy 33, 1527–1539.

Toke, D., Breukers, S., Wolsink, M., 2008. Wind power deployment outcomes: how can we account for the differences? Renewable and Sustainable Energy Reviews 12, 1129–1147.

U.S. Census Bureau. Current Population Survey, 2009 to 2011 Annual Social and Economic Supplements. Three-Year-Average Median Household Income by State: 2008–2010. ⟨http://www.census.gov/hhes/www/income/data/statemedian/⟩ 2011.

U.S. Department of Energy (U.S. DOE). Database of state incentives for renewables and efficiency. ⟨http://www.dsireusa.org/summarytables/finre.cfm⟩ 2010.

U.S. Department of Energy (U.S. DOE). Energy efficiency and renewable energy: wind powering America. ⟨http://www.windpoweringamerica.gov/wind_installed_capacity.asp⟩ 2011.

U.S. Department of Energy (U.S. DOE). Database of state incentives for renewables and efficiency. ⟨http://dsireusa.org/incentives/incentive.cfm?Incentive_Code=US13F⟩ 2012.

U.S. Department of Agriculture (USDA): Economic research service. ⟨http://www.ers.usda.gov/StateFacts/⟩ 2012.

U.S. Energy Information Administration (EIA). Average price by provider. ⟨http://205.254.135.7/state/state-energy-profiles-more-prices.cfm⟩ 2011.