
An Observational Study of Ballot Initiatives and State Outcomes Luke Keele 2140 Derby Hall 150 North Oval Mall Ohio State University Columbus, OH 43210 Tele: 614-247-4256 Email: [email protected] June 6, 2009





It has long been understood that the presence of the ballot initiative process leads to different outcomes among states. In general, extant research has found that the presence of ballot initiatives tends to increase voter turnout and depress state revenues and expenditures. I reconsider this possibility and demonstrate that past findings are an artifact of incorrect research design. Failure to account for differences in states often leads to a confounding association between ballot initiatives and voter turnout and fiscal policy. Here, I conduct an observational study based on a counterfactual model of inference to analyze the effects of ballot initiatives. The resulting research design leads to two analyses. First, I utilize the synthetic case control method, which allows me to compare over time outcomes in states with initiatives to states without initiatives while accounting for pretreatment baseline differences across states. Second, I use matching to assess voter turnout differences across metro areas with and without ballot initiatives. In both analyses, I find that ballot initiatives rarely have spillover effects on voter turnout and state fiscal policy.


One feature of the political arena in some states is the initiative process. While the method

by which direct legislation is implemented varies, in 24 states citizens can place legislative

statutes directly on the ballot for passage by the electorate. Though mostly the product of reforms

passed during the populist and progressive eras of the early 20th century, the initiative

process must alter the behavior of state legislators today. While the initiative process is

often decried as populism run amok in the popular press, the consequences of initiatives are

thought to be benign to favorable in much of the academic literature (Matsusaka 2004; Lupia

and Matsusaka 2004; Smith and Tolbert 2004). Initiatives are important policy mechanisms

in many states, and it is important to understand how ballot initiatives not only change

policy directly but alter the political behavior of citizens and state legislators. For example,

initiatives are widely credited with helping George W. Bush win Ohio in 2004. It is thus

critical to correctly assess whether the mere presence of initiatives is responsible for altering

democratic outcomes.

For good or ill, few doubt that direct legislation changes outcomes across states, particu-

larly on the issue area in question. It is also thought, however, that initiatives have spillover

effects on outcomes unrelated to the policy issue on the ballot, but assessing the effect of

initiatives on the behavior of states is a difficult task. Given that the initiative process was

not randomly assigned across states and that states are very heterogeneous, we must exercise

great caution before assuming initiatives cause a particular outcome. Both of these obstacles

are common when attempting to make causal inferences with observational data. The prob-

lem is that it is quite likely a confounder exists which might be correlated with the presence

of direct legislation and the outcome in question. We must account for baseline differences

across states before any valid comparisons can be made across states with and without direct

legislation. I argue that past studies have drawn erroneous conclusions about the effects of

the initiative process because they have not accounted for such baseline differences across

states or taken into account causal heterogeneity.

To address these problems, I conduct an observational study. While we might assume that all analyses

conducted with nonexperimental data are observational studies, the term actually has a

more specific definition. Rosenbaum (2002) defines an observational study as an attempt

to elucidate causal effects outside the realm of controlled experimentation. A key component


of an observational study is to design the analysis to reproduce as nearly as possible some

of the strengths of an experiment. In what follows I use the counterfactual reasoning of an

observational study to make causal inferences about the spillover effects of ballot initiatives.

Specifically, I attempt to gauge the effect of the initiative process on voter turnout and state

fiscal policy while accounting for baseline differences and heterogeneous treatment effects.

First, I examine why we might expect states with initiatives to have higher levels of voter

turnout and a more conservative fiscal policy. I then outline a research design that will allow

us to draw causal inferences about the spillover effects of initiatives. I next outline the three

different statistical techniques that I use for estimation (synthetic case control, matching,

and differences-in-differences) and then present the results from the analyses. I conclude by considering

the substantive import of the findings.

1 Ballot Initiatives and State Fiscal Policy

While the causal mechanism between ballot initiatives and state fiscal policy is not well

understood, it is generally thought that lawmakers in states with the initiative process face

fiscal policy constraints that lawmakers in states without initiatives do not. Citizens in

initiative states may respond to lawmakers not only with electoral retribution but also by

proposing counter legislation (Gerber 1996). Moreover, we would also expect legislators to act

strategically and thus craft policy with an eye to possible counter-proposals from citizens.

In past work, three reasons have been advanced for why state expenditures and revenues

might differ across states with and without direct legislation (Matsusaka 1992, 1995, 2004).

All of these reasons focus on why state legislative outcomes might not match median voter

preferences without corrections from voters in the form of initiatives.

The first arises from general theories about the behavior of legislative bodies. Logrolling

is understood to be the result of lawmakers’ efforts to reduce the transaction costs for passing

legislative initiatives (Weingast, Shepsle, and Johnson 1981; Weingast and Marshall 1988).

One byproduct of logrolling is that median voter outcomes may not be enacted. Direct

legislation allows voters to alter legislative outcomes that may diverge from median voter

preferences due to logrolling. A second reason why state legislative outcomes might not match

the median voter is limited information. While legislators might fully intend to comply


with the wishes of the median voter, policymakers often have less than full information

about voter preferences. Thus despite the best of intentions, legislators may implement

policies that deviate from the median voter (Matsusaka 1992). The final reason one might

expect state fiscal policy to differ is that ballot initiatives can alter agenda control in the

legislature. Without direct legislation, the state legislature has monopoly control over the

agenda. Agenda control allows the agenda setter to force outcomes away from the median

voter (Romer and Rosenthal 1979). When the initiative process is present the legislature no

longer has complete agenda control. Any citizen or interest group that can overcome the

signatory requirements may also use the initiative process to set the legislative agenda.

Others have documented empirically that revenue and expenditure patterns appear to

differ between states with direct legislation and those without it (Matsusaka 1992,

1995, 2004). Given that states are required to balance their budgets, if citizens pass initia-

tives that lower taxes, this will lower revenues and require legislators to lower government

expenditures. Thus direct legislation may lower tax rates, forcing legislators to spend less.

Legislators may, however, find other sources of revenue to make up the shortfall.

2 Ballot Initiatives and Voter Turnout

Explanations for why citizens vote (or fail to vote) tend to be based on one of three general

models of political participation: the socioeconomic status model, the rational choice model,

or the mobilization model. The most controversial of these three models is the rational

choice model, which tends to focus on the instrumental calculations behind the decision to

participate in politics. Under this perspective, one votes if the following is true:

PB − C > 0

where P is the probability that one’s vote is decisive, B is the net benefit from having one’s

preferred candidate win, and C is the net cost of voting. This model describes the “calculus

of voting” as one where voting occurs if the benefits of voting and the probability of being

decisive outweigh the costs of voting (Downs 1957; Riker and Ordeshook 1968). While it is

difficult to precisely define the size of B and C, P is equal to 1/n, where n is the size of the

electorate. As such, the probability of being decisive in most elections is very small, making


it unlikely that the benefits ever outweigh the costs no matter their size.
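To make the arithmetic concrete, the inequality can be evaluated for a few electorate sizes. The sketch below is illustrative only: the function name and the values chosen for B and C are hypothetical, and P is set to 1/n as in the text.

```python
def net_benefit_of_voting(electorate_size, benefit, cost):
    """Return P*B - C, with P approximated as 1/n as in the text."""
    probability_decisive = 1.0 / electorate_size
    return probability_decisive * benefit - cost


# Hypothetical values: B and C are in arbitrary units chosen for illustration.
BENEFIT = 10_000.0
COST = 1.0

statewide = net_benefit_of_voting(200_000, BENEFIT, COST)  # large electorate
small_town = net_benefit_of_voting(500, BENEFIT, COST)     # small electorate

for label, value in [("statewide", statewide), ("small town", small_town)]:
    decision = "vote" if value > 0 else "abstain"
    print(f"{label}: PB - C = {value:.2f} -> {decision}")
```

Under these assumed values the instrumental payoff is negative for an electorate of 200,000 and positive only for a very small electorate.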

This model of participation has been widely criticized, and even Riker and Ordeshook

acknowledged that purely instrumental calculations are insufficient to cause people to vote.

They introduced a term for the experiential benefits of voting to account for the fact that

the decision to vote is more than a cost benefit analysis on the part of citizens. While the

calculus of voting model has been criticized on many fronts and revised in a number of ways,

it remains a useful model for understanding voter turnout as it helps focus attention on the

incentives for participation.1

The calculus of voting model does, however, suggest why the presence of ballot initiatives might increase participation in elections. The presence of a ballot initiative might change the calculus of voting in two ways. First, an initiative may increase the size of P. Since ballot elections are held at the state level, the electorate participating is smaller, which increases P relative to a national election. Moreover, the more initiatives there are on the ballot, the greater the chance that a voter will be pivotal. In some states as many as 10-20 initiatives may be on the ballot in a single election. The increase in P, however, is likely to be trivial in most instances and is no greater than it would be for any statewide office. Even in Wyoming, nearly 200,000 citizens voted in 2004. While the increase may matter in municipal elections, it is unlikely that initiatives raise the probability of being decisive enough to matter in state elections. Ballot initiatives can, however, increase the benefits of voting and, more importantly, make those benefits more salient.

In a presidential or Congressional election, voter estimates of B must be imprecise. Elec-

toral promises from candidates are often necessarily general and possibly ambiguous. Even if

a candidate were to promise a large tax cut or a large increase in targeted public goods, once

elected the politician may renege on the promise, and even presidents can do little immediately

without Congressional approval. Thus electoral victory does not ensure the payoff of B.

In contrast, initiatives can have precise payoffs (a reduction in taxes or a ban on smoking)

and become law in a relatively short period of time if not immediately after the election.

Therefore, initiatives are more likely to provide immediate and precise payoffs to voters and

make the benefits of voting more salient.

1 Whiteley (1995) provides a useful overview of the debate over rational choice models of participation.


While it would appear that initiatives can increase the benefits of voting, the nature of

initiatives makes the incentive to vote based on initiatives conditional: not every initiative

promises clearly defined benefits; the benefits of many initiatives are diffuse and not well

defined. While Proposition 13 in California offered an obvious payoff to a well-defined con-

stituency in the form of lower property taxes, Proposition 60 on the California ballot in 2004

required that all parties participating in a primary election would advance their candidate

with the most votes to the general election. Passage of this initiative would provide a diffuse

to negligible benefit for most voters, which may not be enough to outweigh the costs of voting

in that election for many voters. The varying content of initiatives also implies that each

additional ballot initiative does not necessarily increase turnout by some increment. Five

initiatives without defined benefits may not increase turnout as much as a single initiative

promising an obvious benefit.

It is possible, however, that the effect of initiatives on turnout is not conditional. Perhaps initiatives coax first time voters to the polls. These first time voters may then become activated to vote regularly after a history of nonvoting. Here the incentives provided by initiatives are critical in producing habitual voting. Therefore, if initiatives affect turnout, we might expect the pattern of effects to take one of two forms. We might see increases in some elections when a particularly salient issue is on the ballot, or we might witness a permanent increase due to the activation of new voters.

There are, however, several reasons to believe that initiatives may not increase turnout.

First, the promised benefit of any initiative may not be enough to offset the costs of voting

or the small probability of being pivotal. Second, it may be the case that only those with

sufficient individual resources will understand the benefits promised by an initiative. Finally,

the passage of an initiative does not guarantee it will be enacted. Many initiatives depend

on cooperation from the state legislature, and there is evidence that state politicians do not

always cooperate (Gerber et al. 2001). From a theoretical standpoint, then, it is unclear

whether we should expect differences in voter turnout across states with and without direct

legislation. The empirical literature, however, offers an unequivocal answer. Several studies

have found that states with initiatives have higher levels of voter

turnout than states without them (Tolbert, Grummel, and Smith 2001; Smith and Tolbert


2004; Tolbert and Smith 2005).

3 Research Design

Regardless of whether we are interested in state fiscal policy or voter turnout, the basic re-

search question remains: does direct legislation have spillover effects? Researchers who have

attempted to answer this question have relied on a panel study research design. The outcome

of interest is regressed on an indicator variable for whether a state has the initiative process

along with a set of relevant controls, and the direction and magnitude of the estimated coefficient

for the indicator variable provide evidence for whether states with direct legislation

differ from those without it. Unfortunately, such an analysis ignores several important considerations.

In this section, I outline the complications that underlie the research question

and present a research design that engages these complications.

One way to conceptualize the problem is to use the framework of treatment effects and

causal inference. This framework emphasizes the importance of developing the correct coun-

terfactual for any statistical analysis (Holland 1986). That is, we could think of the initiative

process as a treatment that is administered at the state level, and we wish to observe whether

the treatment affects the behavior of those who live in the state such that turnout increases,

or state fiscal policy is changed. More formally, let there be $J + 1$ states that could receive a treatment and let $Y_{it}$ be the potential outcome (i.e. level of voter turnout or level of expenditure, etc.) for state $i$ at time $t$. Each state has two potential outcomes. The first outcome is $Y^T_{it}$ if state $i$ is exposed to an intervention at time $T_0$. The second potential outcome is $Y^C_{it}$ if state $i$ is not exposed to the intervention at time $T_0$. Here, I assume that only the first state of the $J + 1$ states receives the treatment. Under this assumption, $Y^T_{it}$ is the outcome for state $i = 1$ if exposed to the treatment at $T_0$. Let $X_{it}$ be a matrix of pretreatment observed and unobserved characteristics for all $J + 1$ states. If $X_{it}$ is independent of $Y^T_{it}$ and $Y^C_{it}$, then at $T_0$: $Y^T_{it} = Y^C_{it}$. Assuming this is true,

$$\alpha_{it} = Y^T_{it} - Y^C_{it}, \tag{1}$$

and $\alpha_{it}$ is the causal effect of the treatment for state $i$ at time $T_0 + 1$ if the unit is exposed to the treatment. If we assume linearity and additivity, we can also re-write Equation 1 as

$$Y^T_{it} = Y^C_{it} + \alpha_{it} T_i \tag{2}$$

where $T_i$ is an indicator variable that is 1 if state $i$ received the treatment but is zero otherwise. We wish to estimate: $\alpha_{1,T_0+1}, \ldots, \alpha_{n,T_0+t}$.

As is always the case when estimating causal effects, we face a missing data problem. For the treated state, we do observe the outcome before and after the intervention at $T_0$, but we do not observe $Y^C_{it}$ at $T_0 + 1$ for this state since it is a counterfactual: it is how a treated state would behave if it did not receive the treatment. If we could conduct an experiment, we would randomize the application of $T_i$ such that some states receive the treatment and some do not. Randomization of the treatment ensures that the outcomes are independent of the treatment or in formal notation: $\{Y^T_{it}, Y^C_{it}\} \perp T_i$. Under this scenario, the inference is straightforward since the observed and unobserved baseline variables contained in $X_{it}$ for both treatment and control groups are balanced: states in the control group should be no different from states in the treatment group other than differences due to random error. As a consequence, the treatment would be independent of these baseline variables. For example, the average turnout among states that did not receive the treatment would serve as an estimate of $Y^C_{it}$, and we could estimate the treatment effect of direct legislation as:

$$\alpha_{it} = E(Y^T_{it} \mid T_i = 1) - E(Y^C_{it} \mid T_i = 0). \tag{3}$$

Using least squares, we could simply regress the outcome on Ti for an unbiased estimate of

αit. With observational data, however, we cannot randomize the treatment, which produces

a variety of complications. To infer that ballot initiatives cause a change in either state fiscal

policy or voter turnout, we must confront three different challenges. First, we must be careful

to avoid posttreatment comparisons. One flaw with current work is that all the comparisons

across the treated and control units are made posttreatment. The only valid comparison to

be made is one where we compare state outcomes both before and after application of the

treatment. This protects against the possibility that the outcomes differed pretreatment,

which of course would rule out a causal effect. Second, we must attempt to ensure that


$Y^T_{it} = Y^C_{it}$ holds at $T_0$. That is, we must attempt to verify that the counterfactual states are

identical to treated states before treatment. Only then can we hope that some confounder is

not the cause of differences found posttreatment. Finally, we must account for heterogeneous

treatment effects. Undoubtedly, the treatment effect of direct legislation differs across states.

The differing content of initiatives alone ensures that we should expect the treatment effect

to differ across states. Heterogeneous treatment effects render the least squares estimates

of treatment effects inconsistent (Morgan and Winship 2007). In the next section, I outline

strategies for addressing these complications.
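The contrast between randomized and self-selected treatment can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original analysis: the data are simulated, the treatment effect is held constant at 2, a single baseline characteristic drives both the outcome and self-selection, and the names `simulate` and `difference_in_means` are hypothetical.

```python
import random
from statistics import mean


def difference_in_means(outcomes, treated):
    """Equation 3: mean outcome of treated units minus mean outcome of control units."""
    treated_y = [y for y, t in zip(outcomes, treated) if t]
    control_y = [y for y, t in zip(outcomes, treated) if not t]
    return mean(treated_y) - mean(control_y)


def simulate(n_units=20_000, true_effect=2.0, seed=0):
    rng = random.Random(seed)
    # Baseline characteristic that raises the outcome regardless of treatment.
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    y_control = [x + rng.gauss(0.0, 1.0) for x in baseline]
    y_treated = [yc + true_effect for yc in y_control]  # assumed constant effect

    randomized = [rng.random() < 0.5 for _ in range(n_units)]  # coin-flip assignment
    self_selected = [x > 0 for x in baseline]                  # high-baseline units opt in

    def observed(assignment):
        # Only one potential outcome is observed for each unit.
        return [yt if t else yc for yt, yc, t in zip(y_treated, y_control, assignment)]

    return (difference_in_means(observed(randomized), randomized),
            difference_in_means(observed(self_selected), self_selected))


randomized_estimate, confounded_estimate = simulate()
print(f"randomized assignment:    {randomized_estimate:.2f}")
print(f"self-selected assignment: {confounded_estimate:.2f}")
```

With randomized assignment the difference in means should land close to the assumed effect of 2, while the self-selected comparison is inflated by the baseline difference between the groups.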

4 Causal Inference for Observational Data

Causal inference with observational data is always challenging, but when the treatment occurs

at a highly aggregated level additional complications arise. For the problem at hand, the

treatment occurs at the state level. One outcome, state spending and tax revenues, also

occurs at the state level. For the second outcome, voter turnout, however, we can measure

the outcome at either the macro or micro level. When the outcome and covariates are measured

at the micro level, methods based on matching and propensity scores may be used for the

estimation of causal effects with observational data. Matching, however, is implausible with

large units like states. With small units of geography such as precincts or people, we can

reasonably find units that are nearly identical on a variety of observed characteristics. For

states, it is nearly impossible to find matches on many dimensions. Is there really any state

that one could match to California or Florida to make a valid comparison?

Differences in state baselines are a particular concern since we are comparing states with

highly differentiated political cultures. The majority of the states that adopted the initiative

process were Western or Midwestern states with a history of strong progressive or populist

parties. The difficulty is constructing an appropriate control group for these states. Past

studies have used research designs that rely on a basic over time comparison of states with

and without initiatives while including a set of relevant controls. This form of analysis is

problematic since such a simple comparison of turnout or fiscal policy across states may not

only reflect the treatment effect of the initiative process but also a large number of other

possible relevant confounders. Moreover, we should expect treatment heterogeneity, since the


use of direct legislation varies dramatically from state to state. In the sections that follow,

I discuss a method for estimating causal effects with state level data. I also describe more

traditional matching methods since micro level data is available for voter turnout.

4.1 Synthetic Case Control

The basic objective in this study is to make comparisons between states with direct legislation

and states without it. The problem is that a simple comparison is not valid since we

expect that pre-initiative differences between states almost certainly also affected subsequent

outcomes. More precisely, we lack an appropriate counterfactual for the comparison. Panel

data, however, presents us with opportunities for identification of the causal effects not pos-

sible with cross sectional data. One method for estimating treatment effects with panel data

is the differences-in-differences model. The logic behind the differences-in-differences estimator

is based on using fixed effects to make the treatment and control groups as similar as

possible. Consider the following fixed effects model for $Y_{it}$:

$$Y_{it} = \phi T_{it} + \delta_t + \alpha_i + \varepsilon_{it} \tag{4}$$

$$T_{it} = \begin{cases} 1 & \text{if state receives treatment in period } t, \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

The terms $\delta_t$ and $\alpha_i$ denote time and unit specific effects. The unit specific effects can be eliminated through first differencing:

$$\Delta Y_{it} = \phi \Delta T_{it} + (\delta_t - \delta_{t-1}) + \Delta\varepsilon_{it} \tag{6}$$

If we restrict the model to only two time periods and so long as treatment only occurs in the second time period such that $T_{i1} = 0$ for all units in period 1 and $T_{i2} = 1$ for the treated and 0 for the nontreated, then $\Delta T_{it} = T_{i2}$, and we can drop the $t$ subscript from the last equation and estimate

$$\Delta Y_i = \phi T_i + \delta + \nu_i \tag{7}$$

using OLS. Here the treatment effect is

$$\hat{\phi} = \Delta\bar{Y}^{T=1} - \Delta\bar{Y}^{T=0} \tag{8}$$

This is called the differences-in-differences (DID) model since one estimates the over time

change in control and treatment groups and then takes the difference of the two over time

differences. One can show that the DID estimate of the treatment effect can be estimated

with the following OLS regression:

$$Y_{it} = \beta_0 + \beta_1 D_{it} + \beta_2 T_i + \beta_3 (D_{it} \times T_i) + u_i + \gamma_t \tag{9}$$

In the above equation, Dit = 1 in the posttreatment period and Ti = 1 if the unit is in the

treatment group. Therefore Dit × Ti equals 1 for treated units in the posttreatment period,

β3 is the DID estimate of the treatment effect, and ui and γt are fixed effects for states and

years (Cameron and Trivedi 2005). The differences-in-differences model is a significant im-

provement over a simple regression since it allows for the presence of unobserved confounders,

but it assumes that the effects of those confounders are constant in time.
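The two-period logic of Equation 8 can be checked by hand on a toy panel. The following sketch uses made-up turnout percentages for three treated and two control states; the function name `did_estimate` and all numbers are hypothetical and serve only to show how the estimator removes a baseline gap that a posttreatment-only comparison would absorb.

```python
from statistics import mean


def did_estimate(pre_treated, post_treated, pre_control, post_control):
    """Equation 8: average change among treated minus average change among controls."""
    change_treated = mean([post - pre for pre, post in zip(pre_treated, post_treated)])
    change_control = mean([post - pre for pre, post in zip(pre_control, post_control)])
    return change_treated - change_control


# Hypothetical turnout percentages (period 1 = pretreatment, period 2 = posttreatment).
pre_treated, post_treated = [50.0, 54.0, 58.0], [55.0, 58.0, 64.0]
pre_control, post_control = [40.0, 44.0], [43.0, 47.0]

did = did_estimate(pre_treated, post_treated, pre_control, post_control)
# A posttreatment-only comparison mixes the effect with the baseline gap.
naive = mean(post_treated) - mean(post_control)

print(f"DID estimate:           {did:.1f}")
print(f"Posttreatment-only gap: {naive:.1f}")
```

In this constructed example the treated states start roughly ten points higher, so the posttreatment gap is far larger than the change attributable to treatment.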

While the DID model is one option, for most of the analyses, I instead use a method known

as synthetic case control (Abadie and Gardeazabal 2003; Abadie, Diamond, and Hainmueller

2007). Synthetic case control is a generalization of the differences-in-differences estimator

for treatment effects, where we construct a synthetic counterfactual comparison group. The

counterfactual unit is comprised of a weighted combination of states that do not have the

initiative process chosen to closely resemble a particular state with direct legislation. This

weighted average of states is meant to serve as a synthetic state without direct legislation

against which I can compare a state with ballot initiatives. I opt for the method of

synthetic case control for several different reasons. First, synthetic case control is superior

to the differences-in-differences model, since it allows the effects of confounding unobserved

characteristics to vary with time (Abadie, Diamond, and Hainmueller 2007). Moreover,

accounting for heterogeneous treatment effects is easily done with this method. Finally, the

synthetic case control allows for clear observation of pre- and posttreatment behavior.

To estimate treatment effects with synthetic case control, let $J$ be the number of states that do not have direct legislation, and let $\mathbf{W} = (w_1, \ldots, w_J)'$ be a vector of nonnegative weights which sum to one. Each scalar element of $\mathbf{W}$ is the weight for state $j$ in the synthetic control. Different weights in $\mathbf{W}$ produce a different synthetic control, so the weights must be chosen so that the synthetic state most closely resembles the treated state before the adoption of the initiative process. Let $\mathbf{X}_1$ be a $K \times 1$ vector of predictors for either turnout or state fiscal policy before the adoption of the initiative process for the state that adopts direct legislation. The term $\mathbf{X}_0$ is a $K \times J$ matrix of the same predictor variables for the $J$ states that do not have direct legislation. To construct the synthetic control, we minimize $(\mathbf{X}_1 - \mathbf{X}_0\mathbf{W})'\mathbf{V}(\mathbf{X}_1 - \mathbf{X}_0\mathbf{W})$, where $\mathbf{V}$ is a diagonal matrix with nonnegative elements that reflect the relative importance of the different predictors of expenditures or turnout. This minimization is subject to the following constraints: $w_j \geq 0$ $(j = 1, 2, \ldots, J)$ and $w_1 + \cdots + w_J = 1$. These constraints prevent extrapolation beyond the support of the data.

The result is a vector of optimal weights $\mathbf{W}^*$ which defines the combination of non-initiative states which best resembles a particular state with direct legislation before that state adopted the initiative process. Next, I define $\mathbf{Y}_1$ as a $T \times 1$ vector of either turnout or a measure of state fiscal policy for a state with the initiative process, where $T$ represents the number of time periods under observation, and $\mathbf{Y}_0$ is a $T \times J$ matrix of the same outcomes for non-initiative states over the same time period. I approximate the over time path of either turnout or state fiscal policy for the synthetic control state to serve as the counterfactual for comparison to the treated state in question. This counterfactual outcome (turnout level, expenditures, or revenue) for the synthetic state is $\mathbf{Y}_1^* = \mathbf{Y}_0\mathbf{W}^*$. Once the counterfactual is formed, I simply compare $\mathbf{Y}_1$ and $\mathbf{Y}_1^*$. Differences between the two after the enactment of direct legislation are evidence of a treatment effect, while the absence of such differences is evidence against a treatment effect. I make the comparison with a plot of the over time level of the outcome before and after the period in which a state adopted the initiative process. I also plot the outcome for the counterfactual synthetic state. The outcomes for the pretreatment time period should be approximately the same. If adoption of the initiative process changes the level of the outcome, the treated unit should diverge from the synthetic control unit after the initiative process has been adopted. This provides for better counterfactual comparisons than is possible with traditional regression-based research designs. Finally, the method also necessitates analyzing each state individually. While slightly cumbersome, this also allows the effect of direct legislation to vary across states and accounts for heterogeneous treatment effects.
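The weight selection and the counterfactual path described above can be sketched in a few lines. This is a minimal illustration rather than the estimator used in the analysis: the predictor and outcome values are made up, V is fixed at equal weights instead of being chosen from the data, and the constrained minimization is solved with a simple projected gradient routine (the helper names `project_to_simplex` and `synthetic_weights` are hypothetical).

```python
def project_to_simplex(u):
    """Euclidean projection of u onto {w : w_j >= 0, sum(w) = 1}."""
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(sorted(u, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in u]


def synthetic_weights(x1, x0, v, iterations=5000):
    """Minimize (X1 - X0 W)' V (X1 - X0 W) over the simplex by projected gradient descent."""
    n_predictors, n_controls = len(x0), len(x0[0])
    w = [1.0 / n_controls] * n_controls  # start from equal weights
    # Conservative step size: inverse of an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(v[k] * x0[k][j] ** 2
                            for k in range(n_predictors) for j in range(n_controls)))
    for _ in range(iterations):
        residual = [x1[k] - sum(x0[k][j] * w[j] for j in range(n_controls))
                    for k in range(n_predictors)]
        gradient = [-2.0 * sum(v[k] * x0[k][j] * residual[k] for k in range(n_predictors))
                    for j in range(n_controls)]
        w = project_to_simplex([w[j] - step * gradient[j] for j in range(n_controls)])
    return w


# Hypothetical data: K = 2 pretreatment predictors, J = 3 control states.
x1 = [0.3, 0.7]                              # treated state (K x 1)
x0 = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]      # control states (K x J)
v = [1.0, 1.0]                               # assumed equal predictor importance

weights = synthetic_weights(x1, x0, v)

# Outcome paths over T = 3 periods; only the last period is posttreatment here.
y0 = [[40.0, 50.0, 45.0], [42.0, 51.0, 47.0], [41.0, 53.0, 46.0]]  # T x J
y1 = [47.0, 48.3, 52.0]                                            # treated state
synthetic_path = [sum(row[j] * weights[j] for j in range(len(weights))) for row in y0]
gaps = [observed - synthetic for observed, synthetic in zip(y1, synthetic_path)]

print("weights:", [round(w, 3) for w in weights])
print("gaps:   ", [round(g, 2) for g in gaps])
```

In this constructed example the treated state's predictors are an exact convex combination of the first two control states, so the pretreatment gaps are essentially zero and only the final period diverges.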


That said, the analysis that follows is not a panacea. While synthetic case control allows one to make pre- and posttreatment comparisons and account for heterogeneity, it is still an open question whether we can find appropriate counterfactuals. Here, each treated state is compared to a weighted average of other states, and it may be impossible to find a synthetic control that closely matches the pretreatment baseline. However, the method essentially chooses the set of states most like the treated state, and this is a great improvement over the current practice of simply using all states without initiatives as the comparison group. Moreover, synthetic case control retains the fixed effects of DID models, thus providing some leverage over unobserved confounders.

4.1.1 Inference via Placebo Tests

Standard large sample inferential techniques are poorly suited to analyses of aggregate data

since the data are not a random sample drawn from any known population. Moreover, the

sample sizes used here are small since we are confined to subsets of 50 states. For these

reasons, inference for the synthetic control method proceeds via a series of placebo tests akin

to permutation or randomization tests (Abadie and Gardeazabal 2003; Abadie, Diamond,

and Hainmueller 2007). For the placebo test, one of the states in the control group from

which the synthetic case is constructed is used as the treated unit. I then apply the synthetic

control method to this unit repeating this process for every unit in the control group. This

gives me a set of placebo estimates, that is, estimates that compare controls to controls and thus form a set of

null effects. I then compare all the placebo estimates to the estimate for the true treated unit,

which allows us to observe whether the outcome for the treated unit is large or small relative

to the estimates for all of the states in the control group. The result is an approximately

exact inference regardless of the number of control units or the number of time periods.2

2 The inference is exact in that we are comparing the treated unit to all control units. The inference is not exact in the sense that randomization tests are, since randomization has not occurred to ensure exchangeability of the units.
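One way to summarize the placebo comparison numerically is to ask what share of all units show a gap at least as large as the treated unit's. The sketch below assumes that each unit's posttreatment gap has already been reduced to a single number; the gap values and the function name `placebo_rank_p_value` are hypothetical, and this rank-based summary is an illustration of the idea rather than the procedure reported in the paper.

```python
def placebo_rank_p_value(treated_gap, placebo_gaps):
    """Share of all units whose absolute gap is at least as large as the treated unit's."""
    all_gaps = [treated_gap] + list(placebo_gaps)
    as_extreme = sum(1 for gap in all_gaps if abs(gap) >= abs(treated_gap))
    return as_extreme / len(all_gaps)


# Hypothetical gaps: treated-minus-synthetic outcome for the treated state,
# and the same quantity when each control state is treated as a placebo.
treated_gap = 4.0
placebo_gaps = [0.5, -1.2, 2.0, -0.3, 5.1, 0.9, -0.8, 1.5, 0.1]

p_value = placebo_rank_p_value(treated_gap, placebo_gaps)
print(f"share of units with a gap at least as large: {p_value:.2f}")
```

A small share suggests the treated unit's gap is unusual relative to the placebo distribution; with only a handful of control states the attainable values are coarse.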


4.2 Matching

For the analysis of voter turnout, I am also able to conduct analyses with individual level

data as well as aggregate data. To analyze the individual level data, I use matching methods.

With matching the goal is to form essentially equivalent treatment and control groups based

on observables. For this analysis, I am unable to conduct the analysis before and after the

adoption of the initiative process, but I can better identify causal effects for specific initia-

tives. A matching analysis proceeds as follows. First, define X as a matrix of covariates

that are measured before an election with a particular initiative. I then attempt to form

two essentially equivalent groups based on X: one group from states without initiatives,

the control group, and another group from states with initiatives, the treated group. With

matching we assume that observables account for the selection process into treatment and

control and any remaining error in assignment is then unrelated to the assignment process.

Assessment of this assumption can be done through sensitivity tests (Rosenbaum 2002).3 Af-

ter the matched groups are formed, one must evaluate the matching procedure for balance to

see if the distributions of the control and treatment groups are identical across the covariates

in X. Here, the analyst examines group differences in means as well as higher moments. See

Sekhon (2008) for an overview on matching methods.
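The balance checks described above can be illustrated with two common summaries: a standardized difference in means and a variance ratio. This is a sketch under assumptions; the function names and the toy age values are hypothetical, and the balance tests reported in the appendix may use different statistics.

```python
import math
import statistics


def standardized_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    mean_gap = statistics.mean(treated_values) - statistics.mean(control_values)
    pooled_sd = math.sqrt(
        (statistics.variance(treated_values) + statistics.variance(control_values)) / 2
    )
    return mean_gap / pooled_sd


def variance_ratio(treated_values, control_values):
    """Ratio of sample variances; values near 1 suggest similar spread."""
    return statistics.variance(treated_values) / statistics.variance(control_values)


# Hypothetical ages in a matched treated group and a matched control group.
treated_age = [34.0, 45.0, 52.0, 61.0]
control_age = [33.0, 47.0, 50.0, 63.0]
print(standardized_difference(treated_age, control_age))
print(variance_ratio(treated_age, control_age))
```

A standardized difference near zero and a variance ratio near one on every covariate in X are consistent with balance, although neither summary rules out differences in other features of the distributions.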

Once we complete the balance diagnostics, we can estimate several different quantities.

The first quantity of interest is the average treatment effect (ATE), which is simply the

average difference across the control and treatment groups for the outcome Yi. While we are

often interested in the ATE, the average treatment effect on the treated (ATT) is typically

of greater interest to the analyst. The ATT gauges the size of the treatment effect for those

individuals who are either assigned, or who would assign themselves, to the treatment. More

formally, if we condition on observed covariates Xi and achieve balance, following Rubin

(1974, 1978) the ATT is estimated as:

$$E[\alpha_i \mid T_i = 1] = E(Y^{T}_{it} \mid T_i = 1) - E(Y^{C}_{it} \mid T_i = 1).$$
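To make the ATT concrete, the sketch below pairs each treated unit with its closest control on a single covariate, with replacement, and averages the outcome differences. It is a deliberately simplified stand-in: the analysis in this paper matches on several covariates with a genetic matching algorithm, and the function name and toy records here are hypothetical.

```python
def att_nearest_neighbor(treated, controls):
    """Estimate the ATT by one-to-one nearest-neighbor matching with replacement.

    treated, controls -- lists of (covariate, outcome) pairs
    Each treated unit is compared with the control whose covariate is closest;
    the same control may serve as the match for several treated units.
    """
    differences = []
    for x_treated, y_treated in treated:
        _, y_match = min(controls, key=lambda unit: abs(unit[0] - x_treated))
        differences.append(y_treated - y_match)
    return sum(differences) / len(differences)


# Hypothetical records: (years of education, voted = 1 or 0).
treated_units = [(12.0, 1), (16.0, 1)]
control_units = [(12.5, 0), (15.5, 1), (20.0, 0)]
print(att_nearest_neighbor(treated_units, control_units))
```

Because the average is taken over treated units only, the estimate speaks to the effect for the kind of units that received the treatment, which is the sense of conditioning on T_i = 1 in the expression above.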

3 Another assumption is the Stable Unit Treatment Value assumption, typically referred to as SUTVA (Holland 1986; Rubin 1978). For SUTVA to hold, the treatment for any unit must be independent of potential outcomes for all other units, and the treatment must be defined identically for all units. The SUTVA assumption is required in experimental settings as well as when using matching.


With these various approaches, I am able to provide multiple tests for spillover effects. Moreover,

each of these tests better accounts for aspects of causality that must be true before we can

conclude that spillover effects exist.

5 Voter Turnout

The analysis of voter turnout proceeds in two stages. First, I start with an analysis of

aggregate level data, which allows for comparisons of states before and after the adoption of

the direct legislation process. Next, I conduct a micro level analysis that allows me to avoid

the problem of ecological inference and to assess the effects of specific initiatives. Before

starting either analysis, I estimate the effects of initiatives on turnout using a set of naive

models. These naive models do not take into account the counterfactual research design

described earlier and as such all estimates are based on posttreatment comparisons. This

analysis closely parallels that of Smith and Tolbert (2004) and Tolbert and Smith (2005).

Table 1 contains the results from three models of voter turnout for the period from 1970 to

2006, where voter turnout is modeled as a function of state education levels, age, race, income,

closing date before the election, indicators for a midterm or gubernatorial election, and the

Democratic share of the vote in the prior election. I also include an indicator variable for

whether the state has the initiative process. The model in column 1 reports panel-corrected

standard errors with no other corrections, and the models in columns 2 and 3 include year

fixed effects and random effects respectively. Regardless of which specification I use, turnout

is nearly three percentage points higher in states with direct legislation. It is difficult to

attribute this difference to initiatives, however, since these estimates are all posttreatment.

It is quite possible that voter turnout was higher in these states before enactment of the

initiative process. With the current research design, however, it is impossible to know. I now

turn to my proposed research design.

5.1 Aggregate Analysis

The synthetic case control method requires several periods of pretreatment data. While

data for two or three pretreatment periods are adequate, it is preferable to have data for at

least five to ten pretreatment time periods (Abadie, Diamond, and Hainmueller 2007). This

data requirement precludes analyzing the outcomes for three states with direct legislation:


Table 1: Naive Estimates of Initiatives Effect on Voter Turnout 1970-2006

                                    Turnout (a)   Turnout (b)   Turnout (c)
Initiative State                        2.69          2.79          2.96
                                       (0.54)        (0.49)        (1.04)
Percent With High School Degree         0.60          0.78          0.50
                                       (0.08)        (0.08)        (0.07)
Percent With College Degree             0.31         −0.06          0.23
                                       (0.11)        (0.06)        (0.16)
Median Age                              0.15          0.55          0.05
                                       (0.15)        (0.11)        (0.17)
Percent Black                           0.02          0.06          0.09
                                       (0.04)        (0.03)        (0.02)
Median Income                          −0.63         −0.23         −0.51
                                       (0.08)        (0.06)        (0.08)
Closing Date                           −0.21         −0.19         −0.18
                                       (0.03)        (0.03)        (0.03)
Gubernatorial Election                  2.12          1.80          2.23
                                       (0.58)        (0.61)        (0.49)
Midterm Election                      −17.43        −39.19        −17.18
                                       (1.31)        (2.05)        (0.47)
Presidential Vote                       0.13          0.15          0.04
                                       (0.06)        (0.05)        (0.04)
Constant                               17.02         −1.93         28.12
                                       (5.60)        (4.78)        (5.28)
N                                        733           733           733
R2                                       .73           .73           .73

(a) Panel Corrected Standard Errors
(b) Panel Corrected Standard Errors with Year Fixed Effects (Omitted).
(c) Random Effects


Alaska, Oklahoma, and Arizona. All three states included the initiative process in their state

constitutions at their inception, therefore no data exists for the pretreatment periods. For

the remaining 21 states, all but four adopted direct legislation between 1898 and 1918. The

four states that did not were Florida (1968), Wyoming (1968), Illinois (1970), and Mississippi

(1992). Among these four states only Florida and Wyoming use the initiative process with

any frequency. Mississippi has had only two initiatives placed on the ballot since adoption,

and Illinois has had only a single initiative make it onto the ballot. For purposes of analysis,

we exclude Mississippi and Illinois as both treated and control units.

Data availability necessitates two separate analyses, one for each time period. While data

is available for both time periods, more data is available for the later time period, and

therefore, I begin with the analysis of Florida and Wyoming. First, I define the pretreatment

period for each state. While Florida and Wyoming first approved the initiative process in

1968, the first initiative was not placed on the ballot until 1976 in Florida and 1992 in

Wyoming. Initiatives have appeared on the ballot regularly since then in both states. For

Florida, I define the pretreatment time period as 1970 to 1976 and for Wyoming from 1980

to 1992.4 To construct the synthetic case control unit, I use the following set of covariates to

construct weights: Democratic presidential vote share, pretreatment levels of voter turnout,

the time between the closing date for registration and election day, the percentage with a

high school diploma, the percentage with a college degree, median age, median income, and

the percentage of the state that is African-American. The synthetic control unit is then

weighted to match the states of Florida and Wyoming along these dimensions.
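The weighting step can be sketched as a small optimization problem: find nonnegative donor weights that sum to one and bring the weighted donor covariates as close as possible to those of the treated state. The code below is an illustration under assumptions, not the routine used for the estimates in this paper. It treats every covariate as equally important (the published synthetic control method also chooses covariate weights), it assumes the covariates are already on comparable scales, and it solves the problem with a simple Frank-Wolfe iteration chosen here only because it needs no external libraries. All names and numbers are hypothetical.

```python
def synthetic_weights(treated, donors, iterations=2000):
    """Donor weights (nonnegative, summing to 1) that minimize the squared
    distance between the treated unit's covariates and the weighted donors.

    treated -- list of k covariate values for the treated unit
    donors  -- list of J lists, each holding the same k covariates for one donor
    """
    n_donors, n_covs = len(donors), len(treated)
    weights = [1.0 / n_donors] * n_donors
    for _ in range(iterations):
        synth = [sum(weights[j] * donors[j][i] for j in range(n_donors))
                 for i in range(n_covs)]
        resid = [synth[i] - treated[i] for i in range(n_covs)]
        # Gradient of the squared distance with respect to each donor weight.
        grad = [2.0 * sum(donors[j][i] * resid[i] for i in range(n_covs))
                for j in range(n_donors)]
        best = min(range(n_donors), key=lambda j: grad[j])
        # Move toward the single donor with the steepest descent direction.
        direction = [donors[best][i] - synth[i] for i in range(n_covs)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact line search, clipped so the weights stay on the simplex.
        step = -sum(resid[i] * direction[i] for i in range(n_covs)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Hypothetical standardized covariates for one treated state and two donors.
print(synthetic_weights([1.5, 2.5], [[1.0, 2.0], [3.0, 4.0]]))
```

When the treated state's covariates lie outside the range spanned by the donors, no set of weights reproduces them exactly, which corresponds to the poor pretreatment fit discussed for some states.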

The results from the analysis are in Figure 1. We see that for both states the level of

turnout does appear to be higher in the post-initiative period than in the period before the

use of initiatives. The difference between the treated unit and the control is more defined

for Wyoming than for Florida, as the Wyoming level of turnout clearly exceeds those of

the control case in all years. For Florida, the increase only occurs in some election years.

The pretreatment fit for Florida and its synthetic case control is poor, but is reasonable for

Wyoming. The figure suggests that no set of states makes a good counterfactual for Florida.

Figure 2 compares the levels of turnout in Florida and Wyoming to a series of placebo tests,

4 The reason I do not extend the pretreatment period for Florida farther back in time is that most of the data are based on census figures and thus would not be available other than in 1960.


where each control unit served as the treated unit. Again in both instances the level of

voter turnout is higher for these two states than for most of the placebo estimates. In both

instances, there appears to be some evidence that ballot initiatives increased turnout levels.

The limitation, here, is that I can only make comparisons for two of the states with direct

legislation. In the next analysis, I examine the results for an additional 16 states.

Figure 1: Effects of Ballot Initiatives on Voter Turnout, Florida and Wyoming

[Two panels: Florida vs. Synthetic Control (x-axis 1970-2000) and Wyoming vs. Synthetic Control (x-axis 1980-2000).]

Most states adopted the initiative process between 1898 and 1918. In that period nearly

20 states allowed for the provision of direct legislation. I repeat the synthetic case control

analysis for this time period, to further test whether the presence of initiatives increased voter

turnout. For this analysis, both Arizona and Oklahoma must be excluded since the initiative

process was included in their founding state constitutions, so no pretreatment period exists.

This leaves the states of Arkansas, California, Colorado, Idaho, Maine, Massachusetts, Michi-

gan, Missouri, Montana, Nebraska, Nevada, North Dakota, Ohio, Oregon, South Dakota, and

Washington for analysis. For these states, I used the date of the first initiative on the ballot

as the period in which the treatment occurred as opposed to the date the initiative process

was enacted. For most states, the first initiative appeared within two years, but for several

states a much longer period passed before any initiatives made it onto the ballot.

Figure 2: Effects of Ballot Initiatives on Voter Turnout, Placebo Tests for Florida and Wyoming

[Two panels; the x-axes span 1970-2000 and 1980-2000.]

For the historical analysis, I construct the synthetic control unit with weights constructed

from the following measures: the percentage African-American, the percentage foreign-born,

turnout levels in the pretreatment period, the Democratic share of the two party vote, and

the amount of the poll tax in 1900 dollars. The pretreatment period is generally from 1888

until the year a state adopted the initiative process, but for a few states the pretreatment

period does not begin until 1892 due to differing years of statehood. The results for this

analysis are in Figures 3-6.

In the analysis with more recent data, initiatives appeared to increase turnout, but the

results here suggest that the initiative process, in general, did not increase turnout. For

example, in Figure 3 the turnout rate is lower in both Arkansas and California than for

the control in the posttreatment era. In Colorado turnout increased marginally, with larger

increases in Nebraska and Maine. For Nebraska, however, the pretreatment fit is poor. Of

the 16 states, only 4 appear to have even slightly higher levels of turnout after the adoption

of the initiative process. Of those states, only Nebraska and Maine have much higher levels of


turnout compared to the control case for the posttreatment period. For several other states,

turnout in the post-initiative period actually is lower than turnout in the control case. The

evidence from this analysis indicates that initiatives generally did little to boost turnout.

In some states it appears that turnout increased after the introduction of the initiative

process, but for many states little happened. This analysis underscores the need to take

heterogeneity into account, since we find positive, negative, and null effects depending on

the state. Moreover, unlike in past work we can see that in some states there were obvious

differences in turnout during the pretreatment period. This demonstrates how important it is

to account for causal heterogeneity and to carefully construct an appropriate counterfactual.

The clear differences that occurred in the naive analysis are much less obvious. I now turn

to an individual level of analysis to better assess how initiatives may change turnout.

5.2 Micro Analysis

One deficiency in the analysis thus far is that while the treatment in the form of a ballot

initiative is at the state level, the outcome occurs at the individual level. For turnout, it would

be preferable to observe whether the treatment has an effect at the micro-level. Differences

in voter turnout at the state level are not generalizable to the behavior of actual voters

without running afoul of the ecological inference fallacy. That is, we must be careful not

to assume that individual members of a group have the same average characteristics as the

group at large. Inferences about individuals based on aggregates can be incorrect

(Achen and Shively 1995; Goodman 1959). A micro-level analysis also allows for a better

understanding of whether specific initiatives are more likely to spur turnout than others.

For a micro-level analysis, one could use survey data and compare turnout rates across

states with and without initiatives, but large differences exist across state turnout levels

that we might attribute to a number of factors other than the presence of ballot initiatives.

A better strategy is to exploit some component of the study design to make the correct

counterfactual comparison. To identify the appropriate counterfactual of interest, I use

two methods. The first method is to

find metropolitan areas that straddle a state border. While many metro areas satisfy this

criterion, I identified only two metropolitan areas which straddle a state border where one

state has the initiative process and the other does not. The first of these is the Cincinnati


Figure 3: Effects of Ballot Initiatives on Voter Turnout 1892-1940

[Four panels, each a state vs. its Synthetic Control: Arkansas, California, Colorado (x-axis 1890-1940) and Idaho (x-axis 1920-1950).]

Figure 4: Effects of Ballot Initiatives on Voter Turnout 1892-1940, Contd.

[Four panels, each a state vs. its Synthetic Control: Maine, Massachusetts, Michigan, Missouri (x-axis 1890-1940).]

Figure 5: Effects of Ballot Initiatives on Voter Turnout 1892-1940, Contd.

[Four panels, each a state vs. its Synthetic Control: Montana, Nebraska, Nevada (x-axis 1890-1940) and North Dakota (x-axis 1900-1940).]

Figure 6: Effects of Ballot Initiatives on Voter Turnout 1892-1940

[Four panels, each a state vs. its Synthetic Control: Ohio, Oregon, Washington (x-axis 1890-1940) and South Dakota (x-axis 1900-1940).]


metro area. While much of the Cincinnati metro area is in Ohio, which has initiatives, a

significant part of the metro area subsumes parts of Kentucky and Indiana, which do not. In

fact, the Kentucky border lies immediately south of downtown Cincinnati and the Cincinnati

airport is actually in Kentucky. The second metro area that satisfies the criterion is Kansas

City where the city is roughly split between Missouri and Kansas. Citizens in Missouri

regularly use the initiative process, while Kansas does not have an initiative provision. For

both metro areas, we may rightly suspect that there are differences across the areas of the

city in each state. However, in both cases the metro areas are part of a single media market.

Moreover, matching will ensure that the individual citizens are comparable. In both metro

areas, while we may assume there are differences across the two states, the areas are better

counterfactuals for each other than comparisons across states. Table 2 contains a brief

description of the initiatives that were on the ballot in Missouri and Ohio between 1994 and 2006.



Table 2: Ballot Initiatives in Missouri and Ohio

Year   Missouri                                   Ohio
1994   Campaign Finance Reform                    Sales Tax Reform
       Riverboat Gambling
       Tax Increase
1996   Sales Tax                                  Riverboat Gambling
       Informed Voter Law
       Increase Minimum Wage
1998   Expanding Gambling                         Ban Dove Hunting
       Animal Rights
       Billboard Ban
       Campaign Spending Limits
2000   None                                       Conservation Bond
2002   Collective Bargaining For Public Sector    Change Drug Crime Penalties
       Cigarette Tax Increase
2004   Fuel Taxes                                 Gay Marriage Ban
2006   Increase Minimum Wage                      Increase Minimum Wage
       Allow Stem Cell Research                   Slot Machines
       Increase Tobacco Tax                       Ban Smoking I
       Veteran Property Tax Exemption             Ban Smoking II
       Strip Pension For Felons

Certain years make for better comparisons than others. In 1996 in Ohio, for example,

an initiative allowing riverboat gambling would have disproportionately affected Cincinnati,

since it is the largest city on the Ohio River. The 2002 Ohio initiative to reduce penalties for

certain drug crimes was highly controversial and narrowly defeated. Finally, the 2004 gay

marriage ban initiative in Ohio has widely been thought to have helped elect George W. Bush

in the state. In Missouri, the 2006 stem cell research initiative received national attention

and narrowly passed. Clearly, several initiatives involved high profile political issues that

might have produced higher turnout. This is not always true. For example, it would be

unsurprising if we find no differences between respondents in Ohio and Kentucky in 1998

when a ban on dove hunting was the only initiative on the ballot.


For micro level data, I use the Current Population Survey (CPS) from November of

election years, which includes a question on voter turnout and a number of relevant

items on characteristics known to be related to turnout such as age, level of education and

income. To improve the quality of the counterfactual, I matched respondents from the

states without initiatives to respondents in the state with initiatives on level of education,

sex, race, weekly earnings, whether the respondent belonged to a union, age, and length of

residence in the metro area. There are other factors that might influence turnout that I

cannot control. For example, Ohio is a perennial battleground state, while Kentucky is not.

These differences should be fairly small, however, since the media market will be the same for

both areas, and respondents will be exposed to the same campaign messages despite living

in a non-battleground state. As such, by limiting the analyses to these metro areas and

then matching on relevant respondent characteristics, the counterfactual of interest should

be appropriate. To match respondents, I used a genetic matching algorithm (Sekhon and

Diamond 2005; Sekhon 2007). For Missouri the respondents were evenly distributed across

the two states, so I performed one-to-one matching with replacement. Tables 7 and 8 in the

appendix report results from a series of balance tests across all election years for the Missouri data.


For the Ohio analysis, the procedure was slightly different. In the CPS data, a much larger

number of respondents lived in Ohio than in Kentucky or Indiana. This made it difficult to

find suitable matches for the Ohio residents. To solve this problem, I instead estimated

the average treatment effect for the control, which allowed me to use the greater number of

Ohio residents to find better matches for the more limited number of Kentucky and Indiana

residents in the sample. With the Ohio data, I matched with replacement but matched 3

treated units to each control unit. Tables 9 and 10 in the appendix report results from the

balance tests for the Ohio data. To test whether levels of voter turnout were different across

the states, I calculated the observed proportion voting for each state in the matched data,

which are reported in Table 3. To determine whether the results were statistically significant,

I used two different methods. First, I used the Abadie-Imbens standard error to perform

the usual difference in proportions test. I also calculated the standard Pearson χ2 test for the

two-by-two table of outcomes. Both methods produced identical inferences. Later, I report


results from McNemar’s test in a sensitivity analysis. I next discuss the results from the

matching analysis.
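The two tests named above are closely related, which the following sketch illustrates for a two-by-two table of turnout by state. One caveat is essential: the z statistic below uses the textbook pooled standard error, which I substitute here for illustration, whereas the analysis in this paper uses the Abadie-Imbens standard error that accounts for the matching step. The function names and cell counts are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table
    [[a, b], [c, d]], e.g. rows = state, columns = voted / did not vote."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value


def two_proportion_z(a, b, c, d):
    """Difference-in-proportions z statistic with a pooled standard error
    and its two-sided p-value under the normal approximation."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts: 20 of 30 voted in one state, 10 of 30 in the other.
chi2, p_chi2 = pearson_chi2_2x2(20, 10, 10, 20)
z, p_z = two_proportion_z(20, 10, 10, 20)
print(chi2, z ** 2)  # the chi-square statistic equals the squared z statistic
print(p_chi2, p_z)
```

With the pooled standard error the two procedures are algebraically equivalent for a two-by-two table; with a matching-adjusted standard error they need not agree exactly, so identical inferences from both are informative.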

We would expect that voter turnout should be higher in Missouri and Ohio. Yet this

result occurs at a statistically significant level for only three elections. Turnout was higher

in Missouri in 1994 and 2000, and it was higher in Ohio in 1994. The only other statistically

significant difference that I find was in 2006, when turnout in Kentucky and Indiana exceeded

that in Ohio. The general pattern in the data does little to provide support for the theory that

initiatives increase turnout. For example, in 2006 Ohio had several highly salient initiatives

on the ballot including an increase in the minimum wage, as well as proposals to ban smoking

in bars and restaurants and to allow slot machines, but I find that turnout was substantially

higher in Kentucky and Indiana than in Ohio that year. It also appears that the gay marriage

ban initiative in Ohio did little to create a difference between voting rates across the two

states in 2004. This is a surprising finding given that this initiative was widely credited with

helping George W. Bush defeat John Kerry through greater turnout. As is often the case,

accounting for differences across treatment and control groups leads to surprising differences.

In Missouri, there is also limited evidence that turnout may be higher due to initiatives. In

1994, turnout in Missouri was much higher than in Kansas. With three initiatives on the

ballot that year, one cannot infer which one may have spurred turnout. Interestingly, I find

a significant difference in 2000 as well. In that year, however, there were no initiatives on

the ballot in Missouri, which makes it difficult to attribute the difference to the presence of

an initiative. Out of seven election cycles, I find only two significant differences in turnout

that might be attributable to ballot initiatives.

A concern with adjustments via matching is that we might have failed to match on some

relevant covariate that is unobserved. A sensitivity analysis is a specific statement about the

magnitude of hidden bias that would need to be present to explain the associations actu-

ally observed (Rosenbaum 2005). For matching analyses, we can make a specific statement

about whether an unobserved confounder can explain the statistically significant differences

we observed in Table 3. In a sensitivity analysis for matching, the analyst manipulates the

Γ parameter which measures the degree of departure from random assignment of treatment.

Two subjects with the same observed characteristics may differ in the odds of receiving the


Table 3: The Effect of Ballot Initiatives On Voter Turnout: A Matched Analysis

Year   Kansas   Missouri   Kentucky & Indiana   Ohio
1994    36.7      56.1           40.5           51.1
1996    64.5      65.4           50.0           57.2
1998    54.1      50.5           47.6           46.9
2000    73.6      84.2           71.7           65.7
2002    55.4      60.4           53.2           49.5
2004    78.4      75.2           83.2           83.5
2006    57.2      58.6           66.7           53.3

Note: Cell entries are estimated percentages of voter turnout after matching. For cell entries in bold the difference in proportions is statistically significant.

treatment by at most a factor of Γ. In a randomized experiment, randomization of the

treatment ensures that Γ = 1. In an observational study, if Γ = 2, and two subjects are

identical on matched covariates, then one might be twice as likely as the other to receive the

treatment because they differ in terms of an unobserved covariate (Rosenbaum 2005). While

the true value of Γ is unknown, we can try several values of Γ and see if the conclusions of

the study change. Specifically, I calculate an upper bound on the p-value for each Γ value. If

this upper bound on the p-value exceeds the conventional 0.05 threshold, then we conclude

that a hidden confounder of that magnitude would explain the observed association. If the

study conclusions hold for higher Γ values this indicates that the study is fairly robust to

the presence of a hidden confounder. In Table 4, I present results from a sensitivity analysis

for the two years where we see significantly higher turnout levels in the treated states. The

analysis is based on Rosenbaum’s sensitivity test using McNemar’s statistic for paired data

(Rosenbaum 2002). In both cases, the results are fairly sensitive to the presence of hidden

bias. I find that for two individuals that are identical on observed covariates, our conclusions

change if one is 1.2 times as likely to receive the treatment due to some unobserved covariate.

That is, if Γ = 1.20, then the upper bound on the p-value exceeds the usual 0.05 threshold. The result for

Ohio is particularly fragile, since the effect is no longer significant when Γ is 1.10. So

while I find some limited evidence that turnout was higher for some initiatives, the results

are sensitive to hidden bias due to an unmeasured confounder. Next, I analyze the effect of


ballot initiatives on state fiscal policy.

Table 4: Sensitivity Analysis for Matched Estimates

Missouri 1994
Γ       Minimum p-value   Maximum p-value
1            0.010             0.010
1.10         0.004             0.047
1.20         0.000             0.133
1.50         0.000             0.483

Ohio 1994
Γ       Minimum p-value   Maximum p-value
1            0.043             0.043
1.10         0.006             0.167
1.20         0.000             0.388
1.50         0.000             0.927
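The logic behind the minimum and maximum p-values can be sketched for the simplest case of matched pairs with a binary outcome. Among discordant pairs (pairs in which exactly one member voted), hidden bias of magnitude Γ bounds the probability that the voter is the treated member between 1/(1+Γ) and Γ/(1+Γ), and the two bounds yield two binomial tail probabilities. This is my reading of the standard bound in Rosenbaum (2002) applied to strict pairs; the matched samples in this paper are not all one-to-one pairs, and the function names and counts below are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def mcnemar_sensitivity_bounds(discordant_pairs, treated_voted, gamma):
    """Lower and upper bounds on the one-sided p-value of McNemar's test.

    discordant_pairs -- number of matched pairs in which exactly one member voted
    treated_voted    -- number of those pairs in which the treated member voted
    gamma            -- assumed bound on the odds ratio of treatment assignment
    """
    p_low = 1.0 / (1.0 + gamma)
    p_high = gamma / (1.0 + gamma)
    return (
        binomial_upper_tail(discordant_pairs, treated_voted, p_low),
        binomial_upper_tail(discordant_pairs, treated_voted, p_high),
    )


# Hypothetical data: 60 discordant pairs, the treated member voted in 40.
for gamma in (1.0, 1.1, 1.2, 1.5):
    low, high = mcnemar_sensitivity_bounds(60, 40, gamma)
    print(gamma, round(low, 3), round(high, 3))
```

At Γ = 1 the two bounds coincide with the usual McNemar p-value, and the interval widens as Γ grows, which is the pattern visible in Table 4.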

6 State Finances

I now estimate the causal effect of ballot initiatives on state finances. A bevy of statis-

tical analyses have found an association between state revenues and expenditures and the

presence of the initiative process showing that both tend to be lower in states with direct

democracy (Matsusaka 1992, 1995, 2000, 2004). As I have argued previously, these estimates

are contaminated by posttreatment bias. While data on state revenues and expenditures

are readily available for the post-World War II era, these data are measured infrequently

between 1898 and 1920, when most states were enacting the initiative process. Due to these

data limitations, I conduct two separate analyses: one for the post-war period with synthetic

case control, and one with the historical data with a set of differences-in-differences models.

I start with the post-war analysis by presenting a set of naive estimates in Table 5, which

contains two panel data models of state revenues and expenditures for the period from 1970

to 2000. I model revenues and expenditures as a function of state income, federal aid, pop-

ulation, the percentage that live in an urban area, population growth, and indicators for

Southern, Western, and initiative states, which is the same specification used by Matsusaka

(2004). In both instances, we see that revenues and expenditures are lower in states with

direct democracy. Again, these estimates do not distinguish between pre- and posttreatment

time periods, and as such they are suspect.


Table 5: Naive Estimates of Initiatives Effect on Revenues and Expenditures 1970-2000

                          Revenues    Expenditure
Initiative State          −133.38      −147.26
                          (76.32)      (70.86)
Income                       0.11         0.11
                           (0.01)       (0.01)
Federal Aid                  1.17         2.08
                           (0.29)       (0.32)
Log Population              48.12        39.60
                          (73.75)      (71.42)
Metro Population (%)         0.55         1.87
                           (2.21)       (1.97)
Population Growth            4.25        10.54
                           (5.90)       (6.10)
Southern State            −214.56      −212.35
                          (99.27)     (101.31)
Western State              182.89       167.29
                          (91.25)      (88.53)
N                            1488         1488
R2                            .99          .99

Standard Errors Adjusted for Clustering
Models Include Year Fixed Effects (Omitted).


The post-war analysis is, again, limited to Florida and Wyoming, the only two states

to enact and use the initiative process since 1918. I again exclude Illinois and Mississippi

given their infrequent use of direct legislation. To construct the synthetic control unit, I use

measures of Federal grants-in-aid, state income, population, percentage of urban population,

the average NOMINATE score for the state’s two senators, and a measure of citizen ideology

from Berry et al. (1998). I condition on the same set of measures that Matsusaka (2004)

employed in the specification of his panel analysis and that I used in Table 5. In the analysis

of both revenues and expenditures, the variables are in real dollars per capita. The results

are presented in Figures 7 and 8.

Figure 7: Effects of Ballot Initiatives on State Expenditures, Florida and Wyoming

[Two panels: Florida vs. Synthetic Control (x-axis 1970-2000) and Wyoming vs. Synthetic Control (x-axis 1980-2000).]

Figure 7 contains the results for expenditures. For Florida, we observe that while expen-

ditures were lower in the posttreatment period than the synthetic control unit, they were

also lower in the pretreatment period. Therefore, while expenditures are lower it is difficult

to ascribe that difference to a ballot initiative treatment effect since Florida appears to have

had lower expenditures compared to other states before it had the initiative process. For

Wyoming, expenditures are also lower in the posttreatment period but again the change occurs

in the pretreatment period.

Figure 8: Effects of Ballot Initiatives on State Revenues, Florida and Wyoming

[Two panels: Florida vs. Synthetic Control (x-axis 1970-2000) and Wyoming vs. Synthetic Control (x-axis 1980-2000).]

Figure 9: Placebo Test for Effects of Ballot Initiatives on State Expenditures, Florida and Wyoming

[Two panels; x-axes: Year (1970-2006) and Year (1980-2006).]

Figure 10: Placebo Test for Effects of Ballot Initiatives on State Revenues, Florida and Wyoming

[Two panels; x-axes: Year (1970-2006) and Year (1980-2006).]

Moreover, the fit in the pretreatment period for Wyoming is

poor; thus there does not appear to be any weighted average of states that closely approximates

Wyoming. We observe the same pattern for revenues in Figure 8. While there are differences

between Florida and Wyoming and the synthetic case control units, these differences occur

during the pretreatment period. In both instances, we observe lower levels of revenues

and spending in Florida and Wyoming, but the timing of the differences makes it unlikely

that these differences are attributable to initiatives since these differences existed before direct

legislation. I next turn to inference for these two analyses.

Figures 9 and 10 contain plots of the estimates for Florida and Wyoming along with

the placebo estimates, estimates when control states are used as treated states. For Florida,

we observe the same pattern as before: revenues and expenditures tend to be lower than

the placebo estimates in both the pre- and posttreatment periods. Therefore, we cannot

ascribe the difference in Florida to a ballot initiative treatment effect. The same is true for

Wyoming on the revenue side, but for expenditures Wyoming tends to have a higher level of

spending than the placebo estimates. Regardless, the timing makes it difficult to attribute


these differences to a ballot initiative treatment effect.

The evidence from this analysis directly contradicts other analyses, where ballot initia-

tives are found to lower both revenues and expenditures (Matsusaka 2004). My analysis

underscores why one might find such an effect with standard regression analyses based on

posttreatment data. The regression models will detect the average difference across states,

but such an analysis does not allow one to rule out that the outcomes were different before the

treatment. Here, revenues and expenditures may be lower in initiative states, but one cannot

credit those differences to initiatives when the outcomes differ in the pretreatment period. In

my analysis, while I find differences between Florida and Wyoming and the synthetic control

units, these differences do not correspond to the timing of initiative use. I now turn to an

analysis of the historical data on state fiscal policy.
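
A small numerical illustration makes the point concrete. The figures below are invented for exposition and are not drawn from the data analyzed here; the variable names are likewise hypothetical. The sketch contrasts a comparison based only on posttreatment data with one that nets out the pretreatment difference.

```python
# Hypothetical per-capita expenditures (arbitrary units), not data from this study.
initiative_pre, initiative_post = 80.0, 95.0            # states with initiatives
non_initiative_pre, non_initiative_post = 100.0, 115.0  # states without initiatives

# Comparison that uses posttreatment data only.
post_only_diff = initiative_post - non_initiative_post

# Difference-in-differences: the change among initiative states minus
# the change among states without initiatives.
did = (initiative_post - initiative_pre) - (non_initiative_post - non_initiative_pre)

print(post_only_diff)  # -20.0
print(did)             # 0.0
```

In this toy case the posttreatment comparison shows a gap of 20 units, yet the entire gap was already present before treatment, so the difference-in-differences estimate is zero.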

As mentioned previously, data on state finances for the populist/progressive era is not

as plentiful. The Census Bureau did conduct a census of state and local finances in 1902,

1913 and 1932 and gathered data on state revenues and expenditures. The sparseness of the

data precludes the use of synthetic case control. Instead I use the differences-in-differences

model outlined earlier, which is a special case of the synthetic case control method (Abadie,

Diamond, and Hainmueller 2007). For the analysis, I focus on states that adopted the

initiative process after 1902, so that 1902 can serve as the pretreatment baseline. I conduct

two different analyses: one in which 1913 is the posttreatment outcome, and another in which

1932 is the posttreatment outcome. When 1913 is the posttreatment outcome, I designate

Arkansas, California, Colorado, Missouri, Michigan, Montana, Maine, Oregon, and Oklahoma

as the treated units since all of these states adopted the initiative process between 1902 and

1911. When 1932 is the posttreatment outcome, I add Nebraska, Idaho, Massachusetts,

Nevada, Ohio, and Washington to the first group of states. All of these states, with the

exception of Massachusetts, adopted the initiative process in 1912, and 1913 is too soon to

expect any difference due to direct legislation, especially since it typically took two years

before an initiative appeared on the ballot. South Dakota is excluded from the control group

since the initiative process was adopted in 1898, and as such no pretreatment baseline exists.

I conducted these analyses for both state revenues and expenditures.
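
For readers who want the estimand written out, the following sketch assumes that the differences-in-differences model outlined earlier takes the standard two-group, two-period form. The symbols $D_i$ (an indicator for states that adopted the initiative process) and $P_t$ (an indicator for the posttreatment year) are labels chosen here for illustration and may differ from the notation used earlier in the paper.

```latex
\begin{align*}
y_{it} &= \beta_0 + \beta_1 D_i + \beta_2 P_t + \beta_3 (D_i \times P_t) + \varepsilon_{it} \\
E[y_{it} \mid D_i = 0, P_t = 0] &= \beta_0 \\
E[y_{it} \mid D_i = 1, P_t = 0] &= \beta_0 + \beta_1 \\
E[y_{it} \mid D_i = 0, P_t = 1] &= \beta_0 + \beta_2 \\
E[y_{it} \mid D_i = 1, P_t = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\beta_3 &= \bigl(E[y \mid 1, 1] - E[y \mid 1, 0]\bigr) - \bigl(E[y \mid 0, 1] - E[y \mid 0, 0]\bigr)
\end{align*}
```

Under this assumed form, with 1902 as the pretreatment year, the interaction coefficient is the change among adopting states net of the change among non-adopting states.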

Before presenting the results from the DID model, one additional complication must be


addressed. The standard errors in DID models estimated with OLS rely on the fairly strong

assumptions of homoskedasticity, normal errors, and independence between observations.

These assumptions are often violated in DID models (Bertrand, Duflo, and Mullainathan

2002; Cameron and Trivedi 2005), and it is possible that we might violate all three assump-

tions with the aggregate state-level data used in this analysis. One possible solution is to use

the standard Huber-White robust standard errors. More to the point, this is population level

data and one might argue that classical statistical inference is nonsensical with such data.

Another solution is to develop a placebo test similar to the one used with the synthetic case

control analysis. The placebo test that I develop here is suggested by Rosenbaum (2002)

and Bertrand, Duflo, and Mullainathan (2002) who develop a different placebo test for DID models.


The goal is to test hypotheses about the parameter β3 from Equation 9, which is the

estimated change in revenues and expenditures for states with initiatives. I exploit the fact

that under the null of no effect, units in the treatment and control groups are assumed to

be statistically the same. One can then generate a set of placebo interventions and estimate

the placebo effect. Repeated estimation of the placebo effect provides a distribution for the

treatment effect under the null of no effect. If the original estimate for the treatment effect

lies within the central 95% of this placebo distribution, we will be unable to reject the null

that the treatment has no effect.

To construct the placebo test distribution, I randomly selected states from the control

group and designated these states as the treatment group.⁵ I then estimated the DID treat-

ment effect for these placebo states compared to the states that remain in the control group.

I repeated this process 200 times saving the placebo estimate each time. To form a 95%

confidence interval for the null of no treatment effect, I used the .025 and .975 quantile

values from the placebo distribution of estimates. If the estimated treatment effect from

Equation 9 falls in this interval, I will be unable to reject the null of no treatment effect. The

results are in Table 6.⁶
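
The placebo procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the code used to produce Table 6: the data are simulated, the function names `did_from_changes` and `placebo_interval` are hypothetical, and the number of placebo states is set equal to the number of treated states, which is an assumption. With only two periods, the DID estimate equals the mean change in the outcome among treated states minus the mean change among control states, so the sketch works directly with each state's change between the two census years.

```python
import random
import statistics


def did_from_changes(treated_changes, control_changes):
    """Two-period DID: mean change among treated minus mean change among controls."""
    return statistics.mean(treated_changes) - statistics.mean(control_changes)


def placebo_interval(control_changes, n_placebo, n_draws=200, seed=0):
    """Return the .025 and .975 quantiles of placebo DID estimates.

    Each draw relabels n_placebo randomly chosen control states as "treated"
    and compares them with the states that remain in the control group.
    """
    rng = random.Random(seed)
    n = len(control_changes)
    estimates = []
    for _ in range(n_draws):
        chosen = set(rng.sample(range(n), n_placebo))
        placebo_treated = [control_changes[i] for i in range(n) if i in chosen]
        remaining = [control_changes[i] for i in range(n) if i not in chosen]
        estimates.append(did_from_changes(placebo_treated, remaining))
    # With n=40 the first and last cut points are the .025 and .975 quantiles.
    cuts = statistics.quantiles(estimates, n=40)
    return cuts[0], cuts[-1]


# Simulated (hypothetical) changes in the outcome between the two census years.
data_rng = random.Random(1)
control_changes = [data_rng.gauss(100.0, 50.0) for _ in range(30)]
treated_changes = [data_rng.gauss(100.0, 50.0) for _ in range(9)]

estimate = did_from_changes(treated_changes, control_changes)
lo, hi = placebo_interval(control_changes, n_placebo=len(treated_changes))
cannot_reject = lo <= estimate <= hi
print(f"estimate={estimate:.1f}, placebo interval=[{lo:.1f}, {hi:.1f}], "
      f"cannot reject null: {cannot_reject}")
```

The interval is built only from control states, so it describes the spread of estimates one would see when no state in the comparison was actually treated.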

⁵ The number of states from the control group depends on which year serves as the posttreatment outcome.

⁶ The data are in constant 1902 dollars using the CPI as the deflator.

Table 6: Differences-in-Differences Estimates of Initiatives Treatment Effect

                  1913                              1932
                  Expenditures     Revenue          Expenditures     Revenue
Treatment Effect  145.03           -159.12          -3245.87         -7144.51
                  (1089.54)        (1319.66)        (6968.77)        (8778.36)
Placebo Interval  [-2802, 2530]    [-4545, 3491]    [-21998, 18894]  [-26921, 24069]
N                 94               94               94               94

Note: Standard errors are Huber-White robust standard errors adjusted for clustering.

The evidence from the DID model indicates that ballot initiatives did not affect state fiscal

behavior. For all four models, the point estimates are not estimated with enough precision to

reach standard levels of statistical significance. In fact, in every instance the standard errors

exceed the point estimates by a large margin. In all four models, the estimated treatment

effect falls within the 95% interval from the placebo distribution. Moreover, three of the

estimates are negative indicating that states with direct legislation had lower revenues and

expenditures after they adopted the initiative process. I should also note that three of the

estimates are oppositely signed from the estimates in Matsusaka (2004) using the same data.

For voter turnout, I found some limited evidence that initiatives may have had spillover

effects. For state fiscal policy that is not the case. The evidence presented in the extant

literature appears to be an artifact of estimates based on posttreatment data.

7 Discussion and Conclusions

The initiative process is undoubtedly an important part of the American political landscape.

For example, the adoption of term limits was done almost entirely through direct legislation.

The results from this study, however, challenge much of the current research on how

initiatives alter political outcomes. Past research has suggested that the mere presence of an

initiative process greatly alters how both citizens and lawmakers behave. The evidence in this

study suggests that the effects of initiatives are far more subtle. First and foremost, the effect

of initiatives appears to vary across states. While initiatives appear to have boosted turnout

in Wyoming, there is little evidence of an increase in turnout for other states. Again the

treatment effects metaphor is useful. One must assume heterogeneous treatment effects since

the difficulty of placing initiatives on the ballot varies across states, as does state political

behavior. In general, future research should attempt to understand how particular initiatives


affect outcomes as opposed to attempting to make sweeping conclusions about the effects of

all initiatives. Greater care needs to be taken to ensure that appropriate counterfactuals are

used in the analysis of state level interventions.

The findings also speak more broadly to democratic politics. The evidence here suggests

that voters are not more widely motivated to vote when the issues on the ballot might directly

affect their lives. This suggests that the barriers to voting are not about incentives but instead

about the costs of voting and the cognitive skills and resources needed for participation. The

evidence here also suggests that while voters may use initiatives to alter specific policies,

state legislators do not alter the general fiscal behavior of the state just because the initiative

is available. Thus if voters wish to alter fiscal policy more generally, this will have to be done

through either changing the composition of the legislature or specific initiatives.



References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2007. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” NBER Technical Working Paper 335.

Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93 (March): 112-132.

Achen, Christopher, and W. Phillips Shively. 1995. Cross-Level Inference. Chicago: University of Chicago Press.

Berry, William D., Evan Ringquist, Richard C. Fording, and Russell L. Hanson. 1998. “Measuring Citizen and Government Ideology in the American States.” American Journal of Political Science 42 (January): 327-348.

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2002. How Much Should We Trust Differences-in-Differences Estimates? Technical Report 8841 National Bureau of Economic Research.

Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York, NY: Cambridge University Press.

Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper and Row.

Gerber, Elisabeth R. 1996. “Legislative Response to the Threat of Popular Initiatives.” American Journal of Political Science 40 (February): 99-129.

Gerber, Elisabeth R., Arthur Lupia, Mathew D. McCubbins, and D. Roderick Kiewiet. 2001. Stealing the Initiative: How State Government Responds to Direct Democracy. Upper Saddle River, NJ: Prentice-Hall.

Goodman, Leo. 1959. “Some Alternatives to Ecological Correlation.” American Journal of Sociology 64 (May): 610-625.

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81: 945-960.

Lupia, Arthur, and John G. Matsusaka. 2004. “Direct Democracy: New Approaches to Old Questions.” Annual Review of Political Science 7: 463-82.

Matsusaka, John G. 1992. “Economics of Direct Legislation.” Quarterly Journal of Economics 107 (May): 541-571.

Matsusaka, John G. 1995. “Fiscal Effects of The Voter Initiative: Evidence From The Last 30 Years.” Journal of Political Economy 103 (June): 578-623.

Matsusaka, John G. 2000. “Fiscal Effects of The Voter Initiative in The First Half of the 20th Century.” Journal of Law and Economics 43 (October): 619-648.

Matsusaka, John G. 2004. For The Many Or The Few: The Initiative, Public Policy, and American Democracy. Chicago, IL: Chicago University Press.

Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research. New York, NY: Cambridge University Press.

Riker, William H., and Peter C. Ordeshook. 1968. “A Theory of the Calculus of Voting.” American Political Science Review 62 (March): 25-42.

Romer, Thomas, and Howard Rosenthal. 1979. “Bureaucrats Versus Voters: On the Political Economy of Resource Allocation by Direct Democracy.” Quarterly Journal of Economics 93 (November): 563-87.

Rosenbaum, Paul R. 2002. Observational Studies. 2nd ed. New York, NY: Springer.

Rosenbaum, Paul R. 2005. “Observational Study.” In Encyclopedia of Statistics in Behavioral Science, ed. Brian S. Everitt and David C. Howell. Vol. 3 John Wiley and Sons.

Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 6: 688-701.

Rubin, Donald B. 1978. “Bayesian Inference for Causal Effects: The Role of Randomization.” Annals of Statistics 6: 34-58.

Sekhon, Jasjeet S. 2007. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package For R.” Journal of Statistical Software Forthcoming.

Sekhon, Jasjeet S. 2008. “The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods.” In The Oxford Handbook of Political Methodology, ed. Janet Box-Steffensmeier, Henry E. Brady, and David Collier. Oxford Handbooks of Political Science Oxford: Oxford University Press.

Sekhon, Jasjeet S., and Alexis Diamond. 2005. “Genetic Matching for Estimating Causal Effects.” Presented at the Annual Meeting of the Political Methodology, Tallahassee, FL.

Smith, Daniel A., and Caroline J. Tolbert. 2004. Educated By Initiative: The Effects of Direct Democracy On Citizens And Political Organizations In The American States. Ann Arbor, MI: University of Michigan Press.

Tolbert, Caroline J., and Daniel A. Smith. 2005. “The Educative Effects of Ballot Initiatives on Voter Turnout.” American Politics Research 33 (March): 283-309.

Tolbert, Caroline J., John A. Grummel, and Daniel A. Smith. 2001. “The Effects of Ballot Initiatives on Voter Turnout In The American States.” American Politics Research 29 (November): 625-648.

Weingast, Barry R., Kenneth A. Shepsle, and Christopher Johnson. 1981. “The Political Economy of Benefits And Costs: A Neoclassical Approach to Distributive Politics.” Journal of Political Economy 89 (August): 642-64.

Weingast, Barry R., and William J. Marshall. 1988. “The Industrial Organization of Congress; or, Why Legislatures, Like Firms, Are Not Organized as Markets.” Journal of Political Economy 96 (February): 132-63.

Whiteley, Paul F. 1995. “Rational Choice and Political Participation-Evaluating the Debate.” Political Research Quarterly 48 (March): 211-233.



Below is a list of statewide offices that may have increased turnout in the non-initiative states of Kansas and Kentucky. In general, since these are not battleground states, most national or statewide offices such as governor are not competitive.

Kentucky
1994 - House
1995 - Governor
1996 - President, Senate - Mitch McConnell vs. Steven Beshear
1998 - Senate - Jim Bunning vs. Scotty Baesler, 50 to 49
1999 - Governor
2000 - President
2002 - Senate - Mitch McConnell vs. Lois Weinberg
2004 - President, Senate - Bunning vs. Mongiardo, 50.7 to 49.3
2006 - House only

Kansas
1994 - Governor
1996 - President, two Senate races: Sally Thompson vs. Pat Roberts, 38 to 62; Jill Docking vs. Sam Brownback, 46 to 54
1998 - Senate - Brownback vs. Paul Feleciani Jr., 65 to 35
2000 - President
2002 - Senate, Pat Roberts unopposed; Governor
2004 - President, Senate - Brownback 69%
2006 - Governor


Table 7: Balance Tests for Missouri

                      Before Matching         After Matching
                      Standardized            Standardized
                      Bias        p-valueᵃ    Bias        p-valueᵃ
1994
Education             -44.99      0.000       -1.42       0.847
Sex                   1.21        0.901       -1.21       0.828
Black                 -20.12      0.067       2.53        0.655
Weekly Earnings       -7.66       0.976       0.80        0.999
Employment Status     -13.67      0.093       0.28        0.822
Union Member          8.89        0.309       0           1
Age                   -12.76      0.278       -1.07       0.941
Length of Residence   3.94        0.569       -0.43       0.830
1996
Education             -15.73      0.000       -0.39       0.283
Sex                   6.09        0.512       -1.89       0.825
Black                 -4.71       0.622       3.46        0.706
Weekly Earnings       -35.65      0.031       0.89        0.599
Employment Status     -4.72       0.730       0.473       0.977
Union Member          7.62        0.364       0           1
Age                   -10.51      0.76        -1.07       0.893
Length of Residence   -2.61       0.357       -0.59       0.847
1998
Education             -28.32      0.008       -0.93       0.879
Sex                   11.11       0.234       0           1
Black                 3.49        0.700       0           1
Weekly Earnings       -10.14      0.230       -0.396      0.992
Employment Status     12.39       0.100       0.983       0.971
Union Member          -8.16       0.347       0           1
Age                   25.45       0.004       -0.66       0.971
Length of Residence   -24.41      0.002       -0.38       0.634

ᵃ p-value is from t-test or bootstrapped KS test with 1000 bootstrap resamples.


Table 8: Balance Tests for Missouri Continued

                      Before Matching         After Matching
                      Standardized            Standardized
                      Bias        p-valueᵃ    Bias        p-valueᵃ
2000
Education             -18.32      0.013       -0.18       0.940
Sex                   -6.99       0.464       0           1
Black                 -4.05       0.678       0           1
Weekly Earnings       -13.05      0.440       0.16        0.892
Employment Status     1.19        0.472       0.559       0.869
Union Member          8.23        0.336       0           1
Age                   19.84       0.077       -0.31       0.907
Length of Residence   33.06       0.000       -0.819      0.389
2002
Education             -12.61      0.099       -0.15       0.886
Sex                   2.79        0.712       0           1
Black                 0.752       0.920       0           1
Weekly Earnings       -28.21      0.021       0.044       0.876
Employment Status     2.92        0.780       0.15        0.992
Union Member          -18.23      0.015       0.88        0.991
Age                   0.51        0.723       -0.43       0.697
Length of Residence   29.55       0.000       -0.268      0.898
2004
Education             -41.55      0.000       -0.488      0.976
Sex                   -7.51       0.344       0.76        0.809
Black                 -14.39      0.088       0           1
Weekly Earnings       -7.75       0.622       0.49        0.614
Employment Status     5.49        0.194       -0.17       0.963
Union Member          7.41        0.547       0.77        0.966
Age                   12.46       0.023       -0.95       0.726
Length of Residence   -7.87       0.209       -1.07       0.798
2006
Education             -35.38      0.000       -2.20       0.613
Sex                   -8.14       0.320       0           1
Black                 12.66       0.095       0           1
Weekly Earnings       7.78        0.180       0.934       0.739
Employment Status     -3.80       0.483       0.517       0.928
Union Member          8.21        0.273       0           1
Age                   7.65        0.456       3.71        0.433
Length of Residence   14.12       0.009       -1.74       0.847

ᵃ p-value is from t-test or bootstrapped KS test with 1000 bootstrap resamples.


Table 9: Balance Tests for Ohio

                      Before Matching         After Matching
                      Standardized            Standardized
                      Bias        p-valueᵃ    Bias        p-valueᵃ
1994
Education             32.41       0.002       2.77        0.403
Sex                   12.11       0.31        0           1
Black                 26.89       0.000       4.47        0.739
Weekly Earnings       22.87       0.096       -0.43       0.769
Employment Status     9.33        0.558       -2.17       0.924
Union Member          -1.87       0.879       -4.77       0.715
Age                   -3.35       0.663       -2.81       0.484
Length of Residence   2.53        0.087       0.46        0.999
1996
Education             28.10       0.033       2.42        0.984
Sex                   -5.39       0.639       0.89        0.942
Black                 30.98       0.000       0           1
Weekly Earnings       -7.58       0.870       -4.83       0.925
Employment Status     -25.09      0.015       -0.29       0.952
Union Member          8.39        0.335       0           1
Age                   -8.29       0.334       -4.13       0.759
Length of Residence   0.99        0.109       3.29        0.468
1998
Education             16.99       0.037       -0.77       0.285
Sex                   13.88       0.243       2.99        0.825
Black                 37.59       0.000       0           1
Weekly Earnings       -11.47      0.107       -0.17       0.31
Employment Status     14.31       0.094       1.25        0.191
Union Member          -14.24      0.129       -0.86       0.99
Age                   6.17        0.399       -0.79       0.637
Length of Residence   -19.18      0.035       0.72        0.249
2000
Education             27.44       0.028       5.81        0.573
Sex                   17.13       0.145       3.46        0.384
Black                 27.34       0.000       0           1
Weekly Earnings       -3.60       0.282       -3.05       0.576
Employment Status     4.29        0.371       0           0.856
Union Member          -8.79       0.55        -10.20      0.317
Age                   0.51        0.397       3.11        0.265
Length of Residence   0.35        0.516       0.28        0.828

ᵃ p-value is from t-test or bootstrapped KS test with 1000 bootstrap resamples.


Table 10: Balance Tests for Ohio Continued

                      Before Matching         After Matching
                      Standardized            Standardized
                      Bias        p-valueᵃ    Bias        p-valueᵃ
2002
Education             27.86       0.004       -0.74       0.905
Sex                   3.43        0.76        1.42        0.91
Black                 28.89       0.000       0           1
Weekly Earnings       23.76       0.031       -0.32       0.815
Employment Status     -1.42       0.880       -0.61       0.968
Union Member          24.06       0.030       0           1
Age                   -6.77       0.811       -1.27       0.542
Length of Residence   -10.91      0.435       3.18        0.820
2004
Education             5.61        0.659       1.04        0.510
Sex                   -2.78       0.800       0           1
Black                 27.45       0.000       0           1
Weekly Earnings       6.67        0.236       -0.68       0.691
Employment Status     12.55       0.160       0.43        0.941
Union Member          17.95       0.140       0           1
Age                   9.71        0.221       0.21        0.903
Length of Residence   -4.71       0.897       -0.11       0.94
2006
Education             -5.84       0.651       5.90        0.502
Sex                   1.29        0.904       -0.92       0.898
Black                 32.82       0.000       6.76        0.445
Weekly Earnings       -14.65      0.553       -2.34       0.685
Employment Status     19.74       0.020       -1.11       0.904
Union Member          0.24        0.829       0           1
Age                   20.05       0.106       -3.99       0.230
Length of Residence   16.29       0.347       2.77        0.539

ᵃ p-value is from t-test or bootstrapped KS test with 1000 bootstrap resamples.