

Comparing and Combining List and Endorsement Experiments: Evidence from Afghanistan

Graeme Blair, Princeton University
Kosuke Imai, Princeton University
Jason Lyall, Yale University

List and endorsement experiments are becoming increasingly popular among social scientists as indirect survey techniques for sensitive questions. When studying issues such as racial prejudice and support for militant groups, these survey methodologies may improve the validity of measurements by reducing nonresponse and social desirability biases. We develop a statistical test and multivariate regression models for comparing and combining the results from list and endorsement experiments. We demonstrate that when carefully designed and analyzed, the two survey experiments can produce substantively similar empirical findings. Such agreement is shown to be possible even when these experiments are applied to one of the most challenging research environments: contemporary Afghanistan. We find that both experiments uncover similar patterns of support for the International Security Assistance Force (ISAF) among Pashtun respondents. Our findings suggest that multiple measurement strategies can enhance the credibility of empirical conclusions. Open-source software is available for implementing the proposed methods.

Eliciting truthful responses to sensitive survey questions is one of the major methodological challenges in modern social science research. Asking direct questions about such issues as racial prejudice and support for militant groups can lead to significant biases due to dishonest responses and refusals. Even seemingly innocuous questions about turnout in an election are known to result in unreliable answers due to social desirability bias. In some research environments, notably conflict settings, reliance on direct questions may even endanger enumerators and respondents in addition to yielding inaccurate measurements.

In recent years, list and endorsement experiments have become increasingly popular as survey methodology to overcome this measurement problem (see, e.g., Blair and Imai 2012; Bullock, Imai, and Shapiro 2011; Corstange 2009; Glynn 2013; Gonzalez-Ocantos et al. 2012; Holbrook and Krosnick 2010; Tsuchiya, Hirai, and Ono 2007).1 These survey experiments represent indirect questioning techniques in which the individual responses to sensitive questions are not directly revealed. List experiments, also known as the item count technique, use aggregation: respondents are asked to count the number of items on a list that includes a sensitive item (Kuklinski, Cobb, and Gilens 1997; Miller 1984; Raghavarao and Federer 1979). By contrast, endorsement experiments rely upon subtle cues: respondents are asked to rate their support for policies endorsed by socially sensitive actors (Blair et al. 2013; Bullock, Imai, and Shapiro 2011; Lyall, Blair, and Imai 2013; Lyall, Imai, and Shiraito 2013). New statistical methods have been developed so that researchers can conduct multivariate regression analysis for each of these survey techniques (Blair and Imai 2012; Bullock, Imai, and Shapiro 2011; Corstange 2009; Glynn 2013; Imai 2011).

Graeme Blair is a PhD candidate, Department of Politics, Princeton University, Princeton, NJ 08544 ([email protected]). Kosuke Imai is Professor, Department of Politics, Princeton University, Princeton, NJ 08544 ([email protected]). Jason Lyall is Associate Professor of Political Science, Department of Political Science, Yale University, 115 Prospect Street, New Haven, CT 06520 ([email protected]).

The proposed methods can be implemented via an R package, “list: Statistical Methods for the Item Count Technique and List Experiment” (Blair and Imai 2011), which is freely available at the Comprehensive R Archive Network (http://cran.r-project.org/package=list). Replication data are available at http://hdl.handle.net/1902.1/21243. Financial support for the survey from Yale’s Institute for Social and Policy Studies Field Experiment Initiative and the Macmillan Center for International and Area Studies is gratefully acknowledged. Additional support from the Air Force Office of Scientific Research (Lyall; Grant FA9550-09-1-0314) and the National Science Foundation (Imai; Grant SES-0849715) is also acknowledged. This research was approved by Yale’s Human Subjects Committee under IRB protocol #1006006952. We thank Yuki Shiraito for his methodological advice and Aila Matanock, Justin Phillips, and the seminar participants at Kyusyu University, Princeton University, the University of Sydney, and the University of Michigan for helpful comments. The editor and three anonymous reviewers provided useful suggestions.

1 Another popular survey methodology is the randomized response technique (Warner 1965), which is not studied here (see also Gingerich 2010).

American Journal of Political Science, Vol. 58, No. 4, October 2014, Pp. 1043–1063
© 2014, Midwest Political Science Association. DOI: 10.1111/ajps.12086

In this article, we develop a statistical test and multivariate regression models for comparing and combining the results from list and endorsement experiments. One of the fundamental principles of survey methodology is the importance of validating measurements with multiple instruments. We argue that this principle is even more crucial when measuring responses to sensitive questions. While they may reduce nonresponse and social desirability biases, indirect questioning techniques are often susceptible to measurement error and are affected by the details of implementation (e.g., Flavin and Keane 2010). In particular, the results of list and endorsement experiments may depend upon the choice of control items and policy questions, respectively, and sometimes yield substantial design effects that complicate inference. Comparing the results from list and endorsement experiments can serve as an effective diagnostic tool and significantly enhance the credibility of empirical findings.

We first develop a statistical test and a graphical method for directly comparing the responses to list and endorsement experiments. The proposed methodology offers a simple way to examine whether these two survey experiments are measuring the same underlying concept. Next, we demonstrate how to compare list and endorsement experiments in the context of multivariate regression analysis. To do this, we show that the models used for list and endorsement experiments are implicitly linked by a latent level of support for the actor or issue in question. We then demonstrate how to estimate the proportion of supporters with data from either survey experiment technique. This enables researchers to investigate whether the two survey experiments uncover similar relationships between respondents’ characteristics and their support level for the sensitive actor or issue.

Finally, while indirect questioning techniques may reduce bias in responses, by construction they elicit less information than direct questioning and thus tend to result in inefficient estimates. We demonstrate how to partially recoup this loss of efficiency by combining the different measurements from both list and endorsement experiments. The method is based on the idea that the same underlying quantity is measured by the two survey experiment techniques, so that pooling them under a single statistical model produces more precise estimates.

We apply the proposed methods to list and endorsement experiments conducted in an extremely challenging environment, namely, contemporary Afghanistan (Lyall, Blair, and Imai 2013). Specifically, we motivate this article by tackling the substantively important issue of measuring support for the International Security Assistance Force (ISAF), the NATO-led mission in Afghanistan currently embroiled in a decade-long effort to create a stable Afghan government while defeating an entrenched insurgency.

This task is made difficult for several reasons. First, we are attempting to measure support for an organization that, while spending billions in an attempt to sway “hearts and minds,” is inherently viewed as an occupying army in the eyes of many Afghans. This situation greatly complicates efforts to measure civilian attitudes since there are such clear incentives either to dissemble in an attempt to continue receiving ISAF-distributed assistance or, conversely, to suppress pro-ISAF sentiment to avoid risking backlash from neighbors or insurgent organizations. Second, we conducted the survey within Pashtun-dominated provinces and districts, the very heart of support for the Taliban. Third, and perhaps unsurprisingly, these areas are highly violent, creating logistical issues while raising concerns about both enumerator and respondent safety. Finally, for cultural reasons, respondents were interviewed publicly, creating additional incentives to answer questions instrumentally. Taken together, these issues all point to the pressing need to embrace innovative measurement strategies to minimize biases and to seek more accurate estimates across multiple survey instruments.

While our empirical example is support for ISAF in Afghanistan, the proposed diagnostic tests and statistical models have widespread applicability across multiple issue areas and subfields. The measurement of prejudice toward minorities and immigrants, for example, is one obvious issue where combining different experimental approaches could yield important insights. Similarly, tracking perceptions of corruption across different political actors, parties, or government agencies with multiple experiments could improve estimates of a notoriously difficult concept to measure. A similar strategy can be applied to the challenges of measuring partisan and coethnic bias in the study of voting behavior and collective goods provision. Clearly, these examples are not exhaustive. Instead, they highlight the wide applicability of our proposed methodology.

The rest of the article is organized as follows. In the next section, we briefly describe list and endorsement experiments as survey methodologies for eliciting truthful answers to sensitive questions. We also explain how these survey experiments were conducted in Afghanistan. In the third section, we propose new methodologies for comparing and combining the results from list and endorsement experiments. In particular, we first describe a statistical test and its associated graphical method for assessing the compatibility of the two survey measurements. We then develop a multivariate statistical model for combining them. We present the results of our multivariate analysis in the fourth section. Finally, we conclude with practical suggestions for applied researchers.

List and Endorsement Experiments

In this section, we briefly explain list and endorsement experiments using our Afghanistan application as the running example. We demonstrate that these two survey methodologies can be designed to measure the same underlying concept—here, support for ISAF. For additional discussion about the design of list and endorsement experiments, we refer readers to Blair and Imai (2012) and Bullock, Imai, and Shapiro (2011), respectively. Details about our survey experiment are provided in Lyall, Blair, and Imai (2013).

The List Experiment

The standard design for list experiments randomizes a sample of respondents into two groups, where a list of several control items is presented to one group (the “control” group) and a list of the same control items plus one sensitive item of interest is read to the other group (the “treatment” group). Respondents are then asked to count the number of items on their list that fit certain criteria. In our Afghanistan application, we asked the following question to the control group:

I’m going to read you a list with the names of different groups and individuals on it. After I read the entire list, I’d like you to tell me how many of these groups and individuals you broadly support, meaning that you generally agree with the goals and policies of the group or individual. Please don’t tell me which ones you generally agree with; only tell me how many groups or individuals you broadly support.

Karzai Government; National Solidarity Program; Local Farmers

For the treatment group, the same question was read except that the list contained an additional sensitive item referring to ISAF:

Karzai Government; National Solidarity Program; Local Farmers; Foreign Forces

The core idea behind list experiments is that respondents do not directly answer the sensitive question. Rather, they only need to provide enumerators with the aggregate count of items on a list, which also contains control items. This protection of privacy is designed to increase respondents’ willingness to provide truthful answers to sensitive survey questions. In the Afghan application, the list experiment was enthusiastically embraced by our enumerators because it shielded the question’s intent, thus reducing concern about asking a sensitive question in villages that were either hotly contested or principally controlled by the Taliban.2

As a result, no respondent in our survey refused to answer the question or chose “Don’t Know,” even though both were presented as options. This compares favorably with ISAF’s own survey effort, the Afghan National Quarterly Assessment Report (ANQAR), which relies on direct questions to assess attitudes on a range of sensitive topics, including corruption and support of the Afghan government, ISAF, and insurgent groups. In a recent ANQAR wave conducted in November–December 2011, for example, nearly 50% of respondents refused to participate when approached by an enumerator.3 An additional subset of individuals who participated was later dropped from the results because they had refused to answer or responded “Don’t Know” too many times (the exact threshold for removal is classified). Given these refusal rates, along with a reliance on direct questions asked in public settings, it is likely that these responses are shaded by social desirability bias and outright preference falsification.

However, list experiments also have a known major limitation: respondents in the treatment group will reveal their response to the sensitive item if they choose either all items or none. That is, if a respondent answers “support all” (“support none”), then we know that he or she supports (does not support) ISAF. As a consequence, respondents may avoid choosing these extreme answers and provide dishonest responses. Such ceiling and floor effects may be unavoidable, especially when the additional item of interest is highly sensitive. Furthermore, the difficulty of ceiling and floor effects is that they cannot be easily detected and require an additional assumption for statistical adjustment.

2 More generally, indirect questions can also reduce selection bias in the choice of sampling locations. Indirect questions may fly under the radar of village gatekeepers, for example, opening up more sites for potential sampling. In addition, indirect questions reduce incentives for enumerators to falsify data, a practice that can arise when a selected site is too dangerous to ask direct questions safely.

3 The combined refusal-to-participate and noncontact rate for this ANQAR wave (Wave 14) in our five provinces was as follows: Helmand (52%), Khost (58%), Kunar (14%), Logar (24%), and Uruzgan (79%).

TABLE 1 Observed Data from the List Experiment to Measure Support for ISAF

Response    Control Group                ISAF Treatment Group
Value       Frequency   Proportion (%)   Frequency   Proportion (%)
0           188         20.5             174         19.0
1           265         28.9             278         30.3
2           265         28.9             260         28.3
3           200         21.8             182         19.8
4           -           -                24          2.6
Total       918                          918

Notes: The table displays the number of respondents for each value of the observed response and its proportions, separately for the control group and the treatment group. The proportions do not sum to 100% due to rounding.
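As a back-of-the-envelope check, the standard difference-in-means estimator for list experiments (mean item count in the treatment group minus mean item count in the control group) can be computed directly from the frequencies in Table 1. The sketch below is in Python for illustration rather than the authors' own R package `list`; the variable names are ours.

```python
# Difference-in-means estimator for a list experiment, using the
# observed response frequencies reported in Table 1.
control = {0: 188, 1: 265, 2: 265, 3: 200}           # 3 control items
treatment = {0: 174, 1: 278, 2: 260, 3: 182, 4: 24}  # + ISAF item

def mean_count(freq):
    """Average number of items counted by respondents."""
    n = sum(freq.values())
    return sum(y * k for y, k in freq.items()) / n

# Under random assignment and no design effects, the difference in
# mean counts estimates the proportion of respondents who support
# the sensitive item (here, ISAF).
est_support = mean_count(treatment) - mean_count(control)
print(round(est_support, 3))  # 0.049, i.e., roughly 5%
```

Note that this simple estimator uses no covariates; the multivariate models discussed later extract more information from the same responses.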

Table 1 provides the summary of responses from the Afghan list experiment. We first conduct a statistical test for design effects (Blair and Imai 2012), which yields a p-value of 1. While this result suggests that there is no clear evidence for design effects, the lack of statistical significance does not necessarily imply the absence of design effects. In particular, the test has little or no statistical power for certain design effect magnitudes and proportions of sensitive item supporters. Given the sensitivity of the question and the public nature of the interview, it is important to use another measure for validating the results based on this list experiment.
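The design-effect test rests on the observation that, under the standard identification assumptions, the implied proportion of every respondent "type" (defined by the number of affirmative control items and the answer to the sensitive item) must be nonnegative. A simplified check of these moment inequalities, without the Bonferroni-corrected p-value computation of Blair and Imai (2012), might look as follows; this is our own illustrative sketch, not the `ict.test` implementation in the `list` package.

```python
from itertools import accumulate

# Observed frequencies from Table 1 (J = 3 control items).
control = [188, 265, 265, 200]        # counts for y = 0,...,3
treatment = [174, 278, 260, 182, 24]  # counts for y = 0,...,4

def cdf(freq):
    """Empirical cumulative distribution of the response."""
    n = sum(freq)
    return [c / n for c in accumulate(freq)]

F0, F1 = cdf(control), cdf(treatment)
J = len(control) - 1

# Estimated type proportions (following Blair and Imai 2012):
#   pi_y1: y affirmative control items AND supports the sensitive item
#   pi_y0: y affirmative control items and does NOT support it
pi_y1 = [F0[y] - F1[y] for y in range(J + 1)]
pi_y0 = [F1[y] - (F0[y - 1] if y > 0 else 0.0) for y in range(J + 1)]

# A significantly negative estimate would be evidence of a design
# effect. Here every estimate is nonnegative, consistent with the
# reported p-value of 1.
print(all(p >= 0 for p in pi_y1 + pi_y0))  # True
```

As a consistency check, the estimated supporter-type proportions sum to the difference-in-means estimate, and all type proportions together sum to one.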

In sum, while list experiments provide one means of indirectly asking sensitive survey questions, we need external validation in order to know whether the resulting measurements are reliable. We propose to conduct such validation by comparing the results of list experiments with those of endorsement experiments. Before describing how to compare the two survey experiments, we briefly explain endorsement experiments and their application in Afghanistan.

The Endorsement Experiment

Endorsement experiments offer another indirect way of asking sensitive questions besides list experiments. As in list experiments, a sample of respondents is randomly divided into two groups. In the control group, respondents are asked to rate the level of their support for a particular policy. For those in the treatment group, the same question is asked, except that the policy is said to be endorsed by an actor of interest. The main idea is to take advantage of subtle cues induced by endorsements (or names) and interpret the difference in responses between the treatment and control groups as evidence of support (or lack thereof) for this actor of interest. Typically, multiple policies are selected so that the measurement does not rely on a single instrument and statistical power is increased by analyzing them together.
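On a numeric coding of the response scale, the endorsement effect for a single policy is simply the difference in mean ratings between the endorsement and control conditions, and pooling across policies averages these per-policy differences. The sketch below uses made-up ratings purely for illustration; the paper's actual analysis fits a measurement model to the ordinal responses rather than taking this raw difference.

```python
from statistics import mean

# Hypothetical 5-point ratings (1 = strongly disagree, ...,
# 5 = strongly agree) for two policy questions; illustrative only.
ratings = {
    "prison_reform":    {"control": [4, 5, 3, 4, 4], "endorsed": [2, 3, 3, 2, 4]},
    "direct_elections": {"control": [5, 4, 4, 3, 5], "endorsed": [3, 2, 4, 3, 3]},
}

# Per-policy endorsement effect: mean rating with the endorsement
# minus mean rating without it. A negative effect suggests that the
# endorsement lowers support, i.e., the endorser is viewed negatively.
effects = {
    policy: mean(groups["endorsed"]) - mean(groups["control"])
    for policy, groups in ratings.items()
}

# Pool across policies by averaging the per-policy effects.
pooled = mean(effects.values())
print({k: round(v, 2) for k, v in effects.items()}, round(pooled, 2))
```

In practice, pooling is done within a statistical model (described below) rather than by simple averaging, which lets each question contribute according to how informative it is.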

In our Afghan context, four policies were selected to measure support for ISAF relative to a control that lacked a specific endorser. These policies were all publicly endorsed and advocated by ISAF across multiple media and were viewed as a state-building “bundle” designed to strengthen the Afghan government collectively. The selected policies include the direct election of district councils, a practice enshrined in the Afghan constitution but to date ignored; reform of overcrowded prisons, where squalid conditions are routinely denounced by citizens and watchdog nongovernmental organizations alike; reform of the Independent Election Committee, which is tasked with ensuring electoral transparency; and strengthening of the Office of Oversight for Anti-Corruption, which leads investigations into corruption among government and military officials.

We provide an example script from the prison reform question below:

Control Condition: A recent proposal calls for the sweeping reform of the Afghan prison system, including the construction of new prisons in every district to help alleviate overcrowding in existing facilities. Though expensive, new programs for inmates would also be offered, and new judges and prosecutors would be trained. How do you feel about this proposal?

Treatment Condition: A recent proposal by ISAF calls for the sweeping reform of the Afghan prison system, including the construction of new prisons in every district to help alleviate overcrowding in existing facilities. Though expensive, new programs for inmates would also be offered, and new judges and prosecutors would be trained. How do you feel about this proposal?

Respondents were asked to indicate their level of support for this proposal (and all others) on a 5-point scale: strongly agree, somewhat agree, indifferent, somewhat disagree, and strongly disagree. As in the list experiment, they also had the options of “Refuse to Answer” and “Don’t Know.”

Two points bear emphasis. First, our proposed statistical models (detailed below) pool responses to these four questions for estimating ISAF support. This methodology assumes, however, that these policies are actually comparable and occupy the same policy domain. We believe that this is the case. All four proposals are domestic public policy programs that center on the Afghan government’s ability to address the rampant inefficiencies that currently undermine its legitimacy. It is possible to empirically test the validity of this assumption. Indeed, our analysis reveals that a fifth endorsement question—one asking about support for ISAF’s withdrawal in 2014—clearly occupied a different (foreign) policy domain when compared to the other four policies.4

Second, we also assume that these policies provide an opportunity to measure an individual’s support for a combatant. Indeed, these policies have been explicitly endorsed by all combatants in the current war in Afghanistan. While seemingly technocratic, the questions also tap into latent support for different combatants since they address wider questions about the goals of the warring parties and the nature of the current Afghan state—and the one that will emerge from the war. The importance of these issues, as well as the widespread nature of each combatant’s endorsement of them, is underscored by the low rate of refusal and “Don’t Know” responses each question obtained.5 Finally, qualitative evidence from initial focus groups and enumerator debriefings suggests that respondents treated the ISAF endorsement as equally credible as the control endorsement. All told, this evidence indicates that these questions are comparable and useful vehicles for measuring our latent trait of interest among respondents.

Figure 1 presents the overall and province-by-province distributions of responses from the endorsement experiment in comparison with those from the list experiment (left column). While there were no “Don’t Know” (DK) and “Refuse to Answer” responses for the list experiment, the endorsement experiment encountered some DKs and refusals, especially in two districts of Helmand province, where our enumerators ran into open fighting. This suggests that those policies themselves may be sensitive in this area. However, the overall nonresponse and refusal rate is still very low (5.8%) when compared to direct questions used in other surveys. In addition, we observe substantial spatial variation of endorsement effects across provinces, whereas the pattern is less clear in the list experiment. In Lyall, Blair, and Imai (2013), we present qualitative evidence that the spatial variation observed in the endorsement experiments is largely consistent with conditions on the ground in each province. By comparison, variation across provinces appears to be more subtle for the list experiment.

4 When fitting the model with all five questions, the discrimination parameter, which represents the amount of information each question contributes to the final estimate (see the section “A Statistical Model for Endorsement Experiments” below for details), is 1.38 for the ISAF withdrawal question. The same parameter takes the values of 0.017, 0.014, 0.014, and 0.014 for the direct election, prison reform, Independent Election Commission, and anticorruption questions, respectively.

5 The combined refusal and “Don’t Know” response rate for each question was 5.5%, 4.5%, 7.9%, and 3% for questions about direct elections, prison reform, the Independent Election Commission, and corruption reform, respectively.

While these graphical methods are informative, simply visually comparing the list and endorsement experiment results, as done in Figure 1, does not provide rigorous evidence that these methods are providing comparable measurements of support for ISAF. In the next section, we therefore propose new statistical methods to compare and combine these results.

The Proposed Methodology

In this section, we begin by introducing a simple statistical test and an associated graphical tool for examining the compatibility of two survey measures. We then show how to compare and combine list and endorsement experiments in a multivariate statistical analysis framework. The key insight is that the same latent support level variable can be used in the statistical models for both types of experiments.

A Statistical Test and Graphical Method

Our analysis begins with a simple graphical method forexamining the agreement of measures based on our sur-vey experiments. This method is applicable when eachrespondent answers both the list and endorsement ques-tions under the same treatment condition, as was thecase in our survey. This requires that treatment ran-domization is conducted across respondents, not withineach individual, so that the same respondent is assignedto either the treatment or control group for both ex-periments.6 If interference between survey experimentsis a concern (e.g., Gaines, Kuklinski, and Quirk 2007;

6Technically, the same method can be applied to a subset of listand endorsement questions for which the treatment conditionis identical even when the treatment is randomized within each

Page 6: Comparing and Combining List and Endorsement Experiments ... · discussion about the design of list and endorsement ex-periments, we refer readers to Blair and Imai (2012) and Bullock,

1048 GRAEME BLAIR, KOSUKE IMAI, AND JASON LYALL

FIGURE 1 Overall and Province-by-Province Distributions of Responses from the List andEndorsement Experiments in Afghanistan

Control

ISAF

List Experiment

Direct Elections Prison ReformIndependent Election

CommissionAnticorruption

Reform

Control

ISAF

Control

ISAF

Control

ISAF

Control

ISAF

Control

ISAF

Endorsement Experiment

Ove

rall

(N =

1,8

36)

Hel

man

d(N

= 5

70)

Kho

st(N

= 4

20)

Kun

ar(N

= 2

64)

Loga

r(N

= 3

24)

Uro

zgan

(N =

258

)

0 items12

34

Stronglydisagree Disagree Indifferent Agree Strongly

agree Don't know Refused

Note: In the left column, the plot depicts the distribution of responses to the list experiment questions, whereas the remainder ofthe columns plots the distribution of responses to four policy questions for the endorsement experiment. The overall distributionis given in the top row, and the other rows present province-by-province distributions. There were no “Don’t Know” or “Refuse toAnswer” responses for the list experiment questions.

Transue, Lee, and Aldrich 2009), at a minimum one should randomize the order of the experiments. We adopted this approach in our pretests but found little evidence for the existence of interference. If researchers decide to conduct list and endorsement experiments on two separate random samples of respondents to avoid interference, the method proposed here will not apply. Note, however, that the statistical method described in the next subsection ("A Statistical Method for Multivariate Analysis") can be employed even under these conditions.

respondent. However, this will reduce the statistical power of the proposed analysis.

Figure 2 applies the proposed graphical method to our data. Specifically, we plot each "treated" respondent's answer from the list experiment against his or her average numerical response from the endorsement experiment (right panel) and compare it with the same plot for the control group (left panel). While converting the responses to the endorsement experiment into, say, a 5-point scale makes the unrealistic assumption that these categories are equally spaced, this simple graphical analysis can give researchers a rough idea of how the treatment can strengthen the association between the two measures. All else equal, if a respondent supports ISAF, his or her numerical response under the treatment condition for


FIGURE 2 Comparison of Responses from List and Endorsement Experiments in the Afghanistan Survey

Note: For the endorsement experiment (horizontal axis), we plot the mean numerical response on a 5-point scale across the four endorsement questions. For both experiments, a higher numerical response represents greater support for the International Security Assistance Force (ISAF). In the left panel, we present responses in the control group, and in the right panel, we present responses in the treatment group for ISAF. The size of the open circles is proportional to the number of respondents who gave the corresponding responses. In addition, the lowess curve is presented (dashed gray line). The Pearson's correlation and Kendall's rank correlation between the two survey measures are represented by ρ and τ, respectively. The association between the two measurements is stronger under the treatment condition. We reject the null hypothesis of equality between the two correlation coefficients with the one-sided p-value less than .001 for both Pearson's and Kendall's correlations.

both the list and endorsement experiments should be greater than under the control condition. This implies that the association between the two measures should be greater for the treatment group than for the control group.

The right (left) panel of Figure 2 plots the response from the list experiment against the average numerical response from the endorsement experiment among those who are in the ISAF treatment (control) group. The size of the open circle is proportional to the number of respondents, and the lowess curve is also plotted (dashed gray line). Indeed, the figure reveals that the two measures have a much higher correlation (ρ = 0.52) under the treatment condition than under the control condition (ρ = 0.16). In addition to Pearson's correlation coefficient, we also compute Kendall's rank correlation (i.e., Kendall's τ), which is a nonparametric measure of association. The result is essentially identical in that the correlation is much greater (τ = 0.43) under the treatment condition than under the control condition (τ = 0.10).

We can statistically test the equality of these correlation coefficients. For Pearson's correlation coefficients, it is customary to use the two-sample z-test after applying Fisher's Z transformation to each of the two sample correlations (Fisher 1915). However, it is known that this approximation can be poor if the distribution of the underlying data is far from a bivariate normal distribution (Hawkins 1989). This is exactly the case here since these responses are categorical variables. Thus, we use the nonparametric bootstrap procedure to conduct a statistical test.7 When this procedure is applied to the data displayed in Figure 2, we find that the difference between the two correlation coefficients is statistically significant with a one-sided p-value less than 0.001. The 95% confidence interval of the difference is relatively narrow, [0.281, 0.356]. In addition, we apply the same bootstrap procedure using Kendall's τ. This analysis confirms the result based on Pearson's correlation.
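A sketch of this bootstrap test is below. All variable names are ours, not taken from the authors' replication code; respondents are resampled in pairs within each treatment arm, and the one-sided p-value is computed from the bootstrap distribution recentered at the null of equal correlations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2023)

def corr_diff_test(y_list_t, y_end_t, y_list_c, y_end_c,
                   method="pearson", n_boot=2000):
    """One-sided bootstrap test of H0: rho_treat = rho_control
    against H1: rho_treat > rho_control."""
    y_list_t, y_end_t = np.asarray(y_list_t), np.asarray(y_end_t)
    y_list_c, y_end_c = np.asarray(y_list_c), np.asarray(y_end_c)

    def corr(x, y):
        if method == "pearson":
            return stats.pearsonr(x, y)[0]
        return stats.kendalltau(x, y)[0]

    observed = corr(y_list_t, y_end_t) - corr(y_list_c, y_end_c)

    n_t, n_c = len(y_list_t), len(y_list_c)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample respondents (as pairs of answers) within each arm.
        it = rng.integers(0, n_t, n_t)
        ic = rng.integers(0, n_c, n_c)
        diffs[b] = (corr(y_list_t[it], y_end_t[it])
                    - corr(y_list_c[ic], y_end_c[ic]))

    # Percentile CI for the difference; p-value from the bootstrap
    # distribution shifted to be centered at zero (the null).
    ci = np.percentile(diffs, [2.5, 97.5])
    p_value = np.mean(diffs - observed >= observed)
    return observed, ci, p_value
```

The same function covers both correlation measures via the `method` argument, mirroring the article's use of the identical procedure for Pearson's ρ and Kendall's τ.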

In Figure 3, we conduct a similar comparison between list and endorsement experiment questions by examining each endorsement question separately rather than calculating the average of the four endorsement questions. With the exception of prison reform, the correlation under the treatment condition remains substantially higher than

7 The null hypothesis is the equality of the correlation coefficient between the treatment and control groups, and the alternative hypothesis is that the correlation coefficient is greater under the treatment group than under the control group.


FIGURE 3 Comparison of Responses from List and Endorsement Experiments in the Afghanistan Survey by Endorsement Question

Note: For the endorsement experiment (horizontal axis), we plot the numerical response on a 5-point scale. For both experiments, a higher numerical response represents greater support for the International Security Assistance Force (ISAF). The Pearson's correlation and Kendall's rank correlation between the two survey measures are represented by ρ and τ, respectively. The value given in parentheses represents the one-sided p-value from the statistical test of the equality of the two correlation coefficients. In the first row, we present responses in the treatment group for ISAF, and in the second row, we present responses in the control group. The size of the open circles is proportional to the number of respondents who gave the corresponding responses. In addition, the lowess curve is presented (dashed gray lines).

under the control condition, with small p-values from the proposed statistical test indicating statistically significant differences.8 The fact that under the treatment condition the average response across all endorsement questions exhibits a higher correlation than any individual question demonstrates that the two survey experiments are likely to be measuring the same underlying concept: averaging multiple instruments of the same construct typically reduces the idiosyncratic noise associated with each survey instrument.

Furthermore, we examine whether this high level of correlation remains when analyzing different subsets of the data defined by key variables theorized in the literature on civil wars to covary with popular support for combatants, notably prior violence (Kalyvas 2006, 91–104; Lyall, Blair, and Imai 2013) and the balance of territorial control (Kalyvas 2006; Leites and Wolf 1970). Figure 4 presents the results of this analysis. In the left two columns, we present the graphical analyses for subsets of

8 The 95% confidence intervals of the differences in Pearson's correlation coefficients are [0.178, 0.259] for direct elections, [−0.057, 0.026] for prison reform, [0.261, 0.344] for election commission reform, and [0.377, 0.457] for corruption reform.

data corresponding to levels of exposure to violence measured by ISAF's Combined Information Data Network Exchange (CIDNE) data on ISAF and insurgent attacks for the one-year period before the survey was conducted. Specifically, the level of violence is measured at the district level, comparing "low violence" (below the mean district violence level in the sample) to "high violence" (above the mean). We also divide the sample into districts that are controlled by the Taliban and those for which control is contested between the Taliban and Afghan/ISAF forces. Across all columns, we find that the correlation between the two measures is much higher under the treatment condition than under the control condition.9

In sum, the proposed statistical test and graphical method provide a simple tool to examine whether separate measures from list and endorsement experiments are compatible. Despite the fact that they are based on indirect questions, our ISAF treatment induces a high degree of correlation between the two survey

9 The confidence intervals of the differences between the correlation coefficients are [0.314, 0.585] for low- and [0.228, 0.420] for high-violence districts, and [0.399, 0.634] for Taliban and [0.162, 0.394] for contested territorial control.


FIGURE 4 Comparison of Responses from a List Experiment and Four Endorsement Experiments for ISAF by Levels of Violence and Territorial Control

Note: Endorsement experiment measures (horizontal axis) are based on the average numerical responses to all four endorsement questions. In the first row, responses are presented from the ISAF treatment group, and in the second row, responses from the control group are presented. The Pearson's correlation and Kendall's rank correlation between the two survey measures are represented by ρ and τ, respectively. The one-sided p-value based on the statistical test of equality of the two correlation coefficients is presented in parentheses. In addition, the lowess curve is presented (dashed gray lines).

measurements regardless of whether we examine the overall data or only certain theoretically identified subsets. While this provides some confidence that both survey experiments are measuring the same underlying construct of interest, below we propose a statistical methodology that compares and combines these results within a multivariate analysis framework. This framework also incorporates the covariates of respondents for predicting their answers to sensitive questions.

A Statistical Method for Multivariate Analysis

We now more formally compare list and endorsement experiments using multivariate statistical models. The main advantage of this framework is that it allows researchers to directly model the answers to sensitive questions as a function of respondents' characteristics. We also propose a new statistical model that combines the results from list and endorsement experiments. In what follows, we assume that we have a simple random sample of N respondents from a survey in which each respondent completes both list and endorsement experiments. It is also possible to apply the proposed statistical methodology to

two separate random samples from the same population if researchers decide to administer the list and endorsement experiments to different respondents. Due to space constraints, this extension is not presented in the current article.

A Statistical Model for List Experiments. We begin by briefly reviewing the statistical model for list experiments introduced by Imai (2011) and further developed by Blair and Imai (2012). Let T_i denote the binary treatment assignment variable for respondent i, which is equal to 1 if the respondent is assigned to the treatment group and is equal to 0 otherwise. We use J^L to represent the number of control items, where the superscript L stands for list experiments. In our survey, respondents with T_i = 1 receive a list of four items including ISAF, whereas those in the control group are presented with a list of three control items, i.e., J^L = 3.

In list experiments, respondents are asked to provide only the total number of items for which their truthful answers are affirmative. For example, a respondent in the control group would provide an integer answer ranging from 0 to J^L. We use Y_i^L to represent this aggregate response for each respondent. The distribution


of this response variable in our survey is given in Table 1.

Our analysis is based on two assumptions required for standard statistical analyses of list experiments (Blair and Imai 2012; Imai 2011). First, the assumption of no design effect states that the addition of a sensitive item does not alter the aggregate response for the control items. Second, the assumption of no liars implies that respondents give truthful answers regarding the sensitive item when providing an aggregate response to the treatment list. Our earlier statistical test suggested no clear evidence for the existence of design effects in the Afghan data (see "The List Experiment" above).10
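Under these two assumptions, the simplest estimator of the proportion answering the sensitive item affirmatively is the difference in mean aggregate responses between the treatment and control groups, a standard result in the list experiment literature (Blair and Imai 2012). A minimal sketch, with made-up toy responses:

```python
import numpy as np

def list_diff_in_means(y, treat):
    """Difference-in-means estimator of the proportion of affirmative
    answers to the sensitive item, valid under the no-design-effect and
    no-liars assumptions (Blair and Imai 2012)."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    est = y[treat].mean() - y[~treat].mean()
    se = np.sqrt(y[treat].var(ddof=1) / treat.sum()
                 + y[~treat].var(ddof=1) / (~treat).sum())
    return est, se

# Toy illustration with made-up aggregate responses (0-3 control items,
# plus the sensitive item for the treated group):
y = [1, 2, 0, 3, 2, 1, 2, 4, 1, 3]
treat = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
est, se = list_diff_in_means(y, treat)  # est ≈ 2.2 - 1.6 = 0.6
```

The multivariate model described next generalizes this estimator by letting the latent answer to the sensitive item depend on respondent covariates.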

Under this setting, a multivariate statistical model for list experiments can be constructed within the standard likelihood framework. Here, we use a binomial model, though other distributional assumptions are also possible. First, for the control group, we model the observed response as follows:

$$Y_i^L \mid T_i = 0, V_i \sim \mathrm{Binom}\bigl(J^L,\ \mathrm{logit}^{-1}(V_i^\top \psi)\bigr), \quad (1)$$

where V_i is a vector of respondents' characteristics including an intercept. For the treatment group, we use Y_i^L(0) to denote the potential response under the control condition, which under the aforementioned assumptions can be linked to the observed response Y_i^L as follows:

$$Y_i^L = Y_i^L(0) + Z_i, \quad (2)$$

where Z_i is the latent binary response to the sensitive item.

Then we model the joint distribution of the latent responses to the control items and the sensitive item among the treated respondents using the following binomial probit regression (though again, other distributional assumptions and link functions are also possible):

$$\Pr(Z_i = 1 \mid T_i = 1, V_i) = \Phi(V_i^\top \delta), \quad (3)$$

$$Y_i^L(0) \mid T_i = 1, Z_i = z, V_i \ \overset{\text{indep.}}{\sim}\ \mathrm{Binom}\bigl(J^L,\ \mathrm{logit}^{-1}(V_i^\top \psi)\bigr), \quad (4)$$

where Φ(·) is the cumulative distribution function of a standard normal random variable. Imai (2011) and Blair and Imai (2012) consider various extensions of this basic model by, for example, allowing the aggregate response to the control items, Y_i^L(0), to depend explicitly on the latent response to the sensitive item, Z_i, conditional on V_i, whereas for the sake of simplicity we assume conditional independence between Y_i^L(0) and Z_i.

10 As shown by Blair and Imai (2012), modeling deviations from these assumptions is possible, but for the sake of simplicity we maintain them for the remainder of the article.

We note that researchers may also wish to model over-dispersion, as done in Imai (2011) and implemented in the accompanying open-source software.11 Doing so may decrease the precision of the resulting estimates but yield confidence intervals that better reflect the estimation uncertainty.
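One way to build intuition for Equations (1)–(4) is to simulate responses from them. The sketch below uses made-up covariates and coefficient values; all names and numbers are ours, purely for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def simulate_list(V, psi, delta, treat, J_L=3):
    """Draw list experiment responses from the model in Equations (1)-(4):
    the control-item count is Binomial(J_L, logit^{-1}(V'psi)), and the
    latent answer to the sensitive item is Bernoulli(Phi(V'delta))."""
    y0 = rng.binomial(J_L, logistic(V @ psi))   # Eqs. (1) and (4)
    z = rng.binomial(1, norm.cdf(V @ delta))    # Eq. (3)
    y = y0 + treat * z                          # Eq. (2): observed count
    return y, z

n = 1000
V = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
treat = rng.binomial(1, 0.5, size=n)
psi = np.array([0.0, 0.5])     # made-up control-item coefficients
delta = np.array([-1.0, 0.3])  # made-up sensitive-item coefficients
y, z = simulate_list(V, psi, delta, treat)
```

Treated respondents can answer 0 through 4, control respondents 0 through 3, and the latent indicator z is never observed directly; only its sum with the control count reaches the analyst, which is what the likelihood above integrates over.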

For fitting this model, we develop a Bayesian version of the list experiment model so that it can subsequently be compared and combined with the statistical model of endorsement experiments, which is also Bayesian. We complete our model by choosing the following standard semi-conjugate prior distributions for the parameters in the above list experiment model:

$$\delta \sim \mathcal{N}(0, A_\delta), \quad (5)$$

$$\psi \sim \mathcal{N}(0, A_\psi), \quad (6)$$

where, in our empirical analysis of the Afghan data, A_δ and A_ψ are each a diagonal matrix with variance equal to 9 for each parameter. "The List Experiment Model" in the appendix gives the Markov chain Monte Carlo (MCMC) algorithm used to sample from the posterior distribution based on this model.

A Statistical Model for Endorsement Experiments. Next, we describe the statistical model for endorsement experiments proposed by Bullock, Imai, and Shapiro (2011). Suppose that we have J^E policy questions in which respondents are asked to rate their support for policy j on an M_j-point scale, where the superscript E stands for endorsement experiment. In our survey, for example, there are J^E = 4 questions and M_j = 5 for all j (i.e., strongly agree, agree, indifferent, disagree, strongly disagree).

The key modeling idea is to combine responses across multiple policy questions using the framework of item response theory and obtain a single measure of the level of support on the underlying policy dimension. The key assumption underlying this approach is that the selected policies belong to a single dimension, allowing us to combine responses to these policy questions. As detailed above, we believe that our four policies occupy the same domain and measure the same latent trait.

To formally describe our model, let Y_ij^E represent respondent i's answer to the question about policy j. We consider a variant of the standard ordered probit item response theory model,

$$Y_{ij}^{E*} \mid T_i \ \overset{\text{indep.}}{\sim}\ \mathcal{N}\bigl(\beta_j (x_i + T_i s_{ij}^{*}) - \alpha_j,\ 1\bigr), \quad (7)$$

11 In particular, the beta-binomial model may be substituted for the binomial model.


where Y_ij^E = y if τ_{y,j} < Y_ij^{E*} < τ_{y+1,j}, with τ_{0,j} = −∞, τ_{1,j} = 0, and τ_{M_j+1,j} = ∞ being cutpoints. Here, α_j parameterizes the average unpopularity of policy j, β_j represents how much policy j differentiates respondents who have different ideologies in this relevant policy dimension (e.g., pro- and anti-reform respondents), and x_i characterizes the ideological position of respondent i (e.g., how pro-reform respondent i is).

We define the latent level of support as s_ij = s*_ij · sgn(β_j) so that, for example, s_ij > 0 implies that respondent i is more likely to voice support for policy j when endorsed by the group, which we interpret as evidence of respondent i's positive support for the group. In our empirical analysis of the Afghan data, we assume β_j > 0 using a truncated normal distribution for the prior. This choice can be justified by the substance of the endorsement questions: answering them affirmatively means a respondent is supportive of domestic policy reforms.

We model support levels and ideal points as a function of respondent characteristics in the following hierarchical manner:

$$x_i \ \overset{\text{indep.}}{\sim}\ \mathcal{N}(V_i^\top \gamma,\ 1), \quad (8)$$

$$s_{ij} \ \overset{\text{indep.}}{\sim}\ \mathcal{N}(V_i^\top \lambda,\ \omega^2). \quad (9)$$

We complete the model by using the following independent conjugate prior distributions for the model parameters:

$$(\alpha_j, \beta_j) \ \overset{\text{i.i.d.}}{\sim}\ \mathcal{N}(0, B), \quad (10)$$

$$\gamma \sim \mathcal{N}(0, C), \quad (11)$$

$$\lambda \sim \mathcal{N}(0, D), \quad (12)$$

$$\omega^2 \sim \nu_0 s_0^2 / \chi^2_{\nu_0}. \quad (13)$$
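The data-generating process in Equations (7)–(9) can likewise be simulated. The sketch below uses our own parameter names and arbitrary, equally spaced cutpoints purely for illustration (in the actual model the cutpoints beyond the first two are estimated); since the β_j are assumed positive here, the latent support s_ij coincides with s*_ij:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_endorse(V, gamma, lam, omega, alpha, beta, treat, M=5):
    """Draw ordinal responses from Equations (7)-(9). Cutpoints are fixed
    at tau = (-inf, 0, 1, ..., M-2, inf) for illustration only."""
    n, J = V.shape[0], len(alpha)
    x = rng.normal(V @ gamma, 1.0)                # Eq. (8): ideal points
    tau = np.concatenate([[-np.inf, 0.0], np.arange(1.0, M - 1), [np.inf]])
    Y = np.empty((n, J), dtype=int)
    for j in range(J):
        s = rng.normal(V @ lam, omega)            # Eq. (9): support levels
        ystar = rng.normal(beta[j] * (x + treat * s) - alpha[j], 1.0)  # Eq. (7)
        Y[:, j] = np.searchsorted(tau, ystar)     # ordinal category 1..M
    return Y

n = 300
V = np.column_stack([np.ones(n), rng.normal(size=n)])
treat = rng.binomial(1, 0.5, size=n)
# Made-up parameter values, purely illustrative:
Y = simulate_endorse(V, gamma=np.array([0.0, 0.3]), lam=np.array([0.2, 0.1]),
                     omega=1.0, alpha=np.array([0.5, -0.5]),
                     beta=np.array([1.0, 0.8]), treat=treat)
```

Each column of `Y` corresponds to one policy question on a 1-to-M scale, which is the format the item response model above takes as data.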

In "The Endorsement Experiment Model" in the appendix, we present the MCMC algorithm used to sample from the posterior distribution.

As discussed by Bullock, Imai, and Shapiro (2011), the results of endorsement experiments can be sensitive to the choice of policy questions for two reasons. First, we can typically include only a small number of policy questions in the endorsement experiments for practical and logistical reasons. This means that the estimates will be inherently sensitive. Second, not all policy questions may contribute equally to the estimation of the support level, because some questions are more informative about respondents' ideology than others (i.e., the discrimination parameter β_j may vary across policies). Therefore, it is important to examine the values of the discrimination parameters and conduct a sensitivity analysis.

Comparing List and Endorsement Experiments. Given the two separate models described above, we now detail how their results can be compared. The key insight is to recognize that the probability of supporting the group of interest given respondents' characteristics V_i can be derived from both models. Specifically, for the list experiment model, this quantity is given by

$$\Pr(Z_i = 1 \mid V_i) = \Phi(V_i^\top \delta), \quad (14)$$

which follows directly from the model of the latent response to the sensitive item given in Equation (3). Similarly, for the endorsement experiment model, we can calculate the probability that the support parameter s_ij takes a positive value for any given policy question j as

$$\Pr(s_{ij} > 0 \mid V_i) = \Phi(V_i^\top \lambda^{*}), \quad (15)$$

where λ* = λ/ω is an identifiable parameter.

Given this relationship between the two models, there are several interesting comparisons one can make after fitting each model separately to the data from list and endorsement experiments. First, we can compare the overall proportion of supporters of the group within a target population based upon the two models. This is done by computing the difference in the sample averages of the above quantities using the observed value of V_i for each respondent. This comparison enables a formal assessment of the overall agreement between the two survey experiments. Second, it is possible to directly compare the coefficients for respondents' characteristics by computing the difference between δ and λ*, because these two parameters are on the same scale of the normally distributed latent support variable. This analysis shows whether similar patterns of association between support levels and respondents' characteristics can be derived from list and endorsement experiments. Indeed, researchers can go beyond the comparison of coefficients and examine how the estimated probability of support changes as a function of covariates. Finally, we can explore the characteristics of respondents who are likely to give more or less compatible responses to list and endorsement experiments. Such an analysis can be conducted by computing the difference in the probability of supporting the group of interest given certain respondent characteristics.

For all of these comparisons, one can easily compute uncertainty estimates and conduct appropriate statistical tests. Since posterior draws for these quantities are available from each of the respective models, this can be done by simply computing their difference for each posterior draw.
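Given matrices of posterior draws for the two coefficient vectors (hypothetical inputs here; in practice they would come from the MCMC output, and the names follow our notation for Equations 14 and 15), the comparison amounts to a few lines:

```python
import numpy as np
from scipy.stats import norm

def compare_support(V, delta_draws, lam_star_draws):
    """For each posterior draw, average Phi(V'delta) and Phi(V'lambda*)
    over respondents (the quantities in Eqs. 14 and 15) and summarize
    the per-draw difference between the two experiments.

    V: (n, k) covariate matrix; *_draws: (n_draws, k) posterior draws."""
    p_list = norm.cdf(V @ delta_draws.T).mean(axis=0)       # one value per draw
    p_endorse = norm.cdf(V @ lam_star_draws.T).mean(axis=0)
    diff = p_list - p_endorse
    lo, hi = np.percentile(diff, [2.5, 97.5])               # 95% interval
    return p_list.mean(), p_endorse.mean(), diff.mean(), (lo, hi)
```

Because the difference is computed draw by draw, its posterior interval automatically propagates the estimation uncertainty from both models.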


Combining List and Endorsement Experiments. Finally, we demonstrate how to combine the list and endorsement experiments using a single statistical model. To do this, we take advantage of the fact that both models are based on a latent support level whose distribution is standard normal but with different conditional means: V_i^⊤δ for the list experiment model and V_i^⊤λ* for the endorsement experiment model. Thus, in order to combine the list and endorsement experiments, we assume the same data-generating process for the latent levels of support in both experiments. Specifically, we assume that latent support levels in both experiments are identically distributed by imposing the restriction δ = λ*, which directly connects the two models. The resulting model makes it possible to analyze list and endorsement experiments together and yields a single estimate of the level of support for the group of interest. "The Combined Model" in the appendix describes the MCMC algorithm we use to fit this combined model.

The main advantage of this strategy is that it provides a single, coherent set of estimates from the two survey experiments and increases the statistical efficiency of the resulting estimates. The validity of this combining strategy can be formally assessed by examining standard model comparison statistics such as the Bayes factor and the Bayesian information criterion (in addition to simply comparing the two parameters after separately fitting the two models). We note that the proposed combined model necessarily places greater weight on the endorsement experiment than on the list experiment because the former typically comprises several questions. If one wishes to weight them equally, our model can be extended to partially pool the endorsement questions through another level of hierarchy and then combine them with the list experiment questions. Finally, while not explored here either, it is also possible to impose a partial restriction where only some (but not all) coefficients are assumed to be the same between the two models. Such an approach might be useful if researchers wish to formally model the relationships between respondents' characteristics and the degree of compatibility of the two survey experiments. We leave these and other extensions to future work.

Results of the Multivariate Statistical Analysis

We begin our multivariate statistical analysis by fitting three separate statistical models. To analyze the endorsement experiment, we use the model described in the subsection "A Statistical Model for Endorsement Experiments" with the same covariates as the ones used by Lyall, Blair, and Imai (2013).12 Specifically, the individual-level predictors include whether a respondent was harmed by ISAF and the Taliban (both physical harm and property damage), whether those who were harmed were subsequently approached by the perpetrator (implying some form of post-harm mitigation effort), whether a respondent is a member of a pro-Taliban tribe, how often he encounters ISAF, as well as the respondent's age, income, years of education, and years of madrassa schooling. The model also includes several village- and district-level characteristics. For example, as mentioned in the third section, we include violence count variables and district-level variables recording prior levels of foreign assistance and economic aid.

Similarly, we use the model described in the subsection "A Statistical Model for List Experiments" to analyze the list experiment with the same set of covariates. We then conduct a joint analysis of the list and endorsement experiments by estimating the combined model described in the previous section, again with the same set of covariates. The full list of variables is given in Table A1 in the appendix, along with estimated coefficients and standard errors, and Lyall, Blair, and Imai (2013) offer a detailed explanation of each variable. Finally, we use the MCMC algorithms described in the appendix to fit these models. We run three chains with over-dispersed starting values to monitor convergence (Gelman and Rubin 1992). After a sufficient degree of convergence is achieved, we take the last half of the posterior draws to compute our quantities of interest.
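The convergence check can be sketched as follows, as a minimal scalar version of the Gelman–Rubin diagnostic (the actual analysis would apply it to every model parameter):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (Gelman and Rubin 1992) for a
    single scalar parameter. `chains` has shape (m, n): m independent
    chains of n retained (post-burn-in) draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)              # values near 1 suggest convergence
```

With over-dispersed starting values, a factor well above 1 flags chains that are still exploring different regions of the posterior; values near 1 are consistent with convergence.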

Estimated Overall Levels of Support

We begin with a simple comparison of the estimated overall proportion of those who support ISAF. As Figure 5 demonstrates, the two modes of survey experiment return nearly identical point estimates, with support for ISAF only modest among our respondents (at 16% for the list experiment, 17.5% for the endorsement experiment, and 19% for the combined model), though the 95% confidence interval is somewhat wider for the endorsement experiment. These estimates are obtained by computing the predicted probability of supporting ISAF for each respondent and then averaging this probability across all respondents in the sample. The fact that the estimated overall difference between the two measurement strategies is almost zero is remarkable considering the significant differences in question format between list and endorsement experiments. Given the similarity of the

12 For the sake of simplicity, we do not employ multilevel modeling.


FIGURE 5 Estimated Overall Proportion of ISAF Supporters Based on the List Experiment, Endorsement Experiment, and Combined Models

[Figure: the vertical axis shows the overall proportion of ISAF supporters (−0.25 to 0.50); the columns show estimates from the list experiment, the endorsement experiment, their difference (List − Endorse), and the combined model.]

Note: The difference between the estimates based on the list (first column) and endorsement experiment (second column) models is presented in the third column. The fourth column shows the estimate based on the combined model. The vertical lines represent 95% confidence intervals. The estimates are essentially identical across all three models.

results between the list experiment and endorsement experiment models, it is no surprise that the combined model also yields an essentially identical estimate.13 Taken together, these results support our assumption that the two survey experiments are measuring the same underlying concept.

What is the relative efficiency of these estimates? The width of the 95% confidence interval is 0.124 for the endorsement experiment model, 0.080 for the list experiment model, and 0.075 for the combined model. Thus, in our case, as expected, the combined model provides the most efficient estimate, though the efficiency gain relative to the list experiment model is modest. To place these numbers in perspective, we note that if direct questioning were used with the same sample size and returned a similar estimate (i.e., 0.2), then the width of the 95% confidence interval would equal 0.037, which is approximately half of that from the combined model.
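The direct-questioning benchmark is the usual normal-approximation confidence interval for a proportion; with the stated estimate of 0.2 and the overall sample size of N = 1,836 from Figure 1, the reported width of 0.037 checks out:

```python
import math

# Normal-approximation 95% CI width for a proportion of 0.2 with N = 1,836:
p, n = 0.2, 1836
width = 2 * 1.96 * math.sqrt(p * (1 - p) / n)
print(round(width, 3))  # 0.037
```

The roughly twofold wider interval for the combined model is the efficiency price of the indirect questioning formats, paid in exchange for reduced nonresponse and social desirability biases.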

As discussed at the end of the subsection "A Statistical Model for Endorsement Experiments," the results of endorsement experiments can be sensitive to the choice of policy questions. Indeed, our estimates of the discrimination parameter differ somewhat across policies, with (β₁, β₂, β₃, β₄) = (0.93, 0.56, 0.74, 0.74), giving the largest weight to the direct elections policy question. We further conduct a sensitivity analysis by dropping each policy question and examining its impact on our estimates. The results are presented in Figure A1 and Table A2 in the appendix. As expected, dropping the direct elections question has the largest impact on our estimate, driving the estimated level of support for ISAF further down. By contrast, dropping any of the other three policies has relatively little impact on our estimate. The sensitivity analysis suggests that the level of support for ISAF may well be lower than what is presented here.

13 It may appear counterintuitive that the combined estimate is not a weighted average of the list and endorsement estimates, but this can arise due to the nonlinearity of the model.

Estimated Marginal Effects on Levels of Support

We next explore the conditional nature of ISAF support by first assessing the marginal effect of self-reported Taliban and ISAF victimization on respondents' support for ISAF (the left panel of Figure 6) across the different survey methods. We find that Taliban victimization leads to a modest increase in support for ISAF. While the combined model estimates a more precise positive effect, the estimates from the list and endorsement experiment models are inconclusive. By contrast, ISAF victimization is associated with a consistently negative effect across the three models. While Taliban violence may not drive Pashtuns into the arms of ISAF, violence perpetrated by ISAF actually pushes respondents toward the Taliban. This asymmetrical result is consistent with the theory and empirical findings of Lyall, Blair, and Imai (2013).

We also examine the relationship between support for ISAF and post-harm efforts by the Taliban and ISAF to mitigate the effects of their violence. Interestingly, we observe that the Taliban's post-harm efforts have an inconsistent effect on attitudes toward ISAF across these models. While the list experiment model estimates a modest positive effect, suggesting that Taliban post-harm efforts may actually be associated with an increase in ISAF support, the endorsement experiment model suggests the opposite, which is also favored by the combined model. By contrast, ISAF post-harm mitigation efforts are associated with an estimated positive effect on ISAF support regardless of which survey experiment we analyze. Furthermore, the combined model provides a similar point estimate but with narrower confidence intervals. As with all of these comparisons, results that are supported by both the list and endorsement experiment models should be viewed as more credible than those that are inconsistent across models.

We next proceed to investigate how key combatant variables are associated with the degree of


1056 GRAEME BLAIR, KOSUKE IMAI, AND JASON LYALL

FIGURE 6 Estimated Average Marginal Effects of Taliban and ISAF Victimization and Their Post-Harm Mitigation Efforts ("Approach" by Combatants) on the Probability of Supporting ISAF

[Figure: estimated effects on the probability of supporting ISAF, shown for the List, Endorsement, and Combined models in two panels: victimization by the Taliban and by ISAF (left), and post-harm approach by the Taliban and by ISAF (right).]

Note: Three estimates based on the list experiment, endorsement experiment, and combined models are reported, along with the difference between the list and endorsement experiment models. The vertical lines represent 95% confidence intervals. Across the three models, victimization and subsequent approach by ISAF have negative and positive effects, respectively.

agreement (or disagreement) on the estimated levels of ISAF support across the list and endorsement experiments. In Figure 7, we present the results with respect to two variables identified by the existing literature as determinants of civilian attitudes and actions in civil war: (1) the amount of aid or development funds allocated to a given village or area (Beath, Christia, and Enikolopov 2011; Berman, Shapiro, and Felter 2011; U.S. Army 2007), here measured by the district-level Commander's Emergency Response Program (CERP) spending (top panel); and (2) the level of control (Kalyvas 2006) exerted by the Taliban and ISAF at the district level in the months before our survey was conducted (bottom panel).

Beginning with CERP spending, we find an intriguing empirical pattern. Specifically, while the endorsement experiment (and combined results, not shown) reveals little effect of CERP spending on the probability of supporting ISAF, the estimates based on the list experiment are strongly positive, with a large effect size. While one possible explanation for this list experiment finding is that aid is shifting attitudes in a pro-ISAF direction, we believe that this is unlikely because we do not observe the same increase in the more subtle endorsement experiments. Given that list experiments are more direct in their elicitation approach when compared with endorsement experiments, we conjecture that individuals felt pressured to respond in a positive direction when asked about ISAF, perhaps out of a belief that continued assistance was conditional on obtaining positive responses. This result underscores the need to cross-reference empirical results across multiple survey experiments in order to increase confidence in our findings.

Finally, we may also be concerned that support is endogenous to the relative level of control exercised by the combatants. In our case, it is plausible that the probability of supporting ISAF is highest in ISAF-controlled areas, lowest in Taliban-dominated areas, and intermediate in contested areas as fence-sitting individuals hide their views.

We present evidence concerning these hypotheses in the bottom panel of Figure 7 for our 21 districts. Several trends are notable. First, the list and endorsement experiments provide broadly similar estimates of the probability of supporting ISAF within Taliban-dominated and contested districts. These findings underscore the ability of these instruments to solicit opinions on extremely sensitive issues even under difficult conditions. Second, we do observe some divergence across the list and endorsement experiment results for government-controlled districts. We must be cautious in drawing firm conclusions, however, for there are only three government-controlled districts in our sample. Somewhat surprisingly, despite concerns over endogeneity and social desirability bias, we find that the lowest estimate for ISAF support is returned by the list experiment in government-controlled districts.


LIST AND ENDORSEMENT EXPERIMENTS 1057

FIGURE 7 Estimated Proportion of ISAF Supporters Based on the List Experiment, Endorsement Experiment, and Combined Models as a Function of the Amount of Aid and Territorial Control

[Figure: top panel plots the proportion of ISAF supporters against CERP aid spending (hundred thousands) for the endorsement and list experiment models; bottom panel plots the proportion of ISAF supporters under territorial control by the Taliban, by the government, and in contested districts for the List, Endorsement, and Combined models.]

Note: In the top panel, the proportion of supporters of ISAF was calculated across varying levels of aid spending under the Commander's Emergency Response Program (CERP) based on the endorsement experiment and list experiment models. The estimated proportion of supporters of ISAF within districts designated by ISAF as controlled by the Taliban, controlled by government forces, and contested districts was calculated based on the list experiment, endorsement experiment, and combined models in the bottom panel. The vertical lines represent 95% confidence intervals.

Third, an examination across districts reveals that the estimates from our combined model are similar across Taliban-dominated and contested areas. This is a striking finding, one at odds with expectations in the literature that civilian attitudes toward combatants are purely endogenous to control by combatants. The fact that the combined model yields estimates of the effects on the probability of supporting ISAF that are only a few points higher than the other district types reinforces the point that the relationship between control and attitudes may be far more complicated than captured by current theoretical work.14

In sum, the results presented in this section demonstrate that when carefully designed and analyzed, list and endorsement experiments can generate substantively similar conclusions even when these survey experiments are conducted in a challenging research environment. We found that the list and endorsement experiment models yield essentially identical estimates of the overall proportion of ISAF supporters among Pashtun men. When the results from the two models diverge, as in the case of CERP spending, researchers should interpret the findings with caution. On the other hand, confidence in the robustness of our results is increased when the two models converge on similar empirical predictions. The combined model itself yields even more precise estimates of the quantity of interest, suggesting that such an approach could be invaluable in (post-)conflict and other similar settings where measuring attitudes toward sensitive issues or actors often induces noise as well as potential bias.

14We also divided our population into pro-Taliban and neutral Pashtun tribes to determine whether the experiments performed differently for these subsets. The list and endorsement experiments provided similar results within these populations.

Concluding Remarks and Practical Suggestions

List and endorsement experiments are becoming popular tools for eliciting truthful responses to sensitive questions. They have been shown to be effective measurement strategies, even in extremely challenging research environments such as in our empirical application: wartime Afghanistan. However, since these survey techniques are based on indirect questioning, it is important for researchers to cross-validate their measurements using different survey instruments. To enable such comparisons, we introduce statistical methods to compare and combine the results from these survey experiments. For the first time, we demonstrate how to directly link data from the two survey experiments through the latent level of support, and then we show how to estimate a key quantity of interest, the proportion of support for the actor or issue under study, using data from either questioning technique. We also develop multivariate techniques in a single statistical framework to estimate marginal effects of covariates on the level of support.

In our empirical application, we find that patterns of estimated support for ISAF among Pashtun men are remarkably consistent across list and endorsement experiments. This finding highlights the promise of the proposed empirical strategy given that the survey was conducted in a difficult research setting marked by insurgent control, high degrees of violence, and persistent efforts by ISAF to sway public attitudes through widespread aid programs. Furthermore, these results underscore the importance of drawing on multiple methods in a systematic fashion to increase confidence that we are accurately measuring the quantity of interest.

We conclude this article by offering a set of methodological and practical suggestions for researchers seeking to combine these and other indirect survey techniques. General methodological recommendations about list and endorsement experiments are given elsewhere (Blair and Imai 2012; Bullock, Imai, and Shapiro 2011). Here, we offer additional suggestions for those who are interested in implementing these survey techniques.

First, we recommend that researchers randomize the treatment across, not within, individuals. That is, a respondent should be randomly assigned to a control or treatment group and then should only receive either all control or all treatment group questions. In cases of multiple treatments, the respondent should only receive questions about one sensitive item. This fusing of a respondent to all control or treatment questions should extend across the survey instruments because it permits direct comparison of the same individual across different survey instruments via our statistical test and its associated graphical method.

While the inability to exploit within-respondent treatment variation may reduce statistical efficiency, this strategy has several additional benefits. Specifically, it avoids triggering suspicion that might arise when repeatedly asking about a range of sensitive items or actors, thus exposing the fact that the survey is seeking comparative answers. It also greatly simplifies logistics by eliminating the need to manage a large number of different survey versions to account for all possible combinations of treatment assignments.
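A minimal sketch of this assignment scheme, with hypothetical respondent IDs and item names: each respondent draws one condition that is then reused for both the list and endorsement instruments.

```python
import random

def assign_conditions(respondent_ids, sensitive_items, seed=42):
    """Assign each respondent to exactly one condition (control or a
    single sensitive item), to be reused across ALL survey instruments,
    so no one answers questions about more than one sensitive item."""
    conditions = ["control"] + list(sensitive_items)
    rng = random.Random(seed)
    return {rid: rng.choice(conditions) for rid in respondent_ids}

# Hypothetical IDs and a single sensitive actor.
assignment = assign_conditions(range(8), ["ISAF"])
for rid, condition in sorted(assignment.items()):
    print(rid, condition)
```

With one assignment per respondent, only two survey versions are needed here, rather than one version per combination of treatment assignments.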

Second, we note that list experiments appear to be more prone to social desirability bias than endorsement experiments. This makes sense because list experiments are more direct than endorsement experiments. In our survey, for example, the list experiment asks, though indirectly, a question about support for ISAF, whereas the endorsement experiment question is about support for a particular policy endorsed by ISAF rather than ISAF itself. Thus, list experiments may not be suitable for extremely sensitive questions. We observed this firsthand when we attempted to measure support for the Taliban using a list experiment. Despite having the same control items as our ISAF experiment, the inclusion of such a sensitive item triggered dramatic floor and ceiling effects: no respondent answered either "0" (i.e., support none) or "4" (i.e., support all). Individuals clearly strategically hid their support (or lack thereof) for the Taliban within the remaining empirical distribution, making it impossible for us to recover measures of support. As demonstrated in the previous section, the list experiment was also notably sensitive to the amount of aid rendered to a district.
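A diagnostic for the floor and ceiling pattern described above can be sketched as follows (an illustrative check with simulated answers; the real check would use the treatment-group responses):

```python
from collections import Counter

def extreme_response_check(treatment_answers, max_count):
    """Flag potential floor/ceiling effects in a list experiment: if no
    treatment-group respondent reports 0 or the maximum possible count,
    respondents may be strategically hiding their answers."""
    counts = Counter(treatment_answers)
    return {"floor_empty": counts[0] == 0,
            "ceiling_empty": counts[max_count] == 0,
            "distribution": dict(sorted(counts.items()))}

# Hypothetical 0-4 answers with both extremes empty, mirroring the
# pattern the authors report for their Taliban list experiment.
answers = [1, 2, 3, 2, 1, 3, 2, 2, 1, 3]
print(extreme_response_check(answers, max_count=4))
```

When both extremes are empty, the difference-in-means and model-based estimators can no longer recover the level of support, which is why the authors abandoned the Taliban list experiment.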

Third, we recommend that researchers use multiple questioning techniques to avoid relying on estimates from a single measurement strategy. Researchers should present the estimates based on each experimental technique to confirm the robustness of the conclusion to varying measurement strategies, or to identify the analyses that are more sensitive to differences in question format.


Fourth, and on a more practical note, we found it essential to engage in multiple pretests and focus groups in the areas to be sampled (rather than, say, a convenience sample in Kabul or its outskirts) to test for sensitivity to question order. The list experiment proved particularly time-intensive, both in terms of training the enumerators and in ensuring that average Afghan respondents who have very little formal education could understand the approach. We preceded our list experiment with a practice example to ensure that respondents understood the mechanics of the design. We also piloted multiple versions of the list and endorsement experiments to identify suitable control items and policy questions in order to avoid floor and ceiling effects and other complications.

This research also suggests several natural extensions. The proposed multiple measurement strategy could be employed in a variety of different (and difficult) empirical settings where violence, social desirability concerns, and the need to assess attitudes on sensitive items collide. On the methodological front, a similar modeling strategy can be used to model other indirect questioning techniques, such as randomized response, as well as variants of list and endorsement experiments by linking different models via the latent level of support as done in this article. We can also extend the combined statistical model introduced in this article to model individual and spatial characteristics that are responsible for driving different responses across these survey experiments. Armed with this knowledge, researchers would be able to customize an array of experimental approaches given the particular challenges they are likely to encounter in their study population.

Appendix

In this appendix, we describe the details of the Markov chain Monte Carlo (MCMC) algorithms for three models: the list experiment model, the endorsement experiment model, and the combined model. We also present the estimated coefficients from each of these three models.

The List Experiment Model

Step 1: Update the coefficients for the control item model, $\psi$, using the random walk Metropolis step where the proposal draw is obtained from the multivariate normal distribution with mean equal to its previous draw and a prespecified tuning variance parameter $S$. The acceptance ratio is given by

$$\min\left(1,\; \frac{\prod_{i=1}^{n}\{\mathrm{logit}^{-1}(V_i^\top \psi^{(t)})\}^{Y_i^{L}(0)^{(t-1)}}\{1-\mathrm{logit}^{-1}(V_i^\top \psi^{(t)})\}^{J^{L}-Y_i^{L}(0)^{(t-1)}}\, \mathcal{N}(\psi^{(t)}, A_\psi)}{\prod_{i=1}^{n}\{\mathrm{logit}^{-1}(V_i^\top \psi^{(t-1)})\}^{Y_i^{L}(0)^{(t-1)}}\{1-\mathrm{logit}^{-1}(V_i^\top \psi^{(t-1)})\}^{J^{L}-Y_i^{L}(0)^{(t-1)}}\, \mathcal{N}(\psi^{(t-1)}, A_\psi)} \cdot \frac{\mathcal{N}(\psi^{(t)}, S)}{\mathcal{N}(\psi^{(t-1)}, S)}\right),$$

where $\mathcal{N}(\cdot, \cdot)$ represents the (possibly multivariate) normal density function.
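Step 1 can be sketched in a few lines of Python. This is a simplified illustration with simulated data: a scalar prior precision stands in for the prior on the coefficients, and an isotropic proposal stands in for the tuning matrix S.

```python
import numpy as np

def logit_inv(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_post(psi, V, y0, JL, prior_prec):
    """Log posterior (up to a constant) of the control-item coefficients
    under a binomial-logistic likelihood and a mean-zero normal prior."""
    p = logit_inv(V @ psi)
    ll = np.sum(y0 * np.log(p) + (JL - y0) * np.log(1.0 - p))
    return ll - 0.5 * prior_prec * psi @ psi

def rw_metropolis_step(psi, V, y0, JL, prior_prec, step_sd, rng):
    """One random-walk Metropolis update: propose from a normal centered
    at the current draw and accept with the usual log-ratio rule."""
    prop = psi + step_sd * rng.standard_normal(psi.shape)
    log_ratio = (log_post(prop, V, y0, JL, prior_prec)
                 - log_post(psi, V, y0, JL, prior_prec))
    return prop if np.log(rng.uniform()) < log_ratio else psi

rng = np.random.default_rng(1)
n, k, JL = 200, 2, 3
V = np.column_stack([np.ones(n), rng.standard_normal(n)])
y0 = rng.binomial(JL, 0.4, size=n)  # simulated control-item counts
psi = np.zeros(k)
for _ in range(1000):
    psi = rw_metropolis_step(psi, V, y0, JL, prior_prec=0.01,
                             step_sd=0.1, rng=rng)
print(psi)
```

Because a symmetric random walk proposal is used, the proposal densities cancel in the acceptance ratio, leaving only the likelihood and prior terms.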

Step 2: Update the coefficients for the sensitive item model, $\delta$, using the data augmentation algorithm:

$$Z_i^{*(t)} \mid Z_i^{(t-1)}, \delta^{(t-1)} \;\sim\; \begin{cases} \mathbf{1}\{Z_i^{*(t)} \ge 0\}\, \mathcal{N}(V_i^\top \delta^{(t-1)}, 1) & \text{if } Z_i^{(t-1)} = 1 \\ \mathbf{1}\{Z_i^{*(t)} < 0\}\, \mathcal{N}(V_i^\top \delta^{(t-1)}, 1) & \text{if } Z_i^{(t-1)} = 0 \end{cases} \quad \text{for each } i,$$

$$\delta^{(t)} \mid \{Z_i^{*(t)}\}_{i=1}^{n} \;\sim\; \mathcal{N}\left(\left(\sum_{i=1}^{n} V_i V_i^\top + A_\delta\right)^{-1} \sum_{i=1}^{n} V_i Z_i^{*(t)},\; \left(\sum_{i=1}^{n} V_i V_i^\top + A_\delta\right)^{-1}\right).$$

Step 3: Sample the latent responses to the sensitive and control items, $(Z_i, Y_i^L(0))$, for units with $T_i = 1$ and $1 \le Y_i^L \le J^L - 1$. For units with $Y_i^L = J^L$ or $Y_i^L = 0$, we set $Z_i^{(t)} = 1$ and $Z_i^{(t)} = 0$, respectively. For each unit, we sample the latent response to the sensitive item according to the following distribution:

$$Z_i^{(t)} \mid \psi^{(t)}, \delta^{(t)} \;\sim\; \mathrm{Bernoulli}\left(\frac{\mathrm{Binom}(Y_i^L - 1;\, J^L, \pi_i^{(t)}) \cdot \Phi(\eta_i^{(t)})}{\mathrm{Binom}(Y_i^L - 1;\, J^L, \pi_i^{(t)}) \cdot \Phi(\eta_i^{(t)}) + \mathrm{Binom}(Y_i^L;\, J^L, \pi_i^{(t)}) \cdot \{1 - \Phi(\eta_i^{(t)})\}}\right),$$

where $\pi_i^{(t)} = \mathrm{logit}^{-1}(V_i^\top \psi^{(t)})$, $\eta_i^{(t)} = V_i^\top \delta^{(t)}$, and $\mathrm{Binom}(\cdot\,; \cdot, \cdot)$ represents the binomial density function. For the responses to the control items, we set $Y_i^L(0)^{(t)} = Y_i^L - Z_i^{(t)}$.
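The Bernoulli draw in Step 3 can be sketched directly from the formula above (a standalone illustration: pi_i and eta_i are given hypothetical values rather than computed from the coefficient draws):

```python
import math
import random

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def sample_latent_z(y_list, j_control, pi_i, eta_i, rng):
    """Draw Z_i for a treated unit with 1 <= Y_i^L <= J^L - 1: the
    sensitive-item probability is Phi(eta_i) and the control count is
    binomial(J^L, pi_i), combined via the Bernoulli posterior above."""
    num = binom_pmf(y_list - 1, j_control, pi_i) * phi(eta_i)
    den = num + binom_pmf(y_list, j_control, pi_i) * (1.0 - phi(eta_i))
    return 1 if rng.random() < num / den else 0

rng = random.Random(3)
draws = [sample_latent_z(2, 3, 0.4, 0.5, rng) for _ in range(10000)]
print(sum(draws) / len(draws))  # close to the analytic probability, ~0.77
```

The control-item count is then recovered deterministically as Y_i^L(0) = Y_i^L - Z_i.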

The Endorsement Experiment Model

Step 1: Sample the cutpoint parameters, $\{\tau_{lj}^{(t)}\}_{l=0}^{M_j}$, given $\{x_i^{(t-1)}\}_{i=1}^{n}$, $\{s_{i1}^{(t-1)}, \ldots, s_{iJ^E}^{(t-1)}\}_{i=1}^{n}$, and $\{\alpha_j^{(t-1)}, \beta_j^{(t-1)}\}_{j=1}^{J^E}$, using the Metropolis-Hastings algorithm of Cowles (1996). Throughout the iterations, we fix $\tau_{0j}^{(t)} = -\infty$, $\tau_{1j}^{(t)} = 0$, and $\tau_{M_j, j}^{(t)} = \infty$ for each $j$.

Step 2: Sample $(\alpha_j^{(t)}, \beta_j^{(t)})$ given $\{x_i^{(t-1)}\}_{i=1}^{n}$, $\{s_{i1}^{(t-1)}, \ldots, s_{iJ^E}^{(t-1)}\}_{i=1}^{n}$, and $\{\tau_{lj}^{(t)}\}_{l=0}^{M_j}$ for each $j = 1, \ldots, J^E$. This can be accomplished in the following two steps:

$$Y_{ij}^{E*(t)} \mid x_i^{(t-1)}, s_{ij}^{(t-1)}, \alpha_j^{(t-1)}, \beta_j^{(t-1)} \;\sim\; \mathbf{1}\left\{\tau_{Y_{ij}^E, j}^{(t)} \le Y_{ij}^{E*(t)} \le \tau_{Y_{ij}^E + 1, j}^{(t)}\right\}\, \mathcal{N}\left(-\alpha_j^{(t-1)} + \beta_j^{(t-1)}\left(x_i^{(t-1)} + s_{ij}^{(t-1)}\right),\, 1\right),$$

$$(\alpha_j^{(t)}, \beta_j^{(t)}) \mid \{Y_{ij}^{E*(t)}, x_i^{(t-1)}, s_{ij}^{(t-1)}\}_{i=1}^{n} \;\sim\; \mathbf{1}\{\beta_j^{(t)} \ge 0\}\, \mathcal{N}\left(\Sigma^{(t-1)} \sum_{i=1}^{n} W_i^{(t-1)} Y_{ij}^{E*(t)},\; \Sigma^{(t-1)}\right),$$

where $\Sigma^{(t-1)} = (\sum_{i=1}^{n} W_i^{(t-1)} W_i^{(t-1)\top} + B)^{-1}$ and $W_i = (-1,\; x_i^{(t-1)} + s_{ij}^{(t-1)})^\top$. The second step is implemented by first drawing $\alpha_j^{(t)}$ from its marginal distribution, which is a univariate normal distribution, and then sampling $\beta_j^{(t)}$ from its conditional distribution given $\alpha_j^{(t)}$, which is a univariate truncated normal distribution.

Step 3: Sample $s_{ij}^{(t)}$ given $x_i^{(t-1)}$, $Y_{ij}^{E*(t)}$, $\alpha_j^{(t)}$, $\beta_j^{(t)}$, $\lambda^{(t-1)}$, and $\omega^{2(t-1)}$ for each $(i, j)$. This step can be accomplished by the standard Gibbs sampling algorithm of the Bayesian normal regression for a single observation, where the outcome variable is $Y_{ij}^{E*(t)} + \alpha_j^{(t)} - \beta_j^{(t)} x_i^{(t-1)}$, the predictor is $\beta_j^{(t)}$, the error variance is fixed at 1, and the prior distribution for $s_{ij}^{(t)}$ is $\mathcal{N}(V_i^\top \lambda^{(t-1)}, \{\omega^{(t-1)}\}^2)$.

Step 4: Sample $x_i^{(t)}$ given $\{s_{ij}^{(t)}, Y_{ij}^{E*(t)}, \alpha_j^{(t)}, \beta_j^{(t)}\}_{j=1}^{J^E}$ and $\delta^{(t-1)}$ for each $i$. This step can be accomplished by the standard Gibbs sampling algorithm of the Bayesian normal regression, where the outcome variable is $Y_{ij}^{E*(t)} + \alpha_j^{(t)} - \beta_j^{(t)} s_{ij}^{(t)}$, the predictor is $\beta_j^{(t)}$, the error variance is fixed at 1, and the prior distribution for $x_i$ is $\mathcal{N}(V_i^\top \delta^{(t-1)}, 1)$.

Step 5: Sample $\lambda^{(t)}$ and $\omega^{(t)}$ given all $s_{ij}^{(t)}$. This step can be accomplished by the standard Gibbs sampling algorithm of the Bayesian normal regression, where the outcome variable is $s_{ij}^{(t)}$, the predictor is $V_i$, the vector of coefficients is $\lambda^{(t)}$, the error variance is $\{\omega^{(t)}\}^2$, and the prior distribution for $\lambda^{(t)}$ is $\mathcal{N}(0, D)$.

Step 6: Sample $\delta^{(t)}$ given $x_i^{(t)}$. This step can be accomplished by the standard Gibbs sampling algorithm of the Bayesian normal regression, where the outcome is $x_i^{(t)}$, the predictor is $V_i$, and the error variance is fixed at 1.

The Combined Model

To combine the two models, we assume $\delta = \lambda / \omega$. All the steps of the MCMC algorithm are identical to those described above, except that Step 2 of the list experiment model and Step 4 of the endorsement experiment model will be combined into the standard updating of a stacked regression model where the dependent variable consists of $s_{ij}^{(t)}$ as well as $Z_i^{*(t)}$ (which is now sampled from the truncated normal distribution with mean $V_i^\top \lambda^{(t-1)}$ and standard deviation $\omega^{(t-1)}$), the independent variable is $V_i$, and the variance parameter is $\{\omega^{(t-1)}\}^2$. We can use the standard updating procedure with the semi-conjugate prior distribution.
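The stacked update underlying the combined model can be illustrated with simulated latent draws. This sketch assumes both the endorsement support parameters and the rescaled list latent variable follow N(V'lambda, omega^2), so a single regression recovers (lambda, omega), and delta = lambda/omega links back to the list model; frequentist least squares stands in here for the conjugate Bayesian update.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 2
V = np.column_stack([np.ones(n), rng.standard_normal(n)])
lam_true, omega_true = np.array([0.5, -1.0]), 0.8

# Simulated latent draws: endorsement support s_i and list latent Z*_i,
# both distributed N(V'lambda, omega^2) under the combined model.
s = V @ lam_true + omega_true * rng.standard_normal(n)
z_star = V @ lam_true + omega_true * rng.standard_normal(n)

# Stack the two sets of latent outcomes and run one regression.
X, y = np.vstack([V, V]), np.concatenate([s, z_star])
lam_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
omega_hat = (y - X @ lam_hat).std(ddof=k)
delta_hat = lam_hat / omega_hat  # implied list-model coefficients
print(lam_hat, omega_hat, delta_hat)
```

Stacking roughly doubles the effective sample size for (lambda, omega), which is why the combined model yields narrower confidence intervals than either experiment alone.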

FIGURE A1 Sensitivity Analysis: Estimated Mean Support Levels, Dropping One of the Four Policy Questions

[Figure: overall proportion of ISAF supporters estimated with all four policies and with each of direct elections, prison reform, election commission, and corruption reform dropped in turn.]

Note: This figure presents the estimates of the same quantities of interest as those in Figure 5 based on the models where one of the four policies is excluded from the analysis.


TABLE A1 Estimated Coefficients for the Three Models

                                                  List          Endorsement
                                                  Experiment    Experiment    Combined
                                                  Est.   S.E.   Est.   S.E.   Est.   S.E.
Individual level
  Harm from Taliban violence                       0.41   1.29   0.45   0.32   0.54   0.23
  Harm from Taliban violence is NA                 2.23   2.52  −0.67   0.80  −0.54   0.63
  Harm from ISAF violence                         −2.69   1.48  −0.36   0.25  −0.34   0.17
  Harm from ISAF violence is NA                    0.99   2.75   2.54   1.19   1.66   0.81
  Approach by Taliban after Harm                   1.83   1.45  −1.28   0.42  −1.00   0.30
  Approach by Taliban after Harm is NA             0.24   3.19   1.59   1.96   2.06   1.44
  Approach by ISAF after Harm                      3.77   1.70   0.42   0.47   0.45   0.34
  Approach by ISAF after Harm is NA                0.21   2.94   2.51   2.30   2.38   2.00
  ISAF encounter frequency                         1.08   0.43   0.10   0.13   0.17   0.09
  Years of education                               0.12   0.08   0.02   0.03   0.01   0.02
  Age (tens)                                       0.46   0.29  −0.09   0.09   0.02   0.06
  Income (Afghanis)                                0.09   0.46   0.33   0.16   0.17   0.10
  Income is NA                                    −0.47   1.38   0.26   0.60   0.01   0.41
  Schooled in madrassa                            −1.62   0.67  −0.44   0.22  −0.22   0.15
  Pro-Taliban tribe                               −1.99   1.00   0.05   0.29   0.07   0.21
  Pro-Taliban tribe is NA                          0.69   1.41  −0.13   0.48  −0.22   0.32
Village level
  Altitude (km)                                   −0.87   0.48  −0.16   0.14  −0.07   0.10
  Population                                      −0.21   0.68   0.09   0.11   0.07   0.08
  ISAF-initiated violent events (within 5 km)      1.14   0.82  −0.03   0.14   0.01   0.10
  Taliban-initiated violent events (within 5 km)  −2.72   1.14  −0.07   0.14  −0.10   0.12
District level
  Sha'ria courts                                  −2.01   2.24  −0.56   0.40  −0.04   0.28
  CERP project spending                            3.69   1.39  −0.01   0.23  −0.08   0.16
  Opium cultivation (ha.)                         −6.24   1.37   0.08   0.19  −0.09   0.14
  CDC project count                               −0.81   0.52  −0.16   0.11  −0.00   0.08
  Road length (km)                                 1.93   0.72   0.17   0.15   0.19   0.11
  Pakistan border                                  0.69   1.35   0.26   0.37   0.07   0.25
  Government territorial control                  −4.52   2.01   0.43   0.41   0.23   0.30
  Contested territorial control                   −2.67   1.31  −0.33   0.28   0.00   0.19
  Intercept                                       −5.01   1.93  −0.82   0.60  −1.58   0.50


TABLE A2 Comparison of the Estimated Coefficients for the Endorsement Experiment Models That Include All Four Questions and the Models That Exclude One of the Four Policy Questions

                                                  All Four       Drop Direct    Drop Prison    Drop Election  Drop Corruption
                                                  Policies       Elections      Reform         Commission     Reform
                                                  Est.   S.E.    Est.   S.E.    Est.   S.E.    Est.   S.E.    Est.   S.E.
Individual level
  Harm from Taliban violence                       0.45   0.32  −0.51   1.18    0.61   0.35    0.40   0.26    0.49   0.36
  Harm from Taliban violence is NA                −0.67   0.80  −1.04   1.58   −0.52   0.89   −0.36   0.65   −1.22   0.91
  Harm from ISAF violence                         −0.36   0.25  −0.88   1.06   −0.46   0.27   −0.43   0.20   −0.36   0.28
  Harm from ISAF violence is NA                    2.54   1.19   0.69   1.82    2.66   1.43    1.85   0.97    3.35   1.43
  Approach by Taliban after Harm                  −1.28   0.42  −3.41   2.30   −1.68   0.55   −0.84   0.33   −1.08   0.47
  Approach by Taliban after Harm is NA             1.59   1.96   0.13   1.78    1.32   2.18    1.11   1.52    1.83   2.17
  Approach by ISAF after Harm                      0.42   0.47   1.69   1.78    0.38   0.53    0.50   0.39    0.08   0.53
  Approach by ISAF after Harm is NA                2.51   2.30   0.11   1.79    2.55   2.54    1.74   1.85    3.46   2.62
  ISAF encounter frequency                         0.10   0.13   0.33   0.71    0.09   0.14    0.04   0.11    0.11   0.14
  Years of education                               0.02   0.03   0.10   0.11    0.02   0.03    0.02   0.02    0.02   0.03
  Age (tens)                                      −0.09   0.09  −0.77   0.60   −0.05   0.10   −0.08   0.07   −0.12   0.10
  Income (Afghanis)                                0.33   0.16   0.89   0.77    0.25   0.17    0.29   0.13    0.37   0.18
  Income is NA                                     0.26   0.60  −0.51   1.43    0.16   0.67    0.38   0.49    0.26   0.67
  Schooled in madrassa                            −0.44   0.22  −2.05   1.47   −0.33   0.24   −0.41   0.18   −0.52   0.26
  Pro-Taliban tribe                                0.05   0.29   0.04   1.02   −0.01   0.33    0.10   0.24    0.14   0.33
  Pro-Taliban tribe is NA                         −0.13   0.48  −0.08   1.36   −0.23   0.53   −0.05   0.40   −0.20   0.54
Village level
  Altitude (km)                                   −0.16   0.14  −0.70   0.79   −0.20   0.17   −0.18   0.12   −0.15   0.16
  Population                                       0.09   0.11   0.56   0.64    0.07   0.12    0.09   0.09    0.08   0.12
  ISAF-initiated violent events (within 5 km)     −0.03   0.14   0.47   0.67    0.02   0.15   −0.08   0.12   −0.07   0.15
  Taliban-initiated violent events (within 5 km)  −0.07   0.14  −0.39   0.67   −0.07   0.15   −0.02   0.11   −0.12   0.15
District level
  Sha'ria courts                                  −0.56   0.40  −0.98   1.46   −0.72   0.45   −0.49   0.33   −0.76   0.48
  CERP project spending                           −0.01   0.23   0.14   0.87   −0.01   0.25   −0.03   0.19   −0.06   0.26
  Opium cultivation (ha.)                          0.08   0.19  −0.80   0.85    0.05   0.20    0.09   0.15    0.23   0.22
  CDC project count                               −0.16   0.11  −0.58   0.55   −0.19   0.13   −0.12   0.09   −0.23   0.13
  Road length (km)                                 0.17   0.15   0.72   0.81    0.19   0.17    0.10   0.12    0.22   0.18
  Pakistan border                                  0.26   0.37   0.64   1.35    0.28   0.41    0.18   0.31    0.34   0.42
  Government territorial control                   0.43   0.41   1.26   1.37    0.38   0.46    0.37   0.34    0.51   0.47
  Contested territorial control                   −0.33   0.28  −0.51   1.02   −0.21   0.29   −0.33   0.22   −0.37   0.32
  Intercept                                       −0.82   0.60  −2.85   2.16   −0.84   0.66   −0.54   0.48   −0.49   0.67


References

Beath, Andrew, Fotini Christia, and Ruben Enikolopov. 2011. "Winning Hearts and Minds? Evidence from a Field Experiment in Afghanistan." MIT Political Science Working Paper No. 2011-14.

Berman, Eli, Jacob Shapiro, and Joseph Felter. 2011. "Can Hearts and Minds Be Bought? The Economics of Counterinsurgency in Iraq." Journal of Political Economy 119: 766–819.

Blair, Graeme, Christine Fair, Neil Malhotra, and Jacob Shapiro. 2013. "Poverty and Support for Militant Politics: Evidence from Pakistan." American Journal of Political Science 57(1): 30–48.

Blair, Graeme, and Kosuke Imai. 2011. "list: Statistical Methods for the Item Count Technique and List Experiment." Available at the Comprehensive R Archive Network (CRAN), http://CRAN.R-project.org/package=list.

Blair, Graeme, and Kosuke Imai. 2012. "Statistical Analysis of List Experiments." Political Analysis 20(Winter): 47–77.

Bullock, Will, Kosuke Imai, and Jacob N. Shapiro. 2011. "Statistical Analysis of Endorsement Experiments: Measuring Support for Militant Groups in Pakistan." Political Analysis 19(Autumn): 363–84.

Corstange, Daniel. 2009. "Sensitive Questions, Truthful Answers? Modeling the List Experiment with LISTIT." Political Analysis 17(1): 45–63.

Cowles, Mary Kathryn. 1996. "Accelerating Monte Carlo Markov Chain Convergence for Cumulative-Link Generalized Linear Models." Statistics and Computing 6(2): 101–11.

Fisher, Ronald A. 1915. "Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population." Biometrika 10(May): 507–21.

Flavin, Patrick, and Michael Keane. 2010. "How Angry Am I? Let Me Count the Ways: Question Format Bias in List Experiments." Technical report, Department of Political Science, University of Notre Dame.

Gaines, Brian J., James H. Kuklinski, and Paul J. Quirk. 2007. "The Logic of the Survey Experiment Reexamined." Political Analysis 15(Winter): 1–20.

Gelman, Andrew, and Donald B. Rubin. 1992. "Inference from Iterative Simulations Using Multiple Sequences (with Discussion)." Statistical Science 7(4): 457–72.

Gingerich, Daniel W. 2010. "Understanding Off-the-Books Politics: Conducting Inference on the Determinants of Sensitive Behavior with Randomized Response Surveys." Political Analysis 18(Summer): 349–80.

Glynn, Adam N. 2013. "What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment." Public Opinion Quarterly 77(S1): 159–72.

Gonzalez-Ocantos, Ezequiel, Chad Kiewiet de Jonge, Carlos Melendez, Javier Osorio, and David W. Nickerson. 2012. "Vote Buying and Social Desirability Bias: Experimental Evidence from Nicaragua." American Journal of Political Science 56(1): 202–17.

Hawkins, D. L. 1989. "Using U Statistics to Derive the Asymptotic Distribution of Fisher's Z Statistic." American Statistician 43(November): 235–37.

Holbrook, Allyson L., and Jon A. Krosnick. 2010. "Social Desirability Bias in Voter Turnout Reports: Tests Using the Item Count Technique." Public Opinion Quarterly 74(Spring): 37–67.

Imai, Kosuke. 2011. "Multivariate Regression Analysis for the Item Count Technique." Journal of the American Statistical Association 106(June): 407–16.

Kalyvas, Stathis. 2006. The Logic of Violence in Civil War. Cambridge: Cambridge University Press.

Kuklinski, James H., Michael D. Cobb, and Martin Gilens. 1997. "Racial Attitudes and the 'New South.'" Journal of Politics 59(2): 323–49.

Leites, Nathan, and Charles Wolf. 1970. Rebellion and Authority: An Analytic Essay on Insurgent Conflicts. Chicago: Markham.

Lyall, Jason, Graeme Blair, and Kosuke Imai. 2013. "Explaining Support for Combatants during Wartime: A Survey Experiment in Afghanistan." American Political Science Review 107(4): 679–705.

Lyall, Jason, Kosuke Imai, and Yuri Shiraito. 2013. "Coethnic Bias and Wartime Informing." Technical report, Yale University.

Miller, Judith Droitcour. 1984. "A New Survey Technique for Studying Deviant Behavior." PhD dissertation, The George Washington University.

Raghavarao, D., and W. T. Federer. 1979. "Block Total Response as an Alternative to the Randomized Response Method in Surveys." Journal of the Royal Statistical Society, Series B: Methodological 41(1): 40–45.

Transue, John E., Daniel J. Lee, and John H. Aldrich. 2009. "Treatment Spillover Effects across Survey Experiments." Political Analysis 17(Spring): 143–61.

Tsuchiya, Takahiro, Yoko Hirai, and Shigeru Ono. 2007. "A Study of the Properties of the Item Count Technique." Public Opinion Quarterly 71(Summer): 253–72.

U.S. Army. 2007. U.S. Army Field Manual No. 3-24. Chicago: University of Chicago Press.

Warner, Stanley L. 1965. "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias." Journal of the American Statistical Association 60(March): 63–69.