administrative records mask racially biased policing...franco, and melo 2015; mummolo 2018a; peyton...

American Political Science Review, of 19

doi:10.1017/S0003055420000039 © American Political Science Association 2020

Administrative Records Mask Racially Biased PolicingDEAN KNOX Princeton University

WILL LOWE Hertie School of Governance

JONATHAN MUMMOLO Princeton University

Researchers often lack the necessary data to credibly estimate racial discrimination in policing. Inparticular, police administrative records lack information on civilians police observe but do notinvestigate. In this article, we show that if police racially discriminate when choosing whom to

investigate, analyses using administrative records to estimate racial discrimination in police behavior arestatistically biased, and many quantities of interest are unidentified—even among investigated individu-als—absent strong and untestable assumptions. Using principal stratification in a causal mediationframework, we derive the exact form of the statistical bias that results from traditional estimation. Wedevelop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate publishedfindings, and show the traditional estimator can severely underestimate levels of racially biased policing ormask discrimination entirely. We conclude by outlining a general and feasible design for future studies thatis robust to this inferential snare.

Concern over racial bias in policing, and the publicavailability of large administrative data setsdocumenting police–civilian interactions, have

prompted a raft of studies attempting to quantify theeffect of civilian race on law enforcement behavior.These studies consider a range of outcomes includingticketing, stop duration, searches, and the use of force(e.g., Antonovics and Knight 2009; Fryer 2019;Ridgeway 2006; Nix et al. 2017). Most research in thisarea attempts to adjust for omitted variables that maycorrelate with suspect race and the outcome of interest.In contrast, this study addresses a more fundamentalproblem that remains even if the vexing issue of omittedvariable bias is solved: the inevitable statistical bias thatresults fromstudying racial discrimination using recordsthat are themselves the product of racial discrimination(Angrist and Pischke 2008; Elwert and Winship 2014;Rosenbaum 1984). We show that when there is anyracialdiscrimination in thedecision todetain civilians—adecision that determines which encounters appear inpolice administrative data at all—then estimates of theeffect of civilian race on subsequent police behavior are

biased absent additional data and/or strong and untest-able assumptions.

This study makes several contributions. We clarifythe causal estimands of interest in the study of raciallydiscriminatory policing—quantities that many studiesappear tobe targeting, but are rarelymadeexplicit—andshow that the conventional approach fails to recover anyknown causal quantity in reasonable settings. Next, wehighlight implicit and highly implausible assumptionsin prior work and derive the statistical bias when theyare violated. We proceed to develop informativenonparametric sharp bounds for the range of possibleraceeffects, apply these ina reanalysis andextensionofa prominent article on police use of force (Fryer 2019),and present bias-corrected results that suggest this andsimilar studies drastically underestimate the level ofracial bias in police–civilian interactions. Finally, weoutline strategies for future data collection and re-search design that can mitigate these threats to in-ference.Thesearediscussed in the context of adetailedand feasible proposed study of racial bias in trafficstops.

As we show in this article, the difficulty of estimatingracial bias using police records stems from a thornycombination of mediation (Hernan, Hernandez-Diaz,and Robins 2004; Imai et al. 2011; Pearl 2001; Robins,Hernan, and Brumback 2000; VanderWeele 2009) andselection (Heckman 1979; Lee 2009): the effect of ci-vilian race on the outcome of a police encounter is me-diated by whether the civilian is stopped by police, butanalystsonlyhavedataforonelevelof themediator—thatis, data on stopped individuals. Because of this, policerecords do not contain a representative sample of allindividuals that police observe, but rather only thosecivilian encounters which escalated to the point oftriggering a reporting requirement. If a civilian’s raceaffects whether officers choose to stop that civilian(Gelman, Fagan, and Kiss 2007; Glaser 2014), thenanalyzing administrative police records amounts toconditioning on a variable that is itself affected bysuspect race, namely, whether a suspect appears in thedata at all. This could occur if officers have a higher

Dean Knox , Assistant Professor of Politics, Princeton University,[email protected].

Will Lowe , Senior Research Scientist, Hertie School of Gov-ernance, [email protected].

Jonathan Mummolo , Assistant Professor of Politics and PublicAffairs, Princeton University, [email protected].

We thank Matt Blackwell, Chuck Cameron, Tom Clark, ScottCunningham, Lauren Davenport, Naoki Egami, Jeffrey Fagan, AviFeller, AdamGlynn, Phillip Atiba Goff, Justin Grimmer, Andy Hall,Anna Harvey, Dan Hopkins, Matias Iaryczower, Kosuke Imai, Da-mon Jones, Dorothy Kronick, Shiro Kuriwaki, Neil Malhotra, MoritzMarbach, Nolan McCarty, Cyrus Samii, Maya Sen, Tara Slough,Rocio Titiunik, Tyler VanderWeele, Vesla Weaver, and SeanWestwood for helpful feedback. We thank Michael Pomirchy forresearch assistance. Replication files are available at the AmericanPolitical Science Review Dataverse: https://doi.org/10.7910/DVN/KFQOCV.

Received:April 26, 2019; revised:October 10, 2019; accepted: January8, 2020.

1

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039

https://doi.org/10.1017/S0003055420000039

https://orcid.org/0000-0002-1945-7938

mailto:[email protected]

https://orcid.org/0000-0002-1549-616


https://orcid.org/0000-0002-5639-3718


https://doi.org/10.7910/DVN/KFQOCV


https://www.cambridge.org/core

https://www.cambridge.org/core/terms

https://doi.org/10.1017/S0003055420000039

threshold for stopping white civilians during the un-seen first stage of police–civilian contact, meaning thatwhite civilians observed in the data are incomparablebecause they tend to pose a greater threat to policethan observed minorities. These unobserved differ-ences can lead analysts to understate anti-minorityracial bias—or even produce the appearance of anti-white bias—in the use of force. Despite claims to thecontrary (Fryer 2018, 2), this statistical bias oftencannot be eliminatedwith additional control variables,even if the goal is to estimate causal effects among thesubset of police–civilian encounters that appear inpolice data. Moreover, the problem remains whetherracial bias in detainment stems from so-called “taste-based” or “statistical” discrimination (Arrow1972, seebelow for extended discussion on this point).

At the first glance, the problem of race-based se-lection into policing data may appear a classic case ofsample selection bias (Elwert and Winship 2014;Heckman 1979) for which numerous remedies alreadyexist. But policing data exhibit a constellation of fea-tures that render previous methodological approachesunsuitableorunusable in this setting, leadingprominentscholars in this area to declare that “it is unclear how toestimate the extent of such bias or how to address itstatistically,” (Fryer 2018, 5).1 For example, Heckman(1979) and more recent extensions like Lee (2009)provide methods for estimating or bounding averagetreatment effects in the populationwhile accounting forsample selection. But with only data on stopped indi-viduals, policing scholars rarely seek to estimate pop-ulation treatment effects, instead targeting effectsamong individualswho actually interactwith police.Weshow that even without attempting to generalize to thebroader population, the issues we raise result in biasedestimates of the effect of race on police behavior evenamong encounters in which civilians are detained.

A related large literature provides remedies for so-called “post-treatment bias”—statistical bias thatresults from conditioning on a variable that is affectedby the causal variable of interest (Rosenbaum 1984).But implementation of these techniques requires eitherknowledge of the scale of the missing data (e.g., Nyhan,Skovron, and Titiunik 2017) or complete data on theposttreatment variable (e.g., Acharya, Blackwell, andSen 2016).2 In the case of policing, administrative datasets only include observations with one level of theposttreatment variable (i.e., data on stopped individu-als) and give no purchase on the number of individualspolice observe but do not stop, meaning these techni-ques cannot be applied. This scenario also differs from

situations of “truncation by death” (Frangakis andRubin 2002) in which receipt of a treatment causessample attrition and renders outcomes for someportionof units undefined. In the policing setting, individualsnot detained by police are absent from the data, butmany outcomes of interest are often still defined (e.g.,the level of force applied to nonstopped individuals iszero, a realized outcome). This feature allows us toidentify additional causal quantities that cannot be re-covered in the “truncation by death” setting. In short,existing methods offer either unusable or suboptimalsolutions to this pernicious threat to inference, absentstrong assumptions about the unseen process mappingcivilian race to officers’ decisions to detain individuals.

Our analysis indicates that existing empirical work inthis area is producing a misleading portrait of evidenceas to the severity of racial bias in police behavior.Replicating and extending the study of police behaviorin New York in Fryer (2019), we show that the con-sequences of ignoring the selective process that gen-erates police data are severe, leading analysts todramatically underestimate or conceal entirely thedifferential police violence faced by civilians of color.For example, while a naıve analysis that assumes norace-based selection into the data suggests only 10,000blackandHispanic civilianswerehandcuffedbecauseofracial bias inNewYorkCity between 2003 and 2013, weestimate that the true number is approximately 56,000.And while analyses ignoring bias in stopping wouldconclude that 10% of uses of force against black andHispanic civilians in these data were discriminatory,after bias-correction, we estimate that the true per-centage is 39%.

While the techniques used to obtain our correctedresults eliminate several facially implausible (and insome cases, empirically falsified) assumptions that areimplicit in prior work, we caution that they neverthelessrely on weaker assumptions that in some cases aredifficult to verify, as we discuss below. We seek to ad-vance the study of racial bias in policing by explicitlystating theseassumptions, discussing their plausibility inthis context, and carefully grounding unobservableparameters—in particular, the proportion of raciallydiscriminatory minority stops, which relates closely tothe severity of the statistical bias—in prior research(Gelman, Fagan, and Kiss 2007; Goel, Rao, and Shroff2016). We show that obtaining more precise bias-cor-rected estimates of racial discrimination in policingrequires future research to be designedwith this issue inmind. To that end, we outline a research design thatalleviates these concerns. Our study also providesa general framework for analyzing the study of racialbias that can illuminate the causal interpretation ofother longstanding tests for discrimination. For exam-ple, we show that under reasonable assumptions, so-called “outcome tests,” which compare the rates offinding evidence of criminal activity across detainedsuspects of different racial groups (Knowles, Perisco,and Todd 2001), imply a lower bound on the share ofracial minorities who are discriminatorily detained.Outcome tests also appear elsewhere in criminal justicestudies, for example, in capital sentencing (Alesina and

1 This comment was made in reference to an analysis of arrest data inFryer (2019). Further, Fryer (2019) includes an analysis aimed atcharacterizing selection into police data sets, and finds mixed resultsdepending on the outcome examined. The study states: “Taken to-gether, this evidence demonstrates how difficult it is to understandwhether there is potential selection into police data sets…Solving thisis outside the scope of this paper,” (19).2 In addition, the remedy proposed in Blackwell (2013), whichrequires re-weighting across all strata of the post-treatment variable,cannot be implemented in the situation we describe. However, thealternative designs we propose below are amenable to this approach.

Dean Knox, Will Lowe, and Jonathan Mummolo

2

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

Ferrara 2014) and bail decisions (Arnold, Dobbie, andYang 2018). And as Ayres (2002) and Simoiu, Corbett-Davies, and Goel (2017) note, such tests have alsobeen applied in a range of other social contexts, in-cluding financial lending and editorial decisions. Bynesting the study of discrimination in a rigorous andgeneral causal framework, our study can help synthe-size results from a broad interdisciplinary literature onracial bias.

Ourworkalsoextendsagrowing literature inpoliticalscience examining the political implications of law en-forcement which, in recent decades, has largely studiedpolicing indirectly, for example, as a means ofexplaining political participation (Burch 2013; Cohenet al. 2017; Lerman andWeaver 2014;White 2019) or asan instance of bureaucracy (Brehm and Gates 1999;Lipsky 1980; Ostrom andWhitaker 1973;Wilson 1989).This work is path breaking, but with some recentexceptions (Harvey and Mungan 2019; Magaloni,Franco, and Melo 2015; Mummolo 2018a; Peyton et al.2019; Soss and Weaver 2017), has tended to concep-tualize policing as a cause of politics, rather than a po-litical act in and of itself. The field’s relative inattentionto policing was made evident by several recent officer-involved shootings of unarmed black men (Edwards,Lee, and Esposito 2019) and subsequent social unrestthat caught many political scientists flatfooted, withlittle systematic evidence to offer as the demand forexplanations of police behavior surged. As Soss andWeaver (2017) note, the field’s limited store of relevantknowledge in the aftermath of these events wasespecially glaring given law enforcement’s role as aneveryday conduit of state power. According to oneoften-cited definition, politics is “who gets what, when,how” (Lasswell 1936). As a matter of routine, the dy-namics of police-civilian interactions determine whogets protected, punished, or left to fend for themselves(Wilson 1968).Viewed in thisway, the role of race in thestate’s exercise of violence, as well as in the provision ofsafety more broadly, is inherently political (Alexander2010; Gottschalk 2008; Key 1949). In addition to of-feringa rigorous analytic framework tohelp researcherscontendwith longstandingmethodological hurdles, ourstudy also underscores an often overlooked truth: po-licing is high-stakes politics.

CONCEPTUALIZING RACE AS ACAUSAL VARIABLE

We regard the investigation of racial bias in policing asan inherently causal inquiry, albeit a notoriously diffi-cult one. That is, researchers seek to assess whetherpolice behavior during police–civilian encounterswould have differed if the civilian had belonged toanother racial group, holding constant civilian behaviorand circumstances. As noted in Fryer (2018), this “‘raceeffect’…is the proverbial ‘holy grail’—the parameterthat we are all attempting to estimate but never quitedo” (2). This task is distinct from the descriptive en-terprise of merely documenting differential police be-havior during encounters with various groups, as such

disparities can arise via numerous processes that do notimply racial discrimination.3

The notion of a “causal effect of race” on an indi-vidual’soutcome is the subject ofmuchcontention in theliterature on causal inference (Hernan 2016; Pearl2018). Most notably, some have argued that this effectis undefined because race is an immutable, and hencenonmanipulable, characteristic (Holland 1986). Othersargue that an individual’s race is a complex, multifac-eted treatment—a “bundle of sticks,” in the words ofSen andWasow (2016)—that affects outcomes throughmyriad channels, and therefore, researchers must beprecise about the specific facets of race under consid-eration (Greiner and Rubin 2011).

Our analysis avoids this debate by focusing onpolice–civilian encounters—that is, sightings of civiliansby police—as the unit of analysis, rather than individ-uals. The manipulation of race is conceptualized as thecounterfactual substitution of an individual with a dif-ferent racial identity into the encounter, while holdingthe encounter’s objective context—location, timeof day, criminal activity, etc.—fixed. In other words, the“treatment” in this case is the entire “bundle of sticks”encapsulating the race of the civilian—including, forexample, skin tone, dialect, and clothing. We note thatthe credibility of causal inferences and the exact in-terpretation of racial discrimination in this frameworkwill depend crucially on how the analyst defines “race.”We leave the specific operationalization in a givencontext to the analyst, and, in linewith advice in SenandWasow (2016), encourage scholars to carefully conveytheir conceptualization of race when studying this andrelated questions.4

By conceptualizing the treatment in this way, weavoid consideration of the perhaps implausible coun-terfactual of holding all features of an individual con-stant but for their race. While various aspects of racialidentity and its close correlates may not be separable inthe observed world, there exists a subset of comparablesituations in which minority and majority civilians areobserved by police. If this subset can be identified, orapproximated through covariate adjustment, we canestimate the counterfactual police behavior that wouldhaveoccurredhad the civilian inquestionbeen replacedwith a member of another racial group.

While our approach considers a valid counterfactualand isolates racial discrimination that occurs duringpolice–civilian encounters, it necessarily mutes the in-fluence of pre-encounter macroinstitutional factors,suchasdecisions todeploymoreofficers to communitiesof color. In keepingwith the goals of prior studies in this

3 For example, we may observe that members of one racial group arestopped more often by police than members of another racial group.While this result shows disparate police behavior, it does not con-clusivelydemonstrate that thedifference isdue tocivilianrace. It couldsimply be the case that members of the first group participate incriminal activity more often in public.4 Note that while the unit of analysis is the police-civilian encounter,for the sake of brevity, we occasionally refer to “minority civilians” asshorthand for “police-civilian encounters with minority civilians” insubsequent discussion. Readers are cautioned to keep this distinctionin mind.

Administrative Records Mask Racially Biased Policing

3

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

area, our approach holds such contextual featuresconstant, allowing us to ask whether an encounterwould have unfolded differently had it involved a ci-vilian of differing race. But even if no such differenceexists within encounters, law enforcement strategiesadopted before encounters occur could still produceracially biased policing.We caution readers to keep thisscope condition in mind.

PRIOR RESEARCH ON RACIAL BIASIN POLICING

Race-based selection into policing data has been pre-viously noted, and some scholars have devised researchdesigns in an attempt to sidestep this issue.Grogger andRidgeway (2006), for example, leverage the so-called“veil of darkness” strategy, comparingpatterns in trafficstops that occur before and after sunset under thelogic that the race of the driver is plausibly hidden topolice officers after dark. In this way, the study aims toidentify a sample of police–civilian interactions thatwere initiated in a race-blind manner. Similarly,West(2018) examines data on police responses to trafficincidents, arguing that whether a co-racial officerresponds to amotorist’s unanticipated accident is as-ifrandom. If the assumptions in these studies hold,concerns over race-based sample selection are greatlyalleviated.

These attempts to mitigate race-based selection re-main rare, as most empirical studies in this literaturefocus nearly exclusively onmitigating themore familiarproblem of omitted variable bias. For example, Fryer(2019) (detailed below), a study of racial bias in policeviolence, estimates discrimination using data onpolice–civilian encounters via multivariate regressionsthat control for a host of observables relating to civil-ians, officers, and circumstance. In a related article, theauthor asserts that “regression can recover the ‘raceeffect’ if race is ‘as good as randomly assigned,’ con-ditional on the covariates” (Fryer 2018, 2). Fryer (2019)claims to find evidence of bias in sublethal force butnone in lethal encounters.

A related study, Johnson et al. (2019), attempts toestimate racial bias in police shootings. Examining onlypositive cases in which fatal shootings occurred, theyfind that the majority of shooting victims are white andconclude fromthis thatnoantiminoritybiasexists.KnoxandMummolo (2020) show that this conclusion rests onthe erroneous assumption that police encounter mi-nority and white civilians in equal number.

Prior work has also examined racial bias in trafficenforcement, such as Ridgeway (2006) which employspropensity score weighting when estimating racial biasin traffic stops in Oakland, CA. The analysis examinesoutcomes including citations, stop duration, and thedecision to search cars. The study claims thisreweighting strategy can recover “the causal effect ofrace” (9) on poststop outcomes. In general, the analysisfinds little evidence of racial bias on most outcomes,with the exception of stop duration. Antonovics andKnight (2009) use data on traffic citations from the

Boston Police Department to estimate the probabilitythat a ticketed driver was searched, controlling fordriver attributes such as age, race, and gender as well asneighborhood traits. They interpret the coefficient onan indicator of whether the officer and ticketed driverare of different races as an estimate of “racial profilingbased on prejudice,” as opposed to statistical discrim-ination (167). The claim is implicitly causal: some shareof searches among racially mismatched driver–officerpairs would not have occurred had the driver belongedto another racial group.

The above examples represent a mere fraction ofa decades-long, multidisciplinary effort to quantify thedegree to which police discriminate against civilians ofcolor [see Atiba Goff and Kahn (2012), Fridell (2017),and Ridgeway and MacDonald (2010) for more ex-tensive reviews of this empirical literature]. We high-light these specific examples because they all containseveral common features that are central toour critique.For one, these studies analyze data that fail to capturethe unseen selective process throughwhich police cometo engage civilians, a process that prior work shows isfunction of civilian race (Gelman, Fagan, and Kiss2007). In this way, these studies all fail to account for theimpact of race on the composition of the sample understudy. As we show below, failing to account for thisundocumented first stage of the police–civilian in-teractionwill lead to statistical bias, even if thegoal is toestimate the effect of suspect race within the sample ofindividuals who appear in police data and, in manycases, even with a “complete” set of control variablesthat render civilian race as-if randomly assigned topolice encounters.

Second, the aforementioned studies, despite makingat least implicitly causal claims, leave ambiguous theprecise quantity of interest—whether it be the averagetreatment effect (ATE) of race in all encounters; theaverage treatment effect among the subset of encoun-ters appearing in police data because a stop was made(ATEM51), which differs tremendously from the ATE;or the markedly more restrictive and difficult-to-in-terpret controlled direct effect among the same subset(CDEM51, defined below). While studies commonlydiscuss omitted variable bias and attendant assump-tions, they rarely discuss the additional assumptionsnecessary to identify specific causal quantities of in-terest. As a result, readers are unable to assess theadequacy of research designs and estimators, renderingthe interpretation and policy relevance of much priorwork unclear.

Taste-Based versus Statistical Discrimination

A closely related literature attempts to parse “taste-based discrimination” (racial animus) from so-called“statistical discrimination” (Arrow 1972, 1998;Becker 1971; Eberhardt et al. 2004; Phelps 1972) asmechanisms for racially biased policing, and insteadfocuses on recovering the causal effect of civilianrace on police behavior. In this study, we do not at-tempt to disentangle these mechanisms, and we notethat taste-based and statistical discrimination both


4

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

pose serious normative concerns. While statistical dis-crimination is sometimes viewed as more innocuous, itnonetheless constitutes racial profiling because officersdetain civilians due to the perceived actions oftheir racial group, not their observed individual be-havior. Thus, quantifying the causal effect of civilianrace on police behavior—our task here—is imperativeregardless of the mechanism that produces such aneffect.

CLARIFYING THE EFFECT OF CIVILIANRACE: NOTATION, ESTIMANDS,ASSUMPTIONS, ANDEXISTING APPROACHES

Researchers and policymakers examining the effects ofracially biased policing are nominally interested in therelationship between two variables: the race of the ci-vilian involved in encounter i, which we operationalizethrough theirminority statusDi2 {0, 1}, and consequentpolice behavior Yi 2 {0, 1}. However, analyses of ad-ministrative data on police-civilian encounters in-herently involve a mediating variable that may beaffected by race: whether an individual is stopped bypolice,whichwedenoteMi.The causalorderingof thesevariables is depicted in thedirectedacyclic graph (DAG)in Figure 1. We note that analysts often possess richcontextual informationabout theobjective contextof theencounter, suchas its locationand time,whichmay relatetoall of theabove.Wedenote these covariates collectivelyas Xi. However, administrative data invariably fail tocapture unobservable subjective aspects of the encounter,Ui, such as an officer’s suspicion or sense of threat.

As a motivating example, we consider the challengeof estimating racial bias in police violence as recentlyattempted in Fryer (2019). We ground our analysis inthe potential outcomes framework (Rubin 1974) oftenused in the study of causal mediation (Imai et al. 2011;Pearl 2001). The potential mediator Mi(d) representswhether encounter iwould have resulted in a stop if thecivilian were of race d. Similarly, the potential outcomeYi(d,m) representswhether forcewouldhavebeenusedin encounter i if the civilian were of race d and themediating variable werem. The observedmediator andoutcome can be written in terms of these potential

values asMi ¼Mi Dið Þ ¼�dMi dð Þ1 Di ¼ df g and Yi ¼Yi Di;Mi Dið Þð Þ¼�d�mYi d;mð Þ1 Di ¼ d;f Mi ¼mg,respectively. For any individual encounter, the (un-observable) causal effect of civilian race is thedifferenceinpotential force if thecivilianwereaminorityandstoppedas if they were a minority, versus if they were white andstopped accordingly, Yi(1, Mi(1)) 2 Yi(0, Mi(0)).

This notation implicitly makes the stable unit treat-ment value assumption (SUTVA, Rubin 1990). “Sta-bility” is of particular note: this stipulates that finerracial gradations must not affect the way that officersbehave, above and beyond any differences betweenthe broad binary categoriesDi5 0 andDi5 1. SUTVAalso requires that each encounter is unaffected by

a civilian’s race in other encounters; this might be vi-olated if, for example, groups of individuals are stoppedsimultaneously.

Traditionally, analysts use data on stopped indi-viduals to study bias by computing the difference inviolence rates between stopped minority and whitecivilians, while controlling for observable differencesbetween these two sets of encounters. We term this the“naıve estimator,” D, and it can be written as follows:

D ¼ YijDi ¼ 1;Mi ¼ 1� YijDi ¼ 0;Mi ¼ 1; (1)

where conditioning on possible treatment-outcomeconfounders, Xi, is left implicit. Assuming the analysthas correctly measured and specified all such con-founders, Dmay appear entirely reasonable at the firstglance. However, without further assumptions, thisquantity will have no causal interpretation so long asthe treatment affects the mediator (i.e., civilian raceaffects whether officers detain a civilian). As we showbelow, this is because treated encounters (with mi-nority civilians) that result in a stop (Mi51)will not becomparable to those with stopped control (majority)civilians. As a simple example, suppose officersexhibited racial bias as follows: they detain whitecivilians if they observe them committing a seriouscrime (such as assault, potentially warranting the useof force) but detain nonwhite civilians regardless ofobserved behavior. When this is true, comparingstopped white and nonwhite civilians amounts to com-paring fundamentally different groups. The analyst willobserve forceusedagainstagreaterproportionof stoppedwhite civilians because of the differential physical threatthey pose to officers.5 Under the traditional approach,the analyst would naıvely conclude that anti-white biasexists, yielding an erroneous portrait of racial discrimi-nation in the use of force.

To formalize the limitations of the naıve estimator,we begin by partitioning the population into principal

FIGURE 1. Directed Acyclic Graph of RacialDiscrimination in the Use of Force by Police

Notes: Observed X is left implicit; these covariates may becausally prior to any subset of D, M, and Y.

5 While somepolice records indicatewhethera suspectwasengaged inviolent behavior, allowing the analyst to control for this particularfactor, a host of similar concerns (e.g., time-varying officer suspicion)are unmeasured and thus cannot be controlled away.


5

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

strata with respect to the mediator (Frangakisand Rubin 2002; VanderWeele 2011). That is, we con-ceptualize police-civilian encounters in terms of fourlatent classes within which Mi(1) and Mi(0) are con-stant. The general approach of principal stratifica-tion has proven useful for clarifying and boundingquantities of interest in areas ranging from in-strumental variables (Angrist, Imbens, and Rubin1996; Balke and Pearl 1997) to the closely related“truncation by death” problem (Rubin 2000; Zhangand Rubin 2003).

These principal strata include “always-stop”encounters in whichMi(0)5Mi(1)5 1, as well as stopsthat discriminate against racial minorities (“racialstops”) in whichMi(1)5 1 butMi(0) 5 0. Always-stopencounters may be conceptualized as relativelysevere scenarios, such as violent crimes in progress, inwhich officers have no choice but to intervene regard-less of civilian race. In contrast, previous work hasidentified certain behaviors, such as “furtive move-ments” (Gelman, Fagan, andKiss 2007; Goel, Rao, andShroff 2016), that appear to be acted on selectively byofficers based on the race of suspects. “Never-stop”encounters, where Mi(0) 5 Mi(1) 5 0, are situationsin which civilians appear inconspicuous and wouldnot be stopped, regardless of race. There also may beanti-white racial encounters, in which Mi(1) 5 0 butMi(0) 5 1, though we believe these to be rare to non-existent (discussed further below). Figure 2 showsencounters appearing in police records (principalstrata for which Mi(Di) 5 1) are not comparableacross civilian races. Minority police-civilian encountersthat result in a stop are a mixture of “always-stop” and“antiminority racial stop” encounters, while encounterswith white civilians that result in a stop are a combinationof “always-stop” and “anti-white racial stop” encounters.These are fundamentally different groups, and withoutfurther assumptions, comparisons of rates of violencebetweenthemusingthenaıveestimatorwillbestatisticallybiased.

To state this more formally, note that the naıve es-timator recovers the weighted combination of violencerates in observed principal strata:

E Dh i

¼ E YijDi ¼ 1;Mi ¼ 1½ � � E YijDi ¼ 0;Mi ¼ 1½ �

¼ E Yi 1; 1ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �Pr Mi 0ð Þ ¼ 1jDi ¼ 1;Mi 1ð Þ ¼ 1ð Þ

þE Yi 1; 1ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi 1ð Þ ¼ 1ð Þ

�E Yi 0; 1ð ÞjDi ¼ 0;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �Pr Mi 1ð Þ ¼ 1jDi ¼ 0;Mi 0ð Þ ¼ 1ð Þ

�E Yi 0; 1ð ÞjDi ¼ 0;Mi 1ð Þ ¼ 0;Mi 0ð Þ ¼ 1½ �Pr Mi 1ð Þ ¼ 0jDi ¼ 0;Mi 0ð Þ ¼ 1ð Þ:(2)

In equation (2), the first term is the average rate offorceappliedduringencounterswith racialminorities ofthe always-stop stratum, while the second term dealswith minorities in the anti-minority racial-stop stratum.The third and fourth terms are the average violencerates amongwhite civilian encounters in the always-stopand anti-white racial stop strata. Importantly, principal

strata are not fully observable without further assump-tions, and theyexist evenafter conditioningonXi: for anyparticularminority stop, it is fundamentally impossible toknowwith certainty whether a white civilian would havebeen stopped in identical circumstances. In sum, thenaıveestimator comparesgroupswithdifferent potentialoutcomes, and because these groups are unobservable,the resulting bias is difficult to address.

A central quantity of interest in the study of policingbias is the average treatment effect of race,ATE ¼ E Yi 1;Mi 1ð Þð Þ � Yi 0;Mi 0ð Þð Þ½ �—the extent towhich civilians of color face greater risk of police vio-lence thanwhite civiliansbecause of their race. TheATEconsiders both reported and unreported encounters,and it captures two related phenomena: first, whethermembers of theminority are differentially stopped; andsecond, if they are differentially subject to violence.However, police administrative records contain dataonly on reported encounters,meaning that this quantitycannot be estimated solely with police administrativedata without untenable assumptions. The ATE can berestated as follows:

ATE ¼ E Yi 1;Mi 1ð Þð Þ½ � � E Yi 0;Mi 0ð Þð Þ½ �¼�

d�m�m0

�E Yi 1;Mi 1ð Þð Þ � Yi 0;Mi 0ð Þð Þj½

Di ¼ d;Mi 1ð Þ ¼ m;Mi 0ð Þ ¼ m9�

3Pr Di ¼ d;Mi 1ð Þ ¼ m;Mi 0ð Þ ¼ m9ð Þ�; (3)

where the second line illustrates how it sums overthe principal strata depicted in Figure 2, takinginto account the number of minority and white civiliansin each strata (the probabilities) and the local averagetreatment effects for each group (the expectations). InOnline Appendices A.1–A.4, we use these quantities toderive bias and nonparametric sharp bounds.

No data are available for “never-stop” encounters,those with Mi(1) 5 Mi(0) 5 0. Moreover, racial-stopencounters, with Mi(1) 5 1 and Mi(0) 5 0, are onlyrecorded for minority civilians. However, consistentwith Nyhan, Skovron, and Titiunik (2017), we show inOnline Appendix A.6 that the ATE can be pointidentified if researchers collected two additionalnumbers: the count of total minority and whiteencounters, within levels of covariates X whereapplicable—a point we discuss further in our recom-mendations for future research.6

6 Nyhan, Skovron, and Titiunik (2017) examines the problem ofstudying the effect of party identification on turnout using voterregistration files, given the fact that party ID likely affects who reg-isters tovote. Inanapproachthat isequivalent toourProposition2, thestudy uses registration rates of the treated and control voting-agepopulations to bound the ATE given the effect of party ID on reg-istration. This option is not available in practice here since no data setscontain information on unreported encounter rates, or even theirorder ofmagnitude.As a result, analysts in this literature focus almostexclusively on the ATEM51, which we examine with a different ap-proach here. While some work in policing uses population figures asproxies for these encounter rates, we are skeptical of this approach, aspolice frequently stop civilians who reside in other jurisdictions.


6

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

Because “never-stop” encounters are unobserved incurrent data sources, researchers seeking to understandthe role of race in police behavior have, at least im-plicitly, focused on more narrowly defined estimands.7

Studies commonly restrict analysis to the subset ofreported encounters, that is, they seek to estimateeffects among those stopped by police, ATEM51. Incontrast to the ATE, this estimand is by definition notconcerned with unreported white encounters thatwould have escalated to a stop if the involved civilianwas a minority. (The same is true for unreported blackencounters that would have escalated if the involvedcivilian was white, to the extent that this group exists.)Formally, this quantity is given by the followingequation:

ATEM¼1 ¼ E Yi 1;Mi 1ð Þð ÞjMi ¼ 1½ � � E Yi 0;Mi 0ð Þð ÞjMi ¼ 1½ �:(4)

Relatedly, analysts may seek to causally attribute thenumber ofminority stops inwhich force would not havebeen used if the individual in question had been white(Yamamoto 2012). This value is proportional to theconditional average treatment effect among the treated(i.e., minority) stops, which can be written as follows:

ATTM¼1 ¼ E Yi 1;Mi 1ð Þð ÞjDi ¼ 1;Mi ¼ 1½ �

�E Yi 0;Mi 0ð Þð ÞjDi ¼ 1;Mi ¼ 1½ �: (5)

While the average treatment effects are of obviouspolicy importance, they are not the only quantity thatresearchers might seek to estimate. A closely relatedestimand is the controlleddirect effect among the subsetof reported encounters, CDEM¼1 ¼ E Yi 1; 1ð ÞjMi ¼ 1½ ��E Yi 0; 1ð ÞjMi ¼ 1½ �. This estimand differs from theATEM51 in its conceptual approach to racially dis-criminatory stops. Where the ATEM51 asks whethera stopwouldhaveoccurredatall if the individualwereofdiffering race, the CDEM51 seeks to quantify whatwould have happened if the officer was forced to stopthem anyway, perhaps against the officer’s will. Inpractice, the difference is one of inter-pretation—regardless of the target quantity, existingwork in this domain is based on the naıve difference inreported outcomes, and the question lies in the in-terpretation of estimated results. We note that causalestimands in the literature are often left undefined,making it difficult to assess whether published resultsare intended to correspond to theATEM51 or CDEM51(e.g., Goel, Rao, and Shroff 2016; Simoiu, Corbett-Davies, and Goel 2017). In Online Appendix A.3, wediscuss theCDEM51 at length.Weshowthat it cannotberecovered in this setting unless analysts make the un-tenable assumption that no mediator-outcome con-founding exists (Assumption 5, below). We referreaders to the Online Appendix for further details andfocus on recovery of average treatment effects here.

Necessary Assumptions

In this subsection, we describe a number of statisticalassumptions that the analyst must make for a causalstudy of racially biased policing when only adminis-trative data on police-civilian interactions is available.Without these assumptions, causal quantities of interestin this substantive area cannot be identified in data.

Assumption 1 (Mandatory Reporting). Yi(d, 0) 5 0for all i and for d 2 {0, 1}.

We assume all encounters that escalate to the use offorce also trigger a reporting requirement and are,therefore, observed in administrative data. Thoughthere exist wide variability in data recording practicesacross jurisdictions, this assumption is plausible in thestudy ofmanymajor police departments. For example,New York Police Department (NYPD) officers arerequired to report a number of variables, including thespecific type of force used, following each “stop,question, and frisk” encounter. Based on these andother reports, the NYPD releases detailed annual use-of-force reports (NYPD 2017). The completeness of

FIGURE 2. Principal Strata and ObservedPolice-Civilian Encounters

Notes: The figure displays the four principal strata that comprisepolice-civilian encounters based on how themediatorM (whethera civilian is stopped by police) responds to treatment D (whetherthe civilian is a racial minority). Minorities in the “always stop” andanti-minority racial stop strata, highlighted in red, are stopped bypolice and, thus, appear in police administrative data. Likewise,white civilians in the “always-stop” and anti-white racial stopstrata, highlighted in blue, appear in police data. “Never-stop”encounters are unobserved. Because white and nonwhiteencounters are drawn from different principal strata, the twogroupsare incomparableandestimatesof causal quantitiesusingobserved encounters will be statistically biased absent additionalassumptions.

7 For example, Fryer (2018) notes that his analysis of police use offorce is estimating the effect of suspect race “conditional on an in-teraction,” with police (4), rather than seeking its average treatmenteffect in the population.


7

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

these reports with respect to fatalities is informallyenforced by standard journalistic practiceswhich placehigh emphasis on documenting violent incidents(Iyengar 1994). Lesser formsof force aremore likely togo unreported, to be sure, but the ubiquity of sur-veillance cameras, cell phone cameras, and media in-terest in police brutality makes unobserved uses offorce increasingly unlikely (Fisher and Hermann2015). We note that this assumption is implicit in allanalyses of police use of force that rely on adminis-trative data.

Assumption 2 (Mediator Monotonicity). Mi(1) $Mi(0) for all i.

This assumption allows that there may be encoun-ters in which minorities would be stopped (Mi(1)5 1)but whites would not (Mi(0) 5 0), perhaps becauseofficers racially discriminate in applying differentialthresholds of “reasonable suspicion.” However, weassume that the reverse is never true: white civiliansare never stopped in circumstances when their mi-nority counterparts would be allowed to pass. Thisis clearly a stylized representation of a complexreality, and it would be violated if minority officersdiscriminate against white civilians. A violation couldalso occur if white civilians were more likely to bestopped by police because they appeared out ofplace in a predominantly black neighborhood, per-haps under the assumption that they were there tobuy drugs (Gelman, Fagan, and Kiss 2007, 822).These are rare occurrences, and a robustness checkin Online Appendix B.3, our reanalysis of Fryer(2019) after dropping all stops based on suspicionof a drug transaction, shows substantively similarresults.

Assumption 3 (Relative Nonseverity of RacialStops). E Yi d;mð ÞjDi ¼ d0;Mi 1ð Þ ¼ 1;Mi 0ð Þ½ ¼ 1;Xi ¼x�$E Yi d;mð ÞjDi½ ¼ d9;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0;Xi ¼ x�.

We theorize that for encounters during criminalevents severe enough to warrant stopping a civilianregardless of race (i.e., “severe” or “always-stop”encounters), the use of force is as ormore likely to occurthan during encounters in which police have morediscretion over whether to stop an individual (i.e., thosein which racial discrimination in stopping can occur) inexpectation. We regard this assumption, which com-pares violence rates within encounters that hold civilianrace fixed, as highly plausible. As one hypotheticalexample, this assumptionwould imply that police are asor more likely to use force against a white civilian ob-servedcommittingassault thanawhite civilianobservedjaywalking, on average.

Assumption 4 (Treatment Ignorability).

(a) With respect to potential mediator Mi(d) v Di|Xi.(b) With respect topotential outcomes:Yi(d,m)vDi|Mi(0)

5 m9, Mi(1) 5 m0, Xi.

This states that conditional onXi, civilian race is “asgood as randomly assigned” to encounters, and

officers encounter minority civilians in circumstancesthat are objectively no different from white encoun-ters. Part 4(a) stipulates that theobserved covariatesXinclude the confounder W in Figure 3(a). This as-sumption, while strong, has become more plausible inrecent years as administrative data sets have come toinclude a host of encounter attributes that mightlargely capture features observable to police whichcorrelate with suspect race and the potential for force.However, we note that this cannot be tested, evenindirectly, without data on nonstopped individuals.This assumption would be violated if neighborhoodswith high shares of minority residents were moreheavily policed and the analyst failed to adjust forneighborhood, for example, using fixed effects. Part4(b) implies that, for example, if police were moreheavily armed during minority-neighborhood patrolsand, hence, more likely to deploy force—representedby V in Figure 3(b)—then V must be included in X.Without Assumption 4, the range of possible racialeffects is so wide as to be uninformative. We also notethat every study claiming to estimate racial discrimi-nation using similar data makes this assumption, oftenimplicitly. Our aim in this study is not to assert theplausibility of treatment ignorability, but rather toclarify that deep problems remain even if this well-known issue is somehow solved.

Strong Assumptions

We now discuss further assumptions that are often leftimplicit in empirical studies of racially biased policingand that are implausible in many settings. We illustratethese scenarios graphically in Figure 3.

Assumption 5 (Mediator ignorability). Yi(d, m) vMi(0)|Di 5 d, Mi(1) 5 1, Xi.

This is related to but dramatically stronger thanAssumption 3, which merely requires that always-stopencounters are at least as severe in terms of observedcriminal behavior. In contrast, forAssumption5 tohold,violence rates in always-stop encounters must beidentical to those in observationally equivalent racialstops. We find mediator ignorability to be highly im-plausible in the context of policing. Subjective factorssuch as an officer’s suspicion and sense of threat-—depicted as U in Figure 3(c)—can not only lead toinvestigation (stopping) but also a heightened willing-ness touse force.Thesemediator-outcomeconfoundersmust be captured in X for this assumption to hold, butthey are notoriously difficult to capture in officers’ self-reported accounts. Even when proxies based on qual-itative officer narratives are available, strong legalincentives exist for distortion. Moreover, analysts mustbe sure to condition on all variables related to officermindset that are causallyupstreamof stops,while takingcare not to induce bias by conditioning on any that aredownstream.

Below,wedemonstrate that everyanalysis estimatinga racial effect using only data on stopped individualsimplicitly makes Assumption 5. We further note thatAssumptions 4(a), 4(b), and 5 are jointly covered by the


8

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

slightly stronger assumption of sequential ignorability(Imai et al. 2011).

Assumption6 (NoRacialStops).Mi(0)5Mi(1)|Mi51.

InFigure 3, this amounts to assuming away the arrowbetweenD andM. Equivalently, this assumption statesthat all reported encounters were of the always-stopkind, or that there is no racial discrimination in stops.Weshowbelowthat thisassumption is implicitlymadebyall studies claiming to identify the average treatmenteffect of race, conditional on a reported interaction.Naturally,when there is novariation inMi(0), then thisvariable is ignorable and Assumption 5 is alsosatisfied.

However, in view of an overwhelming body ofqualitative evidence and consistently massive quanti-tative differences in racial detainment rates acrossnumerous policing domains, we find racial bias inpolice stops too plausible to dismiss by assumption(Alexander 2010; Baumgartner et al. 2017; Glaser2014;Goel, Rao, and Shroff 2016; Lerman andWeaver2014). A raft of studies have also found that racialdisparities persist even after leading candidate omittedvariables, such as differential criminal activity acrossracial groups, are accounted for (Gelman, Fagan, andKiss 2007). While such patterns are not proof ofa causal relationship, we consider the possibility thatpolice exhibit anti-minority bias when engaging civil-ians strong enough to merit a careful considerationof the implications of that bias for the validity of studiesof racially biased policing.

Bias in the Naıve Estimator

In this section, we clear up several misunderstandingsabout the conventional estimator, which comparesreportedminority stops to reportedwhite stops (with orwithout covariates). First, we show that when there isany racial discrimination in detainment, selection onstops introduces unavoidable statistical bias in esti-mating the ATEM51, even when a perfect set of ob-served covariates renders race ignorablewith respect tothe potential mediator and outcomes. These resultsdirectly contradict prior assertions that “linear re-gression can recover the ‘race effect’ if race is ‘as goodasrandomly assigned,’ conditional on the covariates”(Fryer 2018, 2).The issue isnotoneofomittedvariables,but rather posttreatment conditioning. Second, we

clarify an important open question about the nature ofthis bias. Fryer (2018) comments in the context of se-lection into arrest data that, “It is unclear how to esti-mate the extent of such bias or how to address itstatistically” (5). Here, we derive the exact form of thisbias for the ATEM51 and the ATTM51; Online Ap-pendix A.3 does the same for the CDEM51. We showthat the bias is always negative, resulting in naıveestimates that downplay the extent of racially dis-criminatory police violence. Below, we develop in-formative nonparametric sharp bounds that adjust thenaıve estimates for the range of all possible selectionbias.

Prior work on race and policing uses estimators thatcompare average reported outcomes in majorityencounters to those in minority encounters. For sim-plicityof exposition,wepresent the special no-covariatecase; Appendices A.1–A.3 derive the bias of the naıveestimator with covariate adjustment. We first referreaders to equation (1), which expresses the naıve es-timator, D, in terms of stratum mean potential out-comes. We demonstrate that this commonly usedanalytic approach fails to recover any quantity of in-terest under plausible assumptions.We first show that itis biased for the ATEM51 and ATTM51 unless As-sumption 6 is true, and there are no racial stops. InOnline Appendix A.3, we show it is also biased for theCDEM51 unless Assumption 5 holds—that is, always-stopencounters are identical in violence rates to raciallydiscriminatory stops. As a result, the observed differ-ence in means fails to recover any known causalquantity without additional, and highly implausible,assumptions.

In Online Appendix A.1, we derive the bias of D whenit is used to estimate ATEM51 under the relativelyplausible Assumptions 1–4. This bias can be written asfollows:

E Dh i

�ATEM¼1

¼ E Yi 1; 1ð Þ � Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �ð�E Yi 1; 1ð Þ � Yi 0; 0ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Þ3Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi ¼ 1ð ÞPr Di ¼ 1jMi ¼ 1ð Þ� E Yi 1; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �ð�E Yi 1; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Þ3Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi ¼ 1ð Þ: (6)

FIGURE 3. Violations of Assumptions

Notes:DAGs (a), (b), and (c), respectively, illustrate theviolationofAssumptions4(a), 4(b), and5.Note that thevariableUdepicted inDAG(c)is almost certain to exist in the policing context, and we do not advocate the use of Assumption 5.


9

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

Weoffer several comments on equation (6). The biasterm isguaranteed tobenegative, evenwithaperfect setof controls that render Di ignorable, as long as thereexist any racially discriminatory stops of minority civil-ians (or in an empirically falsified edge case).8 The firstterm in thebias expression relates toheterogeneity in theaverage treatment effect, or the extent to which Yi(1,Mi(1)) 2 Yi(0, Mi(0)) differs in expectation betweenalways-stop and racial-stop encounters—respectively,those withMi(1) 5 Mi(0) 5 1 and Mi(0) , Mi(1).

9 Biasarises because in the latter type of encounter, a whitecivilianwouldneverhavebeendetained in thefirst place,and hence force would never have been used—that is,E Yi 0; 0ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ � ¼ 0. Estimat-ing the average potential outcomes of this group usingstopped white civilians introduces unavoidable bias thatthe analyst cannot hope to eliminate simply by addingadditional covariates to the estimating model. The sec-ond term is related to the difference in baseline violencerates between always-stop encounters and racially dis-criminatory stops; this term also vanishes if there are noracial stops.

Can the naıve estimator be rehabilitated by simplyredefining the quantity of interest? In Online Ap-pendices A.2–A.3, we show that the answer is no. Thestructure of the bias when D is used to estimate theATTM51 is simpler but leads to substantively identicalconclusions: the naıve estimator is biased unless thereare no racial stops. We show that bias for the ATTM51is given by E D

h i� ATTM¼1 ¼ �E Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;½

Mi 0ð Þ ¼ 1� Pr Mi 0ð Þ ¼ 0jMi 1ð Þ ¼ 1ð Þ. While the iden-tifying assumptions for the CDEM51 are slightlyweaker, they are nonetheless wholly implausible. Thesign of this bias for the ATTM51 and CDEM51 can alsobe shown to be negative underAssumption 1–4, exceptin the implausible edge cases described in the OnlineAppendix. Thus, regardless of the target quantity, theuse of the observed difference inmeanswill understatethe rate of racially discriminatory police violence. Inaddition, we emphasize that these derivations showthat statistical bias remains even after assuminga “complete” set of control variables that renders raceignorable. Posttreatment conditioning induces biasunless additional assumptions hold.

POTENTIAL SOLUTIONS

Howshould the analyst proceed in light of these results?We propose two approaches that eliminate the highlyimplausible assumptions outlined in the “StrongAssumptions” section, which are unstated but implicit

in prior work. We caution that these solutions still relyon theweaker assumptions described in the “NecessaryAssumptions” section, althoughwe argue that these areoften plausible in light of insights from extensive re-search on policing. Reasonable people can disagree onthe plausibility of various assumptions, but by statingthemexplicitly,we seek to advance empiricalwork in anarea which, at present, largely ignores such issuesaltogether.

In thefirst approach, we derive nonparametric sharpbounds representing the tightest possible range ofcausal effects that are consistentwith the reported data(Manski 1995). Again, for simplicity, we begin bypresenting bounds for the case in which treatment isunconditionally ignorable. To incorporate covariates,Online Appendix A.4 then describes a more generalformulation in which bounds are computed withinlevels of X, without functional form assumptions, andreaggregated; this latter formulation is also applicablewhen a correctly specified regression is used. Bothcases are demonstrated in a reanalysis of Fryer (2019)below.

A key limitation of the first proposed solution is thatall quantities of interest remain only partially identified.This is fundamentally a consequence of selection intopolice administrative records; point identification sim-ply cannot be achieved without either implausibleassumptions or additional data. To this end, we outlinean alternative approach that incorporates limited in-formation about the missing encounters (those that donot result in a stop). We show that with additionaldata—which in some cases are already being collectedby agencies—the prevalence of racially discriminatorystops and most racial effects of interest can be pointidentified. Following our applied example, we describea feasible research design based on this approach indetail.

Bounds on Effect of Race

Here, we derive large-sample nonparametric sharpbounds on the ATEM51 and ATTM51, focusing first onthe case inwhichAssumption 4 (treatment ignorability)holds without conditioning on further covariates.Proposition 1 quantifies and corrects for the range ofpossible bias induced by posttreatment conditioning,producing an informative interval of possible jointvalues for (1) the partially identified ATEM51 and (2)the proportion of racial stops among reported minorityencounters, r 5 Pr(Mi(0) 5 0|Di 5 1, Mi 5 1). Asequation (6) suggests, when there is no racial bias inpolice stops (r 5 0), these bounds collapse on the ob-served difference in means. We further demonstrate inFigure 4 that these bounds are highly informative whenr is known or can be credibly estimated from supple-mental data. When the prevalence of racially discrim-inatory detainment is unknown but a plausible rangecan be inferred from prior work, Figure 4 (discussedbelow) illustrates how this value can be used to assessthe behavior of the bounds much like a sensitivityparameter.

8 The edge case is if there is zero use of force against white civilians.This possibility is empirically falsifiable; in our application, we showthat it is far from the truth.To see that the bias is negative, observe thatE Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �$E Yi 0; 0ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �,because the latter term is zero under Assumption 1. Together withAssumption 3, this signs the bias.9 Note that Mi(d) simplifies in equation (6), because it is constantwithin principal strata.


10

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

Proposition 1 (Nonparametric Sharp Bounds onATEM51). When Di is ignorable, nonparametric sharpbounds on (ATEM51, r) under Assumptions 1–4 arejointly given by

E Dh i

þ rE YijDi ¼ 0;Mi ¼ 1½ � 1� Pr Di ¼ 0jMi ¼ 1ð Þð Þ#ATEM¼1 #

E Dh i

þ r

1� rE YijDi ¼ 1;Mi ¼ 1½ � �max 0; 1þ 1

rE Yi jDi ¼ 1;Mi ¼ 1½ � � 1

r

��

3Pr Di ¼ 0jMi ¼ 1ð Þ þ rE Yi jDi ¼ 0;Mi ¼ 1½ � 1� Pr Di ¼ 0jMi ¼ 1ð Þð Þ;

where D ¼ YijDi ¼ 1;Mi ¼ 1� YijDi ¼ 0;Mi ¼ 1; and

the (ATTM51, r) must similarly satisfy

ATTM¼1 ¼ E Dh i

þ rE YijDi ¼ 0;Mi ¼ 1½ �To derive Proposition 1, we reformulate the bias in

terms of the unobserved joint distribution of (1) the useof force in minority encounters and (2) whether a mi-nority stop was racially discriminatory. FollowingKnoxet al. (2019), we then use Assumptions 1–4 and theFrechet inequalities, in conjunction with the observedmargins, to place sharp bounds on this joint distribution.These then imply sharp bounds on the ATEM51. Adetailed proof is given in Online Appendix A.4 for themore general case in which Di is ignorable only after

conditioning on prestop covariates. In this case, thelocal average treatment effect, ATEM51,x, is firstbounded by applying Proposition 1 within levels ofX toobtain local bounds, ATEM¼1;x;ATEM¼1;x

� . These are

then straightforwardly reaggregated to obtain boundson the conditional treatment effect among stops,

�xATEM¼1;x Pr Xi ¼ xjMi ¼ 1ð Þh

, �xATEM¼1;x Pr

Xi ¼ xjMi ¼ 1ð Þ�. In Online Appendix A.5, we outlinea Monte Carlo procedure for constructing confidenceintervals that asymptotically contain both the true lowerand upper bounds endpoints with probability 1 2 a.

Wenote that theproportionof raciallydiscriminatorystops may vary with X. However, when using thesebounds as a sensitivity analysis, we suggest using thesimplifying approximation of a constant r. This is be-cause without additional data beyond civilian race, theuse of force, or even prestop covariates, police ad-ministrative records alone are virtually uninformativeabout the range of r: any value in [0, 1) could producethe observed data,10 although Proposition 1 shows that

FIGURE 4. Bounds for Racially Discriminatory Use of Force, any Severity

Notes:These plots present the ATEM51 (ATTM51) for excess racial force, scaled by the number of stops (number ofminority stops) to obtainthe total number of civilians affected. The left panels consider the difference in the use of force if black civilians were substituted into eachencounter of any race (each black encounter), versus white civilians; the right panels show the same quantities for Hispanic civilians. Bluepoints (error bars) denote the naıve estimator (95% confidence intervals), which, conditional on the typical selection-on-observablesassumption, is unbiased for the ATEM51 if there are no discriminatory stops of minority civilians (zero on the x-axis). The dark (light) regionsrepresent the range of possible values (95%CI) for (1) the ATEM51 and (2) the proportion of discriminatory stops in reported data jointly, perProposition 1. The vertical line corresponds to an estimate of the proportion of discriminatory stops from Gelman, Fagan, and Kiss (2007),suggestingaplausiblevalue for thisunobservableparameter.The top (bottom)panelspresentboundsbasedonamodelwithnocontrols (themain specification, adjusting for a wide range of covariates).

10 If all stops were racially discriminatory, then we would observe nowhite stops.


11

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

each possible r value has differing implications for theset of possible racial effects.

Point Identification of the ATE GivenAdditional Data

The ATE is point identified with the collection of onlytwo additional numbers—the count of totalminority andwhite encounters, within levels of X where applicable.Below, we propose an alternative design in which thesedataare collected frompassive instruments suchas trafficcameras or police body-worn cameras. Where sucha design is infeasible (e.g., where traffic cameras coveronly a subset of the jurisdiction under study), pointidentification can also be achieved by linking incompletedata on both reported and unreported encounters topolice administrative records under mild assumptions.

Proposition 2 (Point Identification of ATE). UnderAssumptions 1–4, the ATE is identified by a weightedcombination of the observed racial means,

E YijDi ¼ 1;Mi Dið Þ ¼ 1½ �Pr Mi ¼ 1jDi ¼ 1ð Þ

�E YijDi ¼ 0;Mi Dið Þ ¼ 1½ �Pr Mi ¼ 1jDi ¼ 0ð Þ:Intuitively, the proof breaks the ATE into the size-

weighted sum of principal effects among always-stopand racial-stopencounters (theprincipal effect innever-stop encounters is known to be zero). Crucially, theadditional data on nonstops allows the researcher toconstruct a contingency table representing the jointdistributionof race anddetainment.Aspart of theproofin Online Appendix A.6, we show that this can be usedto straightforwardly recover the size of each principalstratum under Assumptions 2 and 4(a). However, itremains impossible to determine whether any in-dividual stop was racially discriminatory.

When total encounter numbers are unknown, thisjoint distribution can nonetheless be estimated byattempting to link a representative sample of allencounters (e.g., using timestamps from traffic cameras)against administrative records (e.g., license plate data-bases); those that are unlinkable can be presumed un-reported. After recovering principal strata sizes, we thenproceed by noting that minority outcomes in reportedadministrative data are in fact a mixture of Yi(1, Mi(1))from both always-stop and racial-stop strata in preciselythe required proportions; that reported white outcomescorrespond toYi(0,Mi(0)) from the always-stop stratum;and that Yi(0, Mi(0)) is known to be zero among theracial-stop stratum under Assumption 1. From this, theATE can then be reconstructed.

REANALYSIS OF FRYER (2019)

We have shown that the standard approach to esti-mating racial bias in police data will always un-derestimate its degree, so long as police discriminateagainst minorities when choosing whom to investigate.To explore the magnitude of this statistical bias in anapplied setting, we replicate and extend a section ofFryer (2019) which reports estimates of racial

discrimination in theapplicationof sublethal forceusingthe NYPD’s “Stop, Question and Frisk” (SQF) data-base (2003–13).11 The NYPD data contain roughly 5million records of pedestrian stops, the vast majority ofwhich are of nonwhite suspects. The data record the useof varying levels of force, including laying hands ona suspect, handcuffing a suspect, pointing a weapon ata suspect, and pepper spraying a suspect, among others.The original analysis in Fryer (2019) utilized the simplenaıve approach of equation (1) to predict the severity offorce applied by police, as well as covariate-adjustednaıve models analogous to those we consider in Ap-pendices A.1–A.3. Specifically, the study presenteda logistic regression of police force on suspect race,along with additional specifications that added a host ofcontrol variables such as precinctfixed effects, to renderthe ignorability assumptions more plausible. We re-produce two of these models—the baseline specifica-tion including only racial group indicators, along withthe richer “main specification” (21)12—to estimate theconditional expectations in Proposition 1. For compa-rability to the original analysis, we take these models atface value, setting aside issues of potential model mis-specification and the ignorability of civilian race.

Oneanalysis inFryer (2019) considered theuseof anyforce against a suspect, while subsequent analyses ex-amined force exceeding various severity thresholds,such as a binary outcome for “at least use of handcuffs.”Using the coding rules and estimation procedures inFryer (2019), we were able to closely replicate thepublished results. However, in doing so, we discoveredthis procedure involved an unconventional and in-advisable step in which all observations with nonzeroforce below the threshold of interest were dropped—asevere case of selection on the dependent variable. Inthe most extreme case, in the analysis of police batonand pepper spray use, this resulted in the discarding ofall encounters in which only lower levels of force wereused, a set that comprised 21.5%of all observations and99.8% of all uses of force. To present the most de-fensible results possible, for these outcomes, we departfrom the analysis in Fryer (2019) and revise the pro-cedure so that all encounters with a level of force at oraboveagiven thresholdareassignedanoutcomeof1 (asbefore) and all other encounters are assigned a value of0 (including those with lower levels of force, which arenow retained). Section B.1 in the Online Appendixcontains an extended discussion of the issue;

11 Because the replicationmaterial for Fryer (2019) was not posted atthe time of analysis, these data were obtained directly from https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page.12 The main specification in Fryer (2019) consists of a logistic re-gression of a force outcomeon race dummies plus controls for gender,a quadratic in age, whether the stopwas indoors or outdoors, whetherthe stop took place during the daytime, whether the stop took place ina high crime area, during a high crime time, or in a high crime area ata high crime time, whether the officer was in uniform, civilian ID type,whether others were stopped during the interaction, controls for ci-vilian behavior, and precinct and year fixed effects. Fryer (2019) alsonotes that “missing indicators for all variables” are included ascovariates. We omit these indicators as it was unclear how they werecoded. See Figure 1 caption in Fryer (2019).


12

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039

https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page

https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page



https://doi.org/10.1017/S0003055420000039

a comparison of the original, replicated, and correctedresults; and a demonstration of the serious implicationsfor statistical significance of the original estimates.

Basedon thediscussion inbothFryer (2018)andFryer(2019), we interpret the published results as estimates ofthe ATEM51: “the difference inY that can be attributedto an individual’s race,” (Fryer 2018, 2), conditional ona recorded interactionwith police (i.e., conditional onMi5 1). We note that of the other quantities considered inthis study, the unconditional ATE cannot be estimatedwithout information on unreported encounters, and theCDEM51 cannot be computed without strong assump-tions aboutpotential outcomes that canneverbe realizedin observational settings. For these reasons, we focus onthe ATEM51 and ATTM51 in this reanalysis.13

Figure 4 depicts bounds on the ATEM51 when thebinary outcome is any use of force, including the lowestrecorded value of physically handling a civilian.14 Im-portantly, this specific outcome is unaffected by theoutcome coding issue discussed above. (In Figures B.2and B.3, we present additional bounds for varying forcethresholds, up to whether a baton or pepper spray wasused.) The plots also display estimates of the bias-cor-rected ATTM51 (dashed lines). As the plots show, therange of possible ATEM51 and ATTM51 values variesstrongly with the severity of discrimination in stops.

In equation (6), we demonstrated that the use of thenaıve estimator implied the substantively implausibleassumption that police never discriminate in stops (i.e.,r 5 0). Similarly, contextual information also suggeststhat some depicted values of r are implausibly large. Tounderstand the rangeof empirically plausible values,weturn to two prior studies that use very different analyticapproaches to shed light on the degree of racial bias inthe decision to detain civilians. Using the SQF data andcontrolling for precinct, suspected crime, andprior localarrest rates by race, Gelman, Fagan, and Kiss (2007)produce estimates that—by our calculations—imply32% of black-civilian stops made by the NYPD couldnot be explained even by differential criminality

between racial groups of suspects, as proxied by priorarrest rates.15 Their analyses are run separately byprecinct and crime type; for simplicity, we take theweighted average of racial-stop proportions. This an-alytic approach most likely underestimates the pro-portion of racially discriminatory stops—the number ofprior arrests in a precinct and racial group is not a directmeasure of criminality, but is itself likely contaminatedby discrimination in previous detainments and arrests.We, therefore, regard the value of r implied byGelman,Fagan, and Kiss (2007) as conservative.

Goel, Rao, and Shroff (2016) take an entirely dif-ferent tack based on a comparison of “hit rates,” or theshare of stops that produced evidence of the suspectedcrime for which the civilian was detained—a variant ofan “outcome test” for discrimination (Anwar and Fang2006; Knowles, Perisco, and Todd 2001). Using a flexi-ble logistic regression to adjust for a vast array ofindicators visible to officers prestop, the study showsthat white hit rates exceeded those of “similarly situ-ated” black civilians. We show in our Online AppendixA.7 that the difference in hit rates implies a minimumproportion of racial stops and, therefore, also impliesa conservative estimate of r.16 The correspondingvalues of r from these two studies are 0.32 and a lowerbound of 0.34, respectively, when considering blackcivilians.While any estimate of this difficult-to-measurequantity from police data is sure to be imperfect, thefact that two independent estimates of racial bias instopping so closely comport with one another, despiteusing wholly different analytical approaches, gives ussome empirical justification for narrowing the range ofplausible racial effects in the use-of-force analysis. Wenote that the research design presented in the “Rec-ommendations for Future Research” section belowoffers an alternative approach for obtaining betterestimates of racially discriminatory stopping.

Figure 4 demonstrates that strong negative bias in thenaıve estimator paints a wildly misleading portrait ofpolice use of force. We turn first to estimates of theATEM51 using themain specification, which adjusts fora battery of covariates. The naıve estimator (whichassumes no racial bias in police stops) suggests thatencounterswithblack (Hispanic) suspects arepredictedto exhibit an additional 3.9 (0.4) instances of

13 We note that in Proposition 1 we consider binary minority status,whereas the specifications in Fryer (2019) take civilian race as a cat-egorical variable. (However, only two races are considered for anyparticular ATEM51 estimate: black versus white, or Hispanic versuswhite). To accommodate this, in reported black ATEM51 andATTM51 results, we use a slight generalization in which white civilianencounters are representedwithDi5 0, black encounterswithDi5 1,and subsequentminority groupswithDi52, 3 and soon.Proposition1and its covariate-adjusted counterpart inOnlineAppendixAcan thenbe applied directly. The chief implication of this formulation is (1)a different average value for Yi(d, 1) is estimated for each minoritygroup, and (2) that all minority groups are implicitly assumed to beracially stopped at the same rate, although this can easily be relaxed.(The same procedure is applied when theminority group of interest isHispanic civilians, after setting the Hispanic indicator to Di 5 1.) Toassess whether results were affected by this, in Online Appendix B.4,we conduct two additional analyses after first subsetting to black andwhite encounters, and Hispanic and white encounters, respectively.As the results makes clear, conclusions are virtually identical apartfrom differences that stem from the size of the subsetted data.14 Note that we treat stops in which “other”was denoted as the use offorce category as zero force, since the vast majority of these cases didnot even not involve officers even laying hands on suspects.

15 Based on SQF data from 1998–99, Gelman, Fagan, and Kiss (2007)fit hierarchical Poisson models for the number of stops (by suspectedcrime, precinct, and race) per arrest in the previous year, which theymodel as emþarace within groups of stops defined by the suspectedcharges (violent crimes, weapons crimes, property crimes, and drugcrimes) and precinct racial composition (,10%, 10–40%, and.40%black).Within each group, the excess black stopping rate is then givenby 1� eawhite�ablack . We approximate the size of each group by multi-plying the reported marginal probabilities of stop types (25%, 44%,20%, and 11%, respectively) and composition groups (“each… rep-resents roughly 1/3 of the precincts”), since the joint distribution is notreported. The r 5 0.32 estimate is then produced by taking the size-weighted average of subgroup excess black stopping rates. The cor-responding estimate of r for Hispanic civilians implied by Gelman,Fagan, and Kiss (2007) is slightly higher, at 0.35.16 Using SQF data from 2008–12, Goel, Rao, and Shroff (2016) es-timate ahit rate of 3.8%forwhite suspects and2.5%forblack suspects(379), which implies that r is at least 0.34.


13

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

handcuffing per 1,000 encounters, compared with thesame encounters had they involved white civilians. Wethen employ the most conservative racial stopping es-timate, denoted by the vertical line in the figure, togenerate bounds on the true race effect. Our bias-cor-rected results show the true effect is at least as high as15.5 (13.0)—meaning that the conventional approachunderestimates discriminatory force by a factor of atleast 4 (32).

To characterize bias in estimates of the ATTM51, weagain use the conservative racial stopping estimate fromGelman, Fagan, and Kiss (2007) to correct the naıveestimate. Again, the naıve approach substantiallyunderstates racially discriminatory police violence,suggesting that there were 75,000 instances in whichpolice laid hands on black and Hispanic civilians, butwould not have done so had those individuals beenwhite. Our bias-corrected estimate shows the truenumber is approximately 307,000, meaning the naıveapproach masks 232,000 such incidents. Similarly, thenaıve approach indicates roughly 3,400 racially dis-criminatory instances in which officers pointeda weapon at a black or Hispanic civilian, whereas thebias-corrected ATTM51 shows the true number is al-most five times as large.

To see how this statistical bias affects estimates fordifferent levels of force, Table 1 presents naıve esti-mates alongside ATEM51 bounds for excess force per1,000 black and Hispanic encounters across the fullspectrum of police actions—ranging from physicalhandling of a civilian to the use of pepper spray ora baton—again using the conservative racial-stop esti-mate fromGelman, Fagan, andKiss (2007) to apply ourbias correction. The results again show that the

traditional approach substantially understates the de-gree of racial bias in police use of force. Our results alsoinclude numerous cases in which downward bias pro-duces the illusion of no race effect. For example, whilethe approach in Fryer (2019) implies a statistically in-significant 2.4 instances per 1,000 encounters of pushingHispanic suspects to a wall due to suspect race, ourrevised estimate shows the true number is at least26—eleven times larger. We can also quantify thenumber of masked instances of racially discriminatoryuses of force as a percentage of all uses of force againstminorities, displayed in Figure 5. In the period we ex-amine, black andHispanic civilians experienced force atthe hands of police 779,894 times.Using the approach inFryer (2019), one would conclude that about 10%would not have occurred had those civilians beenwhite.Using our bias-corrected approach, we find that in fact39%were discriminatory. These underestimates persistacross all force threshold analyses.17

RECOMMENDATIONS FORFUTURE RESEARCH

The analysis above clarifies whether and when esti-mates of racial bias in police behavior identify causalquantities, shedding light on how traditional estimationapproaches that fail to account for posttreatment con-ditioning can inadvertently mask racially biased polic-ing. Our results suggest the body of evidence on this

FIGURE 5. Estimated Number of Racially Discriminatory Uses of Force against Black and HispanicCivilians, Divided by Total Observed Uses of Force among Those Groups Using Naıve (Red Dot) andBias-Corrected (Blue Triangle) Estimators of the ATTM51

Notes: In some cases, the naıve approach returns negative estimates, indicating that more uses of force would have occurred had thecivilians been white. The bias-corrected estimates show the naıve estimates substantially underestimate the pervasiveness ofanti-minorityracial bias in police violence.

17 These estimates were generated by computing the ATTM51 withcovariate adjustment.


14

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

topic that relies on police administrative data may belargely uninformative or even misleading. While ourbias-correction and bounding techniques are an im-provement, they still rely on assumptions that manyanalysts may not be willing to entertain. Some of theseassumptions, such as conditional treatment ignorability,are unavoidable. But others can be sidestepped orweakened through the use of research designs thatpreempt the problem of posttreatment conditioning. Inwhat follows, we detail a feasible research design thataddresses these concerns.

To estimate the effect of suspect race on poststoppolice behavior while avoiding the concerns outlinedabove, we describe a feasible study of police-civilianinteractions during traffic stops. A key advantage oftraffic studies is thatmuchof thedataneeded to improveresearch are already collected passively by law

enforcement agencies across the United States in anautomated fashion via highway cameras. We note thatbefore the advent of this technology, data on un-reported police-civilian interactions had to bemanuallycollected by researchers accompanying patrol officerson their shifts (Allen 1982; Smith,Visher, andDavidson1984), a labor-intensive strategy highly vulnerable toresearcher demand effects (Orne 1962).

Recall that akeyproblem in the typical studyofpoliceadministrative data is the unobservability of thoseencounters that do not generate police reports. How-ever, given the prevalence of highway speed camerasacross police jurisdictions, it is entirely feasible to collectdata on every passing car (or a random sample ofpassing cars), whether or not police pulled the car overand recorded the stop. This mode of data collection hasalready been utilized in prior work (Kocieniewski 2002;

TABLE 1. Average Treatment Effect among Stops (ATEM51), by Severity of Force and Minority Group

Minimum force

ATEM51 for encounters with black civilians (vs. white)

No covariates Full specification

Bounds Naıve Bounds Naıve

Use of hands (112.66, 124.59) 61.69 (86.95, 96.70) 23.46(84.6, 151.84) (32.89, 90.63) (81.54, 102.14) (16.23, 30.54)

Push to wall (24.15, 27.75) 4.20 (26.47, 30.20) 6.66(15.50, 37.35) (25.29, 14.02) (24.24, 32.39) (3.67, 9.53)

Use of handcuffs (14.60, 16.92) 1.32 (16.59, 19.05) 3.95(9.45, 22.61) (24.83, 7.53) (15.10, 20.57) (1.95, 5.90)

Draw weapon (4.52, 5.14) 1.26 (4.71, 5.34) 1.46(3.13, 6.67) (20.33, 2.83) (4.20, 5.86) (0.77, 2.14)

Push to ground (4.04, 4.58) 1.22 (4.10, 4.64) 1.24(2.79, 5.97) (20.21, 2.66) (3.65, 5.07) (0.64, 1.80)

Point weapon (1.49, 1.70) 0.36 (1.63, 1.86) 0.55(0.96, 2.29) (20.29, 1.00) (1.36, 2.12) (0.18, 0.89)

Baton or pepper spray (0.17, 0.19) 0.08 (0.17, 0.19) 0.07(0.10, 0.26) (20.01, 0.15) (0.12, 0.24) (0.00, 0.14)

Minimum force

ATEM51 for encounters with Hispanic civilians (vs. white)

No covariates Full specification

Bounds Naıve Bounds Naıve

Use of hands (115.44, 127.53) 64.48 (78.93, 88.31) 15.44(88.94, 155.96) (37.06, 92.91) (74.70, 92.65) (9.60, 21.09)

Push to wall (26.41, 30.14) 6.46 (22.24, 25.75) 2.44(19.54, 37.79) (21.12, 14.26) (20.21, 27.80) (20.28, 5.08)

Use of handcuffs (12.54, 14.76) 20.74 (13.05, 15.31) 0.40(9.10, 18.24) (25.27, 3.57) (11.71, 16.64) (21.40, 2.1)

Draw weapon (3.42, 3.98) 0.16 (3.12, 3.66) 20.14(2.41, 5.08) (21.04, 1.33) (2.63, 4.16) (20.77, 0.49)

Push to ground (3.11, 3.60) 0.29 (2.71, 3.18) 20.14(2.18, 4.61) (20.83, 1.37) (2.27, 3.63) (20.71, 0.41)

Point weapon (0.73, 0.90) 20.41 (0.81, 0.98) 20.28(0.32, 1.29) (20.94, 0.08) (0.54, 1.26) (20.64, 0.07)

Baton or pepper spray (0.05, 0.06) 20.05 (0.05, 0.07) 20.05(20.01, 0.12) (20.13, 0.02) (0.00, 0.12) (20.12, 0.02)

Note:Excessuseof force usedagainstminority civilians (versuswhite civilians) per 1,000encounters.Bounds intervals indicate the rangeofpossibleATEM51 valueswhen the unknownproportion of discriminatory stops is approximatedwith the conservative estimate fromGelman,Fagan, and Kiss (2007). Estimates are bolded, and 95% confidence intervals are italicized.


15

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

Lange, Johnson, and Voas 2005), though in thosestudies, camera data on individual motorists were notlinked to administrative data on policing outcomes, aswe propose below.

Givena large randomsampleofpassing cars capturedby highway speed cameras, analysts could use video orphotographic records to document license plate num-bers that allow for a merge with other administrativedata sets containing information on the registrant’shome neighborhood, whether each car went on to bestopped by nearby police at a proximate time, whethera summons was issued, and whether the encounterescalated to include a search or the use of force. As withall causal analyses of observational data, analysts muststill make some version of Assumption 4(b)—notreatment-outcome confounding conditional on ob-servable covariates—but in this case, the standard“treatment selection on observables” plausibly holdsbecause virtually all prestop data available to an officerare in fact observable to the analyst. Using camerafootage merged with administrative records, analystscould credibly measure this “complete” set of controlvariables.18 These factors would include not only therace, age, gender, and registered neighborhood of the

driver but also themake, color, and condition of the car,along with weather and driving speed.

Given this set of covariates, researchers could cred-ibly estimate the ATE for various outcomes, includingsearching, ticketing, and the use of force, by comparingthe rates of outcomes between racial minority andmajority motorists, regardless of whether they werestopped by police, conditional on X. The ATTM51 issimilarly point identified because the proportion ofracial stops can be calculated and used to correct esti-mates. However, the ATEM51 remains partially iden-tified—the quantity can be bounded, as we show above,but not precisely estimated. And as Figure 6 makesclear, the CDEM51 remains fundamentally unidentifi-able without covariates that make Assumption 5plausible, such as controls for officer temperament thatare specific to some stops but not others (i.e., time-varying), which likely influences both stopping deci-sions and subsequent treatment of civilians.

CONCLUSION

With the release of large and granular data onpolice-civilian interactions, many researchers havefocused on estimating whether police exhibit racialbias in their treatment of civilians. Though somestudieshaveacknowledged the threat of posttreatmentbias in this setting (Fryer 2018), the issue has not beenadequately addressed, and studies in this area have leftambiguous which causal quantities are being approx-imated and the degree to which racial bias may beobscured by traditional estimation strategies. Giventhe policy relevance of this topic and the degree ofselection bias inherent to these analyses, we believesocial scientists need to devote substantial effort todevelop research designs that can sidestep the threat ofposttreatment conditioning rather than proceeding inthe face of this threat and simply hoping for the best.

In this study, we clarify the statistical problems in theuse of police administrative data in isolation to studyracial bias. We offer bias-correction and boundingprocedures for scholars analyzing these data, alongwithan improved research design that can avoid posttreat-ment conditioning altogether. Our results can informthe study of racial discrimination in a host of othersettings beyond law enforcement. And thoughwe focuson a case of racial bias in theUnited States, these resultsalso speak to a rich literature on racial discriminationoutside the U.S. context (e.g., Bruce-Jones 2015; Cano2010). Our identifying assumptions may also be usefulfor researchers seeking toaddressbiases stemming fromposttreatment conditioning more generally, beyondstudies of discrimination.

Whilewe are optimistic about alternative designs andestimation strategies, we are under no illusions thateliminating this particular source of bias will removeothers. Our research design suggestions may also limitthe outcomes that are feasible to study. For example,rare events such as shootings may or may not occurduring the observation periods proposed,meaning onlylower level uses of force or sanctioning can be studied in

FIGURE 6. Traffic Stop Design

Notes: The DAG illustrates potential back-door paths for stops(throughW, e.g.,heavilypolicedneighborhoods)and for theuseofforce (through V, e.g., car registrant has warrant for arrest) thatmay correlate with the presence of minority drivers. These areblocked (boxed) by conditioning on prestop variables, includinglicense plates as well as administrative records that can be linkedthrough them. Many mediator-outcome confounders (U) cannotbe blocked but do not pose a threat to inference for the ATE orATEM51.

18 This approach is akin to the design ofHainmueller andHangartner(2013), another rare instance in which the analyst could claim tomeasure all relevant covariates in an observational setting. In thatstudy, citizens made judgments about individuals applying for citi-zenship in Switzerland. Because all information on potential citizenswas contained on a flier distributed by the government, the authorscould credibly account for all possible factors that contributed to theaverage citizen’s judgment of applicants.


16

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039



https://doi.org/10.1017/S0003055420000039

some cases. Our recommendations, therefore, placeemphasis onbias reduction over latitude in the selectionof research questions. But given the ease with whichfaulty conclusions can be reached as a result of the race-based selection we highlight, narrowing the scope ofresearch to generate more reliable estimates may bepreferable, especially because policy reforms couldhinge on the results of studies in this area. Put differ-ently, because of the pitfalls we highlight above, it is notclear that studies of rare phenomena that lack a sounddesign are generating usable knowledge anyway, so thistrade-off in scope may be of only marginal concern(Samii 2016).

Regardless of which approach scholars pursue, thisarticle highlights the need for further careful researchinto the first stage of police-civilian interactions—thatis, the process by which officers decide whether or notto stop and investigate an individual for a crime. Thiseffort is necessary not only to further our scholarlyunderstanding of police-civilian interactions but alsoto craft effective policy reforms. If racial bias is con-centrated in the initial stage of contact, reforms fo-cused on reducing unnecessary police-civilianinteractions may be most effective at curbing raciallydiscriminatory police violence. On the other hand, ifthere exists more significant bias in the ultimate de-cision to use force, substantial improvements mayrequire a wholly different reform strategy. Withoutserious consideration of the role of race in each stageofthe complex police-civilian interactions under study,the benefits of data-driven reforms will be stunted, aswill our collective understanding of the politics ofpolicing.

SUPPLEMENTARY MATERIAL

To view supplementary material for this article, pleasevisit https://doi.org/10.1017/S0003055420000039.

Replication materials can be found on Dataverse at:https://doi.org/10.7910/DVN/KFQOCV.

REFERENCES

Acharya, Avidit, Matthew Blackwell, and Maya Sen. 2016.“ExplainingCausalFindingswithoutBias:Detecting andAssessingDirect Effects.” Biometrics 110 (3): 512–29.

Alesina, Alberto, andEliana La Ferrara. 2014. “ATest of Racial Biasin Capital Sentencing.” The American Economic Review 104 (11):3397–433.

Alexander,Michelle. 2010.The New Jim Crow:Mass Incarceration inthe Age of Colorblindness. New York: The New Press.

Allen, David. 1982. “Police Supervision on the Street: AnAnalysis ofSupervisor/Officer Interaction During the Shift.” Journal ofCriminal Justice 10 (2): 91–109.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996.“Identification of Causal Effects Using Instrumental Variables.”Journal of the American Statistical Association 91 (434): 444–55.

Angrist, Joshua D., and Jorn-Steffen Pischke. 2008.Mostly HarmlessEconometrics: An Empiricist’s Companion. Princeton, NJ:Princeton University Press.

Antonovics, Kate, andBrianG.Knight. 2009. “ANewLook at RacialProfiling: Evidence from the Boston Police Department.” TheReview of Economics and Statistics 91 (1): 163–77.

Anwar, Shamena, and Hanming Fang. 2006. “AnAlternative Test ofRacial Prejudice in Motor Vehicle Searches: Theory and Evi-dence.” The Review of Economic Studies 96 (1): 127–51.

Arnold,David,WillDobbie,andCrystalS.Yang.2018.“RacialBias inBailDecisions.”Quarterly Journal of Economics 133 (4): 1885–932.

Arrow, Kenneth J. 1972. “Models of Job Discrimination.” In RacialDiscrimination in Economic Life, ed. Anthony Pascal. Lexington,MA: D.C. Heath, 83–102.

Arrow, Kenneth J. 1998. “What Has Economics to Say about RacialDiscrimination?” The Journal of Economic Perspectives 12 (2):91–100.

AtibaGoff,Phillip, andKimberlyBarsamianKahn.2012.“RacialBiasinPolicing:WhyWeKnowLessThanWeShould.”Social IssuesandPolicy Review 6 (1): 177–210.

Ayres, Ian. 2002. “Outcome Tests of Racial Disparities in PolicePractices.” Justice Research and Policy 4 (1–2): 131–42.

Balke, Alexander, and Judea Pearl. 1997. “Bounds on TreatmentEffects from Studies with Imperfect Compliance.” Journal of theAmerican Statistical Association 92 (439): 1171–6.

Baumgartner, Frank R., Derek A. Epp, Kelsey Shoub, and BayardLove. 2017. “Targeting YoungMen of Color for Search andArrestDuring Traffic Stops: Evidence from North Carolina, 2002–2013.”Politics, Groups, and Identities 5 (1): 107–31.

Becker, Gary. 1971. The Economics of Discrimination. Chicago:University of Chicago Press.

Blackwell, Matthew. 2013. “A Framework for Dynamic Causal In-ference in Political Science.” American Journal of Political Science57 (2): 504–20.

Brehm, John, andScottGates. 1999.Working, Shirking, andSabotage:Bureaucratic Response to a Democratic Public. Ann Arbor, MI:University of Michigan Press.

Bruce-Jones, Eddie. 2015. “German Policing at the Intersection:Race,Gender,Migrant Status andMentalHealth.”Race&Class 56(3): 36–49.

Burch, Traci. 2013. Trading Democracy for Justice: Criminal Con-victions and the Decline of Neighborhood Political Participation.University of Chicago Press.

Cano, Ignacio. 2010. “Racial Bias in Police Use of Lethal Force inBrazil.” Police Practice and Research: International Journal 11 (1):31–43.

Cohen,Elisha,AnnaGunderson,KaylynJackson,PaulZachary,TomS. Clark, Adam N. Glynn, and Michael Leo Owens. 2017. “DoOfficer-Involved Shootings Reduce Citizen Contact with Gov-ernment?” The Journal of Politics 81 (3): 1111–23.

Eberhardt, Jennifer, PhillipAtibaGoff, Valerie J. Purdie, andPaulG.Davies. 2004. “Seeing Black: Race, crime, and Visual Processing.”Journal of Personality and Social Psychology 87 (6): 876–93.

Edwards, Frank, Hedwig Lee, and Michael Esposito. 2019. “Risk ofBeing Killed by Police Use of Force in the United States by Age,Race–Ethnicity, and Sex.”Proceedings of the National Academy ofSciences 116 (34): 16793–8.

Elwert, Felix, andChristopherWinship. 2014.“EndogenousSelectionBias: TheProblemofConditioning onaColliderVariable.”AnnualReview of Sociology 40: 31–53.

Fisher, Marc, and Peter Hermann. 2015, June 8. “Did the Mckinney,Texas, Police Officer Know He Was Being Recorded?” TheWashington Post.

Frangakis, Constantine E., and Donald B. Rubin. 2002. “PrincipalStratification in Causal Inference.” Biometrics 58 (1): 21–9.

Fridell, Lorie A. 2017. “Explaining the Disparity in Results AcrossStudies Assessing Racial Disparity in Police Use of Force: A Re-searchNote.”The Journal of Economic Perspectives 42 (3): 502–13.

Fryer, RolandG. 2018. “ReconcilingResults onRacial Differences inPolice Shootings.” The American Economic Review 108: 228–33.

Fryer, RolandG. 2019. “AnEmpirical Analysis of Racial Differencesin Police Use of Force.” Journal of Political Economy 127 (3):1210–61.

Gelman,Andrew, JeffreyFagan, andAlexKiss. 2007.“AnAnalysis ofthe New York City Police Department’s ‘Stop-and-Frisk’ Policy inthe Context of Claims of Racial Bias.” Journal of the AmericanStatistical Association 102 (429): 813–23.

Glaser, Jack. 2014. Suspect Race: Causes and Consequences of RacialProfiling. New York: Oxford University Press.


17

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039

https://doi.org/10.1017/S0003055420000039




https://doi.org/10.1017/S0003055420000039

Goel, Sharad, Justin M. Rao, and Ravi Shroff. 2016. “Precinct orPrejudice? Understanding Racial Disparities in New York City’sStop-and-Frisk Policy.”Annals of Applied Statistics 10 (1): 365–94.

Gottschalk, Marie. 2008. “Hiding in Plain Sight.” Annual Review ofPolitical Science 11 (1): 235–60.

Greiner, James D., and Donald B. Rubin. 2011. “Causal Effects ofPerceived Immutable Characteristics.” The Review of Economicsand Statistics 93 (4): 775–85.

Grogger, Jeffrey, and Greg Ridgeway. 2006. “Testing for RacialProfiling in Traffic Stops from Behind a Veil of Darkness.” Journalof the American Statistical Association 101 (475): 878–87.

Hainmueller, Jens, and Dominik Hangartner. 2013. “Who Getsa Swiss Passport? A Natural Experiment in Immigrant Discrimi-nation.” American Political Science Review 107 (1): 159–87.

Harvey, Anna, and Murat Mungan. 2019. “Policing for Profit: ThePolitical Economy of Law Enforcement.” Working Paper. https://s18798.pcdn.co/annaharvey/wp-content/uploads/sites/6417/2019/06/Policing_For_Profit_2019.pdf.

Heckman, James J. 1979. “Sample Selection Bias as a SpecificationError.” Econometrica 47 (1): 153–61.

Hernan,Miguel, SoniaHernandez-Diaz, and JamesRobins. 2004. “AStructural Approach to Selection Bias.” Epidemiology 15 (5):615–25.

Hernan, Miguel A. 2016. “Does Water Kill? A Call for Less CasualCausal Inferences.” Annals of Epidemiology 26 (10): 674–80.

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal ofthe American Statistical Association 81 (396): 945–60.

Imai, Kosuke, Luke Keele, Dustin Tingley, and Teppei Yamamoto.2011. “Unpacking the Black Box of Causality: Learning aboutCausal Mechanisms from Experimental and Observational Stud-ies.” American Political Science Review 105 (4): 765–89.

Iyengar, Shanto. 1994. Is Anyone Responsible?: How TelevisionFrames Political Issues. Chicago: University of Chicago Press.

Johnson, David J., Trevor Tress, Nicole Burkel, Carley Taylor, andJoseph Cesario. 2019. “Officer Characteristics and Racial Dis-parities in Fatal Officer-Involved Shootings.” Proceedings of theNationalAcademyof Sciences116 (32): 15877–82. PublishedOnlineAhead of Print July 22, 2019. https://www.pnas.org/content/early/2019/07/16/1903856116.

Key, Valdimer Orlando. 1949. Southern Politics in State and Nation.New York: Knopf.

Knowles, J., N. Perisco, and P. Todd. 2001. “Racial Bias in MotorVehicle Searches: Theory and Evidence.” Journal of PoliticalEconomy 109 (1): 203–29.

Knox, Dean, and Jonathan Mummolo. 2020. “Making Inferencesabout Racial Disparities in Police Violence.” Proceedings of theNational Academy of Sciences 117 (3): 1261–2. https://doi.org/10.1073/pnas.1919418117.

Knox, Dean, Teppei Yamamoto, Matthew A. Baum, and Adam J.Berinsky. 2019. “Design, Identification, andSensitivityAnalysis forPatient Preference Trials.” Journal of the American Statistical As-sociation 114 (528): 1532–46.

Kocieniewski, David. 2002, March 21. Study Suggests Racial Gap inSpeedinginNewJersey.TheNewYorkTimes.https://www.nytimes.com/2002/03/21/nyregion/study-suggests-racial-gap-in-speeding-in-new-jersey.html.

Lange, James E., Mark B. Johnson, and Robert B. Voas. 2005.“Testing the Racial Profiling Hypothesis for Seemingly DisparateTraffic Stops on theNew JerseyTurnpike.” JusticeQuarterly 22 (2):193–223.

Lasswell, Harold D. 1936. Politics: WhoGets What, When, How. NewYork: Whittlesey House.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Esti-mating Sharp Bounds on Treatment Effects.” The Review ofEconomic Studies 76 (3): 1071–102.

Lerman, Amy, and Vesla Weaver. 2014. Arresting Citizenship: TheDemocratic Consequences of American Crime Control. Chicago:University of Chicago Press.

Lipsky, Michael. 1980. Street-Level Bureaucracy: Dilemmas of theIndividual in Public Service. New York: Russell Sage Foundation.

Magaloni, Beatriz,Edgar Franco, andVanessaMelo. 2015. “Killing inthe Slums: An Impact Evaluation of Police Reform in Rio deJaneiro.” Working Paper No. 556. https://siepr.stanford.edu/sites/default/files/publications/556wp_9.pdf.

Manski,CharlesF. 1995. IdentificationProblems in theSocial Sciences.Cambridge, MA: Harvard University Press.

Mummolo, Jonathan. 2018a. “Militarization Fails to Enhance PoliceSafety or Reduce Crime but May Harm Police Reputation.” Pro-ceedings of the National Academy of Sciences of the United States ofAmerica 115 (37): 9181–6.

Mummolo, Jonathan. 2018b. “Modern Police Tactics, Police-CitizenInteractions and the Prospects for Reform.”The Journal of Politics80 (1): 1–15.

Nix, Justin, Bradley A. Campbell, Edward H. Byers, and Geoffrey P.Alpert. 2017. “A Bird’s Eye View of Civilians Killed by Police in2015 Further Evidence of Implicit Bias.” Criminology & PublicPolicy 16 (1): 309–40.

Nyhan, Brendan, Christopher Skovron, and Rocıo Titiunik. 2017.“Differential Registration Bias in Voter File Data: A SensitivityAnalysis Approach.” American Journal of Political Science 61 (3):744–60.

NYPD. 2017. Use of Force Report, 2017. Technical Report. https://www1.nyc.gov/assets/nypd/downloads/pdf/use-of-force/use-of-force-2017.pdf.

Orne,Martin T. 1962. “On the Social Psychology of the PsychologicalExperiment:With Particular Reference toDemandCharacteristicsand Their Implications.” American Psychologist 17 (1): 776–83.

Ostrom, Elinor, and Gordon Whitaker. 1973. “Does Local Com-munity Control of Police Make a Difference? Some PreliminaryFindings.” American Journal of Political Science 17 (1): 48–76.

Pearl, Judea. 2001.“Direct and IndirectEffects.” InProceedings of theSeventeenth Conference on Uncertainty in Artificial Intelligence,411–20.

Pearl, Judea. 2018. “DoesObesity Shorten Life? or Is It the Soda? onNon-Manipulable Causes.” Journal of Causal Inference 6 (2): 1–7.

Peyton, Kyle, Michael Sierra-Arevalo, and David G. Rand. 2019. “AField Experiment on Community Policing and Police Legitimacy.”Proceedings of theNational Academy of Sciences 116 (40): 19894–8.

Phelps, Edmund S. 1972. “The Statistical Theory of Racism andSexism.” The American Economic Review 62 (1): 659–61.

Ridgeway, Greg. 2006. “Assessing the Effect of Race Bias in Post-Traffic Stop Outcomes Using Propensity Scores.” Journal ofQuantitative Criminology 22 (1): 1–29.

Ridgeway, Greg, and John MacDonald. 2010. Race, Ethnicity, andPolicing: New and Essential Readings, Chapter Methods forAssessing Racially Biased Policing. New York: New York Uni-versity Press.

Robins, James M., Miguel A. Hernan, and Babette Brumback. 2000.“Marginal Structural Models and Causal Inference in Epidemiol-ogy.” Epidemiology 11 (5): 550–60.

Rosenbaum, Paul R. 1984. “The Consequences of Adjustment foraConcomitantVariable thatHasBeenAffectedby theTreatment.”Journal of the Royal Statistical Society 147 (5): 656–66.

Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments inRandomized and Non-Randomized Studies.” Journal of Educa-tional Psychology 66 (5): 688–701.

Rubin, Donald B. 1990. “Formal Mode of Statistical Inference forCausal Effects.” Journal of Statistical Planning and Inference 25 (3):279–92.

Rubin, Donald B. 2000. “Causal Inference without Counterfactuals:Comment.” Journal of theAmericanStatisticalAssociation 95 (450):435–8.

Samii, Cyrus. 2016. “Causal Empiricism in Quantitative Research.”The Journal of Politics 78 (3): 941–55.

Sen, Maya, and Omar Wasow. 2016. “Race as a Bundle of Sticks:Designs that Estimate Effects of Seemingly Immutable Charac-teristics.” Annual Review of Political Science 19: 499–522.

Simoiu, Camelia, Sam Corbett-Davies, and Sharad Goel. 2017. “TheProblem of Infra-Marginality in Outcome Tests for Discrimina-tion.”AnnalsofAppliedStatistics11 (3): 1193–216.https://arxiv.org/abs/1706.05678.

Smith, Douglas A., Christy A. Visher, and Laura A. Davidson. 1984.“Equity andDiscretionary Justice: The Influence of Race on PoliceArrest Decisions.” Journal of Criminal Law & Criminology 75 (1):234.

Soss, Joe, and Vesla Weaver. 2017. “Police Are Our Government:Politics, Political Science, and the Policing of Race–Class Sub-jugated Communities.” Annual Review of Political Science 20:565–91.


18

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039

https://s18798.pcdn.co/annaharvey/wp-content/uploads/sites/6417/2019/06/Policing_For_Profit_2019.pdf



https://www.pnas.org/content/early/2019/07/16/1903856116

https://www.pnas.org/content/early/2019/07/16/1903856116

https://doi.org/10.1073/pnas.1919418117

https://doi.org/10.1073/pnas.1919418117

https://www.nytimes.com/2002/03/21/nyregion/study-suggests-racial-gap-in-speeding-in-new-jersey.html



https://siepr.stanford.edu/sites/default/files/publications/556wp_9.pdf

https://siepr.stanford.edu/sites/default/files/publications/556wp_9.pdf

https://www1.nyc.gov/assets/nypd/downloads/pdf/use-of-force/use-of-force-2017.pdf



https://arxiv.org/abs/1706.05678

https://arxiv.org/abs/1706.05678



https://doi.org/10.1017/S0003055420000039

VanderWeele, Tyler J. 2009. “Marginal Structural Models for theEstimation of Direct and Indirect Effects.” Epidemiology 20 (1):18–26.

VanderWeele, Tyler J. 2011. “Principal Stratification–Uses andLimitations.” International Journal of Biostatistics 7 (1): 1–14.

West, Jeremy. 2018.“RacialBias inPolice Investigations.”WorkingPaper.https://people.ucsc.edu/~jwest1/articles/West_RacialBiasPolice.pdf.

White, Ariel. 2019. “Misdemeanor Disenfranchisement? TheDemobilizing Effects of Brief Jail Spells on Potential Voters.”American Political Science Review 113 (2): 311–24.

Wilson, JamesQ. 1968.Varieties of Police Behavior. Cambridge,MA:Harvard University Press.

Wilson, James Q. 1989.Bureaucracy: What Government Agencies Doand Why They Do It. New York: Basic Books.

Yamamoto, Teppei. 2012. “Understanding the Past: StatisticalAnalysis of Causal Attribution.” American Journal of PoliticalScience 56 (1): 237–56.

Zhang, Junni L., and Donald B. Rubin. 2003. “Estimation of CausalEffects via Principal Stratification When Some Outcomes AreTruncated by ‘Death’.” Journal of Educational and BehavioralStatistics 28 (4): 353–68.


19

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. P

rinc

eton

Uni

v, o

n 21

May

202

0 at

14:

55:0

9, s

ubje

ct to

the

Cam

brid

ge C

ore

term

s of

use

, ava

ilabl

e at

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e/te

rms.

htt

ps://

doi.o

rg/1

0.10

17/S

0003

0554

2000

0039

https://people.ucsc.edu/~jwest1/articles/West_RacialBiasPolice.pdf

https://people.ucsc.edu/~jwest1/articles/West_RacialBiasPolice.pdf



https://doi.org/10.1017/S0003055420000039

administrative records mask racially biased policing...franco, and melo 2015; mummolo 2018a; peyton...

Documents