runaway feedback loops in predictive policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf ·...

12
Proceedings of Machine Learning Research 81:112, 2018 Conference on Fairness, Accountability, and Transparency Runaway Feedback Loops in Predictive Policing * Danielle Ensign [email protected] University of Utah Sorelle A. Friedler [email protected] Haverford College Scott Neville [email protected] University of Utah Carlos Scheidegger [email protected] University of Arizona Suresh Venkatasubramanian [email protected] University of Utah Editors: Sorelle A. Friedler and Christo Wilson Abstract Predictive policing systems are increasingly used to determine how to allocate police across a city in order to best prevent crime. Discovered crime data (e.g., arrest counts) are used to help update the model, and the process is repeated. Such systems have been empirically shown to be susceptible to runaway feedback loops, where police are repeatedly sent back to the same neighbor- hoods regardless of the true crime rate. In response, we develop a mathemati- cal model of predictive policing that proves why this feedback loop occurs, show empir- ically that this model exhibits such prob- lems, and demonstrate how to change the inputs to a predictive policing system (in a black-box manner) so the runaway feed- back loop does not occur, allowing the true crime rate to be learned. Our results are quantitative: we can establish a link (in our model) between the degree to which run- away feedback causes problems and the dis- parity in crime rates between areas. More- over, we can also demonstrate the way in which reported incidents of crime (those re- ported by residents) and discovered inci- dents of crime (i.e. those directly observed by police officers dispatched as a result of * This research was funded in part by the NSF under grants IIS-1633387, IIS-1513651, and IIS- 1633724. Code for our urn simulations can be found at https://github.com/algofairness/ runaway-feedback-loops-src. Corresponding author. the predictive policing algorithm) interact: in brief, while reported incidents can atten- uate the degree of runaway feedback, they cannot entirely remove it without the in- terventions we suggest. Keywords: Feedback loops, predictive policing, online learning. 1. Introduction Machine learning models are increasingly being used to make real-world decisions, such as who to hire, who should receive a loan, where to send police, and who should receive parole. These de- ployed models mostly use traditional batch-mode machine learning, where decisions are made and observed results supplement the training data for the next batch. However, the problem of feedback makes tradi- tional batch learning frameworks both inappro- priate and (as we shall see) incorrect. Hiring algorithms only receive feedback on people who were hired, predictive policing algorithms only observe crime in neighborhoods they patrol, and so on. Decisions made by the system influence the data that is fed to it in the future. For ex- ample, once a decision has been made to patrol a certain neighborhood, crime discovered in that neighborhood will be fed into the training appa- ratus for the next round of decision-making. In this paper, we focus on predictive policing – an important exemplar problem demonstrating c 2018 D. Ensign, S.A. Friedler, S. Neville, C. Scheidegger & S. Venkatasubramanian.

Upload: others

Post on 26-Jun-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Proceedings of Machine Learning Research 81:1–12, 2018 Conference on Fairness, Accountability, and Transparency

Runaway Feedback Loops in Predictive Policing∗

Danielle Ensign [email protected] of Utah

Sorelle A. Friedler [email protected] College

Scott Neville [email protected] of Utah

Carlos Scheidegger [email protected] of Arizona

Suresh Venkatasubramanian† [email protected]

University of Utah

Editors: Sorelle A. Friedler and Christo Wilson

AbstractPredictive policing systems are increasinglyused to determine how to allocate policeacross a city in order to best prevent crime.Discovered crime data (e.g., arrest counts)are used to help update the model, andthe process is repeated. Such systems havebeen empirically shown to be susceptible torunaway feedback loops, where police arerepeatedly sent back to the same neighbor-hoods regardless of the true crime rate.

In response, we develop a mathemati-cal model of predictive policing that proveswhy this feedback loop occurs, show empir-ically that this model exhibits such prob-lems, and demonstrate how to change theinputs to a predictive policing system (ina black-box manner) so the runaway feed-back loop does not occur, allowing the truecrime rate to be learned. Our results arequantitative: we can establish a link (in ourmodel) between the degree to which run-away feedback causes problems and the dis-parity in crime rates between areas. More-over, we can also demonstrate the way inwhich reported incidents of crime (those re-ported by residents) and discovered inci-dents of crime (i.e. those directly observedby police officers dispatched as a result of

∗ This research was funded in part by the NSFunder grants IIS-1633387, IIS-1513651, and IIS-1633724. Code for our urn simulations canbe found at https://github.com/algofairness/runaway-feedback-loops-src.† Corresponding author.

the predictive policing algorithm) interact:in brief, while reported incidents can atten-uate the degree of runaway feedback, theycannot entirely remove it without the in-terventions we suggest.

Keywords: Feedback loops, predictivepolicing, online learning.

1. Introduction

Machine learning models are increasingly beingused to make real-world decisions, such as whoto hire, who should receive a loan, where to sendpolice, and who should receive parole. These de-ployed models mostly use traditional batch-modemachine learning, where decisions are made andobserved results supplement the training data forthe next batch.

However, the problem of feedback makes tradi-tional batch learning frameworks both inappro-priate and (as we shall see) incorrect. Hiringalgorithms only receive feedback on people whowere hired, predictive policing algorithms onlyobserve crime in neighborhoods they patrol, andso on. Decisions made by the system influencethe data that is fed to it in the future. For ex-ample, once a decision has been made to patrola certain neighborhood, crime discovered in thatneighborhood will be fed into the training appa-ratus for the next round of decision-making.

In this paper, we focus on predictive policing– an important exemplar problem demonstrating

c© 2018 D. Ensign, S.A. Friedler, S. Neville, C. Scheidegger & S. Venkatasubramanian.

Page 2: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

these feedback issues. Predictive policing is in-creasingly employed to determine where to sendpolice, who to target for surveillance, and evenwho may be a future crime victim (Perry, 2013).We focus on the most popular of these formsof predictive policing (with PredPol, HunchLab,IBM, and other companies entering the market)which attempts to determine how to deploy po-lice given historical crime data.

Definition 1 (Predictive Policing) Givenhistorical crime incident data for a collection ofregions, decide how to allocate patrol officers toareas to detect crime.

Once police are deployed based on these pre-dictions, data from observations in the neighbor-hood is then used to further update the model.We will call these observations discovered inci-dents, as opposed to reported incidents that arecrime incidents reported to the police (e.g., via911 calls). Since such discovered incidents onlyoccur in neighborhoods that police have beensent to by the predictive policing algorithm itself,there is the potential for this sampling bias to becompounded, causing a runaway feedback loop.Indeed, Lum and Isaac (2016) have shown thatthis can happen.

Lum and Isaac’s work focused on PredPol

(Mohler et al., 2015), a predictive policing sys-tem in use by the LAPD and other cities acrossthe U.S.. Lum and Isaac (2016) model whatwould happen if PredPol were used in Oaklandto distribute police to find drug crime by usinghistorical crime incident data as the historicaldata and a synthetic population of likely drugusers based on public health data U.S. DoJ viaICPSR (2015); U.S. HHS via ICPSR (2015); theyfind that increasing policing efforts based on dis-covered incidents causes PredPol’s prediction tosubstantially diverge from the true crime rate, re-peatedly sending back police to the same neigh-borhoods.

In addition to its importance in the criminaljustice pipeline, predictive policing serves as anarchetypal problem, through which we can bet-ter understand issues which arise out of deploy-ing batch-mode machine learning algorithms inan online setting, where they essentially see re-sults that are influenced by their own predictions.Other such algorithms include recidivism predic-tion, hiring algorithms, college admissions, and

distribution of loans. In all of these contexts,the outcome of the prediction (e.g., who to hire)determines what feedback the algorithm receives(e.g., who performs well on the job).

1.1. Results

We use the theory of urns (a common frame-work in reinforcement learning) to analyze exist-ing methods for predictive policing. We show for-mally as well as empirically why these methodswill not work. Subsequently, we provide reme-dies that can be used directly with these methodsin a black-box fashion that improve their behav-ior, and provide theoretical justification for theseremedies.

2. Related Work

Our work builds most strongly on the work ofLum and Isaac (2016) described above, demon-strating the consequences of feedback loops insimulation in the predictive policing setting.There are a number of systems currently in placefor predictive policing Perry (2013). The mostwell known system used for predictive policing iscalled PredPol (Mohler et al., 2015) (describedin more depth below). Our implementation ofPredPol is the one used by Lum and Isaac (2016)in their work. Recidivism prediction systems arealso related to this work in that we believe theymay exhibit some of the same feedback loop is-sues, given that recidivism outcomes are onlyknown for prisoners who are released. Whilethe details of the actual implementations (suchas COMPAS NorthPointe (2012)) remain propri-etary, Berk and Bleich (2013) provide a compre-hensive review of the methods used in this area.

2.1. PredPol

The predictive policing software PredPol will becritical to our experimental investigations, so wedescribe it in more detail here. PredPol (Mohleret al., 2015) assumes that crimes follow an earth-quake aftershock model, so that regions that pre-viously experienced crime are likely to experi-ence crime again, with some decay. Mohler et al.(2015) model the crime rate λr(t) in region r at

time t as follows: λr(t) = µr+∑tir<t

θωe−ω(t−tir)

2

Page 3: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

where tin represents the time of an event in re-gion r, ω quantifies the time decay of a shock,and θ captures the degree to which aftershocksare generated from an initial event. They usean expectation-maximization procedure to deter-mine the parameters of the model.

Note that this model only uses incident data(including both discovered and reported inci-dents – see Section 3.1) per region to determinethe true crime rate1 and does not use any con-text in the form of demographics, arrest profilesand so on. PredPol, in essence, is predictingwhere incidents will be reported or discovered(since that’s all it sees), not where crime willhappen. Each day officers are sent to the areaswith highest predicted intensity and the result-ing discovered incident data is fed back into thesystem.

3. Predictive Policing with Urns

We will model the predictive policing process bya series of urn models with increasing complexity.Urn models (especially the Polya-Eggenbergerurns) have a long history in machine learning,but notably also in reinforcement learning (Pe-mantle, 2007), where they have been used, start-ing with the work of Erev and Roth (1998), as away to model how bounded-rationality players ina game might interact with each other. Studyingthe dynamics of urn models allows us to under-stand the convergent behavior of reinforcementlearning in such settings.

We will use a generalized Polya urn model (Pe-mantle, 2007) containing balls of two colors (redand black). At each time step, one ball is drawnfrom the urn, the color is noted, and the ball isreplaced. Then the following replacement matrixis used to decide how to update the urn contents:

(Red addition Black addition

Sample red a bSample black c d

)This matrix says that if we initially sampled ared ball, then we replace it and add a more redballs and b more black balls to the urn. We referto the standard Polya urn as a generalized urnwith a = d = 1 and b = c = 0.

1. PredPol— critically — conflates amount of crime andincident data.

3.1. Goals and assumptions

In the simplest predictive policing setting, aprecinct has a single police officer and policestwo regions A and B. Every day the police of-ficer is sent to one neighborhood where they mayor may not observe an incident; if they do, it islogged and we refer to such a log as a discoveredincident. In addition, residents might report inci-dents that are also logged: we call these reportedincidents. The goal is to build a predictive modelfor where to send the officer on each day. Specif-ically, the goal is to distribute the police officersin proportion to the crime in each area.2

Goal 3.1 (Effective Policing) A region withΛ percent of the crime in the precinct should re-ceive Λ percent of the police.

Achieving this goal requires learning the relativecrime rates of the regions.

To understand the behavior of predictive mod-els, we will make some simplifying assumptions.We will firstly assume that the predictive modelonly uses current statistics (in some form) tomake predictions.

Assumption 3.1 (Predictive Model) Theofficer tosses a coin based on current statisticsto decide where to go next.

To fully specify a predictive model, we alsoneed to understand context – what informationis collected during policing – and ground truth– what assumptions we make about underlyingcrime rates. We assume the simplest form of con-text.

Assumption 3.2 (Context) The only infor-mation retained about a crime is a count.

Assumptions about ground truth are both crit-ical and complicated. For some neighborhood A,let λA be the underlying ground truth crime ratefor the neighborhood. We will assume that this isobserved via discovered and reported incidents.Let dA be the rate at which police that visitneighborhood A discover incidents. Let wd bethe weight of the discovered incidents within all

2. Why should this be the goal? Suppose there are ex-actly enough police officers to stop all the crime andno more, then a deployment according to the truecrime rates will perfectly police all regions.

3

Page 4: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

incidents. Similarly, let rA be the rate at whichincidents are reported from neighborhood A, andlet wr be the weight of reported incidents amongall incidents. We will assume that wr + wd = 1.The total rate of incident data from neighbor-hood A is then wd · dA + wr · rA. We note herethat discovered incidents are directly implicatedin the feedback loop since police are deployed inareas based on the results of the predictive model.Reported incidents on the other hand are not.

To start our examinations, we make the follow-ing assumptions. In the subsections below, weexplore what happens as we vary these factors.

Assumption 3.3 (Truth in Crime Data) Ifan officer goes to a location A with an under-lying ground truth crime rate of λA, the officerdiscovers crime at a rate of λA. I.e., dA = λA.Reported incidents are also reported at a ratethat matches the underlying ground truth crimerate, i.e., rA = λA.

Note that Assumption 3.3 allows the predictivepolicing system to operate in a generous context.There are many reasons to believe that this as-sumption does not hold. We will show that evenin this optimistic setting problems occur.

Assumption 3.4 (Discovery Only) Incidentdata is only collected by an officer’s presence ina neighborhood. Neighborhoods with no officerswill contribute no incidents to the data. I.e.,wd = 1 and wr = 0.

We will also start with the assumption that allincident data is made up of discovered incidents.We will modify this assumption to also accountfor reported incidents in Section 3.4.

3.2. Uniform crime rates

Let us start by assuming that the crime rate isuniform between areas.

Assumption 3.5 (Uniform Crime Rate)If an officer goes to a location, crime happens

with probability λ. I.e., for any neighborhoods Aand B, λA = λB = λ.

Consider an urn that contains red and blackballs, where the proportion of red and black ballsrepresent the current observed statistics of crimein areas A and B respectively. Visiting area A

corresponds to picking a red ball and visiting areaB corresponds to picking a black ball. Observingcrime (which happens with probability λ) causesa new ball of the same color to be placed in theurn. The initial balls are always returned to theurn. The long-term distribution of red and blackballs in the urn corresponds to the long-term be-lief about crime prevalence in the two areas.

In general, we can describe the evolution ofthis process as the following urn. We toss a cointhat returns 1 with probability λ. If the coin re-turns 1, we simulate one time step of a standardPolya urn, and if 0, we merely replace the sam-pled ball. This corresponds to a standard Polyaurn “slowed” down by a factor λ. As such, itslong-term convergence is well-characterized. Letthe beta distribution Beta(α, β) be a distributionover the interval [0, 1] where the probability of xis given by3 f(x;α, β) ∝ xα−1(1− x)β−1

Lemma 2 (Renlund (2010)) Assume the urnstarts off with nr red balls and nb black balls.Then the limiting probability of seeing a red ballis a draw from the beta distribution Beta(nr, nb).

Significance. The long-term probability ofseeing red is the long-term estimate of crime inarea A generated by the model. The above resultshows that this probability is a random draw gov-erned only by the parameters nr, nb, which rep-resents the prior belief of the system. In otherwords, the prior belief coupled with the lack offeedback about the unobserved region preventsthe system from learning that the two areas arein fact identical with respect to crime rates.

On the contrary, consider how this processwould work without feedback. The officer couldbe sent to an area chosen uniformly at randomeach day, and this process would clearly convergeto a uniform crime rate for each area. Indeed,such a process resembles the standard update forthe bias of a coin where the prior distribution onthe bias is governed by a Beta distribution.

3.3. Non-uniform crime rates

Let us now drop the assumption of uniformity incrime rates, replacing Assumption 3.5 by

3. The constant of proportionality is Γ(α)Γ(β)/Γ(α+β)where Γ(x) is the standard gamma function.

4

Page 5: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

Assumption 3.6 (Non-uniform Crime Rate)A visit to area A has probability λA of encounter-ing a crime, and a visit to area B has probabilityλB of encountering a crime.

Nonuniform crime rates in neighborhoods Aand B can also be modeled by a Polya urn,with the caveat that the updates to the urn arenow random variables instead of deterministicupdates. Formally, we can think of the urn asbeing described by the 2× 2 (addition) matrix(

XA 00 XB

)where XA is a Bernoulli variable taking the value1 with probability λA and 0 with probability 1−λA, and XB is defined similarly

If the urn satisfied the so-called balance con-dition that the number of balls added at eachtime step is a constant (Mahmoud and Morcrette,2012), then we could invoke standard results todetermine the limiting behavior. This is not thecase in this setting. However, we now show thatit is possible to reduce this to a deterministic up-date model by exploiting the Bernoulli structureof the update.

At any time t, let n(t)A , n

(t)B be the number of

balls “colored” A and B respectively. The prob-ability of adding any ball to the urn is given bythe expression

n(t)A λA + n

(t)B λB

n(t)A + n

(t)B

Note that this can be viewed as a convex combi-nation of the two probabilities λA and λB and sothe overall probability of a ball being added tothe bin varies between two constants.

As before, consider the update process limitedto time steps when a ball is added to the urn.The probability of adding a ball colored A, con-ditioned on adding some ball, is given by

Pr(adding a A-colored ball)

Pr(adding some ball)=

n(t)A λA

n(t)A λA + n

(t)B λB

with a similar expression for adding a B-coloredball.

This is identical to the deterministic Polya urnin which we sample an i-colored ball, replace itand then add in λi more balls of the same color.

Essentially by conditioning on the event that weadd a ball, we have eliminated the randomnessin the update while retaining the randomness inthe sampling.

This latter Polya urn can be represented by thestochastic addition matrix(

λA 00 λB

)(1)

A very elegant result by Renlund (2010) pro-vides a general expression for the long-term prob-ability of seeing a A-colored ball.

Lemma 3 (Renlund (2010)) Suppose we aregiven a Polya urn with replacement matrix of theform (

a bc d

)with a positive number of balls of each kind tostart with. Assume that a, b, c, d ≥ 0 and at leastone entry is strictly positive. Then the limit ofthe fraction of balls of each type exists almostsurely. The fraction p of A-colored balls can becharacterized as follows:

• If a = d, c = b = 0, then p tends towards abeta distribution.

• If not, then p tends towards a single pointdistribution x∗, where x∗ ∈ [0, 1] is a root ofthe quadratic polynomial

(c+ d− a− b)x2 + (a− 2c− d)x+ c.

If two such roots exist, then it is the one suchthat f ′(x∗) < 0.

By limiting ourselves to the subsequence ofevents when some ball is added to the urn, andusing the above general lemma characterizing theasymptotics of deterministic urn updates fromRenlund (2010), we have the following lemmaabout the urn under this new assumption.

Lemma 4 In the urn with addition matrixgiven above, the asymptotic probability of sam-pling a red ball is 1 if λA > λB and is zero ifλB > λA.

Proof Invoking Lemma 3, we set the param-eters b = c = 0, a = λA and d = λB . Theresulting quadratic polynomial is (λB − λA)x2 +

5

Page 6: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

(λA − λB)x = 0. This polynomial has two roots:x = 0, 1. The first derivative is (λB−λA)(2x−1).If λA > λB , then this is negative for x = 1.Conversely, if λB > λA, then this is negative forx = 0.

Significance. In this scenario, the update pro-cess will view one region as having much morecrime than the other, even if crime rates are sim-ilar. In particular, if region A has a crime rate of10% and region B has a crime rate of 11%, theupdate process will settle on region B with prob-ability 1. This is a classic “go with the winner”problem where feedback causes a runaway effecton the estimated probabilities.

3.4. Accounting for reported incidents

Now we consider what happens when we removeAssumption 3.4, i.e., we allow both discoveredand reported incidents to be used as input to theurn model as is more usually the case in pre-dictive policing systems. As discussed earlier,the weight terms wd and wr are used to weightdiscovered and reported crimes from a neighbor-hood, and so the total weight of crime incidentsfrom (say) area A would be wddA + wrrA if itwas visited, and wrrA otherwise. This leads tothe following urn replacement matrix:(

wddA + wrrA wrrBwrrA wddB + wrrB

)where, as in Section 3.3, we should interpret theentries of the matrix as expected values of aBernoulli process.

Using the same trick as in Section 3.3, we canreinterpret the above matrix as a deterministicupdate process, and invoke Lemma 3 to under-stand the limiting behavior. The correspondingquadratic equation associated with this replace-ment urn is given by:

f(x) = wd(dB − dA)x2 + (wd(dA − dB)−wr(rA + rB))x+ wrrA = 0

Let R = wr(rA + rB) be the total weight of re-ported incidents, and let us denote wd(dB − dA)by ∆d, the weighted differential in discoveredcrime. We can then rewrite the above expressionas:

f(x) = ∆dx2 − (∆d +R)x+ wrrA = 0

We can now find the roots of f(x) = 0. Theseare given by

x =(∆d +R)±

√(∆d +R)2 − 4∆dwrrA

2∆d

which can be written as

x = ν ±√ν2 − wrrA

∆d

where ν = 12 + R

2∆d. Taking the first derivative,

f ′(x) = 2∆dx− (∆d +R)

and thus f ′(x) < 0 when x < 12 + R

2∆d. Therefore,

by Lemma 3, the limiting fraction of “A-colored”balls in the urn is

x∗ = ν −√ν2 − wrrA

wd(dB − dA)(2)

3.4.1. Interpretation

We can interpret Equation (2) through a numberof cases. Firstly, consider the case of no feed-back. This corresponds to setting wd = 0. Inthat case, the urn replacement matrix is fixed:regardless of which ball we draw, we always addrA A-colored and rB B-colored balls. Clearly,the limiting fraction of A-colored balls is ra

ra+rband this is the answer we would expect given thecrime reporting rates – we denote this fraction asλ∗.

We can rewrite Equation (2) in terms of λ∗

by introducing a change of variable. Define κ =R/∆d which allows us to rewrite ν = (1 + κ)/2.We can now rewrite Equation (2) as

x∗ =1 + κ

2−√(1 + κ

2

)2

− λ∗κ. (3)

The second term under the square root comesfrom noting that wrrA/∆d = rA/(rA + rB) ·wr(rA + rB)/∆d.

A first observation is that as λ∗ → 0, x∗ → λ∗.Similarly, as λ∗ → 1, x∗ → λ∗. In other words,if the crime rates between the neighborhoods areheavily skewed, this urn will converge to a goodapproximation of the correct answer.

6

Page 7: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

For intermediate values of λ∗, we transform theequation as follows:

x∗ =1 + κ

2−√(1 + κ

2

)2

− λ∗κ

=1 + κ

2

(1−

√√√√1− λ∗κ(1+κ

2

)2

)

which in the limit, as κ grows, can be expressedas

x∗ =1 + κ

2

( λ∗κ

2(

1+κ2

)2

)

by a binomial approximation, yielding

x∗ = λ∗κ

κ+ 1= λ∗

R

R+ ∆d

Significance The limiting behavior of this urnis represented by x∗. How does this relate to theideal limiting behavior λ∗? For x∗ ≈ λ∗, it mustbe that the ratio R/(R + λd) is close to 1. Thiscan happen in two ways. Either R must be verylarge, or ∆d must be small. R = wr(ra + rb)which is bounded by 2. Thus, the only other op-tion is to have ∆d be very small. Recall that∆d = wd(dB − dA). To make it small, we musteither make wd small, which corresponds to dis-counting the importance of discovered incidents(thus relying heavily on the distribution of re-ported incidents assumed to be correct by As-sumption 3.3), or it must be that the discoveredcrime rates dB and dA are very similar. In otherwords, the only scenarios where feedback doesnot drive the outcome away from the true resultare when we effective ignore feedback (by down-weighting the importance of discovered crime)or when the crime rates are similar enough forthe feedback to not matter. However, it is pre-cisely when crime rates are different that predic-tive policing is of value (because resources arethen deployed differently). Thus, once again theurn model reveals problems (via simulation) inexisting models for predictive policing.

3.5. Modifying the urn model to accountfor feedback

In order to learn the crime rate, we want thePolya urn to contain balls in proportion to the

relative probability of crime occurrence. As wehave seen, a standard Polya urn with stochasticupdate rates will converge to a distribution thathas no relation to the true crime rates. Here,we present a simple change to the urn processwhich does guarantee that the urn proportionwill converge to the ratio of replacement (i.e.crime) rates.

3.5.1. Discovered Incidents Only

Again, we first consider what happens if Assump-tion 3.4 is in place.

Consider the standard Polya urn update rule:the probabilities λA and λB model the probabil-ity of an additional ball being added to the urn,conditional on a ball of the respective color hav-ing been sampled. This means that the prob-ability of a ball being added is not λA, but

λAn(t)A

n(t)A +n

(t)B

. As a result, the expected fraction

of A-balls being added to the urn after one stepof the process is λAnA

λAnA+λBnBinstead of λA

λA+λB.

This immediately suggests a fix: instead of al-ways adding the new balls, we first sample an-other ball from the urn, and only add the newballs if the colors are different. With this fix,the probability of adding a ball with label A

isn(t)A

n(t)A +n

(t)B

λAn(t)B

n(t)A +n

(t)B

, while the probability of

adding a ball with label B isn(t)B

n(t)A +n

(t)B

λBn(t)A

n(t)A +n

(t)B

.

Crucially, these two expressions are proportionalto λA and λB , except for a constant factor thatis a function of the current state of the urn.

The intuition behind this fix is that if our de-cision procedure sends police to region A 90% ofthe time, we should not be surprised that discov-ered incidents in region A happen at a rate ofnine to one, even if the crime rate is the sameacross both regions. In such a scenario, if we seea crime in region A (where police go 90% of thetime), we should simply drop the incident record90% of the time; analogously, in region B (wherepolice only go 10% of the time), we drop the in-cident record 10% of the time.

One way to interpret our fix is as a formof rejection sampling : we are dropping sam-pled values according to some probability schemeto affect the statistic we are collecting. Theimportance-sampling analog to this schemewould be to use weighted balls, where the weight

7

Page 8: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

of each ball is inversely proportional to be rateat which police are sent. Effectively, we wanta scheme where as more police are sent, smallerweights are assigned to discovered incidents. Butsuch a scheme is precisely the Thompson-Horvitzestimator, used in survey designs with unequalprobability distributions (Horvitz and Thomp-son, 1952), and so we see that our proposal is arejection-sampling variant of Thompson-Horvitzestimation.

3.5.2. Reported and DiscoveredIncidents

Now we consider what happens if there are bothdiscovered and reported incidents. Intuitively, wewant to correct for the runaway feedback causedby the discovered incidents, but not over-correctfor the reported incidents, which don’t sufferfrom the issue. Recall that the replacement ma-trix is: (

wddA + wrrA wrrBwrrA wddB + wrrB

)Suppose that Assumption 3.3 is in place and re-call that wd + wr = 1. Then this replacementmatrix is: (

λA wrλBwrλA λB

)=(

wddA 00 wddB

)+

(wrrA wrrBwrrA wrrB

)Note that the first matrix represents the discov-ered replacement and the second represents thereported replacement. From the previous sec-tion, we know how to modify the discovery re-placement matrix so that the feedback effect ismitigated. We first apply that same techniquehere, but only to the discovered incidents. Asbefore, doing this ensures that the replacementcontributes (in expectation) exactly wddA to theurn when visiting A, and wddB when visiting B.

But what about the reported incidents? If weadd them as is (i.e as given by the second ma-trix), the total contribution in the case of anA-visit is wddA + wrrA + wrrB . Again, invok-ing Assumption 3.3, the total contribution be-comes λA + wrλB (with a similar expression fora B-visit). Unfortunately, this expression leadsto the urn converging to an incorrect rate, ulti-mately because the contribution to the region not

visited has been incorrectly down-weighted. Thefix is simple: we remove the down-weighting ofthe reported incidents in the neighborhood wherepolice were not deployed. The resulting replace-ment matrix, in expectation, is:(

wddA + wrrA rBrA wddB + wrrB

)and we apply our earlier fix to any discovereddata. This ensures that in expectation, the con-tribution to the urn regardless of whether A orB is visited is λA A-balls and λB B-balls, as de-sired.

4. Evaluating the urn model

In this section, we will focus on validating the ex-istence of the feedback loop problem experimen-tally within our urn model. Code for our urn sim-ulations can be found at https://github.com/algofairness/runaway-feedback-loops-src.

4.1. Observational decay

Thus far, our urn models have captured somekey elements of the model used by PredPol– theidea of differential crime rates as well as the up-dates based on discovered and reported incidents.PredPol also includes a notion of limited mem-ory, both by incorporating time decay into crimeaftershocks, and by using a limited time windowfor training. We model limited memory in theurn setting by adding a simple notion of decay.After every round, each ball disappears from theurn independently with a fixed probability pd.This can be thought of as a relaxation of As-sumption 3.2. Varying pd is analogous to varyingthe size of the PredPol training window.

4.2. Illustrating runaway feedback in urns

To the best of our knowledge there is no theoret-ical characterization of the asymptotic distribu-tions in this full model once the notion of decayis included. We present empirical evidence illus-trating the problems with using this model tolearn crime rates. In our experiments, pd = 0.01.

Using the Lum and Isaac (2016) data, weconsider a two neighborhood police deploymentscenario using, first, the two regions of Oaklandwith the most historical incident data (Top1

8

Page 9: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

Without Improvement Policy With Improvement PolicyTop1 vs. Top2 Top1 vs. Random Top1 vs. Top2 Top1 vs. Random

Dis

cove

red

On

lyA

llIn

cid

ents

Figure 1: The distribution over 1000 days versus percentage of balls from region Top1 in the urnover 1000 runs. A police force deployed based on the underlying crime rates would send56.7% of the force to Top1 instead of Top2 and 61.0% of the force to Top1 instead ofRandom (the green line shown). Top row (discovered incidents only): both charts (left)converge to sending 100% of the force to Top1, while with the improvement policy (right)the charts appear to converge to the correct crime rates. Bottom row (all incidents,equally weighted): both charts (left) converge to the incorrect rate (red line), while withthe improvement policy (right) the charts appear to converge correctly to the true crimerates.

and Top2) and, second, the Oakland neigh-borhood with the most incidents as comparedto a randomly chosen region with many fewerincidents (Random). We simulate the effect ofthe historical incident data on the prior for thesystem by determining the number of balls foreach region in our urn based on the past numberof incidents. We use the full number of incidents(609, 379, and 7 for regions Top1, Top2 andRandom respectively) as the starting number ofballs in the urn from each region. The urns arethen updated based on the estimated numberof daily drug use incidents, i.e., λTop1 = 3.69,λTop2 = 2.82, and λRandom = 2.36.

4.2.1. Discovered Incidents Only

First, we begin with Assumption 3.4 in place stat-ing that we’ll only consider discovered incidentdata (i.e., wr = 0). This allows us to isolate thepart of the data that causes the feedback loop inorder to examine its effect.

The results, shown in the top left of Figure1, demonstrate that even if police sent to aneighborhood discover crime incidents accordingto the true crime rate (Assumption 3.3), theurn model will converge to only sending policeto the neighborhood with the most crime. Thisreplicates (within our urn model) the feedbackloop problems with PredPol found by Lumand Isaac (2016). Recall, from Lemma 4, thatskew occurs even if the difference in crime ratesbetween the two neighborhoods is small. Notethat while we included a notion of decay inour urn model in order to more closely modelPredPol, we found similar results under theurn model without decay.

4.2.2. Discovered and ReportedIncidents

Now, considering both reported and discoveredincident data, we repeat the experiments. Again,we’ll assume that both discovered and reportedincidents are reported at the underlying true

9

Page 10: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

rate (Assumption 3.3), and we’ll assume thatthese incidents are equally weighted, i.e., thatwd = wr = 0.5. The results shown in the bot-tom left of Figure 1 show that while the error inpolice deployment is not as great as if only dis-covered incidents are used, the urn still does notconverge to the correct rate. Here, it’s importantto note the strength of the assumption that in-cidents are reported at the true underlying rateand not influenced by police deployment - we sus-pect that this assumption helps this convergenceto be closer to (though still not the same as) thecorrect rate.

4.3. Evaluating the modified urn

Using this improvement policy to determinewhen to replace balls, we can now determine ifthe urns can learn the true crime rate despite theissue of observational bias. Again, using the esti-mated daily drug usage per region as the under-lying true crime rate and the historical incidentdata as the prior for the urn color distribution,we simulate the urn’s ability to find the relativecrime rates in two regions, the Top1 and Top2 in-cident regions and a Random region. As shownin the right of Figure 1, urns under this improve-ment policy converge to a distribution of colorsthat represents the true crime rate, whether us-ing only discovered incidents or both discoveredand reported incidents.

5. Fixing PredPol

5.1. Modifying PredPol in a black-boxmanner

The urn models we explore provide a justificationfor the observed feedback loop failures of Pred-Pol. But can we remedy PredPol itself us-ing the improvements described in Section 3.5?We first demonstrate how asymmetric feedbackaffects PredPol by simulating the decisions aprecinct might take after running it. We runPredPol’s prediction model (using the Lum andIsaac (2016) data and implementation), trainedon Oakland historical crime data, and generatecrime according to the drug usage rates describedabove.

At each simulation day, PredPol trains onthe previous 180 days of incident data, and pro-duces predicted crime rates rA and rB . The deci-

sion of where to send police is made probabilisti-cally, by a Bernoulli trial with p = rA(rA+rB)−1.This models the targeting effect of sending morepolice where more crime is expected, simulatinga typical use of PredPol (Mohler et al., 2015).

To counteract the effects of the feedback, wecan employ the same strategy as in Section 3.5.The key insight is that we need only filter theinputs presented to PredPol rather than try-ing to modify its internal workings. Specifically,once we obtain crime report data from the sys-tem, we conduct another Bernoulli trial withp = rO(rA + rB)−1, where rO is the predictedrate of the district we did not police that day,and only add the incidents to the training set ifthe trial succeeds. In other words, the more likelyit is that police are sent to a given district, theless likely it is that we should incorporate thosediscovered incidents.

5.2. Evaluating the PredPol simulationand its repair

Simulating the effects of PredPol on policingas described above, both before and after ourimprovement policy is applied, we compare thepolicing rates of region Top1 to Top2 and Top1 toRandom as before. Each simulation is repeated300 times and run for one year. As can be seen inthe top row of Figure 2, regular PredPol ratesfluctuate wildly over different runs, and do notconverge to the appropriate crime rates (markedwith the red dashed line). However, when the in-puts to PredPol are changed according to ourimprovement policy, PredPol’s prediction ratesappear to fluctuate around the correct crime ra-tio. Note that the process is still quite noisy,a further indication that PredPol generatescrime rate predictions that are still somewhat un-reliable.

In Section 3.5.2, we provided an analysis andcorrection for urn models based on more thanonly discovered incidents. We provide a similaranalysis for the mixed case in PredPol, shownalong the bottom row of Figure 2. Note thateven with a large number of reported incidents,PredPol seems to remain susceptible to run-away feedback. When the correction mechanismof Section 3.5.2 is applied to the (discovered only)incidents, PredPol appears to converge to theappropriate crime rate predictions.

10

Page 11: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

0.5

1.0

0.0

0.5

1.0

0.0

0 180 360 0 180 360

Top1 vs. Top2 Top1 vs. Random

r 1/(r

1+

r 2)

0.5

1.0

0.0

0.5

1.0

0.0

0 180 360 0 180 360

Top1 vs. Top2 Top1 vs. Random

r 1/(r

1+

r 2)

With improvement policyWithout improvement policy

Dis

cover

edO

nly

All

Inci

den

ts

0.5

1.0

0.0

0.5

1.0

0.0

0 180 360 0 180 360

Top1 vs. Top2 Top1 vs. Random

r 1/(r

1+

r 2)

0.5

1.0

0.0

0.5

1.0

0.0

0 180 360 0 180 360

Top1 vs. Top2 Top1 vs. Random

r 1/(r

1+

r 2)

With improvement policyWithout improvement policy

Figure 2: PredPol’s relative deployment to region Top1 versus Top2 or Random. Along the toprow, we use the model which only accounts for discovered incidents (those based on policehaving been deployed to an area). Along the bottom row, we use the model which accountsfor both discovered and reported incidents. Left: PredPol operating as usual. Right:discovered incident entries modified using our improvement policy. Police deploymentbased on underlying crime rates would send 56.7% of the force to Top1 instead of Top2and 61.0% of the force to Top1 instead of Random. These correct crime rates (indicatedwith a dashed red line) appear to be what PredPol converges to under the improvementpolicy.

6. Discussion and Limitations

In this paper we show that urn models can beused to formally model predictive policing as wellas indicate remedies for problems with feedback.We demonstrate this both formally and empir-ically. Our solution also suggests a black-boxmethod to counteract runaway feedback in pre-dictive policing by appropriately filtering the in-puts fed to the system.

Our results are not just a qualitative indicatorof problems with feedback. They also indicate ex-actly how the problem of runaway feedback canbe exacerbated: specifically as crime rates varybetween regions and as the model relies more andmore on discovered incident reports. Our resultsalso indicate that if crime rates are more or lessthe same between regions, then the problem offeedback is much less, and it might be possible togenerate reasonable predictions without explic-itly countering feedback loops (though the resultswill still be inaccurate).

There are many avenues that our analysis doesnot yet explore. Firstly, while we expect that oursolution generalizes to multiple regions (and in-deed the problems with feedback remain exactlythe same), there might be technical difficulties inworking with the much smaller probabilities wewill encounter. As an abstraction of predictivepolicing, an urn model suffices to capture feed-back issues, but does not account for potentiallyricher predictive systems that might use other in-formation (for example demographics) to makepredictions. Another interesting dimension thatis unexplored is the fact that different types ofcrime might have different reporting and discov-ery profiles, and might interact with each otherin a predictive model in complex ways.

One of the most crucial assumptions we make(and one that in fact is sympathetic to currentpredictive policing models) is that reported anddiscovered incident rates track the true crimerates. There is considerable evidence that crime

11

Page 12: Runaway Feedback Loops in Predictive Policingproceedings.mlr.press/v81/ensign18a/ensign18a.pdf · Runaway Feedback Loops in Predictive Policing where ti n represents the time of an

Runaway Feedback Loops in Predictive Policing

reporting is noisy and skewed by area and by typeof crime (Bialik, 2016). Once we remove thatassumption, the analysis becomes more compli-cated, and while the problems of runaway feed-back remain, the solutions might not continue towork. In this case, we would require better mod-els to describe how crime rates manifest them-selves in terms of reported and discovered inci-dents.

7. Acknowledgements

This paper would not have been possible withoutKristian Lum and William Isaac’s generosity insharing the code and data developed for (Lumand Isaac, 2016). Many thanks!

References

Richard A. Berk and Justin Bleich. Statisti-cal procedures for forecasting criminal behav-ior. Criminology & Public Policy, 12(3):513–544, 2013. ISSN 1745-9133. doi: 10.1111/1745-9133.12047. URL http://dx.doi.org/10.1111/1745-9133.12047.

Carl Bialik. How to make sense of conflicting,confusing and misleading crime statistics.https://fivethirtyeight.com/features/how-to-make-sense-of-conflicting-

confusing-and-misleading-crime-

statistics/, 2016. Visited 10/7/2017.

Ido Erev and Alvin E Roth. Predicting how peo-ple play games: Reinforcement learning in ex-perimental games with unique, mixed strategyequilibria. American economic review, pages848–881, 1998.

Daniel G Horvitz and Donovan J Thompson.A generalization of sampling without replace-ment from a finite universe. Journal of theAmerican statistical Association, 47(260):663–685, 1952.

Kristian Lum and William Isaac. To predict andserve? Significance, pages 14 – 18, October2016.

Hosam M Mahmoud and Basile Morcrette. Ex-actly solvable balanced tenable urns with ran-dom entries via the analytic methodology. Dis-

crete Mathematics & Theoretical ComputerScience, 2012.

George O. Mohler, Martin B. Short, Sean Ma-linowski, Mark Johnson, George E. Tita, An-drea L. Bertozzi, and P. Jeffrey Brantingham.Randomized controlled field trials of predictivepolicing. Journal of the American StatisticalAssociation, 110(512):1399 – 1411, 2015.

Inc. NorthPointe. Compas. http:

//www.northpointeinc.com/files/downloads/FAQ Document.pdf, 2012.

Robin Pemantle. A survey of random processeswith reinforcement. Probab. Surveys, 4:1–79,2007. doi: 10.1214/07-PS094. URL http://

dx.doi.org/10.1214/07-PS094.

Walt L Perry. Predictive policing: The role ofcrime forecasting in law enforcement opera-tions. Rand Corporation, 2013.

Henrik Renlund. Generalized polya urns viastochastic approximation. arXiv preprintarXiv:1002.3716, 2010.

U.S. DoJ via ICPSR. National crime victimiza-tion survey, 2014. United States Departmentof Justice. Office of Justice Programs. Bu-reau of Justice Statistics. [Distributor]: Inter-university Consortium for Political and So-cial Research (ICPSR), Aug. 27, 2015. URLhttp://doi.org/10.3886/ICPSR36142.v1.

U.S. HHS via ICPSR. National survey ondrug use and health, 2011. United StatesDepartment of Health and Human Services.Substance Abuse and Mental Health ServicesAdministration. Center for Behavioral HealthStatistics and Quality. [Distributor]: Inter-university Consortium for Political and So-cial Research (ICPSR), 2015. URL http:

//doi.org/10.3886/ICPSR34481.v4.

12