the essential role of empirical validation in legislative … · 2020-03-04 · kosuke imai...

21
The Essential Role of Empirical Validation in Legislative Redistricting Simulation Kosuke Imai Harvard University Quantitative Redistricting and Gerrymandering Conference Duke University March 3, 2020 Joint work with Ben Fifield, Jun Kawahara (Kyoto) and Chris Kenny Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 1 / 21

Upload: others

Post on 16-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

The Essential Role of Empirical Validation inLegislative Redistricting Simulation

Kosuke Imai

Harvard University

Quantitative Redistricting and Gerrymandering ConferenceDuke UniversityMarch 3, 2020

Joint work with Ben Fifield, Jun Kawahara (Kyoto) and Chris Kenny

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 1 / 21

Page 2: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Motivation

Congressional redistricting as a key element of American democracyInfluenced by political motives and partisan endsEarly proposals in 1960s: automated simulation as a transparent,objective, and unbiased method for redistricting

Resurgence of simulation methods over the last 20 yearsincreasing availability of granular data about votersrecent advances in computing capability and methods

Starting to be used in courts (e.g., MO, NC, and OH)

Do simulation methods can actually yield a representative sample ofall possible redistricting plans that satisfy required constraints?

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 2 / 21

Page 3: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Overview

Insufficient efforts have been made to empirically validate redistrictingsimulation methodsEric Lander in his amicus brief to the Supreme Court:With modern computer technology, it is now straightforward to gener-ate a large collection of redistricting plans that are representative of allpossible plans that meet the State’s declared goals (e.g., compactnessand contiguity)

Some used 25 precinct validation set of Fifield et al. (forthcoming)

We apply the computational method of Kawahara et al. (2017)1 efficiently enumerate all possible redistricting plans2 independently and uniformly sample from this population

Scales to a state with a couple of hundred geographical units1 enumeration: a large number of small validation sets2 sampling: a small number of medium-size validation sets

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 3 / 21

Page 4: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Redistricting as a Graph-partitioning Problem

v1

v2

v3

v4

v5

v6

(a) Original map

v1

v2 v4

v6

v5v3

e1

e2

e3

e4

e5

e6

e7

(b) Graph representation

v1

v3

v2 v4

v6

v5

e2

e4 e6

e7

(c) Induced subgraphs

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 4 / 21

Page 5: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Zero-suppressed Binary Decision Diagram (ZDD)

A data structure to efficiently represent a family of sets (Minato, 1993)

ZDD as a directed acyclic graphroot node e1: no incoming arc, represents an edge of the original graphterminal nodes: no outgoing arc, no correspondence to an edge of theoriginal graph

1 0-terminal 02 1-terminal 1

every non-terminal node has two outgoing arcs1 0-arc: dashed arc 99K removes edge2 1-arc: solid arc −→ retains edge

One-to-one correspondence:Graph partition: {e2, e4, e6, e7}The set of edges that belong to a directed path from the root node to1-terminal node and have an outgoing 1-arc

e1 99K e2 −→ e3 99K e4 −→ e5 99K e6 −→ e7 −→ 1

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 5 / 21

Page 6: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

e1

e2 e2

e3 e3 e3

e4 e4 e4 e4

e5 e5 e5 e5 e5 e5

e6 e6 e6 e6 e6 e6

e7 e7 e7 e7 e7 e7

0 1

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 6 / 21

Page 7: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Construction of ZDD

Starting with root node e1, create one outgoing 0-arc and oneoutgoing 1-arc from one node e` to the next node e`+1

Store the number of determined connected components or dcc foreach node: dcc = 1 for e5

e1 −→ e2 99K e3 99K e4 99K e5

{v1, v2} forms a district regardless of whether or not e5 is retainedCreate an arc into the 0-terminal node if dcc > p at any node ordcc < p at the final nodeKeep track of connected component number for each vertex of theoriginal graph

1 Start by setting comp[vi ]← i for i = 1, 2, . . . , n2 If we retain an edge between vi and vi ′ , then set

comp[vj ]← min{comp[vi ], comp[vi ′ ]} for any vj withcomp[vj ] = max{comp[vi ], comp[vi ′ ]}

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 7 / 21

Page 8: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

The Frontier-based Search

Frontier: the set of vertices of the original graph that are incident toboth a processed edge and an unprocessed edgeF0 = Fm = ∅ where m is the total number of edgesFrontier can be used to determine a connected component number

suppose there exists a vertex v such that v ∈ F`−1 but v /∈ F`

If there is no other vertex in F` shares the connected componentnumber, then comp[v ] is determined and dcc is incremented by 1

1

2

3

4

5

6

e1

e2

e3

e4

e5

e6

e7

F1

(a) Process edge e1; dcc = 0

1

2

1

4

5

6

F2

e1

e2

e3

e4

e5

e6

e7

(b) Process edge e2; dcc = 0

1

2

1

4

5

6

F3

e1

e2

e3

e4

e5

e6

e7

(c) Process edge e3; dcc = 0

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 8 / 21

Page 9: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

1

2

1

2

5

6

F4

e1

e2

e3

e4

e5

e6

e7

(d) Process edge e4; dcc = 0

1

2

1

2

5

6

F5

e1

e2

e3

e4

e5

e6

e7

(e) Process edge e5; dcc = 1

1

2

1

2

5

2F6

e1

e2

e3

e4

e5

e6

e7

F4

F4

(f) Process edge e6; dcc = 1

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 9 / 21

Page 10: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Node Merge for Computational Efficiency

Only required information = connectivity of vertices in F`−1

Merge multiple paths at node e` if dcc and the frontier F`−1 areidentical after renumbering to eliminate gaps

2

2

3

4

5

6

F2

e1

e2

e3

e4

e5

e6

e7

(a) Path e1 −→ e2 99K e3

3

2

3

4

5

6

F2

e1

e2

e3

e4

e5

e6

e7

(b) Path e1 99K e2 −→ e3

Merging is critical: 8× 8 lattice into 2 districts ≈ 1.2× 1011 partitionsCan encode the population parity and other information into ZDD prevents merging and hence does not scale

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 10 / 21

Page 11: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Enumeration and Independent Sampling

Every path from the root node to the 1-terminal node has one-to-onecorrespondence to a graph partitionEnumerate all the paths

1 start with the 1-terminal node2 count the number of unique paths at each node3 move upwards until the root node is reached

Independent sampling (Knuth 2011)Let c(e`) be the number of paths from the 1-terminal node to node e`Let e`0 and e`1 be the nodes pointed by the 0-arc and 1-arc of e`Store c(e`) for each node e`Conduct random sampling by starting with the root node and choosingnode e`1 with probability c(e`1)/{c(e`1) + c(e`0)}Probability of reaching the 0-terminal node is zero

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 11 / 21

Page 12: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Scalability of the ZDD Construction Algorithm

Randomly generate contiguous subsets of the New Hampshire map(327 precincts and 2 districts) that vary in size {40, 80, . . . , 200}Number of districts: 2, 5, or 10Cluster with 530 nodes, 48 cores and 180 GB of RAM per node

● ●●

● ●●

●●

● ●

●●

● ●

● ●

● ●

●●

●●

● ●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●

2 D

istr

icts

5 D

istr

icts

10 D

istr

icts

40 80 120 160 200

0

5

10

15

0

5

10

15

0

5

10

15

Number of Precincts in Map

Run

time

(hou

rs)

Finished Building ZDDRan Out of Memory(180GB Limit)

Runtime Scalability

● ●● ●● ●● ● ●● ●●● ●●

●●

●● ●● ●●● ● ●●●●

● ●● ●●● ●● ●Memory Limit

● ●● ●● ● ●●●

● ●● ● ●●

● ●●●

● ● ●●● ●

● ●●● ●● ●● ●●● ●● ●● ● ●●● ●● ●Memory Limit

●●● ●● ●● ●●●

●●●

●●●● ●

●● ●●

●● ●● ●● ● ●● ●●●●● ●●

●● ●● ● ●●●

●●●

● ●Memory Limit

40 80 120 160 200

0

50

100

150

0

50

100

150

0

50

100

150

Number of Precincts in Map

Mem

ory

Usa

ge (

GB

)

Memory Usage Scalability

● ●●●● ●● ●●● ● ●● ●●

●●

●● ●● ●●● ●● ●●●

● ●● ●●● ●●●Memory Limit

● ●● ●● ●● ●●

● ●●● ●●

● ●●●

● ● ● ●●●

● ● ●● ● ●●●●●● ●●●● ● ●●● ●●●Memory Limit

●●● ●● ●● ● ●●

●●●

●● ●●●

●●● ●

● ● ●● ●● ●●● ●●●

●● ●●●

●●● ● ● ●●● ●●

● ●Memory Limit

5 10 15 20

0

50

100

150

0

50

100

150

0

50

100

150

Max Frontier Size of Graph

Mem

ory

Usa

ge (

GB

)

Memory Usage Scales with Frontier Size

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 12 / 21

Page 13: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Validation through Enumeration

70 precinct validation set from Florida (� 25 validation set )8 hours on MacBook Pro laptop with 16GB and 2.8 GHz processorBuilding ZDD took less than a second: 44 million valid plans

25.750

25.775

25.800

25.825

25.850

25.875

−80.20 −80.16 −80.12

Longitude

Latit

ude

Population

0

5000

10000

70−Precinct Validation Map

< 0.01 Parity: 717060

< 0.05 Parity: 3678453< 0.10 Parity: 7866708

0

300,000

600,000

900,000

0.00 0.05 0.10 0.15 0.20

Distance from Population Parity

Num

ber

of P

lans

Distribution of Population Parity on 70−Precinct Map

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 13 / 21

Page 14: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Performance of RSG and MCMC Algorithms

Evaluate the performance of two common algorithms:1 Random-seed-and-grow (RSG): Cirincione et al. (2000); Chen and Rodden

(2013)2 Markov chain Monte Carlo (MCMC) Fifield et al. (2014); Mattingly and

Vaughn (2014)

Implemented via the redist packagetempering/discarding/reweighting for population constraintsRepublican dissimilarity index as a test statistic

No Equal Population Constraint 5% Equal Population Constraint 1% Equal Population Constraint

0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3 0.0 0.1 0.2 0.3

0

5

10

15

Republican Dissimilarity Index

Den

sity

Truth

MCMC

RSG

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 14 / 21

Page 15: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Many Small Validation Maps

Robustness to many different mapsNo separate tuning or convergence diagnostics for each mapSetup

1 200 independent 25-precinct sets from Florida2 5 million iterations, taking every 500th draw3 conduct the Kolmogorov-Smirnov test and record the p-value

●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●

●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●

●●●●●●●●

●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●

No Equal Population Constraint 20% Equal Population Constraint 10% Equal Population Constraint

0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00

0.00

0.25

0.50

0.75

1.00

Theoretical Quantile

Em

piric

al Q

uant

ile

● MCMC

RSG

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 15 / 21

Page 16: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Validation through Independent Uniform Sampling

Even if we can build ZDD, enumeration is computationally intensiveRandom sampling addresses this issue via Monte Carlo approximationIowa map: 4 districts

99 counties; no county is supposed to be split500 million independent drawsactual map has a population parity of less than 0.0001

Des Moines

Cedar Rapids

Congressional Districts of Iowa (2016)

District 1District 2District 3District 4

100 200 300 400Population (Thousands)

< 0.01 Parity: 0.00006%< 0.05 Parity: 0.00723%

< 0.10 Parity: 0.05746%

0.00%

0.02%

0.04%

0.06%

0.00 0.05 0.10 0.15Distance from Population Parity

Sha

re o

f Sam

pled

Pla

ns

Distribution of Population Parity on Iowa Map

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 16 / 21

Page 17: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Performance for the Iowa Validation Map

MCMC:8 chains with 250K iterations eachGelman-Rubin diagnostics indicates convergence after 30K iterations5% parity: 630K maps1% parity: 93K maps

RSG: 2 million independent draws

No Equal Population Constraint 5% Equal Population Constraint 1% Equal Population Constraint

0.00 0.03 0.06 0.09 0.00 0.03 0.06 0.09 0.00 0.03 0.06 0.09

0

10

20

30

40

50

Republican Dissimilarity Index

Den

sity

Truth

MCMC

RSG

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 17 / 21

Page 18: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

A New 250-Precinct Validation Map

28.00

28.25

28.50

28.75

29.00

29.25

−81.25 −81.00 −80.75 −80.50Longitude

Latit

ude

Population

0

5000

10000

15000

250−Precinct Validation Map

< 0.01 Parity: 1.95%

< 0.05 Parity: 10.08%< 0.10 Parity: 21.78%

0%

1%

2%

3%

0.00 0.05 0.10 0.15Distance from Population Parity

Sha

re o

f Sam

pled

Pla

ns

Distribution of Population Parity on 250−Precinct Map

The largest validation map taken from Florida2 districts total number of contiguous plans = 539

We uniformly sample 100 million partitions

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 18 / 21

Page 19: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Empirical Performance

MCMC:8 chains with 500K iterations eachGelman-Rubin diagnostics indicates convergence after 75K iterations5% parity: 3 million maps1% parity: 1.9 million maps

RSG: 4 million independent drawsNo Equal Population Constraint 5% Equal Population Constraint 1% Equal Population Constraint

0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 0.20 0.25

0

20

40

60

Republican Dissimilarity Index

Den

sity

Truth

MCMC

RSG

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 19 / 21

Page 20: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

Concluding Remarks

Increasing use of computational methods to generate redistrictingplans in legislatures and determine their legality in courtsScientific community must empirically validate the performance ofvarious proposed simulation methods

We apply the recently developed enumeration method1 more realistic validation maps including Iowa and a 250-precinct map2 an MCMC algorithm significantly outperforms a RSG algorithm3 the algorithm and validation maps will be made available

Ongoing work:1 consequences of various constraints other than contiguity and

population parity2 further scaling up the enumeration algorithm

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 20 / 21

Page 21: The Essential Role of Empirical Validation in Legislative … · 2020-03-04 · Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 20203/21. RedistrictingasaGraph-partitioningProblem

References

Fifield, Imai, Kawahara, and Kenny. (2020). “The Essential Role ofEmpirical Validation in Legislative Redistricting Simulation.” WorkingPaper

Fifield, Higgins, Imai, and Tarr. “Automated Redistricting SimulationUsing Markov Chain Monte Carlo.” Journal of Computational andGraphical Statistics, Forthcoming.

redist: Markov Chain Monte Carlo Methods for RedistrictingSimulation. available at CRAN

https://imai.fas.harvard.edu

Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 21 / 21