Heterogeneity in disease resistance and the impact of antibiotics in the US.

Abstract: The discovery of antibiotics—beginning in 1937 with the widespread usage of sulfa drugs—led to rapid

declines in mortality rates from bacterial infectious diseases; however, there is limited understanding of

the potential heterogeneity in these mortality impacts. Our primary hypothesis is that the impact of

antibiotics is moderated by a population’s inherent resistance mechanisms, from which more resistant

populations benefited less than more susceptible populations. To measure this heterogeneity, we use a

yearly panel of bacterial infectious disease deaths that covers most of the 20th century for the 48

contiguous US states. We find that states with higher levels of innate resistance, measured by

population-level genetic diversity within the human leukocyte antigen (HLA) system, have smaller

mortality responses from the discovery of antibiotics, suggesting area-level genetic endowments of

disease resistance and the discovery of medical technologies have acted as substitutes in determining

levels of health across the US. We then use this measure of resistance as an instrument for levels of pre-

1937 bacterial disease mortality to estimate the impacts of infectious disease reductions on a variety of

socioeconomic outcomes, showing previous results were understated.

C. Justin Cook

University of California-Merced

Jason M. Fletcher

University of Wisconsin-Madison

Introduction

At the beginning of the 20th century, deaths per 1000 live births in the United States within the first

month and first year of life were nearly 50 and 150, respectively. One century later, early life mortality

has been substantially reduced to 4.6 deaths in the first month and 6.9 deaths in the first year.1 Key

determinants of this decline are improvements in environmental factors such as nutrition and

sanitation, access to effective care, and innovations in medical technologies and information.2 While

these historical examples suggest the possibility that extending these improvements to new and

underserved populations could likewise result in large scale increases in population health, it is also

possible that extending these improvements could be less effective than predicted due to heterogeneity

in effects. One source of this heterogeneity is the extent to which new medications substitute for, or

complement, existing characteristics in the population and their environments. Specifically, the long

histories of interactions between people and bacteria around the world have shaped both the people

and the bacteria, allowing some populations to have developed “natural” resistances that undercut the

usefulness of medications that serve the same purpose.

This paper explores the interaction between two principal classes of determinants: specifically, that

medical technologies may interact with innate population characteristics. The focus of the analysis is

on infectious diseases, where we exploit a genetically determined population resistance phenomenon

akin to herd immunity that preceded modern medicine. By incorporating population genetics ideas into

a macroeconomic health analysis, we pursue the hypothesis that medical innovations may have

systematically differential effectiveness that depends on the population under study. Specifically,

medical innovations could be a substitute for population genetic characteristics in the production of

health, such that expanding the innovations to new populations may be less effective than expected.

This substitution effect may be particularly likely in cases of infectious diseases. This paper uses the

expansion of highly successful medical innovations for infectious diseases to test this hypothesis.

1 Considering contemporary developing countries, which constitute a general proxy for early 20th century America, the overwhelming majority of deaths within the first year of life are due to infectious disease, primarily diarrhea, pneumonia, and sepsis (WHO 2010).

2 For example, innovations in water sanitation are shown to reduce total mortality by 13% and infant mortality by 46% (Beach et al. 2016; Cutler and Miller 2004); Jayachandran et al. (2010) show evidence for a statistical break in bacterial infectious disease mortality in 1937 that is tied to the widespread usage of the sulfa drug Prontosil; and Hansen (2014) shows improvements in life expectancy across US states from the broad package of medicinal innovations in the mid-20th century.

In so doing, we focus on the discovery and widespread usage of antibiotics in the mid-20th

century. Following Jayachandran et al. (2010) and given ongoing testing that led to the mass-

manufacturing of penicillin only eight years later in 1945, we take 1937 as a broad intervention year for

the beginning of a general antibiotic era (Aminov 2010; Clardy et al. 2009; Powers 2004).3 We show that

the beneficial effect of antibiotics was not the same for all populations. Long running historical

differences in disease environments shaped the innate response of populations, leaving some

populations more susceptible than others. We propose that more susceptible groups are likely to

benefit more from the introduction of these effective medicines than those that were better able to

resist infection, leading to a heterogeneous effect of antibiotics.

In short, due to differences in the timing of agriculture and the availability of domesticated

animals, societies have had historically different exposure to infectious disease (Wolfe et al. 2007).

Evidence of this historical difference is seen in the large amount of diversity in the set of genes

associated with the recognition and disposal of foreign pathogens, the human leukocyte antigen (HLA)

system (Prugnolle et al. 2005). Genetic diversity within the HLA system provides resistance to

populations by slowing the spread of infectious pathogens and is shown to be strongly associated with

cross-country health outcomes in years prior to, but not after, the mid-20th century health innovations

(Cook 2015). For the current work, we argue that population aggregations (i.e., U.S. states) with more

genetic diversity for HLA genes, i.e., those that have a higher innate resistance, will have a smaller

relative benefit from the discovery of antibiotics in 1937. The intuition is that states with greater

HLA diversity will be relatively better off in regard to infectious disease mortality in periods prior to the

use of antibiotics and will therefore not have as large a decline in mortality from the introduction of this

treatment.

To test our hypothesis, we construct a state-level measure of genetic diversity within the HLA

system for 1937. We also collect a yearly state-level panel of death rates attributed to bacterial

infections for the 20th century. Preliminary evidence of our hypothesis is presented in Figure 1, which

separates the average bacterial mortality rate for those states above and below median HLA genetic

diversity. As shown, more susceptible states (low HLA diversity) have initially higher relative levels of

bacterial mortality in periods prior to 1937. But after 1937, both low and high HLA states are shown to

3 Class of antimicrobial agent and year of FDA approval (from Table 1 of Powers 2004): Penicillin, 1941; Aminoglycosides, 1944; Chloramphenicol, 1949; Tetracyclines, 1950; Macrolides/Lincosamides/Streptogramins, 1952; Glycopeptides, 1956; Rifamycins, 1957; Nitroimidazoles, 1959; Quinolones, 1962; and Trimethoprim, 1968.

have sharp declines in mortality and eventually converge at the same time to a new lower baseline

mortality rate. Given the initial disparity in mortality rates, the more susceptible states, or states with

low amounts of diversity within the HLA system, appear to have a more rapid decline in mortality post

treatment. It is this more rapid decline—tied to a state measure of HLA diversity—that we seek to

estimate.

Indeed, we do show a robust relationship that is consistent with our hypothesis. State-level HLA

diversity has a significant positive association with the post-1937 change in bacterial mortality rates.

The positive effect implies a larger decline in bacterial mortality for states with lower levels of diversity

within the HLA system. This finding is robust to controlling for the demographic composition of the

state, measures of infrastructure and income, and other measures of genetic diversity.

We extend our main finding by proposing that HLA diversity can be used to better measure the

socio-economic effects of medical innovations. Prior studies use pre-innovation mortality as a measure

of intensity for exogenous health innovations—see e.g., Acemoglu and Johnson (2007), Hansen (2014),

and Bhalotra and Venkataramani (2015). Pre-innovation mortality, however, is likely associated with

persistent institutional or cultural factors that can lead to bias in estimating the effect of health

innovations on post-innovation socio-economic measures (Bloom et al. 2014). We propose that pre-

innovation mortality rates are a function of exogenous, latent differences in resistance between

populations and potentially endogenous institutional/cultural factors—e.g., access to nutrition and care,

public works projects, etc. Therefore, we instrument pre-period mortality with our measure of HLA

diversity to address the portion of pre-period mortality determined by endogenous institutional factors.

The instrumented pre-period mortality measure then allows a causal assessment of medical innovations

on population-level socioeconomic and health outcomes. Using our framework, we find statistically

significant effects on a range of socioeconomic outcomes in contrast to previous OLS estimates in the

literature that suggest limited effects.
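The logic of the instrumental-variable step can be illustrated with a toy calculation. The sketch below is not the paper's estimation: the four data points, the variable names, and the assumed true effect of 1.0 are all hypothetical, chosen so that the confounder is exactly orthogonal to the instrument. It only shows how an instrument that is unrelated to an unobserved confounder can recover an effect that OLS misses.

```python
import numpy as np

# Hypothetical toy data (not from the paper).
z = np.array([-1.0, 1.0, -1.0, 1.0])   # instrument, a stand-in for HLA diversity
c = np.array([-1.0, -1.0, 1.0, 1.0])   # unobserved confounder, orthogonal to z
m = z + c                               # "pre-period mortality" depends on both
y = 1.0 * m - 0.5 * c                   # outcome; assumed true effect of m is 1.0


def slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))


# OLS of y on m mixes the effect of m with that of the confounder.
ols = slope(m, y)

# Two-stage least squares with a single instrument:
# first stage fits m on z, second stage regresses y on the fitted values.
m_hat = m.mean() + slope(z, m) * (z - z.mean())
iv = slope(m_hat, y)

print(ols, iv)
```

In this constructed example the OLS slope falls short of the assumed effect while the instrumented slope matches it; in real data the direction and size of any such gap depend on the unknown confounding structure.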

Background

HLA Diversity: Functions and Hypothesized Causes for Population Variation

Our mechanism of genetic resistance is a measure of genetic diversity within the set of genes comprising

the HLA system. The HLA (human leukocyte antigen) system is associated with the creation of proteins

that recognize and are responsible for removing foreign cells from the body. This system of genes is

hypothesized to have undergone recent selection (Sabeti et al. 2006), and this selection is not for the

uniformity of variants that provide a particular benefit; rather, the selection of the HLA system is one for

diversity—the set of genes comprising the HLA system being one of the most diverse regions in the

human genome (Jeffrey and Bangham 2000; Meyer et al. 2017). Importantly, genetic diversity within the

HLA system is a broad measure of immunity that is not strictly tied to a particular infectious pathogen.

HLA diversity provides resistance in a population by increasing the variety of immune responses

within the population. The measles virus provides an illustrative example: Obtaining the virus from a

(genetically similar) relative increases the likelihood of death from the virus twofold compared to

obtaining the virus from a (genetically distant) unrelated individual (Garenne and Aaby 1990). For

individuals within the population, rare variants (corresponding to rare immune responses) are favorable,

in that pathogens are likely to adapt and overcome common responses. Having a rare immune response

would therefore be beneficial; hence, the rare variant would increase in frequency.4

The roots of population differences in the level of genetic diversity within the HLA system are

tied to the out-of-Africa migration and the Neolithic Revolution. Serial founder effects from the

migration out of East Africa are associated with overall genetic diversity of a population, including the

level of genetic diversity within the HLA system (Ashraf and Galor 2013, Prugnolle et al. 2005, Qutob et

al. 2012, Ramachandran et al. 2005).5 Additional variation in HLA heterozygosity is tied to the

differential timing and composition of the transition to agriculture. The presence of domesticated

animals and the large, dense, and sedentary populations that could only be achieved after agriculture

provided the basis for an epidemiological transition that led to a large increase in the number of

infectious diseases (Cook 2015).

Data and Empirical Methodology

State-Level HLA Heterozygosity

Our primary analysis will be at the state-level within the US, which allows us to avoid country-level

confounders that are likely to bias our estimations and provides quality improvements in many of

4 This is only one possible mechanism for the selection of diversity within the HLA system. For a summary of others, please see Spurgin and Richardson (2010). For our purposes, the precise cause of diversity within the HLA system is not important.

5 A negative linear relationship exists between a population’s level of genetic diversity and the distance the population is along historic migration routes from East Africa. This relationship is due to a serial founder effect, in which emigrating (or founding) populations contain a subset of the genome of the origin population. As a result, subsequent emigrations reduce genetic diversity along migration routes. This is discussed in greater detail in Ashraf and Galor (2013).

the health and economic variables we use. As discussed in greater detail in Cook (2015), the measure of

HLA diversity is derived from the Allele Frequency Database, or ALFRED (Kidd et al. 2003). ALFRED

contains a wide array of genetic data for approximately 50 anthropologically defined ethnicities,

covering all continents in which humans live. Our focus is solely on genetic variants within the HLA

system. The HLA system is a collection of 239 genes located on the sixth chromosome (Shiina et al.

2004). From ALFRED, we are able to obtain data for 156 different single nucleotide polymorphisms, or

SNPs, for each of 51 distinct ethnicities. A SNP is a single change along a strand of DNA. These genetic

variants, or alleles, are then used to calculate the measure of genetic diversity specific to immune

function: expected heterozygosity.

Expected heterozygosity is a commonly used measure from population genetics that measures

the probability that two individuals differ in their genetic variant at a particular locus (or SNP in our

case). Formally, expected heterozygosity is defined as:

$$H_{exp} = 1 - \frac{1}{m} \sum_{l=1}^{m} \sum_{i=1}^{k_l} p_i^2 \tag{1}$$

where $p_i$ represents the fraction of allele $i$ at locus $l$ within each population (ethnicity in our case), $k_l$ is the number of alleles at locus $l$, and expected

heterozygosity is the average across the $m$ loci (the 156 SNPs).
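The definition above can be sketched in a few lines of code. The function name and the allele frequencies below are hypothetical and are not taken from ALFRED; the example assumes three biallelic SNPs purely for illustration.

```python
from typing import Sequence


def expected_heterozygosity(loci: Sequence[Sequence[float]]) -> float:
    """Average over loci of 1 - sum_i p_i^2.

    Each inner sequence holds the allele frequencies at one locus
    and is expected to sum to 1.
    """
    if not loci:
        raise ValueError("need at least one locus")
    total_homozygosity = 0.0
    for freqs in loci:
        if abs(sum(freqs) - 1.0) > 1e-9:
            raise ValueError("allele frequencies at a locus must sum to 1")
        total_homozygosity += sum(p * p for p in freqs)
    return 1.0 - total_homozygosity / len(loci)


# Hypothetical frequencies for three biallelic SNPs in one population.
example_loci = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
h = expected_heterozygosity(example_loci)
print(h)
```

A locus with evenly split alleles contributes the most diversity, while a locus fixed at a single allele contributes none, so the score rises with the evenness of allele frequencies across the sampled SNPs.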

Expected heterozygosity scores are calculated for each ethnicity, which are then aggregated into

the country-level measure by taking the weighted average of each ethnic-specific heterozygosity score.

Ethnic compositions, or the fraction of a country’s population attributed to each ethnicity, are found

from Alesina et al. (2003). Of note, the 51 ethnicities in ALFRED are not all directly matched to an

identical ethnicity from Alesina; rather, language similarities are used to match the ethnicities of ALFRED

to closely matched ethnicities within Alesina. This method yields data for roughly 175 countries, of

which 131 are used within the analysis of Cook (2015). However, in our analysis, instead of using

country-level data, we will focus on a single country and examine variation across states in the US. Thus,

we will need to extend the methods of calculating country-level measures of genetic resistance found in

the literature to allow state-level measures. To do so, we use data from the 1980, 1990, and 2000 US

Census’s 5% state sample, which is a 1-in-20 random sample of the US population (Ruggles et al. 2017).

The 1980 Census is the first available census that records each respondent’s self-reported ancestry. We

then match this ancestry to either a specific ethnicity or country for which we have prior measurement

of HLA heterozygosity.6 This measure constructed from the 1980/1990/2000 Census ancestry

categories, however, would capture periods that are after the proposed intervention date of 1937.

Therefore, to account for the pre-period level of genetic resistance, we use reported state of birth and

age to construct a measure of state-level HLA heterozygosity for 1937. In other words, we aggregate by

state of birth, rather than state of residence, for respondents born before or during 1937. This results in

a 1937 measure of genetic resistance for the 48 contiguous US states, our primary measure of genetic

resistance. Our primary measure of diversity within the HLA system considers reported ancestry groups

to be segregated (i.e., no intermarriage), so that a state’s HLA diversity is simply a weighted average of

the HLA diversity from the state’s ancestral composition.7 Reported ancestries along with the respective

match to Cook’s HLA heterozygosity score are reported in the appendix. Summary statistics for the US

and by region are given in Table 1. Additionally, relative state-level HLA diversity comparisons are

shown in Figures 1 and 2.
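The aggregation described above can be sketched as follows. This is a simplified illustration, not the authors' code: the ancestry labels, heterozygosity scores, and respondent records are hypothetical, and census person weights are ignored. It keeps respondents born in or before 1937, averages scores for respondents reporting two ancestries, and then averages by state of birth under the segregated-ancestry assumption.

```python
from collections import defaultdict

# Hypothetical HLA heterozygosity scores by ancestry (illustrative values only).
hla_by_ancestry = {"A": 0.30, "B": 0.34, "C": 0.38}

# Hypothetical records: (state of birth, birth year, reported ancestries).
respondents = [
    ("WI", 1920, ["A"]),
    ("WI", 1930, ["B"]),
    ("WI", 1950, ["C"]),       # born after 1937, so excluded
    ("CA", 1937, ["A", "C"]),  # two ancestries: scores are averaged
    ("CA", 1910, ["C"]),
]


def state_hla_1937(records, scores, cutoff=1937):
    """Unweighted mean ancestry score by state of birth for cohorts born by the cutoff."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for state, birth_year, ancestries in records:
        if birth_year > cutoff:
            continue
        person_score = sum(scores[a] for a in ancestries) / len(ancestries)
        sums[state] += person_score
        counts[state] += 1
    return {state: sums[state] / counts[state] for state in sums}


result = state_hla_1937(respondents, hla_by_ancestry)
print(result)
```

Grouping by state of birth rather than state of residence is what ties the measure to the pre-1937 population; a fuller version would also apply the census sampling weights.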

Outcome and Control Variables

Vital statistics data are from the annual Vital Statistics of the United States. The mortality rate from

each specified infectious disease is by state. Data for years 1900-1930 are from Grant Miller’s NBER

data, and we collected vital statistics data from 1931-2000. The result is an unbalanced panel of the

contiguous US states that covers much of the 20th century (we note instances of missing data in our

results tables below). The infectious diseases of interest include typhoid fever, scarlet fever, pertussis

(whooping cough), tuberculosis, diphtheria, flu and pneumonia, diarrhea and enteritis, syphilis, and

maternal mortality (a proxy for puerperal fever).

Our base specification comprises a difference-in-differences framework that includes state,

year, and year-by-census region fixed effects, and further controls are introduced piecemeal in three

sets. The first set of controls attempts to account for additional unobserved state-year variation in the

bacterial mortality rate by including the residual mortality rate (i.e., the total mortality rate less bacterial

mortality) and the per year count of bacterial infections used to calculate the bacterial mortality rate.

Additional controls, which attempt to account for a differential post-1937 trend, include a set of

demographic controls—the fraction of a state’s population that is black, the fraction of a state’s 1937

population that is foreign born, the urbanization rate in 1937, and a measure of ethnic fractionalization

6 The average is taken for individuals that report two different ancestries.

7 Table 8 considers alternatives to this assumption of segregated ethnicities/ancestral groups. In so doing, we calculate a measure of HLA diversity based upon fully mixed ethnic groups by creating a weighted average of each individual genetic variant used to compute expected heterozygosity.

based on the census-level reported ethnicity—and a set of infrastructure controls—schools per square

mile in 1937, hospitals per square mile in 1937, physicians per capita in 1937, and World War II military

spending. Further definitions and sources for all variables can be found in the Variable Appendix.

Empirical Strategy

The first step of our empirical analysis is concerned with verifying both the treatment—the use of sulfa

drugs and other antibiotics after 1937—and its intensity—measured by genetic diversity within the HLA

system. To verify the treatment date, we use within-state estimation to show declining mortality rates

from bacterial infections after the proposed date of 1937.8 In establishing a differential state-level

response from genetic diversity, we will pool years in the pre and post periods. The idea is that our

measure of innate resistance has a hypothesized significant effect on bacterial infections before

antibiotics that becomes significantly weaker in the post periods.

Secondly, we will pursue difference-in-differences specifications in the spirit of Acemoglu and

Johnson (2007) and Hansen (2014).9 In so doing, we compare the relative change in mortality in the

post-innovation period (to the pre-innovation period) between states that were either more or less

genetically susceptible to infectious disease. HLA heterozygosity measures the intensity of treatment.

States with greater levels of diversity within the HLA system will be more resistant to mortality from

infectious disease in the pre-innovation period and are hypothesized to benefit less from the medical

innovations; the opposite being true for states with relatively low HLA heterozygosity. And in order to

eliminate bias from endogenous usage of the treatment (i.e., the newly developed antibiotics), we will

use a common post-innovation treatment date for all states, 1937.

Formally, our estimation is given by:

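$$
\begin{aligned}
\ln \text{Bacterial Mort}_{it} ={}& \beta_1 \ln \text{HLA het}_i \times I_t^{post} + \beta_2 \text{Count of diseases}_{it} + \beta_3 \ln \text{Residual Mort}_{it} \\
&+ \sum_j \beta_j X_i \times I_t^{post} + \sum_r \beta_r R_i \times \gamma_t + \gamma_c + \gamma_t + \varepsilon_{it}
\end{aligned}
\tag{2}
$$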

where $i$ indexes the contiguous US states and $t$ indexes time, i.e., each year between 1900 and

2000. The natural log of our primary measure of innate resistance is denoted by $\ln \text{HLA het}$, and $I_t^{post}$

8 Extensive analysis on the use of 1937 as a post-sulfa drugs intervention date is given by Jayachandran et al. (2010), who use a number of strategies to verify 1937 as the treatment year.

9 Our difference-in-differences specification is also partially derived from Nunn and Qian (2011), which is similar in its intensity-of-treatment framework.

is an indicator for the post-innovation period, i.e., an indicator for the years 1937-2000. State and year

fixed effects are represented by $\gamma_c$ and $\gamma_t$, respectively, and $R_i \times \gamma_t$ denotes the year-by-census-region effects. To account for differential characteristics that

may be associated with HLA diversity and the corresponding change to bacterial mortality, time

invariant controls (c.1937) interacted with the post-1937 indicator are given by $X_i \times I_t^{post}$. While

coverage exists for most years, data for some bacterial infections are not available in every year.

We therefore control for the number of bacterial infections included in the construction of bacterial

mortality. We also control for the yearly state specific overall mortality rate less bacterial infections.

This is done to account for unobserved state specific trends that may also influence bacterial mortality.

Our focus is the heterogeneity in the decline in mortality following the use of antibiotics.

Therefore, our coefficient of interest is represented by 𝛽1 above and is hypothesized to be positive. The

positive coefficient implies that states with higher HLA diversity had a smaller decline in mortality

following the introduction of antibiotics in 1937. The unadjusted relationship is presented in Figure 1,

which shows the decline in bacterial mortality for states above and below median HLA diversity. This

figure represents our primary hypothesis by showing that more diverse states had a lower initial level of

bacterial mortality, resulting in a smaller decline post-1937.
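A stripped-down version of this specification can be sketched numerically. The panel below is hypothetical and noise-free, the assumed coefficient of 2.0 is not an estimate from the paper, and the sketch omits the count of diseases, residual mortality, the interacted controls, and the region-by-year effects. It only illustrates that the coefficient on the interaction is identified from cross-state differences in the pre/post change once state and year fixed effects are included.

```python
import numpy as np

# Hypothetical, noise-free panel: 6 states observed yearly from 1930 to 1945.
n_states = 6
years = np.arange(1930, 1946)
ln_hla = np.linspace(-1.2, -0.8, n_states)   # assumed ln HLA het_i
state_fe = np.linspace(0.5, 1.5, n_states)   # gamma_c
year_fe = -0.05 * (years - years[0])         # gamma_t, a common downward trend
post = (years >= 1937).astype(float)         # I_t^post
beta1_true = 2.0                             # assumed value, for illustration only

# Long format: one row per state-year.
s_idx, t_idx = np.meshgrid(
    np.arange(n_states), np.arange(len(years)), indexing="ij"
)
s_idx, t_idx = s_idx.ravel(), t_idx.ravel()
interaction = ln_hla[s_idx] * post[t_idx]
ln_mort = state_fe[s_idx] + year_fe[t_idx] + beta1_true * interaction

# Regress on the interaction plus full sets of state and year dummies.
# The dummy sets are jointly collinear, so lstsq returns a minimum-norm
# solution; the interaction coefficient is still uniquely determined.
X = np.column_stack(
    [interaction, np.eye(n_states)[s_idx], np.eye(len(years))[t_idx]]
)
coef, *_ = np.linalg.lstsq(X, ln_mort, rcond=None)
beta1_hat = coef[0]
print(beta1_hat)
```

Because the level of HLA diversity is absorbed by the state effects and the common post-1937 shift by the year effects, only the differential change across states with different diversity levels remains to pin down the coefficient of interest.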

Results

Preliminary Analysis

The first step of our analysis is to verify declining mortality from bacterial infections after 1937. This is

shown in Table 2, which regresses the mortality rate from a specified infectious disease on a post-1937

indicator while including state fixed effects and a time trend and its interaction with the post-1937

indicator. As shown, the post-1937 indicator typically has a statistically significant negative effect on

mortality rates for the bacterial infections considered.10
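One way to write the within-state specification described above is sketched here. The symbols $\delta$, $\theta$, and $\lambda$ are labels introduced for illustration, and whether the dependent variable enters in logs or levels is not stated in the text, so this should be read as an assumption-laden summary rather than the exact estimating equation of Table 2.

$$\text{Mort}^{d}_{it} = \delta I_t^{post} + \theta\, t + \lambda\,(t \times I_t^{post}) + \gamma_c + \varepsilon_{it}$$

Here $d$ indexes the specified disease and $\gamma_c$ is the state fixed effect. A negative $\delta$ corresponds to a downward level shift in the post-1937 period, conditional on the trend terms.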

Table 3 pools all considered bacterial infections across years in the pre-1937 and post-1937

periods. Our hypothesis is that our state-level measure of HLA heterozygosity has a strong negative

association with mortality from infectious disease in the pre-innovation period, and that this

relationship weakens once antibiotics begin to be used in the mid-20th century.11

10 The exception is syphilis, which is shown to have an increase post-1937. This is due to a combination of a relatively short pre-period, the inclusion of time trends, and a post-1937 spike in syphilis deaths. Syphilis was not treatable by sulfa drugs; rather, it was treated following the introduction of penicillin 8 years later (Davenport 2012).

11 This is similar to the country-level analysis in Table 4 of Cook (2015).

For the pre-1937 period, a strong negative association exists between genetic diversity within

the HLA system and bacterial mortality. This negative relationship implies that a higher level of HLA

diversity (i.e., more resistant) is associated with a lower level of mortality from bacterial infections.

Using the coefficient from column (1), a standard deviation increase in HLA diversity is associated with

roughly a 20 percent decline in bacterial mortality rates.

For years after 1937, however, the point estimate of the coefficient of HLA diversity is about half

of the pre-1937 estimate. A further illustration of the diminishing role of HLA diversity following 1937 is

given by Figure 3, which plots the year-by-year coefficient estimate of our state-level HLA diversity on

bacterial mortality. As shown, HLA diversity has a statistically significant negative association with

bacterial mortality in early years. The effect of HLA diversity, however, becomes statistically

indistinguishable from zero in more contemporary periods. This supports our arguments for a reduced

role of innate resistance in the post-antibiotic period. To estimate this contemporary null effect,

columns (3) and (4) equally divide the post-1937 period. As shown, the significant effect of column (2) is

driven solely by the years immediately after 1937, a time in which additional antibiotics are being

introduced.

Table 4 repeats the within-state estimation of Table 2, replacing the simple post-indicator with

its interaction with HLA diversity. Again, our hypothesis is that the coefficient of HLA diversity

interacted with the post-1937 indicator will be positive, implying that states with lower levels of HLA diversity

experienced larger declines in bacterial mortality following the 1937 intervention. As hypothesized, the

interaction between the post-1937 indicator and HLA diversity is generally positive and statistically

significant at conventional levels.

Our baseline analysis bundles all available bacterial infections. This is given by column (10) of

Table 4. Regressing mortality from all bacterial infections on our primary variable of interest produces the expected positive

coefficient that is significant at the 1-percent level. From this coefficient and at the mean-HLA diverse

state, a standard deviation decline in HLA diversity is associated with approximately a 10% decline in

bacterial mortality following the 1937 intervention. Those states with low levels of HLA diversity are

hypothesized (and shown in Table 3) to have a lower innate resistance that leads to higher levels of

mortality in the absence of medicine. Following the innovations of 1937 and the subsequent

innovations of penicillin, etc., the initial gap in bacterial mortality rates between the low and high HLA

diverse states begins to close, being virtually eliminated by 1960.

Baseline Results

Our baseline difference-in-differences estimation is given in Table 5. All bacterial infections listed in

Table 4 are summed within year to create our bacterial mortality measure. As outlined by Equation 2,

our primary hypothesis is that the coefficient on the natural log of HLA heterozygosity interacted with

the post-1937 indicator is positive, implying a sharper decline in mortality for states with low HLA

diversity, or a low level of resistance.

Column (1) gives the bivariate difference-in-differences estimation, controlling only for state,

year, and year-by-division effects. A positive and statistically significant coefficient of interest is

estimated. This estimated coefficient remains similar throughout Table 5, which adds our baseline set of

controls piecemeal. The count of bacterial infections and the state-year residual mortality rate are

included in column (2). Demographic variables—i.e., the mean 1932-1936 fractions of the black

population, foreign born population, urbanization rate, and ethnic fractionalization—are included in

column (3), and infrastructure measures—i.e., schools per mile, hospitals per mile, education

expenditures per capita, physicians per capita, total World War II spending (c.1940-1945), and initial

income levels—are included in column (4). All controls are included in column (5) with little effect on

the magnitude or significance of the coefficient of HLA diversity.

To show evidence that HLA heterozygosity impacts this process solely through resistance to

infectious pathogens, Table 6 considers a falsification test, using mortality from all other causes as the

dependent variable. No significant relationship exists between other causes of mortality and HLA

heterozygosity, providing further support for HLA diversity acting only as a resistance mechanism to

infectious pathogens.

Robustness to Ashraf and Galor’s Genetic Diversity

As argued and shown by Ashraf and Galor (2013; hereafter AG), genetic diversity is associated with

economic development. This is further shown in the US by Ager and Brueckner (2016). It is possible that

our measure of genetic diversity is simply capturing the effect of AG’s overall diversity measure. Therefore, Table 7 controls

for a state’s overall level of genetic diversity. To create AG’s measure of genetic diversity, we match

the self-reported ancestry to a country in AG’s data. Then we take the state-level weighted average of

this diversity measure; this is identical to our calculation of state HLA diversity.

Panel A includes AG’s overall diversity interacted with the post-1937 indicator into our baseline

analysis of Table 5. As shown, the point estimate of the coefficient is reduced in magnitude and the

standard error increases, but the coefficient of HLA diversity is generally statistically indistinguishable

from the estimate when omitting AG’s diversity measure. This is primarily due to inflated standard

errors for the coefficient of interest from the inclusion of AG’s measure of genetic diversity. AG’s

measure of genetic diversity is due to serial founder effects in the migration out of Africa. As shown in

Table 2 of Cook (2015), this pre-agricultural base of genetic diversity has a strong positive association

with the amount of genetic diversity within the HLA system. Indeed, the correlation coefficient between

the two measures is 0.74. Given this close relationship between AG’s overall diversity and diversity

within the HLA system, high collinearity is expected.

As an alternative way to account for the overall level of diversity, Panel B regresses bacterial

infectious disease mortality on the ratio of a state’s HLA diversity to the state’s overall measure of

genetic diversity. The effect of this ratio is mostly statistically significant at conventional levels and is in

line with the previous estimations of Table 4.¹² More precisely, states with a higher ratio of

HLA diversity to overall diversity within the genome generally experienced a smaller decline in

bacterial mortality rates following the intervention of 1937.

Alternative Strategies for Aggregating State Ethnicities

In calculating HLA diversity, we simply take the weighted average of each ancestral group’s HLA

diversity. This assumes that all ancestral groups within a state are segregated. An alternative approach

is to take the weighted average for the frequency of each genetic variant and then calculate expected

heterozygosity from these state-level gene frequencies. This method assumes that populations are

completely mixed. We consider these two methods of calculating HLA diversity to be two extremes on

the spectrum of ethnic interactions.13
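
The two aggregation extremes can be sketched as follows. This is a minimal illustration that assumes a single locus and the standard expected-heterozygosity formula (one minus the sum of squared allele frequencies); the function names, population shares, and allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def segregated_diversity(shares, group_freqs):
    """Weighted average of each ancestral group's own heterozygosity."""
    return sum(w * expected_heterozygosity(f) for w, f in zip(shares, group_freqs))

def mixed_diversity(shares, group_freqs):
    """Heterozygosity computed from population-weighted (pooled) allele frequencies."""
    n_alleles = len(group_freqs[0])
    pooled = [sum(w * f[a] for w, f in zip(shares, group_freqs))
              for a in range(n_alleles)]
    return expected_heterozygosity(pooled)

# Hypothetical state: two ancestral groups, two alleles at one locus.
shares = [0.7, 0.3]
group_freqs = [[0.8, 0.2], [0.3, 0.7]]
seg = segregated_diversity(shares, group_freqs)   # 0.7*0.32 + 0.3*0.42 = 0.35
mix = mixed_diversity(shares, group_freqs)        # 1 - (0.65**2 + 0.35**2) = 0.455
print(seg, mix)
```

Because expected heterozygosity is concave in allele frequencies, the mixed score can never fall below the segregated score, and the gap widens when groups with dissimilar allele frequencies each hold a sizable population share.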

In short, when considering mixed ethnicities, states with larger minority populations tend to

have large increases in measured diversity within the HLA system. For example, Louisiana has the

highest amount of diversity for the mixed ethnic score but is 32nd when considering the segregated

score. Indeed, the highest 8 scores for the mixed HLA diversity score belong to states in the South.

Therefore, when considering the mixed score it is imperative to control for region (col. 2 and 5) and the

fraction of the population that is black (col. 3 and 5).

12 The p-value for the coefficient of the ratio of HLA diversity to AG’s diversity is less than 0.15 in columns (3) and (5).

13 The correlation coefficient between the two measures of HLA diversity is 0.37.

Panel A of Table 8 replaces our primary measure of HLA diversity, which considers

ethnic/ancestral populations to be segregated, with a measure that considers ethnic populations to be

fully mixed. The coefficient of mixed HLA diversity is insignificant and of the opposite sign to that expected,

except when accounting for census region. Once census regions are controlled for, the effect of HLA diversity

is similar (though slightly larger in magnitude) to the estimates of our primary segregated measure.

Panel B splits the difference between the two extremes of our assumptions of ethnic

mixing/segregation by taking the average between our primary segregated measure and the mixed

measure of Panel A. This average yields more consistent, statistically significant coefficients that are

slightly larger in magnitude than our baseline segregated measure.

Instrumenting Pre-Period Bacterial Mortality

A common strategy in evaluating the impacts of health innovations is to perform difference-in-

differences estimation while using pre-innovation health outcomes as a measure of spatial variation to

measure intensity of the innovation.14 The idea is that the effect of the treatment will be larger for

areas that will directly benefit more from the treatment. In the current setting, the effect of sulfa drugs

(and antibiotics generally) will be larger for states with higher bacterial mortality rates in the years

preceding the intervention.

As noted by Bloom et al. (2014), however, pre-period health outcomes are not randomly

allotted and may be associated with unobservable factors that could lead to bias in estimating the

treatment effect for post-innovation SES outcomes. For example, the US South has the highest average

bacterial mortality rate prior to the 1937 intervention date.15 The factors that determine this higher

level for the South are also likely correlated with future growth in SES outcomes. In part, these factors

can be controlled for (e.g., the inclusion of pre-period income in our base analysis), but hard-to-measure

institutional or cultural factors may remain that could be associated with slowed human capital

accumulation, a slower demographic response, and reduced growth in income and wealth. When not

accounting for this source of bias, the association between pre-period health differences and post-

period growth in SES measures is muted.

14 See, for example, Acemoglu and Johnson (2007), Bhalotra and Venkataramani (2015), Bleakley (2007), Hansen (2014), and Hansen and Strulik (2017).

15 For 1932-1936 mean bacterial mortality rates, the average for the South is 256 compared to 181 for the Northeast, 246 for the West, and 175 for the Midwest.

HLA diversity can be used to separate quasi-random, latent causes of pre-period

health from institutional/cultural/economic factors. Pre-period bacterial mortality rates are a function

of population resistance/susceptibility—measured by HLA diversity—and a collection of

institutional/cultural/economic factors—observable and unobservable—that are tied to bacterial

mortality through nutrition, public works projects, environmental conditions, etc. By estimating the

change to pre-period bacterial mortality rates tied only to the innate ability of the population to resist

infectious disease, we in effect remove the bias attributable to institutional/cultural/economic factors,

leading to stronger statistical associations between pre-period health and post-innovation SES

outcomes. This is what we find in Tables 9 and 10.
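
The logic of this argument can be illustrated with a simulated cross-section. This is a minimal sketch, not our estimation: it collapses the panel interaction to a single cross-sectional regressor, uses a far larger sample than 48 states so the contrast is visible, and every name and parameter (`z`, `u`, `true_beta`, the coefficients) is a hypothetical stand-in. The unobserved factor `u` raises pre-period mortality and lowers later SES growth, which pulls OLS towards zero, while the instrument `z` affects the outcome only through mortality.

```python
import numpy as np

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    x_c, y_c = x - x.mean(), y - y.mean()
    return float(x_c @ y_c / (x_c @ x_c))

def iv_slope(z, x, y):
    """Just-identified 2SLS with one instrument: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                  # stand-in for HLA diversity (instrument)
u = rng.normal(size=n)                  # unobserved institutional/cultural factors
mortality = -1.0 * z + 1.0 * u + rng.normal(scale=0.5, size=n)
true_beta = 1.0                         # true effect of pre-period mortality on SES gains
ses_gain = true_beta * mortality - 0.8 * u + rng.normal(scale=0.5, size=n)

b_ols = ols_slope(mortality, ses_gain)  # attenuated by the omitted factor u
b_iv = iv_slope(z, mortality, ses_gain) # uses only variation driven by z
print(round(b_ols, 2), round(b_iv, 2))
```

Under these assumptions the OLS slope converges to roughly 0.64 while the instrumented slope recovers a value near 1.0, mirroring the direction of the panel differences described in the text.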

Table 9 repeats our base estimating strategy replacing the interaction between the post-1937

indicator and state HLA diversity with a similar interaction with pre-period bacterial mortality rates. Our

focus is on differential effects on a range of SES outcomes from instrumenting pre-period health with

HLA diversity. Panel A gives OLS estimates and Panel B provides 2SLS estimates; the outcomes, by

column, are the bacterial mortality rate, the birth rate, years of schooling, worker

experience, population, and real income.16

Column (1) of Panel A follows the spirit of the previous literature by estimating the differential

decline in bacterial mortality from pre-period health differences. As expected, states with higher pre-period

mortality rates experienced a sharper decline following the innovation of antibiotics. When

considering SES outcomes, however, pre-period health shows no statistically significant differential

effect. This is seen in columns (2)-(6), where the initial bacterial mortality is

insignificantly related to state-level changes in the birth rate, years of schooling, worker experience,

population, or income. In contrast, the estimates of Panel B, which instruments initial bacterial

mortality, are shown to increase in absolute magnitude for all SES outcomes (col. 2-6) while not

changing the association with declines in bacterial mortality (col. 1). This is supportive of our hypothesis

that the estimated effect of un-instrumented pre-period health is biased towards zero. Notably, the

effect of initial bacterial mortality becomes statistically significant for birth rates and measures of

human capital (col. 2-4), while remaining statistically insignificant for population and income.

Our estimations suggest that an important channel of medicinal innovations in shaping future

income (or output) is through increases in human capital. The estimates of Panel B of Table 9 are

16 The natural log of all outcomes is used in Table 9. The Variable Appendix gives further discussion and sources for all outcome variables.

supportive of a Beckerian framework, where the reduction in infectious diseases, which primarily affect

the young (and old), reduces the risks associated with investing in children. These early-life human

capital investments, however, may be difficult to detect in our aggregate state-level analysis.

Therefore, to better measure the cohort effects across the life-course of the 1937 intervention,

we examine birth-year differences in effects, similar to that of Bhalotra and Venkataramani (2015). In so

doing, Table 10 mirrors the analysis of Table 9 but focuses attention on examining the impacts of

medical interventions as a “cohort” (or birth year) phenomenon rather than a “period” (or survey year)

phenomenon. To implement this approach, we replace the contemporary state-level sample with a

birth-year by birth-state sample that is aggregated from individuals in three waves (1980, 1990, and

2000) of the 5% census sample.17 Observations are at the census wave-birth year-birth state level,

otherwise aggregations are similar to those of Table 9 but now take into account cohort differences in

place of aggregate contemporary differences.18 Our focus remains on the state-level difference in

response due to initial bacterial mortality rates. Again, Panel A will present OLS estimates, and Panel B

will give 2SLS estimates, instrumenting initial mortality with HLA diversity. A key difference between

Tables 9 and 10 is that treatment is based upon birth year in Table 10. This difference allows us to

better measure life-course differences that may make a smaller impact on the state-level averages of

Table 9.

The OLS estimates of initial bacterial mortality for Panel A now become generally significant with

the expected sign for most SES measures.19 Following the 1937 intervention, states with higher levels of

initial mortality, which therefore benefited more from the intervention, generally show

statistically significant reductions in the number of children born and in poverty, along with gains in years of

schooling, high school graduation rates, and family income.20 But as we argue above, this

17 Due to data limitations, the sample is restricted to those born from 1920 to 1970. The sample is also restricted to women 40 and older for the number of children born, and to all 30 and older for years of schooling, high school graduation rates, the poverty rate, and family income. This implies that older census waves are absent for more recent birth years; e.g., those born in 1970 are only measured for the 2000 wave of the 5% sample.

18 The census wave level is necessary to account for age differences at the time of census sampling. For example, an individual born in 1940 would be 40 during the 1980 wave, 50 during the 1990 wave, and 60 during the 2000 wave. Considering the life-cycle of income, it is imperative that census wave be accounted for.

19 SES measures for Table 10 are not identical to those of Table 9; however, the included variables (children born, years of schooling, high school graduation, an indicator of poverty, and family income) are relevant and similar in context.

20 In comparing Tables 9 and 10, it is expected that the effect of initial bacterial mortality would be larger in the cohort analysis of Table 10. The absence of this difference in magnitude may be attributed to the sampling periods. We measure accumulated years of schooling up until 2000, but because of the age restriction, most of the analysis is weighted to those in the 1980 and 1990 censuses. When restricting the state sample to 1920-1970, the 2SLS coefficient is 0.098; for 1920-1980, it is 0.12; for 1920-1990, it is 0.13. Furthermore, the cohort analysis of

estimated effect is likely to be biased towards zero by unobserved institutional/cultural/economic

factors. To get around this bias, we propose instrumenting initial bacterial mortality with HLA diversity;

estimates from this two-stage estimation are given in Panel B. As in Table 9, the absolute magnitude of

the coefficient of interest increases in all columns. Importantly, the instrumented coefficient of initial

bacterial mortality becomes significantly related with income in column (6) of Panel B.

The estimates from Tables 9 and 10 suggest two main findings. First, the use of pre-period

health outcomes biases the estimated spatial effects of the innovation towards zero. Using our measure of latent

resistance as an instrument for pre-period health teases out pre-period unobservable

institutional/cultural/economic differences that are likely negatively associated with future growth in

socioeconomic outcomes. This is shown in the panel differences of Tables 9 and 10, which routinely

estimate larger two-stage coefficients. Second, the use of birth-cohorts allows us to directly account for

early-life impacts of the 1937 health innovation. The OLS estimates of Table 10 show clear significant

impacts on human capital accumulation that are not found in similar state-level analysis of Table 9.

When accounting for both the life-course effect and the potential bias in using pre-period health, a clear

significant impact on human capital accumulation that ultimately is associated with income

improvements is seen in column (6) of Panel B, a finding absent from prior studies (e.g., Acemoglu and

Johnson 2007 and Hansen 2014).

Conclusion

Our analysis establishes a heterogeneous response to the medical innovations of the mid-20th century

that is tied to a long-run, genetically observed difference in ancestral exposure to infectious pathogens.

A statistically strong and robust relationship exists between this measure of innate resistance, HLA

heterozygosity, and the state-level mortality rate from bacterial infections prior to the invention and

widespread usage of antibiotics beginning in 1937. We hypothesize and show that states composed of more

resistant populations tended to benefit less from the initial and continued usage of antibiotics.

We also propose and provide evidence that this innate measure of resistance can be used to

separate impacts of unobservable institutional and cultural factors that may bias estimation examining

the effects of the health innovations on future changes to SES outcomes. Using our approach and

contrary to prior studies (Acemoglu and Johnson 2007; Hansen 2014), we find statistically significant SES

Table 10 does not allow for effects on young children; for example, 2 year-olds in 1937 are in the control group and categorized as unaffected by the change in exposure.

benefits from the reduction in bacterial infectious disease mortality from the innovation of antibiotics.

The absence of results in these studies is likely due to both bias in using pre-period mortality in

measuring differential spatial impacts and the time-sensitive nature of human capital accumulation in

affecting income, or output per capita.

We recognize that our measure of innate resistance has limitations. Measurement errors

abound, implying our estimated relationship may be understating the true association. Additionally, we

focus strictly on mortality and are unable to account for the harmful effects of morbidity.21 This focus

on mortality is also likely centered on the young and old, preventing us from extending our results to

more general health improvements that have a more demographically uniform distribution of health

gains or losses. Further studies are needed to either confirm or rule out any potential role of aggregate

health improvements in shaping macroeconomic outcomes.

21 There is likely to be a high level of overlap between the eradication of mortality from infectious disease and the eradication of morbidity from infectious disease. Furthermore, our mechanism of innate resistance should serve to reduce morbidity in a manner similar to its role in reducing mortality.

References

Acemoglu, D., & Johnson, S. (2007). Disease and development: The effect of life expectancy on

economic growth. Journal of Political Economy, 115(6), 925-985.

Ager, P., & Brueckner, M. (2016). Immigrants’ genes: Genetic diversity and economic development in

the US. Working Paper.

Aminov, R. (2010). A brief history of the antibiotic era: Lessons learned and challenges for the future.

Frontiers in Microbiology, 1, 134.

Ashraf, Q., & Galor, O. (2013). The “Out of Africa” hypothesis, human genetic diversity, and comparative

economic development. American Economic Review, 103(1), 1-46.

Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., & Wacziarg, R. (2003). Fractionalization.

Journal of Economic Growth, 8(2), 155-194.

Bhalotra, S., & Venkataramani, A. (2015). Shadows of the captain of the men of death: Health innovation,

human capital investment, and institutions. Working Paper.

https://sites.google.com/site/soniaradhikabhalotra/research--

topics/Paper_WithAppendices.pdf?attredirects=0&d=1

Beach, B., Ferrie, J., Saavedra, M., & Troesken, W. (2016). Typhoid fever, water quality, and human

capital formation. The Journal of Economic History, 76(1), 41-75.

Bleakley, H. (2006). Disease and development: Comments on Acemoglu and Johnson (2006). NBER

Summer Institute on Economic Fluctuations and Growth.

http://www.personal.umich.edu/~hoytb/Bleakley_Comments_Acemoglu_Johnson.pdf

Bleakley, H. (2007). Disease and development: Evidence from hookworm eradication in the American

South. Quarterly Journal of Economics, 122(1), 73-117.

Bleakley, H. (2010). Disease and development: Evidence from the American South. Journal of the

European Economic Association, 1(2-3), 376-386.

Bloom, D.E., Canning, D., & Fink, G. (2014). Disease and development revisited. Journal of Political

Economy, 122(6), 1355-1366.

Clardy, J., Fischbach, M.A., & Currie, C.R. (2009). The natural history of antibiotics. Current Biology,

19(11), R437-R441.

Cook, C.J. (2015). The natural selection of infectious disease resistance and its effect on contemporary

health. The Review of Economics and Statistics, 97(4), 742-757.

Cutler, D., & Miller, G. (2005). The role of public health improvements in health advances: The

twentieth century United States. Demography, 42(1), 1-22.

Doherty, P., & Zinkernagel, R. (1975). A biological role for the major histocompatibility antigens. The

Lancet, 305(7922), 1406-1409.

Fortson, J.G. (2009). Mortality risk and human capital investment: The impact of HIV/AIDS in sub-

Saharan Africa. The Review of Economics and Statistics, 93(1), 1-15.

Garenne, M., & Aaby, P. (1990). Pattern of exposure and measles mortality in Senegal. Journal of

Infectious Diseases, 161(6), 1088-1094.

Hansen, C.W. (2014). Cause of death and development in the US. Journal of Development Economics,

109, 143-153.

Hansen, C., & Strulik, H. (2017). Life expectancy and education: Evidence from the cardiovascular

revolution. Journal of Economic Growth, 22(4), 421-450.

Jayachandran, S., Lleras-Muney, A., & Smith, K.V. (2010). Modern medicine and the twentieth century

decline in mortality: Evidence on the impact of sulfa drugs. AEJ: Applied, 2(2), 118-146.

Kidd, K., et al. (2003). ALFRED–the ALlele FREquency Database–update. American Journal of Physical

Anthropology. Annual Meeting Issue: Supplement S36, 128. (URL: http://alfred.med.yale.edu)

Jeffery, K., & Bangham, C. (2000). Do infectious diseases drive MHC diversity? Microbes and Infection,

2(11), 1335-1341.

Lorentzen, P., McMillan, J., & Wacziarg, R. (2008). Death and development. Journal of Economic

Growth, 13, 81-124.

Meyer, D., Aguiar, V.R., Bitarello, B.D., Brandt, D.Y.C., & Nunes, K. (2017). A genomic perspective on HLA

evolution. Immunogenetics, DOI 10.1007/s00251-017-1017-3.

Miller

Powers, J.H. (2004). Antimicrobial drug development—the past, the present, and the future. Clinical

Microbiology and Infection, 4, 23-31.

Prugnolle, F., Manica, A., Charpentier, M., Guégan, J.F., Guernier, V., & Balloux, F. (2005). Pathogen-

driven selection and worldwide HLA class I diversity. Current Biology, 15(11), 1022-1027.

Qutob, N., Balloux, F., Raj, T., & others (2012). Signatures of historical demography and pathogen

richness on MHC class I genes. Immunogenetics, 64(3), 165-175.

Ramachandran, S., Deshpande, O., Roseman, C., Rosenberg, N., Feldman, M., & Cavalli-Sforza, L. (2005).

Support from the relationship of genetic and geographic distance in human populations for a serial

founder effect originating in Africa. PNAS, 102(44), 377-392.

Ruggles, S., Genadek, K., Goeken, R., Grover, J., & Sobek, M. (2017). Integrated Public Use Microdata Series:

Version 7.0 [dataset]. Minneapolis: University of Minnesota. https://doi.org/10.18128/D010.V7.0.

Sabeti, P., Schaffner, S.F., Fry, B., Lohmueller, J., Varilly, P., Shamovsky, O., Palma, A., Mikkelsen, T.S.,

Altshuler, D., & Lander, E.S. (2006). Positive natural selection in the human lineage. Science, 312(5780),

1614-1620.

Spurgin, L., & Richardson, D. (2010). How pathogens drive genetic diversity: MHC, mechanisms and

misunderstandings. Proceedings. Biological Sciences / The Royal Society, 277(1684), 979-988.

Stock J., & Yogo M. (2005). Testing for weak instruments in linear IV regression. In: Andrews DWK

Identification and Inference for Econometric Models. New York: Cambridge University Press; 2005. pp.

80-108.

Shiina, T., Inoko, H., & Kulski, J.K. (2004). An update of the HLA genomic region, locus information and

disease associations: 2004. Tissue Antigens, 64(6), 631-649.

Turner, C., Tamura, R., Mulholland, S., & Baier, S. (2007). Education and income of the states of the

United States: 1840-2000. Journal of Economic Growth, 12(2), 101-158.

Wolfe, N., Dunavan, C., & Diamond, J. (2007). Origins of major human infections. Nature, 447(7142),

279-283.

World Health Organization (2010). Countdown to 2015 Decade Report (2000-2010).

Variable Appendix

HLA Regressors of Interest

State HLA Diversity: This variable is a weighted average of state-level ancestral HLA heterozygosity for

individuals born between 1933 and 1937 (individuals born up to 5 years before the introduction of sulfa

drugs in 1937). Self-reported ancestry from the 5% sample of the 1980, 1990, and 2000 censuses is

matched to country/ethnic HLA heterozygosity measures from Cook (2015). This matching is listed in

Appendix B.

Individuals in the Census can report up to 2 ethnicities/ancestries. For those reporting 2

ancestries, we simply take the average of the matched HLA diversity score.
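
The matching and averaging steps can be sketched as below. This is a minimal illustration under stated assumptions: respondents carry a census person weight, ancestries with no match in the lookup are dropped, and the lookup values and function names are hypothetical rather than taken from Cook (2015) or Appendix B.

```python
def individual_score(ancestries, hla_by_ancestry):
    """Mean HLA heterozygosity over a respondent's matched ancestries (up to 2)."""
    scores = [hla_by_ancestry[a] for a in ancestries if a in hla_by_ancestry]
    if not scores:
        return None          # assumption: unmatched respondents are excluded
    return sum(scores) / len(scores)

def state_hla_diversity(respondents, hla_by_ancestry):
    """Weighted average of individual scores; respondents = [(ancestries, weight), ...]."""
    total, weight_sum = 0.0, 0.0
    for ancestries, weight in respondents:
        score = individual_score(ancestries, hla_by_ancestry)
        if score is None:
            continue
        total += weight * score
        weight_sum += weight
    return total / weight_sum

# Hypothetical lookup and respondents for one state.
hla_by_ancestry = {"A": 0.30, "B": 0.34}
respondents = [(("A",), 2.0), (("A", "B"), 1.0), (("B",), 1.0)]
state_score = state_hla_diversity(respondents, hla_by_ancestry)
print(state_score)  # (2*0.30 + 1*0.32 + 1*0.34) / 4 = 0.315
```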

Mixed HLA Diversity: Our primary way of calculating state-level HLA diversity takes the weighted

average of each reported ethnicity’s HLA diversity. This method assumes no intermingling amongst

different ethnic groups. The other extreme considers fully admixed populations. To account for this

extreme, we take the weighted average of genetic variants (or alleles) to find the frequency of the

variant in the larger (and assumed mixed) population. Expected heterozygosity is then calculated using

the admixed allele frequencies, creating a measure of HLA diversity for fully integrated populations.

State-level allele frequencies are found in a similar manner as the base/segregated measure of

HLA diversity: we simply match reported ancestry for those born 5 years prior to the 1937 intervention,

to ethnic allele frequencies of Cook (2015). We then take the weighted average of these frequencies to

create a state-level allele frequency. Mixed HLA diversity is then expected heterozygosity calculated

from these state-level allele frequencies.

State Outcomes

Bacterial Mortality Rate: The sum (excluding missing) of mortality rates (deaths per 100,000) from

typhoid, scarlet fever, pertussis, tuberculosis, diphtheria, influenza and pneumonia, diarrhea and

enteritis, maternal mortality, and syphilis. Data are given at the state-year level. The availability of data

differs by year. Table 2 lists the time range for when each disease is listed in the National Vital Statistics

Reports. Data from 1900-1930 are from Grant Miller’s NBER dataset. Data from 1931-2000 have been

digitized from the annual National Vital Statistics Reports.

Birth Rate: The state-year birth rate for 1915-2000. Data are digitized from National Vital Statistics

reports.

Years of Schooling: Estimated state-year years of schooling for 1900-2000. Data are from Turner et al.

(2007).

Experience: Estimated state-year years of experience for 1900-2000. Data are from Turner et al. (2007).

Population: State-year population for 1900-2000. Data are from Turner et al. (2007).

Income: Estimated state-year real income for 1900-2000. Prior to 1930, data are by decade. Yearly

measures are then interpolated for 1900-1930. Data are from Turner et al. (2007).

Census Outcomes

Years of Schooling: From IPUMS variable EDUCD, which gives detailed years of completed schooling.

Data are aggregated by birth state and birth year for those 30 and older and for each of the 1980,

1990, and 2000 waves of the 5% census sample from IPUMS (Ruggles et al. 2017). This implies the

maximum number of observations would be for the 48 states by the number of years (41) by the

number of census waves (3), or 5,904. But due to the age restriction, the sample is limited because

more recent birth years are found for only more recent census waves. Furthermore, bacterial mortality

data are not available for all states between 1920 and 1932, which further limits the sample when

controlling for the residual mortality rate.

Family Income: From IPUMS variable FTOTINC, which is the total family income in contemporary (i.e.,

nominal) pre-tax dollars. We only consider those with positive income that is less than the capped level of

9,999,999. Data are aggregated by birth state and birth year for those 30 and older and for each of

the 1980, 1990, and 2000 waves of the 5% census sample from IPUMS (Ruggles et al. 2017). This

implies the maximum number of observations would be for the 48 states by the number of years (41) by

the number of census waves (3), or 5,904. But due to the age restriction, the sample is limited because

more recent birth years are found for only more recent census waves. Furthermore, bacterial mortality

data are not available for all states between 1920 and 1932, which further limits the sample when

controlling for the residual mortality rate.

Children Born: From IPUMS variable CHBORN, which is the total number of children born (not surviving)

to women in the sample. Data are aggregated by birth state and birth year for those 40 and older and

are available for the 1980 and 1990 waves of the 5% census sample and are from IPUMS (Ruggles et al.

2017). This implies the maximum number of observations would be for the 48 states by the number of

years (41) by the number of census waves (2), or 3,936. But due to the age restriction, the sample is

limited because more recent birth years are found for only more recent census waves.

Poverty: From IPUMS variable POVERTY, which gives the percentage of family income relative to

poverty thresholds. Following Bhalotra and Venkataramani (2015), our measure is coded as a dummy

variable to capture those families within 200% of the poverty threshold. Data are aggregated by birth

state and birth year for those 30 and older and for each of the 1980, 1990, and 2000 waves of the 5%

census sample from IPUMS (Ruggles et al. 2017). This implies the maximum number of

observations would be for the 48 states by the number of years (41) by the number of census waves (3),

or 5,904. But due to the age restriction, the sample is limited because more recent birth years are found

for only more recent census waves. Furthermore, bacterial mortality data are not available for all states

between 1920 and 1932, which further limits the sample when controlling for the residual mortality

rate.

Controls

Residual Mortality Rate: The total mortality rate (per 100,000) minus the bacterial mortality rate. Data

are from the National Vital Statistics.

Count of Bacterial Infections: The number of bacterial infections included in calculating the bacterial

mortality rate. For 1990-2000, data are available only for influenza/pneumonia.

Fraction Black: The 1932-1936 average state-level fraction of the population that is black. This data is

by way of Lleras-Muney (2004).

Urbanization Rate: The 1932-1936 average state-level urban fraction of the population. This data is by

way of Lleras-Muney (2004).

Fraction of Foreign Born: The 1932-1936 average state-level fraction of the population that is not native

to the United States. This data is by way of Lleras-Muney (2004).

Schools per square mile: The 1932-1936 average number of schools per square mile. This data is by

way of Lleras-Muney (2004).

Hospitals per Square Mile: The 1932-1936 average number of hospitals per square mile. This data is by

way of Lleras-Muney (2004).

Education Expenditures per Capita: The 1932-1936 average state-level average education expenditure

per capita. This data is by way of Lleras-Muney (2004).

Physicians per Capita: The 1932-1936 average number of physicians per capita. This data is by way of

Lleras-Muney (2004).

World War 2 Spending: The sum (in $000’s) of major war supply contracts and major war facilities from

June 1940-September 1945. From variables 88-91 of Michael Haines’s 2896 entry at ICPSR from part 70

of the 1947 city and county data book. This variable mirrors that of World War II spending from

Fishback and Cullen (2013).

AG’s Overall Genetic Diversity: Census reported ancestry is also matched to Ashraf and Galor’s (2013)

country-level measure of genetic diversity. The state level measure is the weighted average of each

ancestry’s measure of genetic diversity.

Ethnic Fractionalization: Ethnic fractionalization is measured as a Herfindahl index for the fraction of a

state’s population attributed to each reported ancestry/ethnicity for the base sample of Census

respondents used to measure HLA diversity.
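
A minimal sketch of this index follows, assuming the common one-minus-Herfindahl form (the probability that two randomly drawn residents report different ancestries). The text does not spell out the formula; the Table 1 mean of 0.87 appears more consistent with this form than with the raw sum of squared shares. The function name and example shares are hypothetical.

```python
def ethnic_fractionalization(shares):
    """1 - sum of squared ancestry shares (shares are normalized to sum to one)."""
    total = sum(shares)
    return 1.0 - sum((s / total) ** 2 for s in shares)

frac_two_equal = ethnic_fractionalization([0.5, 0.5])   # 1 - 2 * 0.25 = 0.5
frac_single = ethnic_fractionalization([1.0])           # one ancestry only: 0.0
frac_ten_equal = ethnic_fractionalization([0.1] * 10)   # 1 - 10 * 0.01 = 0.9
print(frac_two_equal, frac_single, frac_ten_equal)
```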

Tables and Figures

Table 1. Summary Statistics

Variable  Obs.  Mean  Std Deviation  Min  Max

State HLA Heterozygosity, 1937  48  0.3365  0.0065  0.3094  0.3456

By Region and Division:
Northeast  9  0.3418  0.0032  0.3382  0.3456
  New England  6  0.3436  0.0022  0.3396  0.3456
  Mid-Atlantic  3  0.3382  0.0001  0.3382  0.3383
South  16  0.3333  0.0040  0.3251  0.3388
  South Atlantic  8  0.3349  0.0028  0.3306  0.3388
  East South Central  4  0.3334  0.0042  0.3291  0.3383
  West South Central  4  0.3300  0.0049  0.3251  0.3363
Midwest  12  0.3393  0.0010  0.3380  0.3418
  East North Central  5  0.3390  0.0006  0.3380  0.3397
  West North Central  7  0.3395  0.0012  0.3382  0.3418
West  11  0.3340  0.0104  0.3094  0.3433
  Mountain  8  0.3326  0.0119  0.3094  0.3433
  Pacific (excl. AK & HI)  3  0.3379  0.0046  0.3326  0.3407

Bacterial Mortality Rates  4151  127.98  149.27  5.3  1210.87
  Pre (1900-1936, unbalanced)  1079  331.02  151.33  102.03  1210.87
  Post (1937-2000)  3072  56.66  49.99  5.3  618.3
  Initial (1932-1936 mean by state)  48  219.55  71.75  142.12  536.69

Time Varying Controls
  Residual Mortality  4151  878.50  128.55  57.6  1965.47
  Count of Included Bacterial Infections  4151  6.95  2.41  1  9

Pre-period (1932-1936 mean) State-Level Controls
  Frac. Black  48  9.40  13.37  0.06  49.8
  Urbanization Rate  48  46.55  19.25  18.06  92.08
  Frac. Foreign Born  48  9.05  6.87  0.4  24.66
  Schools per square mile  48  0.12  0.09  0.0029  0.36
  Hospitals per square mile  48  0.0045  0.0068  0.0002  0.0334
  Physicians per capita  48  0.0012  0.0003  0.0007  0.0018
  World War 2 spending (in $100Ks)  48  86.33  119.7  0.18  436.38
  Ethnic Fractionalization  48  0.87  0.03  0.78  0.92
  Initial Income (in $1000s)  48  12.76  3.70  5.69  20.88

SES Outcomes

Contemporary State Sample (state by year)
  Births (per 1000 of pop.)  3827  19.4955  4.7503  10.7  34.9
  Years of Schooling  4151  9.8179  2.3820  3.6036  14.1410
  Experience  4151  19.4071  1.4655  12.5986  24.7297
  Population (1000s)  4151  3693.961  4064.522  90  33871.65
  Income (1000s)  4151  28.981  13.2338  4.4524  70.6046

Aggregated Census Sample (birth-state by birth-year by census wave)
  Number of Children Born  2279  2.8948  0.4348  1.6  4.25
  Years of Schooling  5637  13.6321  0.8652  10.3507  15.2270
  High School Grad Frac.  5637  0.8095  0.1230  0.2970  0.9889
  Poverty Frac.  5637  0.2295  0.0762  0.0775  0.5552
  Family Income (1000s)  5637  45.2784  16.5173  17.1398  96.7936

Summary & Notes: This table provides summary statistics for most variables used in our analysis. Definitions and sources of each variable are given in the variable appendix. State-level HLA diversity scores are given in Figure 1. Arizona has the lowest amount of diversity, and Maine has the highest level.


Table 2. Post-1937 treatment for each bacterial infection

Dependent Variable: ln Mortality Rate from:

  Typhoid  Scarlet Fever  Pertussis  Tuberculosis  Diphtheria
  (1)  (2)  (3)  (4)  (5)

Post-1937 Indicator  -6.1006***  -5.1977***  -5.2208***  -3.1101***  -5.9399***
  (0.2713)  (0.3101)  (0.3055)  (0.3078)  (0.2885)
State FE  Y  Y  Y  Y  Y
Time Trend  Y  Y  Y  Y  Y
Post × Time Trend  Y  Y  Y  Y  Y
Division × Time Trend  Y  Y  Y  Y  Y
Observations  2087  3479  3537  3815  2567
R Sqr.  0.9100  0.8748  0.8668  0.9313  0.8794
Years (unbalanced)  1900-1957  1900-1988a  1900-1988  1900-1993  1900-1967

  Flu and Pneumonia  Diarrhea  Maternal Mortality  Syphilis  All
  (6)  (7)  (8)  (9)  (10)

Post-1937 Indicator  -5.0747***  -5.5163***  -4.0705***  3.4464***  -0.8469***
  (0.2982)  (0.2942)  (0.2989)  (0.2784)  (0.0919)
State FE  Y  Y  Y  Y  Y
Time Trend  Y  Y  Y  Y  Y
Post × Time Trend  Y  Y  Y  Y  Y
Division × Time Trend  Y  Y  Y  Y  Y
Observations  4151  2711  3671  2782  4151
R Sqr.  0.8571  0.8865  0.8873  0.8972  0.9065
Years (unbalanced)  1900-2000  1900-1970  1900-1990  1931-1988  1900-2000

Summary & Notes: This table shows a decline in bacterial infections after the proposed intervention year of 1937. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.

a: No data for years 1949-1950.


Table 3. Pooled effect of HLA diversity

Dependent variable: ln of mortality from bacterial infections, 1900-2000

Sample period:  1900-1936  1937-2000  1937-1968  1969-2000
  (1)  (2)  (3)  (4)

ln HLA diversity  -9.8426***  -4.0232***  -7.0485***  -0.9979
  (1.4064)  (0.9929)  (1.3637)  (0.9486)
Year FE  Y  Y  Y  Y
Census Divisions × Year FE  Y  Y  Y  Y
Observations  1079  3072  1536  1536
R Sqr.  0.8532  0.9299  0.9182  0.5820

Summary & Notes: This table shows that HLA diversity, which is a measure of innate resistance, has a stronger effect in periods prior to the invention and widespread usage of antibiotics. The estimates of Table 3 are further illustrated by Figure 2, which plots the coefficient of HLA diversity on bacterial mortality from 1925-2000. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.


Table 4. Intensity of treatment for each bacterial infection

Dependent Variable: ln Mortality Rate from:

  Typhoid  Scarlet Fever  Pertussis  Tuberculosis  Diphtheria
  (1)  (2)  (3)  (4)  (5)

Post-1937 × ln HLA diversity  10.5178**  1.5289  11.8007**  16.4518***  11.4033**
  (4.5373)  (4.4033)  (4.7859)  (4.6864)  (5.4123)
State FE  Y  Y  Y  Y  Y
Year FE  Y  Y  Y  Y  Y
Census Divisions × Year FE  Y  Y  Y  Y  Y
Observations  2087  3479  3535  3815  2567
R Sqr.  0.9786  0.9600  0.9735  0.9869  0.9776
Years (unbalanced)  1900-1957  1900-1988a  1900-1988  1900-1993  1900-1967

  Flu and Pneumonia  Diarrhea  Maternal Mortality  Syphilis  All
  (6)  (7)  (8)  (9)  (10)

Post-1937 × ln HLA diversity  8.3232*  11.1173**  9.2750*  7.4702***  5.6042***
  (4.7486)  (4.7024)  (5.1785)  (2.1406)  (1.2364)
State FE  Y  Y  Y  Y  Y
Year FE  Y  Y  Y  Y  Y
Census Divisions × Year FE  Y  Y  Y  Y  Y
Observations  4151  2711  3671  2782  4151
R Sqr.  0.9794  0.9740  0.9800  0.9672  0.9815
Years (unbalanced)  1900-2000  1900-1970  1900-1990  1931-1988  1900-2000

Summary & Notes: This table shows HLA diversity as an intensity-of-treatment for each bacterial infection that comprises our measure of bacterial mortality. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.

a: No data for years 1949-1950.


Table 5. Baseline estimation: Piecemeal inclusion of controls

Dependent variable: ln of mortality from bacterial infections, 1900-2000

  (1)  (2)  (3)  (4)  (5)

Post-1937 × ln HLA diversity  5.6042***  5.1853***  6.2646***  5.2537***  5.5830***
  (1.2364)  (1.2267)  (1.6332)  (1.4213)  (1.4966)
ln Residual Mortality  0.2575*  0.2740**
  (0.1365)  (0.1181)
Count of Bacterial Infections  -0.0089  -0.0186
  (0.0322)  (0.0282)

Post-1937 × controls measured in 1937:
ln Fraction Black  -0.0219  -0.0646**
  (0.0367)  (0.0304)
ln Urbanization Rate  -0.2200*  -0.1279
  (0.1142)  (0.1853)
ln Fraction Foreign Born  -0.0101  0.0378
  (0.0528)  (0.0529)
ln Ethnic Fractionalization  1.2749*  1.1275*
  (0.6955)  (0.6031)
ln Schools per Square Mile  0.1648**  0.1080
  (0.0706)  (0.0650)
ln Edu. Exp. per Capita  0.2253  0.1406
  (0.1544)  (0.1445)
ln Hospitals per Square Mile  -0.1343*  -0.0652
  (0.0763)  (0.0804)
ln Physicians per Capita  -0.0959  0.0358
  (0.1759)  (0.1782)
ln World War II Spending  0.0205  0.0830***
  (0.0213)  (0.0302)
ln Income per Capita  -0.2765  -0.4962*
  (0.2550)  (0.2890)

State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.9815  0.9823  0.9822  0.9823  0.9834

Summary & Notes: This table comprises our baseline estimation, which seeks to show that HLA diversity contributed to the response of medicines introduced in 1937. The positive coefficient of HLA diversity signifies a larger decline for those states that have lower diversity within the HLA system. This supports our primary hypothesis. Controls for 1937 are interacted with the post-1937 indicator. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
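To give a sense of magnitude for the interaction coefficient in Table 5, the sketch below applies the column (1) estimate to the lowest and highest state HLA heterozygosity values from Table 1. It is a back-of-the-envelope reading of the log-log specification, not a calculation reported by the authors; the function name is hypothetical.

```python
import math

beta = 5.6042      # Table 5, column (1): Post-1937 x ln HLA diversity
hla_min = 0.3094   # Table 1: lowest state HLA heterozygosity
hla_max = 0.3456   # Table 1: highest state HLA heterozygosity

def post_gap_log_points(coef, hla_low, hla_high):
    """Implied post-1937 difference in ln mortality between two HLA levels."""
    return coef * (math.log(hla_high) - math.log(hla_low))

gap = post_gap_log_points(beta, hla_min, hla_max)
ratio = math.exp(gap)

print(f"Implied gap: {gap:.3f} log points (mortality ratio of about {ratio:.2f})")
```

Under these assumptions, the post-1937 change in ln bacterial mortality for the highest-diversity state is roughly 0.62 log points above that of the lowest-diversity state, which corresponds to a smaller decline in the more diverse state.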


Table 6. Placebo test: Other causes of mortality and HLA diversity

Dependent variable: ln of all mortality less bacterial infections, 1900-2000

  (1)  (2)  (3)  (4)  (5)

Post-1937 × ln HLA diversity  1.6245  1.6253  1.8878  -0.6815  0.4504
  (1.7332)  (1.7334)  (2.1942)  (1.3982)  (1.9229)
Count of Bacterial Infections  0.0126  0.0110
  (0.0207)  (0.0166)

Post-1937 × pre-period controls:
ln Fraction Black  -0.0169  0.0173
  (0.0320)  (0.0292)
ln Urbanization Rate  -0.1304  -0.1937
  (0.1024)  (0.1451)
ln Fraction Foreign Born  0.0269  -0.0888
  (0.0732)  (0.0537)
ln Ethnic Fractionalization  0.8484  0.5235
  (0.6837)  (0.5526)
ln Schools per Square Mile  0.0760  -0.0039
  (0.0533)  (0.0628)
ln Edu. Exp. per Capita  -0.0159  0.0805
  (0.1284)  (0.1464)
ln Hospitals per Square Mile  0.0022  0.0635
  (0.0582)  (0.0692)
ln Physicians per Capita  -0.3810**  -0.3496**
  (0.1676)  (0.1515)
ln World War II Spending  -0.0661**  -0.0800**
  (0.0288)  (0.0325)
ln Income per Capita  0.4426*  0.6626**
  (0.2431)  (0.2566)

State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.5617  0.5617  0.5703  0.6028  0.6150

Summary & Notes: This table serves as a placebo test of the findings in Table 5. As shown, HLA diversity in the post-1937 period has no association with other sources of mortality. Controls for 1937 are interacted with the post-1937 indicator. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.


Table 7. Controlling for overall genetic diversity

Dependent variable: ln of mortality from bacterial infections, 1900-2000

  (1)  (2)  (3)  (4)  (5)

Panel A. Controlling for AG’s genetic diversity

Post-1937 × ln HLA diversity  3.8492**  3.0435*  1.5173  4.5851*  1.7533
  (1.7140)  (1.7977)  (2.9087)  (2.3581)  (3.0627)
Post-1937 × ln Overall Genetic Diversity  4.6724  5.6814  8.9233**  1.5144  6.6439
  (3.6433)  (3.5775)  (4.0580)  (3.9904)  (4.3514)
Controls:
  Resid. Mort./Bac. Count  N  Y  N  N  Y
  Demographic  N  N  Y  N  Y
  Infrastructure  N  N  N  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.9815  0.9824  0.9824  0.9823  0.9835
p-value for coef. of HLA div. = coef. of Table 5  0.31  0.24  0.11  0.78  0.22
p-value for joint sig. of HLA and AG’s div.  0.00  0.00  0.00  0.00  0.00

Panel B. Ratio of HLA diversity to overall diversity

Post-1937 × ln Ratio of HLA to Overall Diversity  6.0664**  5.3222*  5.4837  6.6347***  5.2695
  (2.5585)  (2.7494)  (3.3601)  (2.3869)  (3.2982)
Controls:
  Resid. Mort./Bac. Count  N  Y  N  N  Y
  Demographic  N  N  Y  N  Y
  Infrastructure  N  N  N  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.9811  0.9820  0.9818  0.9821  0.9832

Summary & Notes: This table controls for Ashraf and Galor’s overall measure of diversity. This measure of diversity is strongly correlated with HLA diversity, increasing the standard errors in Panel A. To partially mitigate this collinearity, Panel B looks at the ratio of HLA diversity to AG’s measure of diversity. The inclusion of controls follows the format of Table 5. Time-invariant controls for 1937 are interacted with the post-1937 indicator. These include demographic controls (black fraction of the state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, and doctors per capita). All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.


Table 8. Mixed HLA Heterozygosity

Dependent variable: ln of mortality from bacterial infections, 1900-2000

  (1)  (2)  (3)  (4)  (5)

Panel A. Mixed Ethnicity HLA Diversity

Post-1937 × ln mixed-ethnic HLA diversity  7.4687  9.1163  8.1091  6.2116  10.5054***
  (6.2778)  (5.5596)  (4.9864)  (4.1544)  (3.4436)
Controls:
  Resid. Mort./Bac. Count  N  Y  N  N  Y
  Demographic  N  N  Y  N  Y
  Infrastructure  N  N  N  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.9808  0.9819  0.9818  0.9819  0.9833

Panel B. Mean of Seg. and Mixed Ethnicity HLA Diversity

Post-1937 × Avg. of Segregated and Mixed HLA diversity  8.3163***  8.0381***  8.4120***  7.2637***  8.0170***
  (1.9835)  (1.8388)  (2.5978)  (2.0870)  (2.2521)
Controls:
  Resid. Mort./Bac. Count  N  Y  N  N  Y
  Demographic  N  N  Y  N  Y
  Infrastructure  N  N  N  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y
Observations  4151  4151  4151  4151  4151
R Sqr.  0.9814  0.9823  0.9821  0.9822  0.9834

Summary & Notes: This table considers alternative ways of calculating HLA heterozygosity. Our base measure is representative of segregated ethnic populations. Panel A instead considers an HLA diversity measure from gene frequencies representative of fully mixed ethnicities. Panel B considers the average of the two measures. Time-invariant controls for 1937 are interacted with the post-1937 indicator. These include demographic controls (black fraction of the state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, and doctors per capita). All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
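The distinction between the segregated and fully mixed measures in Table 8 can be illustrated with a toy calculation. This is only a sketch under stated assumptions: it takes expected heterozygosity at a single locus to be one minus the sum of squared allele frequencies, treats the segregated measure as a share-weighted average of group-level heterozygosity, and treats the mixed measure as the heterozygosity of share-weighted pooled allele frequencies. The paper's exact construction is not given in this section, and the shares and frequencies below are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def segregated_diversity(shares, group_freqs):
    """Share-weighted average of each group's own heterozygosity (no mixing across groups)."""
    return sum(w * heterozygosity(f) for w, f in zip(shares, group_freqs))

def mixed_diversity(shares, group_freqs):
    """Heterozygosity of the pooled allele frequencies (groups fully mixed)."""
    n_alleles = len(group_freqs[0])
    pooled = [sum(w * f[i] for w, f in zip(shares, group_freqs)) for i in range(n_alleles)]
    return heterozygosity(pooled)

# Hypothetical state with two ancestry groups and three alleles at one locus.
shares = [0.6, 0.4]
group_freqs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

seg = segregated_diversity(shares, group_freqs)
mix = mixed_diversity(shares, group_freqs)
avg = (seg + mix) / 2  # analogue of the Panel B average

print(f"segregated = {seg:.4f}, mixed = {mix:.4f}, average = {avg:.4f}")
```

Because this heterozygosity formula is concave in allele frequencies, the mixed value in such a calculation is never below the segregated value, so the two measures bracket intermediate degrees of mixing.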


Table 9. Instrumenting pre-period mortality: Contemporary state analysis

Dependent variable:  ln Bac. Mort  ln Birth Rate  ln Years of Sch.  ln Exp.  ln Pop.  ln Inc.
  (1)  (2)  (3)  (4)  (5)  (6)

Panel A. OLS estimation

Post × ln pre-period bacterial mort.  -0.6223***  -0.1154  0.0650  0.0541  0.0450  -0.0542
  (0.0923)  (0.0806)  (0.0437)  (0.0491)  (0.1662)  (0.0702)
Controls:
  Resid. Mort./Bac. Count  Y  Y  Y  Y  Y  Y
  Demographic  Y  Y  Y  Y  Y  Y
  Infrastructure  Y  Y  Y  Y  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y  Y
Observations  4151  3811  4151  4151  4151  4151
R Sqr.  0.9841  0.9471  0.9900  0.8762  0.9852  0.9884

Panel B. 2SLS estimation (instrument = Post × ln HLA diversity)

Post × ln pre-period bacterial mort.  -0.5890***  -0.2142**  0.1598**  0.1522***  0.4226  -0.0011
  (0.1319)  (0.1011)  (0.0774)  (0.0520)  (0.2864)  (0.0839)
Controls:
  Resid. Mort./Bac. Count  Y  Y  Y  Y  Y  Y
  Demographic  Y  Y  Y  Y  Y  Y
  Infrastructure  Y  Y  Y  Y  Y  Y
State, Year, and Year-by-Division FE  Y  Y  Y  Y  Y  Y
Observations  4151  3811  4151  4151  4151  4151
First-stage F statistic (KP)  24.175  23.561  24.175  24.175  24.175  24.175
p-value, KP-LM statistic  0.0068  0.0053  0.0068  0.0068  0.0068  0.0068

Summary & Notes: This table compares OLS and 2SLS estimates for a commonly used measure of heterogeneity based on pre-period mortality. We, along with others, argue that pre-period mortality is potentially related with income and other measures of well-being that may lead to negative bias in its estimated relationship with changes to socioeconomic outcomes. Time-invariant controls for the pre-period are interacted with the post-1937 indicator. These include demographic controls (the black fraction of a state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, education expenditures per capita, doctors per capita, and pre-period income). Standard errors are clustered by state, and statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
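The logic of comparing OLS with 2SLS can be sketched on simulated data. The example below is purely illustrative and is not the authors' specification: it uses a single regressor and a single instrument, omits the fixed effects, controls, and clustered standard errors, and all variable names and parameter values are hypothetical. Here `z` plays the role of the instrument (Post × ln HLA diversity), `x` the instrumented regressor (Post × ln pre-period bacterial mortality), and `u` an unobserved factor such as income that affects both mortality and the outcome.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS: instrument z for regressor x in the y equation."""
    a, b = ols(z, x)                   # first stage: regress x on z
    x_hat = [a + b * zi for zi in z]   # fitted values of x
    _, beta = ols(x_hat, y)            # second stage: regress y on fitted x
    return beta

rng = random.Random(0)
n = 5000
true_beta = 0.5                               # hypothetical causal effect of x on y
z = [rng.gauss(0, 1) for _ in range(n)]       # instrument
u = [rng.gauss(0, 1) for _ in range(n)]       # unobserved confounder
x = [zi + ui + rng.gauss(0, 0.5) for zi, ui in zip(z, u)]
y = [true_beta * xi - ui + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]

_, beta_ols = ols(x, y)
beta_iv = two_stage_least_squares(z, x, y)
print(f"OLS: {beta_ols:.3f}  2SLS: {beta_iv:.3f}  true effect: {true_beta}")
```

In this simulation the confounder raises `x` and lowers `y`, so the OLS slope is pulled below the true effect, while the 2SLS estimate uses only the variation in `x` driven by `z`. In practice a dedicated IV routine would be used so that the standard errors account for the first stage.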


Table 10. Instrumenting pre-period mortality: Birth cohort analysis

Dependent variable:  ln Children Born  ln Years of Sch.  High School Grad. Frac.  Poverty Frac.  ln Family Inc.
  (1)  (2)  (3)  (4)  (5)

Panel A. OLS estimation

Post-1937 birth-year × ln pre-period bact. mort.  -0.0398**  0.0557***  0.1075***  -0.0292**  0.0339*
  (0.0167)  (0.0118)  (0.0198)  (0.0109)  (0.0173)
Controls:
  Birth-State Baseline  Y  Y  Y  Y  Y
  Birth-State, Birth-Year, and Birth-Year-by-Division  Y  Y  Y  Y  Y
  Census Wave and Census-Wave-by-Division FE  Y  Y  Y  Y  Y
Observations  2279  5637  5637  5637  5637
R Sqr.  0.9566  0.9743  0.9757  0.7199  0.9030

Panel B. 2SLS estimation (instrument = Post × ln HLA diversity)

Post-1937 birth-year × ln pre-period bact. mort.  -0.0689**  0.0990***  0.1560***  -0.0522***  0.0518**
  (0.0300)  (0.0168)  (0.0283)  (0.0169)  (0.0248)
Controls:
  Birth-State Baseline  Y  Y  Y  Y  Y
  Birth-State, Birth-Year, and Birth-Year-by-Division  Y  Y  Y  Y  Y
  Census Wave and Census-Wave-by-Division FE  Y  Y  Y  Y  Y
Observations  2279  5637  5637  5637  5637
First-stage F statistics (KP)  25.5849  27.5066  27.5066  27.5066  27.5066
p-value, KP-LM statistic  0.0047  0.0052  0.0052  0.0052  0.0052

Census Waves (by decade)  1980-1990  1980-2000  1980-2000  1980-2000  1980-2000
Restrictions  Women 40 and older  All 30 and older  All 30 and older  All 30 and older  All 30 and older

Summary & Notes: This table repeats the estimation strategy of Table 9 by comparing OLS and 2SLS estimates for pre-period mortality. The introduction of antibiotics is likely to have larger effects in early childhood. To measure these effects, Table 10 examines differences in birth-state aggregated cohorts born prior to and after the 1937 introduction; this is similar to the strategy employed by Bhalotra and Venkataramani (2015). Birth-state baseline controls include birth-state and birth-year fixed effects and the state-year residual mortality rate. Additional time-invariant controls for the pre-period are interacted with the post-1937 indicator. These include demographic controls (the black fraction of a state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, education expenditures per capita, doctors per capita, and pre-period income). To account for age differences during the sampling period, census wave fixed effects are included in all estimations; the interaction with census divisions is also included. Standard errors are clustered by birth state, and statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.


[Figure: line plot of the Bacterial Mortality Rate (per 100K), y-axis from 0 to 400, by Year, x-axis from 1927 to 2002, for two series: Above Median HLA States and Below Median HLA States.]

Figure 1. Bifurcated HLA Heterozygosity: Year by Year Association.

Summary & Notes: The mean mortality rate from infectious disease is given on the y-axis. The sample is bifurcated by those states above and below the median HLA heterozygosity. As shown, those states above the median had a lower mortality rate in 1940 that corresponds to a lower decline in mortality following the mid-20th century medical innovations.


[Figure: map of the 48 contiguous US states, each labeled with its HLA heterozygosity value; values range from 0.3094 to 0.3456.]

Figure 2A. State-Level HLA Heterozygosity.

Summary & Notes: This figure plots the value of HLA diversity for each state. Arizona has the lowest amount of diversity, and Maine has the highest level.


Figure 2B. Shaded HLA Heterozygosity.

Summary & Notes: Darker areas represent states with a greater amount of diversity within the HLA system.


[Figure: plot of the Effect of HLA Het. on Bacterial Mortality, y-axis from -20 to 10, by Year, x-axis from 1925 to 2000, showing the Coefficient of ln HLA Heterozygosity with its 95% confidence interval.]

Figure 3. Yearly Association

Summary & Notes: This figure plots the coefficient of ln HLA diversity while regressing the natural log of bacterial mortality for each year between 1925 and 2000. As shown, a statistically significant negative relationship is observed for early years, but this relationship becomes insignificant for more contemporary periods.
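The year-by-year exercise behind Figure 3 can be sketched as a loop of cross-sectional regressions. The code below is a simplified illustration rather than the authors' procedure: it fits a bivariate regression of ln mortality on ln HLA diversity separately for each year, omits any fixed effects or controls, and builds the interval from a normal approximation (1.96 standard errors) with homoskedastic standard errors. The toy panel and its values are hypothetical.

```python
import math

def ols_slope_with_se(x, y):
    """Slope and conventional standard error from a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

def yearly_coefficients(panel):
    """panel maps year -> list of (hla, mortality); returns year -> (coef, lower, upper)."""
    results = {}
    for year, rows in sorted(panel.items()):
        x = [math.log(hla) for hla, _ in rows]
        y = [math.log(mort) for _, mort in rows]
        slope, se = ols_slope_with_se(x, y)
        results[year] = (slope, slope - 1.96 * se, slope + 1.96 * se)
    return results

# Hypothetical five-state panel: a steep negative gradient early, none later.
hla_values = [0.31, 0.32, 0.33, 0.34, 0.35]
noise = [1.02, 0.98, 1.01, 0.99, 1.00]
panel = {
    1930: [(h, 300 * (h / 0.33) ** -8 * e) for h, e in zip(hla_values, noise)],
    1990: [(h, 20 * e) for h, e in zip(hla_values, noise)],
}

coefs = yearly_coefficients(panel)
for year, (coef, lower, upper) in coefs.items():
    print(f"{year}: coefficient {coef:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Plotting the resulting coefficients and intervals against the year would give a figure of the same form as Figure 3, with the early-year estimates well below zero and the later estimates close to it.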


Appendix: Matching IPUMS ancestry to HLA diversity from Cook (2015)
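Several rows in the matching table assign an ancestry an equal-weight blend of two reference populations, for example ".5 French, .5 Orcadian". The short sketch below shows that the blended heterozygosity values in the table are consistent with a simple weighted average of the corresponding single-population values listed in the same table. The dictionary and function names are hypothetical, and this is a reading of the table rather than the authors' code.

```python
# Single-population HLA heterozygosity values as listed in the appendix table.
BASE_HLA = {
    "French": 0.353319,
    "Orcadian": 0.340588,
    "Italian": 0.338867,
    "Russian": 0.321653,
}

def blended_hla(weights):
    """Weighted average of reference-population heterozygosity values."""
    total = sum(weights.values())
    return sum(w * BASE_HLA[group] for group, w in weights.items()) / total

french_orcadian = blended_hla({"French": 0.5, "Orcadian": 0.5})  # e.g. English, Flemish
italian_french = blended_hla({"Italian": 0.5, "French": 0.5})    # e.g. Corsican, Azorean
russian_italian = blended_hla({"Russian": 0.5, "Italian": 0.5})  # e.g. Cretan, Carpathian

for label, value in [
    (".5 French, .5 Orcadian", french_orcadian),
    (".5 Italian, .5 French", italian_french),
    (".5 Russian, .5 Italian", russian_italian),
]:
    print(f"{label}: {value:.6f}")
```

These averages agree, up to rounding in the sixth decimal place, with the table entries 0.346954, 0.346093, and 0.33026.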

Census Ancestry | Ethnicity/Country from ALFRED and Cook (2015) | HLA heterozygosity | Country Code

Alsatian, Alsace-Lorraine .5 French, .5 Orcadian 0.346954 FRA

Andorran Italian 0.338867 ADO

Austrian Austria (country) 0.344784 AUT

Tirolean .5 French, .5 Orcadian 0.346954 DEU

Basque Basque 0.318691 ESP

French Basque Basque 0.318691 FRA

Belgian Belgium (country) 0.346954 BEL

Flemish .5 French, .5 Orcadian 0.346954 BEL

Walloon French 0.353319 BEL

British UK (country) 0.345724 GBR

British Isles UK (country) 0.345724 GBR

Channel Islander UK (country) 0.345724 GBR

Gibraltan Italian 0.338867 ITA

Cornish .5 French, .5 Orcadian 0.346954 GBR

Corsican .5 Italian, .5 French 0.346093 ITA

Cypriot Cyprus (country) 0.328028 CYP

Greek Cypriote Cyprus (country) 0.328028 CYP

Turkish Cypriote Cyprus (country) 0.328028 CYP

Danish Denmark (country) 0.340367 DNK

Dutch Netherlands (country) 0.346954 NLD

English .5 French, .5 Orcadian 0.346954 GBR

Faeroe Islander Orcadian 0.340588 GBR

Finnish Estonian 0.331338 FIN

Karelian Estonian 0.331338 FIN

French French 0.353319 FRA

Lorrainian French 0.353319 FRA

Breton .5 French, .5 Orcadian 0.346954 GBR

Frisian Orcadian 0.340588 DNK

Friulian Italian 0.338867 ITA


German Germany (country) 0.340131 DEU

Bavarian Orcadian 0.340588 DEU

Berliner Orcadian 0.340588 DEU

Hamburger Orcadian 0.340588 DEU

Hanoverian Orcadian 0.340588 DEU

Hessian Orcadian 0.340588 DEU

Lubecker Orcadian 0.340588 DEU

Pomeranian Russian 0.321653 DEU

Prussian Orcadian 0.340588 DEU

Saxon .5 French, .5 Orcadian 0.346954 DEU

Sudetenlander Orcadian 0.340588 DEU

Westphalian .5 French, .5 Orcadian 0.346954 DEU

Greek Greece (country) 0.33026 GRC

Cretan .5 Russian, .5 Italian 0.33026 GRC

Cycladic Islander .5 Russian, .5 Italian 0.33026 GRC

Icelander Iceland (country) 0.340588 ISL

Irish Ireland (country) 0.346954 IRL

Italian Italian 0.338867 ITA

Abruzzi Italian 0.338867 ITA

Apulian Italian 0.338867 ITA

Basilicata Italian 0.338867 ITA

Calabrian Italian 0.338867 ITA

Amalfin Italian 0.338867 ITA

Emilia Romagna Italian 0.338867 ITA

Rome Italian 0.338867 ITA

Ligurian Italian 0.338867 ITA

Lombardian Italian 0.338867 ITA

Marches French 0.353319 ITA

Molise Italian 0.338867 ITA

Piedmontese Italian 0.338867 ITA

Puglia Italian 0.338867 ITA

Sardinian Sardinian 0.320644 ITA

Sicilian Sardinian 0.320644 ITA

Tuscan Italian 0.338867 ITA

Trentino Italian 0.338867 ITA


Umbrian Italian 0.338867 ITA

Valle dAosta Italian 0.338867 ITA

Venetian Italian 0.338867 ITA

Lapp Estonian 0.331338 EST

Liechtensteiner Liechtenstein (country) 0.345662 LIE

Luxemburger Luxembourg (country) 0.340903 LUX

Maltese Italian 0.338867 ITA

Manx Orcadian 0.340588 IMY

Monegasque French 0.353319 FRA

Northern Irelander UK (country) 0.345724 GBR

Norwegian Orcadian 0.340588 NOR

Portuguese Portugal (country) 0.345939 PRT

Azorean .5 Italian, .5 French 0.346093 PRT

Madeiran .5 Italian, .5 French 0.346093 PRT

Scottish UK (country) 0.345724 GBR

Swedish Sweden (country) 0.340435 SWE

Aland Islander Orcadian 0.340588 FIN

Swiss Switzerland (country) 0.342733 CHE

Suisse Switzerland (country) 0.342733 CHE

Romansch Italian 0.338867 ROM

Suisse Romane Italian 0.338867 ROM

Welsh UK (country) 0.345724 GBR

Scandinavian, Nordic Orcadian 0.340588 SWE

Albanian Albania (country) 0.323064 ALB

Azerbaijani Uyghur 0.321912 AZE

Belourussian Belarus (country) 0.321653 BLR

Bulgarian Bulgaria (country) 0.318447 BGR

Carpathian .5 Russian, .5 Italian 0.33026 ROM

Cossack Russian 0.321653 RUS

Croatian Croatia (country) 0.337452 HRV

Czechoslovakian Czech Republic (country) 0.321794 CZE

Bohemian Russian 0.321653 RUS

Estonian Estonian 0.331338 EST

Livonian Estonian 0.331338 EST

Finno Ugrian Estonian 0.331338 FIN


Mordovian Russian 0.321653 RUS

Voytak Russian 0.321653 RUS

Georgian Georgia (country) 0.315343 GEO

Germans from Russia .5 French, .5 Orcadian 0.346954 DEU

Rom .5 Russian, .5 Italian 0.33026 ROM

Hungarian Estonian 0.331338 HUN

Magyar Estonian 0.331338 HUN

Latvian Latvia (country) 0.321653 LVA

Lithuanian Lithuania (country) 0.321653 LTU

Macedonian Macedonia (country) 0.329711 MKD

Ossetian Balochi 0.311038 IRN

Polish Poland (country) 0.321653 POL

Kashubian Russian 0.321653 RUS

Romanian Romania (country) 0.330442 ROM

Bessarabian Estonian 0.331338 MDA

Moldavian Moldova (country) 0.327403 MDA

Wallachian .5 Russian, .5 Italian 0.33026 ROM

Russian Russian 0.321653 RUS

Muscovite Russian 0.321653 RUS

Serbian Serbia (country) 0.330307 YUG

Slovak Slovakia (country) 0.322838 SVK

Slovene Slovenia (country) 0.330264 SVN

Sorb/Wend Russian 0.321653 RUS

Bashkir Yakut 0.315058 TUR

Chevash Yakut 0.315058 TUR

Yakut Yakut 0.315058 TUR

Tatar Yakut 0.315058 TUR

Uzbek Uyghur 0.321912 UZB

Ukrainian Ukraine (country) 0.321678 UKR

Yugoslavian Serbia (country) 0.330307 YUG

Slav Russian 0.321653 RUS

Central European, nec .5 Russian, .5 Italian 0.33026 ROM

Northern European, nec Orcadian 0.340588 DNK

Southern European, nec Italian 0.338867 ITA

Western European, nec French 0.353319 FRA


Eastern European, nec Russian 0.321653 RUS

European, nec .5 Italian, .5 French 0.346093 DEU

Spaniard Spain (country) 0.345652 ESP

Catalonian Spain (country) 0.345652 ESP

Balearic Islander Spain (country) 0.345652 ESP

Galician .5 Italian, .5 French 0.346093 ESP

Mexican Mexico (country) 0.292182 MEX

Mexican American Mexico (country) 0.292182 MEX

Chicano/Chicana Mexico (country) 0.292182 MEX

Nuevo Mexicano Mexico (country) 0.292182 MEX

Californio Mexico (country) 0.292182 MEX

Costa Rican Costa Rica (country) 0.341711 CRI

Guatemalan Guatemala (country) 0.273953 GTM

Honduran Honduras (country) 0.302716 HND

Nicaraguan Nicaragua (country) 0.310932 NIC

Panamanian Panama (country) 0.296404 PAN

Salvadoran El Salvador (country) 0.299874 SLV

Latin American Mexico (country) 0.292182 MEX

Argentinean Argentina (country) 0.337511 ARG

Bolivian Bolivia (country) 0.258775 BOL

Chilean Chile (country) 0.283092 CHL

Colombian Colombia (country) 0.306605 COL

Ecuadorian Ecuador (country) 0.276056 ECU

Paraguayan Paraguay (country) 0.28956 PRY

Peruvian Peru (country) 0.264896 PER

Uruguayan Uruguay (country) 0.34069 URY

Venezuelan Venezuela (country) 0.304496 VEN

South American Brazil (country) 0.324576 BRA

Puerto Rican Dominican Republic (country) 0.322436 PRI

Cuban Cuba (country) 0.331594 CUB

Dominican Dominican Republic (country) 0.322436 DOM

Hispanic Mexico (country) 0.292182 MEX

Spanish .5 Italian, .5 French 0.346093 ESP

Spanish American .5 Italian, .5 French 0.346093 ESP

Bahamian Bahamas (country) 0.322089 BHS


Barbadian Barbados (country) 0.319475 BRB

Belizean Belize (country) 0.311487 BLZ

Bermudan Bermuda (country) 0.329096 BMU

Cayman Islander Jamaica (country) 0.32057 CYM

Jamaican Jamaica (country) 0.32057 JAM

Dutch West Indies Jamaica (country) 0.32057 ANT

Aruba Islander Venezuela (country) 0.304496 ABW

St Maarten Islander Antigua (country) 0.318978 ANT

Trinidadian/Tobagonian Trinidad (country) 0.32142 TTO

Trinidadian Trinidad (country) 0.32142 TTO

Tobagonian Trinidad (country) 0.32142 TTO

U.S. Virgin Islander Antigua (country) 0.318978 VIR

British Virgin Islander Antigua (country) 0.318978 VIR

British West Indian Bermuda (country) 0.329096 BMU

Turks and Caicos Islander Bahamas (country) 0.322089 BHS

Anguilla Islander Antigua (country) 0.318978 ATG

Dominica Islander Dominica (country) 0.31651 DMA

Grenadian Grenada (country) 0.318636 GRD

St Lucia Islander St Kitts (country) 0.319547 LCA

French West Indies St Kitts (country) 0.319547 KNA

Guadeloupe Islander Dominica (country) 0.31651 DMA

Cayenne Dominica (country) 0.31651 DMA

West Indian Dominican Republic (country) 0.322436 DOM

Haitian Haiti (country) 0.319228 HTI

Brazilian Brazil (country) 0.324576 BRA

San Andres Jamaica (country) 0.32057 JAM

Guyanese/British Guiana Guyana (country) 0.313851 GUY

Providencia Jamaica (country) 0.32057 JAM

Surinam/Dutch Guiana Surinam (country) 0.320199 SUR

Algerian Algeria (country) 0.332228 DZA

Egyptian Egypt (country) 0.329321 EGY

Libyan . . LBY

Moroccan Morocco (country) 0.335216 MAR

Ifni Morocco (country) 0.335216 MAR

Tunisian Tunisia (country) 0.32949 TUN

North African Egypt (country) 0.329321 EGY

Alhucemas Morocco (country) 0.335216 MAR

Berber Mozabite 0.34386 MAR

Rio de Oro Morocco (country) 0.335216 MAR

Bahraini Bahrain (country) 0.327284 BHR

Iranian Iran (country) 0.314294 IRN

Iraqi Iraq (country) 0.325699 IRQ

Israeli . . ISR

Jordanian Jordan (country) 0.328972 JOR

TransJordan Jordan (country) 0.328972 JOR

Kuwaiti Kuwait (country) 0.32845 KWT

Lebanese Lebanon (country) 0.328391 LBN

Saudi Arabian Saudi Arabia (country) 0.329846 SAU

Syrian Syria (country) 0.327922 SYR

Armenian Armenia (country) 0.317006 ARM

Turkish Turkey (country) 0.315704 TUR

Yemeni Oman (country) 0.327634 OMN

Omani Oman (country) 0.327634 OMN

Muscat Oman (country) 0.327634 OMN

Trucial Oman Oman (country) 0.327634 OMN

Qatar Qatar (country) 0.324068 QAT

Bedouin Bedouin 0.334572 SAU

Kurdish Balochi 0.311038 IRN

Kuria Muria Islander Oman (country) 0.327634 OMN

Palestinian Palestinian 0.329321 JOR

Gazan Palestinian 0.329321 JOR

West Bank Palestinian 0.329321 JOR

South Yemeni Oman (country) 0.327634 YEM

Aden Oman (country) 0.327634 YEM

United Arab Emirates Saudi Arabia (country) 0.329846 ARE

Assyrian/Chaldean/Syriac Syria (country) 0.327922 SYR

Middle Eastern Saudi Arabia (country) 0.329846 SAU

Arab Palestinian 0.329321 SAU

Angolan Angola (country) 0.329822 AGO

Benin Benin (country) 0.311931 BEN

Botswana Botswana (country) 0.325655 BWA

Burundian Burundi (country) 0.329497 BDI

Cameroonian Cameroon (country) 0.317818 CMR

Cape Verdean Cape Verde (country) 0.328605 CPV

Chadian . . TCD

Congolese Congo (country) 0.329822 COG

Equatorial Guinea Equatorial Guinea (country) 0.329238 GNQ

Corsico Islander Equatorial Guinea (country) 0.329238 GNQ

Ethiopian . . ETH

Eritrean . . ERI

Gabonese Gabon (country) 0.329822 GAB

Gambian Gambia (country) 0.284383 GMB

Ghanaian Ghana (country) 0.309695 GHA

Guinean Guinea (country) 0.284383 GIN

Guinea Bissau Guinea Bissau (country) 0.284383 GNB

Ivory Coast Ivory Coast (country) 0.305893 CIV

Kenyan . . KEN

Lesotho Lesotho (country) 0.329822 LSO

Liberian Liberia (country) 0.296039 LBR

Madagascan Madagascar (country) 0.300658 MDG

Malian Mali (country) 0.295182 MLI

Namibian Namibia (country) 0.320899 NAM

Niger Niger (country) 0.30862 NER

Nigerian Nigeria (country) 0.311891 NGA

Fulani Mandenka 0.284383 GIN

Hausa Mandenka 0.284383 NGA

Ibo Yoruba 0.314909 NGA

Tiv Yoruba 0.314909 NGA

Rwandan Rwanda (country) 0.329505 RWA

Senegalese Senegal (country) 0.284383 SEN

Sierra Leonean Sierra Leone (country) 0.284383 SLE

Somalian . . SOM

Swaziland Swaziland (country) 0.33031 SWZ

South African South Africa (country) 0.329926 ZAF

Union of South Africa South Africa (country) 0.329926 ZAF

Afrikaner Orcadian 0.340588 NLD

Zulu Bantu 0.329822 ZAF

Sudanese . . SDN

Fur . . SDN

Tanzanian Tanzania (country) 0.326463 TZA

Togo Togo (country) 0.303878 TGO

Ugandan Uganda (country) 0.329822 UGA

Zairian DRC (country) 0.329822 ZAR

Zambian Zambia (country) 0.329822 ZMB

Zimbabwean Zimbabwe (country) 0.329572 ZWE

African Islands Bantu 0.329822 MOZ

Central African . . .

East African . . .

West African Yoruba 0.314909 NGA

African Black 0.318553 AA

Afghan Afghanistan (country) 0.329023 AFG

Baluchi Balochi 0.311038 PAK

Pathan Pashtun 0.330319 PAK

Bengali Bangladesh (country) 0.323009 BGD

Bhutanese Bhutan (country) 0.289036 BTN

Nepali Nepal (country) 0.241703 NPL

Asian Indian India (country) 0.320623 IND

Andaman Islander India (country) 0.320623 IND

Andhra Pradesh India (country) 0.320623 IND

Assamese India (country) 0.320623 IND

Goanese India (country) 0.320623 IND

Gujarati India (country) 0.320623 IND

Karnatakan India (country) 0.320623 IND

Keralan India (country) 0.320623 IND

Maharashtran India (country) 0.320623 IND

Madrasi India (country) 0.320623 IND

Mysore India (country) 0.320623 IND

Naga India (country) 0.320623 IND

Pondicherry India (country) 0.320623 IND

Punjabi Pakistan (country) 0.324431 PAK

Tamil Brahui 0.313751 LKA

Pakistani Pakistan (country) 0.324431 PAK

Sri Lankan Sri Lanka (country) 0.320953 LKA

Singhalese Sri Lanka (country) 0.320953 LKA

Veddah Sri Lanka (country) 0.320953 LKA

Maldivian Sri Lanka (country) 0.320953 LKA

Burmese Myanmar (country) 0.302644 MMR

Shan Myanmar (country) 0.302644 MMR

Cambodian Cambodia (country) 0.299148 KHM

Khmer Cambodian 0.298463 KHM

Chinese China (country) 0.320225 CHN

Cantonese China (country) 0.320225 CHN

Manchurian China (country) 0.320225 CHN

Mongolian Mongolia (country) 0.315546 MNG

Tibetan China (country) 0.320225 CHN

Hong Kong China (country) 0.320225 CHN

Macao China (country) 0.320225 CHN

Filipino Philippines (country) 0.298659 PHL

Indonesian Indonesia (country) 0.298463 IDN

Japanese Japan (country) 0.313769 JPN

Ryukyu Islander Japan (country) 0.313769 JPN

Okinawan Japan (country) 0.313769 JPN

Korean South Korea (country) 0.329926 KOR

Laotian Laos (country) 0.271095 LAO

Meo Miao 0.321474 CHN

Hmong She 0.275142 CHN

Malaysian Malaysia (country) 0.306422 MYS

Singaporean Singapore (country) 0.317352 SIN

Thai Thailand (country) 0.278977 THA

Black Thai Thailand (country) 0.278977 THA

Western Lao Laos (country) 0.271095 LAO

Taiwanese China (country) 0.320225 CHN

Vietnamese Vietnam (country) 0.298078 VNM

Katu Cambodian 0.298463 KHM

Mnong She 0.275142 CHN

Montagnard Vietnam (country) 0.298078 VNM

Indochinese Vietnam (country) 0.298078 VNM

Eurasian . . .

Asian . . .

Australian Australia (country) 0.343894 AUS

Tasmanian . . .

New Zealander New Zealand (country) 0.327982 NZL

Polynesian .5 Cambodian, .5 Melanesian 0.254042 WSM

Maori .5 Cambodian, .5 Melanesian 0.254042 WSM

Hawaiian .5 Cambodian, .5 Melanesian 0.254042 WSM

Part Hawaiian .5 Cambodian, .5 Melanesian 0.254042 WSM

Samoan .5 Cambodian, .5 Melanesian 0.254042 WSM

Tongan Tonga (country) 0.256664 TON

Tokelauan .5 Cambodian, .5 Melanesian 0.254042 TON

Cook Islander .5 Cambodian, .5 Melanesian 0.254042 TON

Tahitian .5 Cambodian, .5 Melanesian 0.254042 TON

Niuean .5 Cambodian, .5 Melanesian 0.254042 TON

Micronesian Micronesia (country) 0.210421 FSM

Guamanian .5 Cambodian, .5 Melanesian 0.254042 MNP

Chamorro Islander .5 Cambodian, .5 Melanesian 0.254042 GUM

Saipanese .5 Cambodian, .5 Melanesian 0.254042 MNP

Palauan Palau (country) 0.272599 PLW

Marshall Islander Marshall Islands (country) 0.210078 MHL

Kosraean Nauru (country) 0.241703 MHL

Chuukese Nauru (country) 0.241703 MHL

Yap Islander Nauru (country) 0.241703 MHL

Caroline Islander Nauru (country) 0.241703 MHL

Kiribatese .5 Cambodian, .5 Melanesian 0.254042 KIR

Nauruan Nauru (country) 0.241703 MHL

Melanesian Islander Melanesian 0.209622 SLB

Fijian Fiji (country) 0.260992 FJI

New Guinean PNG (country) 0.234712 PNG

Papuan PNG (country) 0.234712 PNG

Solomon Islander SI (country) 0.211849 SLB

Vanuatuan Vanuatu (country) 0.211002 VUT

Pacific Islander .5 Cambodian, .5 Melanesian 0.254042 OC

Oceania .5 Cambodian, .5 Melanesian 0.254042 OC

Afro-American Black 0.318553 AA

American Indian (all tribes) Pima 0.246333 AI

Aleut Pima 0.246333 AI

Eskimo Pima 0.246333 AI

White/Caucasian .5 Italian, .5 French 0.346093 WH

Greenlander . . .

Canadian Canada (country) 0.350252 CAN

Newfoundland Canada (country) 0.350252 CAN

Nova Scotian Canada (country) 0.350252 CAN

French Canadian French 0.353319 FRA

Acadian French 0.353319 FRA

American USA (country) 0.335237 USA

United States USA (country) 0.335237 USA

North American USA (country) 0.335237 USA