Heterogeneity in disease resistance and the impact of antibiotics in the US.
Abstract: The discovery of antibiotics—beginning in 1937 with the widespread usage of sulfa drugs—led to rapid
declines in mortality rates from bacterial infectious diseases; however, there is limited understanding of
the potential heterogeneity in these mortality impacts. Our primary hypothesis is that the impact of
antibiotics is moderated by a population’s inherent resistance mechanisms, from which more resistant
populations benefited less than more susceptible populations. To measure this heterogeneity, we use a
yearly panel of bacterial infectious disease deaths that covers most of the 20th century for the 48
contiguous US states. We find that states with higher levels of innate resistance, measured by
population-level genetic diversity within the human leukocyte antigen (HLA) system, have smaller
mortality responses from the discovery of antibiotics, suggesting area-level genetic endowments of
disease resistance and the discovery of medical technologies have acted as substitutes in determining
levels of health across the US. We then use this measure of resistance as an instrument for levels of pre-
1937 bacterial disease mortality to estimate the impacts of infectious disease reductions on a variety of
socioeconomic outcomes, showing previous results were understated.
C. Justin Cook
University of California-Merced
Jason M. Fletcher
University of Wisconsin-Madison
Introduction
At the beginning of the 20th century, deaths per 1000 live births in the United States within the first
month and first year of life were nearly 50 and 150, respectively. One century later, early life mortality
has been substantially reduced to 4.6 deaths in the first month and 6.9 deaths in the first year.1 Key
determinants of this decline are improvements in environmental factors such as nutrition and
sanitation, access to effective care, and innovations in medical technologies and information.2 While
these historical examples suggest the possibility that extending these improvements to new and
underserved populations could likewise result in large scale increases in population health, it is also
possible that extending these improvements could be less effective than predicted due to heterogeneity
in effects. One source of this heterogeneity is the extent to which new medications substitute for, or
complement, existing characteristics in the population and their environments. Specifically, the long
histories of interactions between people and bacteria around the world have shaped both the people
and the bacteria, allowing some populations to have developed “natural” resistances that undercut the
usefulness of medications that serve the same purpose.
This paper explores the idea of interaction between two principal classes of determinants—that
medical technologies may interplay with innate population characteristics. The focus of the analysis is
on infectious diseases, where we exploit a genetically determined population resistance phenomenon
akin to herd immunity that preceded modern medicine. By incorporating population genetics ideas into
a macroeconomic health analysis, we pursue the hypothesis that medical innovations may have
systematically differential effectiveness that depends on the population under study. Specifically,
medical innovations could be a substitute for population genetic characteristics in the production of
health, such that expanding the innovations to new populations may be less effective than expected.
This substitution effect may be particularly likely in cases of infectious diseases. This paper uses the
expansion of highly successful medical innovations for infectious diseases to test this hypothesis.
1 Considering contemporary developing countries, which constitute a general proxy for early 20th century America, the overwhelming majority of deaths within the first year of life are due to infectious disease, primarily diarrhea, pneumonia, and sepsis (WHO 2010).
2 For example, innovations in water sanitation are shown to reduce total mortality by 13% and infant mortality by 46% (Beach et al. 2016; Cutler and Miller 2004); Jayachandran et al. (2010) show evidence for a statistical break in bacterial infectious disease mortality in 1937 that is tied to the widespread usage of the sulfa drug Prontosil; and Hansen (2014) shows improvements in life expectancy across US states from the broad package of medicinal innovations in the mid-20th century.
In so doing, we focus on the discovery and widespread usage of antibiotics in the mid-20th
century. Following Jayachandran et al. (2010) and given ongoing testing that led to the mass-
manufacturing of penicillin only eight years later in 1945, we take 1937 as a broad intervention year for
the beginning of a general antibiotic era (Aminov 2010; Clardy et al. 2009; Powers 2004).3 We show that
the beneficial effect of antibiotics was not the same for all populations. Long running historical
differences in disease environments shaped the innate response of populations, leaving some
populations more susceptible than others. We propose that more susceptible groups are likely to
benefit more from the introduction of these effective medicines than those that were better able to
resist infection, leading to a heterogeneous effect of antibiotics.
In short, due to differences in the timing of agriculture and the availability of domesticated
animals, societies have had historically different exposure to infectious disease (Wolfe et al. 2007).
Evidence of this historical difference is seen in the large amount of diversity in the set of genes
associated with the recognition and disposal of foreign pathogens, the human leukocyte antigen (HLA)
system (Prugnolle et al. 2005). Genetic diversity within the HLA system provides resistance to
populations by slowing the spread of infectious pathogens and is shown to be strongly associated with
cross-country health outcomes in years prior to, but not after, the mid-20th century health innovations
(Cook 2015). For the current work, we argue that population aggregations (i.e., US states) with more
genetic diversity for HLA genes, i.e., those that have a higher innate resistance, will have a smaller
relative benefit from the discovery of antibiotics in 1937. The intuition is that states with greater
HLA diversity will be relatively better off in regard to infectious disease mortality in periods prior to the
use of antibiotics and will therefore not have as large a decline in mortality from the introduction of this
treatment.
To test our hypothesis, we construct a state-level measure of genetic diversity within the HLA
system for 1937. We also collect a yearly state-level panel of death rates attributed to bacterial
infections for the 20th century. Preliminary evidence of our hypothesis is presented in Figure 1, which
separates the average bacterial mortality rate for those states above and below median HLA genetic
diversity. As shown, more susceptible states (low HLA diversity) have initially higher relative levels of
bacterial mortality in periods prior to 1937. But after 1937, both low and high HLA states are shown to
3 Class of antimicrobial agent and year of FDA approval (from Table 1 of Powers 2004): Penicillin, 1941; Aminoglycosides, 1944; Chloramphenicol, 1949; Tetracyclines, 1950; Macrolides/Lincosamides/Streptogramins, 1952; Glycopeptides, 1956; Rifamycins, 1957; Nitroimidazoles, 1959; Quinolones, 1962; and Trimethoprim, 1968.
have sharp declines in mortality and eventually converge at the same time to a new lower baseline
mortality rate. Given the initial disparity in mortality rates, the more susceptible states, or states with
low amounts of diversity within the HLA system, appear to have a more rapid decline in mortality post
treatment. It is this more rapid decline—tied to a state measure of HLA diversity—that we seek to
estimate.
Indeed, we do show a robust relationship that is consistent with our hypothesis. State-level HLA
diversity has a significant positive association with the post-1937 decline in bacterial mortality rates.
The positive effect implies a larger decline in bacterial infections for states with lower levels of diversity
within the HLA system. This finding is robust to controlling for the demographic composition of the
state, measures of infrastructure and income, and other measures of genetic diversity.
We extend our main finding by proposing that HLA diversity can be used to better measure the
socio-economic effects of medical innovations. Prior studies use pre-innovation mortality as a measure
of intensity for exogenous health innovations—see e.g., Acemoglu and Johnson (2007), Hansen (2014),
and Bhalotra and Venkataramani (2015). Pre-innovation mortality, however, is likely associated with
persistent institutional or cultural factors that can lead to bias in estimating the effect of health
innovations on post-innovation socio-economic measures (Bloom et al. 2014). We propose that pre-
innovation mortality rates are a function of exogenous, latent differences in resistance between
populations and potentially endogenous institutional/cultural factors—e.g., access to nutrition and care,
public works projects, etc. Therefore, we instrument pre-period mortality with our measure of HLA
diversity to address the portion of pre-period mortality determined by endogenous institutional factors.
The instrumented pre-period mortality measure then allows a causal assessment of medical innovations
on population-level socioeconomic and health outcomes. Using our framework, we find statistically
significant effects on a range of socioeconomic outcomes in contrast to previous OLS estimates in the
literature that suggest limited effects.
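The identification argument in the preceding paragraph can be illustrated with a small simulation. The sketch below is not the paper's estimation: the data are synthetic, the variable names (`z` for HLA diversity, `u` for unobserved institutional factors, `x` for pre-period mortality, `y` for a later outcome) and all parameter values are assumptions chosen for illustration, and the sample size is far larger than the 48 states in the analysis. It shows only the general mechanism: when an unobserved factor drives both pre-period mortality and the outcome, the OLS slope is biased, while the just-identified IV estimator, cov(z, y) / cov(z, x), recovers the assumed effect.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Just-identified IV (Wald) estimator using z as an instrument for x."""
    return cov(z, y) / cov(z, x)

# Synthetic data only; every number below is an illustrative assumption.
rng = random.Random(1)
n = 2000
true_effect = 1.0
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (stand-in for HLA diversity)
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder (stand-in for institutions)
# Pre-period mortality falls with the instrument and rises with the confounder.
x = [-zi + ui + rng.gauss(0, 0.5) for zi, ui in zip(z, u)]
# The outcome depends on mortality and, separately, on the confounder.
y = [true_effect * xi - ui + rng.gauss(0, 0.5) for xi, ui in zip(x, u)]

ols = ols_slope(x, y)    # biased toward zero in this setup
iv = iv_slope(z, x, y)   # close to true_effect
```

In this setup the confounder enters mortality and the outcome with opposite signs, so OLS understates the effect. That direction of bias is built into the simulation rather than derived from the data used in the paper.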
Background
HLA Diversity: Functions and Hypothesized Causes for Population Variation
Our mechanism of genetic resistance is a measure of genetic diversity within the set of genes comprising
the HLA system. The HLA (human leukocyte antigen) system is associated with the creation of proteins
that recognize and are responsible for removing foreign cells from the body. This system of genes is
hypothesized to have undergone recent selection (Sabeti et al. 2006). This selection is not for the
uniformity of variants that provide a particular benefit; rather, selection within the HLA system favors
diversity, and the set of genes comprising the HLA system is one of the most diverse regions in the
human genome (Jeffrey and Bangham 2000; Meyer et al. 2017). Importantly, genetic diversity within the
HLA system is a broad measure of immunity that is not strictly tied to a particular infectious pathogen.
HLA diversity provides resistance in a population by increasing the variety of immune responses
within the population. The measles virus provides an illustrative example: Obtaining the virus from a
(genetically similar) relative increases the likelihood of death from the virus twofold compared to
obtaining the virus from a (genetically distant) unrelated individual (Garenne and Aaby 1990). For
individuals within the population, rare variants (corresponding to rare immune response) are favorable,
in that pathogens are likely to adapt and overcome common responses. Having a rare immune response
would therefore be beneficial; hence, the rare variant would increase in frequency.4
The roots of population differences in the level of genetic diversity within the HLA system are
tied to the out-of-Africa migration and the Neolithic Revolution. Serial founder effects from the
migration out of East Africa are associated with overall genetic diversity of a population, including the
level of genetic diversity within the HLA system (Ashraf and Galor 2013, Prugnolle et al. 2005, Qutob et
al. 2012, Ramachandran et al. 2005).5 Additional variation in HLA heterozygosity is tied to the
differential timing and composition of the transition to agriculture. The presence of domesticated
animals and the large, dense, and sedentary populations that could only be achieved after agriculture
provided the basis for an epidemiological transition that led to a large increase in the number of
infectious diseases (Cook 2015).
Data and Empirical Methodology
State-Level HLA Heterozygosity
Our primary analysis will be at the state-level within the US, which allows us to avoid country-level
confounders that are likely to bias our estimations as well as providing quality improvements in many of
4 This is only one possible mechanism for the selection of diversity within the HLA system. For a summary of others, please see Spurgin and Richardson (2010). For our purposes, the precise cause of diversity within the HLA system is not important.
5 A negative linear relationship exists between a population’s level of genetic diversity and the distance the population is along historic migration routes from East Africa. This relationship is due to a serial founder effect, in which emigrating (or founding) populations contain a subset of the genome of the origin population. As a result, subsequent emigrations reduce genetic diversity along migration routes. This is discussed in greater detail in Ashraf and Galor (2013).
the health and economic variables we use. As discussed in greater detail in Cook (2015), the measure of
HLA diversity is derived from the Allele Frequency Database, or ALFRED (Kidd et al. 2003). ALFRED
contains a wide array of genetic data for approximately 50 anthropologically defined ethnicities,
covering all continents in which humans live. Our focus is solely on genetic variants within the HLA
system. The HLA system is a collection of 239 genes located on the sixth chromosome (Shiina et al.
2004). From ALFRED, we are able to obtain data for 156 different single nucleotide polymorphisms, or
SNPs, for each of 51 distinct ethnicities. A SNP is a single change along a strand of DNA. These genetic
variants, or alleles, are then used to calculate the measure of genetic diversity specific to immune
function: expected heterozygosity.
Expected heterozygosity is a commonly used measure from population genetics that measures
the probability that two individuals differ in their genetic variant at a particular locus (or SNP in our
case). Formally, expected heterozygosity is defined as:
$$H_{exp} = 1 - \frac{1}{m} \sum_{l=1}^{m} \sum_{i=1}^{k_l} p_i^2$$
where 𝑝𝑖 represents the fraction of allele 𝑖 within each population (ethnicity for our case), and expected
heterozygosity is found by the average across the 𝑚 loci (the 156 SNPs).
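As a minimal sketch of this calculation, the function below computes expected heterozygosity for one population from a list of per-locus allele frequencies. The function name and the example frequencies are hypothetical and are not drawn from ALFRED; the code only mirrors the definition given above (one minus the sum of squared allele frequencies, averaged across loci).

```python
from typing import Sequence

def expected_heterozygosity(loci: Sequence[Sequence[float]]) -> float:
    """Average over loci of (1 - sum of squared allele frequencies).

    `loci` holds one inner sequence per locus (SNP); each inner sequence
    lists the population frequencies of the alleles at that locus.
    """
    if not loci:
        raise ValueError("need at least one locus")
    total_homozygosity = 0.0
    for freqs in loci:
        if abs(sum(freqs) - 1.0) > 1e-9:
            raise ValueError("allele frequencies at a locus must sum to 1")
        total_homozygosity += sum(p * p for p in freqs)
    return 1.0 - total_homozygosity / len(loci)

# Hypothetical frequencies for three biallelic SNPs (illustrative only).
example_loci = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
h_exp = expected_heterozygosity(example_loci)
```

For these made-up frequencies the per-locus terms are 0.5, 0.18, and 0, so the score is their average, about 0.227. A locus fixed at a single allele contributes nothing, and an evenly split biallelic locus contributes the maximum of 0.5.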
Expected heterozygosity scores are calculated for each ethnicity, which are then aggregated into
the country-level measure by taking the weighted average of each ethnic-specific heterozygosity score.
Ethnic compositions, or the fraction of a country’s population attributed to each ethnicity, are found
from Alesina et al. (2003). Of note, the 51 ethnicities in ALFRED are not all directly matched to an
identical ethnicity from Alesina; rather, language similarities are used to match the ethnicities of ALFRED
to closely matched ethnicities within Alesina. This method yields data for roughly 175 countries, of
which 131 are used within the analysis of Cook (2015). However, in our analysis, instead of using
country-level data, we will focus on a single country and examine variation across states in the US. Thus,
we will need to extend the methods of calculating country-level measures of genetic resistance found in
the literature to allow state-level measures. To do so, we use data from the 1980, 1990, and 2000 US
Census’s 5% state sample, which is a 1-in-20 random sample of the US population (Ruggles et al. 2017).
The 1980 Census is the first available census that records each respondent’s self-reported ancestry. We
then match this ancestry to either a specific ethnicity or country for which we have prior measurement
of HLA heterozygosity.6 This measure constructed from the 1980/1990/2000 Census ancestry
categories, however, would capture periods that are after the proposed intervention date of 1937.
Therefore, to account for the pre-period level of genetic resistance, we use reported state of birth and
age to construct a measure of state-level HLA heterozygosity for 1937. In other words, we aggregate by
state of birth, rather than state of residence, for respondents born before or during 1937. This results in
a 1937 measure of genetic resistance for the 48 contiguous US states, our primary measure of genetic
resistance. Our primary measure of diversity within the HLA system considers reported ancestry groups
to be segregated (i.e., no intermarriage), so that a state’s HLA diversity is simply a weighted average of
the HLA diversity from the state’s ancestral composition.7 Reported ancestries along with the respective
match to Cook’s HLA heterozygosity score are reported in the appendix. Summary statistics for the US
and by region are given in Table 1. Additionally, relative state-level HLA diversity comparisons are
shown in Figures 1 and 2.
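The aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the record layout (state of birth, birth year, ancestry, person weight), the function name, and the ancestry scores are all hypothetical, and the handling of respondents who report two ancestries (footnote 6) is omitted.

```python
from collections import defaultdict

def state_hla_segregated(respondents, ancestry_het, cutoff_year=1937):
    """Weighted average of ancestry-level HLA heterozygosity by state of birth.

    respondents: iterable of (state_of_birth, birth_year, ancestry, weight)
    ancestry_het: dict mapping an ancestry label to its heterozygosity score
    Only respondents born in or before `cutoff_year` are used, and ancestries
    without a matched score are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for state, birth_year, ancestry, weight in respondents:
        if birth_year > cutoff_year or ancestry not in ancestry_het:
            continue
        weighted_sum[state] += weight * ancestry_het[ancestry]
        weight_total[state] += weight
    return {s: weighted_sum[s] / weight_total[s] for s in weighted_sum}

# Hypothetical scores and census records (illustrative only).
ancestry_het = {"A": 0.30, "B": 0.40}
respondents = [
    ("WI", 1920, "A", 1.0),
    ("WI", 1930, "B", 3.0),
    ("WI", 1950, "A", 10.0),  # born after 1937, so excluded
    ("CA", 1937, "B", 2.0),
]
result = state_hla_segregated(respondents, ancestry_het)
```

With these made-up inputs the Wisconsin score is (1 × 0.30 + 3 × 0.40) / 4 = 0.375, and the respondent born in 1950 does not enter the average. Grouping by state of birth rather than state of residence is what lets later census waves proxy for a state's 1937 ancestral composition.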
Outcome and Control Variables
Vital statistics data are from the annual Vital Statistics of the United States. The mortality rate from
each specified infectious disease is by state. Data for years 1900-1930 are from Grant Miller’s NBER
data, and we collected vital statistics data from 1931-2000. The result is an unbalanced panel of the
contiguous US states that covers much of the 20th century (we note instances of missing data in our
results tables below). The infectious diseases of interest include typhoid fever, scarlet fever, pertussis
(whooping cough), tuberculosis, diphtheria, flu and pneumonia, diarrhea and enteritis, syphilis, and
maternal mortality (a proxy for puerperal fever).
Our base specification comprises a difference-in-differences framework that includes state,
year, and year-by-census region fixed effects, and further controls are introduced piecemeal in three
sets. The first set of controls attempts to account for additional unobserved state-year variation in the
bacterial mortality rate by including the residual mortality rate (i.e., the total mortality rate less bacterial
mortality) and the per year count of bacterial infections used to calculate the bacterial mortality rate.
Additional controls, which attempt to account for a differential post-1937 trend, include a set of
demographic controls—the fraction of a state’s population that is black, the fraction of a state’s 1937
population that is foreign born, the urbanization rate in 1937, and a measure of ethnic fractionalization
6 The average is taken for individuals that report two different ancestries.
7 Table 8 considers alternatives to this assumption of segregated ethnicities/ancestral groups. In so doing, we calculate a measure of HLA diversity based upon fully mixed ethnic groups by creating a weighted average of each individual genetic variant used to compute expected heterozygosity.
based on the census-level reported ethnicity—and a set of infrastructure controls—schools per square
mile in 1937, hospitals per square mile in 1937, physicians per capita in 1937, and World War II military
spending. Further definitions and sources for all variables can be found in the Variable Appendix.
Empirical Strategy
The first step of our empirical analysis is concerned with verifying both the treatment—the use of sulfa
drugs and other antibiotics after 1937—and its intensity—measured by genetic diversity within the HLA
system. To verify the treatment date, we use within-state estimation to show declining mortality rates
from bacterial infections after the proposed date of 1937.8 In establishing a differential state-level
response from genetic diversity, we will pool years in the pre and post periods. The idea is that our
measure of innate resistance has a hypothesized significant effect on bacterial infections before
antibiotics that becomes significantly weaker in the post periods.
Secondly, we will pursue difference-in-differences specifications in the spirit of Acemoglu and
Johnson (2007) and Hansen (2014).9 In so doing, we compare the relative change in mortality in the
post-innovation period (to the pre-innovation period) between states that were either more or less
genetically susceptible to infectious disease. HLA heterozygosity measures the intensity of treatment.
States with greater levels of diversity within the HLA system will be more resistant to mortality from
infectious disease in the pre-innovation period and are hypothesized to benefit less from the medical
innovations; the opposite being true for states with relatively low HLA heterozygosity. And in order to
eliminate bias from endogenous usage of the treatment (i.e., the newly developed antibiotics), we will
use a common post-innovation treatment date for all states, 1937.
Formally, our estimation is given by:
$$\ln \text{Bacterial Mort}_{it} = \beta_1 \ln \text{HLA het}_i \times I_t^{post} + \beta_2 \text{Count of diseases}_{it} + \beta_3 \ln \text{Residual Mort}_{it} + \sum_j \beta_j X_i \times I_t^{post} + \sum_r \beta_r R_i \times \gamma_t + \gamma_c + \gamma_t + \varepsilon_{it}$$
where $i$ represents each of the contiguous US states and $t$ represents time, or each year between 1900 and
2000. The natural log of our primary measure of innate resistance is denoted by $\ln \text{HLA het}$, and $I_t^{post}$
8 Extensive analysis on the use of 1937 as a post-sulfa drugs intervention date is given by Jayachandran et al. (2010), who use a number of strategies to verify 1937 as the treatment year.
9 Our difference-in-differences specification is also partially derived from Nunn and Qian (2011), which is similar in its intensity-of-treatment framework.
is an indicator for the post-innovation period, 1937-2000. State and year
fixed effects are represented by $\gamma_c$ and $\gamma_t$, respectively. To account for differential characteristics that
may be associated with HLA diversity and the corresponding change to bacterial mortality, time-invariant
controls (c. 1937) interacted with the post-1937 indicator are given by $X_i \times I_t^{post}$. While
coverage exists for most years, data for some bacterial infections are not available in every year.
We therefore control for the number of bacterial infections included in the construction of bacterial
mortality. We also control for the yearly state specific overall mortality rate less bacterial infections.
This is done to account for unobserved state specific trends that may also influence bacterial mortality.
Our focus is the heterogeneity in the decline in mortality following the use of antibiotics.
Therefore, our coefficient of interest is represented by 𝛽1 above and is hypothesized to be positive. The
positive coefficient implies that states with higher HLA diversity had a lower decline in mortality
following the introduction of antibiotics in 1937. The unadjusted relationship is presented by Figure 1,
which shows the decline in bacterial mortality for states above and below median HLA diversity. This
figure represents our primary hypothesis by showing that more diverse states had a lower initial level of
bacterial mortality, resulting in a smaller decline post-1937.
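To make the role of the interaction coefficient concrete, the sketch below estimates a stripped-down version of this specification on a synthetic balanced panel. It is an illustration only: it keeps state and year fixed effects and the single interaction of log HLA heterozygosity with the post-1937 indicator, and it omits the disease count, residual mortality, interacted controls, and region-by-year effects. The state labels, year range, and all parameter values are assumptions. With a balanced panel, two-way demeaning removes both sets of fixed effects, and the coefficient is the slope of demeaned mortality on the demeaned interaction.

```python
import random

def two_way_demean(panel):
    """Remove state and year means from a balanced panel {(state, year): value}."""
    states = sorted({s for s, _ in panel})
    years = sorted({t for _, t in panel})
    grand = sum(panel.values()) / len(panel)
    s_mean = {s: sum(panel[s, t] for t in years) / len(years) for s in states}
    t_mean = {t: sum(panel[s, t] for s in states) / len(states) for t in years}
    return {(s, t): panel[s, t] - s_mean[s] - t_mean[t] + grand for s, t in panel}

def did_slope(y, x):
    """Two-way fixed-effects slope of y on a single regressor x."""
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    return sum(x_dm[k] * y_dm[k] for k in x_dm) / sum(v * v for v in x_dm.values())

# Synthetic panel; all values are illustrative assumptions.
rng = random.Random(0)
true_beta = 0.5
ln_hla = {f"S{i}": -1.3 + 0.02 * i for i in range(10)}   # hypothetical log HLA het
state_fe = {s: rng.gauss(0, 1) for s in ln_hla}
years = range(1930, 1945)
year_fe = {t: -0.05 * (t - 1930) for t in years}          # common downward trend

x, y = {}, {}
for s, h in ln_hla.items():
    for t in years:
        post = 1.0 if t >= 1937 else 0.0
        x[s, t] = h * post                                # ln HLA het x post indicator
        y[s, t] = state_fe[s] + year_fe[t] + true_beta * x[s, t] + rng.gauss(0, 0.01)

beta_hat = did_slope(y, x)
```

A positive estimate means that, relative to the common post-1937 decline absorbed by the year effects, log mortality fell by less in states with higher HLA diversity. The level of HLA diversity itself is absorbed by the state effects, so only its interaction with the post period is identified.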
Results
Preliminary Analysis
The first step of our analysis is to verify a declining mortality from bacterial infections after 1937. This is
shown in Table 2, which regresses the mortality rate from a specified infectious disease on a post-1937
indicator while including state fixed effects and a time trend and its interaction with the post-1937
indicator. As shown, the post-1937 indicator typically has a statistically significant negative effect on
mortality rates for bacterial infections considered.10
Table 3 pools all considered bacterial infections across years in the pre-1937 and post-1937
periods. Our hypothesis is that our state-level measure of HLA heterozygosity has a strong negative
association with mortality from infectious disease in the pre-innovation period, and that this
relationship weakens once antibiotics begin to be used in the mid-20th century.11
10 The exception is syphilis, which is shown to have an increase post-1937. This is due to a combination of a relatively small pre-period, the inclusion of time trends, and a post-1937 spike in syphilis deaths. Syphilis was not treatable by sulfa drugs; rather, it was treated following the introduction of penicillin 8 years later (Davenport 2012).
11 This is similar to the country-level analysis in Table 4 of Cook (2015).
For the pre-1937 period, a strong negative association exists between genetic diversity within
the HLA system and bacterial mortality. This negative relationship implies that a higher level of HLA
diversity (i.e., more resistant) is associated with a lower level of mortality from bacterial infections.
Using the coefficient from column (1), a standard deviation increase in HLA diversity is associated with
roughly a 20 percent decline in bacterial mortality rates.
For years after 1937, however, the point estimate of the coefficient of HLA diversity is about half
of the pre-1937 estimate. A further illustration of the diminishing role of HLA diversity following 1937 is
given by Figure 3, which plots the year-by-year coefficient estimate of our state-level HLA diversity on
bacterial mortality. As shown, HLA diversity has a statistically significant negative association with
bacterial mortality in early years. The effect of HLA diversity, however, becomes statistically
indistinguishable from zero in more contemporary periods. This supports our arguments for a reduced
role of innate resistance in the post-antibiotic period. To estimate this contemporary null effect,
columns (3) and (4) equally divide the post-1937 period. As shown, the significant effect of column (2) is
driven solely by the years immediately after 1937, a time in which additional antibiotics were being
introduced.
Table 4 repeats the within-state estimation of Table 2, replacing the simple post-indicator with
its interaction with HLA diversity. Again, our hypothesis is that the coefficient of HLA diversity
interacted with the post-1937 indicator will be positive, implying lower levels of HLA diversity
experienced larger declines in bacterial mortality following the 1937 intervention. As hypothesized, the
interaction between the post-1937 indicator and HLA diversity is generally positive and statistically
significant at conventional levels.
Our baseline analysis bundles all available bacterial infections. This is given by column (10) of
Table 4. Regressing mortality from all bacterial infections on our primary interaction term produces the expected positive
coefficient, which is significant at the 1-percent level. Using this coefficient and evaluating at the state with mean HLA
diversity, a standard deviation decline in HLA diversity is associated with approximately a 10% decline in
bacterial mortality following the 1937 intervention. Those states with low levels of HLA diversity are
hypothesized (and shown in Table 3) to have a lower innate resistance that leads to higher levels of
mortality in the absence of medicine. Following the innovations of 1937 and the subsequent
innovations of penicillin, etc., the initial gap in bacterial mortality rates between the low and high HLA
diverse states begins to close, being virtually eliminated by 1960.
Baseline Results
Our baseline difference-in-differences estimation is given in Table 5. All bacterial infections listed in
Table 4 are summed within year to create our bacterial mortality measure. As outlined by Equation 2,
our primary hypothesis is that the coefficient on the natural log of HLA heterozygosity interacted with
the post-1937 indicator is positive, implying a sharper decline in mortality for states with low HLA
diversity, or a low level of resistance.
Column (1) gives the bivariate difference-in-differences estimation, controlling only for state,
year, and year-by-division effects. A positive and statistically significant coefficient of interest is
estimated. This estimated coefficient remains similar throughout Table 5, which piecemeal includes our
baseline set of controls. The count of bacterial infections and the state-year residual mortality rate are
included in column (2). Demographic variables—i.e., the mean 1932-1936 fractions of the black
population, foreign born population, urbanization rate, and ethnic fractionalization—are included in
column (3), and infrastructure measures—i.e., schools per mile, hospitals per mile, education
expenditures per capita, physicians per capita, total World War II spending (c.1940-1945), and initial
income levels—are included in column (4). All controls are included in column (5) with little effect on
the magnitude or significance of the coefficient of HLA diversity.
To show evidence that HLA heterozygosity impacts this process solely through resistance to
infectious pathogens, Table 6 considers a falsification test, using mortality from all other causes as the
dependent variable. No significant relationship exists between other causes of mortality and HLA
heterozygosity, providing further support for the view that HLA diversity acts only as a resistance mechanism to
infectious pathogens.
Robustness to Ashraf and Galor’s Genetic Diversity
As argued and shown by Ashraf and Galor (2013; hereafter AG), genetic diversity is associated with
economic development. This is further shown in the US by Ager and Brueckner (2016). It is possible that
our measure of genetic diversity is simply accounting for the effect of AG. Therefore, Table 7 controls
for a state’s overall level of genetic diversity. To create AG’s measure of genetic diversity, we match
the self-reported ancestry to a country in AG’s data. Then we take the state-level weighted average of
this diversity measure; this is identical to our calculation of state HLA diversity.
Panel A includes AG’s overall diversity interacted with the post-1937 indicator into our baseline
analysis of Table 5. As shown, the point estimate of the coefficient is reduced in magnitude and the
standard error increases, but the coefficient of HLA diversity is generally statistically indistinguishable
from the estimate when omitting AG’s diversity measure. This is primarily due to inflated standard
errors for the coefficient of interest from the inclusion of AG’s measure of genetic diversity. AG’s
measure of genetic diversity is due to serial founder effects in the migration out of Africa. As shown in
Table 2 of Cook (2015), this pre-agricultural base of genetic diversity has a strong positive association
with the amount of genetic diversity within the HLA system. Indeed, the correlation coefficient between
the two measures is 0.74. Given this close relationship between AG’s overall diversity and diversity
within the HLA system, high collinearity is expected.
As an alternative way to account for the overall level of diversity, Panel B regresses bacterial
infectious disease mortality on the ratio of a state’s HLA diversity to the state’s overall measure of
genetic diversity. The effect of this ratio is mostly statistically significant at conventional levels and is in
line with the previous estimations of Table 4.12 More precisely, states with a higher ratio of
HLA diversity to overall diversity within the genome generally experienced a smaller decline in
bacterial mortality rates following the intervention of 1937.
Alternative Strategies for Aggregating State Ethnicities
In calculating HLA diversity, we simply take the weighted average of each ancestral group’s HLA
diversity. This assumes that all ancestral groups within a state are segregated. An alternative approach
is to take the weighted average for the frequency of each genetic variant and then calculate expected
heterozygosity from these state-level gene frequencies. This method assumes that populations are
completely mixed. We consider these two methods of calculating HLA diversity to be two extremes on
the spectrum of ethnic interactions.13
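The difference between the two aggregation methods can be seen in a small sketch. The group labels, population shares, and allele frequencies below are hypothetical, and the functions are an illustration of the two definitions just described rather than the authors' implementation. The segregated score averages each group's heterozygosity; the mixed score first pools allele frequencies across groups and then computes heterozygosity from the pooled frequencies.

```python
def het(freqs_by_locus):
    """Expected heterozygosity: mean over loci of 1 - sum of squared frequencies."""
    homozygosity = sum(sum(p * p for p in locus) for locus in freqs_by_locus)
    return 1.0 - homozygosity / len(freqs_by_locus)

def segregated_score(shares, group_freqs):
    """Population-share weighted average of each group's own heterozygosity."""
    return sum(shares[g] * het(group_freqs[g]) for g in shares)

def mixed_score(shares, group_freqs):
    """Heterozygosity computed from share-weighted (pooled) allele frequencies."""
    template = next(iter(group_freqs.values()))
    pooled = [
        [sum(shares[g] * group_freqs[g][l][a] for g in shares)
         for a in range(len(locus))]
        for l, locus in enumerate(template)
    ]
    return het(pooled)

# Two hypothetical groups, one biallelic locus, each group fixed at a different allele.
shares = {"A": 0.5, "B": 0.5}
group_freqs = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]]}
seg = segregated_score(shares, group_freqs)
mix = mixed_score(shares, group_freqs)
```

In this extreme example neither group has any internal diversity, so the segregated score is 0, while pooling the two groups gives allele frequencies of 0.5 each and a mixed score of 0.5. Differences in allele frequencies between groups therefore raise the mixed score but leave the segregated score unchanged.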
In short, when considering mixed ethnicities, states with larger minority populations tend to
have large increases in measured diversity within the HLA system. For example, Louisiana has the
highest amount of diversity for the mixed ethnic score but is 32nd when considering the segregated
score. Indeed, the highest 8 scores for the mixed HLA diversity score belong to states in the South.
Therefore, when considering the mixed score it is imperative to control for region (col. 2 and 5) and the
fraction of the population that is black (col. 3 and 5).
12 The p-value for the coefficient of the ratio of HLA diversity to AG’s diversity is less than 0.15 in columns (3) and (5).
13 The correlation coefficient between the two measures of HLA diversity is 0.37.
Panel A of Table 8 replaces our primary measure of HLA diversity, which considers
ethnic/ancestral populations to be segregated, with a measure that considers ethnic populations to be
fully mixed. The coefficient of mixed HLA diversity is insignificant and of the opposite sign from that expected
except when accounting for census region. Once controlling for census regions, the effect of HLA diversity
is similar (though slightly larger in magnitude) to the estimates of our primary segregated measure.
Panel B splits the difference between the two extremes of our assumptions of ethnic
mixing/segregation by taking the average between our primary segregated measure and the mixed
measure of Panel A. This average yields more consistent, statistically significant coefficients that are
slightly larger in magnitude than our baseline segregated measure.
Instrumenting Pre-Period Bacterial Mortality
A common strategy in evaluating the impacts of health innovations is to perform difference-in-
differences estimation while using pre-innovation health outcomes as a measure of spatial variation to
measure intensity of the innovation.14 The idea is that the effect of the treatment will be larger for
areas that will directly benefit more from the treatment. In the current setting, the effect of sulfa drugs
(and antibiotics generally) will be larger for states with higher bacterial mortality rates in the years
preceding the intervention.
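As a point of reference, an intensity-of-treatment specification of this kind is often written as below. The notation is illustrative and assumed here rather than taken from the paper: y_{st} is an outcome for state s in year t, Post_t indicates years from 1937 onward, M_s^{pre} is the pre-period bacterial mortality rate, and gamma_s and delta_t are state and year fixed effects.

```latex
y_{st} = \beta \left( \mathrm{Post}_t \times M_s^{\mathrm{pre}} \right) + \gamma_s + \delta_t + \varepsilon_{st}
```

Under this reading, \beta measures how the post-1937 change in the outcome varies with pre-period mortality.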
As noted by Bloom et al. (2014), however, pre-period health outcomes are not randomly
allotted and may be associated with unobservable factors that could lead to bias in estimating the
treatment effect for post-innovation SES outcomes. For example, the US South has the highest average
bacterial mortality rate prior to the 1937 intervention date.15 The factors that determine this higher
level for the South are also likely correlated with future growth in SES outcomes. In part, these factors
can be controlled for (e.g., the inclusion of pre-period income in our base analysis), but hard-to-measure
institutional or cultural factors may remain that could be associated with slowed human capital
accumulation, a slower demographic response, and reduced growth in income and wealth. When not
accounting for this source of bias, the association between pre-period health differences and post-
period growth in SES measures is muted.
14 See, for example, Acemoglu and Johnson (2007), Bhalotra and Venkataramani (2015), Bleakley (2007), Hansen (2014), and Hansen and Strulik (2017).
15 For 1932-1936 mean bacterial mortality rates, the average for the South is 256 compared to 181 for the Northeast, 246 for the West, and 175 for the Midwest.
HLA diversity can be used to separate quasi-random, latent causes of pre-period
health from institutional/cultural/economic factors. Pre-period bacterial mortality rates are a function
of population resistance/susceptibility—measured by HLA diversity—and a collection of
institutional/cultural/economic factors—observable and unobservable—that are tied to bacterial
mortality through nutrition, public works projects, environmental conditions, etc. By estimating the
change to pre-period bacterial mortality rates tied only to the innate ability of the population to resist
infectious disease, we in effect remove the bias attributable to institutional/cultural/economic factors,
leading to stronger statistical associations between pre-period health and post-innovation SES
outcomes. This is what we find in Tables 9 and 10.
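The logic of this argument can be illustrated with a simulated, cross-sectional sketch. It is a simplification under stated assumptions, not the paper's panel estimation: the data are synthetic, the coefficients are invented, and with one instrument and one endogenous regressor the 2SLS estimate reduces to the ratio cov(z, y) / cov(z, x). An unobserved factor u raises pre-period mortality x and lowers later SES growth y, which pulls the OLS slope toward zero, while the instrument z (a stand-in for HLA diversity) is unrelated to u.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 5000
TRUE_EFFECT = 0.5        # assumed effect of treatment intensity on the SES outcome

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: stand-in for HLA diversity
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved institutional/cultural factors

# Pre-period mortality falls with diversity and rises with the unobserved factors.
x = [-1.0 * zi + 1.0 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# SES growth rises with treatment intensity x but is held back by the same factors.
y = [TRUE_EFFECT * xi - 1.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)   # biased toward zero by the omitted factor u
beta_iv = cov(z, y) / cov(z, x)    # uses only the variation in x driven by z
```

In this setup the OLS slope is well below the assumed effect, while the instrumented slope is close to it, which is the pattern the argument above predicts.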
Table 9 repeats our base estimating strategy replacing the interaction between the post-1937
indicator and state HLA diversity with a similar interaction with pre-period bacterial mortality rates. Our
focus is on differential effects on a range of SES outcomes from instrumenting pre-period health with
HLA diversity. Panel A gives OLS estimates, Panel B provides 2SLS estimates, and the outcomes, by
column, are the bacterial mortality rate, the birth rate, years of schooling, worker
experience, population, and real income.16
Column (1) of Panel A follows the spirit of the previous literature by estimating the differential
decline in bacterial mortality from pre-period health differences. As expected, states with higher pre-
period mortality rates experienced a sharper decline following the innovation of antibiotics. When
considering SES outcomes, however, the differential effect from pre-period health is not statistically
significant. This is seen in columns (2)-(6), where initial bacterial mortality is
insignificantly related to state-level changes in the birth rate, years of schooling, worker experience,
population, or income. In contrast, the estimates of Panel B, which instrument initial bacterial
mortality, increase in absolute magnitude for all SES outcomes (col. 2-6) while not
changing the association with declines in bacterial mortality (col. 1). This is supportive of our hypothesis
that the estimated effect of un-instrumented pre-period health is biased towards zero. Notably, the
effect of initial bacterial mortality becomes statistically significant for birth rates and measures of
human capital (col. 2-4), while remaining statistically insignificant for population and income.
Our estimations suggest that an important channel of medicinal innovations in shaping future
income (or output) is through increases in human capital. The estimates of Panel B of Table 9 are
16 The natural log of all outcomes is used in Table 9. The Variable Appendix gives further discussion and sources for all outcome variables.
supportive of a Beckerian framework, where the reduction in infectious diseases, which primarily affect
the young (and old), reduces the risks associated with investing in children. These early-life human
capital investments, however, may be difficult to detect in our aggregate state-level analysis.
Therefore, to better measure the cohort effects of the 1937 intervention across the life course,
we examine birth-year differences in effects, similar to Bhalotra and Venkataramani (2015). In so
doing, Table 10 mirrors the analysis of Table 9 but focuses attention on examining the impacts of
medical interventions as a “cohort” (or birth year) phenomenon rather than a “period” (or survey year)
phenomenon. To implement this approach, we replace the contemporary state-level sample with a
birth-year by birth-state sample that is aggregated from individuals in three waves (1980, 1990, and
2000) of the 5% census sample.17 Observations are at the census wave-birth year-birth state level;
otherwise, aggregations are similar to those of Table 9 but now take into account cohort differences in
place of aggregate contemporary differences.18 Our focus remains on the state-level difference in
response due to initial bacterial mortality rates. Again, Panel A will present OLS estimates, and Panel B
will give 2SLS estimates, instrumenting initial mortality with HLA diversity. A key difference between
Tables 9 and 10 is that treatment is based upon birth year in Table 10. This difference allows us to
better measure life-course differences that may make a smaller impact on the state-level averages of
Table 9.
The OLS estimates of initial bacterial mortality for Panel A now become generally significant with
the expected sign for most SES measures.19 Following the 1937 intervention, states with higher levels of
initial mortality, which therefore benefited more from the intervention, generally show statistically
significant improvements: fewer children born, more years of schooling, higher high school graduation
rates, less poverty, and higher family income.20 But as we argue above, this
17 Due to data limitations, the sample is restricted to those born from 1920 to 1970. The sample is also restricted to women 40 and older for the number of children born, and to all 30 and older for years of schooling, high school graduation rates, the poverty rate, and family income. This implies that older census waves are absent for more recent birth years; e.g., those born in 1970 are only measured for the 2000 wave of the 5% sample.
18 The census wave level is necessary to account for age differences at the time of census sampling. For example, an individual born in 1940 would be 40 during the 1980 wave, 50 during the 1990 wave, and 60 during the 2000 wave. Considering the life-cycle of income, it is imperative that census wave be accounted for.
19 SES measures for Table 10 are not identical to those of Table 9; however, the included variables (children born, years of schooling, high school graduation, an indicator of poverty, and family income) are relevant and similar in context.
20 In comparing Tables 9 and 10, it is expected that the effect of initial bacterial mortality would be larger in the cohort analysis of Table 10. The absence of this difference in magnitude may be attributed to the sampling periods. We measure accumulated years of schooling up until 2000, but because of the age restriction, most of the analysis is weighted to those in the 1980 and 1990 censuses. When restricting the state sample to 1920-1970, the 2SLS coefficient is 0.098; for 1920-1980, it is 0.12; for 1920-1990, it is 0.13. Furthermore, the cohort analysis of
estimated effect is likely to be biased towards zero by unobserved institutional/cultural/economic
factors. To get around this bias, we propose instrumenting initial bacterial mortality with HLA diversity;
estimates from this two-stage estimation are given in Panel B. As in Table 9, the absolute magnitude of
the coefficient of interest increases in all columns. Importantly, the instrumented coefficient of initial
bacterial mortality becomes significantly related with income in column (6) of Panel B.
The estimates from Tables 9 and 10 suggest two main findings. First, the use of pre-period
health outcomes biases the spatial effects of the innovation towards zero. Using our measure of latent
resistance as an instrument for pre-period health teases out pre-period unobservable
institutional/cultural/economic differences that are likely negatively associated with future growth in
socioeconomic outcomes. This is shown in the panel differences of Tables 9 and 10, which routinely
estimate larger two-stage coefficients. Second, the use of birth-cohorts allows us to directly account for
early-life impacts of the 1937 health innovation. The OLS estimates of Table 10 show clear significant
impacts on human capital accumulation that are not found in the similar state-level analysis of Table 9.
When accounting for both the life-course effect and the potential bias in using pre-period health, a clear
significant impact on human capital accumulation that ultimately is associated with income
improvements is seen in column (6) of Panel B, a finding absent from prior studies (e.g., Acemoglu and
Johnson 2007 and Hansen 2014).
Conclusion
Our analysis establishes a heterogeneous response to the medical innovations of the mid-20th century
that is tied to a long-run, genetically observed difference in ancestral exposure to infectious pathogens.
A statistically strong and robust relationship exists between this measure of innate resistance, HLA
heterozygosity, and the state-level mortality rate from bacterial infections prior to the invention and
widespread usage of antibiotics beginning in 1937. We hypothesize and show that states composed of more
resistant populations tended to benefit less from the initial and continued usage of antibiotics.
We also propose and provide evidence that this innate measure of resistance can be used to
separate impacts of unobservable institutional and cultural factors that may bias estimates of
the effects of the health innovations on future changes to SES outcomes. Using our approach and
contrary to prior studies (Acemoglu and Johnson 2007; Hansen 2014), we find statistically significant SES
Table 10 does not allow for effects on young children; for example, 2 year-olds in 1937 are in the control group and categorized as unaffected by the change in exposure.
benefits from the reduction in bacterial infectious disease mortality from the innovation of antibiotics.
The absence of results in these studies is likely due to both bias in using pre-period mortality in
measuring differential spatial impacts and the time-sensitive nature of human capital accumulation in
affecting income, or output per capita.
We recognize that our measure of innate resistance has limitations. Measurement errors
abound, implying our estimated relationship may be understating the true association. Additionally, we
focus strictly on mortality and are unable to account for the harmful effects of morbidity.21 This focus
on mortality is also likely centered on the young and old, preventing us from extending our results to
more general health improvements that have a more demographically uniform distribution of health
gains or losses. Further studies are needed to either confirm or rule out any potential role of aggregate
health improvements in shaping macroeconomic outcomes.
21 There is likely to be a high level of overlap between the eradication of mortality from infectious disease and the eradication of morbidity from infectious disease. Furthermore, our mechanism of innate resistance should serve to reduce morbidity in a manner similar to its role in reducing mortality.
References
Acemoglu, D., & Johnson, S. (2007). Disease and development: The effect of life expectancy on
economic growth. Journal of Political Economy, 115(6), 925-985.
Ager, P., & Brueckner, M. (2016). Immigrants’ genes: Genetic diversity and economic development in
the US. Working Paper.
Aminov, R. (2010). A brief history of the antibiotic era: Lessons learned and challenges for the future.
Frontiers in Microbiology, 1, 134.
Ashraf, Q., & Galor, O. (2013). The “Out of Africa” hypothesis, human genetic diversity, and comparative
economic development. American Economic Review, 103(1), 1-46.
Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S., & Wacziarg, R. (2003). Fractionalization.
Journal of Economic Growth, 8(2), 155-194.
Bhalotra, S., & Venkataramani, A. (2015). Shadows of the captain of the men of death: Health innovation,
human capital investment, and institutions. Working Paper.
https://sites.google.com/site/soniaradhikabhalotra/research--
topics/Paper_WithAppendices.pdf?attredirects=0&d=1
Beach, B., Ferrie, J., Saavedra, M., & Troesken, W. (2016). Typhoid fever, water quality, and human
capital formation. The Journal of Economic History, 76(1), 41-75.
Bleakley, H. (2006). Disease and development: Comments on Acemoglu and Johnson (2006). NBER
Summer Institute on Economic Fluctuations and Growth.
http://www.personal.umich.edu/~hoytb/Bleakley_Comments_Acemoglu_Johnson.pdf
Bleakley, H. (2007). Disease and development: Evidence from hookworm eradication in the American
South. Quarterly Journal of Economics, 122(1), 73-117.
Bleakley, H. (2010). Disease and development: Evidence from the American South. Journal of the
European Economic Association, 1(2-3), 376-386.
Bloom, D.E., Canning, D., & Fink, G. (2014). Disease and development revisited. Journal of Political
Economy, 122(6), 1355-1366.
Clardy, J., Fischbach, M.A., & Currie, C.R. (2009). The natural history of antibiotics. Current Biology,
19(11), R437-R441.
Cook, C.J. (2015). The natural selection of infectious disease resistance and its effect on contemporary
health. The Review of Economics and Statistics, 97(4), 742-757.
Cutler, D., & Miller, G. (2005). The role of public health improvements in health advances: The
twentieth century United States. Demography, 42(1), 1-22.
Doherty, P., & Zinkernagel, R. (1975). A biological role for the major histocompatibility antigens. The
Lancet, 305(7922), 1406-1409.
Fortson, J.G. (2009). Mortality risk and human capital investment: The impact of HIV/AIDS in sub-
Saharan Africa. The Review of Economics and Statistics, 93(1), 1-15.
Garenne, M., & Aaby, P. (1990). Pattern of exposure and measles mortality in Senegal. Journal of
Infectious Disease, 161(6), 1088-1094.
Hansen, C.W. (2014). Cause of death and development in the US. Journal of Development Economics,
109, 143-153.
Hansen, C., & Strulik, H. (2017). Life expectancy and education: Evidence from the cardiovascular
revolution. Journal of Economic Growth, 22(4), 421-450.
Jayachandran, S., Lleras-Muney, A., & Smith, K.V. (2010). Modern medicine and the twentieth century
decline in mortality: Evidence on the impact of sulfa drugs. AEJ: Applied, 2(2), 118-146.
Kidd, K., et al. (2003). ALFRED–the ALlele FREquency Database–update. American Journal of Physical
Anthropology. Annual Meeting Issue: Supplement S36, 128. (URL: http://alfred.med.yale.edu)
Jeffery, K., & Bangham, C. (2000). Do infectious diseases drive MHC diversity? Microbes and Infection,
2(11), 1335-1341.
Lorentzen, P., McMillan, J., & Wacziarg, R., (2008). Death and development. Journal of Economic
Growth, 13, 81-124.
Meyer, D., Aguiar, V.R., Bitarello, B.D., Brandt, D.Y.C., & Nunes, K. (2017) A genomic perspective on HLA
evolution. Immunogenetics, DOI 10.1007/s00251-017-1017-3.
Miller
Powers, J.H. (2004). Antimicrobial drug development—the past, the present, and the future. Clinical
Microbiology and Infection, 4, 23-31.
Prugnolle, F., Manica, A., Charpentier, M., Guégan, J.F., Guernier, V., & Balloux, F. (2005). Pathogen-
driven selection and worldwide HLA class I diversity. Current Biology, 15(11), 1022-1027.
Qutob, N., Balloux, F., Raj, T., & others (2012). Signatures of historical demography and pathogen
richness on MHC class I genes. Immunogenetics, 64(3), 165-175.
Ramachandran, S., Deshpande, O., Roseman, C., Rosenberg, N., Feldman, M., & Cavalli-Sforza, L. (2005).
Support from the relationship of genetic and geographic distance in human populations for a serial
founder effect originating in Africa. PNAS, 102(44), 377-392.
Ruggles, S., Genadek, K., Goeken, R., Grover, J., & Sobek, M. (2017). Integrated Public Use Microdata Series:
Version 7.0 [dataset]. Minneapolis: University of Minnesota. https://doi.org/10.18128/D010.V7.0.
Sabeti, P., Schaffner, S.F., Fry, B., Lohmueller, J., Varilly, P., Shamovsky, O., Palma, A., Mikkelsen, T.S.,
Altshuler, D., & Lander, E.S. (2006). Positive natural selection in the human lineage. Science, 312(5780),
1614-1620.
Spurgin, L., & Richardson, D. (2010). How pathogens drive genetic diversity: MHC, mechanisms and
misunderstandings. Proceedings. Biological Sciences / The Royal Society, 277(1684), 979-988.
Stock J., & Yogo M. (2005). Testing for weak instruments in linear IV regression. In: Andrews DWK
Identification and Inference for Econometric Models. New York: Cambridge University Press; 2005. pp.
80-108.
Shiina, T., Inoko, H., & Kulski, J.K. (2004). An update of the HLA genomic region, locus information and
disease associations: 2004. Tissue Antigens, 64(6), 631-649.
Turner, C., Tamura, R., Mullholland, S., & Baier, S. (2007). Education and income of the states of the
United States: 1840-2000. Journal of Economic Growth, 12(2), 101-158.
Wolfe, N., Dunavan, C., & Diamond, J. (2007). Origins of major human infections. Nature, 447(7142),
279-283.
World Health Organization (2010). Countdown to 2015 Decade Report (2000-2010).
Variable Appendix
HLA Regressors of Interest
State HLA Diversity: This variable is a weighted average of state-level ancestral HLA heterozygosity for
individuals born between 1933 and 1937 (individuals born up to 5 years before the introduction of sulfa
drugs in 1937). Self-reported ancestry from the 5% sample of the 1980, 1990, and 2000 censuses is
matched to country/ethnic HLA heterozygosity measures from Cook (2015). This matching is listed in
Appendix B.
Individuals in the Census can report up to 2 ethnicities/ancestries. For those reporting 2
ancestries, we simply take the average of the matched HLA diversity score.
Mixed HLA Diversity: Our primary way of calculating state-level HLA diversity takes the weighted
average of each reported ethnicity’s HLA diversity. This method assumes no intermingling amongst
different ethnic groups. The other extreme considers fully admixed populations. To account for this
extreme, we take the weighted average of genetic variants (or alleles) to find the frequency of the
variant in the larger (and assumed mixed) population. Expected heterozygosity is then calculated using
the admixed allele frequencies, creating a measure of HLA diversity for fully integrated populations.
State-level allele frequencies are found in a similar manner as the base/segregated measure of
HLA diversity: we simply match reported ancestry for those born in the 5 years prior to the 1937 intervention
to ethnic allele frequencies of Cook (2015). We then take the weighted average of these frequencies to
create a state-level allele frequency. Mixed HLA diversity is then expected heterozygosity calculated
from these state-level allele frequencies.
State Outcomes
Bacterial Mortality Rate: The sum (excluding missing) of mortality rates (deaths per 100,000) from
typhoid, scarlet fever, pertussis, tuberculosis, diphtheria, influenza and pneumonia, diarrhea and
enteritis, maternal mortality, and syphilis. Data are given at the state-year level. The availability of data
differs by year. Table 2 lists the time range for when each disease is listed in the National Vital Statistics
Reports. Data from 1900-1930 are from Grant Miller’s NBER dataset. Data from 1931-2000 have been
digitized from the annual National Vital Statistics Reports.
Birth Rate: The state-year birth rate for 1915-2000. Data are digitized from National Vital Statistics
reports.
Years of Schooling: Estimated state-year years of schooling for 1900-2000. Data are from Turner et al.
(2007).
Experience: Estimated state-year years of experience for 1900-2000. Data are from Turner et al. (2007).
Population: State-year population for 1900-2000. Data are from Turner et al. (2007).
Income: Estimated state-year real income for 1900-2000. Prior to 1930, data are by decade. Yearly
measures are then interpolated for 1900-1930. Data are from Turner et al. (2007).
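The "sum (excluding missing)" rule in the Bacterial Mortality Rate entry above can be sketched as follows. The cause names mirror that entry, but the rates are hypothetical and the function name is an illustrative assumption; the second returned value corresponds to the count of included infections used as a control.

```python
def bacterial_mortality_rate(rates_by_cause):
    """Sum the reported cause-specific rates (deaths per 100,000), skipping missing ones.

    Returns the summed rate and the number of causes that contributed to it.
    """
    reported = [r for r in rates_by_cause.values() if r is not None]
    return sum(reported), len(reported)


# Hypothetical state-year; None marks a cause not reported in that year.
example = {"typhoid": 12.5, "tuberculosis": 80.0, "syphilis": None, "diphtheria": 3.0}
rate, count = bacterial_mortality_rate(example)
```

Tracking the count alongside the sum matters because the set of reported diseases changes over time, so a falling sum can partly reflect fewer included causes.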
Census Outcomes
Years of Schooling: From IPUMS variable EDUCD, which gives detailed years of completed schooling.
Data are aggregated by birth state and birth year for those 30 and older for each of the 1980,
1990, and 2000 waves of the 5% census sample from IPUMS (Ruggles et al. 2017). This implies the
maximum number of observations would be for the 48 states by the number of years (41) by the
number of census waves (3), or 5,904. But due to the age restriction, the sample is limited because
more recent birth years are found for only more recent census waves. Furthermore, bacterial mortality
data are not available for all states between 1920 and 1932, which further limits the sample when
controlling for the residual mortality rate.
Family Income: From IPUMS variable FTOTINC, which is the total family income in contemporary (i.e.,
nominal) pre-tax dollars. We only consider those with positive income that is less than the capped level of
9,999,999. Data are aggregated by birth state and birth year for those 30 and older for each of the
1980, 1990, and 2000 waves of the 5% census sample from IPUMS (Ruggles et al. 2017). This
implies the maximum number of observations would be for the 48 states by the number of years (41) by
the number of census waves (3), or 5,904. But due to the age restriction, the sample is limited because
more recent birth years are found for only more recent census waves. Furthermore, bacterial mortality
data are not available for all states between 1920 and 1932, which further limits the sample when
controlling for the residual mortality rate.
Children Born: From IPUMS variable CHBORN, which is the total number of children born (not surviving)
to women in the sample. Data are aggregated by birth state and birth year for those 40 and older and
are available for the 1980 and 1990 waves of the 5% census sample and are from IPUMS (Ruggles et al.
2017). This implies the maximum number of observations would be for the 48 states by the number of
years (41) by the number of census waves (2), or 3,936. But due to the age restriction, the sample is
limited because more recent birth years are found for only more recent census waves.
Poverty: From IPUMS variable POVERTY, which gives the percentage of family income relative to
poverty thresholds. Following Bhalotra and Venkataramani (2015), our measure is coded as a dummy
variable to capture those families within 200% of the poverty threshold. Data are aggregated by birth
state and birth year for those 30 and older for each of the 1980, 1990, and 2000 waves of the 5%
census sample from IPUMS (Ruggles et al. 2017). This implies the maximum number of
observations would be for the 48 states by the number of years (41) by the number of census waves (3),
or 5,904. But due to the age restriction, the sample is limited because more recent birth years are found
for only more recent census waves. Furthermore, bacterial mortality data are not available for all states
between 1920 and 1932, which further limits the sample when controlling for the residual mortality
rate.
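The maximum observation counts quoted in the entries above are simple products. As a check, the sketch below reproduces the three-wave figure in two ways: from the stated product (48 states, 41 years, 3 waves) and by enumerating the census wave by birth-year cells that satisfy the restrictions of footnote 17 (born 1920 to 1970, aged 30 or older at the census). Variable names are illustrative, and reading "30 and older" as wave year minus birth year of at least 30 is an assumption.

```python
STATES = 48
WAVES = (1980, 1990, 2000)
BIRTH_YEARS = range(1920, 1971)   # 1920 through 1970 inclusive
MIN_AGE = 30

# Stated product: states x years x waves.
stated_max = STATES * 41 * len(WAVES)

# Enumerated count: one cell per (wave, birth year) pair meeting the age restriction.
eligible_cells = sum(1 for wave in WAVES for born in BIRTH_YEARS if wave - born >= MIN_AGE)
enumerated_max = STATES * eligible_cells
```

Both routes give 5,904, since the 31, 41, and 51 eligible birth years in the 1980, 1990, and 2000 waves average to 41 per wave.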
Controls
Residual Mortality Rate: The total mortality rate (per 100,000) minus the bacterial mortality rate. Data
are from the National Vital Statistics.
Count of Bacterial Infections: The number of bacterial infections included in calculating the bacterial
mortality rate. For 1990-2000, data are available only for influenza/pneumonia.
Fraction Black: The 1932-1936 average state-level fraction of the population that is black. This data is
by way of Lleras-Muney (2004).
Urbanization Rate: The 1932-1936 average state-level urban fraction of the population. This data is by
way of Lleras-Muney (2004).
Fraction of Foreign Born: The 1932-1936 average state-level fraction of the population that is not native
to the United States. This data is by way of Lleras-Muney (2004).
Schools per square mile: The 1932-1936 average number of schools per square mile. This data is by
way of Lleras-Muney (2004).
Hospitals per Square Mile: The 1932-1936 average number of hospitals per square mile. This data is by
way of Lleras-Muney (2004).
Education Expenditures per Capita: The 1932-1936 average state-level education expenditure
per capita. This data is by way of Lleras-Muney (2004).
Physicians per Capita: The 1932-1936 average number of physicians per capita. This data is by way of
Lleras-Muney (2004).
World War 2 Spending: The sum (in $000’s) of major war supply contracts and major war facilities from
June 1940-September 1945. From variables 88-91 of Michael Haines’s 2896 entry at ICPSR from part 70
of the 1947 city and county data book. This variable mirrors that of World War II spending from
Fishback and Cullen (2013).
AG’s Overall Genetic Diversity: Census reported ancestry is also matched to Ashraf and Galor’s (2013)
country-level measure of genetic diversity. The state level measure is the weighted average of each
ancestry’s measure of genetic diversity.
Ethnic Fractionalization: Ethnic fractionalization is measured as a Herfindahl index for the fraction of a
state’s population attributed to each reported ancestry/ethnicity for the base sample of Census
respondents used to measure HLA diversity.
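A minimal sketch of a fractionalization index of this type follows. It assumes the common form of one minus the sum of squared group shares, as in Alesina et al. (2003); the exact form used for this variable is not spelled out here, and the ancestry labels and function name are hypothetical.

```python
from collections import Counter


def fractionalization(ancestries):
    """One minus the Herfindahl concentration index of ancestry shares.

    `ancestries` holds one reported ancestry label per respondent.
    """
    counts = Counter(ancestries)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical respondents: two equally sized ancestry groups.
even_split = fractionalization(["german", "german", "irish", "irish"])
# A single ancestry group has no fractionalization.
single_group = fractionalization(["german", "german", "german"])
```

The index is zero when everyone reports the same ancestry and approaches one as the population spreads evenly over many groups.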
Tables and Figures
Table 1. Summary Statistics

Variable                                  Obs.   Mean       Std Deviation  Min       Max
State HLA Heterozygosity, 1937            48     0.3365     0.0065         0.3094    0.3456
By Region and Division:
  Northeast                               9      0.3418     0.0032         0.3382    0.3456
    New England                           6      0.3436     0.0022         0.3396    0.3456
    Mid-Atlantic                          3      0.3382     0.0001         0.3382    0.3383
  South                                   16     0.3333     0.0040         0.3251    0.3388
    South Atlantic                        8      0.3349     0.0028         0.3306    0.3388
    East South Central                    4      0.3334     0.0042         0.3291    0.3383
    West South Central                    4      0.3300     0.0049         0.3251    0.3363
  Midwest                                 12     0.3393     0.0010         0.3380    0.34178
    East North Central                    5      0.3390     0.0006         0.3380    0.3397
    West North Central                    7      0.3395     0.0012         0.3382    0.3418
  West                                    11     0.3340     0.0104         0.3094    0.3433
    Mountain                              8      0.3326     0.0119         0.3094    0.3433
    Pacific (excl. AK & HI)               3      0.3379     0.0046         0.3326    0.3407
Bacterial Mortality Rates                 4151   127.98     149.27         5.3       1210.87
  Pre (1900-1936, unbalanced)             1079   331.02     151.33         102.03    1210.87
  Post (1937-2000)                        3072   56.66      49.99          5.3       618.3
  Initial (1932-1936 mean by state)       48     219.55     71.75          142.12    536.69
Time Varying Controls
  Residual Mortality                      4151   878.50     128.55         57.6      1965.47
  Count of Included Bacterial Infections  4151   6.95       2.41           1         9
Pre-period (1932-1936 mean) State-Level Controls
  Frac. Black                             48     9.40       13.37          0.06      49.8
  Urbanization Rate                       48     46.55      19.25          18.06     92.08
  Frac. Foreign Born                      48     9.05       6.87           0.4       24.66
  Schools per square mile                 48     0.12       0.09           0.0029    0.36
  Hospitals per square mile               48     0.0045     0.0068         0.0002    0.0334
  Physicians per capita                   48     0.0012     0.0003         0.0007    0.0018
  World War 2 spending (in $100Ks)        48     86.33      119.7          0.18      436.38
  Ethnic Fractionalization                48     0.87       0.03           0.78      0.92
  Initial Income (in $1000s)              48     12.76      3.70           5.69      20.88
SES Outcomes
  Contemporary State Sample (state by year)
    Births (per 1000 of pop.)             3827   19.4955    4.7503         10.7      34.9
    Years of Schooling                    4151   9.8179     2.3820         3.6036    14.1410
    Experience                            4151   19.4071    1.4655         12.5986   24.7297
    Population (1000s)                    4151   3693.961   4064.522       90        33871.65
    Income (1000s)                        4151   28.981     13.2338        4.4524    70.6046
  Aggregated Census Sample (birth-state by birth-year by census wave)
    Number of Children Born               2279   2.8948     0.4348         1.6       4.25
    Years of Schooling                    5637   13.6321    0.8652         10.3507   15.2270
    High School Grad Frac.                5637   0.8095     0.1230         0.2970    0.9889
    Poverty Frac.                         5637   0.2295     0.0762         0.0775    0.5552
    Family Income (1000s)                 5637   45.2784    16.5173        17.1398   96.7936

Summary & Notes: This table provides summary statistics for most variables used in our analysis. Definitions and sources of each variable are given in the variable appendix. State-level HLA diversity scores are given in Figure 1. Arizona has the lowest amount of diversity, and Maine has the highest level.
Table 2. Post-1937 treatment for each bacterial infection

Dependent Variable: ln Mortality Rate from:

                              Typhoid             Scarlet Fever       Pertussis           Tuberculosis        Diphtheria
                              (1)                 (2)                 (3)                 (4)                 (5)
Post-1937 Indicator           -6.1006***          -5.1977***          -5.2208***          -3.1101***          -5.9399***
                              (0.2713)            (0.3101)            (0.3055)            (0.3078)            (0.2885)
State FE                      Y                   Y                   Y                   Y                   Y
Time Trend                    Y                   Y                   Y                   Y                   Y
Post × Time Trend             Y                   Y                   Y                   Y                   Y
Division × Time Trend         Y                   Y                   Y                   Y                   Y
Observations                  2087                3479                3537                3815                2567
R Sqr.                        0.9100              0.8748              0.8668              0.9313              0.8794
Years (unbalanced)            1900-1957           1900-1988a          1900-1988           1900-1993           1900-1967

                              Flu and Pneumonia   Diarrhea            Maternal Mortality  Syphilis            All
                              (6)                 (7)                 (8)                 (9)                 (10)
Post-1937 Indicator           -5.0747***          -5.5163***          -4.0705***          3.4464***           -0.8469***
                              (0.2982)            (0.2942)            (0.2989)            (0.2784)            (0.0919)
State FE                      Y                   Y                   Y                   Y                   Y
Time Trend                    Y                   Y                   Y                   Y                   Y
Post × Time Trend             Y                   Y                   Y                   Y                   Y
Division × Time Trend         Y                   Y                   Y                   Y                   Y
Observations                  4151                2711                3671                2782                4151
R Sqr.                        0.8571              0.8865              0.8873              0.8972              0.9065
Years (unbalanced)            1900-2000           1900-1970           1900-1990           1931-1988           1900-2000

Summary & Notes: This table shows a decline in bacterial infections after the proposed intervention year of 1937. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
a No data for years 1949-1950
Table 3. Pooled effect of HLA diversity
Dependent variable: ln of mortality from bacterial infections, 1900-2000
Sample period:                 1900-1936      1937-2000      1937-1968      1969-2000
                               (1)            (2)            (3)            (4)
ln HLA diversity               -9.8426***     -4.0232***     -7.0485***     -0.9979
                               (1.4064)       (0.9929)       (1.3637)       (0.9486)
Year FE                        Y              Y              Y              Y
Census Divisions × Year FE     Y              Y              Y              Y
Observations                   1079           3072           1536           1536
R Sqr.                         0.8532         0.9299         0.9182         0.5820
Summary & Notes: This table shows that HLA diversity, which is a measure of innate resistance, has a stronger effect in periods prior to the invention and widespread usage of antibiotics. The estimates of Table 3 are further illustrated by Figure 2, which plots the coefficient of HLA diversity on bacterial mortality from 1925-2000. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
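Because both mortality and HLA diversity enter Table 3 in logs, each coefficient can be read as an elasticity. The following is a minimal Python sketch of that conversion; the helper name `pct_change_in_mortality` and the choice of a 1% comparison are illustrative assumptions, not part of the original analysis.

```python
def pct_change_in_mortality(beta, pct_change_in_hla):
    """Percent change in mortality implied by a log-log coefficient.

    With ln(y) = a + beta * ln(x), scaling x by (1 + g) scales y by
    (1 + g) ** beta, so the implied percent change is that factor minus one.
    """
    growth = 1.0 + pct_change_in_hla / 100.0
    return (growth ** beta - 1.0) * 100.0


# Point estimates on ln HLA diversity from Table 3, columns (1) and (4).
pre_antibiotic = pct_change_in_mortality(-9.8426, 1.0)  # sample 1900-1936
late_period = pct_change_in_mortality(-0.9979, 1.0)     # sample 1969-2000

print(f"1900-1936: {pre_antibiotic:.2f}% change per 1% more HLA diversity")
print(f"1969-2000: {late_period:.2f}% change per 1% more HLA diversity")
```

The conversion uses point estimates only; it says nothing about the precision of either estimate, which the standard errors in the table carry.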
Table 4. Intensity of treatment for each bacterial infection
Dependent Variable: ln Mortality Rate from:
                                  Typhoid        Scarlet Fever  Pertussis      Tuberculosis   Diphtheria
                                  (1)            (2)            (3)            (4)            (5)
Post-1937 × ln HLA diversity      10.5178**      1.5289         11.8007**      16.4518***     11.4033**
                                  (4.5373)       (4.4033)       (4.7859)       (4.6864)       (5.4123)
State FE                          Y              Y              Y              Y              Y
Year FE                           Y              Y              Y              Y              Y
Census Divisions × Year FE        Y              Y              Y              Y              Y
Observations                      2087           3479           3535           3815           2567
R Sqr.                            0.9786         0.9600         0.9735         0.9869         0.9776
Years (unbalanced)                1900-1957      1900-1988a     1900-1988      1900-1993      1900-1967

                                  Flu and        Diarrhea       Maternal       Syphilis       All
                                  Pneumonia                     Mortality
                                  (6)            (7)            (8)            (9)            (10)
Post-1937 × ln HLA diversity      8.3232*        11.1173**      9.2750*        7.4702***      5.6042***
                                  (4.7486)       (4.7024)       (5.1785)       (2.1406)       (1.2364)
State FE                          Y              Y              Y              Y              Y
Year FE                           Y              Y              Y              Y              Y
Census Divisions × Year FE        Y              Y              Y              Y              Y
Observations                      4151           2711           3671           2782           4151
R Sqr.                            0.9794         0.9740         0.9800         0.9672         0.9815
Years (unbalanced)                1900-2000      1900-1970      1900-1990      1931-1988      1900-2000
Summary & Notes: This table shows HLA diversity as an intensity-of-treatment for each bacterial infection that comprises our measure of bacterial mortality. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
a No data for years 1949-1950.
Table 5. Baseline estimation: Piecemeal inclusion of controls
Dependent variable: ln of mortality from bacterial infections, 1900-2000
(1) (2) (3) (4) (5)
Post-1937 × ln HLA diversity 5.6042*** 5.1853*** 6.2646*** 5.2537*** 5.5830***
(1.2364) (1.2267) (1.6332) (1.4213) (1.4966)
ln Residual Mortality 0.2575* 0.2740**
(0.1365) (0.1181)
Count of Bacterial Infections -0.0089 -0.0186
(0.0322) (0.0282)
Post-1937 × controls measured in 1937:
ln Fraction Black -0.0219 -0.0646**
(0.0367) (0.0304)
ln Urbanization Rate -0.2200* -0.1279
(0.1142) (0.1853)
ln Fraction Foreign Born -0.0101 0.0378
(0.0528) (0.0529)
ln Ethnic Fractionalization 1.2749* 1.1275*
(0.6955) (0.6031)
ln Schools per Square Mile 0.1648** 0.1080
(0.0706) (0.0650)
ln Edu. Exp. per Capita 0.2253 0.1406
(0.1544) (0.1445)
ln Hospitals per Square Mile -0.1343* -0.0652
(0.0763) (0.0804)
ln Physicians per Capita -0.0959 0.0358
(0.1759) (0.1782)
ln World War II Spending 0.0205 0.0830***
(0.0213) (0.0302)
ln Income per Capita -0.2765 -0.4962*
(0.2550) (0.2890)
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.9815 0.9823 0.9822 0.9823 0.9834
Summary & Notes: This table comprises our baseline estimation, which seeks to show that HLA diversity contributed to the response of medicines introduced in 1937. The positive coefficient of HLA diversity signifies a larger decline for those states that have lower diversity within the HLA system. This supports our primary hypothesis. Controls for 1937 are interacted with the post-1937 indicator. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
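The logic of the Post-1937 × ln HLA diversity coefficient can be illustrated in a stripped-down setting. In a balanced panel with only state and year fixed effects, the interaction coefficient equals the cross-state slope of each state's change in mean ln mortality (post minus pre) on its ln HLA diversity. The Python sketch below uses that equivalence; the function name, the toy data, and the convention that "post" means year >= 1937 are assumptions for illustration. It omits the division-by-year effects, controls, unbalanced years, and clustered standard errors used in Table 5.

```python
def did_interaction_slope(panel, ln_hla, cutoff=1937):
    """Cross-state slope of (post mean - pre mean) of ln mortality on ln HLA.

    panel:  dict state -> dict year -> ln mortality (balanced across states)
    ln_hla: dict state -> ln HLA diversity
    cutoff: first year treated as "post" (an assumption of this sketch)
    """
    changes = {}
    for state, series in panel.items():
        pre = [v for yr, v in series.items() if yr < cutoff]
        post = [v for yr, v in series.items() if yr >= cutoff]
        changes[state] = sum(post) / len(post) - sum(pre) / len(pre)

    states = sorted(panel)
    h_bar = sum(ln_hla[s] for s in states) / len(states)
    d_bar = sum(changes[s] for s in states) / len(states)
    num = sum((ln_hla[s] - h_bar) * (changes[s] - d_bar) for s in states)
    den = sum((ln_hla[s] - h_bar) ** 2 for s in states)
    return num / den


# Hypothetical three-state panel built with a known interaction effect of 5.
ln_hla = {"A": -1.10, "B": -1.08, "C": -1.06}
state_fe = {"A": 5.0, "B": 5.2, "C": 4.9}
year_fe = {1935: 0.0, 1936: -0.1, 1937: -0.8, 1938: -1.0}
beta = 5.0

panel = {}
for s in ln_hla:
    panel[s] = {}
    for yr in year_fe:
        post = 1.0 if yr >= 1937 else 0.0
        panel[s][yr] = state_fe[s] + year_fe[yr] + beta * post * ln_hla[s]

slope = did_interaction_slope(panel, ln_hla)
print(round(slope, 6))
```

A positive slope means mortality fell by less after 1937 in states with higher HLA diversity, which is the sign pattern the table reports.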
Table 6. Placebo test: Other causes of mortality and HLA diversity
Dependent variable: ln of all mortality less bacterial infections, 1900-2000
(1) (2) (3) (4) (5)
Post-1937 × ln HLA diversity 1.6245 1.6253 1.8878 -0.6815 0.4504
(1.7332) (1.7334) (2.1942) (1.3982) (1.9229)
Count of Bacterial Infections 0.0126 0.0110
(0.0207) (0.0166)
Post-1937 × pre-period controls:
ln Fraction Black -0.0169 0.0173
(0.0320) (0.0292)
ln Urbanization Rate -0.1304 -0.1937
(0.1024) (0.1451)
ln Fraction Foreign Born 0.0269 -0.0888
(0.0732) (0.0537)
ln Ethnic Fractionalization 0.8484 0.5235
(0.6837) (0.5526)
ln Schools per Square Mile 0.0760 -0.0039
(0.0533) (0.0628)
ln Edu. Exp. per Capita -0.0159 0.0805
(0.1284) (0.1464)
ln Hospitals per Square Mile 0.0022 0.0635
(0.0582) (0.0692)
ln Physicians per Capita -0.3810** -0.3496**
(0.1676) (0.1515)
ln World War II Spending -0.0661** -0.0800**
(0.0288) (0.0325)
ln Income per Capita 0.4426* 0.6626**
(0.2431) (0.2566)
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.5617 0.5617 0.5703 0.6028 0.6150
Summary & Notes: This table serves as a placebo test of the findings in Table 5. As shown, HLA diversity in the post-1937 period has no association with other sources of mortality. Controls for 1937 are interacted with the post-1937 indicator. All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
Table 7. Controlling for overall genetic diversity
Dependent variable: ln of mortality from bacterial infections, 1900-2000
(1) (2) (3) (4) (5)
Panel A. Controlling for AG’s genetic diversity
Post-1937 × ln HLA diversity 3.8492** 3.0435* 1.5173 4.5851* 1.7533
(1.7140) (1.7977) (2.9087) (2.3581) (3.0627)
Post-1937 × ln Overall Genetic Diversity 4.6724 5.6814 8.9233** 1.5144 6.6439
(3.6433) (3.5775) (4.0580) (3.9904) (4.3514)
Controls: Resid. Mort./Bac. Count N Y N N Y
Demographic N N Y N Y
Infrastructure N N N Y Y
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.9815 0.9824 0.9824 0.9823 0.9835
p-value for coef. of HLA div. = coef. of Table 5 0.31 0.24 0.11 0.78 0.22
p-value for joint sig. of HLA and AG’s div. 0.00 0.00 0.00 0.00 0.00
Panel B. Ratio of HLA diversity to overall diversity
Post-1937 × ln Ratio of HLA to Overall Diversity 6.0664** 5.3222* 5.4837 6.6347*** 5.2695
(2.5585) (2.7494) (3.3601) (2.3869) (3.2982)
Controls: Resid. Mort./Bac. Count N Y N N Y
Demographic N N Y N Y
Infrastructure N N N Y Y
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.9811 0.9820 0.9818 0.9821 0.9832
Summary & Notes: This table controls for Ashraf and Galor’s overall measure of diversity. This measure of diversity is strongly correlated with HLA diversity, increasing the standard errors in Panel A. To partially mitigate this collinearity, Panel B looks at the ratio of HLA diversity to AG’s measure of diversity. The inclusion of controls follows the format of Table 5. Time-invariant controls for 1937 are interacted with the post-1937 indicator. These include demographic controls (black fraction of the state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, and doctors per capita). All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
Table 8. Mixed HLA Heterozygosity
Dependent variable: ln of mortality from bacterial infections, 1900-2000
(1) (2) (3) (4) (5)
Panel A. Mixed Ethnicity HLA Diversity
Post-1937 × ln mixed-ethnic HLA diversity 7.4687 9.1163 8.1091 6.2116 10.5054***
(6.2778) (5.5596) (4.9864) (4.1544) (3.4436)
Controls: Resid. Mort./Bac. Count N Y N N Y
Demographic N N Y N Y
Infrastructure N N N Y Y
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.9808 0.9819 0.9818 0.9819 0.9833
Panel B. Mean of Seg. and Mixed Ethnicity HLA Diversity
Post-1937 × Avg. of Segregated and Mixed HLA diversity 8.3163*** 8.0381*** 8.4120*** 7.2637*** 8.0170***
(1.9835) (1.8388) (2.5978) (2.0870) (2.2521)
Controls: Resid. Mort./Bac. Count N Y N N Y
Demographic N N Y N Y
Infrastructure N N N Y Y
State, Year, and Year-by-Division FE Y Y Y Y Y
Observations 4151 4151 4151 4151 4151
R Sqr. 0.9814 0.9823 0.9821 0.9822 0.9834
Summary & Notes: This table considers alternative ways of calculating HLA heterozygosity. Our base measure is representative of segregated ethnic populations. Panel A instead considers an HLA diversity measure from gene frequencies representative of fully mixed ethnicities. Panel B considers the average of the two measures. Time-invariant controls for 1937 are interacted with the post-1937 indicator. These include demographic controls (black fraction of the state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, and doctors per capita). All estimation is performed with OLS with state-clustered standard errors reported in parentheses. Statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
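The distinction between the segregated and fully mixed measures can be sketched in Python. The sketch assumes heterozygosity is the standard expected heterozygosity at a single locus (one minus the sum of squared allele frequencies) and uses made-up population shares and allele frequencies; the exact loci, weights, and data behind the state-level measures are not shown in this section, so every name and number below is hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def segregated_diversity(shares, group_freqs):
    """Population-share-weighted average of each group's own heterozygosity."""
    return sum(w * heterozygosity(f) for w, f in zip(shares, group_freqs))


def mixed_diversity(shares, group_freqs):
    """Heterozygosity of the pooled allele frequencies (fully mixed population)."""
    n_alleles = len(group_freqs[0])
    pooled = [
        sum(w * f[k] for w, f in zip(shares, group_freqs))
        for k in range(n_alleles)
    ]
    return heterozygosity(pooled)


# Hypothetical state with two ancestry groups and three alleles.
shares = [0.7, 0.3]
group_freqs = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]

seg = segregated_diversity(shares, group_freqs)  # analogue of the base measure
mix = mixed_diversity(shares, group_freqs)       # analogue of Panel A
avg = 0.5 * (seg + mix)                          # analogue of Panel B
print(round(seg, 4), round(mix, 4), round(avg, 4))
```

Under these assumptions the mixed measure is never smaller than the segregated one, because pooling frequencies before computing heterozygosity also counts between-group differences.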
Table 9. Instrumenting pre-period mortality: Contemporary state analysis
Dependent variable:                      ln Bac. Mort   ln Birth Rate  ln Years of Sch.  ln Exp.        ln Pop.        ln Inc.
                                         (1)            (2)            (3)               (4)            (5)            (6)
Panel A. OLS estimation
Post × ln pre-period bacterial mort.     -0.6223***     -0.1154        0.0650            0.0541         0.0450         -0.0542
                                         (0.0923)       (0.0806)       (0.0437)          (0.0491)       (0.1662)       (0.0702)
Controls: Resid. Mort./Bac. Count        Y              Y              Y                 Y              Y              Y
Demographic                              Y              Y              Y                 Y              Y              Y
Infrastructure                           Y              Y              Y                 Y              Y              Y
State, Year, and Year-by-Division FE     Y              Y              Y                 Y              Y              Y
Observations                             4151           3811           4151              4151           4151           4151
R Sqr.                                   0.9841         0.9471         0.9900            0.8762         0.9852         0.9884
Panel B. 2SLS estimation (instrument = Post × ln HLA diversity)
Post × ln pre-period bacterial mort.     -0.5890***     -0.2142**      0.1598**          0.1522***      0.4226         -0.0011
                                         (0.1319)       (0.1011)       (0.0774)          (0.0520)       (0.2864)       (0.0839)
Controls: Resid. Mort./Bac. Count        Y              Y              Y                 Y              Y              Y
Demographic                              Y              Y              Y                 Y              Y              Y
Infrastructure                           Y              Y              Y                 Y              Y              Y
State, Year, and Year-by-Division FE     Y              Y              Y                 Y              Y              Y
Observations                             4151           3811           4151              4151           4151           4151
First-stage F statistic (KP)             24.175         23.561         24.175            24.175         24.175         24.175
p-value, KP-LM statistic                 0.0068         0.0053         0.0068            0.0068         0.0068         0.0068
Summary & Notes: This table compares OLS and 2SLS estimates for a commonly used measure of heterogeneity based on pre-period mortality. We, along with others, argue that pre-period mortality is potentially related with income and other measures of well-being that may lead to negative bias in its estimated relationship with changes to socioeconomic outcomes. Time-invariant controls for the pre-period are interacted with the post-1937 indicator. These include demographic controls (the black fraction of a state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, education expenditures per capita, doctors per capita, and pre-period income). Standard errors are clustered by state, and statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
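The contrast between the OLS and 2SLS panels of Table 9 can be illustrated with a small simulation. The Python sketch below is not the paper's estimator: it has one instrument, one endogenous regressor, no fixed effects, no controls, and no clustered standard errors, and all data are simulated with an arbitrary seed. It only shows why an instrument can recover a coefficient that OLS understates when an unobserved factor raises the regressor and lowers the outcome.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified 2SLS: reduced-form slope divided by first-stage slope."""
    return cov(z, y) / cov(z, x)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
true_beta = 0.15         # hypothetical causal effect of x on y
z, x, y = [], [], []
for _ in range(5000):
    zi = rng.gauss(0, 1)                  # instrument (stand-in for Post × ln HLA)
    ui = rng.gauss(0, 1)                  # unobserved confounder
    xi = 0.8 * zi + ui + rng.gauss(0, 0.5)
    yi = true_beta * xi - ui + rng.gauss(0, 0.5)
    z.append(zi)
    x.append(xi)
    y.append(yi)

ols = ols_slope(x, y)
iv = iv_slope(z, x, y)
print(f"true = {true_beta}, OLS = {ols:.3f}, IV = {iv:.3f}")
```

In this setup the confounder pushes the OLS slope well below the true value, while the instrumented slope stays close to it, provided the instrument affects the outcome only through the regressor.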
Table 10. Instrumenting pre-period mortality: Birth cohort analysis
Dependent variable:                                    ln Children Born     ln Years of Sch.     High School Grad. Frac.  Poverty Frac.        ln Family Inc.
                                                       (1)                  (2)                  (3)                      (4)                  (5)
Panel A. OLS estimation
Post-1937 birth-year × ln pre-period bact. mort.       -0.0398**            0.0557***            0.1075***                -0.0292**            0.0339*
                                                       (0.0167)             (0.0118)             (0.0198)                 (0.0109)             (0.0173)
Controls: Birth-State Baseline                         Y                    Y                    Y                        Y                    Y
Birth-State, Birth-Year, and Birth-Year-by-Division    Y                    Y                    Y                        Y                    Y
Census Wave and Census-Wave-by-Division FE             Y                    Y                    Y                        Y                    Y
Observations                                           2279                 5637                 5637                     5637                 5637
R Sqr.                                                 0.9566               0.9743               0.9757                   0.7199               0.9030
Panel B. 2SLS estimation (instrument = Post × ln HLA diversity)
Post-1937 birth-year × ln pre-period bact. mort.       -0.0689**            0.0990***            0.1560***                -0.0522***           0.0518**
                                                       (0.0300)             (0.0168)             (0.0283)                 (0.0169)             (0.0248)
Controls: Birth-State Baseline                         Y                    Y                    Y                        Y                    Y
Birth-State, Birth-Year, and Birth-Year-by-Division    Y                    Y                    Y                        Y                    Y
Census Wave and Census-Wave-by-Division FE             Y                    Y                    Y                        Y                    Y
Observations                                           2279                 5637                 5637                     5637                 5637
First-stage F statistics (KP)                          25.5849              27.5066              27.5066                  27.5066              27.5066
p-value, KP-LM statistic                               0.0047               0.0052               0.0052                   0.0052               0.0052
Census Waves (by decade)                               1980-1990            1980-2000            1980-2000                1980-2000            1980-2000
Restrictions                                           Women 40 and older   All 30 and older     All 30 and older         All 30 and older     All 30 and older
Summary & Notes: This table repeats the estimation strategy of Table 9 by comparing OLS and 2SLS estimates for pre-period mortality. The introduction of antibiotics is likely to have larger effects in early childhood. To measure these effects, Table 10 examines differences in birth-state aggregated cohorts born prior to and after the 1937 introduction; this is similar to the strategy employed by Bhalotra and Venkataramani (2015). Birth-state baseline controls include birth-state and birth-year fixed effects and the state-year residual mortality rate. Additional time-invariant controls for the pre-period are interacted with the post-1937 indicator. These include demographic controls (the black fraction of a state, the urbanization rate, and the fraction of the population that is foreign born) and infrastructure controls (World War 2 spending, schools per square mile, hospitals per square mile, education expenditures per capita, doctors per capita, and pre-period income). To account for age differences during the sampling period, census wave fixed effects are included in all estimations; the interaction with census divisions is also included. Standard errors are clustered by birth state, and statistical significance is denoted by *, **, and ***, representing significance at the 10, 5, and 1% levels, respectively.
[Figure 1: line plot of Bacterial Mortality Rate (per 100K), 0 to 400, against Year, 1927 to 2002; series: Above Median HLA States, Below Median HLA States]
Figure 1. Bifurcated HLA Heterozygosity: Year by Year Association.
Summary & Notes: The mean mortality rate from infectious disease is given on the y-axis. The sample is bifurcated by those states above and below the median HLA heterozygosity. As shown, those states above the median had a lower mortality rate in 1940 that corresponds to a lower decline in mortality following the mid-20th century medical innovations.
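The bifurcation behind Figure 1 can be sketched as a median split followed by group means per year. The Python sketch below uses hypothetical state labels and mortality rates; placing states exactly at the median in the lower group is an assumption of the sketch, since the figure notes do not say how ties are handled.

```python
from statistics import mean, median


def split_means_by_year(hla, mortality):
    """Mean mortality by year for states above and at-or-below the median HLA value.

    hla:       dict state -> HLA heterozygosity
    mortality: dict state -> dict year -> mortality rate per 100K
    Returns dict year -> (mean for above-median states, mean for the rest).
    """
    cut = median(hla.values())
    above = [s for s in hla if hla[s] > cut]
    below = [s for s in hla if hla[s] <= cut]
    years = sorted(next(iter(mortality.values())))
    return {
        yr: (mean(mortality[s][yr] for s in above),
             mean(mortality[s][yr] for s in below))
        for yr in years
    }


# Hypothetical four-state example.
hla = {"A": 0.31, "B": 0.32, "C": 0.34, "D": 0.35}
mortality = {
    "A": {1930: 300, 1950: 60},
    "B": {1930: 280, 1950: 50},
    "C": {1930: 220, 1950: 45},
    "D": {1930: 200, 1950: 35},
}
series = split_means_by_year(hla, mortality)
print(series)
```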
[Figure 2A: state-by-state HLA heterozygosity values, roughly 0.31 to 0.35]
Figure 2A. State-Level HLA Heterozygosity.
Summary & Notes: This figure plots the value of HLA diversity for each state. Arizona has the lowest amount of diversity, and Maine has the highest level.
Figure 2B. Shaded HLA Heterozygosity.
Summary & Notes: Darker areas represent states with a greater amount of diversity within the HLA system.
[Figure 3: Effect of HLA Het. on Bacterial Mortality (y-axis, -20 to 10) against Year (x-axis, 1925 to 2000); series: Coefficient of ln HLA Heterozygosity, 95% confidence interval]
Figure 3. Yearly Association
Summary & Notes: This figure plots the coefficient of ln HLA diversity while regressing the natural log of bacterial mortality for each year between 1925 and 2000. As shown, a statistically significant negative relationship is observed for early years, but this relationship becomes insignificant for more contemporary periods.
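A year-by-year plot of this kind can be produced by running one cross-sectional regression per year and storing the slope with its confidence band. The Python sketch below is a simplified version: it uses a bivariate regression with conventional (homoskedastic) standard errors and a normal critical value of 1.96, all of which are assumptions, since the figure notes do not describe how the interval was computed. The data are hypothetical.

```python
import math


def slope_with_ci(x, y, z_crit=1.96):
    """Bivariate OLS slope of y on x with an approximate 95% interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # homoskedastic standard error
    return b, b - z_crit * se, b + z_crit * se


def yearly_coefficients(ln_hla, ln_mort_by_year):
    """ln_hla: list by state; ln_mort_by_year: dict year -> list in the same state order."""
    return {yr: slope_with_ci(ln_hla, y) for yr, y in sorted(ln_mort_by_year.items())}


# Hypothetical four-state data: a steep negative slope early, a flat one later.
ln_hla = [-1.10, -1.09, -1.08, -1.07]
noise = [0.01, -0.01, -0.01, 0.01]
ln_mort = {
    1930: [2.0 - 8.0 * h + e for h, e in zip(ln_hla, noise)],
    1990: [1.0 + 0.0 * h + e for h, e in zip(ln_hla, noise)],
}
coefs = yearly_coefficients(ln_hla, ln_mort)
for year, (b, lo, hi) in coefs.items():
    print(year, round(b, 3), round(lo, 3), round(hi, 3))
```

A year counts as statistically significant in this sketch when its interval excludes zero.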
Appendix: Matching IPUMS ancestry to HLA diversity from Cook (2015)
Census Ancestry    Ethnicity/Country from ALFRED and Cook (2015)    HLA heterozygosity    Country Code
Alsatian, Alsace-Lorraine .5 French, .5 Orcadian 0.346954 FRA
Andorran Italian 0.338867 ADO
Austrian Austria (country) 0.344784 AUT
Tirolean .5 French, .5 Orcadian 0.346954 DEU
Basque Basque 0.318691 ESP
French Basque Basque 0.318691 FRA
Belgian Belgium (country) 0.346954 BEL
Flemish .5 French, .5 Orcadian 0.346954 BEL
Walloon French 0.353319 BEL
British UK (country) 0.345724 GBR
British Isles UK (country) 0.345724 GBR
Channel Islander UK (country) 0.345724 GBR
Gibraltan Italian 0.338867 ITA
Cornish .5 French, .5 Orcadian 0.346954 GBR
Corsican .5 Italian, .5 French 0.346093 ITA
Cypriot Cyprus (country) 0.328028 CYP
Greek Cypriote Cyprus (country) 0.328028 CYP
Turkish Cypriote Cyprus (country) 0.328028 CYP
Danish Denmark (country) 0.340367 DNK
Dutch Netherlands (country) 0.346954 NLD
English .5 French, .5 Orcadian 0.346954 GBR
Faeroe Islander Orcadian 0.340588 GBR
Finnish Estonian 0.331338 FIN
Karelian Estonian 0.331338 FIN
French French 0.353319 FRA
Lorrainian French 0.353319 FRA
Breton .5 French, .5 Orcadian 0.346954 GBR
Frisian Orcadian 0.340588 DNK
Friulian Italian 0.338867 ITA
German Germany (country) 0.340131 DEU
Bavarian Orcadian 0.340588 DEU
Berliner Orcadian 0.340588 DEU
Hamburger Orcadian 0.340588 DEU
Hanoverian Orcadian 0.340588 DEU
Hessian Orcadian 0.340588 DEU
Lubecker Orcadian 0.340588 DEU
Pomeranian Russian 0.321653 DEU
Prussian Orcadian 0.340588 DEU
Saxon .5 French, .5 Orcadian 0.346954 DEU
Sudetenlander Orcadian 0.340588 DEU
Westphalian .5 French, .5 Orcadian 0.346954 DEU
Greek Greece (country) 0.33026 GRC
Cretan .5 Russian, .5 Italian 0.33026 GRC
Cycladic Islander .5 Russian, .5 Italian 0.33026 GRC
Icelander Iceland (country) 0.340588 ISL
Irish Ireland (country) 0.346954 IRL
Italian Italian 0.338867 ITA
Abruzzi Italian 0.338867 ITA
Apulian Italian 0.338867 ITA
Basilicata Italian 0.338867 ITA
Calabrian Italian 0.338867 ITA
Amalfin Italian 0.338867 ITA
Emilia Romagna Italian 0.338867 ITA
Rome Italian 0.338867 ITA
Ligurian Italian 0.338867 ITA
Lombardian Italian 0.338867 ITA
Marches French 0.353319 ITA
Molise Italian 0.338867 ITA
Piedmontese Italian 0.338867 ITA
Puglia Italian 0.338867 ITA
Sardinian Sardinian 0.320644 ITA
Sicilian Sardinian 0.320644 ITA
Tuscan Italian 0.338867 ITA
Trentino Italian 0.338867 ITA
Umbrian Italian 0.338867 ITA
Valle dAosta Italian 0.338867 ITA
Venetian Italian 0.338867 ITA
Lapp Estonian 0.331338 EST
Liechtensteiner Liechtenstein (country) 0.345662 LIE
Luxemburger Luxemburg (country) 0.340903 LUX
Maltese Italian 0.338867 ITA
Manx Orcadian 0.340588 IMY
Monegasque French 0.353319 FRA
Northern Irelander UK (country) 0.345724 GBR
Norwegian Orcadian 0.340588 NOR
Portuguese Portugal (country) 0.345939 PRT
Azorean .5 Italian, .5 French 0.346093 PRT
Madeiran .5 Italian, .5 French 0.346093 PRT
Scottish UK (country) 0.345724 GBR
Swedish Sweden (country) 0.340435 SWE
Aland Islander Orcadian 0.340588 FIN
Swiss Switzerland (country) 0.342733 CHE
Suisse Switzerland (country) 0.342733 CHE
Romansch Italian 0.338867 ROM
Suisse Romane Italian 0.338867 ROM
Welsh UK (country) 0.345724 GBR
Scandinavian, Nordic Orcadian 0.340588 SWE
Albanian Albania (country) 0.323064 ALB
Azerbaijani Uyghur 0.321912 AZE
Belourussian Belarus (country) 0.321653 BLR
Bulgarian Bulgaria (country) 0.318447 BGR
Carpathian .5 Russian, .5 Italian 0.33026 ROM
Cossack Russian 0.321653 RUS
Croatian Croatia (country) 0.337452 HRV
Czechoslovakian Czech Republic (country) 0.321794 CZE
Bohemian Russian 0.321653 RUS
Estonian Estonian 0.331338 EST
Livonian Estonian 0.331338 EST
Finno Ugrian Estonian 0.331338 FIN
Mordovian Russian 0.321653 RUS
Voytak Russian 0.321653 RUS
Georgian Georgia (country) 0.315343 GEO
Germans from Russia .5 French, .5 Orcadian 0.346954 DEU
Rom .5 Russian, .5 Italian 0.33026 ROM
Hungarian Estonian 0.331338 HUN
Magyar Estonian 0.331338 HUN
Latvian Latvia (country) 0.321653 LVA
Lithuanian Lithuania (country) 0.321653 LTU
Macedonian Macedonia (country) 0.329711 MKD
Ossetian Balochi 0.311038 IRN
Polish Poland (country) 0.321653 POL
Kashubian Russian 0.321653 RUS
Romanian Romania (country) 0.330442 ROM
Bessarabian Estonian 0.331338 MDA
Moldavian Moldova (country) 0.327403 MDA
Wallachian .5 Russian, .5 Italian 0.33026 ROM
Russian Russian 0.321653 RUS
Muscovite Russian 0.321653 RUS
Serbian Serbia (country) 0.330307 YUG
Slovak Slovakia (country) 0.322838 SVK
Slovene Slovenia (country) 0.330264 SVN
Sorb/Wend Russian 0.321653 RUS
Bashkir Yakut 0.315058 TUR
Chevash Yakut 0.315058 TUR
Yakut Yakut 0.315058 TUR
Tatar Yakut 0.315058 TUR
Uzbek Uyghur 0.321912 UZB
Ukrainian Ukraine (country) 0.321678 UKR
Yugoslavian Serbia (country) 0.330307 YUG
Slav Russian 0.321653 RUS
Central European, nec .5 Russian, .5 Italian 0.33026 ROM
Northern European, nec Orcadian 0.340588 DNK
Southern European, nec Italian 0.338867 ITA
Western European, nec French 0.353319 FRA
Eastern European, nec Russian 0.321653 RUS
European, nec .5 Italian, .5 French 0.346093 DEU
Spaniard Spain (country) 0.345652 ESP
Catalonian Spain (country) 0.345652 ESP
Balearic Islander Spain (country) 0.345652 ESP
Galician .5 Italian, .5 French 0.346093 ESP
Mexican Mexico (country) 0.292182 MEX
Mexican American Mexico (country) 0.292182 MEX
Chicano/Chicana Mexico (country) 0.292182 MEX
Nuevo Mexicano Mexico (country) 0.292182 MEX
Californio Mexico (country) 0.292182 MEX
Costa Rican Costa Rica (country) 0.341711 CRI
Guatemalan Guatemala (country) 0.273953 GTM
Honduran Honduras (country) 0.302716 HND
Nicaraguan Nicaragua (country) 0.310932 NIC
Panamanian Panama (country) 0.296404 PAN
Salvadoran El Salvador (country) 0.299874 SLV
Latin American Mexico (country) 0.292182 MEX
Argentinean Argentina (country) 0.337511 ARG
Bolivian Bolivia (country) 0.258775 BOL
Chilean Chile (country) 0.283092 CHL
Colombian Colombia (country) 0.306605 COL
Ecuadorian Ecuador (country) 0.276056 ECU
Paraguayan Paraguay (country) 0.28956 PRY
Peruvian Peru (country) 0.264896 PER
Uruguayan Uruguay (country) 0.34069 URY
Venezuelan Venezuela (country) 0.304496 VEN
South American Brazil (country) 0.324576 BRA
Puerto Rican Dominican Republic (country) 0.322436 PRI
Cuban Cuba (country) 0.331594 CUB
Dominican Dominican Republic (country) 0.322436 DOM
Hispanic Mexico (country) 0.292182 MEX
Spanish .5 Italian, .5 French 0.346093 ESP
Spanish American .5 Italian, .5 French 0.346093 ESP
Bahamian Bahamas (country) 0.322089 BHS
Barbadian Barbados (country) 0.319475 BRB
Belizean Belize (country) 0.311487 BLZ
Bermudan Bermuda (country) 0.329096 BMU
Cayman Islander Jamaica (country) 0.32057 CYM
Jamaican Jamaica (country) 0.32057 JAM
Dutch West Indies Jamaica (country) 0.32057 ANT
Aruba Islander Venezuela (country) 0.304496 ABW
St Maarten Islander Antigua (country) 0.318978 ANT
Trinidadian/Tobagonian Trinidad (country) 0.32142 TTO
Trinidadian Trinidad (country) 0.32142 TTO
Tobagonian Trinidad (country) 0.32142 TTO
U.S. Virgin Islander Antigua (country) 0.318978 VIR
British Virgin Islander Antigua (country) 0.318978 VIR
British West Indian Bermuda (country) 0.329096 BMU
Turks and Caicos Islander Bahamas (country) 0.322089 BHS
Anguilla Islander Antigua (country) 0.318978 ATG
Dominica Islander Dominica (country) 0.31651 DMA
Grenadian Grenada (country) 0.318636 GRD
St Lucia Islander St Kitts (country) 0.319547 LCA
French West Indies St Kitts (country) 0.319547 KNA
Guadeloupe Islander Dominica (country) 0.31651 DMA
Cayenne Dominica (country) 0.31651 DMA
West Indian Dominican Republic (country) 0.322436 DOM
Haitian Haiti (country) 0.319228 HTI
Brazilian Brazil (country) 0.324576 BRA
San Andres Jamaica (country) 0.32057 JAM
Guyanese/British Guiana Guyana (country) 0.313851 GUY
Providencia Jamaica (country) 0.32057 JAM
Surinam/Dutch Guiana Surinam (country) 0.320199 SUR
Algerian Algeria (country) 0.332228 DZA
Egyptian Egypt (country) 0.329321 EGY
Libyan . . LBY
Moroccan Morocco (country) 0.335216 MAR
Ifni Morocco (country) 0.335216 MAR
Tunisian Tunisia (country) 0.32949 TUN
North African Egypt (country) 0.329321 EGY
Alhucemas Morocco (country) 0.335216 MAR
Berber Mozabite 0.34386 MAR
Rio de Oro Morocco (country) 0.335216 MAR
Bahraini Bahrain (country) 0.327284 BHR
Iranian Iran (country) 0.314294 IRN
Iraqi Iraq (country) 0.325699 IRQ
Israeli . . ISR
Jordanian Jordan (country) 0.328972 JOR
TransJordan Jordan (country) 0.328972 JOR
Kuwaiti Kuwait (country) 0.32845 KWT
Lebanese Lebanon (country) 0.328391 LBN
Saudi Arabian Saudi Arabia (country) 0.329846 SAU
Syrian Syria (country) 0.327922 SYR
Armenian Armenia (country) 0.317006 ARM
Turkish Turkey (country) 0.315704 TUR
Yemeni Oman (country) 0.327634 OMN
Omani Oman (country) 0.327634 OMN
Muscat Oman (country) 0.327634 OMN
Trucial Oman Oman (country) 0.327634 OMN
Qatar Qatar (country) 0.324068 QAT
Bedouin Bedouin 0.334572 SAU
Kurdish Balochi 0.311038 IRN
Kuria Muria Islander Oman (country) 0.327634 OMN
Palestinian Palestinian 0.329321 JOR
Gazan Palestinian 0.329321 JOR
West Bank Palestinian 0.329321 JOR
South Yemeni Oman (country) 0.327634 YEM
Aden Oman (country) 0.327634 YEM
United Arab Emirates Saudi Arabia (country) 0.329846 ARE
Assyrian/Chaldean/Syriac Syria (country) 0.327922 SYR
Middle Eastern Saudi Arabia (country) 0.329846 SAU
Arab Palestinian 0.329321 SAU
Angolan Angola (country) 0.329822 AGO
Benin Benin (country) 0.311931 BEN
Botswana Botswana (country) 0.325655 BWA
Burundian Burundi (country) 0.329497 BDI
Cameroonian Cameroon (country) 0.317818 CMR
Cape Verdean Cape Verde (country) 0.328605 CPV
Chadian . . TCD
Congolese Congo (country) 0.329822 COG
Equatorial Guinea Equatorial Guinea (country) 0.329238 GNQ
Corsico Islander Equatorial Guinea (country) 0.329238 GNQ
Ethiopian . . ETH
Eritrean . . ERI
Gabonese Gabon (country) 0.329822 GAB
Gambian Gambia (country) 0.284383 GMB
Ghanian Ghana (country) 0.309695 GHA
Guinean Guinea (country) 0.284383 GIN
Guinea Bissau Guinea Bissau (country) 0.284383 GNB
Ivory Coast Ivory Coast (country) 0.305893 CIV
Kenyan . . KEN
Lesotho Lesotho (country) 0.329822 LSO
Liberian Liberia (country) 0.296039 LBR
Madagascan Madagascar (country) 0.300658 MDG
Malian Mali (country) 0.295182 MLI
Namibian Namibia (country) 0.320899 NAM
Niger Niger (country) 0.30862 NER
Nigerian Nigeria (country) 0.311891 NGA
Fulani Mandenka 0.284383 GIN
Hausa Mandenka 0.284383 NGA
Ibo Yoruba 0.314909 NGA
Tiv Yoruba 0.314909 NGA
Rwandan Rwanda (country) 0.329505 RWA
Senegalese Senegal (country) 0.284383 SEN
Sierra Leonean Sierra Leone (country) 0.284383 SLE
Somalian . . SOM
Swaziland Swaziland (country) 0.33031 SWZ
South African South Africa (country) 0.329926 ZAF
Union of South Africa South Africa (country) 0.329926 ZAF
Afrikaner Orcadian 0.340588 NLD
Zulu Bantu 0.329822 ZAF
Sudanese . . SDN
Fur . . SDN
Tanzanian Tanzania (country) 0.326463 TZA
Togo Togo (country) 0.303878 TGO
Ugandan Uganda (country) 0.329822 UGA
Zairian DRC (country) 0.329822 ZAR
Zambian Zambia (country) 0.329822 ZMB
Zimbabwean Zimbabwe (country) 0.329572 ZWE
African Islands Bantu 0.329822 MOZ
Central African . . .
East African . . .
West African Yoruba 0.314909 NGA
African Black 0.318553 AA
Afghan Afghanistan (country) 0.329023 AFG
Baluchi Balochi 0.311038 PAK
Pathan Pashtun 0.330319 PAK
Bengali Bangladesh (country) 0.323009 BGD
Bhutanese Bhutan (country) 0.289036 BTN
Nepali Nepal (country) 0.241703 NPL
Asian Indian India (country) 0.320623 IND
Andaman Islander India (country) 0.320623 IND
Andhra Pradesh India (country) 0.320623 IND
Assamese India (country) 0.320623 IND
Goanese India (country) 0.320623 IND
Gujarati India (country) 0.320623 IND
Karnatakan India (country) 0.320623 IND
Keralan India (country) 0.320623 IND
Maharashtran India (country) 0.320623 IND
Madrasi India (country) 0.320623 IND
Mysore India (country) 0.320623 IND
Naga India (country) 0.320623 IND
Pondicherry India (country) 0.320623 IND
Punjabi Pakistan (country) 0.324431 PAK
Tamil Brahui 0.313751 LKA
Pakistani Pakistan (country) 0.324431 PAK
Sri Lankan Sri Lanka (country) 0.320953 LKA
Singhalese Sri Lanka (country) 0.320953 LKA
Veddah Sri Lanka (country) 0.320953 LKA
Maldivian Sri Lanka (country) 0.320953 LKA
Burmese Myanmar (country) 0.302644 MMR
Shan Myanmar (country) 0.302644 MMR
Cambodian Cambodia (country) 0.299148 KHM
Khmer Cambodian 0.298463 KHM
Chinese China (country) 0.320225 CHN
Cantonese China (country) 0.320225 CHN
Manchurian China (country) 0.320225 CHN
Mongolian Mongolia (country) 0.315546 MNG
Tibetan China (country) 0.320225 CHN
Hong Kong China (country) 0.320225 CHN
Macao China (country) 0.320225 CHN
Filipino Philippines (country) 0.298659 PHL
Indonesian Indonesia (country) 0.298463 IDN
Japanese Japan (country) 0.313769 JPN
Ryukyu Islander Japan (country) 0.313769 JPN
Okinawan Japan (country) 0.313769 JPN
Korean South Korea (country) 0.329926 KOR
Laotian Laos (country) 0.271095 LAO
Meo Miao 0.321474 CHN
Hmong She 0.275142 CHN
Malaysian Malaysia (country) 0.306422 MYS
Singaporean Singapore (country) 0.317352 SIN
Thai Thailand (country) 0.278977 THA
Black Thai Thailand (country) 0.278977 THA
Western Lao Laos (country) 0.271095 LAO
Taiwanese China (country) 0.320225 CHN
Vietnamese Vietnam (country) 0.298078 VNM
Katu Cambodian 0.298463 KHM
Mnong She 0.275142 CHN
Montagnard Vietnam (country) 0.298078 VNM
Indochinese Vietnam (country) 0.298078 VNM
Eurasian . . .
Asian . . .
Australian Australia (country) 0.343894 AUS
Tasmanian . . .
New Zealander New Zealand (country) 0.327982 NZL
Polynesian .5 Cambodian, .5 Melanesian 0.254042 WSM
Maori .5 Cambodian, .5 Melanesian 0.254042 WSM
Hawaiian .5 Cambodian, .5 Melanesian 0.254042 WSM
Part Hawaiian .5 Cambodian, .5 Melanesian 0.254042 WSM
Samoan .5 Cambodian, .5 Melanesian 0.254042 WSM
Tongan Tonga (country) 0.256664 TON
Tokelauan .5 Cambodian, .5 Melanesian 0.254042 TON
Cook Islander .5 Cambodian, .5 Melanesian 0.254042 TON
Tahitian .5 Cambodian, .5 Melanesian 0.254042 TON
Niuean .5 Cambodian, .5 Melanesian 0.254042 TON
Micronesian Micronesia (country) 0.210421 FSM
Guamanian .5 Cambodian, .5 Melanesian 0.254042 MNP
Chamorro Islander .5 Cambodian, .5 Melanesian 0.254042 GUM
Saipanese .5 Cambodian, .5 Melanesian 0.254042 MNP
Palauan Palau (country) 0.272599 PLW
Marshall Islander Marshall Islands (country) 0.210078 MHL
Kosraean Nauru (country) 0.241703 MHL
Chuukese Nauru (country) 0.241703 MHL
Yap Islander Nauru (country) 0.241703 MHL
Caroline Islander Nauru (country) 0.241703 MHL
Kiribatese .5 Cambodian, .5 Melanesian 0.254042 KIR
Nauruan Nauru (country) 0.241703 MHL
Melanesian Islander Melanesian 0.209622 SLB
Fijian Fiji (country) 0.260992 FJI
New Guinean PNG (country) 0.234712 PNG
Papuan PNG (country) 0.234712 PNG
Solomon Islander SI (country) 0.211849 SLB
Vanuatuan Vanuatu (country) 0.211002 VUT
Pacific Islander .5 Cambodian, .5 Melanesian 0.254042 OC
Oceania .5 Cambodian, .5 Melanesian 0.254042 OC
Afro-American Black 0.318553 AA
American Indian (all tribes) Pima 0.246333 AI
Aleut Pima 0.246333 AI
Eskimo Pima 0.246333 AI
White/Caucasian .5 Italian, .5 French 0.346093 WH
Greenlander . . .
Canadian Canada (country) 0.350252 CAN
Newfoundland Canada (country) 0.350252 CAN
Nova Scotian Canada (country) 0.350252 CAN
French Canadian French 0.353319 FRA
Acadian French 0.353319 FRA
American USA (country) 0.335237 USA
United States USA (country) 0.335237 USA
North American USA (country) 0.335237 USA
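The rows matched to two reference populations (for example ".5 French, .5 Orcadian") carry heterozygosity values that are consistent with a simple weighted average of the single-population values listed in the same table. The Python sketch below checks that arithmetic for three such rows; the function name and the dictionary layout are illustrative, and the weighting rule is inferred from the table values rather than stated in the appendix.

```python
# Single-population HLA heterozygosity values copied from the appendix table.
base = {
    "French": 0.353319,
    "Orcadian": 0.340588,
    "Italian": 0.338867,
    "Russian": 0.321653,
}


def blended_heterozygosity(weights, base_values):
    """Weighted average of reference-population heterozygosity values.

    weights: dict population -> weight; weights must sum to one.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * base_values[name] for name, w in weights.items())


english = blended_heterozygosity({"French": 0.5, "Orcadian": 0.5}, base)
corsican = blended_heterozygosity({"Italian": 0.5, "French": 0.5}, base)
cretan = blended_heterozygosity({"Russian": 0.5, "Italian": 0.5}, base)

# Table entries for comparison: English 0.346954, Corsican 0.346093, Cretan 0.33026.
print(f"{english:.7f} {corsican:.7f} {cretan:.7f}")
```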