decomposing culture: an analysis of gender,...

51
Decomposing culture: An analysis of gender, language, and labor supply in the household Victor Gay * Daniel L. Hicks Estefania Santacreu-Vasut Amir Shoham § Abstract Despite broad progress in closing many dimensions of the gender gap around the globe, recent research has shown that traditional gender roles can still exert a large influence on female labor force participation, even in developed economies. This paper empirically analyzes the role of culture in determining the labor market engagement of women within the context of collective models of household decision making. In particular, we use the epidemiological approach to study the relationship between gender in language and labor market participation among married female immigrants to the U.S. We show that the presence of gender in language can act as a marker for culturally acquired gender roles and that these roles are important determinants of household labor allocations. Female immigrants who speak a language with sex-based grammatical rules exhibit lower labor force participation, hours worked, and weeks worked. Our strategy of isolating one component of culture allows us to shed light on several important mechanisms influencing women’s economic engagement, including the role of bargaining power in the household and the impact of ethnic enclaves. Our results further suggest that language may influence behavior in both of these contexts. Keywords Language, Gender gap, Labor force participation, Immigrants JEL classification F22, J16, Z13 * Department of Economics, The University of Chicago, 5757 S. University Ave, Chicago, IL, 60637, U.S.A. Department of Economics, University of Oklahoma, 308 Cate Center Drive, Norman, OK, 73019, U.S.A. Corresponding Author: Department of Economics, ESSEC Business School and THEMA, 3 Avenue Bernard Hirsch, 95021 Cergy-Pontoise, France ([email protected]) § Temple University and COMAS, 1801 N. Broad St, Philadelphia, PA, 19122, U.S.A. 1

Upload: others

Post on 15-Mar-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Decomposing culture An analysis of gender language and

labor supply in the household

Victor Gaylowast Daniel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

Abstract

Despite broad progress in closing many dimensions of the gender gap around the globe recent research has shownthat traditional gender roles can still exert a large influence on female labor force participation even in developedeconomies This paper empirically analyzes the role of culture in determining the labor market engagement of womenwithin the context of collective models of household decision making In particular we use the epidemiologicalapproach to study the relationship between gender in language and labor market participation among married femaleimmigrants to the US We show that the presence of gender in language can act as a marker for culturally acquiredgender roles and that these roles are important determinants of household labor allocations Female immigrants whospeak a language with sex-based grammatical rules exhibit lower labor force participation hours worked and weeksworked Our strategy of isolating one component of culture allows us to shed light on several important mechanismsinfluencing womenrsquos economic engagement including the role of bargaining power in the household and the impactof ethnic enclaves Our results further suggest that language may influence behavior in both of these contexts

Keywords Language Gender gap Labor force participation Immigrants

JEL classification F22 J16 Z13

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

1 Introduction

The labor force participation rate of married women has increased spectacularly during the second half of the 20th

century both in the US and around the world (Goldin 2014) Nevertheless wide differences across countries remain

with the gender gap closing faster in some countries than others and a number of developing nations such as India

lagging behind (Field et al 2016) Among the factors used to explain these differences the role of cultural norms

has been shown to play a crucial role (Fernaacutendez 2007 Fernaacutendez amp Fogli 2009 Alesina et al 2013 Farreacute amp Vella

2013 Fernaacutendez 2013) In particular scholars have begun employing the epidemiological approach to distinguish the

impact of cultural factors from that of other institutional forces (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011)1

When analyzing married couples both individual and spousal culture are thought to play an important role in

explaining labor market decisions Early collective models of labor supply emphasize the importance of taking into

account the distribution of bargaining power within the household (Chiappori et al 2002 Blundell et al 2007) Re-

cently however these models have been extended to investigate whether their main predictions depend not only on

bargaining power but also on the surrounding cultural context (Oreffice 2014) as well as on social and institutional

constraints related to traditional gender roles (Field et al 2016) These norms may influence women to allocate more

labor to household production relative to the formal market at either the intensive or extensive margin

This paper studies the impact of gender roles on the labor force participation of married female immigrants in

the US within a collective labor supply framework We explore culturally acquired gender roles as those embod-

ied by the presence of grammatical gender in language spoken Recent work has documented correlations between

linguistic structure and individual behavior (Lupyan amp Dale 2010 Chen 2013 Ladd et al 2015) with the presence

and intensity of gender in grammatical structures correlating with gender gaps in compensation and promotions divi-

sion of household labor educational attainment and political empowerment (Givati amp Troiano 2012 Santacreu-Vasut

et al 2013 Santacreu-Vasut et al 2014 Hicks et al 2015 Mavisakalyan 2015 van der Velde et al 2015 Davis amp

Reynolds 2016)

The mechanisms underlying these associations are largely unresolved Perhaps the most compelling question

whether language may causally influence behavior remains a subject of debate2 Linguistic correlations could reflect

historically acquired cultural norms of behavior that became codified in language or language itself could represent an

institutional force influencing and perpetuating a set of behaviors

From a methodological point of view because language is observed at the individual level it is possible to isolate

its effect from the influence of aggregate correlated historical cultural and biological forces This is an exercise

we undertake in this analysis following the epidemiological approach Furthermore because language acts a cultural

marker for gender roles we show that this barometer of norms allows us to examine labor market predictions of the

collective household model in the presence or absence of these cultural influences This yields new insights regarding

their relative influence

1Gay et al (2016) provides an in-depth discussion of the epidemiological approach in this context2Theoretically language structures could influence preference formation information processing categorization of social reality and the salienceof certain categories of words These impacts could alter a speakerrsquos decision making and behavior (Mavisakalyan amp Weber 2016)

2

To do this we examine labor market outcomes for nearly half a million married female immigrants to the US aged

25 to 49 in the American Community Surveys (ACS) between 2005 and 2015 In our analysis we focus on married

foreign born individuals who speak a language other than English in the home With the assistance of linguists and

drawing on information contained in the World Atlas of Linguistic Structures database (WALS) we assign to these

speakers consistent measures of the presence and intensity of gender in the grammatical structure of their language

There are several novel empirical advances to the approach taken in this paper First this analysis relies on an

incredibly detailed set of microeconomic data which provides a rich set of covariates and substantial variation to

control for many potential confounding factors These include many of the individual spousal and household level

variables analyzed in the past as well as measures which may influence the decision to migrate in the country of

origin and measures of the location to which the immigrant has moved Moreover the unique sample also allows us to

include country fixed effects to capture a wide array of unobservable cultural forces A second advantage is reliance

on the epidemiological approach in this setting which helps ameliorate a fundamental problem of identification when

examining the impact of language on behavior

The takeaways from this exercise are several First we show that married female immigrants speaking a language

with sex-based distinctions in its grammar are less likely to participate in the labor market This is true even after con-

trolling for observable characteristics such as traditional household measures husband characteristics and bargaining

power measures as well as when controlling for a vast set of unobservable cultural forces through country of origin

fixed effects3 When we empirically decompose the relationship between gender in language and gendered behavior

in this manner it suggests that roughly two thirds of this relationship can be explained by correlated cultural factors

with about one third potentially explained by language having a causal impact

Second using language as a measure of culturally acquired gender roles allows us to speak to and test the role

of bargaining power within the household relative to the impact of cultural norms We demonstrate that both are

important and that their impacts are distinct from one another

Third focusing on language spoken allows to also study the behavior of female immigrants in both linguistically

homogeneous and heterogeneous couples We exploit linguistic heterogeneity within the household to show that the

presence of gender in language has an association with a wifersquos behavior both when husbands and wives speak a

gendered language and when they speak languages with different structures Our findings suggest that while there

is an independent effect on a wifersquos behavior when she alone has a gender marked language gender marking in the

husbandrsquos language enhances this effect

Finally using language as a measure of gender roles but recognizing it as a network technology allows us to exam-

ine the role that ethnic enclaves play in influencing female labor force participation In theory enclaves may improve

labor market outcomes by providing information about formal jobs and reducing social stigma on employment At the

same time enclaves are likely to reinforce immigrant language usage and thus may enhance the impact of gender in

language or provide isolation from US norms We present evidence implying that the latter effect is present which

3One of the challenges of the literature on culture and economic behavior has been to measure culture An advantage of using language overexisting alternatives is that language is defined at the individual level and is not an outcome measure It allows us to control for country of origincharacteristics including time varying ones

3

suggests that the impact of language is stronger when it is shared with the surrounding community4

These findings speak to several literatures Mavisakalyan amp Weber (2016) review of the nascent field of linguistic

relativity and economics and point out that the mechanisms behind the associations between language and economic

behavior remain largely unresolved Despite the current studyrsquos ability to account for both time-invariant and time-

variant country of origin factors in a more rigorous manner than previous analyses the correlation between language

and behavior remains significant and negative This means that while we can demonstrate that most of this association

can be explained by language as a cultural market for correlated gender norms we cannot formally disprove that the

behavioral channel may explain some of the remaining correlation (Roberts amp Winters 2013 Hicks et al 2015 Roberts

et al 2015)5

Our research also directly contributes to literature that investigates whether the impact of intra-household bargain-

ing power on the labor supply depends on culture In particular our results imply that the standard prediction that

the spouse with higher bargaining power will substitute labor for leisure due to an income effect as in Oreffice (2014)

applies only to native born couples in the US and to immigrant couples with gender roles similar to the US Interest-

ingly our results confirm that in couples coming from countries classified as exhibiting strong traditional gender roles

the influence of bargaining power on household collective labor supply may be culturally dependent and that some of

these standard predictions may not hold for these groups

While Oreffice (2014) focuses on the intensive labor supply among couples where both spouses are working we

focus mainly on the extensive margin and how bargaining power influences female labor force participation Field

et al (2016) study the extensive margin of the labor supply among married female in India and model the impact

of bargaining power distribution within the household with social constraints related to traditional gender norms

Our findings complement these studies by empirically showing that the married female immigrants with stronger

bargaining power are more likely to work not less and that the impact of bargaining power is reinforced when

traditional gender roles are embodied in the language spoken

This paper is structured as follows Section 2 presents summary statistics for both the ACS and linguistic data

and details the empirical strategy Section 3 presents the empirical results - decomposing the impact of language from

other cultural influences presenting a wide range of robustness checks and then using gender in language as a cultural

marker to study labor supply within the household Section 4 concludes

2 Methodology

21 Data

211 Economic and demographic data

We combine data from several sources in our analysis First we obtain detailed demographic and economic data for

the immigrant population to the US from the American Community Surveys (ACS) 1 samples from 2005 to 2015

4This result is not solely attributable to selection since it holds for women married prior to migrationmdashsometimes referred to as ldquotied womenrdquo5See Everett (2013) for a review of the linguistics literature

4

(Ruggles et al 2015)

Table 1 Summary StatisticsFemale Immigrants Married Spouse Present Aged 25-49

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 377 66 25 49 480618 -10Years since immigration 148 93 0 50 480618 -00Age at immigration 228 89 0 49 480618 -10

Educational AttainmentCurrent student 006 024 0 1 480618 -002Years of schooling 124 37 0 17 480618 -13No schooling 005 021 0 1 480618 000Elementary 012 032 0 1 480618 010High school 037 048 0 1 480618 010College 046 050 0 1 480618 -019

Race and ethnicityAsian 029 046 0 1 480618 -065Black 002 014 0 1 480618 001White 047 050 0 1 480618 041Other 021 041 0 1 480618 024Hispanic 053 050 0 1 480618 063

Ability to speak EnglishNot at all 011 032 0 1 480618 008Not well 024 043 0 1 480618 002Well 025 043 0 1 480618 -008Very well 040 049 0 1 480618 -002

Labor market outcomesLabor participant 060 049 0 1 480618 -010Employed 055 050 0 1 480618 -012Yearly weeks worked (excl 0) 4416 132 7 51 302653 -16Yearly weeks worked (incl 0) 2718 239 0 51 480618 -58Weekly hours worked (excl 0) 3657 114 1 99 302653 -18Weekly hours worked (incl 0) 2251 199 0 99 480618 -51Labor income (thds) 253 285 00 5365 282057 -88

B Household characteristics

Years since married 124 75 0 39 352059 014Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 038Household size 426 162 2 20 480618 039Household income (thds) 610 596 -310 15303 480618 -208

Table 1 notes The summary statistics are computed using sample weights (PERWT) provided in theACS (Ruggles et al 2015) except those for household characteristics which are computed usinghousehold weights (HHWT) The last column reports the estimate β from regressions of the typeXi = α +β SBI+ ε where Xi is an individual level characteristic Robust standard errors are notreported See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

We restrict the analysis to female respondents aged 25 to 49 born abroad from non-American parents living in

married-couple family households in which the husband is present and who report speaking a language other than

English in the home Moreover we only keep respondents that report uniquely identifiable countries of birth and

5

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

1 Introduction

The labor force participation rate of married women has increased spectacularly during the second half of the 20th

century both in the US and around the world (Goldin 2014) Nevertheless wide differences across countries remain

with the gender gap closing faster in some countries than others and a number of developing nations such as India

lagging behind (Field et al 2016) Among the factors used to explain these differences the role of cultural norms

has been shown to play a crucial role (Fernaacutendez 2007 Fernaacutendez amp Fogli 2009 Alesina et al 2013 Farreacute amp Vella

2013 Fernaacutendez 2013) In particular scholars have begun employing the epidemiological approach to distinguish the

impact of cultural factors from that of other institutional forces (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011)1

When analyzing married couples both individual and spousal culture are thought to play an important role in

explaining labor market decisions Early collective models of labor supply emphasize the importance of taking into

account the distribution of bargaining power within the household (Chiappori et al 2002 Blundell et al 2007) Re-

cently however these models have been extended to investigate whether their main predictions depend not only on

bargaining power but also on the surrounding cultural context (Oreffice 2014) as well as on social and institutional

constraints related to traditional gender roles (Field et al 2016) These norms may influence women to allocate more

labor to household production relative to the formal market at either the intensive or extensive margin

This paper studies the impact of gender roles on the labor force participation of married female immigrants in

the US within a collective labor supply framework We explore culturally acquired gender roles as those embod-

ied by the presence of grammatical gender in language spoken Recent work has documented correlations between

linguistic structure and individual behavior (Lupyan amp Dale 2010 Chen 2013 Ladd et al 2015) with the presence

and intensity of gender in grammatical structures correlating with gender gaps in compensation and promotions divi-

sion of household labor educational attainment and political empowerment (Givati amp Troiano 2012 Santacreu-Vasut

et al 2013 Santacreu-Vasut et al 2014 Hicks et al 2015 Mavisakalyan 2015 van der Velde et al 2015 Davis amp

Reynolds 2016)

The mechanisms underlying these associations are largely unresolved Perhaps the most compelling question

whether language may causally influence behavior remains a subject of debate2 Linguistic correlations could reflect

historically acquired cultural norms of behavior that became codified in language or language itself could represent an

institutional force influencing and perpetuating a set of behaviors

From a methodological point of view because language is observed at the individual level it is possible to isolate

its effect from the influence of aggregate correlated historical cultural and biological forces This is an exercise

we undertake in this analysis following the epidemiological approach Furthermore because language acts a cultural

marker for gender roles we show that this barometer of norms allows us to examine labor market predictions of the

collective household model in the presence or absence of these cultural influences This yields new insights regarding

their relative influence

1Gay et al (2016) provides an in-depth discussion of the epidemiological approach in this context2Theoretically language structures could influence preference formation information processing categorization of social reality and the salienceof certain categories of words These impacts could alter a speakerrsquos decision making and behavior (Mavisakalyan amp Weber 2016)

2

To do this we examine labor market outcomes for nearly half a million married female immigrants to the US aged

25 to 49 in the American Community Surveys (ACS) between 2005 and 2015 In our analysis we focus on married

foreign born individuals who speak a language other than English in the home With the assistance of linguists and

drawing on information contained in the World Atlas of Linguistic Structures database (WALS) we assign to these

speakers consistent measures of the presence and intensity of gender in the grammatical structure of their language

There are several novel empirical advances to the approach taken in this paper First this analysis relies on an

incredibly detailed set of microeconomic data which provides a rich set of covariates and substantial variation to

control for many potential confounding factors These include many of the individual spousal and household level

variables analyzed in the past as well as measures which may influence the decision to migrate in the country of

origin and measures of the location to which the immigrant has moved Moreover the unique sample also allows us to

include country fixed effects to capture a wide array of unobservable cultural forces A second advantage is reliance

on the epidemiological approach in this setting which helps ameliorate a fundamental problem of identification when

examining the impact of language on behavior

The takeaways from this exercise are several First we show that married female immigrants speaking a language

with sex-based distinctions in its grammar are less likely to participate in the labor market This is true even after con-

trolling for observable characteristics such as traditional household measures husband characteristics and bargaining

power measures as well as when controlling for a vast set of unobservable cultural forces through country of origin

fixed effects3 When we empirically decompose the relationship between gender in language and gendered behavior

in this manner it suggests that roughly two thirds of this relationship can be explained by correlated cultural factors

with about one third potentially explained by language having a causal impact

Second using language as a measure of culturally acquired gender roles allows us to speak to and test the role

of bargaining power within the household relative to the impact of cultural norms We demonstrate that both are

important and that their impacts are distinct from one another

Third focusing on language spoken allows to also study the behavior of female immigrants in both linguistically

homogeneous and heterogeneous couples We exploit linguistic heterogeneity within the household to show that the

presence of gender in language has an association with a wifersquos behavior both when husbands and wives speak a

gendered language and when they speak languages with different structures Our findings suggest that while there

is an independent effect on a wifersquos behavior when she alone has a gender marked language gender marking in the

husbandrsquos language enhances this effect

Finally using language as a measure of gender roles but recognizing it as a network technology allows us to exam-

ine the role that ethnic enclaves play in influencing female labor force participation In theory enclaves may improve

labor market outcomes by providing information about formal jobs and reducing social stigma on employment At the

same time enclaves are likely to reinforce immigrant language usage and thus may enhance the impact of gender in

language or provide isolation from US norms We present evidence implying that the latter effect is present which

3One of the challenges of the literature on culture and economic behavior has been to measure culture An advantage of using language overexisting alternatives is that language is defined at the individual level and is not an outcome measure It allows us to control for country of origincharacteristics including time varying ones

3

suggests that the impact of language is stronger when it is shared with the surrounding community4

These findings speak to several literatures Mavisakalyan amp Weber (2016) review of the nascent field of linguistic

relativity and economics and point out that the mechanisms behind the associations between language and economic

behavior remain largely unresolved Despite the current studyrsquos ability to account for both time-invariant and time-

variant country of origin factors in a more rigorous manner than previous analyses the correlation between language

and behavior remains significant and negative This means that while we can demonstrate that most of this association

can be explained by language as a cultural market for correlated gender norms we cannot formally disprove that the

behavioral channel may explain some of the remaining correlation (Roberts amp Winters 2013 Hicks et al 2015 Roberts

et al 2015)5

Our research also directly contributes to literature that investigates whether the impact of intra-household bargain-

ing power on the labor supply depends on culture In particular our results imply that the standard prediction that

the spouse with higher bargaining power will substitute labor for leisure due to an income effect as in Oreffice (2014)

applies only to native born couples in the US and to immigrant couples with gender roles similar to the US Interest-

ingly our results confirm that in couples coming from countries classified as exhibiting strong traditional gender roles

the influence of bargaining power on household collective labor supply may be culturally dependent and that some of

these standard predictions may not hold for these groups

While Oreffice (2014) focuses on the intensive labor supply among couples where both spouses are working we

focus mainly on the extensive margin and how bargaining power influences female labor force participation Field

et al (2016) study the extensive margin of the labor supply among married female in India and model the impact

of bargaining power distribution within the household with social constraints related to traditional gender norms

Our findings complement these studies by empirically showing that the married female immigrants with stronger

bargaining power are more likely to work not less and that the impact of bargaining power is reinforced when

traditional gender roles are embodied in the language spoken

This paper is structured as follows Section 2 presents summary statistics for both the ACS and linguistic data

and details the empirical strategy Section 3 presents the empirical results - decomposing the impact of language from

other cultural influences presenting a wide range of robustness checks and then using gender in language as a cultural

marker to study labor supply within the household Section 4 concludes

2 Methodology

21 Data

211 Economic and demographic data

We combine data from several sources in our analysis First we obtain detailed demographic and economic data for

the immigrant population to the US from the American Community Surveys (ACS) 1 samples from 2005 to 2015

4This result is not solely attributable to selection since it holds for women married prior to migrationmdashsometimes referred to as ldquotied womenrdquo5See Everett (2013) for a review of the linguistics literature

4

(Ruggles et al 2015)

Table 1 Summary StatisticsFemale Immigrants Married Spouse Present Aged 25-49

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 377 66 25 49 480618 -10Years since immigration 148 93 0 50 480618 -00Age at immigration 228 89 0 49 480618 -10

Educational AttainmentCurrent student 006 024 0 1 480618 -002Years of schooling 124 37 0 17 480618 -13No schooling 005 021 0 1 480618 000Elementary 012 032 0 1 480618 010High school 037 048 0 1 480618 010College 046 050 0 1 480618 -019

Race and ethnicityAsian 029 046 0 1 480618 -065Black 002 014 0 1 480618 001White 047 050 0 1 480618 041Other 021 041 0 1 480618 024Hispanic 053 050 0 1 480618 063

Ability to speak EnglishNot at all 011 032 0 1 480618 008Not well 024 043 0 1 480618 002Well 025 043 0 1 480618 -008Very well 040 049 0 1 480618 -002

Labor market outcomesLabor participant 060 049 0 1 480618 -010Employed 055 050 0 1 480618 -012Yearly weeks worked (excl 0) 4416 132 7 51 302653 -16Yearly weeks worked (incl 0) 2718 239 0 51 480618 -58Weekly hours worked (excl 0) 3657 114 1 99 302653 -18Weekly hours worked (incl 0) 2251 199 0 99 480618 -51Labor income (thds) 253 285 00 5365 282057 -88

B Household characteristics

Years since married 124 75 0 39 352059 014Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 038Household size 426 162 2 20 480618 039Household income (thds) 610 596 -310 15303 480618 -208

Table 1 notes The summary statistics are computed using sample weights (PERWT) provided in theACS (Ruggles et al 2015) except those for household characteristics which are computed usinghousehold weights (HHWT) The last column reports the estimate β from regressions of the typeXi = α +β SBI+ ε where Xi is an individual level characteristic Robust standard errors are notreported See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

We restrict the analysis to female respondents aged 25 to 49 born abroad from non-American parents living in

married-couple family households in which the husband is present and who report speaking a language other than

English in the home Moreover we only keep respondents that report uniquely identifiable countries of birth and

5

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

To do this we examine labor market outcomes for nearly half a million married female immigrants to the US aged

25 to 49 in the American Community Surveys (ACS) between 2005 and 2015 In our analysis we focus on married

foreign born individuals who speak a language other than English in the home With the assistance of linguists and

drawing on information contained in the World Atlas of Linguistic Structures database (WALS) we assign to these

speakers consistent measures of the presence and intensity of gender in the grammatical structure of their language

There are several novel empirical advances to the approach taken in this paper First this analysis relies on an

incredibly detailed set of microeconomic data which provides a rich set of covariates and substantial variation to

control for many potential confounding factors These include many of the individual spousal and household level

variables analyzed in the past as well as measures which may influence the decision to migrate in the country of

origin and measures of the location to which the immigrant has moved Moreover the unique sample also allows us to

include country fixed effects to capture a wide array of unobservable cultural forces A second advantage is reliance

on the epidemiological approach in this setting which helps ameliorate a fundamental problem of identification when

examining the impact of language on behavior

The takeaways from this exercise are several First we show that married female immigrants speaking a language

with sex-based distinctions in its grammar are less likely to participate in the labor market This is true even after con-

trolling for observable characteristics such as traditional household measures husband characteristics and bargaining

power measures as well as when controlling for a vast set of unobservable cultural forces through country of origin

fixed effects3 When we empirically decompose the relationship between gender in language and gendered behavior

in this manner it suggests that roughly two thirds of this relationship can be explained by correlated cultural factors

with about one third potentially explained by language having a causal impact

Second using language as a measure of culturally acquired gender roles allows us to speak to and test the role

of bargaining power within the household relative to the impact of cultural norms We demonstrate that both are

important and that their impacts are distinct from one another

Third focusing on language spoken allows to also study the behavior of female immigrants in both linguistically

homogeneous and heterogeneous couples We exploit linguistic heterogeneity within the household to show that the

presence of gender in language has an association with a wifersquos behavior both when husbands and wives speak a

gendered language and when they speak languages with different structures Our findings suggest that while there

is an independent effect on a wifersquos behavior when she alone has a gender marked language gender marking in the

husbandrsquos language enhances this effect

Finally using language as a measure of gender roles but recognizing it as a network technology allows us to exam-

ine the role that ethnic enclaves play in influencing female labor force participation In theory enclaves may improve

labor market outcomes by providing information about formal jobs and reducing social stigma on employment At the

same time enclaves are likely to reinforce immigrant language usage and thus may enhance the impact of gender in

language or provide isolation from US norms We present evidence implying that the latter effect is present which

3One of the challenges of the literature on culture and economic behavior has been to measure culture An advantage of using language overexisting alternatives is that language is defined at the individual level and is not an outcome measure It allows us to control for country of origincharacteristics including time varying ones

3

suggests that the impact of language is stronger when it is shared with the surrounding community4

These findings speak to several literatures Mavisakalyan amp Weber (2016) review of the nascent field of linguistic

relativity and economics and point out that the mechanisms behind the associations between language and economic

behavior remain largely unresolved Despite the current studyrsquos ability to account for both time-invariant and time-

variant country of origin factors in a more rigorous manner than previous analyses the correlation between language

and behavior remains significant and negative This means that while we can demonstrate that most of this association

can be explained by language as a cultural market for correlated gender norms we cannot formally disprove that the

behavioral channel may explain some of the remaining correlation (Roberts amp Winters 2013 Hicks et al 2015 Roberts

et al 2015)5

Our research also directly contributes to literature that investigates whether the impact of intra-household bargain-

ing power on the labor supply depends on culture In particular our results imply that the standard prediction that

the spouse with higher bargaining power will substitute labor for leisure due to an income effect as in Oreffice (2014)

applies only to native born couples in the US and to immigrant couples with gender roles similar to the US Interest-

ingly our results confirm that in couples coming from countries classified as exhibiting strong traditional gender roles

the influence of bargaining power on household collective labor supply may be culturally dependent and that some of

these standard predictions may not hold for these groups

While Oreffice (2014) focuses on the intensive labor supply among couples where both spouses are working we

focus mainly on the extensive margin and how bargaining power influences female labor force participation Field

et al (2016) study the extensive margin of the labor supply among married female in India and model the impact

of bargaining power distribution within the household with social constraints related to traditional gender norms

Our findings complement these studies by empirically showing that the married female immigrants with stronger

bargaining power are more likely to work not less and that the impact of bargaining power is reinforced when

traditional gender roles are embodied in the language spoken

This paper is structured as follows Section 2 presents summary statistics for both the ACS and linguistic data

and details the empirical strategy Section 3 presents the empirical results - decomposing the impact of language from

other cultural influences presenting a wide range of robustness checks and then using gender in language as a cultural

marker to study labor supply within the household Section 4 concludes

2 Methodology

21 Data

211 Economic and demographic data

We combine data from several sources in our analysis First we obtain detailed demographic and economic data for

the immigrant population to the US from the American Community Surveys (ACS) 1 samples from 2005 to 2015

4This result is not solely attributable to selection since it holds for women married prior to migrationmdashsometimes referred to as ldquotied womenrdquo5See Everett (2013) for a review of the linguistics literature

4

(Ruggles et al 2015)

Table 1 Summary StatisticsFemale Immigrants Married Spouse Present Aged 25-49

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 377 66 25 49 480618 -10Years since immigration 148 93 0 50 480618 -00Age at immigration 228 89 0 49 480618 -10

Educational AttainmentCurrent student 006 024 0 1 480618 -002Years of schooling 124 37 0 17 480618 -13No schooling 005 021 0 1 480618 000Elementary 012 032 0 1 480618 010High school 037 048 0 1 480618 010College 046 050 0 1 480618 -019

Race and ethnicityAsian 029 046 0 1 480618 -065Black 002 014 0 1 480618 001White 047 050 0 1 480618 041Other 021 041 0 1 480618 024Hispanic 053 050 0 1 480618 063

Ability to speak EnglishNot at all 011 032 0 1 480618 008Not well 024 043 0 1 480618 002Well 025 043 0 1 480618 -008Very well 040 049 0 1 480618 -002

Labor market outcomesLabor participant 060 049 0 1 480618 -010Employed 055 050 0 1 480618 -012Yearly weeks worked (excl 0) 4416 132 7 51 302653 -16Yearly weeks worked (incl 0) 2718 239 0 51 480618 -58Weekly hours worked (excl 0) 3657 114 1 99 302653 -18Weekly hours worked (incl 0) 2251 199 0 99 480618 -51Labor income (thds) 253 285 00 5365 282057 -88

B Household characteristics

Years since married 124 75 0 39 352059 014Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 038Household size 426 162 2 20 480618 039Household income (thds) 610 596 -310 15303 480618 -208

Table 1 notes The summary statistics are computed using sample weights (PERWT) provided in theACS (Ruggles et al 2015) except those for household characteristics which are computed usinghousehold weights (HHWT) The last column reports the estimate β from regressions of the typeXi = α +β SBI+ ε where Xi is an individual level characteristic Robust standard errors are notreported See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

We restrict the analysis to female respondents aged 25 to 49 born abroad from non-American parents living in

married-couple family households in which the husband is present and who report speaking a language other than

English in the home Moreover we only keep respondents that report uniquely identifiable countries of birth and

5

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

suggests that the impact of language is stronger when it is shared with the surrounding community4

These findings speak to several literatures Mavisakalyan amp Weber (2016) review of the nascent field of linguistic

relativity and economics and point out that the mechanisms behind the associations between language and economic

behavior remain largely unresolved Despite the current studyrsquos ability to account for both time-invariant and time-

variant country of origin factors in a more rigorous manner than previous analyses the correlation between language

and behavior remains significant and negative This means that while we can demonstrate that most of this association

can be explained by language as a cultural market for correlated gender norms we cannot formally disprove that the

behavioral channel may explain some of the remaining correlation (Roberts amp Winters 2013 Hicks et al 2015 Roberts

et al 2015)5

Our research also directly contributes to literature that investigates whether the impact of intra-household bargain-

ing power on the labor supply depends on culture In particular our results imply that the standard prediction that

the spouse with higher bargaining power will substitute labor for leisure due to an income effect as in Oreffice (2014)

applies only to native born couples in the US and to immigrant couples with gender roles similar to the US Interest-

ingly our results confirm that in couples coming from countries classified as exhibiting strong traditional gender roles

the influence of bargaining power on household collective labor supply may be culturally dependent and that some of

these standard predictions may not hold for these groups

While Oreffice (2014) focuses on the intensive labor supply among couples where both spouses are working we

focus mainly on the extensive margin and how bargaining power influences female labor force participation Field

et al (2016) study the extensive margin of the labor supply among married female in India and model the impact

of bargaining power distribution within the household with social constraints related to traditional gender norms

Our findings complement these studies by empirically showing that the married female immigrants with stronger

bargaining power are more likely to work not less and that the impact of bargaining power is reinforced when

traditional gender roles are embodied in the language spoken

This paper is structured as follows Section 2 presents summary statistics for both the ACS and linguistic data

and details the empirical strategy Section 3 presents the empirical results - decomposing the impact of language from

other cultural influences presenting a wide range of robustness checks and then using gender in language as a cultural

marker to study labor supply within the household Section 4 concludes

2 Methodology

21 Data

211 Economic and demographic data

We combine data from several sources in our analysis First we obtain detailed demographic and economic data for

the immigrant population to the US from the American Community Surveys (ACS) 1 samples from 2005 to 2015

4This result is not solely attributable to selection since it holds for women married prior to migrationmdashsometimes referred to as ldquotied womenrdquo5See Everett (2013) for a review of the linguistics literature

4

(Ruggles et al 2015)

Table 1 Summary StatisticsFemale Immigrants Married Spouse Present Aged 25-49

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 377 66 25 49 480618 -10Years since immigration 148 93 0 50 480618 -00Age at immigration 228 89 0 49 480618 -10

Educational AttainmentCurrent student 006 024 0 1 480618 -002Years of schooling 124 37 0 17 480618 -13No schooling 005 021 0 1 480618 000Elementary 012 032 0 1 480618 010High school 037 048 0 1 480618 010College 046 050 0 1 480618 -019

Race and ethnicityAsian 029 046 0 1 480618 -065Black 002 014 0 1 480618 001White 047 050 0 1 480618 041Other 021 041 0 1 480618 024Hispanic 053 050 0 1 480618 063

Ability to speak EnglishNot at all 011 032 0 1 480618 008Not well 024 043 0 1 480618 002Well 025 043 0 1 480618 -008Very well 040 049 0 1 480618 -002

Labor market outcomesLabor participant 060 049 0 1 480618 -010Employed 055 050 0 1 480618 -012Yearly weeks worked (excl 0) 4416 132 7 51 302653 -16Yearly weeks worked (incl 0) 2718 239 0 51 480618 -58Weekly hours worked (excl 0) 3657 114 1 99 302653 -18Weekly hours worked (incl 0) 2251 199 0 99 480618 -51Labor income (thds) 253 285 00 5365 282057 -88

B Household characteristics

Years since married 124 75 0 39 352059 014Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 038Household size 426 162 2 20 480618 039Household income (thds) 610 596 -310 15303 480618 -208

Table 1 notes The summary statistics are computed using sample weights (PERWT) provided in theACS (Ruggles et al 2015) except those for household characteristics which are computed usinghousehold weights (HHWT) The last column reports the estimate β from regressions of the typeXi = α +β SBI+ ε where Xi is an individual level characteristic Robust standard errors are notreported See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

We restrict the analysis to female respondents aged 25 to 49 born abroad from non-American parents living in

married-couple family households in which the husband is present and who report speaking a language other than

English in the home Moreover we only keep respondents that report uniquely identifiable countries of birth and

5

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

(Ruggles et al 2015)

Table 1 Summary StatisticsFemale Immigrants Married Spouse Present Aged 25-49

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 377 66 25 49 480618 -10Years since immigration 148 93 0 50 480618 -00Age at immigration 228 89 0 49 480618 -10

Educational AttainmentCurrent student 006 024 0 1 480618 -002Years of schooling 124 37 0 17 480618 -13No schooling 005 021 0 1 480618 000Elementary 012 032 0 1 480618 010High school 037 048 0 1 480618 010College 046 050 0 1 480618 -019

Race and ethnicityAsian 029 046 0 1 480618 -065Black 002 014 0 1 480618 001White 047 050 0 1 480618 041Other 021 041 0 1 480618 024Hispanic 053 050 0 1 480618 063

Ability to speak EnglishNot at all 011 032 0 1 480618 008Not well 024 043 0 1 480618 002Well 025 043 0 1 480618 -008Very well 040 049 0 1 480618 -002

Labor market outcomesLabor participant 060 049 0 1 480618 -010Employed 055 050 0 1 480618 -012Yearly weeks worked (excl 0) 4416 132 7 51 302653 -16Yearly weeks worked (incl 0) 2718 239 0 51 480618 -58Weekly hours worked (excl 0) 3657 114 1 99 302653 -18Weekly hours worked (incl 0) 2251 199 0 99 480618 -51Labor income (thds) 253 285 00 5365 282057 -88

B Household characteristics

Years since married 124 75 0 39 352059 014Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 038Household size 426 162 2 20 480618 039Household income (thds) 610 596 -310 15303 480618 -208

Table 1 notes The summary statistics are computed using sample weights (PERWT) provided in theACS (Ruggles et al 2015) except those for household characteristics which are computed usinghousehold weights (HHWT) The last column reports the estimate β from regressions of the typeXi = α +β SBI+ ε where Xi is an individual level characteristic Robust standard errors are notreported See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

We restrict the analysis to female respondents aged 25 to 49 born abroad from non-American parents living in

married-couple family households in which the husband is present and who report speaking a language other than

English in the home Moreover we only keep respondents that report uniquely identifiable countries of birth and

5

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

languages and for which we have information on the grammatical structure of the language reported Appendix A11

provides an exhaustive list of these sample restrictions and details the precise process taken to construct the regression

sample It also provides precise definitions for all variables employed6

The main dependent variable used throughout the analysis is a measure of labor market engagement labor force

participation which is defined as an indicator equal to one if the respondent is either employed or actively looking for

a job7 In some specifications we also analyze other labor market outcomes such as yearly weeks worked and usual

weekly hours worked both including and excluding zeros to capture both the extensive and intensive margins of the

labor supply8

Table 1 presents summary statistics for the demographic and economic characteristics of the respondents in the

regression sample On average the typical married women in our sample is in her upper 30s immigrated around 15

years before and has over 12 years of education 60 of these respondents participate in the labor force with 55

reporting formal employment As can be seen from the last column of Table 1 there is a large difference in means be-

tween the labor market outcomes for sex-based speakers and non sex-based speakers with the former group exhibiting

far lower levels of economic engagement There is also sizeable variation in English proficiency as well as in racial

and ethnic composition for this sample Mean household income is just over $60000 and the average duration of the

current marriage is 12 years Many households have children with the mean being almost two Overall the popu-

lation studied contains rich variation with over 480000 adult female immigrants originating from 135 countries and

speaking 63 different languages Appendix Table C1 provides the distribution of languages spoken in the regression

sample and highlights the extensive variation in languages spoken by immigrants to the US Moreover while they are

not the primary groups of interest in this analysis we further provide similar summary statistics for the respondentsrsquo

husbands in Appendix Table C3 as well as various within-household gender gap measures in Appendix Table C4

both of which are used in subsequent analysis

212 Linguistic data

Next we follow Gay et al (2013) and Hicks et al (2015) and assign to each language measures that quantify the

presence and frequency of gender distinctions in its grammatical rules We construct these measures using information

compiled by linguists in the World Atlas of Language Structures (WALS Dryer amp Haspelmath 2011) We expand

the original WALS dataset in collaboration with linguists for several additional languages making the sample more

representative9

Our primary measure of gender in language is an indicator for whether a language employs a grammatical gender

6Formal checks suggest that the regression sample is not biased by the availability of the country of birth and the linguistic data Appendix TableC2 provides summary statistics comparable to those in Table 1 without dropping the respondents for which the country of birth or languageare not precisely identifiable which is about 678 of the uncorrected regression sample The last column of this table demonstrates that dataconstraintsmdashie needing to know language spoken and country of birthmdashdo not meaningfully bias the sample in any way along these observables

7This is the LABFORCE variable in the ACS (Ruggles et al 2015) See appendix A21 for more details8The yearly weeks worked correspond to the WKSWORK2 variable in the ACS (Ruggles et al 2015) Because this variable is given in intervals wedefine yearly weeks worked as the midpoint of those intervals The usual weekly hours worked correspond to the UHRSWORK variable in the ACS(Ruggles et al 2015) Appendix A21 provides additional details

9The languages for which some variables were compiled are detailed in appendix B For robustness we have also verified that the exclusion of ournewly assigned languages in favor of the original WALS set of languages does not alter the main findings of the analysis

6

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

system based on biological sex (SB) We also investigate gender distinctions in other features of the grammatical

structure For instance languages with only a male and a female gender force speakers to make more sex-based

distinctions than those which include a neuter gender NG is an indicator variable equal to one for languages with

exactly two genders and equal to zero otherwise Similarly there is heterogeneity across languages in the presence

and quantity of gendered personal pronouns This feature is given by the variable GP which captures rules related to

gender agreement with pronouns Finally some languages assign gender due to semantic reasons only while others

assign gender due to both semantic and formal reasons making gender more recurrent in the latter case This feature

is given by the variable GA which captures the rules for gender assignment10

Finally we employ a measure of grammatical gender intensity similar to the one built by Gay et al (2013) This

measure captures how many of the above features are present in a language It is defined as

Intensity= SBtimes (GP+GA+NG)

where Intensity is a categorical variable that ranges from 0 to 311 The Intensity measure allows us to capture

the ranking of intensity of female male distinctions in the grammatical rules as follows If a language has a sex-

based gender system its intensity score is the sum of its scores on the three other gender features Hence a strictly

positive score captures languages that have strong female male distinctions Table 2 provides summary statistics for

the language variables contained in the regression sample

Table 2 Summary Statistics Language VariablesFemale Immigrants Married Spouse Present Aged 25-49

Definition Mean Sd Min Max Obs

SB Sex-based 083 037 0 1 480618NG Number of genders 072 045 0 1 480618GA Gender assignment 076 043 0 1 480618GP Gender pronouns 057 050 0 1 478883

Intensity (GP + GA + NG) times SB 204 121 0 3 478883

Table 2 notes The summary statistics are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015)

There is substantial linguistic heterogeneity in terms of grammatical gender in the sample We demonstrate the

robustness of our main findings across all of these measures with additional checks presented in Appendix Table C7

22 Empirical Strategy

Our empirical strategy follows the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011

Blau et al 2011 Blau amp Kahn 2015) This approach compares outcomes across immigrants with varying geograph-

ical origins but living in a common institutional legal and social environment thereby allowing to separate cultural

10More detailed definitions of these individual measures can be found in Hicks et al (2015)11This measure should not be taken as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar The

discussion of the measure is extended in Appendix discussion of Table C7

7

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

influences acquired prior to migration from confounding institutional forces In the baseline specifications we include

a set of controls that are common in analyzing decision making in the collective household framework Moreover to

help isolate the role of language from other cultural forces we include fixed effects by country of birth of the respon-

dents These fixed effects allow us to obtain identification off heterogeneity in the structure of languages spoken across

immigrants from the same country of birth We generate our core results by estimating the following specification

Yi jlcst = α +β SBil +γγγprimeXi j +δδδ

primeWc +ηηηprimeSs +θθθ

primeZt + εi jlcst (1)

where Yi jlcst is a measure of labor market participation Subscript i indexes respondents l languages spoken j

households c countries of birth s states of residence t ACS survey years Xi j corresponds to the characteristics

of respondent i in household j This vector contains the following variables age age squared five-years age group

indicators race indicators a Hispanic indicator age at immigration to the US years since in the US a student

indicator years of schooling level of English proficiency decade of immigration indicators number of children aged

less than 5 years old in the household and household size Wc corresponds to a vector of indicators for the respondentrsquos

country of birth c Ss corresponds to a vector of indicators for the respondentrsquos state of residence s and Zt corresponds

to a vector of indicators for the ACS survey year t Appendices A22 and A23 provide details on how these variables

are constructed

To help the interpretation of regression coefficients we additionally report in all regression tables the average of

the outcome variable for the relevant sample Moreover to facilitate the comparison of magnitudes across coefficients

from a given regression we also report the coefficients when standardizing continuous variables between zero and

onemdashwe keep indicator variables unstandardized for ease of interpretation

3 Empirical results

31 Decomposing language from other cultural influences

311 Main results

Table 3 contains our baseline empirical analysis In the first column we examine the naiumlve association between gender

in language and labor force participation These results are naiumlve in the sense that we are not yet accounting for any

confounding factors and simply represent the difference in means across groups The raw correlation implies that

married immigrant women who speak a language that has a sex-based gender structure are 10 percentage points less

likely to participate in the formal labor market This is about 17 of the average labor force participation rate for the

full regression sample

Moving across columns in the table the analysis includes an increasingly stringent set of controls for both ob-

servable and unobservable factors which may impact a womanrsquos decision to participate in the labor market The

specification reported in column (2) controls for the individual characteristics of the vector Xi j described in Section

22 After this inclusion the impact of the SB variable remains statistically significant and negative while its mag-

8

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table 3 Gender in Language and Economic Participation Individual LevelFemale Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6) (7)

Sex-based -0101 -0069 -0045 -0027 -0024 -0021[0002] [0003] [0004] [0007] [0007] [0008]

β -coef -0045

COB LFP 0002 0002[0000] [0000]

β -coef 0041 0039

Past migrants LFP 0005 0005[0000] [0000]

β -coef 0063 0062

Respondent charAge No Yes Yes Yes Yes Yes YesRace and ethnicity No Yes Yes Yes Yes Yes YesEducation No Yes Yes Yes Yes Yes YesYears since immig No Yes Yes Yes Yes Yes YesAge at immig No Yes Yes Yes Yes Yes YesDecade of immig No Yes Yes Yes Yes Yes YesEnglish prof No Yes Yes Yes Yes Yes Yes

Household charChildren aged lt 5 No Yes Yes Yes Yes Yes YesHousehold size No Yes Yes Yes Yes Yes YesState No Yes Yes Yes Yes Yes Yes

COB charGeography No No Yes Yes No No NoFertility No No Yes Yes No No NoMigration rate No No Yes Yes No No NoEducation No No Yes Yes No No NoGDP per capita No No Yes Yes No No NoCommon language No No Yes Yes No No NoGenetic distance No No Yes Yes No No No

COB FE No No No No Yes Yes Yes

COB times decade mig FE No No No No No Yes Yes

County FE No No No No No No Yes

Observations 480618 480618 451048 451048 480618 480618 382556R2 0006 0110 0129 0129 0132 0138 0145

Mean 060 060 060 060 060 060 060SB residual variance 0150 0098 0049 0013 0012 0012

Table 3 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) β -coefdenotes the coefficients from using standardized values of continuous variables in the regressions COB denotes country ofbirth LFP denotes female labor participation and FE denotes fixed effects See appendix A for details on variable sourcesand definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

nitude decreases This suggests that part of the correlation between language and labor force participation emanates

from the influence of individual characteristics correlated with both language structure and behavior The decline in

the magnitude of the coefficient is largely driven by the inclusion of controls for race not by the addition of the other

9

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

respondent or household characteristics12 Note that while we only report the coefficient of interest the estimates on

the other variables have the expected sign and magnitude For instance more educated respondents are more likely to

be in the labor force and those with more children aged less than five years old are less likely to be in the labor force

The coefficients for these variables are shown in Appendix Table C6

Table 3 also reports a measure of residual variance in the SB variable in the last row This measure gives a sense

of the variation left in the independent variable that is used in the identification of the coefficients after its correlation

with other regressors has been accounted for For instance while the initial variance in the SB variable is 0150mdashsee

column (1)mdash adding the rich set of household and respondent characteristics in column (2) removes roughly one third

of its variation

As the historical development of languages was intertwined with cultural and biological forces the observed

associations in column (2) could reflect the impact of language the influence of environmental gender norms acquired

prior to migration through other channels or both As a first step in disentangling these potential channels column

(3) controls for the average female labor participation rate in the respondentsrsquo country of birth This variable has

been the most widely used proxy to capture labor-related gender norms in an immigrantrsquos country of birth by the

epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Fernaacutendez 2011 Blau amp Kahn 2015)13 To ensure

the consistency of this measure across time and countries we use the ratio of female to male labor participation rates

rather than the raw female labor participation rates Moreover we assign the value of this variable at the time of

immigration of the respondent to better capture the conditions in which she formed her preferences regarding gender

roles We also control for the average labor participation rate of married female immigrants to the US from the

respondentrsquos country of birth a decade before the time of her migration We construct this variable using the US

censuses from 1940 to 2000 and the ACS from 2010 to 2014 (Ruggles et al 2015) This approach allows us to address

potential forms of selection regarding the culture of the country of origin of respondents that may be different from the

culture of the average citizen since immigrants are a selected pool It also allows us to capture some degree of oblique

cultural transmission by looking at whether the behavior of same country immigrants that arrived to the US a decade

earlier than the respondent could play an independent role To account for the influence of historical contact across

populations and the development of gender norms among groups we also control for a measure of genetic distance

from the US (Spolaore amp Wacziarg 2009 2016)14 We further control for various country-level characteristics such

as GDP per capita total fertility rate an indicator for whether the country of birth shares a common language with

12Indeed as can be seen in the last column of Table 1 Asian immigrants are more likely to speak a non sex-based language Since they have onaverage higher levels of labor force participation than other respondents this explains part of the decline in magnitude

13To capture cultural variation in gender roles existing studies have proxied culture with female outcomes in the country of origin For instanceFernaacutendez amp Fogli (2009) use for country of origin female labor force participation to capture the culture of second generation immigrants to theUS Blau amp Kahn (2015) additionally control for individual labor force participation prior to migration to separate culture from social capitalOreffice (2014) create an index of gender roles in the country of origin as a function of several gender outcomes This literature posits thatimmigrants carry with them some of the attitudes from their home country to the US and in the case of second generation immigrants thatimmigrants transmit some of these attitudes to their children

14Alternatively we checked the robustness of our results to measures of linguistic distance between languages from Adsera amp Pytlikova (2015)including a Linguistic proximity index constructed using data from Ethnologue and a Levenshtein distance measure developed by the Max PlanckInstitute for Evolutionary Anthropology This data covers 42 languages out of the 63 in our regression sample Even with a reduced sample size(435899 observations instead of 480619 observations) the results from the baseline regressionmdashthe regression from Table 3 column (5) withcountry of birth fixed effectsmdashare essentially unchanged Because of such a lower coverage of languages the use of linguistic distance as a controlwould entail we do not include these measures throughout the analysis

10

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

those spoken by at least 9 of the population in the USmdashEnglish and Spanishmdash years of schooling in the country

of birth and various geographic measuresmdashlatitude longitude and bilateral distance to the US15 It is important to

include these factors since they can control for some omitted variables that correlate with country of origin gender

norms

We find that a one standard deviation increase in female labor force participation rates in onersquos country of birth

(18) is associated with a 4 percentage points increase in onersquos labor force participation Our results imply a larger

magnitude for the correlation with the labor force participation of married female immigrants that migrated a decade

prior to the respondent wherein a one standard deviation increase in their average labor participation rate (12) is

associated with a 5 percentage points increase in labor force participation These results largely confirm previous

findings in the literature using the epidemiological approach to culture (Fernaacutendez amp Fogli 2009 Blau amp Kahn 2015)

In column (4) we add the SB measure to compare the magnitude of its impact on individual behavior to that of

the usual country-level proxies used in the literature Including these variables altogether imposes a very stringent

test on the data To see this refer again to the bottom of Table 3 which reports the residual variation in SB used for

identification This metric is cut in half as we move from column (2) to column (4) implying a correlation between

language structure and the usual country-level proxies of gender roles used in the literature16 In spite of this while

the coefficient on sex-based language diminishes somewhat in magnitude it remains highly statistically significant

sizeable and negative even after adding these controls This suggests that language structure may capture unobservable

cultural characteristics at the country-level beyond the proxies used in the literature This is promising as it shows that

language likely captures some cultural features not previously uncovered

Nevertheless while column (4) controls for an exhaustive list of country of birth characteristics as well as language

structure there may still be some unobservable cultural components that vary systematically across countries and that

can be captured neither by language nor by country of birth controls Since immigrants from the same country may

speak languages with varying structures our analysis can uniquely address this issue by further including a set of

country of birth fixed effects and exploiting within country variation in the structure of language spoken in the pool of

immigrants in the US Because these fixed effects absorb the impact of all time invariant factors at the country level

this means that identification of any language effect relies on heterogeneity in the structure of language spoken within

a pool of immigrants from the same country of origin The addition of 134 country of birth fixed effects noticeably

reduce the potential sources of identification as the residual variance in SB is only one tenth of its original value Yet

again the impact of language remains highly significant and economically meaningful Gender assigned females are

27 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This

suggests that while the cultural components common to all immigrants from the same origin country carried in the

structure of language drive a large part the results compared to the estimates in column (2) about one third of the

effect can still be attributed to either more local components of culture or to alternative channels such as a cognitive

mechanism through which language would impact behavioral outcomes directly15See Appendix A26 for more details about the sources and the construction of the country-level variables See also appendix Table C5 which

reports summary statistics for these variables16This is not surprising given the cross-country results in Gay et al (2013) which show that country-level female labor participation rates are

correlated with the linguistic structure of the majority language

11

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Note that because identification now comes from within country variation in the structure of languages spoken

the estimates while gaining in credibility could be decreasing in representativeness For instance they no longer

provide information about the impact of language structure on the working behavior of female immigrants that are

from linguistically homogeneous countries To better understand the extent to which each country of birth contributes

to the identification we adapt Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance in the

SB variable for each country of birth in the sample Not surprisingly we find that immigrants born in linguistically

heterogeneous countries are the prime contributors in building the estimate In fact empirical identification in the full

fixed effect regressions comes from counties such as India the Philippines Vietnam China Afghanistan and Canada

Appendix Figure D1 makes this point clearer by mapping these regression weights

Despite the fact that culture is a slow-moving institution it can evolve and language may itself constrain or

facilitate cultural evolution in certain directions We check that this does not impact our results by adding country of

birth fixed effects interacted with decade of migration in column (6) This allows us to capture to some extent country-

level cultural aspects that could be time variant The results are largely unchanged by this addition suggesting that

language structure captures permanent aspects that operate at the individual level

Finally note that all the regressions reported in Table 3 include state of residence fixed effects to account for

the possibility that location choices are endogenous to the language spoken by the community of immigrants that

the respondent belongs to However this phenomenon could operate at a lower geographic level than that of the

state Therefore we include county of residence fixed effects in column (7) so that we can effectively compare female

immigrants that reside in the same county but speak a language with a varying structure Unfortunately the ACS do not

systematically provide respondentsrsquo county of residence so that we are only able to include 80 of the respondents in

the original regression samplemdashsee Appendix A23 for more details As a results the regression coefficient in column

(7) is not fully comparable with the results in other specifications Nevertheless the magnitude and significance of the

estimate on the SB variable remains largely similar to the one in column (6) suggesting that if there is selection into

location it does not drive our main results

Overall our findings strongly suggest that while sex-based distinctions in language are deeply rooted in historical

cultural forces gender in language appears to retain a distinct association with gender in behavior which is independent

of these other factors

312 Robustness checks

Alternative language variables and labor market outcomes This section presents an extensive list of robustness

checks of the main results in Table 3 column (5)mdasha specification that includes country of birth fixed effects First

Table 4 reports the results when using alternative measures of female labor participation together with alternative

language variables

Columns (1) and (2) of Table 4 show the results for two measures of the extensive margin labor force participation

and an indicator for being employed respectively We obtain quantitatively and statistically comparable results for

12

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table 4 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 All Language Variables

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Sex-based

SB -0027 -0028 -101 -0813 -024 -0165[0007] [0007] [034] [0297] [024] [0229]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel B Number of genders

NG -0018 -0023 -100 -0708 -051 -0252[0005] [0005] [022] [0193] [016] [0136]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0154 0138 0056 0034

Panel C Gender assignment

GA -0011 -0017 -069 -0468 -051 -0335[0005] [0005] [025] [0219] [018] [0155]

Observations 480618 480618 480618 480618 302653 302653R2 0132 0135 0153 0138 0056 0034

Panel D Gender pronouns

GP -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Intensity

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table 4 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

both measures when using our main language variable SB which confirms our previous analysis Regarding the

intensive margin we also use the number of weeks worked and hours worked as dependent variables including zeros

in columns (3) and (4) and excluding zeros in columns (5) and (6) This provides various measures for whether

language influences not only whether women work but also how much The impact of language is stronger for the

extensive margin than for the intensive margin For instance although the impact of language is not precisely estimated

when the dependent variable is the number of weeks worked excluding zeros in column (5) the magnitude is very small

13

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

as female immigrants with a gender-based language work on average one and a half day less per year This corresponds

to only half a percent of the sample mean This suggests that language acts as a vehicle for traditional gender roles

that tend to ascribe women to the household and to exclude them from the labor market Once they overcome such

roles by participating in the labor force the impact of language remains although it is weaker This is potentially due

to gender norms that unevenly distribute the burden of household tasks even among couples where both partners work

decreasing the labor supply of those female workers (Hicks et al 2015)

In panels B C and D of Table 4 we replicate our analysis of panel A with each of the individual measures

of gender marking discussed in Section 21 In all cases we obtain consistent results married female immigrants

speaking a language with gender distinctions are less likely to work and conditional on working they are doing so

less intensivelymdashalthough the magnitude of the impact of language on the intensive margin is smaller than that on

the extensive margin Finally panel E reports the results when using the composite index Intensity described in

section 21 The results in column (1) show that in comparison to those speaking a gender marked language with

the lowest intensity (Intensity = 0) female immigrants speaking a language with the highest gender intensity

(Intensity = 3) are 3 percentage points less likely to be in the labor force The results are similar when using

alternative outcome variables Moreover the estimates are more precisely estimated than with the indicator variables

for language structures In Appendix Table C7 we show that the results in panel E are not sensitive to the specification

of the Intensity measure as the results hold with four alternative specifications of the index

Alternative samples Appendix table C8 explores the robustness of our main results from column (5) of Table 3 to

alternative samples17 We use a wider age window (15-59) for the sample in column (2) Results are similar to the

baseline suggesting that education and retirement decisions are not impacted by language in a systematic direction

In column (3) we check that identification is not driven by peculiar migrants by restricting the sample to respondents

speaking a language that is indigenous to their country of birth where we define a language as not indigenous to a

country if it is not listed as a principal language spoken in a country in the Encyclopedia Britannica Book of the Year

(2010 pp 766-770) We also checked that the results are not driven by outliers and robust to excluding respondents

from countries with less than 100 observationsmdashthis is the case for 1064 respondentsmdashand respondents speaking

a language that is spoken by less than 100 observationsmdashthis is the case for 86 respondents The results are again

similar We run the baseline specification on other subsamples as well we include English speaking immigrants in

column (4) exclude Mexican immigrants in column (5) include all types of households in column (6) and exclude

languages that have been imputed as indicated by the quality flag QULANGUAG in the ACS (Ruggles et al 2015) The

results are robust to these alternative samples

Alternative functional forms We also undertook robustness checks concerning the empirical specification In par-

ticular we replicate in Appendix Table C9 the main results in columns (1) and (5) of Table 3 using both a probit and

logit model The marginal estimates evaluated at the mean are remarkably consistent with the estimates obtained via

17See Appendix A12 for more details on how we constructed these subsamples

14

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

the OLS linear probability models We take this as evidence that the functional form is not critical18

Respondents husbandsrsquo labor supply Another important robustness check concerns the impact of language on

the labor supply of the respondentsrsquo husbands It is important to rule out the possibility that we observe the same

effects than for the female respondents namely that sex-based language speaking husbands are less likely to engage

in the labor force as well This would indicate that our results are spurious and unrelated to traditional gender norms

To verify this Appendix Table C10 replicates columns (1) (2) and (5) of Table 3 with the labor supply of the

respondentsrsquo husbands as the dependent variable and with their characteristics as controls So that the sample is

qualitatively similar to the one used in the baseline analysis we exclude native husbands as well as English speaking

husbands We find a significant positive association between husbandsrsquo SB language variable and their labor force

participation for the specifications without husband country of birth fixed effects This suggests that our language

variable captures traditional gender roles in that it leads couples to a traditional division of labor where wives stay

home and husbands work Yet once we control for husband country of birth fixed effects the association is no longer

statistically significant suggesting that the influence of this cultural trait is larger for women Overall this analysis

reassures us that our results are not driven by some correlated factor which would lead speakers of sex-based language

to decrease their labor market engagement regardless of their own gender

Heterogeneity across marital status Finally we also explore potential heterogeneity in the impact of language

structure across marital status Although married women represent 83 of the original uncorrected sample it is worth

analyzing whether the impact of language is similar for unmarried women In Appendix Table C11 we replicate

column (5) of Table 3 with an additional indicator variable for unmarried respondents as well as the interaction of this

indicator with the SB variable Depending on the specification it reveals that unmarried women are 8 to 14 percentage

points more likely to participate in the labor force compared to married women Second sex-based speakers are

less likely than their counterparts to be in the labor force when they are married but the reverse is true when they

are unmarried while married women speaking a sex-based language are 5 percentage points less likely to be in the

labor force than their counterparts unmarried women speaking a sex-based language are 8 percentage points more

likely to work than their counterparts When paired with the findings for single women in Hicks et al (2015) this

result suggests that some forms of gender roles may be ldquodormantrdquo when unmarriedmdashthe pressure to not work to raise

children to provide household goodsmdashand that these forces may activate for married women but not be present for

unmarried women Other gender norms and choices such as deciding how much time to devote to household chores

such as cleaning may be established earlier in life and may appear even in unmarried households (Hicks et al 2015)

32 Language and the household

Given the failure of the unitary model of household decision-making standard theory has developed frameworks where

bargaining is key to explaining household behavior (Chiappori et al 2002 Blundell et al 2007) In this section we

18We maintain the OLS throughout the paper however as it is computationally too intensive to run these models with the inclusion of hundreds offixed effects in most of our specifications

15

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

analyze whether the impact of language is mediated by household characteristics in the following ways First house-

hold bargaining power may mediate or influence the impact of language For example females with high bargaining

power may not be bound by the gender roles that languages can embody To shed light on the mechanism behind the

association between language and female labor supply we analyze in section 321 the impact of the distribution of

bargaining power within the household Second we investigate in section 322 the impact of the language spoken by

the husband While the majority of immigrant households in the data are linguistically homogeneous roughly 20

are not This variation allows us to analyze the relative role of the language structures of both spouses potentially

shedding additional light on whether language use within the household matters Since marriage is more likely among

individuals who share the same language we need to rule out the possibility that our results overestimate the impact of

language via selection effects This could be the case if marriages into linguistically homogeneous languages reflected

attachment to onersquos own culture To deal with such selection issues we compare the behavior of these two types of

households and exploit information on whether marriage predates migration

321 Language and household bargaining power

In this section we examine the impact of language taking into account bargaining power characteristics within the

household Throughout we restrict the sample to households where both spouses speak the same language We

consider these households to avoid conflating other potential language effects This exercise complements recent

theoretical advances by analyzing whether the impact of bargaining power in the household is culturally dependent

In particular the collective model of labor supply predicts that women with lower bargaining power work more while

those with higher bargaining power work less since they are able to substitute leisure for work At the same time these

model typically do not consider social norms This may be problematic as Field et al (2016) show that including

social norms into a collective labor supply model can lead to the opposite prediction namely that women with lower

bargaining power work less19

For the sake of comparison we report in column (1) of Table 5 the results when replicating column (5) of Table

3 with this alternative sample of linguistically homogeneous households Women speaking a sex-based language are

31 percentage points less likely to participate in the labor force than their non-gender assigned counterparts This is

slightly larger in magnitude than in the unrestricted sample where the effect is of 27 percentage points

To capture bargaining power within the household we follow Oreffice (2014) and control for the age gap as well

as the non-labor income gap between spouses20 In column (2) we exclude the language variable to derive a baseline

when using these new controls These regressions include controls for husbands characteristics similar to those of the

respondents used in Table 321 Consistent with previous work we find that the larger the age gap and the larger the

19Field et al (2016) implement a field experiment in India where traditional gender norms bound women away from the labor market and investigatehow a change in their bargaining power stemming from an increase in their control over their earnings can allow them to free themselves from thetraditional gender norms

20We focus on the non-labor income gap because it is relatively less endogenous than the labor income gap (Lundberg et al 1997) We do notinclude the education gap for the same reason Appendix Table C4 presents descriptive statistics for various gender gap measures within thehousehold Other variables such as physical attributes have been shown to influence female labor supply (Oreffice amp Quintana-Domeque 2012)Unfortunately they are not available in the ACS (Ruggles et al 2015)

21Appendix Table C3 presents descriptive statistics for these husband characteristics

16

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table 5 Language Structure and Household BargainingSame Language Married Couples (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0031 -0027 -0022 -0021 -0023[0008] [0008] [0009] [0013] [0009]

β -coef -0031 -0027 -0022 -0021 -0022

Age gap -0003 -0003 -0006 0003 -0007[0001] [0001] [0001] [0002] [0001]

β -coef -0018 -0018 -0034 0013 -0035

Non-labor income gap -0001 -0001 -0001 -0001 -0001[0000] [0000] [0000] [0000] [0000]

β -coef -0021 -0021 -0021 -0018 -0015

Age gap times SB 0000[0000]

β -coef 0002

Non-labor income gap times SB -0000[0000]

β -coef -0008

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Husband char No Yes Yes Yes Yes YesHusband COB FE No No No Yes Yes Yes

Sample mar before mig No No No No Yes No

Observations 409017 409017 409017 409017 154983 409017R2 0134 0145 0145 0147 0162 0147

Mean 059 059 059 059 055 059SB residual variance 0011 0011 0010 0012 0013

Table 5 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

non-labor income gap the less likely women are to participate in the labor force a one standard deviation increase

in the age gap is associated with a reduction in the wifersquos labor force participation of 18 percentage point Similarly

a one standard deviation increase in the non-labor income gap is associated with a reduction in the wifersquos labor force

participation of 21 percentage points These results corroborate Oreffice (2014) and Field et al (2016) which show

that the impact of bargaining power is different in households with traditional gender roles relative to the US and

that female immigrants with low bargaining power are less likely to participate in the labor market22

In column (3) we add the SB variable The coefficient is largely unchanged compared to column (1) as women

speaking a sex-based language are 27 percentage points less likely to participate in the labor force than their non-

gender assigned counterparts This suggests that language has an impact that is not mediated by the distribution of

22This also consistent with findings in Hicks et al (2015) wherein the division of household labor was shown to be heavily skewed against femalesin households coming from countries with a dominantly sex-based language suggesting that languages could potentially constrain females to atraditional role within the household

17

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

bargaining power within the household Furthermore the impact of language is of comparable magnitude as that of

either the non-labor income gap or the age gap suggesting that cultural forces are equally important than bargaining

measures in determining female labor forces participation

In 16 of households where both spouses speak the same language spouses were born in different countries

To account for any impact this may have on the estimates we add husband country of birth fixed effects in column

(4) Reassuringly the results are largely unchanged by this addition Similarly some of the observed effect may

result from selection into same culture marriages To assess whether such selection effect drives the results we run

the specification of column (4) on the subsample of spouses that married before migration Since female immigrants

married to their husbands prior to migration (ldquotied womenrdquo) may have different motivations this subsample should

provide a window into whether we should worry about selection among couples after migration Note that because this

information is only available in the ACS after 2008 the resulting estimate is not fully comparable to others in Table

5 Nevertheless the magnitude of the estimate on the language variable remains largely unchangedmdashalthough the

dramatic reducation in the sample size reduces our statistical power The impact of non-labor income gap also remains

roughly similar while the age gap coefficient becomes insignificant and positive Overall these results suggest that

our main findings are not driven by selection into linguistically homogeneous marriages

Again motivated by Oreffice (2014) and Field et al (2016) we investigate the extent to which gender roles as

embodied by gender in language are reinforced in households where the wife has weak bargaining power or vice

versa In column (6) of Table 5 we add interaction terms between the SB language variable and the non-labor income

gap as well as the age gap After this addition the estimate on the language measure remains identical to the one in

column (4) suggesting that language has a direct effect that is not completely mediated through bargaining power

The only significant interaction is between the language variable and the non-labor income gap In particular a one

standard deviation increase in the size of the non-labor income gap leads to an additional decrease of 08 percentage

points in the labor participation of women speaking a sex-based language compared to others that does not While

the effect is arguably small in magnitude it does confirm Field et al (2016) insofar as married females with low

bargaining power are more likely to be bound by traditional gender roles and less likely to participate in the labor

market Appendix Table C12 replicates the analysis carried out in column (6) alternative measures of labor market

engagement As in previous analyses the results are similar for the extensive margin but less clear for the intensive

margin

322 Evidence from linguistic heterogeneity within the household

So far we have shown that the impact of gender in language on gender norms regarding labor market participation

was robust to controlling for husband characteristics as well as husband country of origin fixed effects In this section

we analyze the role of gender norms embodied in a husbandrsquos language on his wifersquos labor participation Indeed

Fernaacutendez amp Fogli (2009) find that gender roles in the husbandrsquos country of origin characteristics play an important

role in determining the working behavior of his wife

In Table 6 we pool together all households and compare households where both spouses speak a sex-based

18

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

language to households where only one speaker does so paying attention to whether this speaker is the husband or the

wife23 We also consider husbands that speak English Because English speaking husbands may provide assimilation

services to their wives we include an indicator for English speaking husbands to sort out the impact of the language

structure from these assimilation services

Table 6 Linguistic Heterogeneity in the HouseholdImmigrant Households (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Same times wifersquos sex-based -0068 -0069 -0070 -0040 -0015[0003] [0003] [0003] [0005] [0008]

Different times wifersquos sex-based -0043 -0044 -0015 0006[0014] [0014] [0015] [0016]

Different times husbandrsquos sex-based -0031 -0032 -0008 0001[0011] [0011] [0012] [0011]

Respondent char Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes

Respondent COB char No No No Yes NoRespondent COB FE No No No No Yes

Observations 387037 387037 387037 364383 387037R2 0122 0122 0122 0143 0146

Mean 059 059 059 059 059SB residual variance 0095 0095 0095 0043 0012

Table 6 notes The estimates are computed using sample weights (perwt) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level

In column (1) of Table 6 the wifersquos language variable SB is interacted with two indicators that capture whether a

wife and her spouse both speak a language that has the same grammatical structure24 Women speaking a sex-based

language in a household with a husband speaking a sex-based language as well are almost 7 percentage points less

likely to be in the labor force Conversely women speaking a sex-based language in a household with a husband

speaking a non sex-based language are only 43 percentage points less likely to be part of the labor force To further

explore the role of the husbandsrsquo languages we run in column (2) the same specification as in column (1) except that

we use the husbandrsquos language variable rather than the wifersquos The results are very close to those in column (1) but

suggest that while a husbandrsquos gender norms matter for a wifersquos behavior they seem to play a slightly weaker albeit

still significant role Column (3) includes both spousesrsquo language variables Overall this suggests that the impact

of language structure is stronger when the languages of both spouses are sex-based Furthermore when speaking

23Throughout the table we sequentially add female country of birth characteristics and female country of birth fixed effects We do not add therespondentrsquos husband country of birth fixed characteristics because only 20 of the sample features husbands and wives speaking differentstructure languages In that case our analysis does not have sufficient statistical power precisely identify the coefficients While many non-English speaking couples share the same language in the household we focus on the role of gender marking to learn about the impact of husbandsand wives gender norms as embodied in the structure of their language

24Instead of simply comparing households where husbands and wives speak the same language versus those without this approach allows us bothto include households with English speakers and to understand whether the observed mechanisms operate through grammatical gender

19

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

languages with different gender structures the impact of the respondentrsquos language is bigger than the impact of the

husbandrsquos language suggesting cultural spillovers within the household Finally columns (4) and (5) of Table 6 add

country of birth characteristics and country of birth fixed effects respectively The impact of the husbandrsquos language

remains significant in linguistically homogeneous households but not in heterogeneous ones Nevertheless it seems

that part of this loss in precision is due to the steep decline in statistical power as testified by the decline in the residual

variance in SB25

33 Language and social interactions

Immigrants tend to cluster in ethnic enclaves partly because doing so allows them to access a network were infor-

mation is exchanged (Edin et al 2003 Munshi 2003) Within those networks speaking a common language may

facilitate such exchanges Furthermore language itself is a network technology which value increases with the num-

ber of speakers Being able to communicate within a dense ethnic network may be particularly important for female

immigrants to share information communicate about job opportunities and reduce information asymmetries between

job seekers and potential employers This effect may encourage female labor force participation At the same time

sharing the same linguistic and cultural background within a dense ethnic network may reinforce the social norms that

the language act as a vehicle for This is even more so if increasing language use makes gender categories more salient

or if sharing the same cultural background makes social norms bind more strongly As a result female immigrants

living in an ethnic enclave may face a trade-off between on the one hand increased job opportunities through informal

network channels and on the other hand increased peer pressure to conform with social norms This second effect

may itself depend on the extent to which the ethnic group social norms are biased in favor of traditional gender roles

that encourage women to stay out of the labor force

This potential trade-off guides the empirical analysis presented in Table 7 In what follows we rely on the subsam-

ple of immigrants residing in the counties that are identifiable in the ACS This corresponds to roughly 80 percent of

the initial sample26 To capture the impact of social interactions we build two measures of local ethnic and linguistic

network density Density COB is the ratio of the immigrant population born in the respondentrsquos country of birth and

residing in the respondentrsquos county to the total number of immigrants residing in the respondentrsquos county Similarly

Density language is the ratio of the immigrant population speaking the respondentrsquos language and residing in the

respondentrsquos county to the total number of immigrants residing in the respondentrsquos county27

Throughout Table 7 we control for respondent characteristics and household characteristics In the full specifica-

tion we also control for respondent country of birth fixed effects In column (1) of Table 7 we exclude the language

variable and include the density of the respondentrsquos country of birth network The density of the network has a neg-

ative impact on the respondentrsquos labor participation suggesting that peer pressure from immigrants coming from the

25In all cases however when a wife speaks a sex-based language she exhibits on average lower female labor force participation26See Appendix A23 for more details on the number of identifiable counties in the ACS Although estimates on this subsample may not be

comparable with those obtained with the full sample the results column (7) of Table 3 being so similar to those in column (6) gives us confidencethat selection into county does not drive the results

27In both measures we use the total number of immigrants that are in the workforce because it is more relevant for networking and reducinginformation asymmetries regarding labor market opportunities See Appendix A27 for more details

20

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

same country of origin to comply with social norms may be stronger than improvements in access to job opportuni-

ties In column (2) we add both the language variable and the interaction term between language and the network

density measure The magnitude of the coefficients imply that language and network densities play different roles that

reinforce each other In particular the results suggest that peer pressure to comply with social norms may be stronger

among female immigrants that speak a sex-based language as the interaction term is negative Also the impact of

the respondentrsquos network is now strongly positive suggesting that on its own living in a county with a dense ethnic

network does increase labor force participation once cultural factors are taken into account Finally the coefficient

on the language variable is still negative and significant and has the same magnitude as in all other specifications In

columns (3) and (4) we repeat the same exercise using instead the network density in language spoken as criteria to

define ethnic density The results and interpretation are broadly similar to the previous ones Finally we add both

types of network density measures together in columns (5) and (6) The results suggest that ethnic networks defined

using country of birth matter more than networks defined purely along linguistic lines

Table 7 Language and Social InteractionsRespondents In Identifiable Counties (2005-2015)

Dependent variable Labor force participation

(1) (2) (3) (4) (5) (6)

Sex-based -0023 -0037 -0026 -0016[0003] [0003] [0003] [0008]

β -coef -0225 -0245 -0197 -0057

Density COB -0001 0009 0009 0003[0000] [0000] [0001] [0001]

β -coef -0020 0223 0221 0067

Density COB times SB -0009 -0010 -0002[0000] [0001] [0001]

β -coef -0243 -0254 -0046

Density language 0000 0007 -0000 -0000[0000] [0000] [0001] [0001]

β -coef 0006 0203 -0001 -0007

Density language times SB -0007 0001 -0000[0000] [0001] [0001]

β -coef -0201 0038 -0004

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE No No No No No Yes

Observations 382556 382556 382556 382556 382556 382556R2 0110 0113 0110 0112 0114 0134

Mean 060 060 060 060 060 060SB residual variance 0101 0101 0101 0013

Table 7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

21

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

4 Conclusion

This paper contributes to the existing literature on the relationship between grammatical features of language and

economic behavior by examining the behavior of immigrants who travel with acquired cultural baggage including

their language While no quasi-experimental study is likely to rule out all potential sources of endogeneity our data

driven fixed effects epidemiological analysis advances the existing frontier in the economic analysis of language and

provides suggestive evidence that the study of language deserves further attention

We provide support for the nascent strand of literature in which languages serve not only to reflect but also

possibly to reinforce and transmit culture In particular our quantitative exercise isolates the fraction of this association

attributable to gendered language from the portion associated with other cultural and gender norms correlated with

language We find that about two thirds of the correlation between language and labor market outcomes can be

attributed to the latter while at most one third can be attributed to the direct impact of language structure or other

time-variant cultural forces not captured by traditional observables or by the wide array of additional checks we

employ Whether by altering preference formation or by perpetuating inefficient social norms language and other

social constructs clearly have the potential to hinder economic development and stymie progress of gender equality

We frame our analysis within a collective household labor supply model and demonstrate that language has a direct

effect that is not strongly influenced by either husband characteristics or the distribution of bargaining power within

the household This suggests that language and more broadly acquired gender norms should be considered in their

own right in analyses of female labor force participation In this regard language is especially promising since it

allows researchers to study a cultural trait which is observable quantifiable and varies at the individual level

Furthermore we show that the labor market associations with language are larger in magnitude than some factors

traditionally considered to capture bargaining power in line with Oreffice (2014) who finds that culture can mediate

the relationship between bargaining power and the labor supply Indeed our findings regarding the impact of language

in linguistically homogeneous and heterogeneous households suggests that the impact of onersquos own language is the

most robust predictor of behavior although the spousersquos language is also associated with a partnerrsquos decision making

Finally recognizing that language is a network technology allows us to examine the role that ethnic enclaves play

in influencing female labor force participation In theory enclaves may improve labor market outcomes by providing

information about formal jobs and reducing social stigma on employment At the same time enclaves are likely to

provide isolation from US norms and to reinforce gender norms that languages capture enhancing the impact of

gender in language We present evidence suggesting that the latter effect is present Explaining the role of language

within social networks therefore may shed new light on how networks may impose not only benefits but also costs on

its members by reinforcing cultural norms

Our results have important implications for policy Specifically programs designed to promote female labor force

participation and immigrant assimilation could be more appropriately designed and targeted by recognizing the exis-

tence of stronger gender norms among subsets of speakers Future research may consider experimental approaches to

further analyze the impact of language on behavior and in particular to better understand the policy implications of

movements for gender neutrality in language Another interesting avenue for research might be to study the impact of

22

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

gendered grammatical features in a marriage market framework as in Grossbard (2015) which studies intermarriage

among immigrants along linguistic lines

References

Adsera A amp Pytlikova M (2015) lsquoThe role of language in shaping international migrationrsquo The Economic Journal

125(586) F49ndashF81

Alesina A Giuliano P amp Nunn N (2013) lsquoOn the origins of gender roles women and the ploughrsquo The Quarterly

Journal of Economics 128(2) pp 469ndash530

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Blau F D amp Kahn L M (2015) lsquoSubstitution between individual and source country characteristics Social capital

culture and us labor market outcomes among immigrant womenrsquo Journal of Human Capital 9(4) pp 439ndash482

Blau F Kahn L amp Papps K (2011) lsquoGender source country characteristics and labor market assimilation among

immigrantsrsquo The Review of Economics and Statistics 93(1) pp 43ndash58

Blundell R Chiappori P-A Magnac T amp Meghir C (2007) lsquoCollective labour supply Heterogeneity and non-

participationrsquo The Review of Economic Studies 74(2) pp 417ndash445

Chen M K (2013) lsquoThe effect of language on economic behavior Evidence from savings rates health behaviors

and retirement assetsrsquo American Economic Review 103(2) pp 690ndash731

Chiappori P-A Fortin B amp Lacroix G (2002) lsquoMarriage market divorce legislation and household labor supplyrsquo

Journal of Political Economy 110(1) pp 37ndash72

Davis L amp Reynolds M (2016) Gendered language and the educational gender gap Technical report

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Edin P-A Fredriksson P amp Aringslund O (2003) lsquoEthnic enclaves and the economic success of immigrantsmdashevidence

from a natural experimentrsquo The Quarterly Journal of Economics 118(1) 329ndash357

Everett C (2013) Linguistic relativity Evidence across languages and cognitive domains Vol 25 Walter de Gruyter

Farreacute L amp Vella F (2013) lsquoThe intergenerational transmission of gender role attitudes and its implications for female

labour force participationrsquo Economica 80(318) pp 219ndash247

Fernaacutendez R (2007) lsquoAlfred marshall lecture women work and culturersquo Journal of the European Economic Asso-

ciation 5(2-3) pp 305ndash332

23

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Fernaacutendez R (2011) Chapter 11 - does culture matter Vol 1 of Handbook of social economics pp 481ndash510

Fernaacutendez R (2013) lsquoCultural change as learning The evolution of female labor force participation over a centuryrsquo

The American Economic Review 103(1) pp 472ndash500

Fernaacutendez R amp Fogli A (2009) lsquoCulture An empirical investigation of beliefs work and fertilityrsquo American

Economic Journal Macroeconomics 1(1) pp 146ndash177

Field E Pande R Rigol N Schaner S amp Moore C T (2016) On her account Can strengthening womenrsquos

financial control boost female labor supply Technical report

Gay V Hicks D amp Santacreu-Vasut E (2016) Migration as a window into the coevolution between language

and behavior in S Roberts C Cuskley L McCrohon L Barceloacute-Coblijn O Feheacuter amp T Verhoef eds lsquoThe

evolution of language Proceedings of the 11th international conference (EVOLANGX11)rsquo Online at http

evolangorgneworleanspapers120html

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Givati Y amp Troiano U (2012) lsquoLaw economics and culture Theory of mandated benefits and evidence from

maternity leave policiesrsquo Journal of Law and Economics 55(2) pp 339ndash364

Goldin C (2014) lsquoA grand gender convergence Its last chapterrsquo The American Economic Review 104(4) pp 1091ndash

1119

Grossbard S (2015) The marriage motive A price theory of marriage How marriage markets affect employment

consumption and savings Springer

Hicks D L Santacreu-Vasut E amp Shoham A (2015) lsquoDoes mother tongue make for womenrsquos work linguistics

household labor and gender identityrsquo Journal of Economic Behavior amp Organization 110 pp 19ndash44

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Ladd D R Roberts S G amp Dediu D (2015) lsquoCorrelational studies in typological and historical linguisticsrsquo Annu

Rev Linguist 1(1) pp 221ndash241

Lundberg S J Pollak R A amp Wales T J (1997) lsquoDo husbands and wives pool their resources evidence from the

united kingdom child benefitrsquo The Journal of Human Resources 323 pp 463ndash480

Lupyan G amp Dale R (2010) lsquoLanguage structure is partly determined by social structurersquo PloS one 5(1)

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mavisakalyan A amp Weber C (2016) Linguistic relativity and economics Technical report

24

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Munshi K (2003) lsquoNetworks in the modern economy Mexican migrants in the us labor marketrsquo The Quarterly

Journal of Economics 118(2) 549ndash599

Oreffice S (2014) lsquoCulture and household decision making balance of power and labor supply choices of us-born

and foreign-born couplesrsquo Journal of Labor Research 35(2) pp 162ndash184

Oreffice S amp Quintana-Domeque C (2012) lsquoFat spouses and hours of work are body and pareto weights corre-

latedrsquo IZA Journal of Labor Economics 1(1) 6

Roberts S G Winters J amp Chen K (2015) lsquoFuture tense and economic decisions controlling for cultural evolu-

tionrsquo PloS one 10(7)

Roberts S amp Winters J (2013) lsquoLinguistic diversity and traffic accidents Lessons from statistical studies of cultural

traitsrsquo PloS one 8(8)

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Santacreu-Vasut E Shenkar O amp Shoham A (2014) lsquoLinguistic gender marking and its international business

ramificationsrsquo Journal of International Business Studies 45(9) pp 1170ndash1178

Santacreu-Vasut E Shoham A amp Gay V (2013) lsquoDo femalemale distinctions in language matter evidence from

gender political quotasrsquo Applied Economics Letters 20(5) pp 495ndash498

Spolaore E amp Wacziarg R (2009) lsquoThe diffusion of developmentrsquo The Quarterly Journal of Economics 124(2) pp

469ndash529

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

van der Velde L Tyrowicz J amp Siwinska J (2015) lsquoLanguage and (the estimates of) the gender wage gaprsquo Eco-

nomics Letters 136 pp 165ndash170

25

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Decomposing culture An analysis of gender language and

labor supply in the household

Appendix

Victor Gaylowast Danliel L Hicksdagger Estefania Santacreu-VasutDagger Amir Shohamsect

A Data appendix 2

A1 Samples 2

A11 Regression sample 2

A12 Alternative samples 4

A2 Variables 5

A21 Outcome variables 5

A22 Respondent control variables 6

A23 Household control variables 8

A24 Husband control variables 8

A25 Bargaining power measures 9

A26 Country of birth characteristics 9

A27 County-level variables 12

B Missing language structures 12

C Appendix tables 13

D Appendix figure 25

lowastDepartment of Economics The University of Chicago 5757 S University Ave Chicago IL 60637 USAdaggerDepartment of Economics University of Oklahoma 308 Cate Center Drive Norman OK 73019 USADaggerCorresponding Author Department of Economics ESSEC Business School and THEMA 3 Avenue Bernard Hirsch 95021 Cergy-Pontoise

France (santacreuvasutessecedu)sectTemple University and COMAS 1801 N Broad St Philadelphia PA 19122 USA

1

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

A Data appendix

A1 Samples

A11 Regression sample

To build the regression sample we use the following restrictions on the variables from the 1 ACS samples between

2005 and 2015 (Ruggles et al 2015)

bull Immigrants to the US

We keep respondents born abroad ie those for which BPLD ge 15000 We also drop respondents born in

US Pacific Trust Territories (BPLD between 71040 and 71049) US citizens born abroad (CITIZEN = 1) and

respondents for which the citizenship status is unavailable (CITIZEN = 0)

bull Married women in married-couple family households with husband present

We keep female respondents (SEX = 2) not living in group quarters (GQ = 1 2 or 5) and that are part of

married-couple family households (HHTYPE = 1) in which the husband is present (MARST = 1 and MARST_SP =

1)1

bull Non-English speaking respondents aged 25 to 49

We drop respondents who report speaking English in the home (LANGUAGE = 1) and keep those with AGE

between 25 and 49 Under the above restrictions we are left with 515572 respondents in the uncorrected

sample

bull Precise country of birth

We drop respondents who report an imprecise country of birth This is the case for the following BPLD codes

North America nsnec (19900) Central America (21000) Central America ns (21090) West Indies (26000)

BritishWest Indies (26040) British West Indies nsnec (26069) Other West Indies (26070) Dutch Caribbean

nsnec (26079) Caribbean nsnec (26091) West Indies ns (26094) Latin America ns (26092) Americas ns

(29900) South America (30000) South America ns (30090) Northern Europe ns (41900) Western Europe

ns (42900) Southern Europe ns (44000) Central Europe ns (45800) Eastern Europe ns (45900) Europe ns

(49900) East Asia ns (50900) Southeast Asia ns (51900) Middle East ns (54700) Southwest Asia necns

(54800) Asia Minor ns (54900) South Asia nec (55000) Asia necns (59900) Africa (60000) Northern

Africa (60010) North Africa ns (60019) Western Africa ns (60038) French West Africa ns (60039) British

Indian Ocean Territory (60040) Eastern Africa necns (60064) Southern Africa (60090) Southern Africa ns

(60096) Africa nsnec (60099) Oceania nsnec (71090) Other nec (95000) Missingblank (99900) This

concerns 4597 respondents that is about 089 of the uncorrected sample2

1We also drop same-sex couples (SEX_SP = 2)2Note that we do not drop respondents whose husband has a country of birth that is not precisely defined This is the case for 1123 respondentsrsquohusbands

2

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Some countries of birth are detailed at a lower level than a country In these cases we use the following rules

to assign sub-national levels to countries using the BPLD codes we group Canadian provinces (15010-15083)

into Canada (15000) Panama Canal Zone (21071) into Panama (21070) Austrian regions (45010-45080) into

Austria (45000) Berlin districts (45301-45303) into Germany (45300) West-German regions (45311-45333)

into West- Germany (45310) East-German regions (45341-45353) into East-Germany (45300) Polish regions

(45510-45530) into Poland (45500) and Transylvania (45610) into Romania (45600)

bull Precise language spoken

We need the LANGUAGE code to identify a precise language in order to assign the language variables from WALS

(Dryer amp Haspelmath 2011) to each respondent In general the general language code LANGUAGE identifies a

unique language However in some cases it identifies a grouping and only the detailed code LANGUAGED allows

us to identify a unique language This is the case for the following values of LANGUAGE We also indicate the

relevant LANGUAGED codes that we use instead of the general LANGUAGE code

ndash French (11) French (1100) Walloon French (1110mdashmerged with French) Provenccedilal (1120) Patois

(1130mdashmerged with French) and Haitian Creole (1140)

ndash Other Balto-Slavic (26) Bulgarian (2610) Sorbian (2620) Macedonian (2630)

ndash Other Persian dialects (30) Pashto (3010) Kurdish (3020) Baluchi (3030) Tajik (3040) Ossetic (3050)

ndash Hindi and related (31) Hindi (3102) Urdu (3103) Bengali (3112) Penjabi (3113) Marathi (3114) Gu-

jarati (3115) Magahi (3116) Bagri (3117) Oriya (3118) Assamese (3119) Kashmiri (3120) Sindhi

(3121) Dhivehi (3122) Sinhala (3123) Kannada (3130)

ndash Other Altaic (37) Chuvash (3701) Karakalpak (3702) Kazakh (3703) Kirghiz (3704) Tatar (3705)

Uzbek (3706) Azerbaijani (3707) Turkmen (3708) Khalkha (3710)

ndash Dravidian (40) Brahui (4001) Gondi (4002) Telugu (4003) Malayalam (4004) Tamil (4005) Bhili

(4010) Nepali (4011)

ndash Chinese (43) Cantonese (4302) Mandarin (4303) Hakka (4311) Fuzhou (4314) Wu (Changzhou)

(4315)

ndash Thai Siamese Lao (47) Thai (4710) Lao (4720)

ndash Other EastSoutheast Asian (51) Ainu (5110) Khmer (5120) Yukaghir (5140) Muong (5150)

ndash Other Malayan (53) Taiwanese (5310) Javanese (5320) Malagasy (5330) Sundanese (5340)

ndash Micronesian Polynesian (55) Carolinian (5502) Chamorro (5503) Kiribati (5504) Kosraean (5505)

Marshallese (5506) Mokilese (5507) Nauruan (5509) Pohnpeian (5510) Chuukese (5511) Ulithian

(5512) Woleaian (5513) Yapese (5514) Samoan (5522) Tongan (5523) Tokelauan (5525) Fijian (5526)

Marquesan (5527) Maori (5529) Nukuoro (5530)

ndash Hamitic (61) Berber (6110) Hausa (6120) Beja (6130)

3

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

ndash Nilotic (63) Nubian (6302) Fur (6304) Swahili (6308) Koranko (6309) Fula (6310) Gurung (6311)

Beacuteteacute (6312) Efik (6313) Sango (6314)

ndash Other Indian languages (91) Yurok (9101) Makah (9112) Kutenai (9120) Haida (9130) Yakut (9131)

Yuchi (9150)

ndash Mayan languages (9210) Pureacutepecha (9220) Mapudungun (9230) Oto (9240) Quechua (9250) Arawak

(9270) Muisca (9280) Guaraniacute (9290)

In some cases even the detailed codes do not provide a unique language We drop respondents who report

speaking such languages This is the case for the following LANGUAGE and LANGUAGED codes Slavic unkown

(27) Other Balto-Slavic (26000) India nec (3140) Pakistan nec (3150) Other Indo-European (3190)

Dravidian (4000) Micronesian (5501) Melanesian (5520) Polynesian (5521) Other Pacific (5590) Nilotic

(6300) Nilo-Hamitic (6301) Saharan (6303) Khoisan (6305) Sudanic (6306) Bantu (6307) Other African

(6390) African ns (64) American Indian (70) Other Penutian (77) Tanoan languages (90) Mayan languages

(9210) American Indian ns (93) Native (94) No language (95) Other or not reported (96) Na or blank (0)

This concerns 3633 respondents that is about 070 of the uncorrected sample

bull Linguistic structure available

We drop respondents who report speaking a language for which we do not have the value for the SB gram-

matical variable This is the case for the following languages Haitian Creole (1140) Cajun (1150) Slovak

(2200) Sindhi (3221) Sinhalese (3123) Nepali (4011) Korean (4900) Trukese (5511) Samoan (5522) Ton-

gan (5523) Syriac Aramaic Chaldean (5810) Mande (6309) Kru (6312) Ojibwa Chippewa (7213) Navajo

(7500) Apache (7420) Algonquin (7490) Dakota Lakota Nakota Sioux (8104) Keres (8300) and Cherokee

(8480) This concerns 27671 respondents who report a precise language that is about 537 of the uncorrected

sample

Once we drop the 34954 respondents that either do not report a precise country of birth a precise language

or a language for which the value for the SB grammatical variable is unavailable we are left with 480618

respondents in the regression sample or about 9322 of the uncorrected sample

A12 Alternative samples

bull Sample married before migration

This sample includes only respondents that got married prior to migrating to the US We construct the variable

marbef that takes on value one if the year of last marriage (YRMARR) is smaller than the year of immigration to

the the US (YRIMMIG) Because the YRMARR variable is only available between 2008 and 2015 this subsample

does not include respondents for the years 2005 to 2007

bull Sample of indigenous languages

4

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

The sample of indigenous languages drops respondents who report a language that is not indigenous to their

country of birth We define a language as not indigenous to a country if it is not listed as a language spoken in a

country in the Encyclopedia Britannica Book of the Year (2010 pp 766-770)

bull Sample including all households

The sample including all households is similar to the regression sample described in section A11 except that

we keep all household types (HHTYPE) regardless of the presence of a husband

bull Sample excluding language quality flags

The sample excluding language quality flags only keeps respondents for which the LANGUAGE variable was not

allocated (QLANGUAG = 4)

A2 Variables

A21 Outcome variables

We construct labor market outcomes from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

Throughout the paper we use six outcome variables Our variables are in lowercase while the original ACS variables

are in uppercase

bull Labor participant (lfp)

lfp is an indicator variable equal to one if the respondent was in the labor force the week before the census

(LABFORCE = 2)

bull Employed (emp)

emp is an indicator variable equal to one if the respondent was employed the week before the census (EMPSTAT

= 1)

bull Yearly weeks worked (wks and wks0)

We create measures for yearly weeks worked by using the WKSWORK2 variable It indicates the number of weeks

that the respondent worked during the previous year by intervals We use the midpoints of these intervals

to build our measures The first measure wks is constructed as follows It takes on a missing value if the

respondent did not work at least one week during the previous year (WKSWORK2 = 0) and takes on other values

according to the following rules

ndash wks = 7 if WKSWORK2 = 1 (1-13 weeks)

ndash wks = 20 if WKSWORK2 = 2 (14-26 weeks)

ndash wks = 33 if WKSWORK2 = 3 (27-39 weeks)

ndash wks = 435 if WKSWORK2 = 4 (40-47 weeks)

5

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

ndash wks = 485 if WKSWORK2 = 5 (48-49 weeks)

ndash wks = 51 if WKSWORK2 = 6 (50-52 weeks)

The variable wks0 is similarly defined except that it takes on value zero if the respondent did not work at least

one week during the previous year (WKSWORK2 = 0)

bull Weekly hours worked (hrs and hrs0)

We create measures for weekly hours worked by using the UHRSWORK variable It indicates the number of hours

per week that the respondent usually worked if the respondent worked during the previous year It is top-coded

at 99 hours (this is the case for 266 respondents in the regression sample) The first measure hrs takes on a

missing value if the respondent did not work during the previous year while the second measure hrs0 takes

on value zero if the respondent did not work during the previous year

A22 Respondent control variables

We construct respondent characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except the non-labor income variable nlabinc)

Our variables are in lowercase while the original ACS variables are in uppercase

bull Age (age age_sq age_2529 age_3034 age_3539 age_4044 age_4549)

Respondentsrsquo age is from the AGE variable Throughout the empirical analysis we further control for age squared

(age_sq) as well as for indicators for the following age groups 30-34 (age_3034) 35-39 (age_3539) 40-44

(age_4044) 45-49 (age_4549) The age group 25-29 (age_2529) is the excluded category

bull Race and ethnicity (race and hispan)

We create race and ethnicity variables race and hispan from the RACE and HISPAN variables Our race

variable contains four categories White (RACE = 1) Black (RACE = 2) Asian (RACE = 4 5 or 6) and Other

(RACE = 3 7 8 or 9) In the regressions Asian is the excluded category We additionally control throughout

for an indicator (hispan) for the respondent being of Hispanic origin (HISPAN gt 0)

bull Education (educyrs and student)

We measure education with the respondentsrsquo educational attainment (EDUC) We translate the EDUC codes into

years of education as follows Note that we donrsquot use the detailed codes EDUCD because they are not comparable

across years (eg grades 7 and 8 are distinct from 2008 to 2015 but grouped from 2005 to 2007)

ndash educyrs = 0 if EDUC = 0 No respondent has a value EDUCD = 1 ie there is no missing observation in

the regression sample

ndash educyrs = 4 if EDUC = 1 This corresponds to nursery school through grade 4

ndash educyrs = 8 if EDUC = 2 This corresponds to grades 5 through 8

6

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

ndash Between grade 9 (educyrs = 9) and 4 years of college (educyrs = 16) each EDUC code provides the

exact educational attainment

ndash educyrs = 17 if EDUC = 11 This corresponds to 5+ years of college

We also create an indicator variable student equal to one if the respondent is attending school This informa-

tion is given by the variable SCHOOL

Note that we also provide categories of educational attainment in the summary statistics tables They are defined

as follows

ndash No scholling if EDUC is 0 or 1

ndash Elementary if EDUC is 2

ndash High School if EDUC is between 3 and 6

ndash College if EDUC is between 7 and 11

bull Years since immigration (yrsusa)

We measure the number of years since immigration by the number of years since the respondent has been living

in the US which is given by the variable YRSUSA1

bull Age at immigration (ageimmig)

We measure the age at immigration as ageimmig = YEAR - YRSUSA1 - BIRTHYR where YEAR is the ACS

year YRSUSA1 is the number of years since the respondent has been living in the US and BIRTHYR is the

respondentrsquos year of birth

bull Decade of immigration (dcimmig)

We compute the decade of migration by using the first two digits of YRIMMIG which indicates the year in which

the respondent entered the US In the regressions the decade 1950 is the excluded category

bull English proficiency (englvl)

We build a continuous measure of English proficiency using the SPEAKENG variable It provides 5 levels of

English proficiency We code our measure between 0 and 4 as follows

ndash englvl = 0 if SPEAKENG = 1 (Does not speak English)

ndash englvl = 1 if SPEAKENG = 6 (Yes but not well)

ndash englvl = 2 if SPEAKENG = 5 (Yes speaks well)

ndash englvl = 3 if SPEAKENG = 4 (Yes speaks very well)

ndash englvl = 4 if SPEAKENG = 2 (Yes speaks only English)

7

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

bull Non-labor income (nlabinc)

We construct the non-labor income variable as the difference between the respondentrsquos total personal income

(INCTOT) and the respondentrsquos wage and salary income (INCWAG) for the previous year Both variables are

adjusted for inflation and converted to 1999 $ using the CPI99 variable We assign a value of zero to the

INCWAG variable if the respondent was not working during the previous year Note that total personal income

(INCTOT) is bottom coded at -$19998 in the ACS This is the case for 14 respondents in the regression sample

Wage and salary income (INCWAG) is also top coded at the 995th Percentile in the respondentrsquos State

A23 Household control variables

We construct household characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables throughout the empirical analysis (except time since married timemarr and

county of residence county) Our variables are in lowercase while the original ACS variables are in uppercase

These variables are common to the respondent and her husband

bull Children aged lt 5 (nchild_l5)

The number of own children aged less than 5 years old in the household is given by the NCHLT5 variable

bull Household size (hhsize)

The number of persons in the household is given by the NUMPREC variable

bull State of residence (state)

The State of residence is given by the STATEICP variable

bull Time since married (timemarr)

The number of years since married is given by the YRMARR variable It is only available for the years 2008-2015

bull County of residence (county)

The county of residence variable county is constructed from the COUNTY together with the STATEICP vari-

ables However not all counties are identifiable For 2005-2011 384 counties are identifiable while for 2012-

2015 434 counties are identifiable (see the excel spreadsheet provided by Ruggles et al 2015) About 796

of the respondents in the regression sample reside in identifiable counties

A24 Husband control variables

We construct the husbands characteristics from the variables in the ACS 1 samples 2005-2015 (Ruggles et al 2015)

They are used as control variables in the household-level analyses Our variables are in lowercase while the original

ACS variables are in uppercase In general the husband variables are similarly defined as the respondent variables

with the extension _sp added to the variable name We only detail the husband control variables that differ from the

respondent control variables

8

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

bull Years since immigration (yrsusa_sp)

It is similarly defined as yrsusa except we assign the value of age_sp to yrsusa_sp if the respondentrsquos

husband was born in the US

bull Age at immigration (ageimmig_sp)

It is similarly defined as ageimmig except we assign a value of zero to ageimmig_sp if the respondentrsquos

husband was born in the US

A25 Bargaining power measures

bull Age gap (age_gap)

The age gap between spouses is defined as the age difference between the husband and the wife age_gap =

age_sp - age

bull Non-labor income gap (nlabinc_gap)

The non-labor income gap between spouses is defined as the difference between the husbandrsquos non-labor income

and the wifersquos non-labor income nlabinc_gap = nlabinc_sp - nlabinc We assign a value zero to the 16

cases in which one of the non-labor income measures is missing

A26 Country of birth characteristics

bull Labor force participation

ndash Country of birth female labor force participation (ratio_lfp)

We assign the value of female labor participation of a respondentrsquos country of birth at the time of her decade

of migration to the US More precisely we gather data for each country between 1950 and 2015 from the

International Labor Organization Then we compute decade averages for each countrymdashmany series have

missing years Finally for the country-decades missing values we impute a weighted decade average of

neighboring countries where the set of neighboring countries for each country is from the contig variable

from the dyadic GEODIST dataset (Mayer amp Zignago 2011) To avoid measurement issues across countries

we compute the ratio of female to male participation rates in each country

ndash Past migrants female labor force participation (lfp_cob)

We compute the female labor force participation of a respondentrsquos ancestor migrants by computing the

average female labor participation rate (LABFORCE) of immigrant women to the US aged 25 to 49 living

in married-couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the

time of the census prior to the respondentrsquos decade of migration We use the following samples from

Ruggles et al (2015) the 1940 100 Population Database the 1950 1 sample the 1960 1 sample the

1970 1 state fm1 sample the 1980 5 sample the 1990 5 sample the 2000 5 sample and the 2014

5-years ACS 5 sample for the 2010 decade

9

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

bull Education

ndash Country of birth female education (yrs_sch_ratio)

We assign the value of female years of schooling of a respondentrsquos country of birth at the time of her year

of migration to the US More precisely we gather data for female and male years of schooling aged 15

and over for each country between 1950 and 2010 from the Barro-Lee Educational Attainment Dataset

(Barro amp Lee 2013) which provides five years averages The original data is available here and here

We impute a weighted average of neighboring countries for the few missing countries where the set of

neighboring countries for each country is from the contig variable from the dyadic GEODIST dataset

(Mayer amp Zignago 2011) Finally to avoid measurement issues across countries we compute the ratio of

female to male years of schooling in each country

ndash Past migrants female education (educyrs_cob)

We compute the female years of schooling of a respondentrsquos ancestor migrants by computing the average

female educational attainment (EDUC) of immigrant women to the US aged 25 to 49 living in married-

couple family households (HHTYPE = 1) that are from the respondentrsquos country of birth at the time of the

census prior to the respondentrsquos decade of migration We use the same code conversion convention as

with the educyrs variable We use the following samples from Ruggles et al (2015) the 1940 100

Population Database the 1950 1 sample the 1960 1 sample the 1970 1 state fm1 sample the 1980

5 sample the 1990 5 sample the 2000 5 sample and the 2014 5-years ACS 5 sample for the 2010

decade

bull Fertility (tfr)

We assign the value of total fertility rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Net migration rate (mig)

We assign the value of net migration rate of a respondentrsquos country of birth at the time of her decade of migration

to the US More precisely we gather data for each country between 1950 and 2015 from the Department of

Economic and Social Affairs of the United Nations (DESA 2015) which provides five years averages The

original data is available here We impute a weighted average of neighboring countries for the few missing

countries where the set of neighboring countries for each country is from the contig variable from the dyadic

GEODIST dataset (Mayer amp Zignago 2011)

bull Geography

10

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

ndash Latitude (lat)

The latitude (in degrees) of the respondentrsquos country of birth is the lat variable at the level of the countryrsquos

capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data is available

here

ndash Longitude (lon)

The longitude (in degrees) of the respondentrsquos country of birth is the lon variable at the level of the

countryrsquos capital from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original data

is available here

ndash Continent (continent)

The continent variable takes on five values Africa America Asia Europe and Pacific It is the

continent variable from the country-specific GEODIST dataset (Mayer amp Zignago 2011) The original

data is available here When the continent variable is used in the regressions Africa is the excluded

category

ndash Bilateral distance (distcap)

The distcap variable is the bilateral distance in kilometers between the respondentrsquos country of birth

capital and Washington DC It is the distcap variable from the dyadic GEODIST dataset (Mayer amp

Zignago 2011) The original data is available here

bull GDP per capita (gdpko)

We assign the value of the GDP per capita of a respondentrsquos country of birth at the time of her decade of

migration to the US More precisely we gather data for each country between 1950 and 2014 from the Penn

World Tables (Feenstra et al 2015) We build the gdpko variable by dividing the output-side real GDP at chained

PPPs in 2011 US$ (rgdpo) by the countryrsquos population (pop) The original data is available here We impute a

weighted average of neighboring countries for the few missing countries where the set of neighboring countries

for each country is from the contig variable from the dyadic GEODIST dataset (Mayer amp Zignago 2011)

bull Genetic distance (gendist_weight)

We assign to each respondent the population weighted genetic distance between her country of birth and the

US as given by the new_gendist_weighted variable from Spolaore amp Wacziarg (2016) The original data is

available here

bull Common language (comlang_ethno)

The comlang_ethno variable is an indicator variable equal to one if a language is spoken by at least 9 of the

population both in the respondentrsquos country of birth and in the US It is the comlang_ethno variable from the

dyadic GEODIST dataset (Mayer amp Zignago 2011) The original data is available here

11

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

A27 County-level variables

bull County-level country of birth density (sh_bpl_w)

The sh_bpl_w variable measures the density of immigrant workers from the same country of birth as a respon-

dent in her county of residence For a respondent from country c residing in county e it is computed as

sh_bpl_wce =Immigrants from country c in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of immigrants above age 16

of both sexes that are in the labor force

bull County-level language density (sh_lang_w)

The sh_lang_w variable measures the density of immigrant workers speaking the same language as a respondent

in her county of residence For a respondent speaking language l residing in county e it is computed as

sh_lang_wle =Immigrants speaking language l in county e

Immigrants in county etimes100

where the sample is the pooled 2005-2015 1 ACS samples (Ruggles et al 2015) of non-English speaking

immigrants above age 16 of both sexes that are in the labor force

B Missing language structures

While our main source for language structures is WALS (Dryer amp Haspelmath 2011) we complement our dataset

from other source in collaboration with linguists as well as with Mavisakalyan (2015) We scrupulously followed the

definitions in WALS (Dryer amp Haspelmath 2011) when inputting missing values The linguistic sources used for each

language as well as the values inputted are available in the data repository The languages for which at least one of

the four linguistic variables were inputted by linguists are the following Albanian Bengali Bulgarian Cantonese

Czech Danish Gaelic Guarati Italian Japanese Kurdish Lao Malayalam Marathi Norwegian Panjabi Pashto

Polish Portuguese Romanian Serbian-Croatian Swahili Swedish Taiwanese Tamil Telugu Ukrainian Urdu and

Yiddish Mavisakalyan (2015) was used to input the GP variable for the following languages Fula Macedonian and

Uzbek

12

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

C Appendix tables

Table C1 Language Distribution by Family in the Regression Sample

Language Genus Respondents Percent Language Genus Respondents Percent

Afro-Asiatic Indo-European

Beja Beja 716 015 Albanian Albanian 1650 034Arabic Semitic 9567 199 Armenian Armenian 2386 050Amharic Semitic 2253 047 Latvian Baltic 524 011Hebrew Semitic 1718 036 Gaelic Celtic 118 002

German Germanic 5738 119Total 14254 297 Dutch Germanic 1387 029

Altaic Swedish Germanic 716 015Danish Germanic 314 007

Khalkha Mongolic 3 000 Norwegian Germanic 213 004Turkish Turkic 1872 039 Yiddish Germanic 168 003Uzbek Turkic 63 001 Greek Greek 1123 023

Hindi Indic 12233 255Total 1938 040 Urdu Indic 5909 123

Austro-Asiatic Gujarati Indic 5891 123Bengali Indic 4516 094

Khmer Khmer 2367 049 Panjabi Indic 3454 072Vietnamese Viet-Muong 18116 377 Marathi Indic 1934 040

Persian Iranian 4508 094Total 20483 426 Pashto Iranian 278 006

Austronesian Kurdish Iranian 203 004Spanish Romance 243703 5071

Chamorro Chamorro 9 000 Portuguese Romance 8459 176Tagalog Greater Central Philippine 27200 566 French Romance 6258 130Indonesian Malayo-Sumbawan 2418 050 Romanian Romance 2674 056Hawaiian Oceanic 7 000 Italian Romance 2383 050

Russian Slavic 12011 250Total 29634 617 Polish Slavic 6392 133

Dravidian Serbian-Croatian Slavic 3289 068Ukrainian Slavic 1672 035

Telugu South-Central Dravidian 6422 134 Bulgarian Slavic 1159 024Tamil Southern Dravidian 4794 100 Czech Slavic 543 011Malayalam Southern Dravidian 3087 064 Macedonian Slavic 265 006Kannada Southern Dravidian 1279 027

Total 342071 7117Total 15582 324 Tai-Kadai

Niger-CongoThai Kam-Tai 2433 051

Swahili Bantoid 865 018 Lao Kam-Tai 1773 037Fula Northern Atlantic 175 004

Total 1040 022 Total 4206 088Sino-Tibetan Uralic

Tibetan Bodic 1724 036 Finnish Finnic 249 005Burmese Burmese-Lolo 791 016 Hungarian Ugric 856 018Mandarin Chinese 34878 726Cantonese Chinese 5206 108 Total 1105 023Taiwanese Chinese 908 019

Japanese Japanese 6793 141Total 43507 905

Aleut Aleut 4 000Zuni Zuni 1 000

13

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C2 Summary Statistics No Sample SelectionReplication of Table 1

Mean Sd Min Max Obs Difference

A Individual characteristics

Age 377 66 25 49 515572 -005Years since immigration 147 93 0 50 515572 010Age at immigration 230 89 0 49 515572 -015

Educational AttainmentCurrent student 007 025 0 1 515572 -000Years of schooling 125 37 0 17 515572 -009No schooling 005 021 0 1 515572 000Elementary 011 032 0 1 515572 001High school 036 048 0 1 515572 000College 048 050 0 1 515572 -001

Race and ethnicityAsian 031 046 0 1 515572 -002Black 004 020 0 1 515572 -002White 045 050 0 1 515572 003Other 020 040 0 1 515572 001Hispanic 049 050 0 1 515572 003

Ability to speak EnglishNot at all 011 031 0 1 515572 001Not well 024 043 0 1 515572 000Well 025 043 0 1 515572 -000Very well 040 049 0 1 515572 -000

Labor market outcomesLabor participant 061 049 0 1 515572 -000Employed 055 050 0 1 515572 -000Yearly weeks worked (excl 0) 442 132 7 51 325148 -001Yearly weeks worked (incl 0) 272 239 0 51 515572 -005Weekly hours worked (excl 0) 366 114 1 99 325148 -003Weekly hours worked (incl 0) 226 199 0 99 515572 -005Labor income (thds) 255 286 00 5365 303167 -011

B Household characteristics

Years since married 123 75 0 39 377361 005Number of children aged lt 5 042 065 0 6 515572 -000Number of children 186 126 0 9 515572 001Household size 425 161 2 20 515572 002Household income (thds) 613 599 -310 15303 515572 -021

Table C2 notes The summary statistics are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) except those for household characteristics which arecomputed using household weights (HHWT) The last column reports the difference in meansbetween the uncorrected sample and the regression sample along with stars indicating thesignificance level of a t-test of differences in means See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the10 percent level

14

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C3 Summary Statistics HusbandsReplication of Table 1

Mean Sd Min Max Obs SB1-SB0

A Individual characteristics

Age 411 83 15 95 480618 -18Years since immigration 145 111 0 83 480618 -32Age at immigration 199 122 0 85 480618 -46

Educational AttainmentCurrent student 004 020 0 1 480618 -002Years of schooling 124 38 0 17 480618 -16No schooling 005 022 0 1 480618 001Elementary 012 033 0 1 480618 010High school 036 048 0 1 480618 010College 046 050 0 1 480618 -021

Race and ethnicityAsian 025 043 0 1 480618 -068Black 003 016 0 1 480618 001White 052 050 0 1 480618 044Other 021 040 0 1 480618 022Hispanic 051 050 0 1 480618 059

Ability to speak EnglishNot at all 006 024 0 1 480618 002Not well 020 040 0 1 480618 002Well 026 044 0 1 480618 -008Very well 048 050 0 1 480618 004

Labor market outcomesLabor participant 094 024 0 1 480598 002Employed 090 030 0 1 480598 002Yearly weeks worked (excl 0) 479 88 7 51 453262 -02Yearly weeks worked (incl 0) 453 138 0 51 480618 11Weekly hours worked (excl 0) 429 103 1 99 453262 -01Weekly hours worked (incl 0) 405 139 0 99 480618 10Labor income (thds) 406 437 00 5365 419762 -101

B Household characteristics

Years since married 124 75 0 39 352059 -054Number of children aged lt 5 042 065 0 6 480618 004Number of children 187 126 0 9 480618 030Household size 426 162 2 20 480618 028Household income (thds) 610 596 -310 15303 480618 -159

Table C3 notes The summary statistics are computed using husbands sample weights(PERWT_SP) provided in the ACS (Ruggles et al 2015) except those for household char-acteristics which are computed using household weights (HHWT) The last column reportsthe estimate β from regressions of the type Xi = α +β SBI+ ε where Xi is an individuallevel characteristic where SBI is the wifersquos Robust standard errors are not reported Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

15

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Tabl

eC

4S

umm

ary

Stat

istic

sH

ouse

hold

sw

ithFe

mal

eIm

mig

rant

sM

arri

edS

pous

ePr

esen

tA

ged

25-4

9H

hus

band

Ww

ife

Defi

nitio

nM

ean

Sd

Min

M

ax

Obs

SB

1-SB

0

Age

gap

Hag

e-W

age

35

57

-33

6448

061

8-0

9

A

mer

ican

husb

and

H=

US

citi

zen

018

038

01

480

618

003

Whi

tehu

sban

dH=

whi

tean

dW6=

whi

te0

050

230

148

061

8-0

05

O

rigi

nho

mop

hily

HC

OB=

WC

OB

073

044

01

480

618

-00

0L

ingu

istic

hom

ophi

lyH

lang

uage

=W

lang

uage

087

034

01

480

618

006

Edu

catio

nga

pH

educ

atio

n-W

educ

atio

n0

053

05-1

717

480

618

-04

3

Lab

orpa

rtic

ipat

ion

gap

HL

FP-W

LFP

034

054

-11

480

598

013

Em

ploy

men

tgap

Hem

ploy

men

t-W

empl

oym

ent

035

059

-11

480

598

014

Wee

ksga

pH

wee

ksw

orke

d-W

wee

ksw

orke

d3

715

3-4

444

284

275

14

H

ours

gap

Hho

urs

wor

ked

-Who

urs

wor

ked

64

142

-93

9728

427

51

6

Lab

orin

com

ega

p(t

hds

)H

labo

rinc

ome

-Wla

bori

ncom

e15

540

9-4

2652

125

016

4-1

8

N

onla

borp

artic

ipat

ion

gap

(thd

s)

Hno

nla

bori

ncom

e-W

non

labo

rinc

ome

29

193

-414

515

480

618

-04

Tabl

eC

4no

tes

The

sum

mar

yst

atis

tics

are

com

pute

dus

ing

hous

ehol

dsa

mpl

ew

eigh

ts(HHW

T)pr

ovid

edin

the

AC

S(R

uggl

eset

al2

015)

T

hela

stco

lum

nre

port

sth

ees

timat

from

regr

essi

ons

ofth

ety

peX i

=α+

βSB

I+

εw

here

X iis

anin

divi

dual

leve

lcha

ract

eris

ticR

obus

tsta

ndar

der

rors

are

notr

epor

ted

See

appe

ndix

Afo

rdet

ails

onva

riab

leso

urce

san

dde

finiti

ons

lowastlowastlowast

Sign

ifica

ntat

the

1pe

rcen

tlev

ellowastlowast

Sign

ifica

ntat

the

5pe

rcen

tlev

ellowast

Sign

ifica

ntat

the

10pe

rcen

tlev

el

16

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C5 Summary Statistics Country and County-Level VariablesFemale Immigrants Married Spouse Present Aged 25-49

Variable Unit Mean Sd Min Max Obs

COB LFP Ratio female to male 5325 1820 4 118 474003Ancestry LFP 5524 1211 0 100 467378COB fertility Number of children 314 127 1 9 479923Ancestry fertility Number of children 208 057 1 4 467378COB education Ratio female to male 8634 1513 7 122 480618Ancestry educaiton Years of schooling 1119 252 1 17 467378COB migration rate -221 421 -63 109 479923COB GDP per capita PPP 2011 US$ 8840 11477 133 3508538 469718COB and US common language Indicator 071 045 0 1 480618Genetic distance COB to US 003 001 001 006 480618Bilateral distance COB to US Km 6792 4315 737 16371 480618COB latitude Degrees 2201 1457 -44 64 480618COB longitude Degrees -1603 8894 -175 178 480618County COB density 2225 2600 0 94 382556County language density 3272 3026 0 96 382556

Table C5 notes The summary statistics are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A26 for more details on variables sources and definitions

Table C6 Gender in Language and Economic Participation Individual LevelReplication of Columns 1 and 2 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4) (5)

Sex-based -0101 -0076 -0096 -0070 -0069[0002] [0002] [0002] [0002] [0003]

β -coef -0101 -0076 -0096 -0070 -0069

Years of schooling 0019 0020 0010[0000] [0000] [0000]

β -coef 0072 0074 0037

Number of children lt 5 -0135 -0138 -0110[0001] [0001] [0001]

β -coef -0087 -0089 -0071

Respondent char No No No No YesHousehold char No No No No Yes

Observations 480618 480618 480618 480618 480618R2 0006 0026 0038 0060 0110

Mean 060 060 060 060 060SB residual variance 0150 0148 0150 0148 0098

Table C6 notes The estimates are computed using sample weights (PERWT) providedin the ACS (Ruggles et al 2015) See appendix A for details on variable sources anddefinitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant atthe 10 percent level

17

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C7 Gender in Language and Economic Participation Additional OutcomesReplication of Column 5 of Table 3 Language Indices

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Panel A Intensity_1 = (SB + GP + GA + NG) times SB

Intensity_1 -0009 -0010 -043 -0342 -020 -0160[0002] [0002] [010] [0085] [007] [0064]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel B Intensity_2 = (SB + GP + NG) times SB

Intensity_2 -0013 -0015 -060 -0508 -023 -0214[0003] [0003] [014] [0119] [010] [0090]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel C Intensity = (GP + GA + NG) times SB

Intensity -0010 -0012 -051 -0416 -026 -0222[0003] [0003] [012] [0105] [009] [0076]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel D Intensity_PCA

Intensity_PCA -0008 -0009 -039 -0314 -019 -0150[0002] [0002] [009] [0080] [006] [0060]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Panel E Top_Intensity = 1 if Intensity = 3 = 0 otherwise

Top_Intensity -0019 -0013 -044 -0965 012 -0849[0009] [0009] [042] [0353] [031] [0275]

Observations 478883 478883 478883 478883 301386 301386R2 0132 0135 0153 0138 0056 0034

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Mean 060 055 272 2251 442 3657

Table C7 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

In the main analysis we use the following measure of intensity of female and male distinctions in the language

grammar Intensity = SBtimes (GP+GA+NG) where Intensity is a categorical variable that ranges from 0 to 3

The reason why we chose this specification instead of a purely additive specification such as in Gay et al (2013) is

that in a non-sex-based gender system the agreements in a sentence do not relate to female male categories They

may instead depend on whether the type of noun of animate or inanimate human or non-human When a grammatical

gender system is organized around a construct other than biological gender SB = 0 In this case Intensity = 0

18

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

since SB enters multiplicatively in the equation In these cases the intensity of female and male distinctions in the

grammar is zero since such distinctions are not an organizing principle of the grammatical gender system When a

grammatical gender system is sex-based SB= 1 In this case the intensity measure can range from 0 to 3 depending

on how the rest of rules of the system force speakers to code female male distinctions

Appendix Table C7 replicates the column (5) of Table 3 for all labor outcome variables using these alternative

measures Intensity_1 is defined as Intensity_1= SBtimes (SB+GP+GA+NG) where Intensity is a categorical

variable that ranges from 0 to 4 The only difference with Intensity is that it assigns a value of 1 for a language with

a sex based language but no intensity in the other individual variables Panel A presents results using Intensity_1

Intensity_2 is defined as Intensity_2= SBtimes (SB+GP+NG) where Intensity_2 is a categorical variable that

ranges from 0 to 3 The only difference with Intensity is that it excludes the variable GA since this individual variable

is not available for a number of languages Panel B presents results using Intensity_2 Intensity_PCA is the prin-

cipal component of the four individual variables SB GP GA and NG Panel D presents results using Intensity_PCA

Top_Intensity is a dummy variable equal to 1 if Intensity is equal to 3 (corresponding to the highest intensity)

and 0 otherwise3 Panel E presents results using Top_Intensity These alternative measure should not be taken

as measuring absolute intensity but rather as a ranking of relative intensity across languages grammar As Appendix

Table C7 shows our main results are robust to these alternative specifications providing reassuring evidence that the

specification we use is not driving the results

3We thanks an anonymous referee suggesting this alternative specification

19

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C8 Gender in Language and Economic Participation Alternative SamplesReplication of Column 5 of Table 3

(1) (2) (3) (4) (5) (6) (7)

Panel A Labor force participation

Sex-based -0027 -0029 -0045 -0073 -0027 -0028 -0026[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0132 0128 0137 0129 0124 0118 0135Mean 060 061 060 062 066 067 060

Panel B Employed

Sex-based -0028 -0030 -0044 -0079 -0029 -0029 -0027[0007] [0006] [0017] [0005] [0007] [0006] [0008]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0135 0131 0140 0131 0126 0117 0138Mean 055 056 055 057 061 061 055

Panel C Weeks worked including zeros

Sex-based -101 -128 -143 -388 -106 -124 -0857[034] [029] [082] [025] [034] [028] [0377]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0153 0150 0159 0149 0144 0136 0157Mean 272 274 270 279 302 300 2712

Panel D Hours worked including zeros

Sex-based -0813 -1019 -0578 -2969 -0879 -1014 -0728[0297] [0255] [0690] [0213] [0297] [0247] [0328]

Observations 480618 652812 411393 555527 317137 723899 465729R2 0138 0133 0146 0134 0131 0126 0142Mean 2251 2264 2231 2310 2493 2489 2247

Panel E Weeks worked excluding zeros

Sex-based -024 -036 -085 -124 -022 -020 -0034[024] [020] [051] [016] [024] [019] [0274]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0056 0063 0057 0054 0054 0048 0058Mean 442 444 441 443 449 446 4413

Panel F Hours worked excluding zeros

Sex-based -0165 -0236 0094 -0597 -0204 -0105 -0078[0229] [0197] [0524] [0154] [0227] [0188] [0260]

Observations 302653 413396 258080 357049 216919 491369 292959R2 0034 0032 0034 0033 0039 0035 0035Mean 3657 3663 3639 3671 3709 3699 3656

Sample Baseline Age 15-59 Indig Include Exclude All Quallang English Mexicans hh flags

Respondent char Yes Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes Yes

Table C8 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggleset al 2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

20

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C9 Gender in Language and Economic Participation Logit and ProbitReplication of Columns 1 and 5 of Table 3

OLS Logit Probit

(1) (2) (3) (4) (5) (6)

Panel A Labor force participation

Sex-based -0101 -0027 -0105 -0029 -0105 -0029[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0006 0132 0005 0104 0005 0104Mean 060 060 060 060 060 060

Panel B Employed

Sex-based -0115 -0028 -0118 -0032 -0118 -0032[0002] [0007] [0002] [0008] [0002] [0008]

Observations 480618 480618 480618 480612 480618 480612R2 0008 0135 0006 0104 0006 0104Mean 055 055 055 055 055 055

Respondent char No Yes No Yes No YesHousehold char No Yes No Yes No Yes

Respondent COB FE No Yes No Yes No Yes

Table C9 notes This table replicates table 3 with additional outcomes and with different estimators Theestimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015) Seeappendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percentlevel

Table C10 Gender in Language and Economic Participation Individual LevelHusbands of Female Immigrants Married Spouse Present 25-49 (2005-2015)

Dependent variable Labor force participation

(1) (2) (3)

Sex-based 0026 0009 -0004[0001] [0002] [0004]

Husband char No Yes YesHousehold char No Yes Yes

Husband COB FE No No Yes

Observations 385361 385361 384327R2 0002 0049 0057

Mean 094 094 094

Table C10 notes The estimates are computed using sam-ple weights (PERWT_SB) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and def-initions The husband characteristics are analogous to the re-spondent characteristics in Table 3 The sample is the sameas in Table 3 except that we drop English-speaking and nativehusbandslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5percent level lowast Significant at the 10 percent level

21

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C11 Gender in Language and Economic ParticipationReplication of Column 5 of Table 3

Dependent variable Labor force participation

(1) (2) (3) (4)

Sex-based -0028 -0025 -0046[0006] [0006] [0006]

Unmarried 0143 0143 0078[0001] [0001] [0003]

Sex-based times unmarried 0075[0004]

Respondent char Yes Yes Yes YesHousehold char Yes Yes Yes Yes

Respondent COB FE Yes Yes Yes Yes

Observations 723899 723899 723899 723899R2 0118 0135 0135 0136Mean 067 067 067 067

Table C11 notes The estimates are computed using sample weights (PERWT) pro-vided in the ACS (Ruggles et al 2015) See appendix A for details on variablesources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Signifi-cant at the 10 percent level

22

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C12 Language and Household BargainingReplication of Column 3 of Table 5

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Sex-based -0027 -0026 -1095 -0300 -0817 -0133[0009] [0009] [0412] [0295] [0358] [0280]

β -coef -0027 -0028 -0047 -0020 -0039 -0001

Age gap -0004 -0004 -0249 -0098 -0259 -0161[0001] [0001] [0051] [0039] [0043] [0032]

β -coef -0020 -0021 -0056 -0039 -0070 -0076

Non-labor income gap -0001 -0001 -0036 -0012 -0034 -0015[0000] [0000] [0004] [0002] [0004] [0003]

β -coef -0015 -0012 -0029 -0017 -0032 -0025

Age gap times SB 0000 -0000 0005 0008 0024 0036[0000] [0000] [0022] [0015] [0019] [0015]

β -coef 0002 -0001 0001 0003 0006 0017

Non-labor income gap times SB -0000 -0000 -0018 0002 -0015 -0001[0000] [0000] [0005] [0003] [0005] [0004]

β -coef -0008 -0008 -0014 0003 -0014 -0001

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes YesRespondent COB FE Yes Yes Yes Yes Yes Yes

Observations 409017 409017 409017 250431 409017 250431R2 0145 0146 0164 0063 0150 0038

Mean 059 054 2640 4407 2182 3644SB residual variance 0011 0011 0011 0011 0011 0011

Table C12 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al2015) See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

23

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

Table C13 Linguistic Heterogeneity in the HouseholdReplication of Column 3 of Table 6

Extensive margin Intensive margins

Including zeros Excluding zeros

Dependent variable LFP Employed Weeks Hours Weeks Hours(1) (2) (3) (4) (5) (6)

Same times wifersquos sex-based -0070 -0075 -3764 -2983 -0935 -0406[0003] [0003] [0142] [0121] [0094] [0087]

Different times wifersquos sex-based -0044 -0051 -2142 -1149 -1466 -0265[0014] [0015] [0702] [0608] [0511] [0470]

Different times husbandrsquos sex-based -0032 -0035 -1751 -1113 -0362 0233[0011] [0011] [0545] [0470] [0344] [0345]

Respondent char Yes Yes Yes Yes Yes YesHousehold char Yes Yes Yes Yes Yes YesHusband char Yes Yes Yes Yes Yes Yes

Observations 387037 387037 387037 387037 237307 237307R2 0122 0124 0140 0126 0056 0031

Mean 059 054 2640 2184 4411 3649SB residual variance 0095 0095 0095 0095 0095 0095

Table C13 notes The estimates are computed using sample weights (PERWT) provided in the ACS (Ruggles et al 2015)See appendix A for details on variable sources and definitionslowastlowastlowast Significant at the 1 percent level lowastlowast Significant at the 5 percent level lowast Significant at the 10 percent level

24

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

D Appendix figure

Figure D1 Regression Weights for Table 3 column (5)

Regression Weight

00 - 08

09 - 24

25 - 65

66 - 240

To better understand the extent to which each country of birth contributes to the identification of the coefficient

in column (5) table 3 we apply Aronow amp Samiirsquos (2016) procedure to uncover the ldquoeffective samplerdquo used in the

regression This procedure generates regression weights by computing the relative size of the residual variance of the

SB variable for each country of birth in the sample Appendix figure D1 shows regression weights for each country

of birth resulting from the regression of column (5) in Table 3 This specification includes respondent country of birth

fixed effects Therefore the weights inform us of the countries where identification is coming from and whether these

are multilingual countries or not As the figure shows identification mostly comes from multilingual countries such

as India followed by the Philippines Vietnam China Afghanistan and Canada

25

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure

References

Aronow P M amp Samii C (2016) lsquoDoes regression produce representative estimates of causal effectsrsquo American

Journal of Political Science 60(1) 250ndash267

Barro R J amp Lee J W (2013) lsquoA new data set of educational attainment in the world 1950ndash2010rsquo Journal of

development economics 104 pp 184ndash198

DESA (2015) lsquoUnited nations department of economic and social affairs population division World population

prospects The 2015 revisionrsquo

Dryer M amp Haspelmath M (2011) lsquoThe world atlas of language structures onlinersquo

Feenstra R C Inklaar R amp Timmer M P (2015) lsquoThe next generation of the penn world tablersquo The American

Economic Review 105(10) pp 3150ndash3182

Gay V Santacreu-Vasut E amp Shoham A (2013) The grammatical origins of gender roles Technical report

Inc E B (2010) 2010 Britannica book of the year Vol 35 Encyclopaedia britannica

Mavisakalyan A (2015) lsquoGender in language and gender in employmentrsquo Oxford Development Studies 43(4) pp

403ndash424

Mayer T amp Zignago S (2011) Notes on cepiirsquos distances measures The geodist database Working Papers 2011-25

CEPII

Ruggles S Genadek K Goeken R Grover J amp Sobek M (2015) Integrated public use microdata series Version

60

Spolaore E amp Wacziarg R (2016) Ancestry and development new evidence Technical report Department of

Economics Tufts University

26

  • Introduction
  • Methodology
    • Data
      • Economic and demographic data
      • Linguistic data
        • Empirical Strategy
          • Empirical results
            • Decomposing language from other cultural influences
              • Main results
              • Robustness checks
                • Language and the household
                  • Language and household bargaining power
                  • Evidence from linguistic heterogeneity within the household
                    • Language and social interactions
                      • Conclusion
                      • Appendix_2017_01_31_Revisionpdf
                        • Data appendix
                          • Samples
                            • Regression sample
                            • Alternative samples
                              • Variables
                                • Outcome variables
                                • Respondent control variables
                                • Household control variables
                                • Husband control variables
                                • Bargaining power measures
                                • Country of birth characteristics
                                • County-level variables
                                    • Missing language structures
                                    • Appendix tables
                                    • Appendix figure