learning from coworkers - princeton universityerossi/lc.pdfcompensating di erential for the...
TRANSCRIPT
Learning from Coworkers∗
Gregor Jarosch
Princeton University
Ezra Oberfield
Princeton University
Esteban Rossi-Hansberg
Princeton University
September 15, 2020
Abstract
We investigate learning at the workplace. To do so, we use German administrative data that con-
tain information on the entire workforce of a sample of establishments. We document that having
more-highly-paid coworkers is strongly associated with future wage growth, particularly if those work-
ers earn more. Motivated by this fact, we propose a dynamic theory of a competitive labor market
where firms produce using teams of heterogeneous workers that learn from each other. We develop a
methodology to structurally estimate knowledge flows using the full-richness of the German employer-
employee matched data. The methodology builds on the observation that a competitive labor market
prices coworker learning. Our quantitative approach imposes minimal restrictions on firms’ produc-
tion functions, can be implemented on a very short panel, and allows for potentially rich and flexible
coworker learning functions. In line with our reduced-form results, learning from coworkers is sig-
nificant, particularly from more knowledgeable coworkers. We show that between 4 and 9% of total
worker compensation is in the form of learning and that inequality in total compensation is signifi-
cantly lower than inequality in wages.
JEL: E24, J31, O33, D83Keywords: knowledge diffusion, production in teams, growth, income distribution, peer-effects
∗We thank the IR Section at Princeton University for providing a secured facility to access the Germandata and the IES at Princeton University for financial support. We thank our discussant Hugo Hopenhaynand participants at various seminars and conferences for useful comments. We also thank Sara Casella andMingduo Zhao for research assistance.
1 Introduction
Social interactions are an essential part of an individual’s life. These interactions are po-
tentially an important source of learning. Furthermore, since working adults spend a large
fraction of their time working, it is natural that most of this learning is the result of in-
teractions with coworkers. It is plausible that this form of learning constitutes the largest
and most important knowledge acquisition mechanism in society. One that transmits and
diffuses the practical knowledge that individuals use every day in their productive endeavors.
Little is known about this type of knowledge transfer in the workplace. Who learns from
whom? How much? What is the labor market value of this learning? How does this learning
change as we change the organization of production in the economy? We aim to provide
answers to some of these questions.
We are interested in understanding how individuals learn from coworkers with different
levels of knowledge and the implications of this form of learning for individual and aggregate
outcomes. To do so, we start by exploring the reduced-form empirical relationship between
the wage growth of an individual and the wages of her coworkers. To measure the key
features of this relationship we use German administrative data that contain the employment
biographies of the entire workforce of the establishments in the sample. We use a variety
of empirical specifications that allow us to understand which features of the distribution of
wages are related to an individual’s wage growth.
Our findings indicate that more-highly-paid coworkers substantially increase future wage
growth. Furthermore, the transmission depends on particular features of the wage distri-
bution. The data suggest a small sensitivity of wage growth to the wages of less-well-paid
workers and larger sensitivity to the wages of those higher up in the wage distribution. We
also show that the effects we find are present across the wage distribution, for workers that
switch plants, and for workers that leave their employers as a result of a mass layoff. In
addition, we show how the effects vary with worker tenure, age, establishment size, and with
the occupation of the team of coworkers. We also show that they are robust to controlling
for employment and wage bill growth at the worker’s plant.
Our results are strongly suggestive of significant learning from coworkers, particularly
1
from workers that earn more. Furthermore, although the battery of checks and different
specifications we present can never discard all alternative interpretations, we argue that our
evidence can rule out many of them. In particular, these findings cannot purely reflect most
forms of back-loading, mean reversion, sorting, rent-sharing, as well as certain alternative
forms of worker learning unrelated to coworkers.1
Motivated by these reduced-form facts, we develop a benchmark model of idea flows in
a competitive labor market. Workers produce in teams and, while doing so, learn from each
other. The model has the key feature that a worker’s pay reflects both her knowledge and a
compensating differential for the opportunity to learn from her coworkers. The labor market
also compensates those who provide their coworkers with learning opportunities.
Our goal is to take advantage of the structure of the model, together with detailed micro
data on individual wages in production teams, to measure the magnitude and characteristics
of learning on the job. Our model yields a mapping between the matched employer-employee
data and the underlying knowledge and learning of individuals. Thus, we structurally es-
timate a variety of parametric versions of the learning function, motivated by the most
important reduced-form patterns we document. We develop a novel way to estimate the pa-
rameters of this function using micro data for the German labor market. Our methodology
uses only information on each worker’s wage and the wages of her coworkers.
A first step in implementing our approach is to choose a cardinality for knowledge. We
show that the expected present value of income is a natural and useful choice for the units
of knowledge. It simplifies the empirical implementation of our approach and allows for a
natural interpretation of the estimated learning functions. We proceed by showing that, given
the learning function, we can invert individual Bellman equations to recover the knowledge
of individuals from the full set of wages of the members of each production team in a
single cross section. Doing this for several years yields a panel of individuals’ knowledge.
Our identification of the learning function comes from the restriction that the evolution of
individual knowledge must be consistent with the learning function we had imposed. We
find the fixed point of this GMM procedure using an iterative algorithm, resulting in a
1To further show that wage growth is related to the composition of coworkers, in Appendix C.1 we alsopresent an exercise where we build an instrument for changes in coworker composition using workers thatdie or otherwise disappear unexpectedly from the dataset.
2
structurally estimated ‘learning function’ that maps an agent’s learning to the knowledge
distribution of her coworkers.
Our model and estimation strategy rely on some strong but standard assumptions like
complete financial markets (or linear utility), perfect labor market competition, and sta-
tionarity. However, since the estimation strategy relies solely on the individual Bellman
equations together with observed wages and team composition, it imposes minimal restric-
tions on the set of firm technologies and complementarities across workers. In fact, our
methodology proves that these characteristics of the production function are not needed to
estimate learning functions. Furthermore, it can be implemented on very short panels re-
quiring only two observations per worker. As such, we view the structure we impose, and
the empirical results we obtain with it, as a natural benchmark.
We implement this structural estimation strategy using the German data and find that
agents’ learning is relatively insensitive to the knowledge of less knowledgeable workers and
significantly more sensitive to those with more knowledge, particularly from the most knowl-
edgeable members of their teams. On average, between 4 and 9% of the total compensation
of workers comes in the form of learning from coworkers in the same team (either same es-
tablishment or same establishment and occupation). We also show that inequality in wages
is between one third and one fifth larger than inequality in compensation because work-
ers with different levels of knowledge differ in how their compensation is divided between
wages and learning. We further document an apparent tension between firms’ production
requirements—which are reflected in the equilibrium composition of teams—and coworker
learning: Coworker knowledge flows would almost double if workers were to be grouped
in teams randomly. The finding suggests the presence of knowledge complementarities in
production.
Our methodology to estimate the learning function can be extended to incorporate other
observable worker characteristics beyond their level of knowledge. In particular, we explore
the role of age, and show that agents that are younger than 40 learn more rapidly, particularly
from other young peers. We also generalize our methodology in order to incorporate search
frictions in the labor market, rent sharing with heterogenous firms, imperfect information,
and incomplete markets. An empirical implementation of these extensions requires a richer
3
dataset than the one we have available, but they should be helpful to guide researchers with
access to dynamic employer-employee panel data.
There is a large literature in macroeconomics that has used learning from others as the
key mechanism to generate aggregate growth. Lucas (2009) proposes a theory of growth
based on random meetings between agents in the entire population. Lucas and Moll (2014)
and Perla and Tonetti (2014) extend these models to add a time allocation choice, while
Jovanovic and Rob (1989), Jovanovic and MacDonald (1994), and Konig et al. (2016) focus
on the innovation/imitation margin. Other models like Sampson (2015), Perla et al. (2015),
and Luttmer et al. (2014) also generate growth through adoption of ideas from others. As
Alvarez et al. (2013) and Buera and Oberfield (2020) show, the selection of what particular
ideas an individual or firm confronts, as determined for example by trade flows, is essential to
shape the growth properties of these models. This literature considers random learning from
the population, or a selected group of the population, but it has not incorporated learning
from coworkers. The importance of studying this form of selection in learning is evident,
but challenging. For starters, it requires modeling explicitly teams of coworkers that are
heterogeneous across firms. Caicedo et al. (2016) introduce learning in an economy where
production is organized in heterogeneous production hierarchies as in Garicano and Rossi-
Hansberg (2006), but learning interactions do not happen exclusively within the organization.
Jovanovic (2014) studies learning in teams of two, while Burstein and Monge-Naranjo (2009)
study an environment in which a manager hires identical workers and imparts knowledge to
those workers.2 We go further than these papers in that we model learning within teams and
provide direct evidence of its importance and its characteristics. We also provide a structural
estimation of the key parameters of the model.
While much of the empirical literature has focused on contemporaneous peer effects (Mas
and Moretti (2009) and Cornelissen et al. (2017)), empirical work of learning within teams
is much more scarce. Nix (2015) argues that increasing the average education of ones peers
raises one’s earnings in subsequent years. Akcigit et al. (2018) argue that increasing one’s
exposure to star patenters raises the likelihood of patenting and the quality of one’s patents.
2Anderson and Smith (2010) study matching with dynamic types which can also be interpreted as amodel of learning in teams of two.
4
In related and complementary work, Herkenhoff et al. (2018) build on a frictional sorting
setup to investigate learning with production complementarities. Like us, they detect strong
coworker spillovers. The main difference between both papers is that our competitive labor
market model allows us to structurally estimate the model without imposing restrictions on
the production technology. Our strategy is thus well-suited for utilizing the full richness of
the matched employer-employee data. Furthermore, our model features teams with arbitrary
numbers of heterogeneous workers, which allows us to study the role of the within-team
distribution of knowledge for the coworker learning process. A limitation of our analysis is
that wages reflect only knowledge and compensating differential from learning. While we
view this as an important benchmark, it certainly misses some components of wages, such
as rent sharing or adjustment frictions. An important advantage of Herkenhoff et al. (2018)
is that, in addition to knowledge and compensating differentials, wages reflect rent sharing.
A limitation is that, to inform the division of wages into these three components, they have
to lean heavily on the rest of the structure of the model, and in particular on auxiliary
assumptions on the production function.
The remainder of this paper is organized as follows. In Section 2 we present a number of
reduced-form findings about the relationship between the wage of an individual, the wages
of her coworkers, and an individual’s wage growth. Section 3 presents a general but simple
model of an economy in which agents learn from their coworkers. The theory is useful in
specifying exactly the concept of learning we have in mind and its implications. Section 4
contains our main results. We present and implement an algorithm to structurally estimate
the learning function introduced in Section 3, and describe the implications for individual in-
vestment in knowledge and inequality. Section 5 concludes. An Appendix discusses multiple
extensions and alternative interpretations of our model, presents results when we condi-
tion on a worker’s age, introduces our German matched employer-employee dataset, and
presents additional reduced-form results. A Supplementary Appendix discusses the detailed
construction of our dataset and presents results for an alternative team definition.
5
2 Reduced-Form Evidence
We start our analysis using German social security data to investigate the relationship be-
tween coworker (relative) wages and individual wage growth. Our goal is to provide empirical
evidence of workers learning from their teammates. To do so, we relate individual wage dy-
namics to wages in the peer group using various flexible reduced-form specifications. We
explore three empirical relationships that could indicate learning from coworkers. First, do
future wages rise more steeply if one’s coworkers are more highly paid? Second, if so, does
this relationship depend on which coworkers are more highly paid? That is, is it those be-
low in the within-team wage distribution or those above that matter? Third, how do these
reduced-form patterns change with team size, tenure, age, and the current wage level? We
further offer various robustness checks with the particular focus on ruling out alternatives to
learning which could be driving our initial findings, like a back-loaded wage structure, mean
reversion in wages, sorting, or rent-sharing within firms. At the end of the section we take
stock and discuss what we can conclude from the evidence we present.
We use a German dataset that contains the complete set of workers at a sample of estab-
lishments from 1999 to 2009. The dataset contains information on a worker’s establishment,
occupation, and average daily earnings along with a rich set of other worker characteristics
(age, gender, job and employment tenure, education, location, among others). Throughout,
we work with two different ways of defining a peer group. In Team Definition 1 a team is
defined by all workers in the same establishment. In Team Definition 2 a team is defined as
all workers in the same establishment and occupation. Data Appendix B presents summary
statistics, a Supplementary Appendix D describes the construction of the dataset.
2.1 Regression Framework
We begin with the following baseline specification which we implement separately for various
horizons h,
wi,t+h = α + βw−i,t + γwi,t + ωage + ωtenure + ωgender + ωeduc + ωocc + ωt + εi,t. (1)
6
wi,t+h is individual i’s log wage in year t + h, which we project on the log mean wage
of her peers in year t, w−i,t, controlling for her own log wage in year t, wi,t, along with
fixed effects for age decile, tenure decile, gender, education, occupation, and year. Unless
otherwise indicated that is the set of fixed effects used in all specifications. Further, we omit
observations that fall into the top and bottom percentile in terms of wage growth from t to
t+ h in all reduced form specifications.
All our regressions pool the observations across all years t. Since we observe the full peer
groups for ten years, the results for h > 1 use only a subset of years t. For instance, for
h = 10 we are restricted to exclusively use information about peers in the year 1999 and
2000 (since we observe workers until 2010). Likewise, for h = 5 we can use all years between
1999 and 2005.
We report the parameter estimates for each team definition in Table I, clustering standard
errors at the establishment level. Panel A reports the results using team definition 2. We
first note that our findings suggest quantitatively large effects: They imply that doubling
the mean wage of individual i’s peers raises, in expectation, i’s wage next year by almost
7%. These effects are naturally larger as the horizon extends further into the future, but
they do not grow linearly. This is natural in the context of learning, as agents likely learn
less as they gradually become more knowledgeable. Over a 10 year horizon, doubling peers’
wages results in 21% higher wages.
We next contrast these results with the corresponding results for the wider team defi-
nition 1. Comparing Panel A and Panel B of Table I, the coefficients tend to be larger for
the narrower team definition. These results are consistent with learning from coworkers if
interactions between coworkers within occupations are more intense or more useful. Thus, in
the rest of this section we restrict our attention to team definition 2. We separately report
all results for the alternative team definition in Supplementary Appendix E.
We next turn to an alternative specification where we split a peer group into those with
higher and lower wages. In particular, we let w+−i,t (w−−i,t) denote the log of the mean wage
of i’s peers with higher (lower) wages. We then run the otherwise unaltered specification,
wi,t+h = α+β+w+−i,t +β−w−−i,t +γwi,t +ωage +ωtenure +ωgender +ωeduc +ωocc +ωt + εi,t , (2)
7
Narrow Team Definition
Horizon in Years 1 2 3 5 10
w 0.068∗∗∗ 0.094∗∗∗ 0.12∗∗∗ 0.16∗∗∗ 0.21∗∗∗
(0.0036) (0.0052) (0.0071) (0.0094) (0.014)
Within R2 0.89 0.82 0.77 0.68 0.47Observations 3969166 3473642 2989524 2167851 508504
Broad Team Definition
Horizon in Years 1 2 3 5 10
w 0.058∗∗∗ 0.083∗∗∗ 0.11∗∗∗ 0.15∗∗∗ 0.21∗∗∗
(0.0031) (0.0046) (0.0063) (0.0086) (0.014)
Within R2 0.89 0.83 0.78 0.68 0.48Observations 4112284 3595957 3092504 2239223 523683
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β as estimated from specification (1). Column titles indicate horizon n. Standard errors clusteredat the establishment level. The regressions include current wage and fixed effects for age decile, tenuredecile, gender, education, occupation, and year.
Table I Estimation Results for Specification (1).
and report our findings in Table II.
The table documents a stark asymmetry. It suggests that the peers higher up in the
team wage distribution matter far more for future wage outcomes than the peers below.
While increasing the average wage of either group comes with a significant increase in the
expected wage of an individual, the peer group above has an impact three to five times larger
at all horizons. These findings are consistent with larger knowledge flows from more-highly-
skilled peers and relatively little sensitivity to the knowledge of less-skilled peers. Table III
shows similar results for the alternative team definition 1 that includes all workers in an
establishment. Since we find this stark asymmetry to be robust throughout, we henceforth
build on specification (2).
Table C.VIII in Appendix C.3 presents additional results when we use the same specifi-
cation but, in addition, control for either establishment wage bill growth or establishment
employment growth. Specifically, we control for the growth in total wage bill (or total count)
of the full time employed at the establishment between t − 2 and t − 1, t − 1 and t, and t
8
Horizon in Years
1 2 3 5 10
w+ 0.090∗∗∗ 0.13∗∗∗ 0.16∗∗∗ 0.22∗∗∗ 0.28∗∗∗
(0.0064) (0.010) (0.015) (0.021) (0.032)
w− 0.019∗∗∗ 0.029∗∗∗ 0.041∗∗∗ 0.060∗∗∗ 0.097∗∗∗
(0.0048) (0.0065) (0.0081) (0.011) (0.016)
Within R2 0.88 0.81 0.76 0.66 0.46Observations 3462305 3034301 2617097 1903104 448560
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2). Team definition 2. Column titles indicate horizonn. Standard errors clustered at the establishment level. The regressions include current wage and fixedeffects for age decile, tenure decile, gender, education, occupation, and year.
Table II Estimation Results for Specification (2).
and t + 1. The results remain virtually identical, indicating that the relationship between
wage growth and coworker composition is not driven by changes in these establishment-level
characteristics.3
2.1.1 Across the Wage, Age, Tenure, and Size Distributions
We show next that the forces we document are present across the labor market. In particular,
we run the same baseline specification separately for workers in different deciles of the wage
distribution.4 We also repeat the exercise for different deciles of the age, tenure, and team
size distribution.
Our first set of results are reported in Panel A of Table IV which presents the regression
coefficients for specification (2) which we run separately for each decile of the wage distri-
3We point to additional robustness results which are relegated to Appendix C.3. There, we show thatthe patterns documented are robust to modified sample selection criteria and reduced-form specifications.We show results when we omit teams with top-coded wages, when we omit teams with apprentices, whenwe exclusively work with establishments that neither have a collective bargaining agreement nor benchmarktheir wages with one, and we show results when we split the sample into prior and after the Hartz labormarket reforms. We also report results when we include more high-dimensional fixed effects and when wecontrol for many lags of the individual log wage. Most importantly, we show our baseline results under theinclusion of team fixed effects under both team definitions. As can be seen from the corresponding tables inAppendix C.3, our baseline results appear mostly insensitive to these modifications.
4Of course, we use the full peer group in the construction of the independent variable as before.
9
Horizon in Years
1 2 3 5 10
w+ 0.090∗∗∗ 0.13∗∗∗ 0.17∗∗∗ 0.23∗∗∗ 0.32∗∗∗
(0.0058) (0.0089) (0.012) (0.017) (0.026)
w− 0.025∗∗∗ 0.039∗∗∗ 0.057∗∗∗ 0.085∗∗∗ 0.12∗∗∗
(0.0044) (0.0060) (0.0081) (0.012) (0.019)
Within R2 0.89 0.82 0.77 0.68 0.48Observations 4026321 3522994 3032228 2197932 515017
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2). Team definition 1. Column titles indicate horizonh. Standard errors clustered at the establishment level. The regressions include current wage and fixedeffects for age decile, tenure decile, gender, education, occupation, and year.
Table III Estimation Results for Specification (2). Team Definition 1.
bution. The results are fairly stable across the wage distribution.5 We conclude that having
more-highly-paid coworkers is associated with future wage growth, largely independent of
the current level of wages.
Our next set of results is reported in Panel B of Table IV where we cut the sample
into different deciles of the (pooled) sample age distribution. Our findings suggest that the
effects we document are substantially stronger for young workers. Furthermore, while we find
positive and significant effects from more-highly-paid peers above and below for all segments
of the wage distribution, the asymmetry vanishes, and perhaps reverses, at the top.
Panel C of Table IV also shows that the patterns we describe are present across the
(job) tenure distribution. In particular, the table shows that more-highly-paid coworkers are
associated with larger wage growth for workers everywhere in the tenure distribution. In
addition, increasing the wages of more-highly-paid peers has a larger effect on one’s future
wage growth compared with the effect from less-well-paid peers. The asymmetry is stark at
the bottom of the tenure distribution and, similar to the results for age, vanishes towards
the top.
Panel D of Table IV gauges the role of team size. It shows that the patterns we describe
5The sharp increase in the coefficient estimate for the two top deciles is likely a consequence of thetop-coding since that group has an artificially compressed distribution of w+.
10
Panel A: Decile of the Wage Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.18∗∗∗ 0.13∗∗∗ 0.14∗∗∗ 0.13∗∗∗ 0.13∗∗∗ 0.11∗∗∗ 0.11∗∗∗ 0.10∗∗∗ 0.32∗∗∗ 0.35∗∗∗
(0.012) (0.015) (0.018) (0.018) (0.021) (0.026) (0.028) (0.023) (0.021) (0.060)
w− 0.043∗∗∗ 0.041∗∗∗ 0.045∗∗∗ 0.047∗∗∗ 0.048∗∗ 0.046∗∗ 0.071∗∗∗ 0.069∗∗∗ 0.057∗∗∗ 0.024∗∗∗
(0.0097) (0.010) (0.012) (0.014) (0.015) (0.015) (0.015) (0.0096) (0.010) (0.0058)
Within R2 0.43 0.086 0.053 0.040 0.037 0.039 0.051 0.080 0.17 0.062Observations 248920 263528 265462 264823 262632 261579 261499 261687 262796 264087
Panel B: Decile of the Age Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.31∗∗∗ 0.27∗∗∗ 0.16∗∗∗ 0.12∗∗∗ 0.097∗∗∗ 0.089∗∗∗ 0.058∗∗∗ 0.035∗∗∗ 0.018 0.0092(0.021) (0.026) (0.017) (0.014) (0.013) (0.012) (0.010) (0.0092) (0.0093) (0.0088)
w− 0.023 0.032∗∗ 0.030∗∗∗ 0.038∗∗∗ 0.045∗∗∗ 0.046∗∗∗ 0.056∗∗∗ 0.061∗∗∗ 0.065∗∗∗ 0.056∗∗∗
(0.014) (0.011) (0.0089) (0.0084) (0.0088) (0.0083) (0.0082) (0.0073) (0.0079) (0.0069)
Within R2 0.61 0.70 0.75 0.77 0.78 0.79 0.80 0.82 0.82 0.82Observations 313804 256870 253754 294113 205242 301874 284811 261404 221992 223184
Panel C: Decile of the Tenure Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.26∗∗∗ 0.25∗∗∗ 0.19∗∗∗ 0.16∗∗∗ 0.10∗∗∗ 0.077∗∗∗ 0.062∗∗∗ 0.050∗∗∗ 0.035∗ 0.012(0.023) (0.027) (0.025) (0.017) (0.014) (0.014) (0.012) (0.013) (0.014) (0.014)
w− 0.012 0.022∗∗ 0.038∗∗∗ 0.059∗∗∗ 0.051∗∗∗ 0.061∗∗∗ 0.062∗∗∗ 0.065∗∗∗ 0.050∗∗∗ 0.059∗∗∗
(0.011) (0.0086) (0.0090) (0.0091) (0.0085) (0.0093) (0.0089) (0.010) (0.013) (0.016)
Within R2 0.66 0.73 0.77 0.79 0.80 0.79 0.79 0.78 0.74 0.74Observations 259787 258018 261847 264724 314261 216646 298387 231182 261285 250908
Panel D: Decile of the Size Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.057∗∗∗ 0.097∗∗∗ 0.13∗∗∗ 0.16∗∗∗ 0.16∗∗∗ 0.23∗∗∗ 0.24∗∗∗ 0.25∗∗∗ 0.16∗∗∗ 0.16(0.0049) (0.0081) (0.013) (0.017) (0.020) (0.027) (0.043) (0.054) (0.040) (0.11)
w− 0.027∗∗∗ 0.040∗∗∗ 0.050∗∗∗ 0.056∗∗∗ 0.042∗∗ 0.021 -0.022 -0.084∗∗ -0.056 -0.11(0.0036) (0.0063) (0.0080) (0.010) (0.014) (0.016) (0.018) (0.030) (0.044) (0.074)
Within R2 0.78 0.77 0.75 0.74 0.72 0.70 0.70 0.66 0.61 0.65Observations 263527 276795 256020 256637 266652 259741 260926 259466 257506 259821
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2) for separate deciles of the wage, age, tenure, andteam size distributions. We include observation i in the decile k in t if i falls into the k’th decile ofthe pooled distribution. Team definition 2 at horizon h = 3 years. Standard errors clustered at theestablishment level. The regressions include current wage and fixed effects for age decile, tenure decile,gender, education, occupation, and year (whenever possible).
Table IV Baseline results for different deciles of the wage, age, tenure, and team sizedistribution.
11
are present in small and large teams alike. The estimated relationships are smaller for
teams with less workers and become somewhat less precise at the very top.6 The results
are consistent with knowledge flows per worker that increase with team size, but that are
significant throughout the team size distribution.
2.1.2 Switchers
In a competitive labor market, if the relationship between wage growth and team composition
is the result of learning, the resulting knowledge has to be valued outside the establishment.
We now explore whether the relationship we have uncovered is also present for workers that
switch teams. Thus, we run a specification (2) for a sample of workers that leave their
establishment after the reference spell. Specifically, we restrict the sample to workers who
leave their job after the reference date in year t and regain employment at a different employer
by the reference date in year t+ 1.7
The results are reported in Panel A of Table V. The table shows that more-highly-
paid peers are associated with higher wage growth even for those who move to a different
establishment. Furthermore, we find the same marked asymmetry we documented using all
workers. The results indicate that the effect of having better coworkers is somewhat larger
for agents that switch jobs than for agents that remain in the same job. This can be the
result of selection. Switchers might be the ones that learned the most, which might give
them an incentive to leave the team if there is no room for their newly acquired skills in
their current organization. Alternatively, it might be the result of the fact that switchers
tend to be young, and young workers’ wage growth is more sensitive to peers’ wages.
To account for some of these forms of selection we also present results for only those
switchers who experience an interim nonemployment spell between jobs. That is, we re-
strict the sample to movers who experience a period of joblessness in year t and report the
corresponding results in Panel B of Table V.
We go further and restrict the sample to switchers with an interim spell of nonemployment
6The reason is that we cluster standard errors at the establishment level and there are very few estab-lishments in the largest team decile.
7Recall that we assign the employer pertaining to the spell overlapping January 31th of any given yearas the annual observation. We further note that our data do not allow us to observe the firm’s otherestablishments so we cannot rule out that some of the team-switchers move within the same firm.
12
Panel A: All Switchers
Horizon in Years 1 2 3 5 10
w+ 0.092∗∗∗ 0.17∗∗∗ 0.20∗∗∗ 0.25∗∗∗ 0.35∗∗∗
(0.018) (0.022) (0.024) (0.027) (0.032)
w− -0.052∗∗∗ -0.031∗∗ -0.019 -0.0082 0.026(0.011) (0.0099) (0.012) (0.014) (0.022)
Within R2 0.59 0.55 0.49 0.40 0.26Observations 194848 228110 203726 160495 43609
Panel B: Switchers with Nonemployment Spell
Horizon in Years 1 2 3 5 10
w+ 0.094∗∗∗ 0.19∗∗∗ 0.20∗∗∗ 0.24∗∗∗ 0.31∗∗∗
(0.020) (0.019) (0.021) (0.023) (0.032)
w− -0.039 0.024 0.044∗∗ 0.036∗ 0.061∗
(0.022) (0.014) (0.016) (0.018) (0.030)
Within R2 0.37 0.46 0.39 0.31 0.19Observations 21084 72223 68781 57331 16224
Panel C: Switchers, Mass Layoff Event
Horizon in Years 1 2 3 5 10
w+ 0.11 0.14∗∗ 0.14∗∗ 0.22∗∗∗ 0.34∗∗∗
(0.056) (0.046) (0.051) (0.055) (0.089)
w− 0.032 0.032 0.064 0.049 -0.016(0.069) (0.042) (0.048) (0.054) (0.11)
Within R2 0.34 0.34 0.28 0.21 0.14Observations 2264 5258 5453 4904 1545
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2) on a sample of establishment switchers. TeamDefinition 2. Column titles indicate horizon h. Standard errors clustered at the establishment level.The regressions include current wage and fixed effects for age decile, tenure decile, gender, education,occupation, and year.
Table V Establishment switchers.
13
whose reference-spell employer also experienced a mass layoff event in year t.8 Arguably,
constructing the sample this way controls more fully for selection. As can be seen in Panel
C of Table V our corresponding estimates remain large and significant, although they tend
to fall somewhat. Naturally, the results are less precise since we lose many observations.
2.1.3 Coworker Occupation
Learning could depend on the occupation of the coworkers with whom they interact. Hence,
we offer results for specifications which split individuals working in the same establishment
as worker i into groups. The first exercise explores whether wage growth is more strongly
related to the wages of teammates that are employed in managerial occupations.9 Table
VI presents the results for our baseline specification in 1 after weighting for the share of
each group. Perhaps surprisingly, it shows that the relationship between wage growth and
peer wages is almost identical, and positive, for peers that are managers or those that are
workers. Under our interpretation, the managerial classification does not seem to be essential
for worker learning.10
Perhaps workers learn more from their peers in the same occupation. Hence, we go
further and divide coworkers into those working in the same occupation and those working
in different occupations. We explore this using the specification in (2) where we split each
of those two groups into those paid more than wi and those paid less. We include those four
variables on an otherwise unchanged specification (2) and report the results in Table VII.
The results indicate that individuals learn more from higher-wage peers in the same
occupation than in other occupations. The asymmetry between β+ and β− is also much
larger in the same occupation than in alternative ones. These results are natural if we
interpret them as resulting from learning. In their own occupation, individuals learn mostly
from more knowledgeable peers. In contrast, when they interact with peers in occupations
that use different knowledge, they learn from everyone since they may know less of the topic
themselves.
8See Supplementary Appendix D for details on the mass layoff definition.9We classify as managers workers in 3-digit KldB 88 occupations 751-753 and 761-763.
10We obtain similar conclusions if we use the wage gap to workers and managers that earn more.
14
Horizon in Years
1 2 3 5 10
ww 0.062∗∗∗ 0.090∗∗∗ 0.12∗∗∗ 0.16∗∗∗ 0.21∗∗∗
(0.0039) (0.0058) (0.0077) (0.010) (0.017)
wm 0.058∗∗∗ 0.084∗∗∗ 0.11∗∗∗ 0.16∗∗∗ 0.22∗∗∗
(0.0036) (0.0053) (0.0076) (0.010) (0.019)
Within R2 0.88 0.82 0.77 0.67 0.47Observations 3517568 3086436 2661936 1934856 458302
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Variables are weighted: To construct wm we compute the mean wage of workers in managerialoccupation(s) and then weight by the fraction of overall establishment employment of managers. Columntitles indicate horizon h. Standard errors clustered at the establishment level. The regressions includecurrent wage and fixed effects for age decile, tenure decile, gender, education, occupation, and year.
Table VI Manager and worker peers.
2.1.4 A More Flexible Specification
The results above, based on specification (2), exploit only two moments of the distribution
of coworker wages, namely, the gap to coworkers with wages above and below. We continue
the reduced-form exploration with an exercise that attempts to approximate the wage distri-
bution surrounding a worker in a more flexible way. To do so, we divide a worker’s peers into
11 bins. The bottom bin takes peers j with wage such that log(wj)− log(wi) < −.45 while
the top bin takes those peers with log(wj) − log(wi) > .45. All other workers are grouped
into 9 equally spaced bins in-between. We then compute, for each individual i and year t,
the fraction of her coworkers in each bin k, pi,k,t, and run the following regression,
wi,t+h =11∑k=2
βkpi,k,t + γwi,t + ωage + ωtenure + ωgender + ωeduc + ωocc + ωt + εi,t. (3)
That is, we project log wages h years ahead on the current log wage and a non-parametric ap-
proximation of the current peer wage distribution around a worker, along with our standard
controls and fixed effects. We present the results in Figure 1.
Figure 1 plots the marginal response of log wages h years ahead to increasing the weight on
15
Horizon in Years
1 2 3 5 10
w+same occ 0.12∗∗∗ 0.16∗∗∗ 0.21∗∗∗ 0.29∗∗∗ 0.39∗∗∗
(0.013) (0.019) (0.026) (0.037) (0.048)
w−same occ -0.00039 0.0076 0.013 0.022 0.043(0.012) (0.017) (0.023) (0.033) (0.047)
w+other occ 0.086∗∗∗ 0.12∗∗∗ 0.15∗∗∗ 0.21∗∗∗ 0.30∗∗∗
(0.0068) (0.011) (0.015) (0.022) (0.037)
w−other occ 0.038∗∗∗ 0.054∗∗∗ 0.076∗∗∗ 0.11∗∗∗ 0.15∗∗∗
(0.0064) (0.0089) (0.012) (0.018) (0.027)
Within R2 0.88 0.81 0.76 0.66 0.46Observations 3315351 2907077 2509645 1827701 431782
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2) with separate coefficients for peers in the sameoccupation and in other occupations. Variables are weighted: To construct w+
same occ (w+other occ), we
compute the mean wage of workers above in the wage distribution in the same (other) occupation(s) andthen weight by the fraction of overall establishment employment the (other) occupation(s) accounts for.Column titles indicate horizon h. Standard errors clustered at the establishment level. The regressionsinclude current wage and fixed effects for age decile, tenure decile, gender, education, occupation, andyear.
Table VII Peers in the same and in other occupations.
each of the 10 bins (relative to increasing the weight on bin 1 which is the omitted category).
The figure shows that moving 10% of one’s peers from the bottom bin into the highest bin
increases wages 3 years ahead by slightly more than 1.5%. The figure confirms the findings
from the previous exercises: Those who are less well paid (those in bins 5 and under) have
similar effects on a worker’s future wage growth. In contrast, workers seem to benefit from
additional highly paid workers in the peer group (those in bins 7 and higher). Note that, while
workers benefit more from more-highly-paid peers, the effects are less than proportional.11
This suggest that knowledge flows more efficiently from those in close proximity relative
to those far above in the wage distribution. Nevertheless, we stress that the effects are
monotonically increasing (almost everywhere), suggesting that individuals learn more from
11The top bin collects all peers j such that log (wj) − log (wi) > .45 and thus does not directly comparewith the other groups.
16
coworkers that are further out in the wage distribution.
The bottom panel of Figure 1 also confirms that learning from higher earners accumulates
over time presumably because the composition of teams is highly persistent. We report the
table underlying Figure 1 in Appendix C.2 along with the corresponding results for the
other team definition. In addition, we report the results, for horizon h = 3, for specifications
restricted to workers above (or below) the median wage in their team and to specifications
restricted to workers selected from particular deciles of the wage distribution. The basic
patterns in Figure 1 are generally confirmed.
-.2 0 .2 .4 >.45
0
10
2010-4
3 years95% bounds
-.2 0 .2 .4 >.45
0
1
2
10-3
1 yr235
Notes: We plot the coefficients βk from regression specification (3) with weights pk scaled such thatthey add up to 100. The bin k = 1 (which takes the peers i such that log (wj) − log (wi) < −.45) isthe omitted category. The top panel only plots h = 3 along with the 95% confidence bands. Standarderrors clustered at the establishment year level. The bottom panel plots estimates for different horizons.All workers with log (wj) − log (wi) > .45 are in one single bin as indicated by the break in the axisand the lines. The figure uses Team Definition 2. Standard errors clustered at the establishment level.The regressions include current wage and fixed effects for age decile, tenure decile, gender, education,occupation, and year.
Figure 1 Approximating the Wage Distribution
17
2.1.5 Taking stock
Put together the results in this section are, we believe, strongly consistent with the view
that workers learn from their peers. In a labor market in which wages are monotone in
knowledge conditional on team composition, wage growth is naturally associated with learn-
ing. Learning is naturally the result of interactions with those that possess knowledge, so
having more knowledgeable peers should lead to more learning. We show that, in fact, wage
growth is positively associated with the wage gap to coworkers, and particularly with the
wage gap to coworkers that earn more and are in similar occupations. The resulting increase
in knowledge is embedded in workers, and therefore reflected in wages, even for workers that
leave the establishment for, arguably, exogenous reasons.
Besides coworker learning there are other mechanisms that might potentially drive some
of the patterns uncovered above. The entirety of the evidence presented above allows us to
rule out many of them. We group alternative mechanisms in five main classes and discuss
each in turn.
Wage Back-Loading Certain plausible wage back-loading patterns could give rise to some
of our results. In particular, firms could attempt to retain workers by offering wage schedules
that pay relatively more in the future. Firms could have incentives to do so if it is costly to
hire new workers and workers search on the job. This would, for instance, be the case in an
environment similar to Burdett and Coles (2003) or Postel-Vinay and Robin (2002).
Note, however, that the patterns we document are present and similar in size for team
switchers, including those who arguably leave their establishments involuntarily. This sug-
gests back-loading, or establishment-specific factors, cannot be the only force behind our
baseline results. This is further corroborated by the evidence showing that our baseline
effects are present across the tenure and age distribution.
Mean Reversion There are two main forms of mean reversion in wages that could poten-
tially affect our results: Economy-wide and team-specific. Economy-wide mean reversion is
inconsistent with Table IV Panel A. There we show that the magnitude of the relationship
between wage growth and the wage gap does not decline as we focus on higher wage deciles.
18
Furthermore, the sharp asymmetry between β+ and β−, and the pattern of marginal
effects in Figure 1, are hard to reconcile with an explanation which builds on within-team
mean reversion in wages. For example, suppose deviations of one’s wage from the average
are idiosyncratic and not due to productivity differences, and therefore may be unrelated
to future wages within one’s own team or other teams. Then, differences in the timing of
raises unrelated to changes in productivity could give rise to a positive relationship between
wage gaps and future wage growth. Workers that have not received raises might later seek
other offers, or might later get raises due to equity concerns, or might quit to find a better
paying job. This mechanism, however, does not lead to effects that are stronger for the gap
to those that earn more than for the gap to those that earn less. In addition, the patterns
should not persist when we focus at those that switch jobs due to mass layoffs.
Rent-Sharing The simplest forms of rent-sharing in which a firm gets a windfall or in-
creases profitability and, through bargaining, raises the wages of all workers cannot explain
our results. The reason is that the shock does not affect the gap to coworkers, and would
therefore not affect the coefficient of interest. In particular, this mechanism does not pro-
vide any reason why a contemporaneous gap between one’s wage and coworkers’ wages would
predict future wage growth.
It is conceivable that our results could in part reflect a form of staggered rent-sharing.
Specifically, suppose a firm’s marginal product rises, which increases rents to be shared,
but some workers receive raises before others. We would expect the firms whose marginal
product rises to also increase employment or raise wages. However, as we show in Appendix
C.3, the baseline results hardly change when we include establishment-level growth controls.
It could also be that a firm experiences windfalls that leads to asymmetric raises among
employees, and the pattern of which coworkers get larger raises (due to greater rent-sharing)
happens to covary with the distribution of coworkers in just the right way. However, the fact
that the relationship is similar in magnitude among those who switch firms due to a mass
layoff is inconsistent with this mechanism.
19
Worker Sorting In principle, workers sorting could also be a mechanism behind our
findings. Workers could have heterogeneous income profiles: some workers will experience
faster wage growth than others regardless of their coworkers (conditional on observables
such as age, experience, etc.). It is possible that our estimates of learning from coworkers
are confounded by the sorting of workers across teams if sorting is such that high growth
individuals happen to be at the lower end of the within-team wage distribution. To check if
this is driving our results, we use a specification that controls for the worker’s wage growth
over the last five years in Table C.VII of Appendix C.3.12 The resulting relationship between
future wage growth and coworker wages is similar, albeit a bit smaller, than our baseline
specification, suggesting that our results are not driven by worker sorting.
Other Forms of Learning Beyond learning from coworkers, individual learning in the
workplace can result from learning-by-doing or from diffusion of a firm’s embedded knowl-
edge. For example, our results could also perhaps be explained by team-wide events that lead
employees to learn, with the new knowledge spreading to workers in a way that improves their
labor market prospects elsewhere. However, the fact that our results barely change when
we include establishment-level employment growth and wage bill growth controls limits the
importance of this alternative mechanism.13
More generic stories in which learning is sufficiently flexible and arbitrarily covary with
flexible moments of the distribution of wages within or outside of the firm are hard to rule
out in general; particularly if they do not lead to overall employment or wage bill growth
or large differences in learning across the size distribution (which we also do not find in
Table IV Panel D). Therefore, to move forward, we need to isolate the causal effect of team
composition on wage growth. In Appendix C.1 we propose and execute such an identification
strategy using (pseudo-)random exit events where a full-time employed, prime-age, worker
12Specifically, we control for the log of the wage in each of the last five years to allow for more flexibletime series processes in income profiles.
13It could also be that some firms engender learning more than others, and that team formation is such that,on average, those in high-learning firms tend to have more-highly-paid coworkers than those in low-learningfirms. To assess this, we incorporate team fixed effects into our baseline specification. Such a specificationcannot identify the impact of having better coworkers on average, as this is approximately co-linear withone’s own wage and the team fixed effect. Nevertheless, we can still identify the asymmetry between the gapto more-highly paid coworkers and the gap to less-well-paid coworkers. Table C.VI of Appendix C.3, showsthat this asymmetry is similar to in our baseline specification.
20
permanently leaves employment (as in Jager and Heining (2019)). Although noisy, these
results are consistent with our interpretation.
The reasoning above suggests that learning is a natural and parsimonious rationalization
of the facts we have uncovered, and that it is not obvious how to rationalize them using
alternative mechanisms. Of course, except for the somewhat noisier instrumental variable
results in Appendix C.1, the rest of our reduced-form results cannot be understood as causal
and so provide simply a suggestive statement about equilibrium relationships. In the next
section we propose a theory of learning motivated by this evidence. We use the suggested
specification of the learning function to structurally estimate the model using the same data.
3 A Benchmark Model
Motivated by the evidence in the previous section, we develop a theory of worker learning
from coworkers in a competitive labor market. Consider an economy populated by a unit
mass of heterogeneous individuals with knowledge z ∈ Z = [0, z]. Individuals have a prob-
ability δ of dying each period. Each period a mass δ of new individuals is born. Newborns
start with a level of knowledge z drawn from a distribution B0 (·). Agents supply labor
inelastically, consume, and discount the future according to a discount factor β. Agents are
employed in firms where they obtain a wage and where they can learn from other coworkers.
An agent z, working in a firm that employs the agent as well as a vector of coworkers z
will draw her next period’s knowledge from a distribution G (z′|z, z). Financial markets are
complete, or utility is linear, so agents maximize the expected present value of income.
Since individuals learn from coworkers, the wage they are willing to accept depends on
how much they might learn from coworkers. Thus, the wage schedule, w (z, z) , paid to a
worker with knowledge z depends also on the vector of coworkers z.
All firms produce the same consumption goods. Potential firms pay a fixed cost c in
goods, after which they draw technology a ∈ A from a distribution A (·). A firm with
technology a produces according to the production function F (z; a), where z is the vector
of workers it hires. Firms take the wage schedule as given. We purposely impose minimal
structure on the production function. In particular, differences across technologies need not
21
be Hicks-neutral or even factor augmenting; production technologies may also vary in their
complementarities across workers with different levels of knowledge. Hence, different firms,
in general, make different choices of z.
3.1 Firms
Let W (z) be the total wage bill of a firm that hires the vector of workers z. If z = {zi}ni=1 for
some n, then W (z) =∑n
i=1 w (zi, z−i), where z−i is the set of i’s coworkers. A firm chooses
the set of workers to maximize profit
π (a) = maxzF (z; a)−W (z) . (4)
Let z (a) = arg max zF (z; a)−W (z) denote a’s optimal choice.14
3.2 Individuals
Agents decide where to work each period given wages and the learning opportunities across
firms. Let Z denote the set of all possible vectors of coworkers. The expected present value
of earnings for an agent with knowledge z is given simply by
V (z) = maxz∈Z
w (z; z) + β
∫ ∞0
V (z′) dG (z′|z, z) . (5)
Namely, each period individuals choose where to work to maximize their wage, plus the
future stream of wages given their learning opportunities in the firm. In general, equilibrium
wages adjust so that workers are indifferent about working in a set of firms. The competitive
labor market assumption implies that workers with a given z will obtain the same value,
V (z), independent of where they work. Hence, the present value of earnings of a worker
does not depend on her current coworkers. Furthermore, since firms take the wage schedule
as given, it must be the case that if a firm wants to hire a vector of workers (z, z) then the
wage schedule must capture what it would cost to hire those workers. The wage schedule
14Note that the firm is choosing both the type of workers, zi, and the number of workers n. Togetherthese choices determine the vector z.
22
must therefore satisfy
w (z; z) = V (z)− β∫ ∞
0
V (z′) dG (z′|z, z) , (6)
for any z, z chosen in equilibrium. A simple implication is that for any z, z′
w (z; z)− w (z; z′) = −β[∫ ∞
0
V (z′) dG (z′|z, z)−∫ ∞
0
V (z′) dG (z′|z, z′)]. (7)
Namely, firms with distinct sets of employees pay different wages to identical individuals
to compensate for differences in their learning. If an individual learns a lot at a firm, the
firm can pay a low wage and still attract the worker. In this sense, wages incorporate
compensating differentials in learning.
3.3 Labor Market Clearing and Free Entry
Let B (z) be the fraction of workers with knowledge no greater than z. For any vector z,
let N (z, z) denote the number of elements of z that are weakly less than z. Labor market
clearing requires that for each z,
B (z) = m
∫a
N (z (a) , z) dA (a) , (8)
where m denotes the mass of firms in the economy.
Free entry requires that ∫a
[π (a)− c] dA (a) = 0. (9)
3.4 The Distribution of Knowledge
Given the choices of firms, we can define O (z|z) : Z×Z → [0, 1] to be the fraction of workers
with knowledge z that, in equilibrium, have a vector of coworker knowledge that is strictly
dominated by the vector z.15 Then the fraction of workers with knowledge no greater than
15It is possible the equilibrium is such that many teams contain workers with knowledge z. In that case,it must be that multiple teams of coworkers maximize (5). In the special case in which the solution to themaximization in (5) is unique for each x, then O (z|x) would be degenerate with a mass point at z chosen
23
z next period are those who are born with knowledge weakly less than z, and those whose
interactions with coworkers leaves them with knowledge weakly less than z. Namely,
B (z) = δB0 (z) + (1− δ)∫x
∫z∈Z
G (z|x, z) dO (z|x) dB (x) . (10)
3.5 Equilibrium
A stationary competitive equilibrium consists of a wage schedule w, a value function V , a
mass of firms m, firm choices z (a) and profit π(a), a coworker vector set Z, and a distribution
of worker knowledge B, such that
1. V and w satisfy (5) and (6);
2. z (a) and π(a) solve (4), namely, the choice of a team of workers maximizes the profit
for a firm with technology a taking the wage schedule as given;
3. The labor market clears for each z, so (8) is satisfied;
4. The free entry condition (9) holds;
5. The law of motion for B in (10) is satisfied.
3.6 Characterizing Equilibrium
The methodology we use below in Section 4 requires the value function V (z) to be strictly
increasing. Here we provide one set of conditions that are sufficient to give rise to this
property. In particular, we impose some structure on the functions F and G. We state these
properties in three assumptions. Throughout, when we compare two ordered vectors of the
same length, z1 < z2 means that each element of z2 is weakly greater than the corresponding
element in z1, and at least one element is strictly greater.
Assumption 1 F (z, a) is strictly increasing in each element of z: z1 < z2 implies F (z1, a) <
F (z2, a).
by individual x, z (x), and so the integral in (10) would be∫xG (z|x, z (x)) dB (x).
24
Assumption 2 G is strictly decreasing in z and z: z1 < z2 implies that G (z′|z, z1) >
G (z′|z, z2) for any z, z′, and z1 < z2 implies that G (z′|z1, z) > G (z′|z2, z) for any z, z′.
Assumption 3 There is free disposal of knowledge.
The first assumption implies that more knowledgeable individuals always have an absolute
advantage in production. The second assumption is that if two individuals have the same
coworkers, the one with more knowledge this period will have stochastically more knowledge
next period. It also says that if two individuals have the same knowledge, the one with more
knowledgeable coworkers will have stochastically more knowledge next period.
These assumptions are sufficient to deliver the following results:
Lemma 1 Suppose there is a firm with productivity a such that (z, z) = z(a). Then for each
z1 > z2 it must be that w (z1, z) > w (z2, z).
Proof. First, free disposal of knowledge ensures that V is weakly increasing. Second, the
fact that G is decreasing in z implies that w (z, z) is weakly decreasing in z. Finally, toward
a contradiction, suppose there was a z1 > z2 such that w (z1, z) ≤ w (z2, z). Then the firm
should hire z1 instead of z2. It would strictly increase output, it could pay that worker a
weakly lower wage, and it could weakly lower the wage of all other workers.
Proposition 1 V (z) is strictly increasing in z.
Proof. For any wage schedule, the operator T V (z) = maxz∈Zw (z, z)+β∫∞
0V (z′) dG (z′|z, z)
is a contraction because it satisfies Blackwell’s sufficient conditions. To show that the V is
strictly increasing, it is sufficient to show that if V is weakly increasing, T V is strictly in-
creasing. To see this, consider z1 < z2. Market clearing ensures that there is a firm that
hires z1, and let z1 be the coworkers of z1 in at least one such firm. Then this, along with
25
Lemma 1, implies
T V (z1) = w (z1, z1) + β
∫ ∞0
V (z′) dG (z′|z1, z1)
< w (z2, z1) + β
∫ ∞0
V (z′) dG (z′|z1, z1)
≤ w (z2, z1) + β
∫ ∞0
V (z′) dG (z′|z2, z1)
≤ maxz∈Z
w (z2, z) + β
∫ ∞0
V (z′) dG (z′|z2, z)
= T V (z2) ,
where the first inequality follows from Lemma 1 and the second inequality from the as-
sumption that G (·|z, z) is decreasing in z and the presumption that V is weakly increasing.
Proposition 2 z1 < z2 implies that w (z, z1) > w (z, z2).
Proof. This follows directly from the assumption that G is decreasing in z, Proposition 1,
and (7).
Proposition 3 Within a team, a worker that earns a higher wage has more knowledge.
Proof. Consider two workers in the same team, with respective knowledge z1 > z2. Let z
denote the vector of the rest of their coworkers. Then we have that
w (z2, (z1, z)) < w (z1, (z1, z)) < w (z1, (z2, z))
where the first inequality follows from Lemma 1 and the second inequality follows from
Proposition 2.
Finally, we show how a worker’s wage is related to her marginal product. Firms choose
a vector of workers z to maximize profits. Hence, they solve
π (a) = maxn,{zi}ni=1
F (z; a)−n∑
j=1
w (zj, z−j) .
26
Optimality implies∂
∂ziF (z; a)−
∑j 6=i
∂w (zj, z−j)
∂zi=∂w (zi, z−i)
∂zi.
The marginal cost to a firm of having its ith worker have a bit more knowledge is ∂w(zi,z−i)∂zi
.
The marginal benefit equals the sum of its marginal product and the change in wages the
firm must pay its other workers.
Since (6) must hold for any z, we can differentiate with respect to coworker i’s knowledge
to get∂w (zj, z−j)
∂zi= −βdE [V (z′) |zj, z−j]
dzi.
We can thus write the optimal condition for the firm as
∂w (zi, z−i)
∂zi=
∂
∂ziF (z; a) + β
∑j 6=i
d
dziE [V (z′) |zj, z−j] .
Hence, the marginal value of a worker’s knowledge to the firm reflects both the marginal
product of the knowledge and the marginal increase in coworkers’ learning.
Our theory of learning in a competitive labor market provides a decomposition of the
wage of a worker into two components, the competitive value they obtain in the market as a
function of their knowledge and a compensating differential for learning. Hence, conditional
on the team, a worker’s wage is increasing in her knowledge. Furthermore, otherwise identical
workers in teams with more knowledgeable coworkers earn less, since they learn more and so
can expect larger future increases in wages. In the next section we use the structure of the
model to identify individual worker knowledge using data on wages and team composition
which, in turn, allows for the estimation of the learning function, G (z′|z, z).
4 Structural Estimation
We now turn to a structural estimation of the amount of learning within teams using the
theory we developed in the previous section. One of the key problems interpreting the results
in Section 2 is there is no one-to-one mapping between wages and knowledge. In order to
go beyond reduced-form relationships between the distribution of wages and wage growth
27
and determine the implications of our findings for learning, we need a theory that allows
us to map one into the other. We use the theory developed in Section 3 to do so. Our
main objective is to estimate the “learning function” G(·). Below we describe a strategy to
recover G(·) from panel data that includes teams’ wages, and implement our strategy using
the German data.
Heuristically, our method exploits two dimensions of the data. First, it uses the within-
team coworker pay differentials we observe in repeated cross-sections to back out the worker
types z which are consistent with a particular G(.). That is, compensating differentials
reveal z given a learning function G(.). Second, it uses the intertemporal dimension of the
resulting panel dataset of worker types to estimate G(.). That is, the dynamic evolution of
zi projected on z−i identifies the learning function G(.).
4.1 Identifying Learning Parameters
Our identification strategy requires a panel of at least two years of matched employer-
employee data that includes wages. We rely only on the worker’s Bellman equation,
V (z) = w (z, z) + βE [V (z′) |z, z] (11)
which is the result of the worker’s maximization. Equation (11) depends on the assumptions
of stationarity, perpetual youth, competitive labor markets, and complete financial markets
(or linear utility).16 However, we do not need to place any assumptions on the set of firms
that are active, or features of the technologies that firms use beyond those which guarantee
that V (z) is increasing (e.g. Assumptions 1 to 3). The set of technologies and firms in the
economy determine the set of teams we observe in equilibrium, but our strategy simply uses
the set of observed teams.
We first note that z does not have a natural cardinality. We are therefore free to choose
a convenient one: If V (z) is the value function in the current equilibrium, we choose a
16Our approach allows for a number of generalizations. For example, if markets are so incomplete thatagents cannot save or borrow, we can simply replace the current return in (11) with a known increasing andconcave function of the wage.
28
cardinality of z so that V (z) = z. Then, (11) becomes
z = w (z, z) + βE [z′|z, z]
= w (z, z) + β
∫ ∞0
z′dG (z′|z, z)
or
zi = wi + βE [z′i|zi, z−i] (12)
= wi + β
∫ ∞0
z′idG (z′i|zi, z−i) .
Our strategy hinges on the following two observations. First, if we know, for each worker
i, z′i, zi, and z−i, we can directly identify G. Conversely, if we know G, we can invert (12)
and solve for zi as a function of a worker’s wage and the wages of her coworkers; for a team of
size n, (12) for each of the n team members delivers a system of n equations in n unknowns
(zi for each team member). Together these equations provide several moment conditions
that can be used to identify G using GMM.
Operationally, we choose a functional form for G (z′|z, z; θ), with parameters θ, and we
calibrate β externally. Starting from period t, we can decompose next period’s knowledge,
z′, into expected and unexpected components. Namely,
z′i = E (zi, z−i) + εi, (13)
where E (zi, z−i) is the conditional expectation and εi is the expectational error. We then use
the moment conditions built from E [εi|zi, z−i] = 0. Below we specialize to the case where
E (zi, z−i) = E [z′i|zi, z−i] is a linear combination of several moments {mk (zi, z−i)}Kk=1, so
that E (zi, z−i) =∑K
k=1 θkmk (zi, z−i). In such a case, we would have K parameters {θk} and
K natural moment conditions
E [mk (zi, z−i) εi] = 0, k = 1, ..., K. (14)
Formally, if a team has n workers, then given θ and w, (12) provide n equations for the n
29
unknowns of {zi}. Therefore given the wages wt and a vector of team assignments rt, we
can construct Z (wt, rt, θ) to be the I × 1 vector of all workers’ knowledge at t, where I is
the number of individuals in the data. Given this, we can construct M (wt, rt, θ) to be the
I ×K matrix of moments so that the i, k entry of M (wt, rt, θ) is mk (zi, z−i) where zi, z−i
are the knowledge of i and her coworkers implied by the wages, wt, the assignment rt, and
parameters θ. Then the k moments conditions (14) can be stacked as
E[M (wt, rt, θ)
T (Z (wt+1, rt+1, θ)−M (wt, rt, θ) θ)]
= 0. (15)
We solve for θ using an iterative two-step procedure that exploits the panel structure of
our data along with the intertemporal restrictions inherent in the learning function (13).
1. We first guess parameters θguess.
2. Given this guess, we can back out the types z in a team solely from information on
wages.17 In other words, we invert (12) to solve for all workers’ knowledge, Z (wt, rt, θguess).
We do this by finding a fixed point z of the operator
T (z) =
{wi + β
∫z′dG (z′|zi, z−i; θguess)
}i
.
We can then use the wages at time t + 1 to solve for all workers knowledge at t + 1,
Z (wt+1, rt+1, θguess). With this, we have the implied values of zi, z−i, and z′i for each
worker.
3. We then use these knowledge levels to estimate θ using a linear regression
zit+1 =K∑k=1
θkmk (zit, z−it) + εit.
4. If our estimated θ = θguess then we have found a fixed point. This fixed point is a
solution to (15). If not, we use θ to update our guess and go back to step 1.
17As discussed in the theory section above, the vector of types z is the solution to the firm problem in (4).Here, we simply use the composition of teams observed in the data.
30
A proof of identification then amounts to guaranteeing that this procedure has a unique
fixed point. While we currently do not have such a proof, this method has always uncov-
ered the true parameter values in Monte Carlo simulations and has always converged when
implemented on the matched German data.
4.2 Results
Guided by our reduced-form findings, we focus on the following parametric form for the
conditional expectation, that implicitly determines G(·),
E [z′i|zi, z−i] =
∫ ∞0
z′idG (z′i|zi, z−i; θ) =1
n− 1
∑j 6=i
ziΘ
(zjzi
)(16)
where n is the worker’s team size and Θ (·) is a weakly increasing function. Below we let
Θ (·) be piece-wise linear. We focus on the expected value because this is the only feature of
the function G needed to invert the Bellman equation and recover the workers’ knowledge.
This functional form could be motivated by a variant of the model in Lucas (2009) in which a
worker is equally likely to attain knowledge from any coworker, and the function Θ describes
how the worker’s learning depends on the gap between the worker and the coworker. In
contrast to Lucas (2009), however, here agents only learn from coworkers, not from the
whole population.
We begin by studying the parametric learning function
Θ (x) =
1 + θ0 + θ+ (x− 1) , x ≥ 1
1 + θ0 + θ− (x− 1) , x < 1,
or
E [z′i − zi|zi, z−i] = θ0zi +1
n− 1
θ− ∑zj<zi
(zj − zi) + θ+∑zj≥zi
(zj − zi)
. (17)
This learning function allows for asymmetric learning from types zj for a worker zi depending
on whether zj > zi or vice versa. It also allows for a constant time trend in skill growth, θ0.
It is also scale-invariant (apart from the constant) since we divide the second term by n− 1.
In updating our guess for θ = {θ+, θ−, θ0}, we make use of the linear structure of the
31
learning function and regress z′i−zi on zi,1
n−1
∑zj<zi
(zj − zi), and 1n−1
∑zj>zi
(zj − zi). Note
that all the information used in this regression is constructed purely from the cross-sectional
dimension of the data in the first step.
For this baseline learning function, we report our parameter estimates along with the
associated standard errors in Table VIII.18 Choosing the expected present value of earnings
as the cardinality of z allows for a natural interpretation of these estimates. In particular,
the point estimates suggest that raising the average expected present value of earnings of a
worker’s more-highly-paid coworkers by 100 Euros raises that worker’s expected present value
of earnings over the next year by 7 to 9 Euros times the share of more-highly-paid workers.
In turn, doing so for the coworkers that are less well paid only increases expected present
value of earnings by 1 to 4 Euros times the share of less-well-paid workers. This implies that
learning is not very sensitive to the knowledge of less-well-paid coworkers. Furthermore,
since both θ+ and θ− are positive, the results imply that learning for individuals at the
bottom of the distribution of knowledge in a team is large relative to learning for individuals
at the top of the distribution, and since θ+ > θ−, those at the bottom are more sensitive
to the knowledge of their coworkers. Naturally, we find somewhat larger effects for the
narrower team definition.19 Clearly, these point estimates are very much consistent with the
reduced-form patterns discussed in the previous section.
Finally, while θ0 is precisely estimated and strictly positive it is very small for both team
definitions. One reason for why we find essentially no trend growth in wages beyond what
arises from learning is that the average real wage growth during the period covered in our
dataset was very limited, as discussed above.
Before we turn to richer specification we briefly verify that the vector {z}t we construct
indeed picks up the expected present value of income. To this end, we select workers that
are in one of the teams we observe in 1999 and then are employed full-time in each of the
following years until the end of our sample in 2010. We construct their realized present
value of income using a 5% discount rate and assuming that their final wage in 2010 extends
18The only other parameter we need to choose is β which we set to .95 (annual) here. Our results are notparticularly sensitive to this choice.
19One important observation across all specifications we have worked with is that θ0 is substantially largerfor team definition 2.
32
Team Definition
1 2
θ+ .0673 .0882(.0004) (.0006)
θ− .0111 .0370(.0003) (.0004)
θ0 .0039 .0060(.00004) (.00003)
Observations 4763089 4590120
GMM standard errors in parentheses
Table VIII Parametric estimation results for the learning function (17).
forever into the future. We then regress the realized present value of income on a constant
and the zi,1999 constructed in the course of estimating θ. The resulting regression coefficient
is 1.05 which is reassuring.
A more flexible learning function The next step is to generalize the specification of
G to allow for additional flexibility in order to capture potential nonlinearities in coworker
learning. Hence, we specify the learning function to be defined by Θ(1) = θ0 and
Θ′(x) =
θ++, x ≥ 1 + b
θ+, 1 ≤ x < 1 + b
θ−, x < 1
.
Θ is a continuous and piecewise linear function with kinks at x = 1 and x = 1 + b, and
corresponds to the conditional expectation
E [z′i − zi|zi, z−i] = θ0zi+1
n− 1
{ ∑zj<zi
θ− (zj − zi) + (18)
∑zj>zi
[θ+ (zj − zi) + 1zj>(1+b)zi
(θ++ − θ+
)(zj − zi − bzi)
]},
33
Team Definition
1 2
θ+ .0844 .1091(.0011) (.0009)
θ++ .0668 .0853(.0004) (.0006)
θ− .0102 .0346(.0003) (.0004)
θ0 .0038 .0058(.00004) (.00003)
Observations 4763089 4590120
GMM standard errors in parentheses.
Table IX Estimates for the learning function (18).
where 1 denotes the indicator function. This piece-wise linear function incorporates addi-
tional flexibility yet still allows us to linearly project z′− z on the right-hand-side to update
the four parameters of the learning function {θ0, θ−, θ+, θ++}.20 When implementing this
learning function, we set b = 10%.
The results are reported in Table IX. Our estimates for θ0 and θ− are hardly changed by
the modification of the learning function for either team definition. That is, as before, chang-
ing the knowledge of those team members with lower type affects an individual’s expected
learning little in comparison with those team members with more knowledge. Likewise,
the estimated trend growth remains minimal. Our results indicate that θ+ > θ++, so the
marginal returns (in terms of future wage growth) to improving the knowledge of those above
in the wage distribution appears to be somewhat larger for those in closer proximity in the
distribution of knowledge. Just like in Table VIII, we find that the effects are stronger for
the narrower team definition. Workers appear to benefit more from those coworkers that
work in the same occupation. Similarly, improving those below in the skill distribution has
far larger positive effects when they also work in the same occupation.
20In light of our previous findings and due to computational limitations we have thus far restricted thelearning function to take a single parameter for the group zj < zi. This restriction could, in principle, berelaxed.
34
We conclude this subsection with three short exercises which cast light on the quantitative
importance of coworker learning and its interplay with how teams are formed. We run all
three exercises in the context of both the basic learning function (17) and the piece-wise
linear learning function in (18) for both team definitions.
4.2.1 Investment in Knowledge
An individual receives compensation in two ways: with wages and with knowledge. We can
use our estimated framework to gauge the quantitative importance of coworker learning in
the economy by comparing the value of knowledge flows to the value of wage payments.
Specifically, we compute the value of the annual flow of knowledge, β (z′i − z), where next
period’s knowledge z′i is given by equation (13) and compute its simple pooled average across
all individuals and years in our sample. We then subtract the pure trend component βθ0z
from this and contrast it with the pooled average of wages.
We present the results for both team definitions and both learning functions in Table
X. Coworkers knowledge flows account for roughly 4-9% of the average flow value workers
receive, with the remainder given by the wage. This is the total value of knowledge flows each
worker receives relative to the knowledge flows she would attain from working on a team of
identical workers. In other words, agents invest on average 4-9% of their total compensation
in learning from others at work.21 While the results are fairly similar across the two different
specifications, they are substantially larger for team definition 1. The reason is simply that
there is more within-team knowledge dispersion with team definition 1: Workers do not
only learn from their coworkers in the same occupation but also from everyone else in the
establishment.
Naturally, there is substantial heterogeneity in this breakdown across the knowledge
distribution. For the basic learning function and team definition 1, knowledge flows amount
to 16.0% for the bottom decile of the knowledge distribution reflecting the substantial room
for learning at the bottom. In turn, it becomes negative at the top decile, dropping to -.02%,
reflecting the mild negative effect of having mostly less knowledgeable coworkers. Naturally,
21If individuals do, in fact, learn from coworkers with the same knowledge, then some of the trend com-ponent would also represent knowledge flows. We note that when we do not subtract the trend componentthis number rises substantially, to approximately 17% in all four cases.
35
Team Definition
Learning Function 1 2
Basic Learning 8.27% 3.64%
Piece-Wise Linear Learning 8.68% 4.21%
Notes: First row refers to results for learning function (17) and second row to results for learning function(18).
Table X Coworker learning as a fraction of compensation.
this is somewhat less pronounced for the second team definition where the bottom decile
receives 6.2% of their flow compensation in terms of learning, dropping to -1.2% at the top.
The numbers are very similar for the piece-wise linear learning function.
4.2.2 The Role of Sorting
In equilibrium, the team selected by a firm produces both output and knowledge. As a
result, the sorting of workers across firms reflects both of these goals. How does equilibrium
sorting affect the value of learning within teams? How much would the total value of learning
change if teams were formed randomly?
To assess the role of coworker sorting for learning, we conduct a simple experiment
where we randomly reshuffle workers across existing teams in the final coworker year in our
sample, 2009. Specifically, we fix the vector of estimated zi,09 and then randomly reallocate
the existing 2009 team IDs. This gives each worker a random set of coworkers from the
population of workers while keeping the team size distribution unaltered. We then compute,
for all workers, the resulting counterfactual conditional expectation E(zi, z
cf−i)
for her type
in 2010, zi,10, where zcf−i is worker i’s counterfactual peer group in 2009.
We then contrast the average of the counterfactual conditional expectation with the
average of the factual conditional expectation, E (zi, z−i). We report the results in Table XI
which shows that, under random sorting, the average growth in z rises between 64% and
91%. We highlight that the associated losses in the value of output must weakly exceed
these knowledge gains in an efficient allocation. The results are naturally larger for team
36
Team Definition
Learning Function 1 2
Basic Learning 64.91% 91.14%
Piece-Wise Linear Learning 64.48% 88.07%
Notes: First row refers to results for learning function (17) and second row to results for learning function(18).
Table XI Increase in knowledge growth from random assignment.
definition 2 because within-team knowledge dispersion is smaller in more narrowly defined
teams.
These findings suggest that workers are allocated to teams in a way that hinders knowl-
edge flows relative to a random sorting benchmark. We interpret this as reflecting supermod-
ularity in the production function, which results in positive assortative matching of workers
in teams. Intuitively, since the learning function is increasing and convex over much of its
domain, learning benefits from large differences between coworkers. In contrast, production
benefits from small differences between team members due to knowledge complementarity in
the production function. In sum, these findings seem to suggest a tension between the con-
temporaneous requirements on the production side and the dynamic returns from coworker
learning.
That being said, there could be reasons for assortative matching aside from production
complementarities that are outside of the model. It could be, for example, that geographic
or other frictions in team formation lead to the assortative matching patterns.
4.2.3 Inequality
Two individuals with the same knowledge and same present value of earnings might earn
different wages because they work on teams with different opportunities to learn. In other
words some of the wage differences reflect compensating differentials for learning rather than
unequal compensation.
We now ask how variation in log wages compares to variation in log compensation,
37
Team Definition
Learning Function 1 2
Basic Learning -33.47% -19.56%
Piece-Wise Linear Learning -33.53% -19.70%
Notes: We first compute the variance of log wages and then the variance of log overall compensation.We report the percentage decline when moving from the first to the second measure of inequality. Firstrow refers to results for learning function (17). Second row for learning function (18).
Table XII Decline in inequality when taking learning into account.
where compensation is measured as (1 − β)zi = wi + βE[z′i − zi|zi, z−i]. The unconditional
variance of log wages in our data is approximately .13, while the unconditional variance in
log compensation is approximately .10 or .09, depending on the team definition (and similar
for both learning functions).
Table XII reports the difference between the variance of log wages and the variance in
log compensation as a fraction of the variance of log wages. By this measure, inequality in
compensation is one-fifth to one-third smaller than wage inequality. Note that the result
that compensation inequality is smaller is not mechanical; it reflects the fact that those with
less knowledge receive a larger share of their compensation in the form of learning.
4.2.4 The Role of Firm Size
Thus far, we have imposed that learning is independent of team size. To investigate the role
of firm size, we now modify the basic learning function (17) to allow for size effects. We let,
E [z′i − zi|zi, z−i] = θ0zi +1
(n− 1)k
θ− ∑zj<zi
(zj − zi) + θ+∑zj≥zi
(zj − zi)
. (19)
Specifically, we estimate θ = {θ+, θ−, θ0} separately for each k ∈ {.6, .65, .7, .75, .8, .9, 1, 1.1, 1.2}.
Note that k < 1 (k > 1) implies increasing (decreasing) returns to scale. To evaluate the
relative performance of the different specifications we run, after we have found a fixed point,
38
the regression
zi,t+1 = β0zi,t + β−∑
zj,t<zi,t
zj,t − zi,t(n− 1)k
+ β+∑
zj,t≥zi,t
zj,t − zi,t(n− 1)k
(20)
and compute the associated R2.
We plot the resulting R2 relative to the R2 for the k = 1 case in figure 2. The results
suggest that, at the establishment level, coworker learning is indeed independent of size.
At the same time, the specification with k = 0.8 obtains the best fit for the narrower team
definition.22 This suggests non-trivial returns to scale in learning from coworkers in one’s own
occupation. We interpret this as reflecting gains from diversity in learning from coworkers
in the same occupation.
0.6 0.7 0.8 0.9 1 1.1 1.20.86
0.88
0.9
0.92
0.94
0.96
0.98
1
1.02
1.04
Est x Occ TeamsEstablishment Teams
Notes: We plot the R2 of the regression (20) after the estimation of the learning function (19) for eachk in {.6, .65, .7, .75, .8, .9, 1, 1.1, 1.2}. We express the R2 relative to the resulting R2 for k = 1 for bothteam definitions.
Figure 2 Returns to Team Size
4.3 Extensions and Alternative Interpretations
Our methodology can be extended to a variety of other settings. We can extend our frame-
work to have the production function, the learning function, or the value placed on knowledge
22For this specification, we obtain θ+ = .030 and θ− = .012, smaller than the results for k = 1 reported intable VIII, column 2. However, once we multiply by n−1
(n−1)k(with n denoting average team size) to offset, in
a simple way, for the mechanical effects of the different specification, we obtain θ+ = .131 and θ− = .005.
39
depend on observable worker characteristics aside from knowledge. For example, it may be
that younger workers’ knowledge growth is more sensitive to the knowledge of their peers
than older workers. In fact, this extension allows us to relax the baseline assumptions of
complete markets as well as perfect information about one’s knowledge and the knowledge
of coworkers. We can also relax the assumption of perfect competition in the labor market
and incorporate search frictions and other adjustment costs. Finally, our methodology can
be used in a setting in which knowledge is multidimensional. The common theme across
these extensions is that, conditional on the composition of one’s team and other observables,
there is a bijective mapping from knowledge to values.
Appendix A presents all these extensions of our methodology. As we make clear there,
all of the extensions require additional data beyond team composition and wages. We do not
currently have the data necessary to estimate learning functions in settings with incomplete
markets, search frictions, or multidimensional knowledge. Nevertheless, these extensions
should be useful to guide future researchers with access to richer datasets. In Appendix A
we do estimate an extension in which the learning function depends on her age. Naturally,
we find that the young learn more overall. Furthermore, we find that both the young and
the old learn more from the young, but the discrepancy is starker for the young. That is,
the young learn disproportionately from the young, closely in line with our reduced-form
findings.
5 Conclusion
We presented evidence suggesting learning from coworkers is significant. Our results are intu-
itive and natural. Workers learn from those more knowledgeable than they are, particularly
if they have the same occupation; knowledge growth is more sensitive to more-knowledgeable
coworkers than to less-knowledgeable coworkers. Individuals—especially those that are
younger and less knowledgeable—invest a substantial fraction of their compensation into
knowledge growth. As a result, inequality in wages overstates inequality in total compensa-
tion.
We hope that these findings are useful in encouraging work on more empirical research
40
with learning from coworkers at its core. Our baseline theory, although general in its specifi-
cation of technology and existing complementarities in production, does assume that workers
are simple income maximizers and that labor markets are competitive. We relax some of
these assumptions in the final section, but do not have enough data to incorporate them in
our estimation. It would be valuable to refine our estimates of the value of learning in the
workplace which richer datasets that implement these extensions.
Finally, the importance of learning from coworkers implied by our findings suggests large
aggregate consequences of any economy-wide change that affects the composition of teams.
Many such changes come to mind, like, for example, technological improvements in informa-
tion and communication technology, other forms of skilled-biased technical change, as well
as increased spatial segregation. Our results underscore the importance of studying these
and other well-known trends in the economy from the point of view of their effect on team
formation and the resulting learning from coworkers. Doing so would, however, require spec-
ifying and estimating a production function, which the rest of our analysis does not require.
We therefore leave these exercises for future research.
41
References
Akcigit, Ufuk, Santiago Caicedo, Ernest Miguelez, Stefanie Stantcheva, and Vale-
rio Sterzi, “Dancing with the Stars: Innovation Through Interactions,” Technical Report,
National Bureau of Economic Research 2018.
Alvarez, Fernando E, Francisco J Buera, and Robert E Lucas Jr, “Idea Flows,
Economic Growth, and Trade,” Technical Report, National Bureau of Economic Research
2013.
Anderson, Axel and Lones Smith, “Dynamic Matching and Evolving Reputations,” The
Review of Economic Studies, 2010, 77 (1), 3–29.
Buera, Francisco J and Ezra Oberfield, “The Global Diffusion of Ideas,” Econometrica,
2020, 88 (1), 83–114.
Burdett, Ken and Melvyn Coles, “Equilibrium Wage-Tenure Contracts,” Econometrica,
2003, 71 (5), 1377–1404.
Burstein, Ariel T and Alexander Monge-Naranjo, “Foreign Know-how, Firm Control,
and the Income of Developing Countries,” The Quarterly Journal of Economics, 2009, 124
(1), 149–195.
Caicedo, Santiago, Robert E Lucas Jr, and Esteban Rossi-Hansberg, “Learning,
Career Paths, and the Distribution of Wages,” Technical Report, National Bureau of
Economic Research 2016.
Cornelissen, Thomas, Christian Dustmann, and Uta Schonberg, “Peer Effects in
the Workplace,” American Economic Review, 2017, 107 (2), 425–56.
Davis, Steven J. and Till von Wachter, “Recessions and the Costs of Job Loss,” Brook-
ings Papers on Economic Activity, 2011, 43 (2), 1–72.
42
Eberle, Johanna, Alexandra Schmucker, and Stefan Seth, “Example programs for
data preparation of the Sample of Integrated Labour Market Biographies for Stata - Cre-
ating cross-sectional data and biographical variables,” Technical Report 2013.
Garicano, Luis and Esteban Rossi-Hansberg, “Organization and inequality in a knowl-
edge economy,” The Quarterly Journal of Economics, 2006, 121 (4), 1383–1435.
Herkenhoff, Kyle, Jeremy Lise, Guido Menzio, and Gordon Phillips, “Worker
Mobility and the Diffusion of Knowledge,” Technical Report, mimeo 2018.
Hopenhayn, Hugo A, “Entry, exit, and firm dynamics in long run equilibrium,” Econo-
metrica: Journal of the Econometric Society, 1992, pp. 1127–1150.
Jager, Simon and Jorg Heining, “How substitutable are workers? evidence from worker
deaths,” Technical Report 2019.
Jovanovic, Boyan, “Misallocation and Growth,” American Economic Review, 2014, 104
(4), 1149–71.
and Glenn M MacDonald, “Competitive Diffusion,” Journal of Political Economy,
February 1994, 102 (1), 24–52.
and Rafael Rob, “The Growth and Diffusion of Knowledge,” Review of Economic
Studies, October 1989, 56 (4), 569–82.
Klosterhuber, Wolfram, Joerg Heining, and Stefan Seth, “Linked-employer-
employee-data from the IAB: LIAB longitudinal model 1993-2010 (LIAB LM 9310),”
FDZ Datenreport. Documentation on Labour Market Data 201308 en, Institut fuer
Arbeitsmarkt- und Berufsforschung (IAB) February 2014.
Konig, Michael D, Jan Lorenz, and Fabrizio Zilibotti, “Innovation vs. imitation and
the evolution of productivity distributions,” Theoretical Economics, 2016, 11 (3), 1053–
1102.
Lucas, Robert E., “Ideas and Growth,” Economica, 02 2009, 76 (301), 1–19.
43
and Benjamin Moll, “Knowledge Growth and the Allocation of Time,” Journal of
Political Economy, February 2014, 122 (1), pp. 1–51.
Luttmer, Erzo GJ et al., “An Assignment Model of Knowledge Diffusion and Income
Inequality,” Federal Reserve Bank of Minneapolis Research Department Staff Report, 2014,
715.
Mas, Alexandre and Enrico Moretti, “Peers at Work,” American Economic Review,
2009, 99 (1), 112–45.
Nix, Emily, “Learning Spillovers in the Firm,” Technical Report, mimeo 2015.
Perla, Jesse and Christopher Tonetti, “Equilibrium Imitation and Growth,” Journal of
Political Economy, 2014, 122 (1), 52 – 76.
, , and Michael E Waugh, “Equilibrium Technology Diffusion, Trade, and Growth,”
Technical Report, National Bureau of Economic Research 2015.
Postel-Vinay, Fabien and Jean-Marc Robin, “Equilibrium Wage Dispersion with
Worker and Employer Heterogeneity,” Econometrica, 2002, 70 (6), 2295–2350.
Sampson, Thomas, “Dynamic Selection: An Idea Flows Theory of Entry, Trade and
Growth,” The Quarterly Journal of Economics, 2015.
Stole, Lars A and Jeffrey Zwiebel, “Intra-firm bargaining under non-binding contracts,”
The Review of Economic Studies, 1996, 63 (3), 375–410.
44
Appendix
A Model Extensions and Additional Interpretations
In this section, we develop a number of extensions. We first let the production function, the
learning function, or the value placed on knowledge depend on observable worker character-
istics aside from knowledge. We then relax the baseline assumptions of complete markets,
perfect information about one’s knowledge and the knowledge of coworkers, perfect com-
petition in the labor market, and incorporate search frictions and other adjustment costs.
Finally, we show that the methodology can be used in a setting in which knowledge is multi-
dimensional. We conclude the section by estimating the learning function in an environment
in which a worker’s learning depends on her age.
A.1 Incorporating Other Observables
We now study a setting in which either the production function or the learning function
depends on worker characteristics aside from knowledge. These characteristics may or may
not evolve endogenously. We require the characteristics to be observable.
An individual is described by knowledge z and a vector of observable characteristics
x. These evolve according to a joint Markov process. For example, x could consist of an
individual’s age, schooling, occupation, location, etc. Denote the joint Markov process by
G (z′, x′|z, x, {z, x}) . The value function for an individual with state z, x is V (z;x) satisfying
V (z, x) = w (z, x; {z, x}) + β
∫ ∫V (z′, x′)G (dz′, dx′|z, x, {z, x}) .
Thus, both the production function and the learning function can depend on the individual
characteristics and those of her coworkers. The realized value function of worker i at time t
is
vit = wit + β
∫ ∫V (z′, x′)G (dz′, dx′|zit, xit, {zit, xit}) . (A.1)
The key step is to transform the learning function from the knowledge space to the value
1
space. We require that V is strictly increasing in z for each x, and therefore has a partial
inverse Z (v, x) that satisfies v = V (Z (v, x) , x) and z = Z (V (z, x) , x). Further, define the
learning function in the value space as
G (v′, x′|v, x, v, x) ≡ G(Z (v′, x′) , x′|Z (v, x) , x, Z (v, x) , x
),
where Z (v; x) is the vector of coworkers’ knowledge given their values and characteristics.
With this, we can write (A.1) as
vit = wit + β
∫ ∫v′G (dv′, dx′|v, x, v, x) . (A.2)
A.1.1 Algorithm
We now show that it is straightforward to extend the methodology described in the previous
section to estimate the function G. Let E be the conditional expectation,
E (v, x, v, x) ≡ E [v′|v, x, v, x] =
∫ ∫v′G (dv′, dx′|v, x, v, x) ,
so that the realized Bellman equation (A.2) can be written as
vit = wit + βE (vit, xit, v−it, x−it) . (A.3)
If we observed {v′i, x′i, vi, xi, v−i, x−i} for each worker, we could directly estimate E . Con-
versely, if we knew E and we observed {wi, xi, w−i, x−i} for each worker, we could solve for
{vi} for each team using the system of equations given by (A.3) for each team member.
We can again cast this in terms of a GMM estimation procedure. For example, suppose
we assume E takes the form of a linear combination of moments {mk (v, x, v, x)}Kk=1, so
E (v, x, v, x) =∑K
k=1 θkmk (v, x, v, x) . Next, define the expectational error term εit+1 to be
εit+1 ≡ vit+1 − E [vit+1|vit, xit, v−it, x−it] = vit+1 − E (vit, xit, v−it, x−it) .
We can then build moment conditions to estimate θ from E [εit+1|vit, xit, vit, xit] = 0. Our
2
natural moment conditions would then be E [εit+1mk (vit, xit, vit, xit)] = 0. Formally, given
θ, we can solve for {vit} using (A.3). Given the entire vector of wages w, observable charac-
teristics x, and team assignments r, let Υ (w, x, r, θ) be the corresponding values that have
been solved for using (A.3) so that {vit} = Υ (wt, xt, rt, θ) . Given this, we can construct
M (wt, xt, rt, θ) to be the I ×K matrix of moments so that the i, k entry of M (wt, xt, rt, θ)
is mk (vi, xi, v−i, x−i) where vi, v−i are the values of i and her coworkers implied by the
wages, wt, the observable characteristics xt, the assignment rt, and parameters θ. Then the
k moment conditions (14) can be stacked as
E[M (wt, xt+1, rt, θ)
T (Υ (wt+1, xt, rt+1, θ)−M (wt, xt, rt, θ) θ)]
= 0.
A.1.2 Incomplete Markets
In this section we show that with additional information about workers’ assets, one could
relax the assumption of complete markets. Suppose an individual faces a consumption-
savings problem in which she can only save in risk-free bonds, possibly including a constraint
on her asset position. In such a model, an individual with assets b and knowledge z chooses
consumption and a team of coworkers to maximize the present value of utility given by
V (b, z) = maxc,b′,z
u (c) + βE [V (b′, z′) |z, z, b] ,
subject to the budget constraint c = b + w (z, b, z) − b′
1+r, borrowing constraints, and the
constraint that there is a firm willing to hire that worker along with those coworkers.23 If,
as is natural, the value is increasing in knowledge for any level of assets, the value function
has a partial inverse, Z(v, b). We can then express the learning function, G (z′|z, z) in value
space as G(v′|v, b, v, b
).24 Subsuming the optimal choices, the Bellman equations can be
expressed as
vi = u
(bi + wi −
b′i1 + r
)+ βE
[v′i|vi, b, v, b
].
23The wage schedule would depend on the worker’s assets because workers with different levels of assetswould place different values on new knowledge.
24Formally, G(v′|v, b, v, b
)≡ G
(Z (v′, b′ (z, b)) |Z (v, b) , {Z (vj , bj)}j 6=i
), where b′ (z, b) is the continua-
tion assets chosen by the worker.
3
Thus, with data on worker assets (so that we observe bi and b′i) and a choice of utility
function, we could implement the methodology of Section A.1.1 to estimate G.
A.1.3 Imperfect Information
In this section we discuss a setting in which individuals have imperfect information about
their coworkers’ knowledge. We discuss two extreme benchmark cases. First, we assume
that information about each worker’s knowledge is imperfect but all workers and firms share
the same beliefs. Second, we assume each individual has perfect information about her own
knowledge but no information about the knowledge of her coworkers.
We begin with a model in which beliefs about each workers’ knowledge are imprecise
but symmetric at all times. Each individual has a knowledge z∗ that evolves stochastically
over time according to the distribution G∗(z∗|z∗, z∗). Suppose further that, conditional on
observables, beliefs about one’s knowledge can be summarized by a single number, zi. For
example, suppose that beliefs were normal, with mean zi and variance that depended only
on age, experience, and tenure. The wage a firm had to pay a worker would depend on the
beliefs about her knowledge as well as the compensating differential reflecting how much that
worker expected to learn (which depends on beliefs about her coworkers’ knowledge). Thus,
the wage can be expressed as w(z, x, z, x), where xi are the observables that are sufficient to
characterize the precision of beliefs. A worker’s Bellman equation can be expressed as
V (z, x) = w(z, x, z, x) + βE [V (z′, x′)|z, x, z, x] .
The true learning functionG∗ along with the information generated in the production/learning
process give rise to a Markov process characterizing the evolution of beliefs, G(z′, x′|z, x, z, x).
This, along with the Bellman equation, are the same as those in Section A.1, and so we can
implement the same approach.
We next turn to a model with asymmetric information. We consider an extreme case
in which workers have perfect information about their own type and no information about
their coworkers’ types.25 In such an environment, wages would not reflect compensating
25Here, we do not need to take a stand on firms’ beliefs.
4
differentials for learning. Therefore, an individual’s wage would perfectly reveal her type, i.e.,
there would be a one-to-one mapping from wages to knowledge. In this type of environment,
the reduced-form approach taken in Section 2 could be used to structurally identify the
learning function.
A.1.4 Search and Rent Sharing
In this section we extend the model to relax the assumption of perfect competition for
workers and incorporate search frictions and other forms of adjustment costs. Suppose
that a firm’s output depends on its productivity and its team of workers according to the
production function F (z, a). Compensation of each worker is determined by multilateral
bargaining. We impose two restrictions on the bargaining solution. First, surplus is split
among the workers and the firm period by period. Second, the bargaining solution is such
that each worker’s compensation is a function of her type, the vector of coworkers, and the
firm’s productivity, w (z, z, a).26 The firm solves a dynamic stochastic optimization problem
that incorporates vacancy posting costs, worker attrition, and any other adjustment costs
such as firing and training costs. The solution induces a Markov transition function that
characterizes the (endogenous) law of motion for each worker’s knowledge and the set of
future coworkers, G(a′, z′|a, z), which incorporates learning, hires, and separations.27
The present value of income for each worker and the present value of earnings for the
firm would satisfy, respectively,
V (z, z, a) = w (z, z, a) + βE [V (z′, z′, a′) |z, z, a] ,
Y (a, z) = π (a, z) + βE [Y (a′, z′) |a, z] ,
where π (a, z) is the current flow profit, incorporating revenue, less the wage bill, less the
cost of choosing the probability distribution over next period’s team.
The key step in our methodology is to express the learning functions in terms of values.
26Two standard examples are Stole and Zwiebel (1996) and multilateral Nash bargaining in which bar-gaining weights are either symmetric or depend only on workers knowledge and the firm’s productivity.
27This specification nests models such as Hopenhayn (1992) in which productivity evolves exogenously aswell as those that incorporate R&D.
5
In particular, if we can write the learning functions for a worker and firm as Gw (v′|v, v, y),
and Gf (y′|y,v) respectively, then we can express the realized Bellman equations as
vi = wi + βE [v′i|vi, v−i, yj] , (A.4)
yj = πj + βE[y′j|yj,v
]. (A.5)
With these in hand, we can recover the learning functions as follows: If we know Gw and Gf ,
we can recover the n + 1 unknowns {vi} for each of the n workers and yi for the firm with
the n+ 1 equations, (A.4) for each of the n workers and (A.5) for the firm. Conversely, if we
know vi, v′i, v−i, and yj for each worker as well as yj, y
′j, and v for each firm, we can directly
estimate Gw and Gf . We can thus follow the same iterative procedure outlined above to
estimate the learning functions.
For this procedure to work, we must be able to invert the vector of value functions.
Specifically, given a vector of values {v, y} for a team, we must be able to recover their
knowledge {z, a}. If static/dynamic peer effects are not too large, then the value functions
{V (z, z, a)} and Y (a, z) are jointly invertible because the Jacobian is diagonally dominant.28
What exactly would this recover? E [v′|v, v] incorporates information both about learning
from others and how others affect the evolution of one’s value in other ways (i.e, giving firms
incentive to hire new workers that are complementary to you, or separate from them, etc.).
Hence, we would obtain more of an estimate of dynamic peer effects than of learning.
To reiterate, the key feature of this economy that makes our methodology feasible is the
fact that we can invert value functions and recover knowledge of the team of coworkers.
28To be concrete, suppose the value functions are, in fact, invertible. Let Z (v, v, y) be the knowledgeof a worker with value v and coworkers summarized by v working for a firm with value y, and let A(y,v)be the productivity of a firm that has value y and a team summarized by the vector of values v. ThenGw (v′|v, v, y) and Gf (y′|y,v) are defined as follows. The law of motion G implies a conditional distributionfor each worker’s knowledge in the next period, Gw(z′|z, z, y), and a conditional distribution for the firm’sproductivity Gf (a′|a, z). To derive the expressions for Gw and Gf , we can simply plug Z (v, v, y) and A(y,v)into these conditional distributions and take expectations, namely,
Gw (v′i|vi, v−i, yj) ≡ Ev′−i,y′j
[Gw
(Z(v′i, v
′−i, y
′j
) ∣∣∣∣ Z (vi, v−i, yj) ,{Z (vk, v−k, yj)}k 6=i , A(yj , {vi, v−i})
)∣∣∣∣ vi, v−i, yj , v′i] ,Gf (y′|y,v) ≡ Ev′
[Gf (A(y′,v′)|A(y,v), {Z (vi, v−i, y)}i)
∣∣ y,v, y′] .
6
Future research could try to extend this methodology to situations in which a worker’s
compensation depends on more than the vector of knowledge in her team, for example, on a
worker’s past outside offers.
A.1.5 Multidimensional knowledge
The equations characterizing an individuals value function use the idea that the change in
one’s knowledge depends on the distribution of knowledge among one’s coworkers. We have
proceeded under the presumption that knowledge can be described by a scalar, z. Suppose
instead, that z represents a vector of individual characteristics (e.g. different dimensions
of knowledge). There are some restrictions on the environment and learning process under
which our baseline procedure would still be appropriate to identify the learning.29 Outside
of these special cases, it may be possible, with additional data, to allow for more general
patterns of learning. To be concrete, suppose that knowledge was two-dimensional, zi =
{zi1, zi2}, and the learning function took the form G(z′i1, z′i2|zi1, zi2, z−i1, z−i2). Suppose also
that we observed an additional choice of each worker in addition to compensation. Label this
choice x(zi1, zi2). Finally, suppose that the value function and choice function were jointly
invertible, i.e., we could express zi1 = Z1(vi, xi) and zi2 = Z2(vi, xi), then we could use a
version of our methodology to recover the law of motion G(v′, x′|v, x, v, x). Clearly, this
approach would not separately identify how each component of knowledge evolves, but it
would recover the value of learning.30
29For example, learning could be such that the change in one’s value function depends on the compositionof one’s coworkers’ value functions. In this case, it is natural to assume that the learning function dependson values, not on knowledge directly, since values provide a relevant summary of the vector of characteristicsz. Then, the procedure outlined above to obtain an individual’s knowledge from a panel of wages simplyrecovers the values of all agents, as in equation (11). Under this assumption, our methodology can be usedexactly as described, and our quantitative results would be unchanged. We would simply interpret theestimated learning function as determining how the value of other agents determines the change in value ofa given individual.
30In an environment with rent-sharing, a similar approach can be taken for firms. If firm technology weremultidimensional, we could identify those dimensions separately if we observed several firm-level outcomessuch as revenue, investment, and R&D spending, and if we could invert the value function to express eachdimension of productivity in terms of those outcomes.
7
A.2 Age-Dependent Learning: Estimation Results
We finish this section with an illustration of the methodology in Section A.1 that studies
the differences in learning between young and old workers. Let {y, o} indicate whether a
worker is young or old. We implement the following parametric form for the conditional
expectation:
Ey [v′i − vi|vi, v−i] = θ0yvi +
1
n− 1
{θ+yy
∑vj>vi,j young
(vj − vi) + θ+yo
∑vj>vi,j old
(vj − vi)
},(A.6)
Eo [v′i − vi|vi, v−i] = θ0ovi +
1
n− 1
{θ+oy
∑vj>vi,j young
(vj − vi) + θ+oo
∑vj>vi,j old
(vj − vi)
}.
This specification allows knowledge flows to depend on both the age group of the worker and
the age group of her coworkers. For instance, θ+yo captures the strength of the knowledge flows
from old to young coworkers. Furthermore, the specification allows for age group specific
trend growth. For simplicity, we do not allow for any effects of coworkers j with vj < vi. The
rest of the implementation follows exactly the same routine outlined in the previous section.
We present our results in Table A.I. In line with our previous findings we find little trend
growth for either age group. Second, the young learn more overall. Furthermore, we find that
both the young and the old learn more from the young. Regardless of the team definition,
this discrepancy is starker for the young. That is, the young learn disproportionately from
the young, closely in line with our reduced-form findings.
B Data Appendix
We present below summary statistics of the variables of interest in our dataset. Supplemental
Appendix D describes the exact construction of our dataset.
B.1 Summary Statistics
We begin by briefly describing the dataset along key dimensions. The longitudinal version of
the Linked-Employer-Employee-Data of the IAB (LIAB LM 9310) contains information on
8
Team Definition
1 2
trend growth young: θ0y .0023 .0041
(.0001) (.0001)
learning of young from young: θ+yy .1190 .1473
(.0034) (.0045)
learning of young from old: θ+yo .0489 .0649
(.0016) (.0024)
trend growth old: θ0o .0008 .0020
(.0001) (.0001)
learning of old from young: θ+oy .0530 .0692
(.0014) (.0012)
learning of old from old: θ+oo .0385 .0533
(.0011) (.0015)
Observations 4763089 4590120
GMM standard errors in parentheses.
Notes: Old defined as 40 and older.
Table A.I Estimates for the learning function (A.6).
the complete workforce of a subset of German establishments. The sample establishments
are the ones selected—at least once—in an annually conducted survey between 2000 and
2008. The employee part of the dataset then contains the employment biographies from
1993 to 2010 of all individuals which were, for at least one day, employed at one of the
sample establishments between 1999 and 2009.31 As a consequence, we observe the complete
peer groups at the sample establishments from 1999 to 2009.
The employment biographies come in spell format and contain information, among other
things, on a worker’s establishment, occupation, and average daily earnings along with a
rich set of observables (age, gender, job and employment tenure, education, location, among
others). We organize the resulting dataset as an annual panel. Specifically, the annual
observation recorded (employer, average daily wage, etc.) for each individual pertains to the
spell which overlaps a particular reference date (January 31st).
31For more detail, see Klosterhuber et al. (2014).
9
To construct a baseline sample, we then proceed as follows. We select the panel case
establishments for which we obtain information on the full workforce. We then include
those individuals who were employed at one of those establishments at the reference date
during at least one year between 1999 and 2009. This leaves us with the employment
biographies (between 1993 and 2010) of the full workforce (at a reference date) of a large
number of establishments. The following subsections contain more detailed information on
the construction of the baseline sample.
We next document the team size distribution and the wage distribution, both economy-
wide, and within teams. We report all those statistics for the year 2000.
Team Size Distribution Figure B.1 plots the unweighted size distribution for both team
definitions for the year 2000. We restrict attention to teams that have size ≥ 2. The team
size distribution is naturally more compressed under the second, narrower, team definition,
but for both definitions a sizable fraction of teams are fairly large. The sample contains
4478 establishments with average size 116. When working with the second team definition
we have a total of 28524 teams with an average size of 18.32
Wage Distribution Figure B.2 plots the histogram of the average daily earnings during
the year-2000 reference spell in Euros. The “mass-points” reflect top-coding of the earnings
data at the social security contribution ceiling (which is lower in Eastern Germany). As a
consequence, 7.7% of the wage observations in the regression sample are are top coded.33 A
simple variance decomposition implies that the within-team component accounts for 47.9%
of the overall variance in wages under team definition 1 and 22.7% under team definition 2.
Finally, there is fairly little wage growth in the decade covered by our dataset. The cohort
whose wages are depicted in Figure B.2 experienced average annual wage growth of .97%
32It follows that, on average, an establishment has 6-7 different occupations with at least 2 individuals.The occupational classification follows the 1988 classification of occupations (KldB 88) published by theGerman Federal Employment Agency which has two digits and comprises about 330 values.
33While there exist imputation methods to address the truncation, we instead treat the top-coded ob-servations as actual wage observations and do not correct for the top-coding. We have experimented withvarious ways of treating the top coded observations and found our main empirical results to be robust. Weshow in Appendix C.3 how our reduced-form results change when omitting all teams with top coded wageobservations. For more detail regarding the construction of wages, see Supplemental Appendix D. There, wealso describe how we eliminate some extreme wage observations.
10
Figure B.1 Team Size Distribution.
0 20 40 60 80 100Team Definition 1
0
0.05
0.1
0.15
0 20 40 60 80 100Team Definition 2
0
0.05
0.1
0.15
Notes: Top panel plots the unweighted team size distribution in the year 2000 for teams of size 2-99 forteam definition 1 (so it corresponds to the establishment size distribution). The bottom panel plots theunweighted team size distribution in the year 2000 for teams of size 2-99 for team definition 2.
over the next 5 years. The annual growth rate drops to .43% in the second half of the decade.
Wage Gap to Coworkers We are interested in how a worker’s future wage growth re-
lates to her coworker’s (relative) wages. To gauge the extent of wage differences across peers,
Figure B.3 plots the histogram of wage gaps, defined as the log difference between an in-
dividual’s wage and the mean wage of her peers, for each team definition. Under the first
team definition, the gap has mean .026 and amounts to -.27, .04, and .29 at the 10th, 50th,
and 90th percentiles in the year 2000. Under the second team definition, the gap has mean
.012 and is -.17, .00, and .21 at the 10th, 50th, and 90th percentiles. Naturally, within-team
wage dispersion is smaller under the narrower team definition.
Correlations
We compute a set of correlations of various wage moments at the team level. Specifically,
Table B.I reports the correlation matrix of team average pay, team pay dispersion, team
mean-median ratio (skewness), team size, and max wage at the team. All entries of the
matrix are positive except the correlation between team average pay and the mean-median
11
Figure B.2 Wage Distribution
0 50 100 1500
0.002
0.004
0.006
0.008
0.01
0.012
0.014
0.016
0.018
0.02
10th Pct: 54.6Median: 87.090th Pct: 140.6Mean: 90.4 Euros
Notes: Distribution of mean daily wages during spell overlapping 01/31/2000 for full time employeesworking subject to social security.
ratio. 34
Mean Wage SD Wage Mean/median Team size Max wageMean Wage 1Wage sd 0.25 1Mean/median −0.15 0.09 1Team size 0.09 0.23 0.05 1Max wage 0.92 0.48 0.02 0.28 1
Table B.I Pairwise correlations at the team level, team definition 2. All variables in logs
C Additional Reduced-Form Empirical Results
C.1 An identification Strategy
This section develops an instrumental variable (IV) strategy in the spirit of Jager and Heining
(2019) who use arguably random, unexpected worker deaths to study the substitutability of
workers. In particular, we identify (pseudo-)random exit events where a full-time employed,
34A natural interpretation is that highly productive teams have a skewed wage distribution with veryhighly paid managers.
12
Figure B.3 Wage Gap Distribution
-1 -0.5 0 0.5 1Team Definition 1
0
1
2
3
4
-1 -0.5 0 0.5 1Team Definition 2
0
1
2
3
4
Notes: Top panel plots the distribution of wage gaps as defined in the main text in the year 2000 forteam definition 1. Bottom panel: team definition 2.
prime-age (26-54) worker permanently leaves employment subject to social security. Recall
that we observe peer groups from 1999-2009 and have individual-level information until 2010.
We therefore only study exits that occurred before January 2009 so that we know that the
respective worker did not return into employment (including part time) for a minimum of
24 months (and longer for most). A second, stricter version of the exit instrument applies
the additional criterion that the worker was continuously full time employed during the 24
months before the exit occurred.
We construct the instrument as follows: We identify all of i’s peers that exit between
year t− 1 and t, then compute the average wage bill of i’s peers in t− 1 with and without
the workers that exit. The log difference is our instrument.
Because we have only one instrument, we modify the specification (1) so that we regress
the cumulative growth in one’s wages between t and t + n, ∆wni,t, on the log wage gap to
one’s peers, wi,t ≡ log(
w−i,t
wi,t
), which has only one potentially endogenous regressor.
∆wni,t = α + βwi,t + ωage + ωtenure + ωgender + ωeduc + ωocc + ωt + εi,t. (C.1)
Further, since our instrument is a shifter of the change in one’s peers rather than the level,
13
we take first differences of (C.1):
D.∆wni,t = α + βD.wi,t + εi,t. (C.2)
where D. denotes variables in first differences and εi,t ≡ εi,t − εi,t−1. D.wi,t is the variable
we instrument.35
The number of exit events is not very large. Thus, while the instrument is strong, the
two stage least squares estimates are less precise than our baseline. Table C.I contains the
results for team definition 2 for both versions of the instrument along with the estimates of
(C.1) and (C.2) using OLS.36 All point estimates are positive, and rather large, consistent
with the view that increases in the wage gap lead to faster wage growth. However, the results
are significant at the 5% significance level only for 2 and 3 year horizons. Longer horizons
have too few observations.
C.2 Figure 1
This section offers the regression output underlying Figure 1. It then offers the same table
for the alternative team definition followed by the results for various restricted samples at
horizon h = 3.
C.3 Robustness
This sections evaluates the robustness of the main reduced-form empirical results reported
in section 2.1. We do so for team definition 2 at the horizon h = 3 years and report the
corresponding tables for team definition 1 below.
To do so, we begin by contrasting our baseline results for specification (2) when omitting
teams that have any apprentices.37 We then restrict the sample exclusively to teams without
35We drop all fixed effects when differencing. We have experimented with different strategies and foundit makes little difference.
36The estimates from (C.2) in the second panel are biased because the εi,t−1 determines wi,t, which meansthat εi,t is correlated with wi,t and hence D.wi,t. Note that this is not an issue for the IV specificationsbecause the instrument does not contain wi,t.
37We highlight that even our baseline results do not use any wage information on workers in apprenticeship.
14
Horizon in Years 1 2 3 5
Specification (C.1)
w 0.10∗∗∗ 0.15∗∗∗ 0.19∗∗∗ 0.25∗∗∗
(0.0037) (0.0056) (0.0076) (0.011)
Specification (C.2) - OLS
D.w 0.97∗∗∗ 0.81∗∗∗ 0.83∗∗∗ 0.86∗∗∗
(0.017) (0.015) (0.015) (0.016)
Specification (C.2) - First IV
D.w 0.14 0.46∗ 0.50 0.29(0.26) (0.22) (0.26) (0.26)
C-D F-Stat 1058.8 1052.8 976.6 819.3K-P F-Stat 42.9 50.2 62.3 64.9Observations 1243851 1052910 894950 621950
Specification (C.2) - Second IV
D.w 0.41 0.81∗∗ 0.96∗ 0.46(0.32) (0.31) (0.41) (0.46)
C-D F-Stat 828.1 713.4 549.7 345.3K-P F-Stat 46.5 46.8 43.1 32.7Observations 1059259 899139 766043 529574
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Standard errors clustered at the establishment level. Cragg-Donald and Kleibergen-Paap Fstatistics.
Table C.I IV results. Team Definition 2.
any top-coded wage observations.38 We report the corresponding results, contrasted with our
benchmark results, for h = 3 in table C.IV and, for the alternative team definition, in table
E.VI. While the results vary across the samples, the main takeaway from our reduced-form
exercises is robust.
We next restrict the sample to workers in teams that are not restricted by collective
bargaining agreements. To that end, we use IAB establishment panel for the year 2000
38The ceiling varies from year to year and differs between former Eastern and Western Germany. Further-more the data display a certain amount of bunching in a small interval around the officially reported ceilinglevels. To identify workers with top-coded wages we thus simply group workers into 50 Euro-cent wide binsin each year and flag the two bins with the most mass.
15
Horizon in Years
1 2 3 5 10
Bin 2 0.000088∗ 0.00024∗∗∗ 0.00036∗∗∗ 0.00047∗∗∗ 0.00100∗∗∗
(0.000039) (0.000063) (0.000085) (0.00011) (0.00026)
Bin 3 0.000023 0.000091 0.00018∗∗ 0.00025∗ 0.00082∗∗∗
(0.000033) (0.000049) (0.000067) (0.00011) (0.00021)
Bin 4 -0.000062∗ -0.00000048 0.000075 0.00013 0.00048∗
(0.000030) (0.000049) (0.000064) (0.000098) (0.00020)
Bin 5 -0.000079∗∗ -0.000013 0.000088 0.00019 0.00075∗∗∗
(0.000029) (0.000049) (0.000066) (0.00010) (0.00022)
Bin 6 0.000049∗ 0.00015∗∗∗ 0.00026∗∗∗ 0.00039∗∗∗ 0.00079∗∗∗
(0.000023) (0.000039) (0.000054) (0.000090) (0.00019)
Bin 7 0.00025∗∗∗ 0.00036∗∗∗ 0.00049∗∗∗ 0.00065∗∗∗ 0.0011∗∗∗
(0.000026) (0.000045) (0.000066) (0.00010) (0.00022)
Bin 8 0.00037∗∗∗ 0.00051∗∗∗ 0.00068∗∗∗ 0.00085∗∗∗ 0.0014∗∗∗
(0.000030) (0.000047) (0.000068) (0.00011) (0.00023)
Bin 9 0.00042∗∗∗ 0.00057∗∗∗ 0.00073∗∗∗ 0.00092∗∗∗ 0.0014∗∗∗
(0.000040) (0.000061) (0.000084) (0.00012) (0.00025)
Bin 10 0.00040∗∗∗ 0.00051∗∗∗ 0.00066∗∗∗ 0.00080∗∗∗ 0.0011∗∗∗
(0.000042) (0.000069) (0.000098) (0.00015) (0.00025)
Bin 11 0.00081∗∗∗ 0.0012∗∗∗ 0.0016∗∗∗ 0.0023∗∗∗ 0.0033∗∗∗
(0.000047) (0.000076) (0.00010) (0.00016) (0.00029)
Within R2 0.88 0.81 0.76 0.66 0.46Observations 3354925 2942967 2537838 1846999 436728
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Each row reports the coefficient on the weight of bins 2 through 11 where the weight on thebottom bin is the omitted category. Each column corresponds to one line in figure 1. Team definition 2.Column titles indicate horizon h. Standard errors clustered at the establishment level. The regressionsinclude current wage and fixed effects for age decile, tenure decile, gender, education, occupation, andyear.
Table C.II Results for figure 1 using specification (3).
which asks establishments whether a binding collective bargaining agreement exists and if
so if they pay above the applicable collective bargaining agreement. The survey also asks
whether firms benchmark their wages with a collective bargaining agreement in case they
are not subject to a binding agreement. The second columns of table C.V reports the results
if we restrict the sample to workers in establishments paying, on average, at least 10% above
their collective bargaining agreement in the year 2000. The third column restricts the sample
to establishments that are neither subject to a collective bargaining agreement nor report to
benchmark their pay structure with one.
Finally, we offer results for various specifications where we include higher-dimensional
fixed effects. In particular, we extend the baseline specification (2) in three different ways.
16
AllAbove
Team-MedianBelow
Team-Median2nd Pct. 4th Pct. 7th Pct. 9th Pct.
Bin 2 0.00036∗∗∗ 0.0010∗∗ 0.00028∗∗∗ 0.00062∗ -0.00016 0.0016∗∗∗ 0.00016(0.000085) (0.00036) (0.000071) (0.00030) (0.00044) (0.00036) (0.00013)
Bin 3 0.00018∗∗ 0.0013∗∗ 0.00015∗∗ 0.00012 0.00054 0.00038∗ 0.000060(0.000067) (0.00049) (0.000058) (0.00030) (0.00035) (0.00018) (0.00016)
Bin 4 0.000075 0.0011∗∗ 0.00011 0.00029 0.00050 0.00067∗∗ -0.000041(0.000064) (0.00040) (0.000062) (0.00025) (0.00034) (0.00023) (0.00011)
Bin 5 0.000088 0.00057 0.00021∗∗∗ 0.00055∗ 0.00028 0.00051∗ 0.00017(0.000066) (0.00039) (0.000060) (0.00025) (0.00032) (0.00023) (0.00010)
Bin 6 0.00026∗∗∗ 0.00087∗ 0.00044∗∗∗ 0.00063∗∗ 0.00041 0.00088∗∗∗ 0.00028∗∗
(0.000054) (0.00037) (0.000058) (0.00023) (0.00034) (0.00022) (0.00011)
Bin 7 0.00049∗∗∗ 0.0011∗∗ 0.00058∗∗∗ 0.00093∗∗∗ 0.00078∗ 0.0010∗∗∗ 0.00068∗∗∗
(0.000066) (0.00038) (0.000074) (0.00024) (0.00033) (0.00020) (0.000098)
Bin 8 0.00068∗∗∗ 0.0013∗∗∗ 0.00075∗∗∗ 0.00090∗∗∗ 0.00060 0.0012∗∗∗ 0.00098∗∗∗
(0.000068) (0.00037) (0.00012) (0.00024) (0.00034) (0.00021) (0.00010)
Bin 9 0.00073∗∗∗ 0.0014∗∗∗ 0.00044∗∗∗ 0.0011∗∗∗ 0.0010∗ 0.0012∗∗∗ 0.0013∗∗∗
(0.000084) (0.00038) (0.00011) (0.00025) (0.00045) (0.00021) (0.00011)
Bin 10 0.00066∗∗∗ 0.0014∗∗∗ 0.00033∗ 0.00060∗ 0.0013∗∗∗ 0.0016∗∗∗ 0.0079∗
(0.000098) (0.00038) (0.00014) (0.00029) (0.00038) (0.00020) (0.0037)
Bin 11 0.0016∗∗∗ 0.0024∗∗∗ 0.00020 0.0017∗∗∗ 0.0015∗∗∗ 0.0017∗∗∗ 0(0.00010) (0.00040) (0.00016) (0.00029) (0.00035) (0.00025) (.)
Within R2 0.76 0.71 0.78 0.091 0.048 0.061 0.17Observations 2537838 1307463 1230371 235351 245260 244998 244236
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Each row reports the coefficient on the weight of bins 2 through 11 where the weight on thebottom bin is the omitted category. Columns 2 and 3 report the results when the sample is restrictedto workers above (below) the team median wage. The remaining columns restrict the sample to workersfrom particular parts of the wage distribution. Team definition 2. Column titles indicate horizon h.Standard errors clustered at the establishment level. The regressions include current wage and fixedeffects for age decile, tenure decile, gender, education, occupation, and year.
Table C.III Results from specification (3) under team definition 2 for various restrictedsamples.
First, we include additional fixed effects for establishment (establishment x occupation in the
second team definition). In a second specification, we include additional team fixed effects.
In a third specification, we include occupation x year fixed effects.
In doing so, we replace the separate right hand side variables w+ and w− in specification
(2) with the gap w+ − w−. The reason is the following: One’s own wage, those of higher
paid teammates, and those of lower paid teammates are approximately (but not exactly)
collinear with a team fixed effect. If we include a team fixed effect, we cannot recover both
of β+ and β−, but we can recover their difference. The results are reported in table C.VI.
We complement this with an additional set of results that add, to the baseline specification
17
BaselineTeams w/oApprentices
Teams w/oTop CodedWages
Before2005
After2004
w+ 0.16∗∗∗ 0.16∗∗∗ 0.12∗∗∗ 0.17∗∗∗ 0.16∗∗∗
(0.015) (0.015) (0.0084) (0.017) (0.015)
w− 0.041∗∗∗ 0.054∗∗∗ 0.056∗∗∗ 0.051∗∗∗ 0.028∗∗∗
(0.0081) (0.0069) (0.0069) (0.0096) (0.0093)
Within R2 0.76 0.76 0.73 0.76 0.77Observations 2617097 1667028 1319836 1825318 791774
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2). Team definition 2. Column (1): Baseline. Column(2): Sample restricted to teams without workers in apprenticeship. Column (3): Sample restricted toteams without top-coded wages. Columns(4) and (5) restrict split sample by years. The regressionsinclude current wage and fixed effects for age decile, tenure decile, gender, education, occupation, andyear (whenever possible).
Table C.IV Subsamples. Team Definition 2.
All 2000 >10% CBNo CB, No
Benchmarking
w+ 0.14∗∗∗ 0.14∗∗∗ 0.11∗
(0.021) (0.024) (0.051)
w− 0.044∗∗∗ 0.047∗∗∗ 0.027(0.013) (0.014) (0.027)
Within R2 0.75 0.75 0.67Observations 336073 275890 10591
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2). Team definition 2. Column (1): Benchmarkresults for year 2000 at horizon h = 3 years. Column (2): Restrict sample to establishments whichreport to pay at least 10% above their collective bargaining agreement. Column (3): Restrict sample toestablishments which neither have a collective bargaining agreement nor benchmark their wage structurewith one. The regressions include current wage and fixed effects for age decile, tenure decile, gender,education, occupation, and year.
Table C.V Collective Bargaining. Team Definition 2.
18
Baseline Est x Occ FE Team Occ x Yr
w+ − w− 0.057∗∗∗ 0.087∗∗∗ 0.12∗∗∗ 0.060∗∗∗
(0.0079) (0.010) (0.012) (0.0079)
Within R2 0.76 0.47 0.51 0.76Observations 2617097 2614229 2585060 2617037
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: We replace the separate right hand side variables w+ and w− in specification (2) with thegap w+ − w−. Team definition 2, horizon h = 3. Column (1): Baseline. Column (2): Baseline plusestablishment x occupation fixed effects. Column (3): Baseline plus establishment x occupation x yearfixed effects. Column (4): Baseline plus occupation x year fixed effects. The regressions include currentwage and fixed effects for age decile, tenure decile, gender, education, occupation, and year.
Table C.VI Fixed Effects.
(2), 5 lags of an individual’s log wage, wi. We view this as a way of informing a worker fixed
effect in a backward-looking way that is not affected by her contemporaneous peers. The
results, albeit smaller, line up closely with our baseline findings. We present the results in
table C.VII.
Table C.VIII contains results for specification 2 when we control for establishment level
growth in two different ways. Our first measure of establishment growth is the annual growth
rate of the number of full time employees in our dataset. The second is the annual growth
rate of the wage bill of the full time employed. We then add to the otherwise unchanged
specification (2) three linear controls, namely the respective growth measure between t − 2
and t− 1, t− 1 and t, and t and t+ 1. The main takeaway is that the results hardly change
when we add these controls.
19
Horizon in Years
1 2 3 5
w+ 0.045∗∗∗ 0.064∗∗∗ 0.093∗∗∗ 0.10∗∗∗
(0.0062) (0.0096) (0.013) (0.018)
w− 0.016∗∗∗ 0.025∗∗∗ 0.038∗∗∗ 0.057∗∗∗
(0.0043) (0.0069) (0.0091) (0.013)
Within R2 0.90 0.85 0.81 0.73Observations 1075048 872091 663704 291095
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Baseline specification (2) with 5 additional lags of log wage.
Table C.VII Additional Lags.
Wage Bill Growth Controls
Horizon in Years 1 2 3 5
w+ 0.090∗∗∗ 0.12∗∗∗ 0.17∗∗∗ 0.24∗∗∗
(0.0076) (0.011) (0.015) (0.021)
w− 0.028∗∗∗ 0.033∗∗∗ 0.042∗∗∗ 0.060∗∗∗
(0.0056) (0.0069) (0.0082) (0.011)
Within R2 0.88 0.82 0.77 0.67
Employment Growth Controls
w+ 0.090∗∗∗ 0.12∗∗∗ 0.17∗∗∗ 0.24∗∗∗
(0.0077) (0.011) (0.015) (0.021)
w− 0.028∗∗∗ 0.033∗∗∗ 0.042∗∗∗ 0.060∗∗∗
(0.0057) (0.0070) (0.0083) (0.011)
Within R2 0.88 0.82 0.77 0.67Observations 2313583 2186058 1865434 1244737
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Additional Controls: Growth in total wage bill (or number) of the full time employed at theestablishment between t− 2 and t− 1, t− 1 and t, and t and t+ 1.
Table C.VIII Counterpart to table II with controls for establishment level wage bill (oremployment) growth. Team Definition 2.
20
Supplementary Appendix - Not for Publication
D Dataset Construction
Construction of Basic Annual Panel
The basic dataset comes in spell format where a spell can correspond either to employment
or to a period of benefit receipt. We implement the publicly available code by Eberle,
Schmucker and Seth (2013) to convert the spells into monthly cross sections which we then
merge into a monthly panel covering 1993-2010.39 For each spell that runs through an entire
calendar year we see one observation per variable (occupation, employment status, average
hourly wage, type of benefit receipt,...) per year. For all other spells, we see one observation
per variable per spell.
Our main analysis is carried out on an annual panel and we select the spell overlapping
January 31st of a given year as the observation for the year. This implies that the peer
groups we study are the full workforce of the sample establishments on January 31st of each
year from 1999 to 2009.
When assigning a wage observation we assign the daily wage during spells of full time
employment unless otherwise noted.40 As a consequence, we ignore information on earnings
from part time employment and construct peer groups only from full time employees for full
time employees. The reason is that, while we observe a part time flag, we do not have good
information on hours which blurs the mapping between daily earnings—as reported to the
social security administration—and the wage for non full time employees.
Construction of Regression Sample for Section 2
Team Construction We identify all workers that work at one of the sample establishments
and construct teams with minimum size 2 as the collection of workers employed full time
subject to social security during the reference spell for a given calendar year. We exclude
39Download link accessed under http://doku.iab.de/fdz/reporte/2013/MR 04-13 EN.pdf.40We follow the routine in the aforementioned code to select a main employment spell in case individuals
hold several jobs.
1
workers in vocational training and interns. Thus, our reduced-form exploration projects log
wages for individual i working full time subject to social security in year t+ n on the wages
of her full-time coworkers in year t if she worked in one of our sample establishments in year
t.
Wages We drop the bottom percentile and the top .1 percentile of the annual wage obser-
vations and, since our approach requires information on wages, we only use information on
full time workers who work subject to social security.41 We flag observations as top-coded
due to the social security ceiling when they fall into one of the two masspoints which are
easily identified in the wage distribution in a given year. We further omit any observations
from the regressions in section 2 where wage growth over the corresponding horizon h falls
into the top or bottom tenths of percentile of the pooled sample. To construct real wages
we deflate using a CPI.
Mass Layoffs and Job Loss To identify a mass layoff event at an establishment, we
use information from the IAB establishment panel, which is the annual survey from which
the panel cases in our dataset are sampled. In particular, we identify a mass layoff event
if the following is true: The establishment reduces full time employment by at least 25%
since two years prior, still has a strictly positive number of full time employees, had more
than 25 employees two years prior, did not build up employment by more than 30% between
three and two years prior, does not rebuild to more than 90% of employment two years prior
within the next year, and was surveyed each years from three years prior to one year past.42
We register a job loss for individual i in year t if there is at least one instance where she
is employed subject to social security at the end of a month but not anymore at the end of
the following month.43 We register a job loss in the context of a mass layoff event in year t
if we register a job loss at the individual level during year t and a mass layoff event at her
41The reason for the asymmetry is that almost no wage observations lie above the social security ceiling.A small number of observations have a wage that is above the ceiling which the data provider suspects mayeither be due to bonuses or actually incorrect. Dropping the top .1 percentile each year eliminates any suchobservations.
42These criteria closely follow Davis and von Wachter (2011).43We do so when we actually have information on that worker for the following month, that is when we
see her receiving benefits. If we do not have any information for the subsequent month we only register jobloss if the worker disappears for more than two months from the dataset.
2
ascribed establishment, that is the one she works at during the spell overlapping January
31st of that year, during the same year t.
Construction of Estimation Sample for Section 4
To construct the sample for the structural estimation in section 4, we build on the same
annual worker panel underlying our reduced-form work.
We largely mimic the construction of the sample used in the reduced-form part. We
likewise drop the bottom percentile and the top .1 percentile of wage observations and
only use information on full time workers who work subject to social security. As in the
reduced-form, we assign the workforce on January 31 during any year to the establishment
in that year. We keep the same panel case establishments as in the reduced-form and use
only the years 1999-2009 since only in those years do we observe the full workforce of the
establishments.
E Tables for Team Definition 1
This subsection reports all empirical results from the main body of the paper when we define
a peer group to be all workers at an establishment.
3
Decile of the Wage Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.19∗∗∗ 0.14∗∗∗ 0.16∗∗∗ 0.16∗∗∗ 0.15∗∗∗ 0.15∗∗∗ 0.13∗∗∗ 0.100∗∗ 0.43∗∗∗ 0.32∗∗∗
(0.010) (0.013) (0.017) (0.021) (0.026) (0.031) (0.038) (0.031) (0.020) (0.074)
w− 0.060∗∗∗ 0.076∗∗∗ 0.083∗∗∗ 0.096∗∗∗ 0.092∗∗∗ 0.094∗∗∗ 0.11∗∗∗ 0.094∗∗∗ 0.060∗∗∗ 0.011∗
(0.011) (0.012) (0.013) (0.016) (0.017) (0.018) (0.017) (0.012) (0.011) (0.0050)
Within R2 0.42 0.091 0.058 0.045 0.043 0.046 0.058 0.087 0.21 0.057Observations 287759 303889 306947 306718 304500 303378 303475 303665 315004 296800
Decile of the Age Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.31∗∗∗ 0.27∗∗∗ 0.17∗∗∗ 0.14∗∗∗ 0.12∗∗∗ 0.10∗∗∗ 0.084∗∗∗ 0.072∗∗∗ 0.051∗∗∗ 0.038∗∗∗
(0.018) (0.020) (0.015) (0.013) (0.012) (0.012) (0.010) (0.0099) (0.0095) (0.0092)
w− 0.055∗∗ 0.074∗∗∗ 0.056∗∗∗ 0.053∗∗∗ 0.049∗∗∗ 0.056∗∗∗ 0.063∗∗∗ 0.070∗∗∗ 0.070∗∗∗ 0.069∗∗∗
(0.019) (0.013) (0.0098) (0.0090) (0.0085) (0.0083) (0.0081) (0.0077) (0.0076) (0.0061)
Within R2 0.61 0.70 0.75 0.78 0.80 0.80 0.81 0.83 0.83 0.84Observations 348673 285365 286022 336003 354897 232921 333851 309471 265648 279320
Decile of the Tenure Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.23∗∗∗ 0.24∗∗∗ 0.20∗∗∗ 0.18∗∗∗ 0.14∗∗∗ 0.094∗∗∗ 0.099∗∗∗ 0.084∗∗∗ 0.070∗∗∗ 0.086∗∗∗
(0.019) (0.020) (0.021) (0.016) (0.013) (0.013) (0.014) (0.014) (0.018) (0.024)
w− 0.056∗∗∗ 0.066∗∗∗ 0.066∗∗∗ 0.068∗∗∗ 0.057∗∗∗ 0.067∗∗∗ 0.073∗∗∗ 0.082∗∗∗ 0.050∗∗∗ 0.053∗∗∗
(0.012) (0.0097) (0.0093) (0.0089) (0.0085) (0.0088) (0.010) (0.011) (0.015) (0.016)
Within R2 0.67 0.74 0.78 0.80 0.80 0.81 0.80 0.80 0.75 0.76Observations 302698 296147 303727 304230 305582 307905 307934 306307 301182 296473
Decile of the Size Distribution
1 2 3 4 5 6 7 8 9 10
w+ 0.068∗∗∗ 0.098∗∗∗ 0.13∗∗∗ 0.14∗∗∗ 0.11∗∗∗ 0.14∗∗∗ 0.19∗∗∗ 0.20∗∗∗ 0.24∗∗ 0.63∗∗
(0.0069) (0.014) (0.017) (0.022) (0.025) (0.030) (0.037) (0.037) (0.066) (0.12)
w− 0.030∗∗∗ 0.048∗∗∗ 0.043∗∗∗ 0.026 0.0033 0.017 0.011 -0.053 0.026 0.018(0.0052) (0.0088) (0.012) (0.014) (0.018) (0.020) (0.026) (0.041) (0.053) (0.017)
Within R2 0.76 0.75 0.72 0.70 0.70 0.71 0.69 0.62 0.61 0.69Observations 292073 299230 301358 309821 310036 311021 312640 312095 312825 271044
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2) for separate deciles of the wage, age, tenure, andteam size distributions. We include observation i in the decile k in t if i falls into the k’th decile ofthe distribution in year t. Team definition 1 at horizon h = 3 years. Standard errors clustered at theestablishment level. The regressions include current wage and fixed effects for age decile, tenure decile,gender, education, occupation, and year (whenever possible).
Table E.I Baseline results for different deciles of the wage, age, tenure, and team sizedistribution. Team Definition 1. Counterpart to table IV.
4
Panel A: All Switchers
Horizon in Years 1 2 3 5 10
w+ 0.080∗∗∗ 0.15∗∗∗ 0.18∗∗∗ 0.23∗∗∗ 0.35∗∗∗
(0.019) (0.019) (0.021) (0.022) (0.028)
w− -0.033∗∗ -0.0037 0.0065 0.029 0.052∗
(0.011) (0.010) (0.012) (0.015) (0.024)
Within R2 0.61 0.56 0.51 0.42 0.28Observations 236844 275172 245570 193710 52114
Panel B: Switchers with Nonemployment Spell
Horizon in Years 1 2 3 5 10
w+ 0.092∗∗∗ 0.17∗∗∗ 0.19∗∗∗ 0.22∗∗∗ 0.33∗∗∗
(0.018) (0.015) (0.016) (0.018) (0.026)
w− -0.038∗ 0.052∗∗∗ 0.056∗∗∗ 0.049∗ 0.062∗
(0.019) (0.015) (0.017) (0.019) (0.031)
Within R2 0.39 0.47 0.41 0.32 0.21Observations 26822 87818 83827 70084 19542
Panel C: Switchers, Mass Layoff Event
Horizon in Years 1 2 3 5 10
w+ 0.15∗∗ 0.15∗∗∗ 0.17∗∗∗ 0.17∗∗∗ 0.35∗∗∗
(0.053) (0.035) (0.036) (0.043) (0.073)
w− 0.078 0.10∗ 0.042 0.062 0.091(0.061) (0.043) (0.049) (0.054) (0.087)
Within R2 0.37 0.36 0.31 0.23 0.16Observations 2871 6617 6855 6139 1895
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2) on a sample of establishment switchers. TeamDefinition 1. Column titles indicate horizon h. Standard errors clustered at the establishment level.The regressions include current wage and fixed effects for age decile, tenure decile, gender, education,occupation, and year.
Table E.II Establishment switchers. Counterpart to table V.
5
Horizon in Years 1 2 3 5
Specification (C.1)
w 0.087∗∗∗ 0.13∗∗∗ 0.17∗∗∗ 0.23∗∗∗
(0.0029) (0.0044) (0.0059) (0.0087)
Specification (C.2) - OLS
D.w 1.18∗∗∗ 1.00∗∗∗ 1.03∗∗∗ 1.08∗∗∗
(0.018) (0.015) (0.015) (0.015)
Specification (C.2) - First IV
D.w 0.070 0.32 0.11 0.25(0.33) (0.32) (0.25) (0.31)
C-D F-Stat 2781.2 2668.8 2280.3 1623.1K-P F-Stat 75.2 77.3 73.4 65.6Observations 2591738 2205499 1874646 1307635
Specification (C.2) - Second IV
D.w 0.15 0.53 0.16 -0.24(0.52) (0.48) (0.41) (0.58)
C-D F-Stat 1277.2 1134.5 859.3 476.8K-P F-Stat 35.7 33.4 27.9 20.7Observations 2362091 2016292 1717940 1201808
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Standard errors clustered at the establishment level. Cragg-Donald and Kleibergen-Paap Fstatistics.
Table E.III IV Results. Team Definition 1. Counterpart to table C.I.
6
Horizon in Years
1 2 3 5 10
Bin 2 0.000038 0.00018∗∗∗ 0.00028∗∗∗ 0.00045∗∗∗ 0.00068∗∗∗
(0.000043) (0.000050) (0.000072) (0.00010) (0.00016)
Bin 3 0.00018∗∗∗ 0.00033∗∗∗ 0.00050∗∗∗ 0.00076∗∗∗ 0.0011∗∗∗
(0.000029) (0.000045) (0.000058) (0.000089) (0.00014)
Bin 4 0.00015∗∗∗ 0.00030∗∗∗ 0.00046∗∗∗ 0.00066∗∗∗ 0.00093∗∗∗
(0.000022) (0.000034) (0.000048) (0.000072) (0.00014)
Bin 5 0.000025 0.00014∗∗ 0.00029∗∗∗ 0.00055∗∗∗ 0.0011∗∗∗
(0.000031) (0.000048) (0.000066) (0.00010) (0.00017)
Bin 6 0.000066∗∗ 0.00020∗∗∗ 0.00035∗∗∗ 0.00056∗∗∗ 0.00073∗∗∗
(0.000022) (0.000035) (0.000048) (0.000078) (0.00014)
Bin 7 0.00029∗∗∗ 0.00041∗∗∗ 0.00056∗∗∗ 0.00080∗∗∗ 0.0012∗∗∗
(0.000029) (0.000043) (0.000060) (0.000095) (0.00017)
Bin 8 0.00040∗∗∗ 0.00058∗∗∗ 0.00078∗∗∗ 0.0011∗∗∗ 0.0015∗∗∗
(0.000034) (0.000044) (0.000057) (0.000089) (0.00018)
Bin 9 0.00045∗∗∗ 0.00067∗∗∗ 0.00092∗∗∗ 0.0013∗∗∗ 0.0017∗∗∗
(0.000041) (0.000061) (0.000076) (0.00010) (0.00019)
Bin 10 0.00050∗∗∗ 0.00075∗∗∗ 0.00100∗∗∗ 0.0014∗∗∗ 0.0018∗∗∗
(0.000049) (0.000071) (0.000092) (0.00012) (0.00022)
Bin 11 0.00075∗∗∗ 0.0011∗∗∗ 0.0015∗∗∗ 0.0022∗∗∗ 0.0031∗∗∗
(0.000034) (0.000054) (0.000075) (0.00012) (0.00019)
Within R2 0.89 0.82 0.77 0.68 0.48Observations 4061699 3553684 3057261 2215146 518545
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Each row reports the coefficient on the weight of bins 2 through 11 where the weight on thebottom bin is the omitted category. Each column corresponds to one line in figure 1. Team definition 1.Column titles indicate horizon h. Standard errors clustered at the establishment level. The regressionsinclude current wage and fixed effects for age decile, tenure decile, gender, education, occupation, andyear.
Table E.IV Results from specification (3) under team definition 1. Counterpart to tableC.II.
7
AllAbove
Team-MedianBelow
Team-Median2nd Pct. 4th Pct. 7th Pct. 9th Pct.
Bin 2 0.00028∗∗∗ 0.00038 0.00029∗∗∗ 0.00055∗∗ 0.000011 0.00090∗∗∗ 0.00019(0.000072) (0.00086) (0.000063) (0.00020) (0.00027) (0.00018) (0.00010)
Bin 3 0.00050∗∗∗ 0.0027∗∗ 0.00040∗∗∗ 0.00016 0.00042∗ 0.00048∗∗ 0.00032∗
(0.000058) (0.00095) (0.000052) (0.00023) (0.00021) (0.00016) (0.00013)
Bin 4 0.00046∗∗∗ 0.0021∗∗ 0.00032∗∗∗ 0.00020 0.00080∗∗∗ 0.00084∗∗∗ 0.00027(0.000048) (0.00080) (0.000047) (0.00016) (0.00021) (0.00018) (0.00016)
Bin 5 0.00029∗∗∗ 0.0017∗ 0.00032∗∗∗ 0.00052∗∗ 0.00040 0.00029 0.00062∗∗
(0.000066) (0.00075) (0.000086) (0.00016) (0.00021) (0.00023) (0.00023)
Bin 6 0.00035∗∗∗ 0.0018∗ 0.00036∗∗∗ 0.00053∗∗∗ 0.00025 0.0012∗∗∗ 0.00016(0.000048) (0.00074) (0.000046) (0.00016) (0.00021) (0.00023) (0.00012)
Bin 7 0.00056∗∗∗ 0.0020∗∗ 0.00058∗∗∗ 0.0011∗∗∗ 0.00097∗∗∗ 0.00086∗∗∗ 0.00049∗∗∗
(0.000060) (0.00075) (0.000066) (0.00015) (0.00024) (0.00018) (0.00010)
Bin 8 0.00078∗∗∗ 0.0022∗∗ 0.0011∗∗∗ 0.00088∗∗∗ 0.00055 0.0011∗∗∗ 0.0013∗∗∗
(0.000057) (0.00074) (0.000089) (0.00018) (0.00032) (0.00019) (0.00011)
Bin 9 0.00092∗∗∗ 0.0024∗∗ 0.0013∗∗∗ 0.00093∗∗∗ 0.0012∗ 0.0013∗∗∗ 0.0020∗∗∗
(0.000076) (0.00075) (0.00014) (0.00019) (0.00055) (0.00017) (0.00012)
Bin 10 0.00100∗∗∗ 0.0026∗∗∗ 0.0011∗∗∗ 0.00092∗∗∗ 0.0011∗∗∗ 0.0015∗∗∗ 0.014(0.000092) (0.00076) (0.00017) (0.00027) (0.00030) (0.00021) (0.0083)
Bin 11 0.0015∗∗∗ 0.0030∗∗∗ 0.0014∗∗∗ 0.0016∗∗∗ 0.0015∗∗∗ 0.0016∗∗∗ 0(0.000075) (0.00075) (0.00020) (0.00015) (0.00020) (0.00020) (.)
Within R2 0.77 0.72 0.79 0.10 0.053 0.064 0.20Observations 3057261 1479286 1577972 300910 305370 302901 314668
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Each row reports the coefficient on the weight of bins 2 through 11 where the weight on thebottom bin is the omitted category. Columns 2 and 3 report the results when the sample is restrictedto workers above (below) the team median wage. The remaining columns restrict the sample to workersfrom particular parts of the wage distribution. Team definition 1. Column titles indicate horizon h.Standard errors clustered at the establishment level. The regressions include current wage and fixedeffects for age decile, tenure decile, gender, education, occupation, and year.
Table E.V Results from specification (3) under team definition 1 for various restrictedsamples. Counterpart to table C.III.
8
BaselineTeams w/oApprentices
Teams w/oTop Coded Wages
Before2005
After2004
w+ 0.17∗∗∗ 0.065∗∗∗ 0.085∗∗∗ 0.17∗∗∗ 0.17∗∗∗
(0.012) (0.018) (0.0099) (0.014) (0.014)
w− 0.057∗∗∗ 0.048∗∗∗ 0.064∗∗∗ 0.064∗∗∗ 0.049∗∗∗
(0.0081) (0.0095) (0.0075) (0.0096) (0.0088)
Within R2 0.77 0.75 0.73 0.77 0.78Observations 3032228 304525 298771 2107976 924249
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β+ and β− as estimated from specification (2). Team definition 2. Column (1): Baseline. Column(2): Sample restricted to teams without workers in apprenticeship. Column (3): Sample restricted toteams without top-coded wages. Columns(4) and (5) restrict split sample by years. The regressionsinclude current wage and fixed effects for age decile, tenure decile, gender, education, occupation, andyear (whenever possible).
Table E.VI Subsamples. Team Definition 1. Counterpart to table C.IV.
All 2000 >10% CBNo CB, No
Benchmarking
w+ 0.13∗∗∗ 0.13∗∗∗ 0.096∗
(0.018) (0.020) (0.043)
w− 0.033∗∗ 0.047∗∗∗ 0.024(0.012) (0.013) (0.023)
Within R2 0.76 0.76 0.70Observations 389140 318634 14251
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: β and β− as estimated from specification (2). Team definition 1. Column (1): Benchmarkresults for year 2000 at horizon h = 3 years. Column (2): Restrict sample to establishments whichreport to pay at least 10% above their collective bargaining agreement. Column (3): Restrict sample toestablishments which neither have a collective bargaining agreement nor benchmark their wage structurewith one. The regressions include current wage and fixed effects for age decile, tenure decile, gender,education, occupation, and year.
Table E.VII Collective Bargaining. Team Definition 1. Counterpart to table C.V.
9
Baseline Est FE Team Occ x Yr
w+ − w− 0.045∗∗∗ 0.11∗∗∗ 0.10∗∗∗ 0.047∗∗∗
(0.0073) (0.0082) (0.0088) (0.0074)
Within R2 0.77 0.56 0.58 0.77Observations 3032228 3031943 3029247 3032165
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: We replace the separate right hand side variables w+ and w− in specification (2) with thegap w+ − w−. Team definition 1, horizon h = 3. Column (1): Baseline. Column (2): Baseline plusestablishment fixed effects. Column (3): Baseline plus establishment x year fixed effects. Column (4):Baseline plus occupation x year fixed effects. The regressions include current wage and fixed effects forage decile, tenure decile, gender, education, occupation, and year.
Table E.VIII Fixed Effects. Team Definition 1. Counterpart to table C.VI.
Horizon in Years
1 2 3 5
w+ 0.060∗∗∗ 0.094∗∗∗ 0.12∗∗∗ 0.16∗∗∗
(0.0058) (0.010) (0.014) (0.019)
w− 0.024∗∗∗ 0.039∗∗∗ 0.060∗∗∗ 0.079∗∗∗
(0.0046) (0.0072) (0.0096) (0.014)
Within R2 0.91 0.87 0.82 0.74Observations 1248289 1011117 767326 335135
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Baseline specification (2) with 5 additional lags of log wage.
Table E.IX Additional Lags. Team Definition 1. Counterpart to table C.VII.
10
Wage Bill Growth Controls
Horizon in Years 1 2 3 5
w+ 0.089∗∗∗ 0.13∗∗∗ 0.17∗∗∗ 0.25∗∗∗
(0.0070) (0.0098) (0.013) (0.018)
w− 0.032∗∗∗ 0.043∗∗∗ 0.061∗∗∗ 0.087∗∗∗
(0.0048) (0.0063) (0.0084) (0.012)
Within R2 0.89 0.83 0.78 0.69
Employment Growth Controls
w+ 0.089∗∗∗ 0.13∗∗∗ 0.18∗∗∗ 0.25∗∗∗
(0.0070) (0.0099) (0.013) (0.018)
w− 0.033∗∗∗ 0.043∗∗∗ 0.061∗∗∗ 0.088∗∗∗
(0.0050) (0.0065) (0.0085) (0.012)
Within R2 0.89 0.83 0.78 0.69Observations 2685544 2534701 2157995 1435300
Standard errors in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Notes: Additional Controls: Growth in total wage bill (or number) of the full time employed at theestablishment between t− 2 and t− 1, t− 1 and t, and t and t+ 1.
Table E.X Additional controls for establishment level wage bill (or employment) growth.Team Definition 1. Counterpart to table C.VIII.
11