Income Determinants: An Empirical Analysis
Ryan T. Anzalone Wabash College
May 2014
Abstract: How do different personal factors affect the wage you will earn as a working
adult? Many have asked this question; this paper looks more closely at how age,
educational attainment, gender, and length of time as a United States resident affect a
person’s predicted wage. This paper uses data from IPUMS USA to test the effects of
these variables. After regression, I found that age affects income according to a quadratic
model, while educational attainment increases predicted income according to an exponential model.
Table of Contents
I. Introduction
II. Literature Review
III. Empirical Analysis
    Data
    Results
IV. Conclusions
V. Tables
VI. Bibliography
I. Introduction
Income is not randomly assigned to people; it is earned through hard work along with
many other factors. I would like to see what those other factors are and how strongly
they affect income. No single factor guarantees a high income, but we can still look for
positive or negative correlations in large data sets for variables such as urban/rural status,
age, gender, age at first marriage, times married, race, education level, public/private
school education, industry of employment, veteran status, and years lived in the US.
I think that age, educational attainment, and public/private school education will have
significant influence on predicted wage. I believe there will be higher predicted wages for people
who are older because they likely have more experience and have higher paying jobs. I think
people with more education will also earn more because they can work in more complex jobs
which usually pay higher wages. I also think that private school education is superior to public
school education, and that this education gap will translate to an income gap. I will run
regressions and report the findings to see if these predictions are correct.
There has been extensive research on the correlations between individual factors and
income. What I hope to achieve with this paper is to test the findings of some of that
research and expand upon it. I doubt I am the first person to look at these correlations
and run these regressions, but for most of these variables little has been published on the
matter. Gender, education, and family situation all have well-tested correlations, but what
about things like number of times married, or the number of years a person has lived in the
US? Before conducting any regressions, I believe there will be meaningful results in these
data sets which will shed some light on the topic of income and its factors. According to
Lewis Solmon, there are “statistically significant differences in the predicted earnings of
graduates from the various types of colleges” (Solmon, 75). I believe there will be
statistically significant differences in the predicted earnings for many other factors as well.
II. Literature Review
An article in the Public Administration Review, Unequal Pay: The Role of Gender,
explores the findings of Mohamad Alkadry on the difference between pay of males and females.
He starts out by acknowledging that there may have been confounding variables involved in the
common perception that men make more than women. Maybe women only seem to make less
because popular jobs among women may be lower paying jobs in general, regardless of gender.
“A 2003 study by the General Accounting Office found that women earned 79.9 percent of what
men earned” (Alkadry, pg 888). Alkadry thought that there could be a more refined study done
which looked at the pay of individuals who hold comparable positions at comparable agencies.
He addresses any question about confounding variables with his current study. The table below
was taken directly from his published paper. Note that this table only shows the pay disparities in
Procurement positions and that there may be different data in different types of positions. This
table shows that, even when many confounding variables are controlled for, there is still a pay
disparity between men and women. It also shows that some jobs (Senior Buyers in Procurement
Positions in this case) don’t have significant pay gaps at all.
This idea that controlling for confounding variables gives us a clearer picture of a research
question is the main driver behind my empirical paper. I’m going to look at many different
factors behind income disparities, and try to find omitted variables which will clear up my regressions.
III. Empirical Analysis
The data set that will be used in this empirical paper was obtained via ipums.org and the
2012 Census. The variables I chose to include are listed below.
This is a sample of 1,429,901 people from a 2012 census conducted in the United States. The
minimum age of any person in this sample is 18, and the maximum is top-coded at 95, meaning
that anyone over the age of 95 is listed as 95 to preserve anonymity. The summary statistics
for the variable age can be found in Table 1.
The variable marrno represents each person’s number of marriages, which ranges from 0
to 3. Although it is not specifically stated, it is safe to assume that this variable is top-coded in a
similar way to age, so that even people with more than 3 marriages will only be listed at 3. The
summary statistics for the variable marrno can be found in Table 1.
In the process of breaking down these variables, I found it would be useful to create a
new dummy variable to better represent the race variable. Using the Stata command gen, I
created a new variable with values 0 and 1. This variable, called white, can be seen along
with its summary statistics in Table 1.
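As a rough illustration of what that gen step does, here is a minimal Python sketch of building a 0/1 indicator from a categorical race code. The code value used for "white" (1) and the six sample values are assumptions made for illustration; they are not taken from the data set used in this paper.

```python
# Hypothetical race codes for six people; the real coding should be
# checked against the IPUMS codebook before use.
WHITE_CODE = 1  # assumed code for "white"

race = [1, 2, 1, 4, 1, 6]

# Dummy variable: 1 if the person's race code equals WHITE_CODE, else 0.
white = [1 if code == WHITE_CODE else 0 for code in race]

# The mean of a 0/1 dummy is the share of the sample with the trait.
share_white = sum(white) / len(white)
print(white, share_white)
```

Read the same way, the mean of 0.786 reported for white in Table 1 says that 78.6 percent of the sample is coded as white.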
The variable YRSUSA1 represents the number of years each person has lived in the
United States. Summary statistics can be found in Table 1. The variable educ represents the level
of education each person has completed. Some notable values for educ are Grade 9 (educ=3),
Grade 12 (educ=6), Bachelor’s Degree (educ=10) and any additional education past the
undergraduate level (educ=11). The summary statistics can be found in Table 1.
There are some limitations with this data due to missing information and unrefined
variables. The IPUMS data had a variable available called schltype which I thought would be
the type of school the person attended, but it turned out to record only current enrollment, so
I dropped it from the data set. Because this was the only available IPUMS data on school type, I
can’t look at the effects of different types of school on income.
I would like to start my regressions with the following model of predicted incwage.
Model 1
Predicted incwage = β0 + β1(age)
From this regression’s output (Table 3), we can see that as age increases by one year,
predicted wage increases by $754.91. At first glance the results seem logical, but is it
possible that there are omitted variables which are biasing our results? Other variables
that are strongly correlated with age can reduce the accuracy of the above regression. To
find these correlations, I will regress age on different variables. One such
output can be found in Table 2.
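Model 1 is a one-regressor least-squares fit, so its mechanics can be sketched in a few lines of Python. The ages and wages below are made-up numbers chosen for easy arithmetic, and the helper name fit_line is hypothetical; this is only an illustration of how β0 and β1 are obtained, not a reproduction of the Stata output.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the sample covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up ages and wages, for illustration only.
age = [20, 30, 40, 50, 60]
incwage = [18000, 27000, 33000, 41000, 46000]

b0, b1 = fit_line(age, incwage)
print(round(b0, 2), round(b1, 2))
```

In this toy sample the slope is 700, which would be read the same way as the $754.91 above: one more year of age goes with a predicted wage that is 700 higher.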
Each additional marriage is associated with an increase in expected age of 10.22 years.
That’s a strong relationship, and it suggests that our initial regression is biased. A new
regression which adds times married follows this
model:
Model 2
Predicted incwage = β0 + β1(age) + β2(times married)
As you can see in Table 3, the coefficient on age went down substantially. Instead of
increasing predicted wage by $754.91, an extra year of age now increases it by only
$632.90. The unrestricted regression, which includes all of the chosen variables, is
reported as Model 3 in Table 3 and takes the form:
Model 3
Predicted incwage = β0 + β1(age) + β2(male) + β3(marrno) + β4(educ) + β5(YRSUSA1) +
β6(white)
This regression has some interesting results (Table 3). The coefficient of 509.48 on age
means that for every additional year of age, predicted wage goes up by $509.48, holding the
other variables constant. Additionally, being male increases predicted wage by $20,050.74,
and each additional marriage increases predicted wage as well. Other variables which
increase predicted wage are educ and YRSUSA1.
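The drop in the age coefficient between Models 1 and 2 can be checked against the standard omitted-variable identity: the age coefficient without times married equals the age coefficient with it, plus the times-married coefficient multiplied by the slope from regressing times married on age. The sketch below backs that slope out in two ways using only numbers reported in Tables 1, 2, and 3. It assumes all three tables come from the same sample and works with rounded values, so only approximate agreement should be expected.

```python
# Coefficients reported in Table 3.
age_coef_model1 = 754.908      # Model 1: age only
age_coef_model2 = 632.904      # Model 2: age + times married
marr_coef_model2 = 4424.356    # Model 2: times married

# Slope of times married on age implied by the change in the age coefficient.
implied_slope = (age_coef_model1 - age_coef_model2) / marr_coef_model2

# The same slope backed out of Tables 1 and 2:
# cov(age, marrno) = (slope of age on marrno) * var(marrno)
slope_age_on_marr = 10.219       # Table 2
sd_marr, sd_age = 0.743, 14.559  # Table 1
cov_age_marr = slope_age_on_marr * sd_marr ** 2
slope_from_tables = cov_age_marr / sd_age ** 2

print(round(implied_slope, 4), round(slope_from_tables, 4))
```

The two values come out at roughly 0.028 and 0.027, so the reported tables are consistent with times married accounting for the change in the age coefficient.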
In order to see the relationships between these variables, I asked Stata to create some new
variables for me. The Stata command egen mInc = mean(incwage), by(age) created a new
variable called mInc which stores the mean value of incwage at each age. Plotting this new
variable by age gives us the following:
The parabolic shape of this curve suggests that we may want to consider age^2 as a
possible explanatory variable for incwage. The Stata regression of incwage on this new
variable, agesq = age^2, and age follows this form:
Model 4
Predicted incwage = β0 + β1(agesq) + β2(age)
So this new model holds the form:
Predicted incwage = -51.66*(age^2) + 5161.63*(age) - 72355.18
The graph above shows the predicted incwage values (blue dots) plotted against the mean
incwage values at each age. For everyone under the age of 75, this is almost a perfect fit. To
reduce the error, I’m going to drop all observations with ages above 75.
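One thing the fitted quadratic makes easy to read off is the age at which predicted wage peaks. The short Python sketch below uses the Model 4 coefficients as reported in the text; the function name incwage_from_age is just an illustrative label.

```python
# Coefficients of the fitted quadratic (Model 4), as reported in the text.
a = -51.66       # coefficient on age^2
b = 5161.63      # coefficient on age
c = -72355.18    # constant

def incwage_from_age(age):
    """Predicted wage from the Model 4 quadratic."""
    return a * age ** 2 + b * age + c

# A downward-opening parabola peaks where its slope, 2*a*age + b, is zero.
peak_age = -b / (2 * a)
peak_wage = incwage_from_age(peak_age)
print(round(peak_age, 1), round(peak_wage))
```

Under these coefficients the predicted wage peaks at about age 50, at roughly $56,600, and declines after that.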
We also need to look at how educational attainment influences predicted incwage. Below
is a scatter plot of mean income at each level of education.
It seems to follow an exponential curve. Creating a variable to represent 2^educ is the
next step. Regressing incwage on this new variable produces the model:
Predicted incwage = 29.4*(2^educ) + 27,870
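To get a feel for this fitted model, it can be evaluated at the education codes singled out in the data description (educ = 3, 6, 10, 11). The Python sketch below does only that arithmetic; the function name and the short labels are illustrative.

```python
def incwage_from_educ(educ):
    """Predicted wage from the fitted model 29.4 * 2^educ + 27,870."""
    return 29.4 * 2 ** educ + 27870

# Education codes named in the data description.
levels = {
    "Grade 9": 3,
    "Grade 12": 6,
    "Bachelor's degree": 10,
    "Past undergraduate": 11,
}

for label, educ in levels.items():
    print(f"{label:20s} educ={educ:2d}  predicted incwage = {incwage_from_educ(educ):,.0f}")
```

Because the 2^educ term doubles with every step, the model predicts small gains at low levels of education (about $1,650 between Grade 9 and Grade 12) and a gain of about $30,100 for the single step from educ = 10 to educ = 11.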
When plotted against the mean values of incwage for each level of educational
attainment, this model looks like a very good fit. The graph can be found below.
IV. Conclusion
After looking at these different relationships between age, education and income, it is
time to revise the unrestricted model from before.
Predicted incwage = β0 + β1(age) + β2(male) + β3(marrno) + β4(educ) + β5(YRSUSA1)
+ β6(white)
With the new variables agesq and 2^educ (named expINCeduc), the regression will now look
like this:
Model 5
Predicted incwage = β0 + β1(agesq) + β2(age) + β3(expINCeduc) + β4(educ) + β5(male)
+ β6(white) + β7(YRSUSA1) + β8(marrno)
The results of Model 5 are shown in Table 4.
To check this regression output for heteroskedasticity, I used the Breusch-Pagan (BP) test.
The BP test checks whether the variance of the residuals is homogeneous. This homogeneity is
the null hypothesis. The alternative hypothesis is that the variance of the residuals is not
homogeneous. The test returned a p-value of 0.000, which means we must reject the null
hypothesis. Heteroskedasticity is present here, so we have to use robust standard errors in
the regression. Model 6 shows the regression output of the same variables as Model 5, except
that it uses robust standard errors. Model 6 can be found in Table 4.
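For readers who want to see the idea behind the test, here is a minimal Python sketch of a Breusch-Pagan style check for a model with a single regressor. It uses the studentized (Koenker) form of the statistic, n times the R-squared from regressing the squared residuals on the regressor, which is not necessarily the variant Stata reports, and it runs on synthetic data generated so that the error spread grows with age. All names and numbers in it are assumptions for illustration.

```python
import math
import random

def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def r_squared(x, y):
    """R-squared of the least-squares line of y on x."""
    b0, b1 = ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def breusch_pagan_1var(x, y):
    """Studentized BP test for a one-regressor model.

    LM = n * R^2 from regressing squared residuals on x. Under the null of
    constant variance, LM is approximately chi-square with 1 degree of freedom.
    """
    b0, b1 = ols(x, y)
    resid_sq = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, resid_sq)
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square(1)
    return lm, p_value

# Synthetic data whose error spread grows with age (heteroskedastic by design).
random.seed(0)
age = [random.uniform(18, 75) for _ in range(2000)]
wage = [500 * a + random.gauss(0, 200 * a) for a in age]

lm, p = breusch_pagan_1var(age, wage)
print(lm, p)
```

A small p-value, as in the paper's result, leads to rejecting constant variance. The coefficient estimates are unaffected by that finding, which is why Models 5 and 6 in Table 4 differ only in their standard errors.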
With this new regression, we finally have a model that describes predicted income better
than the earlier ones. The massive t-values for each one of these variables show that they
are all statistically significant: the difference seen in predicted incwage with an increase
or decrease in any of these variables is very unlikely to be due to chance alone, so each has
a real statistical association with income. There isn’t much we can do with this information
except encourage people to get more education, because the other variables are mostly out of
a person’s control.
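The t-values referred to above can be recomputed from Table 4 by dividing each Model 6 coefficient by the figure reported beneath it. The sketch below assumes that the numbers in parentheses are (robust) standard errors, which the table does not state explicitly.

```python
# (coefficient, value in parentheses) pairs from the Model 6 column of Table 4.
# Assumption: the values in parentheses are robust standard errors.
model6 = {
    "age^2": (-40.250, 0.19),
    "age": (3976.387, 16.07),
    "2^educ": (19.601, 0.14),
    "educ": (2627.731, 29.38),
    "male": (19329.381, 75.39),
    "white": (4896.109, 79.47),
    "YRSUSA1": (86.238, 4.94),
    "marrno": (2472.004, 58.23),
}

# t-statistic for the null hypothesis that a coefficient is zero.
t_stats = {name: coef / se for name, (coef, se) in model6.items()}

for name, t in t_stats.items():
    print(f"{name:8s} t = {t:8.1f}")

# Variable with the smallest t-statistic in absolute value.
smallest = min(t_stats, key=lambda name: abs(t_stats[name]))
print(smallest, round(t_stats[smallest], 2))
```

Even the smallest of these, for YRSUSA1, is about 17.5, far beyond the usual cutoff of roughly 2. With more than 1.4 million observations, very large t-values are to be expected, so they speak to statistical significance rather than to the size of each effect.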
Tables

Table 1: Summary Statistics

             Mean      S.D.     Min   Max
age          42.819    14.559   18    95
marrno        0.913     0.743    0     3
YRSUSA1       3.252     9.308    0    95
educ          7.561     2.27     0    11
male          0.513     0.499    0     1
white         0.786     0.41     0     1

Table 2: Age regressed on times married

age              Coefficient   Std. Err.   t-stat      p-value
Times married    10.219        0.013       733.42      0.000
constant         33.475        0.016       2,039.57    0.000

Table 3: Models 1, 2, 3

incwage regression        Model 1       Model 2       Model 3
Age                       754.908*      632.904*      509.483*
                          (2.95)        (3.47)        (3.20)
Times married                           4424.356*     5050.308*
                                        (66.37)       (60.90)
Male                                                  20050.744*
                                                      (76.77)
Educational Attainment                                8066.571*
                                                      (17.04)
Years in the US                                       243.043*
                                                      (4.29)
white                                                 3767.599*
                                                      (96.85)
constant                                              -58054.23*
                                                      (187.44)
r2                        0.044         0.047         0.204
Table 4: Models 4, 5, 6

incwage regression        Model 4       Model 5       Model 6
age^2                     -51.661*      -40.250*      -40.250*
                          (0.21)        (0.19)        (0.19)
age                       5161.633*     3976.387*     3976.387*
                          (17.81)       (17.21)       (16.07)
2^educ                                  19.601*       19.601*
                                        (0.11)        (0.14)
Educational Attainment                  2627.731*     2627.731*
                                        (32.89)       (29.38)
male                                    19329.381*    19329.381*
                                        (74.82)       (75.39)
white                                   4896.109*     4896.109*
                                        (94.40)       (79.47)
Years in the US                         86.238*       86.238*
                                        (4.23)        (4.94)
Times Married                           2472.004*     2472.004*
                                        (60.91)       (58.23)
Constant                  -72355.18*    -91351.13*    -91351.13*
                          (357.41)      (399.89)      (350.30)
r2                        0.084         0.245         0.245
Works Cited
Lewis Solmon, The Effects on Income of Type of College Attended, Sociology of Education,
Volume 48 No. 1, pg 75-90. <http://www.jstor.org/stable/2112051>.
Mohamad Alkadry, Unequal Pay: The Role of Gender, Public Administration Review, Volume
66 No. 6, pg 888-898 <http://www.jstor.org/stable/4096605>.
"IPUMS-USA." Minnesota Population Center. University of Minnesota, n.d. Web. 08 May 2014.
<https://usa.ipums.org/usa/>.