naghshpour chap one
TRANSCRIPT
-
7/30/2019 Naghshpour Chap One
1/32
Regression for Economics
Regression for Economics
Shahdad Naghshpour
Regression for Economics
Copyright Business Expert Press, 2012.
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopy, recording, or any other),
except for brief quotations, not to exceed 400 words, without the prior
permission of the publisher.
First published in 2012 by
Business Expert Press, LLC
222 East 46th Street, New York, NY 10017
www.businessexpertpress.com
ISBN-13: 978-1-60649-405-9 (paperback)
ISBN-13: 978-1-60649-406-6 (e-book)
DOI 10.4128/9781606494066
Business Expert Press Economics and Finance collection
Collection ISSN: 2163-761X (print)
Collection ISSN: 2163-7628 (electronic)
Cover design by Jonathan Pennell
Interior design by Exeter Premedia Services Private Ltd.,
Chennai, India
First edition: 2012
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America.
To Parisa
SN
Abstract
The concept of regression was introduced by Sir Francis Galton, but
R.A. Fisher provided the statistical theory and application for it for
the first time. The 20th century witnessed the spread of regression
analysis into every scientific branch. Regression analysis is the most
commonly used statistical method in the world. It is used in economics
and many other fields. Although few would characterize this technique
as simple, regression is in fact both simple and elegant. The complexity
that many attribute to regression analysis is often a reflection of their
lack of familiarity with the language of mathematics. But regression
analysis can be understood even without a mastery of sophisticated
mathematical concepts. This book provides the foundation of
regression analysis. All the examples are from economics, and in almost
all the examples real data are used to show the applications of the
method.
This book seeks to demystify regression analysis. The concepts related
to regression analysis are explained in a way that is comprehensible to
those whose mathematical skills are not expert. There is logic to regression
analysis that resembles the intrinsic logic that we apply in comprehending
the various events that fill our lives, which are probabilistic rather
than deterministic in nature. What hinders people's comprehension
of regression analysis is the difficulty many have in understanding
mathematical symbols and derivations. By removing this obstacle, this
book enables the logical reader to learn regression without possessing
superior mathematical skills. Although this book is largely
nonmathematical in its approach, it does not in any way give short shrift
to the subject of regression. This book is targeted to all business students
and executives who need to understand the concept of regression for
practical and professional purposes.
Regression analysis can be used to establish a causal relationship
between factors and the response variable. However, to do so,
economic theory must supply the causal relationship, and
regression analysis is then applied to verify the validity of
the theory.
This book utilizes Microsoft Excel to obtain regression results.
Although spreadsheet software is not the software of choice for performing
sophisticated regression analysis, it is widely available. Moreover, the use
of Excel will preempt the need to buy and learn new software, in itself
another impediment to learning and using regression analysis.
Keywords
regression, analysis, causality, inference
Contents
Foreword
Acknowledgments
Introduction
Chapter 1 The Concept of Regression
Chapter 2 The Method of Least Squares
Chapter 3 Simple Linear Regression in Excel
Chapter 4 Multiple Regression
Chapter 5 Goodness of Fit
Chapter 6 Regression Coefficients
Chapter 7 Causality: Correlation Is Not Causality
Chapter 8 Qualitative Variables in Regression
Chapter 9 Pitfalls of Regression Analysis
Appendix
Glossary
Notes
References
Index
Foreword
Statistics Is the Science of Finding Order in Chaos
Regression analysis is by far the most commonly used statistical analysis
tool in many areas of science, including economics. After you finish the
book, I hope you will agree with me that if there was one tool tailor-made
for economics, it must be regression analysis. There are many aspects of
regression that perfectly match the needs of an economist.
Often students of introductory statistics are overwhelmed because of
the diversity of the material. There are too many new concepts and too
many different topics, which may not seem related in any sensible way.
In regression analysis, the focus is on one and only one topic, regression
analysis. This narrow focus is due to several reasons. Reason one is that
after having been exposed to introductory statistics, you are now ready to
focus on a special topic. Reason two is that the topic is so vast that even
dedicated books are not sufficient to cover all aspects of the topic. The present
manuscript does not even scratch the surface of the vast topic of regression
analysis. My hope is that you learn to see economics from an applied
angle and manage to focus on specific outcomes and their magnitude.
I want you to know that every claim in economics is a testable hypothesis,
and every theorem in economics can be written as a regression model and
thus tested for the magnitude of the expected outcome. Regression analysis,
or its broader subject area, statistics, is not a substitute for economic
theory. Instead, it is a complementary tool that allows us to estimate the
magnitude of the theoretically predicted outcome and to test the results
against the claims of policy makers and planners.
Acknowledgments
I am indebted to my wife Donna who has helped me in more ways
than imaginable. I do not think I can thank her enough. I would like
to thank Michael Webb for his relentless assistance in all aspects of the
book. He has been my most reliable source and I could always count on
him. I also want to thank my graduate assistants Issam Abu-Ghallous and
Brian Carriere. They have provided many hours of help with all aspects
of the process. Without the help of Mike, Issam, and Brian, the book
would not have been completed. I also would like to thank Madeline
Gillette, Anthony Calandrillo, and Matt Orzechowski who read parts of
the manuscript.
Introduction
Economics is a very interesting subject. The scope of the economic domain is
vast. Economics deals with market structure, consumer behavior, investment,
growth, fiscal policy, monetary policy, the roles of the bank, etc.
The list can go on for quite some time. It also predicts how economic
agents behave in response to changes in economic and noneconomic
factors such as price, income, political party, stability, and so on.
Economic theory, however, is not specific. For example, the theory proves
that when the price of a good increases the quantity supplied increases,
provided all the other pertinent factors remain constant, which is also
known as ceteris paribus. What the theory does not and cannot state is
how much the quantity increases for a given increase in price. The answer
to this question seems to be more interesting to most people than the
fact that the quantity will increase as a result of an increase in price. The
truth is that the theory that explains the above relationship is important
for economists. For the rest of the population, the knowledge of
that relationship is worthless if the magnitude is unknown. Assume that for a
10% increase in price the quantity increases by 1%. This has very different
consequences than if the quantity increases by 10%, and totally
different consequences if the quantity increases by 20%. The knowledge
of the magnitude of change is as important as, if not more important than,
the knowledge of the direction of change. In other words, predictions
are valuable when they are specific.
Statistics is the science that can answer the specific issues raised above.
The science of statistics provides the necessary theories that can provide
the foundation for answering such specific questions. Statistical theory
indicates the necessary conditions to set up the study and collect data.
It provides the means to analyze and clarify the meaning of the findings.
It also provides the foundation to explain the meaning of the findings
using statistical inference.
In order to be able to make an economic decision, it is necessary
to know the economic conditions. This is true for all economic agents,
from the smallest to the largest. The smallest economic agent might be
an individual with little earning and disposable income, while the largest
can be a multinational corporation with thousands of employees, not to
mention governments. Briefly, we will discuss some of the main needs and
uses of statistics in economics and then present some uses of regression
analysis in economics as well.
The first step in making any economic decision is to gain knowledge
of the state of the economy. Economic conditions are always in a state of
flux. Sometimes it seems that we are not very concerned with mundane
economic basics. For example, we may not try to forecast what the price
of a loaf of bread is or a pound of meat. We know the average prices for
these items; we consume them on a regular basis and will continue doing
so as long as nothing drastic happens. However, if you were to buy a
new car you would most likely call around and check some showrooms
to learn about available features and prices because we tend not to have
up-to-date information on big-ticket items or goods and services that we
do not purchase regularly. The process described above is a kind of sampling,
and the information that you obtain is called a sample statistic,
which you use to make an informed decision about the average price of
an automobile. When the process is performed according to strict and
formal statistical methods, it is called statistical inference. The specific
sample statistic is called the sample mean. The mean is one of numerous
statistical measures at the disposal of modern economists. Another useful
measure is the median. The median is a value that divides observations
into two equal halves, one with values less than the median and the
other with values more than the median. Statistics explains when each measure
should be used and what determines which one is the appropriate
measure. The median is the appropriate measure when dealing with home
prices or income. Applications of statistical analysis in economics are
vast, and sometimes they reach to other disciplines that need economics
for assistance. For example, when we need to build a bridge to meet
the economic, social, and even cultural needs of a community, it is important
to find a reliable estimate of the necessary capacity of the bridge.
Statistics indicates the appropriate measure to be used by teaching us
whether we should use the median or the mode. It also provides insight
on the role that variance plays in this problem. In addition to identifying
the appropriate tools for the task on hand, statistics also provides the
methods of obtaining suitable data and procedures for performing
analysis to deliver the necessary inference.
One cannot imagine an economic problem that does not depend on
statistical analysis. Every year, the Government Printing Office compiles
the Economic Report of the President. Although the majority of the statistics
in the report are fact-based information about different aspects of
economics, many of the statistics are based on some statistical analysis,
albeit descriptive statistics. Descriptive statistics provide simple yet
powerful insight to economic agents and enable them to make more
informed decisions.
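The point made earlier about the median being the appropriate measure for income can be illustrated with a minimal sketch. The income figures below are invented for illustration and are not taken from the book; the sketch only shows how one very large value pulls the mean away from the typical observation while the median stays put.

```python
from statistics import mean, median

# Hypothetical annual incomes in dollars (invented for illustration).
# The last value is a deliberate outlier.
incomes = [28_000, 32_000, 35_000, 41_000, 47_000, 52_000, 950_000]

avg = mean(incomes)    # sensitive to the outlier
mid = median(incomes)  # the middle observation, unaffected by the outlier

print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,.0f}")
```

Six of the seven hypothetical households earn far less than the mean, which is why the median is usually the more informative summary for skewed data such as income or home prices.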
Another component of statistical analysis is inferential statistics.
Inferential statistics allows economists and political leaders to test
hypotheses about economic conditions. For example, in the presence of
inflation, the Federal Reserve Board of Governors may choose to reduce
the money supply to cool down the economy and slow down the pace of
inflation. The knowledge of how much to reduce the supply of money is
not only based on economic theory, but also depends on proper estimation
of the final outcome.
Another widely used application of statistical analysis is in policy decisions.
We hear a lot about the erosion of the middle class or that the middle
class pays a larger percentage of its income in taxes than the lower
and upper classes. However, how do we know who is the middle class?
A set dollar amount of income would be inadequate because of inflation,
although we must admit that even a single dollar amount must also
be obtained using statistics. However, statistical analysis has a much
more meaningful and more elegant solution. The concept of interquartile
range identifies the middle 50% of the population for income. Although
the interquartile range was not designed to identify the middle 50% and is
not explained in these terms, the combination of economics and statistics
is used to identify the middle 50% for economics and policy decision
purposes.
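As a rough sketch of how the interquartile range picks out the middle 50%, the following uses Python's standard statistics module on a small set of invented incomes. The data and the variable names are assumptions made for illustration; quartile conventions differ slightly between software packages, so the exact cut points may not match those produced by Excel.

```python
from statistics import quantiles

# Hypothetical household incomes in thousands of dollars (invented data).
incomes = [18, 22, 25, 31, 36, 40, 44, 52, 61, 75, 98, 140]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(incomes, n=4)

# Households between the first and third quartiles form the middle 50%.
middle_half = [x for x in incomes if q1 <= x <= q3]

print("first quartile:", q1)
print("third quartile:", q3)
print("middle 50% of incomes:", middle_half)
```

Because the cut points are computed from the data itself, the definition of the middle group moves with inflation, unlike a fixed dollar threshold.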
The knowledge of statistics can also help to identify and comprehend
daily news and events. Recently, a report indicated that the chance of
an accident for teenage drivers increases by 40% when there are passengers
in the car who are under 21 years of age. This is a meaningless report.
Few teenagers drive alone or have passengers over 21 years of age. Total
miles driven by teenagers when there are passengers under 21 years of age far
exceed any other type of teenage driving. Other things equal, the more
you drive, the higher the probability of an accident. This example indicates
that the knowledge of statistics is helpful in understanding everyday
events and in making sound analysis.
When an economic phenomenon is changed to produce a desirable
outcome, we need more powerful tools than simple statistics. Regression
analysis is one of the most widely used statistical tools at the disposal of
economists.
In regression analysis, the effect of one or more factors is measured to
determine another factor. The first group is also known as explanatory
variables, while the latter is known as the endogenous variable. In economics
it makes sense to refer to explanatory variables as policy instruments.
Policy instruments are variables that economists and policy makers can
change or control. The supply of money is a policy instrument controlled
by the Federal Reserve. The Fed has to collect data first, which is done on
a periodic basis. These statistics inform the Fed that there is a problem in
the economy, such as inflation. The Fed decides to reduce the supply of
money. It will wait for the economy to respond to the change in supply of
money. Then economic indicators are measured again and tested against
the target set by the policy. If the policy objectives are not met, the action
is repeated until the desirable outcome is obtained.
When working with a regression model, one might wonder if it
was designed to serve economists. Even some of the commonly used
terminology is the same in both fields. For example, both subjects use
explanatory variables to measure the response variable. Typical regression
models do not consist of one explanatory variable and one response
variable. Instead, in addition to explanatory variables, the model has additional
variables known as control variables. Control variables are actually
the same thing as shifters in economics. Shifters in economics refer to
variables that are assumed to remain constant for the sake of identifying
the impact of the explanatory variables on the response variables. In
fact, every economic theory seems to have the famous ceteris paribus,
which means other things being equal. When other things are not equal
and change, they do not distort the relationship between explanatory
and response variables. They simply shift the magnitude up or down,
depending on the direction of the impact. Estimation of demand provides
a good example. Economic theory states that an increase in price
reduces the quantity demanded, ceteris paribus. The regression model for
this economic theory can be written as
\[ Q_d = \beta_0 + \beta_1 P + \varepsilon \tag{I.1} \]
where \(\varepsilon\) is the error term, which will be explained later. To complete the
process, we need to test the hypothesis that the coefficient of price, which
is also the slope of the demand curve, is negative. So we use statistics to
test the following hypothesis:
\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0 \]
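To make the test of the price coefficient concrete, here is a minimal sketch that fits model (I.1) by ordinary least squares and forms the t statistic for the slope. The price and quantity observations are invented for illustration, and the textbook least squares formulas used here are an assumption about method, since the book obtains its results from Excel.

```python
from math import sqrt

# Hypothetical (price, quantity demanded) observations, invented for illustration.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
quantities = [9.8, 8.1, 7.2, 5.9, 4.1, 3.0]

n = len(prices)
p_bar = sum(prices) / n
q_bar = sum(quantities) / n

# Sums of squares and cross products around the means.
s_pp = sum((p - p_bar) ** 2 for p in prices)
s_pq = sum((p - p_bar) * (q - q_bar) for p, q in zip(prices, quantities))

b1 = s_pq / s_pp          # estimated slope of the demand line
b0 = q_bar - b1 * p_bar   # estimated intercept

# Residual variance with n - 2 degrees of freedom, then the slope's standard error.
residuals = [q - (b0 + b1 * p) for p, q in zip(prices, quantities)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
se_b1 = sqrt(sigma2 / s_pp)

# Test statistic for H0: beta_1 = 0 against H1: beta_1 < 0.
t_stat = b1 / se_b1

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {t_stat:.2f}")
```

A large negative t statistic, compared with the one-sided critical value from a t table with n - 2 degrees of freedom, would lead to rejecting the null hypothesis in favor of a negative slope.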
The model, however, is not complete, because it is not subject to ceteris
paribus, as it does not control for anything. Simple control variables consist of
the price of a complementary good, the price of a substitute good, and income,
to name just a few important ones. The theory predicts that the effect of a change
in the price of a complementary good is inverse, the effect of a change in
the price of a substitute good is direct, and the effect of a change in income
is direct. Thus, model (I.1) should be modified as below.
\[ Q_d = \beta_0 + \beta_1 P + \beta_2 P_c + \beta_3 P_s + \beta_4 Y + \cdots + \varepsilon \tag{I.2} \]
The theoretical claims are written as
\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0 \]
\[ H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0 \]
\[ H_0: \beta_3 = 0 \qquad H_1: \beta_3 > 0 \]
where the subscripts c and s use the first letters of complementary and substitute,
and \(Y\) represents income. The regression model clearly and perfectly
matches the economic theory, from the expected effects of each variable to the
concept of ceteris paribus.
CHAPTER 1
The Concept of Regression
Relationship Between Variables
Often we are interested in explaining a phenomenon using other factors.
There are numerous methods for accomplishing this objective. When the
phenomenon is quantitatively measurable, the solution is much easier
and the methods are well established. One such method is regression.
In regression analysis, one variable (the dependent variable) is explained
by one or more variables (the independent variables). Before explaining a
regression model, presenting an example of a simple model for explaining
consumption using income is beneficial. But we first need to define the
economic concept of marginal propensity to consume (MPC).
Definition 1.1
The marginal propensity to consume, or MPC, represents the additional
amount one would consume if one were given an extra dollar.
\[ \text{Consumption} = \text{subsistence consumption} + (\text{marginal propensity to consume}) \times (\text{income}) \tag{1.1} \]
Conceptually, MPC is the same as the slope of the regression line when
there is only one independent variable. In equation (1.1), consumption
is the dependent variable and income is the independent variable.
Although the term dependent variable is commonly used in economics
literature, other names such as endogenous variable, Y variable,
response variable, or even output are often used as well. Similarly, the
term independent variable might be replaced by exogenous variable,
X variable, regressor, input, factor, or predictor variable.
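Equation (1.1) can be written as a small function to see how the MPC acts as a slope. The subsistence level of 5,000 and the MPC of 0.8 below are assumed values chosen only for illustration, and the function name is hypothetical.

```python
def consumption(income, subsistence, mpc):
    """Equation (1.1): subsistence consumption plus MPC times income."""
    return subsistence + mpc * income


SUBSISTENCE = 5_000.0  # assumed subsistence consumption, in dollars
MPC = 0.8              # assumed marginal propensity to consume

base = consumption(30_000.0, SUBSISTENCE, MPC)
plus_one = consumption(30_001.0, SUBSISTENCE, MPC)

# The change in consumption from one extra dollar of income is the MPC.
extra = plus_one - base

print(f"consumption at 30,000 of income: {base:,.2f}")
print(f"extra consumption from one more dollar: {extra:.2f}")
print(f"consumption at zero income: {consumption(0.0, SUBSISTENCE, MPC):,.2f}")
```

With zero income the function returns the subsistence level, which is the intercept of the consumption line.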
Equation (1.1) is a good example of the concept of regression, but it
is not a regression model. The format for a regression model will be discussed
shortly. You are more likely to be familiar with a mathematical
function than a statistical function such as regression. A mathematical
function represents a nonprobabilistic association between a dependent
variable and one or more independent variables; the association is
exact and fixed (Figure 1.1a). A regression model is a simplification
of reality. It is actually a claim of a relationship and thus, a testable
hypothesis. The association between the dependent variable and the
independent variable(s) is probabilistic and not deterministic. It is
true on the average only. Figure 1.1b depicts pairs of (X, Y) observations
relating the dependent variable (Y) to the independent variable (X).
Many factors affect the actual value of Y and cause the observation to
deviate from the expected values. A regression model represents the
expected value.
Equation (1.1) is the equation of a line except that it is not written
in the customary form (used in geometry). It is also a function because
it provides a specific outcome based on a linear rule, that is, as income
changes, consumption changes by the magnitude of the MPC. If income
becomes zero, consumption drops to the level of subsistence consumption,
which is the level of consumption necessary to survive even if one
does not have any income. Note that here we are not interested in answering
how one manages to pay for subsistence consumption, which could
be from savings, selling household furniture, or something else. That is
not the purpose of this model. The purpose is to explain the level of consumption
in response to changes in income. This model is a simplification
of reality. For example, it does not take into account the role that wealth
might play in explaining consumption. In a more elaborate model, additional
independent variables could be included that might improve the
model's ability to estimate the dependent variable more accurately and to
more closely approximate reality.
Figure 1.1. Comparison of (a) a function with (b) a regression model.
Panel (a), a function: the line \(Y = \beta_0 + \beta_1 X\). Panel (b), a
regression line superimposed on observations: \(Y = \beta_0 + \beta_1 X + \varepsilon\).
Both panels plot Y against X.
Although this model is a good starting point, it is not a precise replication
of reality. Nevertheless, it is the same as the simple consumption
function explained in many introductory macroeconomics textbooks. As
such, it serves a similar purpose: it introduces the concept, clarifies the
application of the concept, and prepares for a more appropriate model.
Definition 1.2
A model is a simple representation of something real in life.
The level of representativeness is determined by the purpose of the
model. More detail does not necessarily make a model more desirable, in part
because the purpose of a study affects the desirable level of
sophistication of the model.
Models need restrictions on their parameters to make sense. For
example, the MPC has to be positive and less than one. A negative MPC
means that as income increases, consumption decreases and eventually
drops below subsistence level, while an MPC greater than one means
that consumption at some point becomes larger than income. MPC values
below zero or above one contradict reality and defy common sense.
Therefore, we restrict MPC to be between 0 and 1. In addition, negative
values for the independent variable of income and the dependent variable
of consumption are meaningless. Similarly, a negative subsistence level
would be impossible. However, there are situations where the estimate for
the subsistence level might turn out to be negative, but for the purpose of
this example they can be ignored.
The four values of income, consumption, the MPC, and the subsistence
level are very different from each other. Consumption and
income, the dependent and independent variables, are observable data.
This means we can gather data on actual income and consumption
levels of a sample of people. The data are typically published and customarily
represented in a column format. Subsistence consumption
and MPC, however, are known as parameters. Parameters are almost
always unknown and have to be estimated. Although every nation has
an MPC at any given point in time, the actual value is unknown, as
is the case with the subsistence level of consumption. The parameters
are estimated by the model using regression analysis. In the jargon of
regression, parameters are sometimes called coefficients or slopes. The
interpretation of coefficients and their appropriate analyses are covered
in Chapter 6.
Definition 1.3
A parameter is a characteristic of a population that is of interest.
Parameters are constant and usually unknown.
Examples of parameters include population mean, population variance,
and regression coefficients. One of the main purposes of statistics
is to obtain information from a sample that can be used to make inferences
about population parameters. The estimated value obtained from a
sample is called a statistic.
Definition 1.4
A statistic is a numerical value calculated from a sample that is variable
and known.
The word statistics has several meanings depending on the context:
two of its meanings are presented in the previous paragraph. The first use
of the word refers to the science and the discipline of statistics. The second
use is more specific and is based on the above definition. In the science of
statistics, we use statistics to make inferences about parameters.
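The distinction between a constant parameter and a variable statistic can be seen in a small simulation. Everything here is an assumption made for illustration: the population is artificial, its mean is known only because we generated it, and in practice the parameter would be unknown.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# An artificial population of 10,000 incomes (invented for illustration).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# The parameter: one fixed number for this population.
mu = mean(population)

# The statistic: a sample mean, which changes from sample to sample.
sample_means = [mean(random.sample(population, 100)) for _ in range(5)]

print(f"population mean (parameter): {mu:,.0f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:,.0f}")
```

Each sample mean is known once the sample is drawn, yet it differs across samples, while the population mean stays the same throughout.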
The slope and intercept terminologies used in geometry are also
commonly used to refer to coefficients in regression analysis. In the
consumption model, the corresponding analogy to geometry is that
MPC is the slope and subsistence level is the intercept of the consumption
line. According to this model, a dollar increase in income increases
consumption by the magnitude of MPC, which by definition is the slope
of the regression line. When income is zero, the amount of consumption is
equal to subsistence level and therefore, indicates the intercept.
The representative terms consumption and income used in
equation (1.1) only apply to this particular problem, which renders
them inapplicable when the problem is changed. Consider a model that
explains quantity demanded as a function of price of a good. If the price
increases by one dollar, how much will the quantity demanded decrease?
An attempt to write this question in the form of a model results in a
stalemate for a typical economist wishing to stick to vocabulary that has
economic meaning. In equation (1.2) below, the problematic value is designated
by "?". The value that replaces "?" answers the question "if the
price increases by $1, (how much) will the quantity demanded decrease?"
The (how much) in the parentheses does not have a defined economic
name, thus, for the time being it is represented by a question mark.
\[ \text{Quantity demanded} = \text{demand when the good is free} + (?) \times (\text{price}) \tag{1.2} \]
The "?" can be replaced by "responsiveness of quantity demanded," or
some other unfamiliar and arcane wording. Such arbitrary naming can only
cause confusion and should be avoided. A reasonably good alternative
for the (?), which would be close to the concept of MPC in equation (1.1),
could be "coefficient of responsiveness of quantity demanded to changes
in price." One advantage of this term is the use of the previously defined
concept of coefficient. While this phrasing still has the shortcomings of
the previous naming, it also has the added disadvantage of being long and
wordy. Furthermore, an astute student would recall that it resembles the
definition of elasticity. In fact, had the price and quantity been measured
in units of natural logarithm, the question mark could be replaced by
price elasticity, as demonstrated in equation (1.3).
\[ \ln(\text{quantity demanded}) = \text{intercept} + (\text{price elasticity of demand}) \times \ln(\text{price}) \tag{1.3} \]
where ln indicates natural logarithm as is customary. Sometimes
equations that involve natural logarithm on both sides of the equation are
called log-log, but this is a poor and inappropriate terminology, as is the
name double-log equation.
Definition 1.5
Price elasticity of demand is the percentage change in quantity demanded
divided by the percentage change in price.
By expressing the price and quantity in natural logarithm, the coefficient
of the slope of the price variable becomes the same as the demand
elasticity. This is due to properties of the slope of the regression line and
mathematical properties of the natural logarithm. In Chapter 9, using logarithms
we address some modeling and data problems. In equation (1.3)
there is no good explanation for the intercept, so for simplicity and brevity
it can be called by its generic term, namely the intercept. Nevertheless, it
is better to think of the model in economics terms as much as possible.
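The claim that the slope becomes the elasticity when both variables are in logarithms can be checked numerically. The sketch below builds exact, noise-free data from an assumed constant-elasticity demand curve (the elasticity of -1.5 and the scale of 200 are invented) and recovers the elasticity as the slope of the line relating ln(quantity) to ln(price).

```python
from math import log

ELASTICITY = -1.5  # assumed price elasticity of demand (illustrative)
SCALE = 200.0      # assumed quantity demanded when price equals 1

# Exact data from quantity = SCALE * price ** ELASTICITY, with no random error.
prices = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
quantities = [SCALE * p ** ELASTICITY for p in prices]

ln_p = [log(p) for p in prices]
ln_q = [log(q) for q in quantities]

n = len(prices)
x_bar = sum(ln_p) / n
y_bar = sum(ln_q) / n

# Least squares slope of ln(quantity) on ln(price).
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_p, ln_q))
    / sum((x - x_bar) ** 2 for x in ln_p)
)

print(f"slope of the log-log line: {slope:.4f}")
```

The reason is that a small change in ln(price) is approximately a percentage change in price, and likewise for quantity, so the ratio of the two changes matches Definition 1.5.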
Although writing models in their economics equivalent terms is
extremely useful, it can also be a cumbersome process. At times, it is
helpful to use symbols instead of words. For example, if we replace consumption
with C, income with Y, and marginal propensity to consume
with MPC in equation (1.1), as is customary, we obtain the following
equation:
\[ C = \text{subsistence level of consumption} + (MPC) \times (Y) \tag{1.4} \]
One might choose to represent subsistence level of consumption
with SLC, but the acronym is not customary and thus, it does not help
much. A more generic symbol might prove more pragmatic.
Parameters are customarily represented by Greek letters, which make
most people apprehensive. Consider the Greek letters as names for parameters,
which are generic terms. Equation (1.4) can be written as
\[ C = \beta_0 + \beta_1 Y \tag{1.5} \]
A novice mathematics student might be ill at ease with equation (1.4)
or (1.5) because in mathematics it is customary to use the letter Y for the
dependent variable, while here it is used to represent the independent
variable. Economists customarily use the letter Y for income and are fairly
comfortable with it. However, the following format is not only preferred
but also more informative:
\[ \text{Consumption} = \beta_0 + \beta_1\,\text{income} \tag{1.6} \]
This indicates that if income changes by one unit, consumption
changes by \(\beta_1\) units in the direction of the sign of \(\beta_1\), which according
to consumption theory, should be positive. This theoretical expectation
of the outcome is the foundation of forming the alternative hypothesis.
For more information, consult note 1. For example, if \(\beta_1\) is 0.8, then as income
increases by $100, consumption will increase by $80. This expected outcome
can be verified empirically, which makes it a testable hypothesis.
In order to test the magnitude of the MPC, the slope parameter (\(\beta_1\)) must
be estimated, as will be discussed later. The next step after estimating a
parameter is to test the estimated value against theoretical expectation.
In this example, it makes sense to test the estimate of the parameter to
determine if it is equal to the numeral one, which indicates zero savings
and zero borrowing. As it will become clear later, it would also make sense
to test the estimated slope against the value of zero.
From a Mathematical Equation to a Regression Model
None of the equations that have been presented thus far are actually
regression models. They are mathematical functions and more specifically,
each is an equation of a line. Equations (1.1) and (1.4) through (1.6) are
consumption lines, where consumption is a function of income, while
equation (1.2) is a demand line or function. Equation (1.3) is a line representing
the percentage change in quantity demanded as a function of
percentage change in price. Its main parameter is the price elasticity of
demand, which is the coefficient of the independent variable percentage
change in price.
Te reason none o these equations are models is that they are exact
mathematical equations, as depicted in Figure 1.1a, and not a simplifca-
tion o a real phenomenon in lie. Tings in real lie occur with a degree
o uncertainty or probability and thus, they are random in nature. Adding
a random component to these equations converts them into a regression model. The random component is called the error term, or random error, for reasons that will be explained shortly. The customary symbol is the Greek letter epsilon (ε), but (U) and (V) are also common. In Figure 1.1b, the vertical distances between the actual observations and the regression model are the error terms.
C = subsistence level of consumption + (MPC)(Y) + ε (1.7)
Consumption = β₀ + β₁ income + ε (1.8)
C = β₀ + β₁Y + ε (1.9)
The above three equations, (1.7)–(1.9), are regression models and express exactly the same thing. They are models that state that, on average, consumption depends on income in a linear fashion. These are all the same as claiming that income explains average consumption. Note that the use of the term average refers to the average outcome for a dependent variable, which because of random error is probabilistic in nature and has an average. It is different from the concept of average consumption, which is consumption divided by income.
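The contrast between an exact line and a model with a random error term can be illustrated with a short simulation. This is only a sketch: the parameter values, the normal distribution of the error, and its standard deviation are assumptions made for illustration and are not taken from the text.

```python
import random

BETA0 = 500.0  # hypothetical intercept (subsistence level)
BETA1 = 0.8    # hypothetical slope (MPC)

def line(income):
    """Exact equation (1.5): the same input always gives the same output."""
    return BETA0 + BETA1 * income

def model(income, rng, sigma=50.0):
    """Model (1.9): the line plus a random error term (assumed normal here)."""
    return line(income) + rng.gauss(0.0, sigma)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
income = 2000.0

exact = [line(income) for _ in range(5)]
observed = [model(income, rng) for _ in range(1000)]
errors = [c - line(income) for c in observed]

mean_observed = sum(observed) / len(observed)
mean_error = sum(errors) / len(errors)

print("Distinct values from the exact line:", len(set(exact)))
print("Average simulated consumption:", round(mean_observed, 1))
print("Average error:", round(mean_error, 1))
```

The exact line returns a single value for a given income, while the simulated observations scatter around it; their average lies close to the line, which is the sense in which income explains average consumption.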
Soon you will learn that having a model is not sufficient; a model must be useful, which is a concept that needs to be defined and clarified. For the sake of completeness, the dependent variable (C) represents consumption. For the slope, we use the acronym MPC. The independent variable (Y) represents income. Epsilon (ε) is the error term; β₀ (beta zero) is the intercept, which represents the subsistence level, and β₁ (beta one) is the slope, which in this case represents the MPC.
Students and scholars should develop the habit of following the same procedure for regression models as is customary in the profession. The dependent variable, what is being explained, appears on the left-hand side of the equal sign. Examples from the above models include consumption, quantity demanded, and percentage change in quantity demanded. The term that is not related to the independent variable, the intercept, appears as the first term on the right-hand side of the equal sign. It represents the value of the dependent variable in the case where the independent
variable fails to be significant, which is reflected by a zero value for its coefficient. The independent variable and its coefficient are next on the right-hand side of the equation. In the three examples above, there is one independent variable in each model. The independent variable for the consumption model is income, while for the quantity demanded model it is price. Finally, for the model estimating elasticity, the independent variable is the percentage change in price. If there were more than one independent variable, as will be the case soon, the variables follow the same pattern one after the other but not necessarily in any particular order. In fact, the order in which independent variables are listed in a model has no impact on the final output. The coefficient of the independent variable is also called the slope of the line; however, this only makes sense if there is only one independent variable, as has been the case with the examples so far.
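The claim that the order of the independent variables does not affect the output can be checked numerically. The sketch below assumes a hypothetical demand model with two independent variables, price and income, fitted by least squares through the normal equations; the data are simulated and every number in it is made up for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def least_squares(columns, y):
    """Return [intercept, coefficient of each column] in the order given."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
price = [rng.uniform(1, 10) for _ in range(50)]
income = [rng.uniform(20, 60) for _ in range(50)]
# Hypothetical demand model: quantity = 100 - 3*price + 0.5*income + error
quantity = [100 - 3 * p + 0.5 * y + rng.gauss(0, 2)
            for p, y in zip(price, income)]

fit_a = least_squares([price, income], quantity)  # price listed first
fit_b = least_squares([income, price], quantity)  # income listed first

print("price first: ", [round(v, 4) for v in fit_a])
print("income first:", [round(v, 4) for v in fit_b])
```

Each variable keeps the same estimated coefficient in both fits; only its position in the output list changes.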
Customarily, the last term is the error term or ε, which plays a very important role in a regression model. It converts a mathematical function into a regression model that can be estimated using statistics. For a regression analysis to be valid, the error term must comply with certain requirements, which are customarily called assumptions. The assumptions are placed in the appendix because of the theoretical nature of the discussion.
The Meaning of Regression
As noted earlier, equations (1.1) and (1.4)–(1.6) state the same thing, while models (1.7), (1.8), and (1.9) are exactly identical. We choose equation (1.6) and model (1.8) for comparison. The difference between an equation like (1.6) and model (1.8) seems to be that model (1.8) has one extra term, namely ε, which we learned is called the error term. However, there are a number of major differences between the two. Some are simple, such as the fact that equation (1.6) is a mathematical function, while equation (1.8) is a regression model. The other differences need more explanation, which should clarify the difference between an equation and a model. A mathematical function represents an exact relationship with exactly the same outcome each time it is evaluated. However, a model is a representation or simplification of reality and includes a random error term to indicate that the outcome
is stochastic rather than deterministic. The term stochastic means that a model is probabilistic in nature; therefore, every time a new sample is obtained and the regression model is estimated, the results are slightly different, reflecting the random nature of the model.
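The sample-to-sample variation described above can be seen in a small simulation. The sketch below draws three samples from the same hypothetical consumption model and estimates the intercept and slope of each with the usual least-squares formulas for one independent variable. The true parameter values, the sample size, and the error distribution are all assumptions made for illustration.

```python
import random

def ols(x, y):
    """Least-squares intercept and slope for a single independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def draw_sample(rng, n=30):
    """One hypothetical sample from C = 500 + 0.8*Y + error."""
    income = [rng.uniform(1000, 5000) for _ in range(n)]
    cons = [500 + 0.8 * y + rng.gauss(0, 100) for y in income]
    return income, cons

rng = random.Random(42)  # fixed seed so the sketch is repeatable
estimates = [ols(*draw_sample(rng)) for _ in range(3)]

for b0_hat, b1_hat in estimates:
    print(f"intercept = {b0_hat:.1f}, slope = {b1_hat:.3f}")
```

The three estimated slopes are close to the assumed value of 0.8 but are not identical to it or to each other, which is what stochastic means in practice.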
In equation (1.6), the parameters β₀ and β₁ are known. In contrast, in model (1.8) they are unknown and must be estimated. The customary use of equation (1.6) is to find the value of consumption from the known parameters β₀ and β₁ and a given value of income. The fact that β₀ and β₁ are known means anyone who chooses to insert a given value of the independent variable income in the equation would always get the same answer. No real data is necessary. If one chooses to use real data such as per capita income for a country for the years 1973–2010, it is possible to obtain one value for consumption for each year. On the other hand, in model (1.8) the parameters β₀ and β₁ are unknown, which means it is impossible to obtain a value for consumption even with a known value for income until parameters β₀ and β₁ are estimated using regression analysis. In using model (1.8) the data for consumption and income are available. They are historical values that have been observed and cannot be changed or replaced arbitrarily. Using these observed values, the objective is to estimate the unknown parameters to obtain a line that best fits the data. The study of regression analysis deals with methods for obtaining estimates for β₀ and β₁ that meet certain criteria deemed desirable and also with determining if there is a set of estimates that is best, a concept that must be defined clearly and precisely and will be covered in Appendix A. Customarily, estimated parameters are represented by Greek letters with a ^, called a hat symbol, as β̂₀ and β̂₁. These are pronounced beta-hat-sub-zero and beta-hat-sub-one, respectively.
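For orientation, the standard least-squares estimates for a model with one independent variable can be written as follows. This is a sketch in the notation of model (1.9), with C for consumption and Y for income; the sample size n, the observation index i, and the bars denoting sample means are notation assumed here, and the criteria that make these estimates desirable are left to the later discussion.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(C_i - \bar{C})}
                     {\sum_{i=1}^{n} (Y_i - \bar{Y})^2},
\qquad
\hat{\beta}_0 = \bar{C} - \hat{\beta}_1 \bar{Y}
```

Both estimates are computed entirely from the observed data, which is why they change when the sample changes.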
A model represents a claim about a real-life phenomenon. For example, model (1.8) claims that there is a cause and effect relationship between income and consumption, that is, as income increases consumption increases. One cannot include vice versa at the end of the last sentence, because based on economic theory it is not true. In economics, income determines consumption while consumption does not determine income, at least not in an introductory discussion of the subject. The theory that states income determines consumption belongs to economics, not statistics. The fact that in macroeconomics, income also depends on consumption, via a different
mechanism, is addressed later in a much more sophisticated analysis in more advanced economics courses. A model, as a simplification of reality, is proposed to explain the causal relationship between income and consumption. Regression analysis, as a statistical tool, is used to provide a theory that determines if there is sufficient evidence in real life to support the claim presented in economics. The theories that justify inference based on evidence belong to statistics, not economics.
Therefore, every research model involves two different types of theories, one from the discipline in which the research is conducted and the other from statistics. The starting point for every study is the theoretical foundations of the discipline, which for us is economics. The estimation and inference of the research are governed by theories in statistics. The first set of theories originates in economics, which provides the foundation for raising the research question and establishing the claim(s) of the study. For research in other fields, the relevant subject provides the appropriate theory for this purpose. Statistical theories govern the procedures and assure that outcomes have desirable properties and can be generalized. Some of the desirable properties will be explained and verified in this manuscript. Lack of appropriate theories from either the field of economics or statistics invalidates the research outcome.
A consumption model like equation (1.8) is used to determine whether there is empirical evidence to refute economic theory. Note that economic theory does not make any assumption that parameters β₀ and β₁ are known, although it places restrictions on them, such as that β₁ must be a value between 0 and 1 when β₁ represents the MPC. Any number outside the range of 0 to 1 violates one or more economic rules or principles. A slope greater than 1 means that a one-unit increase in income would increase consumption by more than one unit (for example, if β₁ is 1.2, then a $1.00 increase in income would increase consumption by $1.20), which at least in this simplest of consumption models is impossible. Also, a negative MPC makes no economic sense. Theoretical properties of the coefficient can also be tested statistically, as will be seen in Chapter 6.
In order to test any theory using a model there must be sufficient data. Because the parameters of the proposed model (β₀ and β₁) are unknown, a statistical method known as regression analysis is necessary. Regression analysis is also called the method of least squares. The simplest regression
analysis uses a model that has only one independent variable, such as income, which means it has two parameters, β₀ and β₁. These parameters are also known as intercept and slope, respectively. This simple regression analysis requires one set of data, customarily arranged in two columns, one for the independent variable and another one for the dependent variable, which in this case are income and consumption, respectively. Estimated parameters depend on a particular observed set of data and are shown as β̂₀ and β̂₁.