
7/30/2019 Naghshpour Chap One


Regression for Economics




Shahdad Naghshpour




Copyright © Business Expert Press, 2012.

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopy, recording, or any other),
except for brief quotations, not to exceed 400 words, without the prior
permission of the publisher.

First published in 2012 by

Business Expert Press, LLC

222 East 46th Street, New York, NY 10017
www.businessexpertpress.com

ISBN-13: 978-1-60649-405-9 (paperback)

ISBN-13: 978-1-60649-406-6 (e-book)

DOI 10.4128/9781606494066

Business Expert Press Economics and Finance collection

Collection ISSN: 2163-761X (print)

Collection ISSN: 2163-7628 (electronic)

Cover design by Jonathan Pennell

Interior design by Exeter Premedia Services Private Ltd.,

Chennai, India

First edition: 2012

10 9 8 7 6 5 4 3 2 1

Printed in the United States of America.



To Parisa

SN



Abstract

The concept of regression was introduced by Sir Francis Galton, but
R.A. Fisher was the first to provide its statistical theory and
application. The 20th century witnessed the spread of regression
analysis into every scientific branch. Regression analysis is the most
commonly used statistical method in the world. It is used in economics
and many other fields. Although few would characterize this technique
as simple, regression is in fact both simple and elegant. The complexity
that many attribute to regression analysis is often a reflection of their
lack of familiarity with the language of mathematics. But regression
analysis can be understood even without a mastery of sophisticated
mathematical concepts. This book provides the foundation of
regression analysis. All the examples are from economics, and in almost
all of them real data are used to show the applications of the
method.

This book seeks to demystify regression analysis. The concepts related
to regression analysis are explained in a way that is comprehensible to
those whose mathematical skills are not expert. There is a logic to
regression analysis that resembles the intrinsic logic we apply in
comprehending the various events that fill our lives, which are
probabilistic rather than deterministic in nature. What hinders people's
comprehension of regression analysis is the difficulty many have in
understanding mathematical symbols and derivations. By removing this
obstacle, this book enables the logical reader to learn regression without
possessing superior mathematical skills. Although this book is largely
nonmathematical in its approach, it does not in any way give short shrift
to the subject of regression. This book is targeted to all business
students and executives who need to understand the concept of regression
for practical and professional purposes.

Regression analysis can be used to establish a causal relationship
between factors and the response variable. To do so, however, economic
theory must supply the causal relationship; regression analysis is then
applied to verify the validity of the theory.



This book utilizes Microsoft Excel to obtain regression results.
Although spreadsheet software is not the software of choice for performing
sophisticated regression analysis, it is widely available. Moreover, the
use of Excel preempts the need to buy and learn new software, which is in
itself another impediment to learning and using regression analysis.

Keywords

regression, analysis, causality, inference



Contents

Foreword
Acknowledgments
Introduction
Chapter 1  The Concept of Regression
Chapter 2  The Method of Least Squares
Chapter 3  Simple Linear Regression in Excel
Chapter 4  Multiple Regression
Chapter 5  Goodness of Fit
Chapter 6  Regression Coefficients
Chapter 7  Causality: Correlation Is Not Causality
Chapter 8  Qualitative Variables in Regression
Chapter 9  Pitfalls of Regression Analysis
Appendix
Glossary
Notes
References
Index



Foreword

Statistics Is the Science of Finding Order in Chaos

Regression analysis is by far the most commonly used statistical analysis
tool in many areas of science, including economics. After you finish the
book, I hope you will agree with me that if there were one tool tailor-made
for economics, it must be regression analysis. There are many aspects of
regression that perfectly match the needs of an economist.

Often students of introductory statistics are overwhelmed because of
the diversity of the material. There are too many new concepts and too
many different topics, which may not seem related in any sensible way.
In regression analysis, the focus is on one and only one topic, regression
analysis. This narrow focus is due to several reasons. Reason one is that
after having been exposed to introductory statistics, you are now ready to
focus on a special topic. Reason two is that the topic is so vast that even
dedicated books are not sufficient to cover all aspects of the topic. The
present manuscript does not even scratch the surface of the vast topic of
regression analysis. My hope is that you learn to see economics from an
applied angle and manage to focus on specific outcomes and their magnitude.
I want you to know that every claim in economics is a testable hypothesis,
and every theorem in economics can be written as a regression model and
thus tested for the magnitude of the expected outcome. Regression analysis,
or its broader subject area, statistics, is not a substitute for economic
theory. Instead, it is a complementary tool that allows us to estimate the
magnitude of the theoretically predicted outcome and to test the results
against the claims of policy makers and planners.



Acknowledgments

I am indebted to my wife Donna, who has helped me in more ways
than imaginable. I do not think I can thank her enough. I would like
to thank Michael Webb for his relentless assistance in all aspects of the
book. He has been my most reliable source and I could always count on
him. I also want to thank my graduate assistants Issam Abu-Ghallous and
Brian Carriere. They have provided many hours of help with all aspects
of the process. Without the help of Mike, Issam, and Brian, the book
would not have been completed. I also would like to thank Madeline
Gillette, Anthony Calandrillo, and Matt Orzechowski, who read parts of
the manuscript.



Introduction

Economics is a very interesting subject. The scope of the economic domain
is vast. Economics deals with market structure, consumer behavior,
investment, growth, fiscal policy, monetary policy, the roles of the bank,
etc. The list can go on for quite some time. It also predicts how economic
agents behave in response to changes in economic and noneconomic
factors such as price, income, political party, stability, and so on.
Economic theory, however, is not specific. For example, the theory proves
that when the price of a good increases the quantity supplied increases,
provided all the other pertinent factors remain constant, which is also
known as ceteris paribus. What the theory does not and cannot state is
how much the quantity increases for a given increase in price. The answer
to this question seems to be more interesting to most people than the
fact that the quantity will increase as a result of an increase in price.
The truth is that the theory that explains the above relationship is
important for economists. For the rest of the population, the knowledge of
that relationship is worthless if the magnitude is unknown. Assume that for
a 10% increase in price the quantity increases by 1%. This has very
different consequences than if the quantity increases by 10%, and totally
different consequences if the quantity increases by 20%. The knowledge
of the magnitude of change is as important as, if not more important than,
the knowledge of the direction of change. In other words, predictions
are valuable when they are specific.
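The three scenarios above can be compared with a quick calculation. The
sketch below is only an illustration: the function name
`implied_responsiveness` is hypothetical, and the ratio it computes
(percentage change in quantity per percentage change in price) is a
simple summary of magnitude, not an estimate obtained from data.

```python
def implied_responsiveness(pct_change_quantity, pct_change_price):
    """Percentage change in quantity per one percent change in price."""
    return pct_change_quantity / pct_change_price


price_increase = 10.0  # percent, as in the text

# The three quantity responses mentioned in the text: 1%, 10%, and 20%.
for quantity_increase in (1.0, 10.0, 20.0):
    ratio = implied_responsiveness(quantity_increase, price_increase)
    print(f"{quantity_increase:>4.0f}% more quantity -> ratio {ratio:.1f}")
```

The direction of change is identical in all three cases; only the ratio,
which ranges from 0.1 to 2.0, tells them apart.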

Statistics is the science that can answer the specific issues raised above.
The science of statistics provides the necessary theories that form
the foundation for answering such specific questions. Statistical theory
indicates the necessary conditions to set up the study and collect data.
It provides the means to analyze and clarify the meaning of the findings.
It also provides the foundation to explain the meaning of the findings
using statistical inference.

In order to be able to make an economic decision, it is necessary
to know the economic conditions. This is true for all economic agents,
from the smallest to the largest. The smallest economic agent might be
an individual with little earning and disposable income, while the largest
can be a multinational corporation with thousands of employees, not to
mention governments. Briefly, we will discuss some of the main needs and
uses of statistics in economics and then present some uses of regression
analysis in economics as well.

The first step in making any economic decision is to gain knowledge
of the state of the economy. Economic conditions are always in a state of
flux. Sometimes it seems that we are not very concerned with mundane
economic basics. For example, we may not try to forecast what the price
of a loaf of bread is or a pound of meat. We know the average prices for
these items; we consume them on a regular basis and will continue doing
so as long as nothing drastic happens. However, if you were to buy a
new car you would most likely call around and check some showrooms
to learn about available features and prices because we tend not to have
up-to-date information on big-ticket items or goods and services that we
do not purchase regularly. The process described above is a kind of
sampling, and the information that you obtain is called a sample statistic,
which you use to make an informed decision about the average price of
an automobile. When the process is performed according to strict and
formal statistical methods, it is called statistical inference. The specific
sample statistic is called the sample mean. The mean is one of numerous
statistical measures at the disposal of modern economists. Another useful
measure is the median. The median is a value that divides observations
into two equal halves, one with values less than the median and the
other with values more than the median. Statistics explains when each
measure should be used and what determines which one is the appropriate
measure. The median is the appropriate measure when dealing with home
prices or income. Applications of statistical analysis in economics are
vast, and sometimes they reach to other disciplines that need economics
for assistance. For example, when we need to build a bridge to meet
the economic, social, and even cultural needs of a community, it is
important to find a reliable estimate of the necessary capacity of the
bridge. Statistics indicates the appropriate measure to be used by teaching
us whether we should use the median or the mode. It also provides insight
on the role that variance plays in this problem. In addition to identifying
the appropriate tools for the task at hand, statistics also provides the
methods of obtaining suitable data and the procedure for performing
analysis to deliver the necessary inference.

One cannot imagine an economic problem that does not depend on
statistical analysis. Every year, the Government Printing Office publishes
the Economic Report of the President. Although the majority of the
statistics in the report are fact-based information about different aspects
of economics, many of the statistics are based on some statistical analysis,
albeit descriptive statistics. Descriptive statistics provide simple yet
powerful insight to economic agents and enable them to make more
informed decisions.

Another component of statistical analysis is inferential statistics.
Inferential statistics allows economists and political leaders to test
hypotheses about economic conditions. For example, in the presence of
inflation, the Federal Reserve Board of Governors may choose to reduce
the money supply to cool down the economy and slow down the pace of
inflation. The knowledge of how much to reduce the supply of money is
not only based on economic theory, but also depends on proper
estimation of the final outcome.

Another widely used application of statistical analysis is in policy
decisions. We hear a lot about the erosion of the middle class or that the
middle class pays a larger percentage of its income in taxes than the lower
and upper classes. However, how do we know who is the middle class?
A set dollar amount of income would be inadequate because of inflation,
although we must admit that even a single dollar amount must also
be obtained using statistics. However, statistical analysis has a much
more meaningful and more elegant solution. The concept of interquartile
range identifies the middle 50% of the population by income. Although
the interquartile range was not designed to identify the middle 50% and is
not explained in these terms, the combination of economics and statistics
is used to identify the middle 50% for economics and policy decision
purposes.
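The idea of using quartiles to mark off the middle of the income
distribution can be sketched in a few lines. The income figures below are
invented for illustration only, and the cut points come from Python's
standard `statistics.quantiles` function with its default method; other
software may use a slightly different quartile convention.

```python
import statistics

# Hypothetical annual household incomes in dollars, for illustration only.
incomes = [18_000, 24_000, 31_000, 38_000, 45_000, 52_000,
           61_000, 75_000, 98_000, 140_000, 410_000]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(incomes, n=4)

# Households between the first and third quartiles form the "middle" group.
middle_group = [x for x in incomes if q1 <= x <= q3]

print(f"Q1 = {q1:,.0f}, median = {median:,.0f}, Q3 = {q3:,.0f}")
print(f"Interquartile range = {q3 - q1:,.0f}")
print(f"Mean = {statistics.mean(incomes):,.0f}")
```

Because the cut points are computed from the data, they move with the
distribution itself, which is why they avoid the problem of a fixed dollar
threshold. In this made-up sample the single very high income pulls the
mean well above the median.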

The knowledge of statistics can also help to identify and comprehend
daily news and events. Recently, a report indicated that the chance of
an accident for teenage drivers increases by 40% when there are passengers
in the car who are under 21 years of age. This is a meaningless report.
Few teenagers drive alone or have passengers over 21 years of age. Total
miles driven by teenagers when there are passengers under 21 years of age
far exceed any other type of teenage driving. Other things equal, the more
you drive, the higher the probability of an accident. This example
indicates that the knowledge of statistics is helpful in understanding
everyday events and in making sound analysis.

When an economic phenomenon is changed to produce a desirable
outcome, we need more powerful tools than simple statistics. Regression
analysis is one of the most widely used statistical tools at the disposal of
economists.

In regression analysis, the effect of one or more factors is measured to
determine another factor. The first group is also known as explanatory
variables, while the latter is known as the endogenous variable. In
economics it makes sense to refer to explanatory variables as policy
instruments. Policy instruments are variables that economists and policy
makers can change or control. The supply of money is a policy instrument
controlled by the Federal Reserve. The Fed has to collect data first, which
is done on a periodic basis. These statistics inform the Fed that there is a
problem in the economy, such as inflation. The Fed decides to reduce the
supply of money. It will wait for the economy to respond to the change in
the supply of money. Then economic indicators are measured again and tested
against the target set by the policy. If the policy objectives are not met,
the action is repeated until the desirable outcome is obtained.

When working with a regression model, one might wonder if it
was designed to serve economists. Even some of the commonly used
terminologies are the same in both fields. For example, both subjects use
explanatory variables to measure the response variable. Typical regression
models do not consist of one explanatory variable and one response
variable. Instead, in addition to explanatory variables, the model has
additional variables known as control variables. Control variables are
actually the same thing as shifters in economics. Shifters in economics
refer to variables that are assumed to remain constant for the sake of
identifying the impact of the explanatory variables on the response
variables. In fact, every economic theory seems to have the famous ceteris
paribus, which means other things being equal. When other things are not
equal and change, they do not distort the relationship between explanatory
and response variables. They simply shift the magnitude up or down,
depending on the direction of the impact. Estimation of demand provides
a good example. Economic theory states that an increase in price
reduces the quantity demanded, ceteris paribus. The regression model for
this economic theory can be written as

\[ Q_d = \beta_0 + \beta_1 P + \varepsilon \tag{I.1} \]

where \(\varepsilon\) is the error term, which will be explained later. To
complete the process, we need to test the hypothesis that the coefficient
of price, which is also the slope of the demand curve, is negative. So we
use statistics to test the following hypothesis:

\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0 \]

The model, however, is not complete, because it is not subject to ceteris
paribus, as it does not control anything. Simple control variables consist
of the price of a complementary good, the price of a substitute good, and
income, to name just a few important ones. The theory predicts that the
effect of a change in the price of a complementary good is inverse, the
effect of a change in the price of a substitute good is direct, and the
effect of a change in income is direct. Thus, model (I.1) should be
modified as below.

\[ Q_d = \beta_0 + \beta_1 P + \beta_2 P_c + \beta_3 P_s + \beta_4 Y + \cdots + \varepsilon \tag{I.2} \]

The theoretical claims are written as

\[ H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0 \]
\[ H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0 \]
\[ H_0: \beta_3 = 0 \qquad H_1: \beta_3 > 0 \]

where the subscripts \(c\) and \(s\) use the first letters of complementary
and substitute, and \(Y\) represents income. The regression model clearly
and perfectly matches the economic theory, from the expected effects of
each variable to the concept of ceteris paribus.
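To make the role of control variables concrete, the following is a minimal
sketch of estimating a demand model of this form on simulated data. It is
not the book's procedure (the book uses Excel): the "true" coefficients,
the ranges of the data, and the helper names `solve_linear_system` and
`ols` are all assumptions made for illustration, and the estimates come
from the standard least-squares normal equations.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical coefficients, chosen only to carry the signs the theory
# predicts: own price (-), complement price (-), substitute price (+),
# income (+).
TRUE_B = [100.0, -2.0, -0.5, 0.8, 0.01]

rng = random.Random(0)
rows, quantity = [], []
for _ in range(500):
    p, pc, ps = rng.uniform(5, 15), rng.uniform(5, 15), rng.uniform(5, 15)
    income = rng.uniform(1000, 3000)
    x = [1.0, p, pc, ps, income]  # leading 1.0 is the intercept column
    rows.append(x)
    # Simulated quantity demanded: systematic part plus a random error.
    quantity.append(sum(b * v for b, v in zip(TRUE_B, x)) + rng.gauss(0, 1))

estimates = ols(rows, quantity)
for name, b in zip(["b0", "b1 (P)", "b2 (Pc)", "b3 (Ps)", "b4 (Y)"], estimates):
    print(f"{name}: {b:.3f}")
```

Each estimated coefficient measures the effect of its own variable while
the other variables in the model are held fixed, which is the regression
counterpart of ceteris paribus.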



CHAPTER 1

The Concept of Regression

Relationship Between Variables

Often we are interested in explaining a phenomenon using other factors.
There are numerous methods for accomplishing this objective. When the
phenomenon is quantitatively measurable, the solution is much easier
and the methods are well established. One such method is regression.
In regression analysis, one variable (the dependent variable) is explained
by one or more variables (independent variables). Before explaining a
regression model, it is beneficial to present an example of a simple model
for explaining consumption using income. But we first need to define the
economic concept of marginal propensity to consume (MPC).

Definition 1.1

The marginal propensity to consume, or MPC, represents the additional
amount one would consume if given an extra dollar.

\[ \text{Consumption} = \text{subsistence consumption} + (\text{marginal propensity to consume}) \times (\text{income}) \tag{1.1} \]

Conceptually, MPC is the same as the slope of the regression line when
there is only one independent variable. In equation (1.1), consumption
is the dependent variable and income is the independent variable.
Although the term dependent variable is commonly used in economics
literature, other names such as endogenous variable, Y variable,
response variable, or even output are often used as well. Similarly, the
term independent variable might be replaced by exogenous variable,
X variable, regressor, input, factor, or predictor variable.



Equation (1.1) is a good example of the concept of regression, but it
is not a regression model. The format for a regression model will be
discussed shortly. You are more likely to be familiar with a mathematical
function than a statistical function such as regression. A mathematical
function represents a nonprobabilistic association between a dependent
variable and one or more independent variables; the association is
exact and fixed (Figure 1.1a). A regression model is a simplification
of reality. It is actually a claim of a relationship and thus a testable
hypothesis. The association between the dependent variable and the
independent variable(s) is probabilistic and not deterministic. It is
true on the average only. Figure 1.1b depicts pairs of (X, Y) observations
relating the dependent variable (Y) to the independent variable (X).
Many factors affect the actual value of Y and cause the observation to
deviate from the expected values. A regression model represents the
expected value.
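The contrast between an exact function and a relationship that holds only
on the average can be simulated. In this sketch the intercept, slope, and
spread of the random deviations are hypothetical values, and a normal
distribution for the deviations is an assumption made purely for
illustration.

```python
import random

B0, B1 = 200.0, 0.8  # hypothetical intercept and slope


def exact_function(x):
    """Mathematical function: the same X always gives the same Y."""
    return B0 + B1 * x


def observed_value(x, rng, sigma=25.0):
    """Probabilistic view: the expected value plus a random deviation."""
    return exact_function(x) + rng.gauss(0.0, sigma)


rng = random.Random(42)
x = 1000.0
draws = [observed_value(x, rng) for _ in range(10_000)]
average = sum(draws) / len(draws)

print(f"Function value at X = {x:.0f}: {exact_function(x):.1f}")
print(f"Three observations: {[round(d, 1) for d in draws[:3]]}")
print(f"Average of {len(draws)} observations: {average:.1f}")
```

Individual observations scatter around the line, yet their average settles
close to the value the function gives, which is the sense in which the
relationship is true on the average only.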

Equation (1.1) is the equation of a line, except that it is not written
in the customary form (used in geometry). It is also a function because
it provides a specific outcome based on a linear rule, that is, as income
changes, consumption changes by the magnitude of the MPC. If income
becomes zero, consumption drops to the level of subsistence consumption,
which is the level of consumption necessary to survive even if one
does not have any income. Note that here we are not interested in
answering how one manages to pay for subsistence consumption, which could
be from savings, selling household furniture, or something else. That is
not the purpose of this model. The purpose is to explain the level of
consumption in response to changes in income. This model is a
simplification of reality. For example, it does not take into account the
role that wealth might play in explaining consumption. In a more elaborate
model, additional independent variables could be included that might
improve the model's ability to estimate the dependent variable more
accurately and to more closely approximate reality.

Figure 1.1. Comparison of (a) a function with (b) a regression model.
Panel (a), a function: the line \(Y = \beta_0 + \beta_1 X\). Panel (b), a
regression line superimposed on observations:
\(Y = \beta_0 + \beta_1 X + \varepsilon\). Both panels plot Y against X
from the origin O.

Although this model is a good starting point, it is not a precise
replication of reality. Nevertheless, it is the same as the simple
consumption function explained in many introductory macroeconomics
textbooks. As such, it serves a similar purpose: it introduces the concept,
clarifies application of the concept, and prepares for a more appropriate
model.

Definition 1.2

A model is a simple representation of something real in life.

The level of representativeness is determined by the purpose of the
model. Greater representativeness does not necessarily make a model more
desirable, in part because the purposes of a study affect how much
sophistication in the model is desirable.

Models need restrictions on their parameters to make sense. For
example, the MPC has to be positive and less than one. A negative MPC
means that as income increases, consumption decreases and eventually
drops below subsistence level, while an MPC greater than one means
that consumption at some point becomes larger than income. MPC values
below zero or above one contradict reality and defy common sense.
Therefore, we restrict MPC to be between 0 and 1. In addition, negative
values for the independent variable of income and the dependent variable
of consumption are meaningless. Similarly, a negative subsistence level
would be impossible. However, there are situations where the estimate for
the subsistence level might turn out to be negative, but for the purpose of
this example they can be ignored.

The four values of income, consumption, the MPC, and the subsistence
level are very different from each other. Consumption and
income, the dependent and independent variables, are observable data.
This means we can gather data on the actual income and consumption
levels of a sample of people. The data are typically published and
customarily represented in a column format. Subsistence consumption
and MPC, however, are known as parameters. Parameters are almost
always unknown and have to be estimated. Although every nation has
an MPC at any given point in time, the actual value is unknown, as
is the case with the subsistence level of consumption. The parameters
are estimated by the model using regression analysis. In the jargon of
regression, parameters are sometimes called coefficients or slopes. The
interpretation of coefficients and their appropriate analyses are covered
in Chapter 6.

Definition 1.3

A parameter is a characteristic of a population that is of interest.
Parameters are constant and usually unknown.

Examples of parameters include the population mean, the population
variance, and regression coefficients. One of the main purposes of
statistics is to obtain information from a sample that can be used to make
inferences about population parameters. The estimated value obtained from
a sample is called a statistic.

Definition 1.4

A statistic is a numerical value calculated from a sample that is variable
and known.

The word statistics has several meanings depending on the context;
two of its meanings are presented in the previous paragraph. The first use
of the word refers to the science and the discipline of statistics. The
second use is more specific and is based on the above definition. In the
science of statistics, we use statistics to make inferences about
parameters.

The slope and intercept terminologies used in geometry are also
commonly used to refer to coefficients in regression analysis. In the
consumption model, the corresponding analogy to geometry is that
MPC is the slope and the subsistence level is the intercept of the
consumption line. According to this model, a dollar increase in income
increases consumption by the magnitude of MPC, which by definition is the
slope of the regression line. When income is zero, the amount of
consumption is equal to the subsistence level, which is therefore the
intercept.

The representative terms consumption and income used in
equation (1.1) only apply to this particular problem, which renders
them inapplicable when the problem is changed. Consider a model that
explains quantity demanded as a function of the price of a good. If the
price increases by one dollar, how much will the quantity demanded
decrease? An attempt to write this question in the form of a model results
in a stalemate for a typical economist wishing to stick to vocabulary that
has economic meaning. In equation (1.2) below, the problematic value is
designated by "?". The value that replaces "?" answers the question "if the
price increases by $1, (how much) will the quantity demanded decrease?"
The "(how much)" in parentheses does not have a defined economic
name; thus, for the time being it is represented by a question mark.

\[ \text{Quantity demanded} = \text{demand when the good is free} + (?) \times (\text{price}) \tag{1.2} \]

The "?" can be replaced by "responsiveness of quantity demanded," or
some other unfamiliar and arcane wording. Such arbitrary naming can only
cause confusion and should be avoided. A reasonably good alternative
for the (?), which would be close to the concept of MPC in equation (1.1),
could be "coefficient of responsiveness of quantity demanded to changes
in price." One advantage of this term is the use of the previously defined
concept of coefficient. While this phrasing still has the shortcomings of
the previous naming, it also has the added disadvantage of being long and
wordy. Furthermore, an astute student would recall that it resembles the
definition of elasticity. In fact, had the price and quantity been
measured in units of natural logarithm, the question mark could be replaced
by price elasticity, as demonstrated in equation (1.3).

\[ \ln(\text{quantity demanded}) = \text{demand when the good is free} + (\text{price elasticity of demand}) \times \ln(\text{price}) \tag{1.3} \]

where ln indicates the natural logarithm, as is customary. Sometimes
equations that involve natural logarithms on both sides of the equation are
called log-log, but this is a poor and inappropriate terminology, as is the
name double-log equation.

Definition 1.5

Price elasticity of demand is the percentage change in quantity demanded
divided by the percentage change in price.

By expressing the price and quantity in natural logarithms, the slope
coefficient of the price variable becomes the same as the demand
elasticity. This is due to properties of the slope of the regression line
and mathematical properties of the natural logarithm. In Chapter 9, we use
logarithms to address some modeling and data problems. In equation (1.3)
there is no good explanation for the intercept, so for simplicity and
brevity it can be called by its generic term, namely the intercept.
Nevertheless, it is better to think of the model in economics terms as much
as possible.
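The reason the slope coincides with the elasticity can be shown in one
line. The derivation below is a sketch under the assumption that both
quantity and price enter the equation in natural logarithms; the symbols
\(Q_d\), \(P\), \(\beta_0\), and \(\beta_1\) are shorthand introduced here
for quantity demanded, price, the intercept, and the slope.

```latex
\ln Q_d = \beta_0 + \beta_1 \ln P
\quad\Longrightarrow\quad
\beta_1 = \frac{d \ln Q_d}{d \ln P}
        = \frac{dQ_d / Q_d}{dP / P}
        \approx \frac{\%\,\Delta Q_d}{\%\,\Delta P}
```

The middle step uses the fact that the differential of \(\ln x\) is
\(dx/x\), a proportional change, so the ratio of the two differentials is a
ratio of proportional changes, which matches Definition 1.5.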

Although writing models in their economics equivalent terms is
extremely useful, it can also be a cumbersome process. At times, it is
helpful to use symbols instead of words. For example, if we replace
consumption with C, income with Y, and marginal propensity to consume
with MPC in equation (1.1), as is customary, we obtain the following
equation:

\[ C = \text{subsistence level of consumption} + (MPC) \times (Y) \tag{1.4} \]

One might choose to represent the subsistence level of consumption
with SLC, but the acronym is not customary and thus does not help
much. A more generic symbol might prove more pragmatic.

Parameters are customarily represented by Greek letters, which make
most people apprehensive. Consider the Greek letters as names for
parameters, which are generic terms. Equation (1.4) can be written as

\[ C = \beta_0 + \beta_1 Y \tag{1.5} \]

A novice mathematics student might be ill at ease with equation (1.4)
or (1.5) because in mathematics it is customary to use the letter Y for the
dependent variable, while here it is used to represent the independent
variable. Economists customarily use the letter Y for income and are fairly
comfortable with it. However, the following format is not only preferred
but also more informative:

\[ \text{Consumption} = \beta_0 + \beta_1\,\text{income} \tag{1.6} \]

This indicates that if income changes by one unit, consumption
changes by \(\beta_1\) units in the direction of the sign of \(\beta_1\),
which according to consumption theory should be positive. This theoretical
expectation of the outcome is the foundation for forming the alternative
hypothesis. For more information consult.1 For example, if \(\beta_1\) is
0.8, then as income increases by $100, consumption will increase by $80.
This expected outcome can be verified empirically, which makes it a
testable hypothesis. In order to test the magnitude of the MPC, the slope
parameter (\(\beta\)) must be estimated, as will be discussed later. The
next step after estimating a parameter is to test the estimated value
against theoretical expectation. In this example, it makes sense to test
the estimate of the parameter to determine if it is equal to the numeral
one, which indicates zero savings and zero borrowing. As will become clear
later, it would also make sense to test the estimated slope against the
value of zero.
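The two tests mentioned above, against one and against zero, can be
sketched on simulated data. This is only an illustration: the data are
generated with a hypothetical MPC of 0.8 and subsistence level of 300, the
helper `simple_ols` is an assumed name, and it relies on the standard
textbook least-squares formulas for the slope and its standard error rather
than on anything developed so far in this chapter.

```python
import math
import random


def simple_ols(x, y):
    """Return the intercept, the slope, and the slope's standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two parameters).
    s2 = sum(r ** 2 for r in residuals) / (n - 2)
    return intercept, slope, math.sqrt(s2 / sxx)


rng = random.Random(1)
income = [rng.uniform(500, 5000) for _ in range(200)]
# Simulated consumption: hypothetical subsistence 300, MPC 0.8, plus noise.
consumption = [300 + 0.8 * y + rng.gauss(0, 100) for y in income]

b0_hat, b1_hat, se_b1 = simple_ols(income, consumption)
t_vs_zero = (b1_hat - 0.0) / se_b1  # is the slope different from zero?
t_vs_one = (b1_hat - 1.0) / se_b1   # is the slope different from one?

print(f"Estimated MPC: {b1_hat:.3f} (standard error {se_b1:.4f})")
print(f"Extra consumption from $100 more income: ${100 * b1_hat:.2f}")
print(f"t statistic against 0: {t_vs_zero:.1f}; against 1: {t_vs_one:.1f}")
```

A large positive statistic against zero and a large negative statistic
against one would both be consistent with an MPC strictly between 0 and 1.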

From a Mathematical Equation to a Regression Model

None of the equations that have been presented thus far are actually
regression models. They are mathematical functions and, more
specifically, each is an equation of a line. Equations (1.1) and (1.4)
through (1.6) are consumption lines, where consumption is a function of
income, while equation (1.2) is a demand line or function. Equation (1.3)
is a line representing the percentage change in quantity demanded as a
function of the percentage change in price. Its main parameter is the price
elasticity of demand, which is the coefficient of the independent variable
percentage change in price.

The reason none of these equations are models is that they are exact mathematical equations, as depicted in Figure 1.1a, and not a simplification of a real phenomenon in life. Things in real life occur with a degree of uncertainty or probability and thus, they are random in nature. Adding a random component to these equations converts them into a regression model. The random component is called the error term, or random error, for reasons that will be explained shortly. The customary symbol is the Greek letter epsilon ($\varepsilon$), but (U) and (V) are also common. In Figure 1.1b, the vertical distances between the actual observations and the regression model are the error terms.

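$$C = \text{subsistence level of consumption} + (MPC)(Y) + \varepsilon \tag{1.7}$$

$$\text{Consumption} = \beta_0 + \beta_1\,\text{income} + \varepsilon \tag{1.8}$$

$$C = \beta_0 + \beta_1 Y + \varepsilon \tag{1.9}$$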

The above three equations (1.7)–(1.9) are regression models and express exactly the same thing. They state that, on average, consumption depends on income in a linear fashion. This is the same as claiming that income explains average consumption. Note that the use of the term average refers to the average outcome for a dependent variable, which because of random error is probabilistic in nature and has an average. It is different from the concept of average consumption, which is consumption divided by income.
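A minimal sketch of how a random component turns the exact line into a regression model is given below. The parameter values, the income grid, and the choice of normally distributed errors are assumptions made only for illustration; they do not come from the text.

```python
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
BETA_0 = 50.0  # intercept (subsistence level of consumption)
BETA_1 = 0.8   # slope (MPC)
SIGMA = 10.0   # standard deviation of the error term (an assumption)


def line(income: float) -> float:
    """Exact mathematical function: the same output every time."""
    return BETA_0 + BETA_1 * income


def simulate(incomes, seed=0):
    """Regression model: the line plus a random error for each observation."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, SIGMA) for _ in incomes]
    observed = [line(y) + e for y, e in zip(incomes, errors)]
    return observed, errors


if __name__ == "__main__":
    incomes = [float(y) for y in range(100, 2100, 100)]
    observed, errors = simulate(incomes)
    # Each error is the vertical distance between an observation and the line.
    for y, c, e in list(zip(incomes, observed, errors))[:5]:
        print(f"income={y:7.1f}  line={line(y):8.2f}  observed={c:8.2f}  error={e:7.2f}")
    print(f"mean error: {statistics.fmean(errors):.3f}")
```

Individual observations sit above or below the line, yet the errors average out close to zero, which is the sense in which the model describes consumption on average.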

Soon you will learn that having a model is not sufficient; a model must be useful, which is a concept that needs to be defined and clarified. For the sake of completeness, the dependent variable (C) represents consumption. For slope, we use the acronym MPC. The independent variable (Y) represents income. Epsilon ($\varepsilon$) is the error term; $\beta_0$ (beta zero) is the intercept, which represents the subsistence level, and $\beta_1$ (beta one) is the slope, which in this case represents the MPC.

Students and scholars should develop the habit of following the same procedure for regression models as is customary in the profession. The dependent variable, what is being explained, appears on the left-hand side of the equal sign. Examples from the above models include consumption, quantity demanded, and percentage change in quantity demanded. The term that is not related to the independent variable, the intercept, appears as the first term on the right-hand side of the equal sign. It represents the value of the dependent variable in the case where the independent variable fails to be significant, which is reflected by a zero value for its coefficient. The independent variable and its coefficient are next on the right-hand side of the equation. In the three examples above, there is one independent variable in each model. The independent variable for the consumption model is income, while for the quantity demanded model it is price. Finally, for the model estimating elasticity, the independent variable is the percentage change in price. If there were more than one independent variable, as will be the case soon, the variables follow the same pattern one after the other but not necessarily in any particular order. In fact, the order in which independent variables are listed in a model has no impact on the final output. The coefficient of the independent variable is also called the slope of the line; however, this only makes sense if there is only one independent variable, as has been the case with the examples so far.
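The claim that the listing order of independent variables does not affect the output can be checked with a small sketch. Everything in it is an assumption made for illustration: the second variable, wealth, is hypothetical, the data are invented, and solving the normal equations by Gauss-Jordan elimination is simply a dependency-free way to obtain least-squares estimates, not a method taken from the text.

```python
def fit_least_squares(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns [intercept, coefficient of columns[0], coefficient of columns[1], ...].
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(k)
    ]
    for p in range(k):
        # Partial pivoting: bring the largest entry in column p to the pivot row.
        best = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[best] = aug[best], aug[p]
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]
        for r in range(k):
            if r != p:
                factor = aug[r][p]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[p])]
    return [aug[a][k] for a in range(k)]


if __name__ == "__main__":
    # Invented data for illustration only.
    income = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    wealth = [5.0, 3.0, 8.0, 1.0, 9.0, 4.0]
    consumption = [14.2, 19.5, 29.9, 32.8, 44.1, 47.6]

    b0_a, b_income_a, b_wealth_a = fit_least_squares([income, wealth], consumption)
    b0_b, b_wealth_b, b_income_b = fit_least_squares([wealth, income], consumption)

    print(f"income first: intercept={b0_a:.4f} income={b_income_a:.4f} wealth={b_wealth_a:.4f}")
    print(f"wealth first: intercept={b0_b:.4f} income={b_income_b:.4f} wealth={b_wealth_b:.4f}")
```

Each variable keeps the same coefficient whichever position it occupies; only the order in which the coefficients are reported changes.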

Customarily, the last term is the error term or $\varepsilon$, which plays a very important role in a regression model. It converts a mathematical function into a regression model that can be estimated using statistics. For a regression analysis to be valid, the error term must comply with certain requirements, which are customarily called assumptions. The assumptions are placed in the appendix because of the theoretical nature of the discussion.

The Meaning of Regression

As noted earlier, equations (1.1) and (1.4)–(1.6) state the same thing, while models (1.7), (1.8), and (1.9) are identical. We choose equation (1.6) and model (1.8) for comparison. The difference between an equation like (1.6) and model (1.8) seems to be that model (1.8) has one extra term, namely $\varepsilon$, which we learned is called the error term. However, there are a number of major differences between the two. Some are simple, such as the fact that equation (1.6) is a mathematical function, while equation (1.8) is a regression model. The other differences need more explanation, which should clarify the difference between an equation and a model. A mathematical function represents an exact relationship with exactly the same outcome each time it is evaluated. However, a model is a representation or simplification of reality and includes a random error term to indicate that the outcome is stochastic rather than deterministic. The term stochastic means that a model is probabilistic in nature; therefore, every time a new sample is obtained and the regression model is estimated, the results are slightly different, reflecting the random nature of the model.
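The stochastic nature of a model can be illustrated by estimating the same line on several fresh samples. The sketch below assumes hypothetical parameter values, normally distributed errors, and the usual least-squares formulas for one independent variable; none of these specifics come from the text.

```python
import random


def estimate_line(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def draw_sample(incomes, beta_0, beta_1, sigma, rng):
    """One sample of consumption values: the line plus a fresh random error."""
    return [beta_0 + beta_1 * y + rng.gauss(0.0, sigma) for y in incomes]


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so the run is repeatable
    incomes = [float(y) for y in range(100, 2100, 100)]
    for sample_id in range(1, 6):
        # Hypothetical parameters: intercept 50, slope 0.8, error s.d. 10.
        consumption = draw_sample(incomes, 50.0, 0.8, 10.0, rng)
        b0_hat, b1_hat = estimate_line(incomes, consumption)
        print(f"sample {sample_id}: intercept={b0_hat:8.3f}  slope={b1_hat:.4f}")
```

Every sample produces a slightly different pair of estimates, all scattered around the values used to generate the data, whereas evaluating an exact equation would return the same number every time.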

In equation (1.6), the parameters $\beta_0$ and $\beta_1$ are known. In contrast, in model (1.8) they are unknown and must be estimated. The customary use of equation (1.6) is to find the value of consumption with knowledge of the known parameters $\beta_0$ and $\beta_1$ and a given value of income. The fact that $\beta_0$ and $\beta_1$ are known means anyone who chooses to insert a given value of the independent variable income in the equation would always get the same answer. No real data is necessary. If one chooses to use real data such as per capita income for a country for the years 1973–2010, it is possible to obtain one value for consumption for each year. On the other hand, in model (1.8) the parameters $\beta_0$ and $\beta_1$ are unknown, which means it is impossible to obtain a value for consumption even with a known value for income until parameters $\beta_0$ and $\beta_1$ are estimated using regression analysis. In using model (1.8), the data for consumption and income are available. They are historical values that have been observed and cannot be changed or replaced arbitrarily. Using these observed values, the objective is to estimate the unknown parameters to obtain a line that best fits the data. The study of regression analysis deals with methods for obtaining estimates for $\beta_0$ and $\beta_1$ that meet certain criteria deemed desirable and also with determining if there is a set of estimates that is best, a concept that must be defined clearly and precisely and will be covered in Appendix A. Customarily, estimated parameters are represented by Greek letters with a ^, called a hat symbol, as $\hat{\beta}_0$ and $\hat{\beta}_1$. These are pronounced beta-hat-sub-zero and beta-hat-sub-one, respectively.
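As a sketch of what "a line that best fits the data" can mean, the least squares criterion, which this chapter names later as another term for regression analysis, chooses the estimates that minimize the sum of squared vertical distances between the observations and the line. The subscript $i$, the sample size $n$, the candidate values $b_0$ and $b_1$, and the sample means $\bar{C}$ and $\bar{Y}$ are notation introduced here for illustration; this is the standard textbook result and is not necessarily the presentation used in Appendix A.

$$
\begin{aligned}
(\hat{\beta}_0, \hat{\beta}_1) &= \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left(C_i - b_0 - b_1 Y_i\right)^2 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(C_i - \bar{C})}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \\
\hat{\beta}_0 &= \bar{C} - \hat{\beta}_1 \bar{Y}
\end{aligned}
$$

Both estimates are computed entirely from the observed values of income and consumption, which is why they change when the sample changes.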

A model represents a claim about a real-life phenomenon. For example, model (1.8) claims that there is a cause and effect relationship between income and consumption, that is, as income increases consumption increases. One cannot include vice versa at the end of the last sentence, because based on economic theory it is not true. In economics, income determines consumption while consumption does not determine income, at least not in an introductory discussion of the subject. The theory that states income determines consumption belongs to economics, not statistics. The fact that in macroeconomics, income also depends on consumption, via a different mechanism, is addressed later in a much more sophisticated analysis in more advanced economics courses. A model, as a simplification of reality, is proposed to explain the causal relationship between income and consumption. Regression analysis, as a statistical tool, is used to provide a theory that determines if there is sufficient evidence in real life to support the claim presented in economics. The theories that justify inference based on evidence belong to statistics, not economics.

Therefore, every research model involves two different types of theories, one from the discipline in which the research is conducted and the other from statistics. The starting point for every research project is the theoretical foundation of the discipline, which for us is economics. The estimation and inference of the research are governed by theories in statistics. The first set of theories originates in economics, which provides the foundation for raising the research question and establishing the claim(s) of the study. For research in other fields, the relevant subject provides the appropriate theory for this purpose. Statistical theories govern the procedures and assure that outcomes have desirable properties and can be generalized. Some of the desirable properties will be explained and verified in this manuscript. Lack of appropriate theories from either the field of economics or statistics invalidates the research outcome.

A consumption model like equation (1.8) is used to determine whether there is empirical evidence to refute economic theory. Note that economic theory does not make any assumption that parameters $\beta_0$ and $\beta_1$ are known, although it places restrictions on them, such as that $\beta_1$ must be a value between 0 and 1 when $\beta_1$ represents the MPC. Any number outside the range 0 to 1 violates one or more economic rules or principles. A slope greater than 1 means that a one unit increase in income would increase consumption by more than 1 (for example, if $\beta_1$ is 1.2, then a \$1.00 increase in income would increase consumption by \$1.20), which at least in this simplest of consumption models is impossible. Also, a negative MPC makes no economic sense. Theoretical properties of the coefficient can also be tested statistically, as will be seen in Chapter 6.
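The theoretical restriction on the slope can be written as a simple range check. This is only a sketch of the economic reasoning; the function name and messages are hypothetical, and a plain comparison like this is not the statistical test that the text defers to Chapter 6.

```python
def check_mpc(slope: float) -> str:
    """Compare a slope with the range economic theory allows for the MPC."""
    if slope < 0.0:
        return "negative MPC: no economic sense"
    if slope > 1.0:
        return "MPC above 1: consumption rises by more than income"
    return "within the theoretical range 0 to 1"


if __name__ == "__main__":
    # 0.8 and 1.2 are values used in the text; -0.3 is a hypothetical example.
    for slope in (0.8, 1.2, -0.3):
        extra = slope * 1.00  # change in consumption for a one-dollar rise in income
        print(f"slope={slope:5.2f}  change per dollar={extra:5.2f}  {check_mpc(slope)}")
```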

In order to test any theory using a model there must be sufficient data. Because the parameters of the proposed model ($\beta_0$ and $\beta_1$) are unknown, a statistical method known as regression analysis is necessary. Regression analysis is also called the method of least squares. The simplest regression analysis uses a model that has only one independent variable, such as income, which means it has two parameters, $\beta_0$ and $\beta_1$. These parameters are also known as intercept and slope, respectively. This simple regression analysis requires one set of data, customarily arranged in two columns, one for the independent variable and another one for the dependent variable, which in this case are income and consumption, respectively. Estimated parameters depend on a particular observed set of data and are shown as $\hat{\beta}_0$ and $\hat{\beta}_1$.
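The two-column layout and the resulting estimates can be sketched as follows. The five rows of data are invented for illustration and are not real observations; the function names are hypothetical, and the estimates use the usual least-squares formulas for one independent variable.

```python
import csv
import io

# Hypothetical data, invented for illustration only: two columns,
# income (independent variable) and consumption (dependent variable).
DATA = """income,consumption
1000,860
1500,1240
2000,1660
2500,2040
3000,2470
"""


def read_columns(text):
    """Split the two-column table into an income list and a consumption list."""
    rows = list(csv.DictReader(io.StringIO(text)))
    income = [float(r["income"]) for r in rows]
    consumption = [float(r["consumption"]) for r in rows]
    return income, consumption


def least_squares(x, y):
    """Return (intercept estimate, slope estimate) for one independent variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    income, consumption = read_columns(DATA)
    b0_hat, b1_hat = least_squares(income, consumption)
    print(f"estimated intercept: {b0_hat:.3f}")
    print(f"estimated slope:     {b1_hat:.3f}")
```

A different set of observations would give different estimates, which is why the hat symbols are tied to the particular data set at hand.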
