
Page 1

Lecture Ten

Page 2

Lecture

• Part I: Regression

• Part II: Experimental Method

Page 3

Outline: Regression

• The Assumptions of Least Squares

• The Pathologies of Least Squares

• Diagnostics for Least Squares

Page 4

Assumptions

• The expected value of the error is zero: E[e(t)] = 0.

• The error is independent of the explanatory variable: E{e(t)[x(t) − E x(t)]} = 0.

• The errors are independent of one another: E[e(i)e(j)] = 0 for i ≠ j.

• The variance is homoskedastic: E[e(i)²] = E[e(j)²].

• The error is normally distributed with mean zero and variance σ².
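To make these assumptions concrete, here is a minimal Python sketch (all numbers illustrative) that generates data satisfying them and fits the line by least squares:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(0, 10, n)   # explanatory variable
    e = rng.normal(0, 2, n)     # errors: mean zero, constant variance, independent of x and of each other
    y = 1.0 + 0.5 * x + e       # illustrative true intercept 1.0 and slope 0.5

    # Least-squares fit via the design matrix [1, x]
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b)                    # estimates should be close to (1.0, 0.5)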

Page 5

18.4 Error Variable: Required Conditions

• The error ε is a critical part of the regression model.

• Four requirements involving the distribution of ε must be satisfied:

– The probability distribution of ε is normal.

– The mean of ε is zero: E(ε) = 0.

– The standard deviation of ε is σ_ε for all values of x.

– The set of errors associated with different values of y are all independent.

Page 6

The Normality of ε

From the first three assumptions we have: y is normally distributed with mean E(y) = β0 + β1x and a constant standard deviation.

[Figure: normal curves for y centered at β0 + β1x1, β0 + β1x2, and β0 + β1x3, i.e., at E(y|x1), E(y|x2), and E(y|x3). The standard deviation remains constant, but the mean value changes with x.]

Page 7

Pathologies

• Cross-section data: the error variance is heteroskedastic. For example, it could vary with firm size. Consequence: not all of the available information is used efficiently, and better estimates of the standard errors of the regression parameters are possible.

• Time series data: the errors are serially correlated, i.e., autocorrelated. Consequence: inefficiency.

Page 8

Pathologies (Cont.)

• The explanatory variable is not independent of the error. Consequence: inconsistency, i.e., larger sample sizes do not lead to lower standard errors for the parameters, and the parameter estimates (slope etc.) are biased.

• The error is not distributed normally. For example, there may be fat tails. Consequence: using the normal distribution may understate the true 95% confidence intervals.

Page 9

Pathologies (Cont.)

• Multicollinearity: The independent variables may be highly correlated. As a consequence, they do not truly represent separate causal factors, but instead a common causal factor.

Page 10

18.9 Regression Diagnostics - I

• The three conditions required for the validity of the regression analysis are:

– the error variable is normally distributed.

– the error variance is constant for all values of x.

– the errors are independent of each other.

• How can we diagnose violations of these conditions?

Page 11

Residual Analysis

• Examining the residuals (or standardized residuals) helps detect violations of the required conditions.

• Example 18.2 (continued): nonnormality.

• Use Excel to obtain the standardized residual histogram.

• Examine the histogram and look for a bell-shaped diagram with a mean close to zero.
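The slide uses Excel; a comparable sketch in Python, on simulated data (illustrative only):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 200)
    y = 1.0 + 0.5 * x + rng.normal(0, 2, 200)   # simulated data

    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    std_resid = resid / resid.std(ddof=2)       # simple standardization, ignoring leverage

    plt.hist(std_resid, bins=20)                # look for a bell shape centered near zero
    plt.show()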

Page 12

Diagnostics (Cont.)

• Multicollinearity may be suspected if the t-statistics for the coefficients of the explanatory variables are not significant but the coefficient of determination is high. The correlations between the explanatory variables can then be calculated to see whether they are high.
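A quick check, sketched in Python with two regressors built to be nearly collinear:

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly a copy of x1, by construction

    # A pairwise correlation near 1 between explanatory variables signals multicollinearity
    print(np.corrcoef(x1, x2)[0, 1])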

Page 13

Diagnostics

• Is the error normal? Using EViews, with the view menu in the regression window, a histogram of the distribution of the estimated error is available, along with the coefficients of skewness and kurtosis, and the Jarque-Bera statistic testing for normality.
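Outside EViews, the same quantities can be computed with scipy; the fat-tailed residuals here are simulated for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    resid = rng.standard_t(df=3, size=500)   # simulated fat-tailed residuals

    print("skewness:", stats.skew(resid))
    print("kurtosis:", stats.kurtosis(resid, fisher=False))   # the normal has kurtosis 3
    jb, p = stats.jarque_bera(resid)
    print("Jarque-Bera:", jb, "p-value:", p)   # a small p-value rejects normality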

Page 14

Diagnostics (Cont.)

• To detect heteroskedasticity: if there are sufficient observations, plot the estimated errors against the fitted dependent variable.
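A sketch of that plot in Python, with errors whose spread grows with x by construction:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    x = rng.uniform(1, 10, 200)
    y = 1.0 + 0.5 * x + rng.normal(0, 0.3 * x)   # error spread grows with x: heteroskedastic

    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b

    plt.scatter(fitted, y - fitted)              # a funnel shape indicates heteroskedasticity
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()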

Page 15

Heteroscedasticity

• When the requirement of a constant variance is violated, we have a condition of heteroscedasticity.

• Diagnose heteroscedasticity by plotting the residuals against the predicted y.

[Figure: residuals plotted against the predicted y; the spread of the residuals increases with y.]

Page 16

Homoscedasticity

• When the requirement of a constant variance is not violated, we have a condition of homoscedasticity.

• Example 18.2 – continued

[Figure: residuals (about −1,000 to 1,000) plotted against predicted price (13,500 to 16,000); the spread is roughly constant.]

Page 17

Diagnostics (Cont.)

• Autocorrelation: the Durbin-Watson statistic is a scalar index of autocorrelation, with values near 2 indicating no autocorrelation and values near 0 (or near 4) indicating positive (or negative) autocorrelation. Examine the plot of the residuals in the view menu of the regression window in EViews.
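Outside EViews, the statistic is available in statsmodels; here it is applied to simulated AR(1) errors:

    import numpy as np
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(5)
    e = np.zeros(200)
    for t in range(1, 200):
        e[t] = 0.8 * e[t - 1] + rng.normal()   # AR(1) errors: strong positive autocorrelation

    print(durbin_watson(e))                    # well below 2, signalling autocorrelation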

Page 18

Non-Independence of Error Variables

– The data constitute a time series if they were collected over time.

– When the residuals are examined over time, no pattern should be observed if the errors are independent.

– When a pattern is detected, the errors are said to be autocorrelated.

– Autocorrelation can be detected by graphing the residuals against time.

Page 19

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.

[Figure: two plots of residuals against time, illustrating non-independence of the error variables. One panel: note the runs of positive residuals, replaced by runs of negative residuals. The other: note the oscillating behavior of the residuals around zero.]

Page 20

Fix-Ups

• The error is not distributed normally. For example, consider a regression of personal income on explanatory variables. Sometimes a transformation, such as regressing the natural logarithm of income on the explanatory variables, may make the error closer to normal.
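A sketch of the transformation in Python; the income process here is invented to be right-skewed:

    import numpy as np

    rng = np.random.default_rng(6)
    educ = rng.uniform(8, 20, 500)                                  # hypothetical regressor
    income = 20000 * np.exp(0.1 * educ + rng.normal(0, 0.5, 500))   # right-skewed by construction

    # Regressing log(income) rather than income makes the error term normal here
    X = np.column_stack([np.ones_like(educ), educ])
    b, *_ = np.linalg.lstsq(X, np.log(income), rcond=None)
    print(b)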

Page 21

Fix-Ups (Cont.)

• If the explanatory variable is not independent of the error, look for a substitute that is highly correlated with the explanatory variable but is independent of the error. Such a variable is called an instrument.
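A minimal sketch of the instrumental-variable idea; all variables are simulated, and z is built to drive x while staying independent of the error u:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 10000
    u = rng.normal(size=n)                 # error term
    z = rng.normal(size=n)                 # instrument: drives x, independent of u
    x = z + 0.5 * u + rng.normal(size=n)   # x is correlated with the error
    y = 1.0 + 2.0 * x + u                  # illustrative true slope 2.0

    b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # biased, since cov(x, u) != 0
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # IV estimate via the instrument
    print(b_ols, b_iv)                               # b_iv lands much closer to 2.0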

Page 22

Data Errors May Lead to Outliers

• Typos may lead to outliers, and looking for outliers is a good way to check for serious typos.

Page 23

Outliers

• An outlier is an observation that is unusually small or large.

• Several possibilities need to be investigated when an outlier is observed:

– There was an error in recording the value.

– The point does not belong in the sample.

– The observation is valid.

• Identify outliers from the scatter diagram.

• It is customary to suspect an observation is an outlier if its |standardized residual| > 2.
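A sketch of that rule in Python; one outlier is planted deliberately:

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 10, 100)
    y = 1.0 + 0.5 * x + rng.normal(0, 1, 100)
    y[10] += 8                                  # plant an outlier at observation 10

    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    std_resid = resid / resid.std(ddof=2)       # simple standardization, ignoring leverage

    print(np.where(np.abs(std_resid) > 2)[0])   # indices of suspect observations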

Page 24

[Figure: two scatter plots with fitted lines, labeled “An outlier” and “An influential observation”. The outlier causes a shift in the regression line, but some outliers may be very influential.]

Page 25

Procedure for Regression Diagnostics

• Develop a model that has a theoretical basis.

• Gather data for the two variables in the model.

• Draw the scatter diagram to determine whether a linear model appears to be appropriate.

• Determine the regression equation.

• Check the required conditions for the errors.

• Check for the existence of outliers and influential observations.

• Assess the model fit.

• If the model fits the data, use the regression equation.

Page 26

Part II: Experimental Method

Page 27

Outline

• Critique of Regression

Page 28

Critique of Regression

• Samples of opportunity rather than random samples

• Uncontrolled causal variables

– omitted variables

– unmeasured variables

• Insufficient theory to properly specify the regression equation

Page 29

Experimental Method: Three Examples

• Deterrence

• Aspirin

• Miles per Gallon

Page 30

Deterrence and the Death Penalty

Page 31

Isaac Ehrlich Study of the Death Penalty: 1933–1969

Dependent variable: homicide rate per capita

Control variables:

– probability of arrest

– probability of conviction given charged

– probability of execution given conviction

Causal variables:

– labor force participation rate

– unemployment rate

– percent of population aged 14–24 years

– permanent income

– trend

Page 32

Long Swings in the Homicide Rate in the US: 1900-1980

Source: Report to the Nation on Crime and Justice

Page 33

Ehrlich Results: Elasticities of Homicide with Respect to Controls

Control                              Elasticity   Average Value of Control
Prob. of Arrest                      −1.6         0.90
Prob. of Conviction Given Charged    −0.5         0.43
Prob. of Execution Given Convicted   −0.04        0.026

Source: Isaac Ehrlich, “The Deterrent Effect of Capital Punishment.”

Page 34

Critique of Ehrlich by Death Penalty Opponents

– The time period used, 1933–1968, was a period of declining probability of execution.

– Ehrlich did not include the probability of imprisonment given conviction as a control variable.

– The causal variables included are unconvincing as causes of homicide.

Page 35

United States Bureau of Justice Statistics: http://www.ojp.usdoj.gov/bjs/

Page 36

Experimental Method

• Police intervention in family violence

Page 37

United States Bureau of Justice Statistics: http://www.ojp.usdoj.gov/bjs/

Page 38

United States Bureau of Justice Statistics: http://www.ojp.usdoj.gov/bjs/

Page 39

Police Intervention with Experimental Controls

– A 911 call comes in from a family member; the case is randomly assigned for “treatment”.

– A police patrol responds and visits the household.

– The police calm down the family members.

– Based on the treatment randomly assigned, the police carry out the sanctions.

Page 40

Why Is Treatment Assigned Randomly?

– To control for unknown causal factors.

– Assign known numbers of cases, for example equal numbers, to each treatment.

– With this procedure, there should be an even distribution of difficult cases in each treatment group.

Page 41

[Flowchart:]

911 call (characteristics of household participants unknown)

Random assignment: code blue or code gold

– Code blue: patrol responds, settles the household, verbally warns the husband.

– Code gold: patrol responds, settles the household, takes the husband to jail for the night.
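A minimal sketch of the assignment step; the function name and case IDs are hypothetical, and the treatment labels come from the flowchart:

    import random

    random.seed(0)

    def assign_treatment(case_id):
        # A fair coin flip balances unknown household characteristics
        # across the two treatments in expectation
        return random.choice(["code blue: verbal warning", "code gold: night in jail"])

    for case_id in range(1, 6):   # hypothetical 911 cases
        print(case_id, assign_treatment(case_id))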

Page 42

Experimental Method: Clinical Trials

• Doctors volunteer.

• Randomly assigned to two groups.

• The treatment group takes an aspirin a day.

• The control group takes a placebo (sugar pill) per day.

• After 5 years, the 11,037 experimentals have 139 heart attacks (fatal and non-fatal): pE = 0.0126.

• After 5 years, the 11,034 controls have 239 heart attacks: pC = 0.0217.

Page 43

Conclusions from the Clinical Trials

• Hypotheses: H0: pC − pE = 0; Ha: pC − pE ≠ 0.

• Statistic: z = [(p̂C − p̂E) − (pC − pE)] / SE(p̂C − p̂E)

• Var(p̂C − p̂E) = Var(p̂C) + Var(p̂E)

• Recall, from the variance of a proportion: SE(p̂C − p̂E) = {[p̂C(1 − p̂C)]/nC + [p̂E(1 − p̂E)]/nE}^(1/2)

• = {[0.0217(1 − 0.0217)]/11,034 + [0.0126(1 − 0.0126)]/11,037}^(1/2)

• = 0.00175, so z = (0.0217 − 0.0126)/0.00175

• z = 5.2
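The arithmetic can be checked directly; a short sketch using the counts reported on the previous slide:

    from math import sqrt

    n_E, heart_E = 11037, 139   # treatment (aspirin) group
    n_C, heart_C = 11034, 239   # control (placebo) group

    p_E = heart_E / n_E         # about 0.0126
    p_C = heart_C / n_C         # about 0.0217

    se = sqrt(p_C * (1 - p_C) / n_C + p_E * (1 - p_E) / n_E)
    z = (p_C - p_E) / se
    print(round(se, 5), round(z, 1))   # about 0.00175 and 5.2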

Page 44

Experimental Method

• Experimental Design: Paired Comparisons

Page 45

Cab                  Brand A   Brand B   Difference
1                    27.01     26.95      0.06
2                    20.00     20.44     -0.44
3                    23.41     25.05     -1.64
4                    25.22     26.32     -1.10
5                    30.11     29.56      0.55
6                    25.55     26.60     -1.05
7                    22.23     22.93     -0.70
8                    19.78     20.23     -0.45
9                    33.45     33.95     -0.50
10                   25.22     26.01     -0.79
Sample Mean          25.20     25.80     -0.60
Standard Deviation    4.27      4.10      0.61

Table 1: Miles Per Gallon for Brand A and Brand B
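The paired comparison reduces to a one-sample t-test on the differences; a sketch using the values in Table 1:

    import numpy as np
    from scipy import stats

    # Differences (Brand A minus Brand B) from Table 1
    d = np.array([0.06, -0.44, -1.64, -1.10, 0.55, -1.05, -0.70, -0.45, -0.50, -0.79])

    print(d.mean(), d.std(ddof=1))     # about -0.60 and 0.61, matching the table
    t, p = stats.ttest_1samp(d, 0.0)   # test whether the mean difference is zero
    print(t, p)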