KNR 445 Regression: Addendum. Testing assumptions of simple linear regression

Upload: oro

Post on 24-Feb-2016



KNR 445 Regression, slide 1
Addendum
Testing assumptions of simple linear regression

KNR 445 Regression, slide 2
Now, how does one go about it?
The approach taken in this course will be to teach you to control α. In other words, to teach cautious ways to go about your business, so that if you get a result you can interpret it appropriately.
This requires that you know what to do to protect α...
...and that means testing the assumptions of the procedure...
...and knowing what happens to α if they are violated.

KNR 445 Regression, slide 3
Now, how does one go about it?
And just as a by the way...
There are lots of slides in here that we'll flash by...
...but they provide a real step-by-step guide to completing some basic tests in the mid-term, so please be aware that the information is here!

KNR 445 Regression, slide 4
Testing assumptions of regression - 1
Measurement level
Independent must be interval or dichotomous
Dependent must be interval
How to test?
You already know
If condition violated?
Don't use regression!

KNR 445 Regression, slide 5
Testing the assumptions for regression - 2
Normality (interval level variables)
Skewness & kurtosis must lie within acceptable limits (-1 to +1)
How to test?
You can examine a histogram, but SPSS also provides procedures, and these have convenient rules that can be applied (see following slides)
If condition violated?
The regression procedure can overestimate significance (increases the type I error rate), so add a note of caution to the interpretation of results

KNR 445 Regression, slide 6
Testing the assumptions - normality
To compute skewness and kurtosis for the included cases, select Descriptive Statistics | Descriptives from the Analyze menu.

KNR 445 Regression, slide 7
Testing the assumptions - normality
First, move the variables to the Variable(s) list box. In this case there are two interval variables (the IV and the DV).
Second, click on the Options button to specify the statistics to compute.

KNR 445 Regression, slide 8
Testing the assumptions - normality
First, mark the checkboxes for Kurtosis and Skewness.
Second, click on the Continue button to complete the options.

KNR 445 Regression, slide 9
Testing the assumptions - normality
Click on the OK button to indicate the request for statistics is complete.

KNR 445 Regression, slide 10
SPSS output to evaluate normality
The simple linear regression requires that the interval level variables in the analysis be normally distributed.
The skewness of NUMBER OF HOURS WORKED LAST WEEK for the sample (-0.333) is within the acceptable range for normality (-1.0 to +1.0), but the kurtosis (1.007) is outside the range. The assumption of normality is not satisfied for NUMBER OF HOURS WORKED LAST WEEK.
The skewness of RS OCCUPATIONAL PRESTIGE SCORE (1980) for the sample (0.359) is within the acceptable range for normality (-1.0 to +1.0) and the kurtosis (-0.692) is within the range. The assumption of normality is satisfied for RS OCCUPATIONAL PRESTIGE SCORE (1980).
Because one of the two variables fails the check, the assumption of normality required by the simple linear regression is not satisfied. A note of caution should be added to any findings based on this analysis.
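The -1 to +1 rule can also be checked outside SPSS. The following is a minimal sketch, assuming that the reported statistics are the usual bias-adjusted sample skewness and excess kurtosis; the function names and the scores are made up for illustration and are not part of the course materials.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to match the statistic SPSS reports)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed = sum(((v - mean) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    fourth = sum(((v - mean) / sd) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def within_limits(statistic, low=-1.0, high=1.0):
    """Apply the rule from the slides: the statistic must lie between -1 and +1."""
    return low <= statistic <= high


# Hypothetical scores, used only to exercise the functions.
scores = [1, 2, 3, 4, 5]
skew = sample_skewness(scores)
kurt = sample_kurtosis(scores)
normality_ok = within_limits(skew) and within_limits(kurt)
```

Applied to the values reported on the slide, `within_limits(-0.333)` passes and `within_limits(1.007)` fails, which is why the hours-worked variable does not satisfy the assumption.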

KNR 445 Regression, slide 11
Testing the assumptions - 3
Linearity & homoscedasticity for interval level variables
How to test?
Scatterplot (see following slides)
If condition violated?
Can underestimate significance: loses power, increases possibility of type II error

KNR 445 Regression, slide 12
Testing the assumptions: linearity and homoscedasticity
First, select the chart builder.
Second, choose scatter/dot from the chart gallery.

KNR 445 Regression, slide 13
Testing the assumptions: linearity and homoscedasticity

KNR 445 Regression, slide 14
The scatterplot for evaluating linearity
The simple linear regression assumes that the relationship between the independent variable "RS OCCUPATIONAL PRESTIGE SCORE (1980)" and the dependent variable "NUMBER OF HOURS WORKED LAST WEEK" is linear. The assumption is usually evaluated by visual inspection of the scatterplot. Violation of the linearity assumption may result in an understatement of the strength of the relationship between the variables.

KNR 445 Regression, slide 15
The scatterplot for evaluating linearity
Linear: all is well
Non-linear: will underestimate significance

KNR 445 Regression, slide 16
The scatterplot for evaluating homoscedasticity
The simple linear regression assumes that the variance of the dependent variable is uniform across all values of the independent variable. For an interval level independent variable, the assumption is evaluated by visual inspection of the scatterplot of the two variables. Violation of the homoscedasticity assumption may result in an understatement of the strength of the relationship between the variables.

KNR 445 Regression, slide 17
The scatterplot for evaluating homoscedasticity
Homoscedastic: all is well
Heteroscedastic: will underestimate significance

KNR 445 Regression, slide 18
Testing the assumptions for simple regression - 4
Linearity & homoscedasticity for a dichotomous independent variable
How to test?
Linearity: only 2 levels, so not relevant here (see next slide)
Homoscedasticity: via Levene's test of homogeneity of variance in ANOVA (see following slides)
If condition violated?
Can underestimate significance: loses power, increases possibility of type II error

KNR 445 Regression, slide 19
Testing the assumptions: linearity for a dichotomous IV
When the independent variable is dichotomous, we do not have a meaningful scatterplot that we can interpret for linearity.
The assumption of a linear relationship between the independent and dependent variable is only tested when the independent variable is interval level.

KNR 445 Regression, slide 20
Testing the assumptions - homoscedasticity for a dichotomous IV
To conduct the test of homoscedasticity, we will use the One-Way ANOVA procedure.
Select the command Compare Means | One-Way ANOVA from the Analyze menu.

KNR 445 Regression, slide 21
Testing the assumptions - homoscedasticity for a dichotomous IV
First, move the variable prestg80 to the Dependent list box.
Second, move the variable compuse to the Factor text box.
Third, click on the Options button to specify the statistics to compute.

KNR 445 Regression, slide 22
Testing the assumptions - homoscedasticity for a dichotomous IV
First, mark the Homogeneity-of-variance check box to request the Levene test.
Second, click on the Continue button to complete the request.

KNR 445 Regression, slide 23
Testing the assumptions - homoscedasticity for a dichotomous IV
Click on the OK button to indicate the request for statistics is complete.

KNR 445 Regression, slide 24
Result of test of homoscedasticity for a dichotomous independent variable
The simple linear regression assumes that the variance for the dependent variable is uniform for all groups. This assumption is evaluated with Levene's test for equality of variances. The null hypothesis for this test states that the variances of all groups are equal. The desired outcome for this test is to fail to reject the null hypothesis.
Since the probability associated with the Levene test (0.141) is greater than the level of significance (0.05), the null hypothesis is not rejected. The requirement for equal variances is satisfied.
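To show what the Levene statistic measures, here is a minimal sketch that computes Levene's W from absolute deviations around each group mean (the mean-centred form is an assumption about what SPSS reports). The function names and data are hypothetical. Converting W to a probability needs the F distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves to a statistics package.

```python
def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Replace every score by its absolute distance from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(v - group_mean) for v in g])

    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total

    # One-way ANOVA on the deviations: between-group vs within-group spread.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, dev_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, dev_means) for z in d)
    return (n_total - k) / (k - 1) * between / within


def equal_variances_ok(p_value, alpha=0.05):
    """Decision rule from the slide: fail to reject when p exceeds alpha."""
    return p_value > alpha


# Hypothetical groups: identical spread, then clearly different spread.
w_same = levene_w([1, 2, 3], [11, 12, 13])
w_diff = levene_w([1, 2, 3], [0, 5, 10])
```

With the probability reported on the slide, `equal_variances_ok(0.141)` is true, so the equal-variance requirement is treated as satisfied.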

KNR 445 Regression, slide 25
Assumptions tested: run the analysis
Now you've tested the assumptions, here's a quick run through of how to run the test and how to interpret results.
First, an example with two interval level variables.

KNR 445 Regression, slide 26
Running the analysis: interval IVs
To conduct a simple linear regression, select Regression | Linear from the Analyze menu.

KNR 445 Regression, slide 27
Running the analysis: interval IVs
First, move the dependent variable "hrs1" to the text box for the Dependent variable.
Second, move the independent variable "prestg80" to the list of Independent variables.
Third, click on the OK button to complete the request.

KNR 445 Regression, slide 28
The existence of a relationship
The determination of whether or not there is a relationship between the independent variable and the dependent variable is based on the significance of the regression in the ANOVA table.
The probability of the F statistic for the regression relationship is 0.041, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the independent and the dependent variable.

KNR 445 Regression, slide 29

The strength of the relationship
The strength of the relationship is based on the R-square statistic, which is the square of R, the correlation coefficient.
We evaluate the strength of the relationship using the rule of thumb for interpreting R:
Between 0 and 0.20 - Very weak
Between 0.20 and 0.40 - Weak
Between 0.40 and 0.60 - Moderate
Between 0.60 and 0.80 - Strong
Between 0.80 and 1.00 - Very strong

KNR 445 Regression, slide 30

The direction of the relationship
The direction of the relationship, direct or inverse, is based on the sign of the B coefficient for the independent variable.
Since 0.138 is positive, there is a positive relationship between occupational prestige and hours worked.

KNR 445 Regression, slide 31
Interpret the intercept
The intercept (Constant) is the position on the vertical y-axis where the regression line crosses the axis.
It is interpreted as the value of the dependent variable when the value of the independent variable is zero. It is seldom a useful piece of information.

KNR 445 Regression, slide 32

Interpret the slope
The B coefficient of the independent variable is called the slope. It represents the amount of change in the dependent variable for a one-unit change in the independent variable.
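As an illustration of where the B coefficient and the constant come from, the following minimal sketch fits the least-squares line by hand. The function name and the data are made up for the example; they are not the survey data behind the SPSS output.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical prestige scores (IV) and weekly hours (DV).
prestige = [20, 30, 40, 50, 60]
hours = [38, 41, 40, 44, 45]
slope, intercept = fit_line(prestige, hours)

# Predicted hours one prestige point apart differ by exactly the slope.
one_unit_change = (intercept + slope * 41) - (intercept + slope * 40)
```

Here the slope is the expected change in hours for a one-point change in prestige, and the intercept is the predicted value at a prestige score of zero.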

Each time that occupational prestige increases or decreases by one point, we would expect the subject to work 0.138 more or 0.138 fewer hours.

KNR 445 Regression, slide 33
Significance test of the slope
If there is no relationship between the variables, the slope would be zero. The hypothesis test of the slope tests the null hypothesis that the b coefficient, or slope, is zero.
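A minimal sketch of the test statistics involved, using made-up data and a hypothetical function name: it computes the t statistic for the slope and the F statistic for the regression. With a single independent variable, t squared equals F, so the two tests carry the same significance. Probabilities are not computed here, since they require the t and F distributions.

```python
import math


def slope_t_and_f(x, y):
    """Return (t for the slope, F for the regression) in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * xi for xi in x]
    ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_regression = sum((pi - mean_y) ** 2 for pi in predicted)

    # Error variance uses n - 2 degrees of freedom (two estimated coefficients).
    mean_square_error = ss_error / (n - 2)
    standard_error = math.sqrt(mean_square_error / sxx)

    t_stat = slope / standard_error
    f_stat = ss_regression / mean_square_error  # regression has 1 df
    return t_stat, f_stat


# Hypothetical data, not the survey data from the slides.
t_stat, f_stat = slope_t_and_f([20, 30, 40, 50, 60], [38, 41, 40, 44, 45])
```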

In simple linear regression, the significance of this test matches that of the overall test of relationship between dependent and independent variables. In multiple regression, the test of overall relationship will differ from the test of each individual independent variable.

KNR 445 Regression, slide 34
Conclusion of the analysis
For the population represented by this sample, there is a very weak relationship between "RS OCCUPATIONAL PRESTIGE SCORE (1980)" and "NUMBER OF HOURS WORKED LAST WEEK."
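The verbal label in a conclusion like this comes from the rule of thumb for interpreting R given earlier. A minimal sketch of that rule follows; the function name is hypothetical, and because the slides do not say which label applies at a value that sits exactly on a boundary, the sketch assumes it takes the stronger label.

```python
def describe_strength(r):
    """Map a correlation coefficient R to the rule-of-thumb label."""
    size = abs(r)  # the sign gives direction, not strength
    if size > 1.0:
        raise ValueError("a correlation coefficient cannot exceed 1 in size")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Hypothetical values of R, used only to exercise the rule.
labels = [describe_strength(r) for r in (0.10, -0.30, 0.50, 0.95)]
```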

Specifically, we would expect a one unit increase in occupational prestige score to be associated with a 0.138 increase in the number of hours worked in the past week.
Because of the normality problem noted earlier, the statistical conclusion must be expressed with caution.

KNR 445 Regression, slide 35
Running the analysis: mixed IVs
Now an example with an interval dependent variable and a dichotomous independent variable...

KNR 445 Regression, slide 36
SPSS output to evaluate normality
The simple linear regression requires that the interval level variables in the analysis be normally distributed. The skewness of RS OCCUPATIONAL PRESTIGE SCORE (1980) for the sample (0.324) is within the acceptable range for normality (-1.0 to +1.0) and the kurtosis (-0.817) is within the range. The assumption of normality is satisfied for RS OCCUPATIONAL PRESTIGE SCORE (1980).

KNR 445 Regression, slide 37
The strength of the relationship
The strength of the relationship is based on the R-square statistic in the Model Summary table of the regression output. R-square is the square of R, the correlation coefficient.
We evaluate the strength of the relationship using the rule of thumb for interpreting R:
Between 0 and 0.20 - Very weak
Between 0.20 and 0.40 - Weak
Between 0.40 and 0.60 - Moderate
Between 0.60 and 0.80 - Strong
Between 0.80 and 1.00 - Very strong

KNR 445 Regression, slide 38
The direction of the relationship
The direction of the relationship, direct or inverse, is based on the sign of the B coefficient for the independent variable.
Since -7.406 is negative, there is an inverse relationship between using a computer and occupational prestige. What this means exactly will depend on the way the computer use variable is coded.

KNR 445 Regression, slide 39
Interpret the intercept
The intercept (Constant) is the position on the vertical y-axis where the regression line crosses the axis.
It is interpreted as the value of the dependent variable when the value of the independent variable is zero. It is seldom a useful piece of information.

KNR 445 Regression, slide 40

Interpret the slope
The b coefficient for the independent variable "R USE COMPUTER" is -7.406. The b coefficient is the amount of change in the dependent variable "RS OCCUPATIONAL PRESTIGE SCORE (1980)" associated with a one unit change in the independent variable.
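When the independent variable has only two codes that are one unit apart, the least-squares slope is simply the difference between the two group means. The sketch below illustrates this with made-up prestige scores, assuming YES is coded 1 and NO is coded 2; the function name and data are hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: computer use coded 1 = YES, 2 = NO (an assumed coding).
computer_use = [1, 1, 1, 2, 2, 2]
prestige = [50, 52, 54, 44, 45, 46]

slope, intercept = fit_line(computer_use, prestige)

mean_yes = sum(p for c, p in zip(computer_use, prestige) if c == 1) / 3
mean_no = sum(p for c, p in zip(computer_use, prestige) if c == 2) / 3
difference = mean_no - mean_yes  # change when moving from code 1 to code 2
```

In this made-up sample the slope and the mean difference are both -7: the NO group averages seven prestige points lower than the YES group. The intercept is the predicted value at code 0, which no respondent has, so it carries no useful meaning here.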

Since the independent variable is dichotomous, a one unit increase implies a change from the category YES (code value = 1) to the category NO (code value = 2).

KNR 445 Regression, slide 41
Significance test of the slope
If there is no relationship between the variables, the slope would be zero. The hypothesis test of the slope tests the null hypothesis that the b coefficient, or slope, is zero.
In simple linear regression, the significance of this test matches that of the overall test of relationship between dependent and independent variables. In multiple regression, the test of overall relationship will differ from the test of each individual independent variable.

KNR 445 Regression, slide 42
Conclusion...
For the population represented by this sample, there is a weak relationship between "R USE COMPUTER" and "RS OCCUPATIONAL PRESTIGE SCORE (1980)."
Specifically, we would expect survey respondents who did not use a computer to average 7.406 less for occupational prestige score than survey respondents who used a computer.
No problems with assumptions, so no need to express caution in this case.

KNR 445 Regression, slide 43
Simple linear regression chart - 1
The following is a guide to the decision process for answering simple linear regression questions.
Is the level of measurement okay? (Independent: interval or dichotomous. Dependent: interval.)
No: Incorrect application of a statistic
Yes: Go on to the next question
Is the assumption of normality satisfied? (Skewness, kurtosis of dependent variable: -1.0 to +1.0)
No: Add caution if the question turns out to be true
Yes: Go on to the next question

KNR 445 Regression, slide 44
Simple linear regression chart - 2
Is the assumption of linearity satisfied? (Examine scatterplot.)
No: Add caution if the question turns out to be true
Yes: Go on to the next question
Is the assumption of homoscedasticity satisfied? (Levene test for dichotomous independent variable. Examine scatterplot for interval independent variable.)
No: Add caution if the question turns out to be true
Yes: Go on to the next question

KNR 445 Regression, slide 45
Simple linear regression chart - 3
Is the probability of the F for the regression relationship less than or equal to the level of significance?
No: Fail to reject null hypothesis
Yes: Go on to the next question
Does the size and direction of the intercept and the slope agree with the problem statement?
No: Fail to reject null hypothesis
Yes: Reject null hypothesis
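The decision process in the three charts can be sketched as a single function. This is one reading of the flattened charts, not a definitive encoding: it assumes the significance question is asked before the question about the intercept and slope, and the function and argument names are hypothetical.

```python
def regression_decision(level_ok, normality_ok, linearity_ok,
                        homoscedasticity_ok, p_value,
                        agrees_with_statement, alpha=0.05):
    """Walk the charts and return (conclusion, list of cautions)."""
    # Chart 1: the wrong measurement level stops the analysis outright.
    if not level_ok:
        return "incorrect application of a statistic", []

    # Charts 1 and 2: violated assumptions add cautions but do not stop it.
    cautions = []
    if not normality_ok:
        cautions.append("normality")
    if not linearity_ok:
        cautions.append("linearity")
    if not homoscedasticity_ok:
        cautions.append("homoscedasticity")

    # Chart 3: significance first, then agreement with the problem statement.
    if p_value > alpha:
        return "fail to reject null hypothesis", cautions
    if not agrees_with_statement:
        return "fail to reject null hypothesis", cautions
    return "reject null hypothesis", cautions


# First worked example: normality failed and p = 0.041. The linearity and
# homoscedasticity inputs are assumed, since the scatterplots are not shown.
conclusion, cautions = regression_decision(
    level_ok=True, normality_ok=False, linearity_ok=True,
    homoscedasticity_ok=True, p_value=0.041, agrees_with_statement=True)
```

For that example the function returns a rejection of the null hypothesis with a single caution about normality, which matches the conclusion reached on the slides.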