SW388R6 Data Analysis and Computers I Slide 1 Simple Linear Regression Key Points about Statistical Test Visualizing Regression Analysis Sample Homework Problem Solving the Problem with SPSS Logic for Simple Linear Regression

Upload: eustace-daniel

Post on 24-Dec-2015

SW388R6 Data Analysis and Computers I

Slide 1

Simple Linear Regression

Key Points about Statistical Test

Visualizing Regression Analysis

Sample Homework Problem

Solving the Problem with SPSS

Logic for Simple Linear Regression

Slide 2

Regression Analysis

Regression analysis is the generic term for several statistical tests for evaluating the relationship between interval level dependent and independent variables.

When we are considering the relationship between one dependent variable and one independent variable, we use Simple Linear Regression.

When we are considering the relationship between one dependent variable and more than one independent variable, we use Multiple Regression.

SPSS uses the same procedure for both Simple Linear Regression and Multiple Regression, which adds some complications to our interpretation.

Slide 3

Purpose of Simple Linear Regression - 1

The purpose of simple linear regression analysis is to answer three questions that have been identified as requirements for understanding the relationship between an independent and a dependent variable:

Is there a relationship between the two variables?

How strong is the relationship (e.g. trivial, weak, or strong; how much does it reduce error)?

What is the direction of the relationship (high scores are predictive of high or low scores)?

Slide 4

Purpose of Simple Linear Regression - 2

The question of the existence of a relationship between the variables is answered by the hypothesis test in regression analysis.

The strength of the relationship is based on interpretation of the correlation coefficient, r (as trivial, small, medium, large) and/or the coefficient of determination, r-squared (as the proportion by which error was reduced or accuracy was improved).

The question of the direction of the relationship is based on the interpretation of the sign of the b coefficient or the beta coefficient.

Slide 5

Simple Linear Regression: Examples

There is a relationship between undergraduate GPAs and graduate GPAs.

GRE scores are a useful predictor of graduate GPAs.

For social work students, the relationship between GPA and future income enables us to predict future earnings based on academic performance.

Slide 6

Simple Linear Regression - 1

When we studied measures of central tendency, we showed that the best measure of central tendency for an interval level variable (assuming it is not badly skewed) was the mean.

When we used the mean as the estimated score for all cases in the distribution, the total error computed for all of the cases was smaller than the error would be for any other value used for the estimate.

Error was measured as the deviation or difference between the mean and the score for each case, squared and summed.
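The claim that the mean gives the smallest total squared error can be checked numerically. The sketch below uses a small set of made-up scores (an assumption, not course data) and compares the mean against a few nearby estimates.

```python
# Hypothetical scores, chosen only for illustration.
scores = [2.0, 4.0, 4.0, 5.0, 10.0]


def sum_squared_error(values, estimate):
    """Total error: each score's deviation from the estimate, squared and summed."""
    return sum((v - estimate) ** 2 for v in values)


mean = sum(scores) / len(scores)

# Compare the mean against estimates slightly below and above it.
candidates = [mean - 1.0, mean - 0.5, mean, mean + 0.5, mean + 1.0]
errors = {c: sum_squared_error(scores, c) for c in candidates}
best = min(errors, key=errors.get)

for c in candidates:
    print(f"estimate = {c:.2f}  total squared error = {errors[c]:.2f}")
print("estimate with the smallest error:", best)
```

Any other candidate value could be added to the list; the total squared error should never fall below the value obtained at the mean.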

Slide 7

Simple Linear Regression - 2

Simple linear regression tests the existence of a relationship between an independent and a dependent variable by determining whether or not there is a statistically significant reduction in total error if we use the scores on the independent variable to estimate the scores on the dependent variable.

Regression analysis finds the equation or formula for the straight line that minimizes the total error.

The regression equation takes the algebraic form for a straight line: y = a + bx, where y is the dependent variable, x is the independent variable, b is the slope of the line, and a is the point at which the line crosses the y axis.

Slide 8

The Regression Equation

The regression equation is the algebraic formula for the regression line, which states the mathematical relationship between the independent and the dependent variable.

We can use the regression line to estimate the value of the dependent variable for any value of the independent variable.

The stronger the relationship between the independent and dependent variables, the closer these estimates will come to the actual score that each case had on the dependent variable.

Slide 9

Components of the Regression Equation

The regression equation has two components. The first component is a number called the y-intercept that defines where the regression line crosses the vertical y axis.

The second component is called the slope of the line, and is a number that multiplies the value of the independent variable.

These two elements are combined in the general form for the regression equation: the estimated score on the dependent variable = the y-intercept + the slope × the score on the independent variable

Slide 10

The Standard Form of the Regression Equation

The standard form for the regression equation or formula is:

Y = a + bX

where:

• Y is the estimated score for the dependent variable

• X is the score for the independent variable

• b is the slope of the regression line, or the multiplier of X

• a is the intercept, or the point where the regression line crosses the vertical y-axis

Slide 11

Depicting the Regression Equation

[Figure: the line y = 1.0 + 0.5 x, with x on the horizontal axis and y on the vertical axis, both running from 0.0 to 5.0]

The y-intercept is the point on the vertical y-axis where the regression line crosses the axis, i.e. 1.0.

The slope is the multiplier of x. It is the amount of change in y for a change of one unit in x.

If x changes one unit from 2.0 to 3.0, depicted by the blue arrow, y will change by 0.5 units, from 2.0 to 2.5 as depicted by the red arrow.

The regression equation includes both the y-intercept and the slope of the line. The y-intercept is 1.0 and the slope is 0.5.
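The reading of the intercept and slope above can be reproduced by evaluating the line directly. This is a minimal sketch; the function name `predict` is only an illustrative choice.

```python
def predict(x, intercept=1.0, slope=0.5):
    """Evaluate the line y = intercept + slope * x from the slide."""
    return intercept + slope * x


y_at_0 = predict(0.0)      # where the line crosses the y-axis
y_at_2 = predict(2.0)
y_at_3 = predict(3.0)
change = y_at_3 - y_at_2   # change in y for a one-unit change in x

print("y when x = 0:", y_at_0)
print("y when x = 2:", y_at_2)
print("y when x = 3:", y_at_3)
print("change in y for one unit of x:", change)
```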

Slide 12

Deriving the Regression Equation - 1

In this plot, none of the points fall on the regression line.

The difference between the actual value for the dependent variable and the predicted value for each point is shown by the red lines. These differences are called the residuals, and represent the errors between the actual and predicted values.

[Figure: scatterplot with the regression line y = 0.8 + 0.6 x; x and y axes run from 0 to 5]

Slide 13

Deriving the Regression Equation - 2

The regression equation is computed to minimize the total amount of error in predicting values for the dependent variable. The method for deriving the equation is called the "method of least squares," meaning that the regression line minimizes the sum of the squared residuals, or errors between actual and predicted values.
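For one independent variable, the least squares line has a well-known closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the two means. The sketch below applies that formula to made-up data (an assumption, not the points in the plot) and checks that nudging the line only increases the sum of squared residuals.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.5, 2.0, 4.0, 3.5]

a, b = least_squares(xs, ys)
rss_fit = residual_sum_of_squares(xs, ys, a, b)

# Any other line should produce a larger sum of squared residuals.
rss_shifted = residual_sum_of_squares(xs, ys, a + 0.1, b)
rss_tilted = residual_sum_of_squares(xs, ys, a, b + 0.1)

print(f"y = {a:.2f} + {b:.2f} x, residual sum of squares = {rss_fit:.3f}")
print(f"shifted line: {rss_shifted:.3f}, tilted line: {rss_tilted:.3f}")
```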

[Figure: scatterplot with the regression line y = 0.8 + 0.6 x; x and y axes run from 0 to 5]

Slide 14

Interpreting the Regression Equation: the Intercept

The intercept is the point on the vertical axis where the regression line crosses the axis. It is the predicted value for the dependent variable when the independent variable has a value of zero.

This may or may not be useful information depending on the context of the problem.

Slide 15

Interpreting the Regression Equation: the Slope

The slope is interpreted as the amount of change in the predicted value of the dependent variable associated with a one unit change in the value of the independent variable.

If the slope has a negative sign, the direction of the relationship is negative or inverse, meaning that the scores on the two variables move in opposite directions.

If the slope has a positive sign, the direction of the relationship is positive or direct, meaning that the scores on the two variables move in the same direction.

Slide 16

Interpreting the Regression Equation: the Slope equals 0

If there is no relationship between two variables, the slope of the regression line is zero and the regression line is parallel to the horizontal axis.

A slope of zero means that the predicted value of the dependent variable will not change, no matter what value of the independent variable is used.

If there is no relationship, using the regression equation to predict values of the dependent variable is no improvement over using the mean of the dependent variable.

Slide 17

Simple Linear Regression: Hypotheses

The hypothesis tested in simple linear regression is based on the slope or angle of the regression line.

Hypotheses:

• Null: the slope of the regression line as measured by the b coefficient = 0, i.e. there is no relationship

• Research: the slope of the regression line as measured by the b coefficient ≠ 0, i.e. there is a relationship

The b coefficient is tested with a two-tailed t-test.

Decision: Reject the null hypothesis if the p-value reported by SPSS ≤ alpha.
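SPSS reports the t-statistic and its p-value directly, but the statistic itself is simple to compute: the b coefficient divided by its standard error, with n - 2 degrees of freedom. The sketch below uses the standard textbook formula and made-up data (both the data and the function name are assumptions); it stops at the t-statistic and leaves the p-value to a t table or a statistics package.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, standard error of b, t, degrees of freedom) for the slope.

    Assumes at least three cases and a fit that is not perfect,
    otherwise the standard error is zero or undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    se_b = math.sqrt((sse / df) / sxx)
    return b, se_b, b / se_b, df


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.5, 2.0, 4.0, 3.5]

b, se_b, t, df = slope_t_statistic(xs, ys)
print(f"b = {b:.3f}, SE = {se_b:.3f}, t({df}) = {t:.3f}")
```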

Slide 18

Simple Linear Regression: Level of Measurement

Dependent variable is interval level (ordinal with caution)

Independent variable is interval level (ordinal with caution) or dichotomous

Slide 19

Simple Linear Regression: Sample Size Requirements - 1

In previous semesters, the rule of thumb for required sample size that I have used was a minimum of 5 cases for each independent variable included in the analysis, and preferably 15 cases for each independent variable. This rule was based on the text Multivariate Data Analysis by Hair, Black, Babin, Anderson, and Tatham.

Since attempting to incorporate more material on power analysis, I find that rule to be inadequate because we are unlikely to achieve statistical significance in all but the simplest problems that contain very strong relationships.

Slide 20

Simple Linear Regression: Sample Size Requirements - 2

In Using Multivariate Statistics, Tabachnick and Fidell recommend that the required number of cases should be the larger of the number of independent variables x 8 + 50 or the number of independent variables + 105.

Following this rule, simple linear regression with one independent variable would require a sample of 106 cases.
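The rule as stated on this slide can be written as a one-line function. The function name is only an illustrative choice.

```python
def required_cases(num_independent_variables):
    """Larger of (8 x number of IVs + 50) and (number of IVs + 105)."""
    k = num_independent_variables
    return max(8 * k + 50, k + 105)


# Simple linear regression has one independent variable.
print(required_cases(1))
```

With one independent variable the second term is the larger one, which is where the 106 cases come from.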

Slide 21

Simple Linear Regression: Assumptions

The relationship between the variables is linear

The residuals (errors) have the same variance for all values of the independent variable

The residuals (errors) are independent, i.e. not correlated from one case to the next

The residuals (errors) are normally distributed

We will defer the evaluation of assumptions until the next class.

Slide 22

Simple Linear Regression: APA Style

The t-test for the test of the Beta coefficient:

β = -.34, t(225) = 6.53, p < .01

• β = -.34: the beta coefficient

• (225): degrees of freedom for the t-test (not in SPSS output)

• 6.53: value of the t-statistic

• p < .01: significance of the t-statistic

Example: Social support significantly predicted depression scores, β = -.34, t(225) = 6.53, p < .01. Social support also explained a significant proportion of variance in depression scores, R² = .12, F(1, 225) = 42.64, p < .01.

Slide 23

Visualizing Regression Analysis - 1

While we will base our problem solving on numeric statistical results computed by SPSS, we can use a scatterplot to demonstrate regression graphically.

We will use the variable "highest year of school completed" [educ] as the independent variable and "occupational prestige score" [prestg80] as the dependent variable from the GSS2000R data set to demonstrate the relationship graphically.

Slide 24

Visualizing Regression Analysis - 2

The dots in the body of the chart represent the cases in the distribution.

The independent variable is plotted on the x-axis, or the horizontal axis.

The dependent variable is plotted on the y-axis, or the vertical axis.

A scatterplot of prestg80 by educ produced by SPSS.

Slide 25

Visualizing Regression Analysis - 3

I have drawn a green horizontal line through the mean of prestg80 (44.17).

The differences between the mean line and the dots (shown as pink lines) are the deviations.

The sum of the squared deviations is the measure of total error when the mean is used as the estimated score for each case.

NOTE: the plots were created in SPSS by adding features to the default plot.

Slide 26

Visualizing Regression Analysis - 4

A regression line and the regression equation are added in red to the scatterplot.

The pink deviations from the mean have been replaced with the orange deviations from the regression line. Deviations between cases and the regression line are called residuals.

Slide 27

Visualizing Regression Analysis - 5

The existence of a relationship between the variables is supported when the sum of the squared orange residuals is significantly less than the sum of the squared pink deviations.

Recall that both deviations and residuals can be referred to as errors. If there is a relationship, we can characterize it as a reduction in error.

Slide 28

Visualizing Regression Analysis – 6

While it is difficult for us to square and sum deviations and residuals, SPSS regression output provides us with the answer.

The sum of the squared pink deviations from the mean is the Total Sum of Squares in the ANOVA table (49104.91).

The sum of the squared orange residuals from the regression line is the Residual Sum of Squares in the ANOVA table (37086.80).

Slide 29

Visualizing Regression Analysis – 7

The difference between the Total Sum of Squares and the Residual Sum of Squares is the Regression Sum of Squares.

The Regression Sum of Squares is the amount of error that can be eliminated by using the regression equation to estimate values of prestg80 instead of the mean of prestg80.

The Regression Sum of Squares in the ANOVA table is 12018.11.

Slide 30

Visualizing Regression Analysis – 8

We can compute the proportion of error that was reduced by the regression by dividing the Regression Sum of Squares by the Total Sum of Squares:

12018.11 ÷ 49104.91 = 0.245
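The same arithmetic can be reproduced from the two sums of squares quoted from the ANOVA table. This is a small sketch; the variable names are illustrative.

```python
# Values quoted from the SPSS ANOVA table in the slides.
total_ss = 49104.91      # squared deviations from the mean of prestg80
residual_ss = 37086.80   # squared residuals from the regression line

# Error eliminated by using the regression line instead of the mean.
regression_ss = total_ss - residual_ss

# Proportion of the total error that was eliminated.
proportion_reduced = regression_ss / total_ss

print("Regression Sum of Squares:", round(regression_ss, 2))
print("Proportion of error reduced:", round(proportion_reduced, 3))
```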

Slide 31

Visualizing Regression Analysis – 9

The reduction in error that we computed (0.245) is equal to the R Square that SPSS provides in the Model Summary table.

R² is the coefficient of determination which is usually characterized as:

• the proportion of variance in the dependent variable explained by the independent variable, or

• the reduction in error (or increase in accuracy).

In multiple regression, the symbol for coefficient of determination is R². In simple linear regression, the symbol is r².

Slide 32

Visualizing Regression Analysis – 10

The correlation coefficient, Multiple R, is the positive square root of R Square.

This can be misleading in Simple Linear Regression when the correlation for the relationship between the two variables, r, can have a negative sign for an inverse relationship. Aside from the direction of the relationship, the value of Multiple R will be the same as the value for r in Simple Linear Regression.
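One way to recover a signed r from SPSS output is to take the square root of R Square and attach the sign of the slope. The sketch below does this with the rounded R Square (0.245) and the B coefficient for educ (2.359) reported elsewhere in these slides; the negative slope at the end is a made-up value used only to show the sign flip.

```python
import math

r_square = 0.245   # R Square from the Model Summary table (rounded)
slope_b = 2.359    # B coefficient for educ from the Coefficients table

# Multiple R is always reported as the positive square root.
multiple_r = math.sqrt(r_square)

# r carries the same sign as the slope.
r = math.copysign(multiple_r, slope_b)
print(f"Multiple R = {multiple_r:.3f}, r = {r:.3f}")

# Hypothetical inverse relationship: same strength, negative slope.
r_inverse = math.copysign(multiple_r, -1.0)
print(f"with a negative slope, r = {r_inverse:.3f}")
```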

Slide 33

Visualizing Regression Analysis – 11

The ANOVA table tests the null hypothesis that R² = 0, i.e. the reduction in error associated with the regression is zero.

The test of this hypothesis is reported for Multiple Regression as a test of an overall relationship between the dependent variable and the independent variables.

In Simple Linear Regression, we usually report the hypothesis test that the slope = 0, though we would reach the same conclusion no matter which test we report.

Slide 34

Visualizing Regression Analysis – 12

The test of the null hypothesis that the slope of the regression line (b coefficient) = 0 is reported in the Coefficients table.

Note that the significance of the t-test is the same as the significance of the F-test. Furthermore, in simple linear regression, the value of the F-statistic (81.662) is the same as the square of the t-statistic (9.037), apart from rounding.
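A quick check of that relationship with the two values quoted from the SPSS output. Because SPSS prints both statistics rounded to three decimals, the square of t should agree with F only approximately.

```python
t_statistic = 9.037    # t for the b coefficient, from the Coefficients table
f_statistic = 81.662   # F from the ANOVA table

t_squared = t_statistic ** 2
difference = abs(t_squared - f_statistic)

print(f"t squared = {t_squared:.3f}, F = {f_statistic:.3f}")
print(f"difference due to rounding = {difference:.3f}")
```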

Slide 35

Visualizing Regression Analysis - 13

We can depict the hypothesis test visually.

The null hypothesis for simple linear regression is that the slope of the regression line is zero. The slope of the green mean line is zero. The null hypothesis means that the red regression line would coincide with the green line.

In this example, the red regression line is obviously different from the green mean line, which is verified by the value of the slope in the regression equation (2.36) and the t-test of B.

Slide 36

Visualizing Regression Analysis – 14

The regression equation is based on the Unstandardized Coefficients (B) in the table of Coefficients.

The B coefficient labeled (Constant) is the intercept. The B coefficient for the variable educ is the slope of the regression line.

The regression equation for the relationship between prestg80 and educ is:

prestg80 = 12.928 + 2.359 x educ

Slide 37

Visualizing Regression Analysis – 15

The Standardized Coefficients (Beta) in the table of Coefficients are the regression coefficients for the relationship between the standardized dependent variable (z-scores) and the standardized independent variable (z-scores).

Since standardizing variables removes the unit of measurement from the coefficients, we can compare the Beta coefficients to interpret the relative importance of each independent variable in Multiple Regression.

In Simple Linear Regression, Beta will be equal to r, the correlation coefficient. Multiple R, r, and Beta all have the same numeric value, though Multiple R will be positive even when r and Beta are negative.
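The equality of Beta and r can be verified by standardizing both variables and fitting the regression on the z-scores. The sketch below uses made-up data with an inverse relationship (an assumption, not GSS2000R) so that the sign difference from Multiple R is visible.

```python
import math


def standardize(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def slope(xs, ys):
    """Least squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with an inverse relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 3.5, 4.0, 2.0, 1.5]

b = slope(xs, ys)                               # unstandardized B
beta = slope(standardize(xs), standardize(ys))  # standardized Beta
r = pearson_r(xs, ys)
multiple_r = abs(r)                             # reported as positive

print(f"B = {b:.3f}, Beta = {beta:.3f}, r = {r:.3f}, Multiple R = {multiple_r:.3f}")
```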

Slide 38

Visualizing Regression Analysis – 16

The sign of the Beta coefficient, as well as the sign of the B coefficient, tells us the direction of the relationship.

If the coefficients are positive, the relationship is characterized as direct or positive, meaning that higher values of the dependent variable are associated with higher values of the independent variables.

If the coefficients are negative, the relationship is characterized as inverse or negative, meaning that lower values of the dependent variable are associated with higher values of the independent variables.

Slide 39

Visualizing Regression Analysis - 17

The regression line represents the estimated value of prestg80 for every value of educ.

To obtain the estimate, we draw a line perpendicular to the value on the x-axis to the point where it intersects the regression line. We then draw a line from the intersection point to the y-axis. The intersection point on the y-axis is the estimated value for the dependent variable.

Slide 40

Visualizing Regression Analysis - 18

If we draw a vertical line from the educ value of 5 to the regression line and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 25.

We can compute the exact value by substituting in the regression equation:

Prestg80 = 12.93 + 2.36 x 5 = 24.73

Slide 41

Visualizing Regression Analysis - 19

If we draw a vertical line from the educ value of 15 to the regression line and then a horizontal line to the vertical axis, we see that the estimated value for prestg80 is about 50.

We can compute the exact value by substituting in the regression equation:

Prestg80 = 12.93 + 2.36 x 15 = 48.33
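Both substitutions can be reproduced with a small helper that uses the rounded coefficients from the slides (12.93 and 2.36). The function name is an illustrative choice.

```python
def predict_prestige(educ, intercept=12.93, slope=2.36):
    """Estimated prestg80 for a given educ, using the rounded coefficients."""
    return round(intercept + slope * educ, 2)


estimate_5 = predict_prestige(5)
estimate_15 = predict_prestige(15)

print("educ = 5  ->", estimate_5)
print("educ = 15 ->", estimate_15)
```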

Slide 42

Sample homework problem: Simple linear regression

This is the general framework for the problems in the homework assignment on simple linear regression problems. The description is similar to findings one might state in a research article.

Based on information from the data set GSS2000R, is the following statement true, false, or an incorrect application of a statistic? Assume that the assumptions of linear regression are satisfied. Use .05 for alpha.

Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001). Survey respondents who had higher academic degrees had more prestigious occupations. The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

o True

o True with caution

o False

o Incorrect application of a statistic

Slide 43

Sample homework problem: Data set and alpha

The first paragraph identifies:

• The data set to use, e.g. GSS2000R.Sav

• The alpha level for the hypothesis test

Slide 44

Sample homework problem: Specifications for the test - 1

The second paragraph states the finding that we want to verify with a simple linear regression. The finding identifies:

• The independent variable

• The dependent variable

• The strength of the relationship

• The direction of the relationship

Slide 45

Sample homework problem: Specifications for the test - 2

The second paragraph also includes additional statements that can be included in findings:

• An interpretive statement about the direction of the relationship

• The proportional reduction in error (PRE) interpretation of the coefficient of determination r²

Slide 46

Sample homework problem: Simple linear regression

The answer will be True if all parts of the finding in the problem statement are correct.

The answer to a problem will be Incorrect application of a statistic if the level of measurement or sample size requirement is violated.

The answer to a problem will be True with caution if the analysis supports the finding in the problem statement, but one or both of the variables is ordinal level.

The answer will be False if any part of the finding in the problem statement is not correct.

Slide 47

Solving the problem with SPSS: Level of measurement

Simple linear regression requires that the dependent variable be interval and the independent variable be interval or dichotomous. "Occupational prestige score" [prestg80] is interval level, satisfying the requirement for the dependent variable. "Highest academic degree" [degree] is ordinal level. However, we will follow the common convention of using ordinal variables with interval level statistics, adding a caution to any true findings.

Slide 48

Solving the problem with SPSS: Simple linear regression - 1

Before we can address the other issues involved in solving the problem, we need to generate the SPSS output.

Select Regression > Linear… from the Analyze menu.

Slide 49

Solving the problem with SPSS: Simple linear regression - 2

The problem states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).

We first enter the independent and dependent variables in the dialog box. Unless the problem statement clearly specifies which variable is having an effect on the other, we treat the variable mentioned first as the independent variable and the one mentioned second as the dependent variable.

First, move the dependent variable prestg80 to the Dependent list box.

Second, move the independent variable degree to the Independents list box.

Third, click on the Statistics button to add the additional statistics.

Slide 50

Solving the problem with SPSS: Simple linear regression - 3

First, in addition to the SPSS defaults, we mark the check box for Descriptives.

Second, click on the Continue button to close the dialog box.

Slide 51

Solving the problem with SPSS: Simple linear regression - 4

When we return to the Linear Regression dialog box, we click on OK to obtain the output.

Slide 52

Solving the problem with SPSS: Sample size

NOTE: This sample size requirement is much larger than what I have used in the past. Taking power analysis into account indicates that analyses following the previous guidelines would be substantially under-powered.

Using the rule of thumb from Tabachnick and Fidell that the required number of cases should be the larger of the number of independent variables × 8 + 50 or the number of independent variables + 105, simple linear regression requires 106 cases (the larger of 1 × 8 + 50 = 58 and 1 + 105 = 106). With 252 valid cases, the sample size requirement is satisfied.
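The sample size rule can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; the function name `required_cases` is a hypothetical helper, not part of SPSS or the course materials, and it encodes the rule exactly as stated on this slide.

```python
def required_cases(num_predictors: int) -> int:
    """Minimum number of valid cases under the rule of thumb quoted on the slide."""
    rule_a = num_predictors * 8 + 50   # number of independent variables x 8 + 50
    rule_b = num_predictors + 105      # number of independent variables + 105
    return max(rule_a, rule_b)         # the requirement is the larger of the two


needed = required_cases(1)   # simple linear regression has one independent variable
valid_cases = 252            # valid cases reported for this problem
print(needed, valid_cases >= needed)
```

For one independent variable the second rule dominates, which is why the requirement is 106 rather than 58.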

Slide 53

Solving the problem with SPSS: Interpreting the relationship - 1

The first sentence in the finding states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).

From the table of Coefficients, we see that the Beta (β) of 0.546 stated in the finding is correct, as is the value for the t-test of the B coefficient (10.303) and the probability of the t-statistic (< .001).

Slide 54

Solving the problem with SPSS: Interpreting the relationship - 2

The first sentence in the finding states that: Simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001).

SPSS does not provide the degrees of freedom for the t-test. However, it is easily calculated as the number of cases in the sample minus the number of predictors minus 1, or 252 - 1 - 1 = 250 for this problem.

Since the probability of the test statistic (t = 10.30, p < .001) was less than or equal to alpha (.05), the relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] was statistically significant.
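The degrees of freedom and the t statistic can be cross-checked by hand. The Python sketch below is not from the slides: it relies on the standard identity for simple linear regression, t = r·√(n − 2) / √(1 − r²), and uses the rounded r = 0.546 reported in the finding, so the result should only be expected to agree with the SPSS value of 10.303 to about two decimal places.

```python
import math

n_cases = 252        # valid cases in the analysis
n_predictors = 1     # simple linear regression has one predictor
r = 0.546            # correlation (equal to Beta in simple linear regression)

# Degrees of freedom for the t-test of the slope: n - predictors - 1
df = n_cases - n_predictors - 1

# t statistic recovered from r; with one predictor, df equals n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(df, round(t_stat, 2))
```

The computed t is approximately 10.30 on 250 degrees of freedom, consistent with the values cited in the finding.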

Slide 55

Solving the problem with SPSS: Interpreting the relationship - 3

The relationship was correctly characterized as strong. Using Cohen's criteria for characterizing the strength of relationships, the correlation coefficient (r = 0.546) was correctly interpreted as a large or strong relationship.

Cohen’s criteria:

• r < .1 = Trivial
• .1 ≤ r < .3 = Small
• .3 ≤ r < .5 = Medium or moderate
• r ≥ .5 = Large

In multiple regression, the Multiple R will always be positive because it represents the strength of the relationship for whatever number of independent variables is included.

The r for individual relationships has the same value as Multiple R in simple linear regression, but may be positive or negative depending on the direction of the relationship.
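Cohen's criteria amount to a simple lookup on the size of r. The Python sketch below is a hypothetical helper (the name `cohen_strength` is an assumption); it takes the absolute value first because, as noted above, r for an individual relationship may be negative while the strength depends only on its magnitude.

```python
def cohen_strength(r: float) -> str:
    """Label the strength of a relationship using the cutoffs listed on the slide."""
    size = abs(r)            # direction does not affect strength
    if size < 0.1:
        return "trivial"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


print(cohen_strength(0.546))   # r reported for degree and prestg80
```

With r = 0.546 the function returns "large", matching the characterization of the relationship as strong.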

Slide 56

Solving the problem with SPSS: Interpreting the relationship - 4

The relationship was correctly characterized as positive. The sign of the Beta coefficient, as well as that of the B coefficient, is positive (a direct relationship), implying that the numeric values of both variables move in the same direction: high with high and low with low.

The first sentence in the finding, which states that simple linear regression revealed a strong, positive relationship between "highest academic degree" [degree] and "occupational prestige score" [prestg80] (β = 0.546, t(250) = 10.30, p < .001), is correct.

Slide 57

Solving the problem with SPSS: Interpreting the relationship - 5

The second sentence in the finding states that: Survey respondents who had higher academic degrees had more prestigious occupations.

The sign of beta (β = 0.546) was positive, supporting the statement about the direction of the relationship.

Since the sign of the beta coefficient (β = 0.546) was positive, the relationship between the variables is direct. Higher scores for the independent variable "highest academic degree" [degree] are associated with higher scores on the dependent variable "occupational prestige score" [prestg80].

The statement that "survey respondents who had higher academic degrees had more prestigious occupations" is correct.

Slide 58

Solving the problem with SPSS: Interpreting the relationship - 6

The third sentence in the finding states that: The accuracy of predicting scores for the dependent variable "occupational prestige score" will improve by approximately 30% if the prediction is based on scores for the independent variable "highest academic degree" (r² = 0.298).

Using the proportional reduction in error interpretation of the coefficient of determination, r², the statement that "the accuracy of predicting scores for the dependent variable 'occupational prestige score' will improve by approximately 30% if the prediction is based on scores for the independent variable 'highest academic degree' (r² = 0.298)" is correct.

This statement is also true, so the answer to the question is True with caution. The caution is added because the independent variable “degree” is ordinal level.
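The proportional reduction in error figure follows directly from squaring r. The short Python sketch below is an illustration, not part of the SPSS output; it starts from the rounded r = 0.546 cited in the finding.

```python
r = 0.546                     # correlation between degree and prestg80
r_squared = r ** 2            # coefficient of determination
percent_improvement = round(r_squared * 100)   # PRE expressed as a whole percent

print(round(r_squared, 3), percent_improvement)
```

Squaring 0.546 gives roughly 0.298, which rounds to the "approximately 30%" improvement in prediction accuracy stated in the finding.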

Slide 59

Logic for simple linear regression: Level of measurement

Measurement level of independent variable?

• Interval/Ordinal/Dichotomous: continue to the next question
• Nominal: Inappropriate application of a statistic

Measurement level of dependent variable?

• Interval/Ordinal: continue to the next step
• Nominal/Dichotomous: Inappropriate application of a statistic

Strictly speaking, the test requires an interval level variable. We will allow ordinal level variables with a caution.

Slide 60

Logic for simple linear regression: Sample size requirement

Compute linear regression including descriptive statistics

Valid cases satisfy computed requirement?

• No: Inappropriate application of a statistic
• Yes: continue to the next step

The sample size requirement is the larger of:

• the number of independent variables × 8 + 50

• the number of independent variables + 105

Slide 61

Logic for simple linear regression: Significant, non-trivial relationship

Probability for t-test of B coefficient less than or equal to alpha?

• No: False
• Yes: continue to the next question

Effect size (Multiple R) is not trivial by Cohen’s scale, i.e. equal to or larger than 0.10?

• No: False
• Yes: continue to the next step

There are other assumptions that we will assume we satisfy for this week’s assignment.

In simple linear regression, r and Beta have the same numeric value as Multiple R, but may have a different sign. They are also measures of effect size.

Slide 62

Logic for simple linear regression: Strength of relationship

Strength of relationship (effect size) correctly interpreted based on Multiple R?

• No: False
• Yes: continue to the next step

Slide 63

Logic for simple linear regression: Direction of the relationship

Direction of relationship correctly interpreted based on B or Beta coefficient?

• No: False
• Yes: continue to the next step

Slide 64

Logic for simple linear regression: Proportional reduction in error

Reduction in error correctly interpreted based on Multiple R²?

• No: False
• Yes: continue to the next question

The statistics in the SPSS output match all of the statistics cited in the problem?

• No: False
• Yes: True

Add caution if dependent or independent variable is ordinal.
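The decision logic on the preceding slides can be summarized as one chain of checks. The Python sketch below is a hypothetical illustration: the function name, parameter names, and the use of yes/no flags for the interpretation checks are assumptions made for the example, and the judgments themselves (for instance, whether the strength was correctly interpreted) still have to come from reading the SPSS output.

```python
INCORRECT = "Incorrect application of a statistic"


def evaluate_finding(iv_level, dv_level, valid_cases, p_value, multiple_r,
                     strength_ok, direction_ok, pre_ok, stats_match,
                     n_predictors=1, alpha=0.05):
    """Walk through the slide logic and return the answer to the problem."""
    # Level of measurement
    if iv_level not in {"interval", "ordinal", "dichotomous"}:
        return INCORRECT
    if dv_level not in {"interval", "ordinal"}:
        return INCORRECT
    # Sample size requirement: the larger of the two rules of thumb
    if valid_cases < max(n_predictors * 8 + 50, n_predictors + 105):
        return INCORRECT
    # Significant, non-trivial relationship
    if p_value > alpha or multiple_r < 0.10:
        return "False"
    # Strength, direction, PRE interpretation, and cited statistics
    if not (strength_ok and direction_ok and pre_ok and stats_match):
        return "False"
    # Add a caution if either variable is ordinal
    if "ordinal" in (iv_level, dv_level):
        return "True with caution"
    return "True"


# Sample problem: degree (ordinal) predicting prestg80 (interval), 252 valid cases.
# p is reported only as < .001, so 0.001 is used here as a stand-in upper bound.
answer = evaluate_finding("ordinal", "interval", 252, 0.001, 0.546,
                          True, True, True, True)
print(answer)
```

For the sample problem every check passes and the independent variable is ordinal, so the sketch arrives at the same answer as the slides: True with caution.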