stats 3000 week 2 - winter 2011


Upload: lauren-crosby

Post on 27-Jan-2015


Page 1: Stats 3000 Week 2 - Winter 2011

Section D

1. Goodness of fit (Ch. 6)
2. Test of independence (Ch. 6)
3. Simple regression and correlation (not included on test 3)
   a) Regression (Ch. 9)
   b) Correlation (Ch. 9)
   c) Inferences about regression and correlation (Ch. 9)

Page 2: Stats 3000 Week 2 - Winter 2011

Data comes in pairs of quantitative variables. Given such paired data (bivariate data), we want to determine whether there is a relationship between the two quantitative variables and, if so, identify what the relationship is.

Regression analysis allows us to identify an equation that best fits the data, and to predict values of one variable based on another variable.

Descriptive Methods in Regression

Page 4: Stats 3000 Week 2 - Winter 2011

What is Linear Regression?

• The straight-line linear regression model is a means of relating one quantitative variable to another quantitative variable.

• A way of predicting the value of one variable from another.
  – It is a hypothetical model of the relationship between two variables.
  – The model used is a linear one.
  – Therefore, we describe the relationship using the equation of a straight line.

Page 5: Stats 3000 Week 2 - Winter 2011

LINEAR REGRESSION ANALYSIS (PREDICTION)

Process of finding the equation of the straight line that best predicts the value of one variable from a given value of the other.

Procedures that allow us to predict one variable (Y) based on knowledge of another variable (X).

The goal is to be able to predict new values of Y based on values of X.

Page 6: Stats 3000 Week 2 - Winter 2011

Generally, Y is called the dependent variable (also called the predicted, criterion, or outcome variable; it is plotted on the ordinate), and X is called the independent variable (also called the predictor; it is plotted on the abscissa).

Page 7: Stats 3000 Week 2 - Winter 2011

Scatter plot: shows the relationship between X and Y.

Student   High School GPA (X)   University GPA (Y)
1         2.00                  1.60
2         2.25                  2.00
3         2.60                  1.80
4         2.65                  2.80
5         2.80                  2.10
6         3.10                  2.00
7         2.90                  2.65
8         3.25                  2.25
9         3.30                  2.60
10        3.60                  3.00
11        3.25                  3.10

Page 8: Stats 3000 Week 2 - Winter 2011

[Figure: scatter plot of University GPA (Y) against High School GPA (X), both axes marked in steps of .25 up to 4.00.]

Page 9: Stats 3000 Week 2 - Winter 2011

Describing a Straight Line

Linear equation: when the relationship between X and Y is linear, it can be written as $Y = bX + a$.

Regression line: the line whose equation is used for prediction; the line that best describes the relationship between Y, the dependent variable, and X, the independent variable.

Page 10: Stats 3000 Week 2 - Winter 2011

Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear.

A straight line should yield the best "fit" to the data points in a scatterplot (a linear model).

Page 11: Stats 3000 Week 2 - Winter 2011

Regression equation: $\hat{Y} = bX + a$

  $\hat{Y}$ = predicted value of Y
  b = slope of the line; it is called the regression coefficient
  X = value of the independent variable
  a = intercept
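As a minimal Python sketch, the regression equation can be treated as a prediction function. The function name `predict` is hypothetical, and the coefficients b = .122 and a = 3.03 are the rounded values from the drug-effectiveness example later in this section.

```python
def predict(x: float, b: float, a: float) -> float:
    """Return the predicted value Y_hat = b*X + a for one value of X."""
    return b * x + a


# Rounded slope and intercept from the drug-effectiveness example.
b, a = 0.122, 3.03
print(round(predict(44, b, a), 3))  # predicted effectiveness (hours) for a 44-year-old
```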

Page 12: Stats 3000 Week 2 - Winter 2011

Slope: the change in the value of Y for a one-unit increase in X.

$b = \text{Slope} = \frac{\text{Change in value of } \hat{Y}}{\text{Change in value of } X}$

Page 14: Stats 3000 Week 2 - Winter 2011

Intercept: the point at which the line intersects the Y axis. It is the value of Y when X = 0, and it determines the location of the line.
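The slope and intercept definitions can be checked numerically. This sketch uses a hypothetical helper `predict` and the same rounded drug-example coefficients (b = .122, a = 3.03): raising X by one unit changes the prediction by b, and the prediction at X = 0 is a. For the drug data the intercept is an extrapolation, since no patient in the sample is near age 0.

```python
def predict(x, b, a):
    # Regression equation: Y_hat = b*X + a
    return b * x + a


b, a = 0.122, 3.03  # rounded values from the drug-effectiveness example

# Slope: change in Y_hat for a one-unit increase in X (the same at any X).
change_per_unit = predict(41, b, a) - predict(40, b, a)

# Intercept: the value of Y_hat when X = 0.
value_at_zero = predict(0, b, a)

print(round(change_per_unit, 3), value_at_zero)
```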

Page 15: Stats 3000 Week 2 - Winter 2011

Intercepts and Slopes

Page 17: Stats 3000 Week 2 - Winter 2011

Least squares criterion: a statistical method for finding the best prediction line. The best prediction line minimizes the error in predicting Y from X.

The best regression line will be closest to the actual data points.

Residual: the difference between a score and its predicted value.

Page 18: Stats 3000 Week 2 - Winter 2011

x y1 42 244 85 32

$(Y - \hat{Y})$ = residual, the error in prediction $= e$

The best regression line minimizes the value of $\sum (Y - \hat{Y})^2$ (it is at a minimum).

$SS_{\text{residual}}$ = sum of squares residual $= \sum (Y - \hat{Y})^2$
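To illustrate the least squares criterion, the sketch below computes $SS_{\text{residual}}$ for the drug-effectiveness data used later in this section, fits the line with the covariance/variance formulas given on the following slides, and checks that nudging the slope or intercept away from the fitted values increases the sum of squared residuals. The helper name `ss_residual` is hypothetical.

```python
# Drug-effectiveness data from the worked example in this section.
ages = [34, 42, 37, 55, 47, 43, 52, 39]             # X
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]  # Y (hours)


def ss_residual(x, y, b, a):
    """Sum of squared residuals, sum of (Y - Y_hat)^2, for the line Y_hat = b*X + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))


# Least-squares coefficients: b = Cov_xy / s_x^2 (the n - 1 terms cancel), a = Ybar - b*Xbar.
n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(effect) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, effect))
sxx = sum((x - x_bar) ** 2 for x in ages)
b_ls = sxy / sxx
a_ls = y_bar - b_ls * x_bar

best = ss_residual(ages, effect, b_ls, a_ls)

# Moving the slope or the intercept away from the least-squares values increases SS_residual.
for db, da in [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert ss_residual(ages, effect, b_ls + db, a_ls + da) > best

print(f"minimum SS_residual = {best:.3f}")
```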

Page 20: Stats 3000 Week 2 - Winter 2011

$b = \frac{\text{Cov}_{xy}}{s_x^2} = \frac{\text{Covariance}}{\text{Variance of } X}$

Covariance: the degree to which X and Y vary together (covary); the variation in one variable (X) that is shared by another (Y).

$\text{Cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$

$s_x^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

$a = \bar{Y} - b\bar{X}$
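The formulas above translate directly into code. This is a minimal plain-Python sketch (not SPSS); the helper names are hypothetical, and the four-point data set is made up so that the fitted line should be exactly Y = 2X + 1.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance: sum of (X - Xbar)(Y - Ybar), divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """Sample variance s_x^2: sum of (X - Xbar)^2, divided by n - 1."""
    return covariance(x, x)


def least_squares_fit(x, y):
    """Return (b, a) for the regression line Y_hat = bX + a."""
    b = covariance(x, y) / variance(x)  # b = Cov_xy / s_x^2
    a = mean(y) - b * mean(x)           # a = Ybar - b * Xbar
    return b, a


# Made-up data lying exactly on Y = 2X + 1, so the fit should recover b = 2, a = 1.
b, a = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b, a)
```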

Page 21: Stats 3000 Week 2 - Winter 2011

A medical researcher is interested in the possibility of a linear relationship between a patient's age and the effectiveness of a certain drug (hours). The drug is administered to 8 randomly selected patients.

Age (X)   Effectiveness (Y)
34        6.3
42        8.1
37        7.9
55        9.8
47        8.6
43        8.4
52        9.1
39        8.6

$\bar{X} = 43.625$, $\bar{Y} = 8.35$

Page 22: Stats 3000 Week 2 - Winter 2011

$b = \frac{\text{Cov}_{xy}}{s_x^2} = \frac{\text{Covariance}}{\text{Variance of } X}$

$\text{Cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$

$\sum (X - \bar{X})(Y - \bar{Y}) = (34 - 43.625)(6.3 - 8.35) + (42 - 43.625)(8.1 - 8.35) + (37 - 43.625)(7.9 - 8.35) + (55 - 43.625)(9.8 - 8.35) + \dots + (39 - 43.625)(8.6 - 8.35) = 45.55$

$\text{Covariance} = 45.55 / 7 = 6.507$

Page 23: Stats 3000 Week 2 - Winter 2011

$s_x^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(34 - 43.625)^2 + (42 - 43.625)^2 + (37 - 43.625)^2 + (55 - 43.625)^2 + \dots + (39 - 43.625)^2}{7} = 53.125$

$b = \frac{6.507}{53.125} = .122$

$a = \bar{Y} - b\bar{X}$

$a = 8.35 - (.122)(43.625) = 3.03$

$\hat{Y} = .122X + 3.03$

Prediction for a 44-year-old patient:

$\hat{Y} = 3.03 + (.122)(44) = 8.398$ hours
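The hand calculation can be reproduced in Python as a check. One caveat: the slides round b to .122 before computing a, so the unrounded coefficients printed by this sketch (b ≈ .1225, a ≈ 3.006) differ slightly from 3.03, and the prediction for a 44-year-old comes out near 8.396 rather than 8.398 hours.

```python
ages = [34, 42, 37, 55, 47, 43, 52, 39]             # X
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]  # Y, in hours

n = len(ages)
x_bar = sum(ages) / n    # 43.625
y_bar = sum(effect) / n  # about 8.35

cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, effect)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in ages) / (n - 1)

b = cov_xy / var_x       # slope, unrounded
a = y_bar - b * x_bar    # intercept computed from the unrounded slope

print(f"Cov = {cov_xy:.3f}, s_x^2 = {var_x:.3f}, b = {b:.4f}, a = {a:.3f}")
# Prediction for a 44-year-old patient with the unrounded coefficients.
print(f"Y_hat(44) = {b * 44 + a:.3f} hours")
```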

Page 24: Stats 3000 Week 2 - Winter 2011

A researcher suspects that there is a relationship between the number of promises a political candidate makes and the number of promises that are fulfilled once the candidate is elected. He examines the track record of 10 politicians. Use SPSS to construct a regression equation relating the number of promises made to the number of promises kept by these politicians.

Page 30: Stats 3000 Week 2 - Winter 2011

The "Unstandardized Coefficients" column (B) of the SPSS output embodies the regression equation: the (Constant) row gives the intercept, and the predictor's row gives the slope.

$\hat{Y} = 0.118X + 9.268$

Page 31: Stats 3000 Week 2 - Winter 2011

Standard error of estimate ($s_{y \cdot x}$)

It is a measure of the amount of error in prediction, in units of the Y variable. It is the standard deviation of the distribution of obtained Y scores about the predicted values of Y, $\hat{Y}$.

Standard error of estimate: a measure of the error in prediction, used as the basis for a measure of the accuracy of prediction.

Page 32: Stats 3000 Week 2 - Winter 2011

$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{SS_{\text{residual}}}{df}}$, where $df = n - 2$

$s_{y \cdot x}$ represents the average error in prediction over an entire scatterplot.

Page 33: Stats 3000 Week 2 - Winter 2011

Age (X)   Effectiveness (Y)   $\hat{Y}$   $(Y - \hat{Y})^2$
34        6.3                 7.178       .771
42        8.1                 8.154       .003
37        7.9                 7.544       .127
55        9.8                 9.740       .004
47        8.6                 8.764       .027
43        8.4                 8.276       .015
52        9.1                 9.374       .075
39        8.6                 7.788       .659

$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{1.681}{6}} = \sqrt{.28} = .53$ hours

This is the average dispersion of the effectiveness scores around their predicted values.
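A short sketch reproducing the standard error of estimate for the drug data, using the rounded equation $\hat{Y} = .122X + 3.03$ from the slides. It should give a value of about .53 hours.

```python
import math

ages = [34, 42, 37, 55, 47, 43, 52, 39]             # X
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]  # Y, in hours
b, a = 0.122, 3.03  # rounded coefficients used on the slides

predicted = [b * x + a for x in ages]
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(effect, predicted))
df = len(ages) - 2  # n - 2 degrees of freedom
s_yx = math.sqrt(ss_residual / df)

print(f"SS_residual = {ss_residual:.3f}, s_y.x = {s_yx:.2f} hours")
```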

Page 35: Stats 3000 Week 2 - Winter 2011

Section D

Goodness of fit (Ch. 6)
Test of independence (Ch. 6)
Regression (Ch. 9)
Correlation (Ch. 9)
Inferences about regression and correlation (Ch. 9)

Page 36: Stats 3000 Week 2 - Winter 2011

Scatterplot

• To see whether scores may be related, construct a graph of the scores, called a scatterplot.
  – The variable labeled X is plotted on the horizontal axis (the abscissa).
  – The Y variable is plotted on the vertical axis (the ordinate).
  – The score of a subject on each of the two measures is indicated by one point on the scatterplot.

Page 37: Stats 3000 Week 2 - Winter 2011

Conclusions drawn from scatterplots are subjective. A more precise and objective method for detecting straight-line patterns is the linear correlation coefficient.

The linear correlation coefficient r (often simply called the correlation coefficient) measures the strength of the linear relationship between the paired x and y values in a sample.

Page 38: Stats 3000 Week 2 - Winter 2011

Correlation is a statistical technique used to measure the relationship between two variables (its magnitude and direction).

Descriptive Methods in Correlation

Page 39: Stats 3000 Week 2 - Winter 2011

Pearson correlation (r): r is a descriptive statistic used to measure the degree of straight-line relationship between two variables.

r also determines the precision with which predictions can be made using the regression line ($r^2$ = coefficient of determination).

Page 40: Stats 3000 Week 2 - Winter 2011

The value of r is not affected by the choice of x or y: interchange all x and y values and the value of r will not change.

r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
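Both properties can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up data: swapping the roles of X and Y leaves r unchanged, and a perfect but curved relationship (Y = X² on X values symmetric about 0) yields r = 0 even though Y is completely determined by X.

```python
import math


def pearson_r(x, y):
    """Pearson r = sum of (X - Xbar)(Y - Ybar) / sqrt(SS_x * SS_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(ssx * ssy)


# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y), pearson_r(y, x))  # identical: r ignores which variable is X

# A perfect but curved relationship: Y = X^2 on X values symmetric about 0.
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [v ** 2 for v in xs]))  # r = 0 despite the exact relationship
```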

Page 41: Stats 3000 Week 2 - Winter 2011

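$r = \frac{\text{Cov}_{xy}}{s_x s_y} = \frac{\text{Covariance}}{(\text{S.D. of } X)(\text{S.D. of } Y)}$

$s_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$ = S.D. of X

$s_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}}$ = S.D. of Y

$\text{Cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$ = Covariance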

Page 42: Stats 3000 Week 2 - Winter 2011

Sign: r is a measure of the extent to which paired scores occupy the same (+) or opposite (-) positions within their own distributions, where each position is expressed as a z-score:

$z_x = \frac{X - \bar{X}}{s_x}$, $z_y = \frac{Y - \bar{Y}}{s_y}$

Page 43: Stats 3000 Week 2 - Winter 2011

$r = \frac{\text{Cov}_{xy}}{\sqrt{s_x^2 s_y^2}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
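The two forms of r are algebraically identical because the n - 1 factors cancel. This sketch uses Python's standard `statistics` module and, as an assumption for illustration, the drug-effectiveness data from the regression example; it computes r both ways, and the results should agree up to floating-point rounding.

```python
import math
import statistics

ages = [34, 42, 37, 55, 47, 43, 52, 39]             # X
effect = [6.3, 8.1, 7.9, 9.8, 8.6, 8.4, 9.1, 8.6]  # Y
n = len(ages)
x_bar, y_bar = statistics.mean(ages), statistics.mean(effect)

# Form 1: r = Cov_xy / (s_x * s_y), every piece using n - 1 (statistics.stdev is the sample S.D.).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, effect)) / (n - 1)
r_cov = cov_xy / (statistics.stdev(ages) * statistics.stdev(effect))

# Form 2: deviation sums only; the (n - 1) terms have cancelled.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, effect))
ssx = sum((x - x_bar) ** 2 for x in ages)
ssy = sum((y - y_bar) ** 2 for y in effect)
r_dev = sxy / math.sqrt(ssx * ssy)

print(round(r_cov, 4), round(r_dev, 4))
```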

Page 44: Stats 3000 Week 2 - Winter 2011

The sign of r is determined by the covariance (the numerator of r).
+ : the two variables move in the same direction (pairs such as $+z_x, +z_y$ or $-z_x, -z_y$).
- : the two variables move in opposite directions (pairs such as $+z_x, -z_y$ or $-z_x, +z_y$).

Page 46: Stats 3000 Week 2 - Winter 2011

RAW SCORES

Subject   X    Y
1         60   16
2         45   13
3         40   12
4         20   8
5         10   6

$\bar{X} = 35$, $s_x = 20$; $\bar{Y} = 11$, $s_y = 4$

$z_x = \frac{X - \bar{X}}{s_x}$, $z_y = \frac{Y - \bar{Y}}{s_y}$

Page 47: Stats 3000 Week 2 - Winter 2011

Z SCORES

Subject   $z_x$   $z_y$
1         1.25    1.25
2         0.50    0.50
3         0.25    0.25
4         -0.75   -0.75
5         -1.25   -1.25

r = 1
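The z-score table can be reproduced in Python. The last step uses the standard identity $r = \sum z_x z_y / (n - 1)$ (with $s_x$ and $s_y$ computed using n - 1), which is not stated on the slides but is consistent with the formulas for r above; for these scores it gives r = 1.

```python
import statistics

x = [60, 45, 40, 20, 10]  # raw X scores (subjects 1-5)
y = [16, 13, 12, 8, 6]    # raw Y scores


def z_scores(values):
    """Convert raw scores to z = (score - mean) / s, using the n - 1 standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


zx, zy = z_scores(x), z_scores(y)
print(zx)
print(zy)

# Pearson r as the sum of z-score cross-products divided by n - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)
print(r)
```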

Page 48: Stats 3000 Week 2 - Winter 2011

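$b = \frac{\text{Cov}_{xy}}{s_x^2}$, $r = \frac{\text{Cov}_{xy}}{s_x s_y}$, where $\text{Cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$

Because $s_x^2$ and $s_x s_y$ are always positive, b and r will have the same sign (the sign of $\text{Cov}_{xy}$).

1. The magnitude of r ranges between 0 and 1.
2. The sign of r is either positive or negative.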
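A quick numerical check that b and r share a sign. The helper `slope_and_r` and the two five-point data sets are made up for illustration; in each case b and r are both positive or both negative. Since b = Cov/s_x² and r = Cov/(s_x s_y), it also follows that b = r·s_y/s_x.

```python
import statistics


def slope_and_r(x, y):
    """Return (b, r) using b = Cov/s_x^2 and r = Cov/(s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return cov / sx ** 2, cov / (sx * sy)


# Made-up data: one increasing pattern and one decreasing pattern.
for y in ([2, 3, 5, 4, 6], [6, 5, 3, 4, 2]):
    b, r = slope_and_r([1, 2, 3, 4, 5], y)
    print(f"b = {b:+.3f}, r = {r:+.3f}")  # same sign each time
```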

Page 49: Stats 3000 Week 2 - Winter 2011

Thus, r can take any value between -1 and 1.

3. Generally, regardless of sign, the degree of linear relationship is:
   $0 < |r| \le .3$: weak
   $.3 < |r| \le .55$: slight
   $.55 < |r| \le .8$: moderate
   $.8 < |r| \le 1$: strong
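The rule of thumb can be written as a small function. This is a sketch: the function name is hypothetical, each upper cut-off is treated as inclusive, and the label "none" for r = 0 is an assumption, since the slide's scale starts just above 0.

```python
def degree_of_linear_relationship(r):
    """Label |r| with the slide's cut-offs (sign ignored; upper cut-offs assumed inclusive)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "none"  # assumption: the slide's scale does not cover r = 0
    if size <= 0.3:
        return "weak"
    if size <= 0.55:
        return "slight"
    if size <= 0.8:
        return "moderate"
    return "strong"


for r in (0.25, -0.5189, 0.7, -0.95):
    print(r, degree_of_linear_relationship(r))
```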

Page 50: Stats 3000 Week 2 - Winter 2011

Direction of relationship

• A correlation coefficient indicates the direction of the relationship by the positive or negative sign of the coefficient.

• A positive r indicates
  – a positive (direct) relationship between variables X and Y;
  – as the scores on variable X increase, the scores on variable Y tend to increase.

• A negative r indicates
  – a negative (inverse) relationship between variables X and Y;
  – as the scores on variable X increase, the scores on variable Y tend to decrease.

Page 51: Stats 3000 Week 2 - Winter 2011

[Figure: example scatterplots of Y against X for r = 1, r = 0.9 and r = 0.5 (top row) and r = -1, r = -0.9 and r = -0.5 (bottom row); at r = 1 and r = -1 there is no error in predictions.]

Page 52: Stats 3000 Week 2 - Winter 2011

[Figure: scatterplots of Y against X showing the regression line of Y on X.]

Page 53: Stats 3000 Week 2 - Winter 2011

Example: Most of us have heard that tall people generally have larger feet than short people. Is that really true and, if so, what is the relationship between height and foot length? To examine this, Professor Dennis Young obtained data on shoe size and height for a sample of students at Carleton University.

Page 54: Stats 3000 Week 2 - Winter 2011

SIZE (X)   X - X̄   (X - X̄)²   HEIGHT (Y)   Y - Ȳ   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
10.5       -0.46    0.22        70           -1.46    2.144133    0.679847
13          2.04    4.16        72            0.54    0.28699     1.092857
10.5       -0.46    0.21        74.5          3.04    9.215561   -1.39643
12          1.04    1.08        71           -0.46    0.215561   -0.48286
10.5       -0.46    0.21        71           -0.46    0.215561    0.213571
13          2.04    4.16        77            5.54    30.64413    11.29286
11.5        0.54    0.29        72            0.54    0.28699     0.289286
10         -0.96    0.92        72            0.54    0.28699    -0.51429
8.5        -2.46    6.05        67           -4.46    19.92985    10.98214
10.5       -0.46    0.21        73            1.54    2.358418   -0.70643
10.5       -0.46    0.21        72            0.54    0.28699    -0.24643
11          0.04    0.00        70           -1.46    2.144133   -0.05857
9          -1.96    3.84        69           -2.46    6.072704    4.83
13          2.04    4.16        70           -1.46    2.144133   -2.98714

N = 14
X̄ = 10.964286, Σ(X - X̄)² = 25.736
Ȳ = 71.4643, Σ(Y - Ȳ)² = 76.23214
Σ(X - X̄)(Y - Ȳ) = 22.98842

Page 55: Stats 3000 Week 2 - Winter 2011

$r = \frac{\text{Cov}_{xy}}{s_x s_y}$

$s_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{25.736}{13}} = 1.407$

Page 56: Stats 3000 Week 2 - Winter 2011

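$s_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} = \sqrt{\frac{76.23214}{13}} = 2.422$

$\text{Cov}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1} = \frac{22.98842}{13} = 1.768$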

Page 57: Stats 3000 Week 2 - Winter 2011

$r = \frac{\text{Cov}_{xy}}{s_x s_y} = \frac{1.768}{(1.407)(2.422)} = 0.5189$
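As a check on the hand calculation, this sketch recomputes $s_x$, $s_y$, the covariance and r from the raw shoe sizes and heights in the table. The column sums on the slide were accumulated from rounded deviations, so the exact sums differ slightly (for example, about 25.732 rather than 25.736 for $\sum (X - \bar{X})^2$), but r should still come out at about 0.519.

```python
import math

size = [10.5, 13, 10.5, 12, 10.5, 13, 11.5, 10, 8.5, 10.5, 10.5, 11, 9, 13]  # X
height = [70, 72, 74.5, 71, 71, 77, 72, 72, 67, 73, 72, 70, 69, 70]        # Y

n = len(size)
x_bar, y_bar = sum(size) / n, sum(height) / n

# Sample standard deviations and covariance, all dividing by n - 1.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in size) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in height) / (n - 1))
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, height)) / (n - 1)

r = cov_xy / (s_x * s_y)
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}, Cov = {cov_xy:.3f}, r = {r:.4f}")
```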

Page 63: Stats 3000 Week 2 - Winter 2011

Exercise 14: see WebCT, exercise folder.

Textbook exercises: 9.2, 9.3, 9.10, 9.11, 9.13, 9.15. When appropriate, verify your answers with SPSS. Get the data for SPSS from WebCT (SPSS folder, SPSS exercises subfolder).

Readings to prepare for week 3, January 17-22

Chapter 9

Sections: 9.7, 9.8, 9.10, 9.11

Page 64: Stats 3000 Week 2 - Winter 2011

SPSS assignment # 2 due next week