1 exploring relationships between quantitative variables scatterplots, association, and correlation...

1 EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES SCATTERPLOTS, ASSOCIATION, AND CORRELATION ADDITIONAL REFERENCE READING MATERIAL COURSEPACK PAGES 29 – 58

Upload: samuel-boone

Post on 03-Jan-2016


Page 1: 1 EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES SCATTERPLOTS, ASSOCIATION, AND CORRELATION ADDITIONAL REFERENCE READING MATERIAL COURSEPACK PAGES


EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

• SCATTERPLOTS, ASSOCIATION, AND CORRELATION

ADDITIONAL REFERENCE READING MATERIAL

• COURSEPACK PAGES 29 – 58


LINEAR RELATIONSHIPS BETWEEN TWO VARIABLES X AND Y

WHY STUDY LINEAR RELATIONSHIPS?

• LINEAR RELATIONSHIPS ARE THE EASIEST TO UNDERSTAND AND ANALYZE;
• MANY RELATIONSHIPS ARE APPROXIMATELY LINEAR;
• VARIABLES WITH A NONLINEAR RELATIONSHIP CAN OFTEN BE TRANSFORMED SO THAT THE RELATIONSHIP OF THE TRANSFORMED VARIABLES IS LINEAR. FOR EXAMPLE, CONSIDER THE EQUATION RELATING BRAIN WEIGHT W AND BODY WEIGHT Z:

$W = cZ^m$
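As a sketch of the transformation idea, assuming the brain-weight equation has the power-law form $W = cZ^m$ with constants $c$ and $m$, taking logarithms of both sides gives a straight-line relationship between the transformed variables:

```latex
\begin{align*}
W &= c\,Z^{m} \\
\log W &= \log c + m \log Z
\end{align*}
```

Under that assumption, a plot of $\log W$ against $\log Z$ would be linear with slope $m$ and intercept $\log c$.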


EXAMPLES

• RELATIONSHIP BETWEEN SMOKING AND LUNG CANCER;

• RELATIONSHIP BETWEEN ALTITUDE AND THE BOILING POINT OF WATER;

• RELATIONSHIP BETWEEN TEMPERATURE AND OZONE CONCENTRATION IN THE AIR;

IN THESE EXAMPLES, TWO VARIABLES ARE INVOLVED, NAMELY THE RESPONSE VARIABLE, Y, AND THE EXPLANATORY VARIABLE, X.


Three Tools We Will Use

• Scatterplot, a two-dimensional graph of data values

• Correlation, a statistic that measures the strength and direction of a linear relationship between two quantitative variables.

• Regression equation, an equation that describes the average relationship between a quantitative response and explanatory variable.


LEAST SQUARES LINE (REGRESSION LINE)

• GIVEN A SET OF n OBSERVATIONS, $(X_1, Y_1), \ldots, (X_n, Y_n)$,
• QUESTION: WHAT LINE “BEST” FITS THE OBSERVATIONS?
• WE SHALL ANSWER THIS QUESTION GRAPHICALLY USING A SCATTERPLOT, AND ANALYTICALLY USING THE LEAST SQUARES REGRESSION FORMULA.
• SCATTERPLOTS: A SCATTERPLOT IS A PLOT OF THE POINTS $(X_1, Y_1), \ldots, (X_n, Y_n)$.


WHAT LINE “BEST FITS” THE SET OF OBSERVATIONS?

•GRAPHICAL SOLUTION


EXAMPLE: GIVEN THE SET OF OBSERVATIONS, (1,2), (2,5), (3,4), (4,1), (5,8), (6,3), (7,2), PLOT A SCATTERGRAM.

• SCATTERGRAM

[Figure: scattergram of the seven observations, with X on the horizontal axis and Y on the vertical axis; both axes are marked from 2 to 8.]

Example: Height and Handspan


Data shown are the first 12 observations of a data set that includes the heights (in inches) and fully stretched handspans (in centimeters) of 167 college students


Example: Height and Handspan

Taller people tend to have greater handspan measurements than shorter people do.

When two variables tend to increase together, we say that they have a positive association.

The handspan and height measurements may have a linear relationship.

Example: Driver Age and Maximum Legibility Distance of Highway Signs

• A research firm determined the maximum distance at which each of 30 drivers could read a newly designed sign.

• The 30 participants in the study ranged in age from 18 to 82 years old.

• We want to examine the relationship between age and the sign legibility distance.


• We see a negative association with a linear pattern.

• We will use a straight-line equation to model this relationship.


LOOKING AT SCATTERPLOTS

• SCATTERPLOTS ARE THE BEST WAY TO START OBSERVING THE RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES.

• BY JUST LOOKING AT THEM, YOU CAN SEE PATTERNS, TRENDS, RELATIONSHIPS, AND EVEN THE OCCASIONAL EXTRAORDINARY VALUE SITTING APART FROM THE OTHERS.

• THERE ARE FOUR THINGS WE LOOK FOR IN A SCATTERPLOT.
– DIRECTION
– FORM
– STRENGTH
– UNUSUAL FEATURES

Looking for Patterns with Scatterplots

Questions to Ask about a Scatterplot

• What is the average pattern? Does it look like a straight line, or is it curved?

• What is the direction of the pattern?

• How much do individual points vary from the average pattern?

• Are there any unusual data points?


What we Look for in a scatterplot

We examine a scatterplot to study association. How do values on the response variable change as values of the explanatory variable change?

You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables.

Trend: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the trend

Also look for outliers from the overall trend.


DIRECTION

• POSITIVE: THE PATTERN RUNS FROM THE BOTTOM LEFT TO THE UPPER RIGHT.
• NEGATIVE: THE PATTERN RUNS FROM THE UPPER LEFT TO THE LOWER RIGHT.
• NEITHER

[Figure: three example scatterplots, one for each direction.]

FORM

STRAIGHT

CURVED

EXOTIC

NO PATTERNS


FORM: POSITIVE STRAIGHT DIRECTION

POSITIVELY STRAIGHT RELATIONSHIP

[Figure: scatterplot in which the points rise along a straight line from lower left to upper right.]

Example: 100 Cars on the Lot of a Used-Car Dealership

Question: Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer?

– Positive association
– Negative association
– No association

FORM: NEGATIVE STRAIGHT DIRECTION

• NEGATIVELY STRAIGHT RELATIONSHIP

[Figure: scatterplot in which the points fall along a straight line from upper left to lower right.]

Positive and Negative Associations

•Two quantitative variables x and y are
– Positively associated when
• high values of x tend to occur with high values of y.
• low values of x tend to occur with low values of y.
– Negatively associated when high values of one variable tend to pair with low values of the other variable.

Positive, Negative Associations, Linear Relationships

• Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase.

• Two variables have a negative association when the values of one variable tend to decrease as the values of the other variable increase.

• Two variables have a linear relationship when the pattern of their relationship resembles a straight line.


FORM: CURVED RELATIONSHIP

• CURVED ASSOCIATION BETWEEN X AND Y

[Figure: scatterplot in which the points follow a curved pattern.]

FORM: EXOTIC – SHARP POINTS

• OUTSTANDING FEATURE – SHARP POINTS

[Figure: scatterplot whose pattern comes to sharp points.]

FORM: NO CLEAR PATTERNS


STRENGTH

STRONG, MODERATE, WEAK

[Figure: three example scatterplots showing strong, moderate, and weak association.]

UNUSUAL FEATURES

OUTLIERS, SUBGROUPS

[Figure: two example scatterplots, one with outliers and one with subgroups.]

Unusual Features


EXAMPLES


REGRESSION WISDOM: CORRELATION COEFFICIENT

• CORRELATION COEFFICIENT r

THE CORRELATION COEFFICIENT IS A NUMERICAL MEASURE OF THE DIRECTION AND STRENGTH OF A LINEAR ASSOCIATION.

• FOR A SET OF PAIRED DATA $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, THE LINEAR CORRELATION COEFFICIENT r IS GIVEN BY

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,S_x S_y}$$

• WHERE $S_x$ AND $S_y$ ARE RESPECTIVELY THE STANDARD DEVIATIONS OF X AND Y.

• EXAMPLE: FIND THE LINEAR CORRELATION COEFFICIENT FOR THE FOLLOWING FIVE PAIRS OF NUMBERS: (6,5), (10,3), (14,7), (19,8), (21,12).
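The example can be checked numerically. The following is a minimal sketch, not the course's own solution: it applies the formula for r directly to the five listed pairs, using sample standard deviations with divisor n − 1. The function name `correlation` is an illustrative choice.

```python
import math

def correlation(xs, ys):
    """Sample correlation: sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / ((n - 1) * s_x * s_y)

xs = [6, 10, 14, 19, 21]
ys = [5, 3, 7, 8, 12]
r = correlation(xs, ys)
print(round(r, 3))  # 0.855
```

By hand, the means are 14 and 7, the sum of cross-products is 72, and the sums of squares are 154 and 46, so r = 72 / sqrt(154 × 46) ≈ 0.855.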


PROPERTIES OF CORRELATION COEFFICIENT, r

• $-1 \le r \le 1$, WITH POSITIVE r MEANING POSITIVE RELATIONSHIP AND NEGATIVE r MEANING NEGATIVE RELATIONSHIP BETWEEN THE TWO VARIABLES.

• r = 1 IF AND ONLY IF POINTS LIE ON A LINE WITH POSITIVE SLOPE.

• r = -1 IF AND ONLY IF POINTS LIE ON A LINE WITH NEGATIVE SLOPE.

• THE VALUE OF r DOES NOT CHANGE IF THE UNITS OF MEASUREMENT ARE CHANGED.

• r MEASURES THE STRENGTH AND DIRECTION OF THE LINEAR RELATIONSHIP BETWEEN Y AND X.

•THE VALUE OF r DOES NOT DEPEND ON WHICH OF THE TWO VARIABLES IS LABELED X.

• THE VALUE OF r IS A MEASURE OF THE EXTENT TO WHICH X AND Y ARE LINEARLY RELATED – THAT IS, THE EXTENT TO WHICH THE POINTS IN THE SCATTER PLOT FALL CLOSE TO A STRAIGHT LINE. A VALUE OF r CLOSE TO ZERO DOES NOT RULE OUT ANY STRONG RELATIONSHIP BETWEEN X AND Y; THERE COULD STILL BE A STRONG RELATIONSHIP BUT ONE THAT IS NOT LINEAR.
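Three of these properties can be illustrated with a short sketch. The height and handspan numbers below are hypothetical values made up for illustration (they are not the class data set), and `correlation` is an illustrative helper name.

```python
import math

def correlation(xs, ys):
    """Sample correlation, written as s_xy / sqrt(s_xx * s_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical measurements, for illustration only.
heights_in = [60, 62, 65, 68, 71]          # inches
spans_cm = [18.0, 19.5, 20.0, 21.5, 23.0]  # centimeters

r_original = correlation(heights_in, spans_cm)

# Changing units (inches to centimeters) leaves r unchanged.
heights_cm = [2.54 * h for h in heights_in]
r_rescaled = correlation(heights_cm, spans_cm)

# Swapping which variable is labeled X leaves r unchanged.
r_swapped = correlation(spans_cm, heights_in)

# A perfect but nonlinear relationship (y = x^2) can still give r = 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

print(r_original, r_rescaled, r_swapped, r_curve)
```

The last case is why a value of r near zero does not rule out a strong relationship: y is completely determined by x, yet the linear association is zero.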

STRENGTH OF THE LINEAR ASSOCIATION BY VALUE OF r:

• -1 TO -0.8: STRONG
• -0.8 TO -0.5: MODERATE
• -0.5 TO 0.5: WEAK
• 0.5 TO 0.8: MODERATE
• 0.8 TO +1: STRONG

Scatterplots and Correlation Coefficient

• Let’s get a feel for the correlation r by looking at its values for the scatterplots shown below


CORRELATION CONDITIONS

• QUANTITATIVE VARIABLES CONDITION

• CORRELATION APPLIES ONLY TO QUANTITATIVE VARIABLES.

• BE SURE NOT TO APPLY CORRELATION TO CATEGORICAL DATA MASQUERADING AS QUANTITATIVE.

• CHECK THE VARIABLES’ UNITS AND WHAT THEY MEASURE.


STRAIGHT ENOUGH CONDITION

• MAKE SURE THE FORM OF THE SCATTERPLOT IS STRAIGHT ENOUGH THAT A LINEAR RELATIONSHIP MAKES SENSE.

• CORRELATION MEASURES THE STRENGTH ONLY OF THE LINEAR ASSOCIATION, AND WILL BE MISLEADING IF THE RELATIONSHIP IS NOT LINEAR.

• IF A RELATIONSHIP IS CURVED, THEN SUMMARIZING ITS STRENGTH WITH A CORRELATION WOULD BE MISLEADING.


OUTLIER CONDITION

• OUTLIERS CAN DISTORT THE CORRELATION DRAMATICALLY.

• AN OUTLIER CAN MAKE AN OTHERWISE WEAK CORRELATION LOOK BIG OR HIDE A STRONG CORRELATION.

• AN OUTLIER CAN EVEN GIVE AN OTHERWISE POSITIVE ASSOCIATION A NEGATIVE CORRELATION (AND VICE VERSA)

• WHEN YOU SEE AN OUTLIER, IT’S OFTEN A GOOD IDEA TO REPORT THE CORRELATION WITH AND WITHOUT THAT POINT.
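A small sketch of reporting the correlation with and without a suspect point. The six base points and the outlier (20, 20) are hypothetical numbers chosen only to show the effect; `correlation` is an illustrative helper name.

```python
import math

def correlation(xs, ys):
    """Sample correlation, written as s_xy / sqrt(s_xx * s_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data with only a weak linear pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 5, 3]
r_without = correlation(xs, ys)

# Add one hypothetical point far from the rest of the data.
r_with = correlation(xs + [20], ys + [20])

print(round(r_without, 2), round(r_with, 2))  # 0.38 0.97
```

In this made-up case a single distant point turns a weak correlation into one that looks strong, which is the situation the outlier condition warns about.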


WHICH LINE “BEST FITS” THE SET OF OBSERVATIONS?

•THE ANALYTICAL APPROACH


Regression Line

•The first step of a regression analysis is to identify the response and explanatory variables.

– We use y to denote the response variable.

– We use x to denote the explanatory variable.


FITTING THE MODEL: THE LEAST SQUARES METHOD

• CONSIDER THE EXAMPLE: SUPPOSE AN APPLIANCE STORE CONDUCTS A FIVE-MONTH EXPERIMENT TO DETERMINE THE EFFECT OF ADVERTISING ON SALES REVENUE. THE RESULTS ARE SHOWN IN THE TABLE.

MONTH   ADVERTISING EXPENDITURE, X ($100s)   SALES REVENUE, Y ($1,000s)
1       1                                    1
2       2                                    1
3       3                                    2
4       4                                    2
5       5                                    4

FIRST STEP IS TO MAKE A SCATTERGRAM

[Figure: scattergram of sales revenue Y ($1000s) against advertising expenditure X ($100s) for the five months.]

WHAT IS THE BEST FIT?

• SCATTERGRAM WITH POSSIBLE FITS

[Figure: the scattergram of Y against X with possible fitted lines drawn through the points.]

GENERAL EQUATION OF A REGRESSION LINE

y = a + bx + error, WHERE a + bx IS THE LINEAR PART.

• THE LINEAR PART OF THE EQUATION REQUIRES DETERMINATION OF TWO COEFFICIENTS – a (THE Y-INTERCEPT) AND b (THE SLOPE) IN ORDER TO PREDICT VALUES OF Y.

• ONCE a AND b ARE OBTAINED, THE STRAIGHT LINE IS KNOWN AND CAN BE PLOTTED ON THE SCATTER DIAGRAM. THEN WE COULD MAKE A VISUAL COMPARISON OF HOW WELL OUR PARTICULAR STATISTICAL MODEL (A STRAIGHT LINE) FITS THE ORIGINAL DATA.

Regression Equation

• The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable.

• Let $\hat{y}$ denote the predicted value of y. The equation for the regression line has the form $\hat{y} = a + bx$.

• In this formula, a denotes the y-intercept and b denotes the slope.

Example: Height Based on Human Remains

Regression Equation: $\hat{y} = 61.4 + 2.4x$

$\hat{y}$ is the predicted height and $x$ is the length of a femur (thighbone), measured in centimeters.

Use the regression equation to predict the height of a person whose femur length was 50 centimeters:

$\hat{y} = 61.4 + 2.4(50) = 181.4$

COMPUTING a AND b

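LETTING $\bar{x}$ AND $\bar{y}$ REPRESENT THE MEANS OF $x_1, x_2, \ldots, x_n$ AND $y_1, y_2, \ldots, y_n$ RESPECTIVELY, THE RESULTING FORMULAS FOR THE INTERCEPT AND SLOPE ARE GIVEN BY

$$\hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$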
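As a sketch of these formulas in use, the code below fits the least squares line to the appliance-store advertising data from the earlier table. The function name `least_squares` and the variable names are illustrative choices, not part of the course material.

```python
def least_squares(xs, ys):
    """Return (a, b): intercept and slope of the least squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

ad_spend = [1, 2, 3, 4, 5]   # advertising expenditure X ($100s)
revenue = [1, 1, 2, 2, 4]    # sales revenue Y ($1,000s)

a, b = least_squares(ad_spend, revenue)
print(round(a, 2), round(b, 2))  # -0.1 0.7
```

By hand: the means are 3 and 2, the sum of cross-products is 7, and the sum of squared x-deviations is 10, so the slope is 0.7 and the intercept is 2 − 0.7(3) = −0.1.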


ALTERNATIVE FORMULA WHEN THE STANDARD DEVIATIONS OF X AND Y VARIABLES ARE KNOWN

$$\hat{b} = r\,\frac{s_y}{s_x}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$
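A short derivation sketch of why the alternative slope formula agrees with the least squares slope, assuming $s_x$ and $s_y$ are the sample standard deviations (divisor $n-1$). The shorthand $S_{xy}$ and $S_{xx}$ is introduced here only for this sketch and does not appear in the slides.

```latex
\begin{align*}
S_{xy} &= \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2 = (n-1)\,s_x^2 \\
r\,\frac{s_y}{s_x}
  &= \frac{S_{xy}}{(n-1)\,s_x s_y}\cdot\frac{s_y}{s_x}
   = \frac{S_{xy}}{(n-1)\,s_x^2}
   = \frac{S_{xy}}{S_{xx}}
   = \hat{b}
\end{align*}
```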

NOTE: THE ‘HAT’ IS PLACED OVER THE LETTERS a AND b TO REMIND US THAT THESE ARE THE VALUES THAT MINIMIZE THE SUM OF SQUARED DEVIATIONS.

• CLASS WORK

• Example From Midterm 1 Review Sheet


Example: Baseball Scoring Vs Batting average

• Given the following statistics from data on baseball scoring versus batting average, find the regression line.

$a = \bar{y} - b\bar{x} = 4.45 - 26.07(0.2597) = -2.32$

• The regression line to predict team scoring from batting average is $\hat{y} = -2.32 + 26.1x$.

Slope: Positive, Negative, Zero


Interpreting the Slope

•Slope: measures the change in the predicted variable (y) for a 1 unit increase in the explanatory variable (x).

•Example: A 1 cm increase in femur length results in a 2.4 cm increase in predicted height.


The Slope and Correlation

•Slope:

– Numerical value depends on the units used to measure the variables.

– Does not tell us whether the association is strong or weak.

– The two variables must be identified as response and explanatory variables.

– The regression equation can be used to predict values of the response variable for given values of the explanatory variable.


Interpreting the y - Intercept

•y-Intercept:
– The predicted value for y when x = 0;
– This fact helps in plotting the line;
– May not have any interpretative value if no observations had x values near 0.

It does not make sense for femur length to be 0 cm, so the y-intercept for the equation $\hat{y} = 61.4 + 2.4x$ is not a relevant predicted height.

REGRESSION WISDOM: PREDICTION, RESIDUALS, CORRELATION

• PREDICTION
• OBTAINING THE REGRESSION FORMULA IS NOT THE END OF THE ANALYSIS. MOSTLY, WE ARE INTERESTED IN PREDICTING FUTURE OUTCOMES WITH THE REGRESSION FUNCTION.

• TYPES OF PREDICTION
• EXTRAPOLATION: EXTRAPOLATION IS THE USE OF A REGRESSION LINE FOR PREDICTION OUTSIDE THE RANGE OF VALUES OF THE EXPLANATORY VARIABLE X THAT IS USED TO OBTAIN THE LINE. SUCH PREDICTION CANNOT BE TRUSTED.

Extrapolation is Dangerous

•Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data.

– Riskier the farther we move from the range of the given x-values.

– There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values.
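One way to make this caution concrete is to have a prediction helper refuse x-values outside the observed range. This is only a sketch of that idea, using the appliance-store advertising data from earlier in the deck; the names `fit_line` and `predict_within_range` are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def predict_within_range(x, a, b, x_min, x_max):
    """Predict y_hat = a + b*x, but only for x inside the observed range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "this would be extrapolation"
        )
    return a + b * x

xs = [1, 2, 3, 4, 5]   # advertising expenditure X ($100s)
ys = [1, 1, 2, 2, 4]   # sales revenue Y ($1,000s)
a, b = fit_line(xs, ys)

# Interpolation: x = 3 lies inside the observed range 1 to 5.
print(round(predict_within_range(3, a, b, min(xs), max(xs)), 2))  # 2.0
```

Calling the helper with x = 10 would raise `ValueError` instead of silently returning a prediction far outside the data.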


INTERPOLATION:

• INTERPOLATION IS THE USE OF A REGRESSION LINE FOR PREDICTION WITHIN THE RANGE OF VALUES OF THE EXPLANATORY VARIABLE X THAT IS USED TO OBTAIN THE LINE. INTERPOLATION IS GENERALLY SAFE.

REMARKS

• EXTRAPOLATION SHOULD BE HANDLED WITH CAUTION. LIMIT PREDICTIONS TO X VALUES WHICH ARE WITHIN THE RANGE OF THE DATA USED TO COMPUTE THE LEAST SQUARES LINE.

INTERPOLATING AND EXTRAPOLATING

• ILLUSTRATION


•DO NOT MAKE PREDICTIONS OUTSIDE THE CONTEXT OF THE STUDY IN WHICH THE DATA WERE COLLECTED. FOR EXAMPLE, IT IS INAPPROPRIATE TO USE THE LEAST SQUARES LINE FITTED TO DATA ON AMERICAN BABIES TO PREDICT WEIGHTS FOR BABIES BORN IN CHINA.

DIAGNOSTICS: AFTER OBTAINING THE REGRESSION LINE, WE WOULD LIKE TO KNOW HOW WELL THE REGRESSION LINE FITS THE DATA. ALSO, WE WOULD LIKE TO KNOW WHETHER ANY INDIVIDUAL POINT STRONGLY AFFECTS THE REGRESSION LINE. A DIAGNOSTIC ANALYSIS SUCH AS THE ANALYSIS OF RESIDUALS IS VERY USEFUL.

REGRESSION WISDOM - RESIDUALS

• THE DISCREPANCY BETWEEN DATA AND MODEL IS CALLED THE RESIDUAL. HENCE, A RESIDUAL IS THE DIFFERENCE BETWEEN AN OBSERVED VALUE OF Y AND THE VALUE PREDICTED BY THE REGRESSION LINE. THAT IS,

RESIDUAL = OBSERVED Y – PREDICTED Y = $y_i - \hat{y}_i$, WHERE $\hat{y}_i = \hat{a} + \hat{b}x_i$

NOTATION: RESIDUAL IS DENOTED BY THE LETTER e

• THE RESIDUAL FOR INDIVIDUAL i IS DENOTED BY $e_i = y_i - (\hat{a} + \hat{b}x_i) = y_i - \hat{y}_i$.

EXAMPLE: A LINEAR MODEL RELATING HURRICANES’ WIND SPEEDS TO THEIR CENTRAL PRESSURES IS

MaxWindSpeed = 955.27 – 0.897 × CentralPressure

HURRICANE KATRINA HAD A CENTRAL PRESSURE OF 920 MILLIBARS. WHAT DOES OUR REGRESSION MODEL PREDICT FOR HER MAXIMUM WIND SPEED? HOW GOOD IS THAT PREDICTION, GIVEN THAT KATRINA’S ACTUAL WIND SPEED WAS MEASURED AT 110 KNOTS?
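A worked sketch of this example, using only the numbers stated in the question (intercept 955.27, slope −0.897, pressure 920 millibars, observed speed 110 knots). The variable names are illustrative.

```python
# Model from the slide: MaxWindSpeed = 955.27 - 0.897 * CentralPressure
intercept = 955.27
slope = -0.897

central_pressure = 920   # millibars (Hurricane Katrina)
observed_speed = 110     # knots (Katrina's measured wind speed)

predicted_speed = intercept + slope * central_pressure
residual = observed_speed - predicted_speed   # observed minus predicted

print(round(predicted_speed, 2), round(residual, 2))  # 130.03 -20.03
```

The model predicts about 130 knots, so the residual is about −20 knots: the residual is negative, meaning the model overestimates Katrina's wind speed.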

Analysis of Residuals

• ANALYSIS OF RESIDUALS HELPS US TO ASSESS THE ADEQUACY OF A MODEL AND HELPS TO IDENTIFY OUTLIERS OR OTHER INTERESTING DATA POINTS.

• WHEN A REGRESSION MODEL IS APPROPRIATE, IT SHOULD MODEL THE UNDERLYING RELATIONSHIP. NOTHING INTERESTING SHOULD BE LEFT BEHIND. SO AFTER WE FIT A REGRESSION MODEL, WE USUALLY PLOT THE RESIDUALS IN THE HOPE OF FINDING … NOTHING.


Analysis of Residuals – Residual Plots

• A RESIDUAL PLOT IS A SCATTERPLOT OF THE RESIDUALS (ON THE VERTICAL AXIS) AGAINST THE EXPLANATORY VARIABLE, X (ON THE HORIZONTAL AXIS).

• THE PLOT SHOULD NOT HAVE ANY INTERESTING FEATURES, LIKE A DIRECTION OR SHAPE. IT SHOULD STRETCH HORIZONTALLY, WITH ABOUT THE SAME AMOUNT OF SCATTER THROUGHOUT. IT SHOULD SHOW NO BENDS, AND IT SHOULD HAVE NO OUTLIERS.


NOTE: SUM OF RESIDUALS = 0

• A RESIDUAL PLOT

[Figure: a residual plot, with the residuals e on the vertical axis and x on the horizontal axis.]
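The note that the residuals sum to zero can be checked on the appliance-store advertising data from earlier in the deck. This is a minimal sketch; `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]   # advertising expenditure X ($100s)
ys = [1, 1, 2, 2, 4]   # sales revenue Y ($1,000s)
a, b = fit_line(xs, ys)

# Residual e_i = observed y_i minus predicted y_hat_i.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
total = sum(residuals)

print([round(e, 2) for e in residuals])
```

For this data the residuals are approximately 0.4, −0.3, 0, −0.7, and 0.6, which add up to zero apart from floating-point rounding.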


CLASS EXAMPLES


INTERPRETATION OF POSITIVE AND NEGATIVE RESIDUALS

• POSITIVE RESIDUAL: THE MODEL OR PREDICTED VALUES UNDERESTIMATE THE ACTUAL DATA VALUE.

• NEGATIVE RESIDUAL: THE MODEL OR PREDICTED VALUES OVERESTIMATE THE ACTUAL DATA VALUE

REMARK

• MOST COMPUTER STATISTICS PACKAGES PLOT THE RESIDUALS AGAINST THE PREDICTED VALUES, RATHER THAN AGAINST THE X-VALUES. WHEN THE SLOPE IS NEGATIVE, THE TWO VERSIONS ARE MIRROR IMAGES. WHEN THE SLOPE IS POSITIVE, THEY ARE VIRTUALLY IDENTICAL EXCEPT FOR THE AXIS LABELS. SINCE ALL WE CARE ABOUT IS THE PATTERNS (OR, BETTER, LACK OF PATTERNS) IN THE RESIDUAL PLOT, IT REALLY DOES NOT MATTER WHICH WAY WE PLOT THE RESIDUALS.

CLASS EXAMPLES


COEFFICIENT OF DETERMINATION

• THE COEFFICIENT OF DETERMINATION MEASURES THE PROPORTION OF VARIATION THAT IS EXPLAINED BY THE INDEPENDENT VARIABLE X IN THE REGRESSION MODEL. THAT IS, IT MEASURES THE PROPORTION OF THE TOTAL VARIABILITY IN Y THAT IS REMOVED BY ADDING X TO THE LINEAR MODEL.

• NOTATION: $r^2$

THE COEFFICIENT OF DETERMINATION IS USEFUL WHEN INTERPRETING r. ITS SYMBOL EXPLAINS HOW IT IS COMPUTED; TO OBTAIN IT, SIMPLY SQUARE r – THE CORRELATION COEFFICIENT.

Coefficient of Determination

•The typical way to interpret $r^2$ is as the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x.

•When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only $\bar{y}$.

•We measure the proportional reduction in error and call it $r^2$.
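The proportional-reduction-in-error reading can be sketched numerically with the appliance-store advertising data from earlier in the deck. The names `sst` (squared error when every y is predicted by the mean of y) and `sse` (squared error of the regression predictions) are illustrative labels for this sketch, not notation from the slides.

```python
import math

xs = [1, 2, 3, 4, 5]   # advertising expenditure X ($100s)
ys = [1, 1, 2, 2, 4]   # sales revenue Y ($1,000s)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least squares line and correlation coefficient.
b = s_xy / s_xx
a = y_bar - b * x_bar
r = s_xy / math.sqrt(s_xx * s_yy)

# Squared prediction error using only the mean of y, then using the line.
sst = s_yy
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportional reduction in error.
r_squared = (sst - sse) / sst

print(round(r_squared, 3), round(r ** 2, 3))  # 0.817 0.817
```

Here the error drops from 6 to 1.1, a reduction of about 82%, and that proportion matches the square of the correlation coefficient.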


Coefficient of Determination

• $r^2$ measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x.

•A correlation of .9 means that $r^2 = (.9)^2 = .81 = 81\%$:

– 81% of the variation in the y-values can be explained by the explanatory variable, x.

PROPERTIES OF COEFFICIENT OF DETERMINATION

1. $0 \le r^2 \le 1$.

2. $r^2 = 1$ IF AND ONLY IF ALL POINTS LIE ON A LINE.

3. $r^2$ DOES NOT CHANGE IF THE UNITS OF MEASUREMENT ARE CHANGED.

4. $r^2$ MEASURES THE STRENGTH OF LINEAR ASSOCIATION BETWEEN THE VARIABLES Y AND X. IT IS POSSIBLE THAT X AND Y ARE STRONGLY RELATED, BUT $r^2$ IS CLOSE TO 0.

Remark

• THE COEFFICIENT OF DETERMINATION, WHEN CONVERTED TO A PERCENTAGE, INDICATES HOW MUCH OF THE VARIANCE IN ONE VARIABLE IS ACCOUNTED FOR BY ITS LINEAR RELATIONSHIP WITH THE OTHER VARIABLE.

• Examples From Midterm Review 1 Sheet


Outliers and Influential Points

•A regression outlier is an observation that lies far away from the trend that the rest of the data follows.

•An observation is influential if

– its x value is relatively low or high compared to the remainder of the data, and

– the observation is a regression outlier.

•Influential observations tend to pull the regression line toward that data point and away from the rest of the data points.
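The two conditions for influence can be illustrated by refitting the line with and without a suspect point. The sketch below uses invented data (the point values and the helper name fit_line are assumptions for illustration). Both added points are regression outliers, but only one of them has an extreme x value.

```python
def fit_line(points):
    """Return (intercept, slope) of the least-squares line through points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical points that follow the trend y = x closely.
base = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8),
        (5, 5.2), (6, 5.9), (7, 7.1), (8, 7.8)]

# A regression outlier whose x value sits in the middle of the data.
outlier_mid_x = (4.5, 12.0)
# A regression outlier whose x value is far above the rest of the data.
outlier_far_x = (20.0, 5.0)

_, slope_base = fit_line(base)
_, slope_with_mid = fit_line(base + [outlier_mid_x])
_, slope_with_far = fit_line(base + [outlier_far_x])

print(f"slope, base data only:         {slope_base:.3f}")
print(f"slope, plus mid-x outlier:     {slope_with_mid:.3f}")
print(f"slope, plus extreme-x outlier: {slope_with_far:.3f}")
```

In this example the mid-x outlier leaves the slope unchanged because its x value equals the mean of x, while the extreme-x outlier pulls the slope from about 0.98 down to about 0.19.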


Be Cautious of Influential Points

•One reason to plot the data before you do a correlation or regression analysis is to check for unusual observations.

•Search for observations that are regression outliers, being well removed from the trend that the rest of the data follow.


Outliers and Influential Points

• An observation is a regression outlier if it is far removed from the trend that the rest of the data follow. The top two points are regression outliers. Not all regression outliers are influential in affecting the correlation or slope. Question: Which regression outlier in this figure is influential?


Correlation Does not Imply Causation

•In a regression analysis, suppose that as x goes up, y also tends to go up (or down). Can we conclude that there’s a causal connection, with changes in x causing changes in y?

– A strong correlation between x and y means that a strong linear association exists between the two variables.

– A strong correlation between x and y does not mean that x causes y to change.


Correlation Does not Imply Causation (Extra-Credit Exercise)

Data are available for all fires in Chicago last year on x = number of firefighters at the fire and y = cost of damages due to the fire.

1. Would you expect the correlation to be negative, zero, or positive?

2. If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No?

3. Identify a third variable that could be considered a common cause of x and y:

– Distance from the fire station

– Intensity of the fire

– Size of the fire


Lurking Variables & Confounding

A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest.

• Ice cream sales and drowning – lurking variable = temperature

• Reading level and shoe size – lurking variable = age

• Childhood obesity rate and GDP – lurking variable = time

When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding.

Lurking variables are not measured in the study but have the potential for confounding.
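The ice cream and drowning example can be sketched with made-up numbers. In the code below every value is invented, and temperature is treated as measured (an assumption, since a lurking variable is usually unobserved). Both variables are built to depend on temperature and not on each other. One common way to adjust for a measured third variable is to regress each variable on it and correlate the leftover residuals; the helper names here are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Return the correlation r between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(x, y):
    """Return residuals of y after a least-squares fit on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical weekly values; temperature drives both of the others.
temperature = [60, 65, 70, 75, 80, 85, 90, 95]
ice_cream_sales = [350, 350, 370, 410, 430, 430, 450, 490]
drownings = [5, 6, 5, 6, 7, 8, 11, 12]

# Raw association between the two variables of interest.
r_raw = correlation(ice_cream_sales, drownings)

# Association that remains after removing the part explained by temperature.
r_adjusted = correlation(
    residuals(temperature, ice_cream_sales),
    residuals(temperature, drownings),
)

print(f"correlation, sales vs drownings:     {r_raw:.3f}")
print(f"correlation after removing temperature: {r_adjusted:.3f}")
```

With these invented values the raw correlation is about 0.90, yet essentially none of it remains once temperature is accounted for, which matches the common-cause pattern described above.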


The Effect of Lurking Variables on Associations

• Lurking variables can affect associations in many ways. For instance, a lurking variable may be a common cause of both the explanatory and response variables.

• In practice, there’s usually not a single variable that causally explains a response variable or the association between two variables. More commonly, there are multiple causes. When there are multiple causes, the association among them makes it difficult to study the effect of any single variable.


The Effects of Confounding on Associations

• When two explanatory variables are both associated with a response variable but are also associated with each other, confounding occurs.

• It is difficult to determine whether either of them truly causes the response because a variable’s effect could be at least partly due to its association with the other variable.


LEVERAGE AND INFLUENTIAL POINTS

LEVERAGE

DATA POINTS WITH X-VALUES FAR FROM THE MEAN OF X ARE SAID TO EXERT LEVERAGE ON A LINEAR MODEL. HIGH LEVERAGE POINTS PULL THE LINE CLOSE TO THEM, AND SO THEY CAN HAVE A LARGE EFFECT ON THE LINE, SOMETIMES COMPLETELY DETERMINING THE SLOPE AND Y-INTERCEPT. WITH HIGH ENOUGH LEVERAGE, THEIR RESIDUALS CAN APPEAR TO BE DECEPTIVELY SMALL.
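Leverage can be put on a numeric scale. The slides do not give a formula; the sketch below assumes the standard one for a line with a single explanatory variable, $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, and applies it to invented data in which the last point has an x value far from the rest. It also compares that point's residual from the full fit with its distance from the line fitted to the other points. All names and numbers are illustrative assumptions.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: eight points near y = x, plus one point at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8, 5.0]

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Leverage of each point (standard simple-regression formula, an assumption
# brought in for illustration; it depends only on the x values).
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Residuals from the line fitted to all nine points.
intercept, slope = fit_line(x, y)
residual_list = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
residual_full = residual_list[-1]

# Distance of the last point from the line fitted to the other eight points.
intercept_rest, slope_rest = fit_line(x[:-1], y[:-1])
residual_rest = y[-1] - (intercept_rest + slope_rest * x[-1])

print(f"leverage of the last point:              {leverage[-1]:.3f}")
print(f"its residual in the full fit:            {residual_full:.2f}")
print(f"its distance from the other points' line: {residual_rest:.2f}")
```

Here the last point has leverage of roughly 0.85 out of a total of 2 shared among nine points. Its residual in the full fit is about -2.1, smaller in size than the residuals of some ordinary points, even though it lies about 14.6 units below the line that the other eight points follow.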


INFLUENTIAL POINT

• IF OMITTING A POINT FROM THE DATA RESULTS IN A VERY DIFFERENT REGRESSION MODEL, THEN THAT POINT IS CALLED AN INFLUENTIAL POINT.

• ILLUSTRATIVE EXAMPLES