chapter 3jga001/chapter 4 alford.pdf · 3.1 - 8 example 4.1 continued solution step 1...

Post on 16-Jul-2020


3.1 - 1

3.1 - 2

Overview

4.1 Tables and Graphs for the Relationship Between Two Variables

4.2 Introduction to Correlation

4.3 Introduction to Regression

3.1 - 3

4.1 Tables and Graphs for the Relationship Between Two Variables

Objectives:

By the end of this section, I will be able to…

1) Construct and interpret crosstabulations for two categorical variables.

2) Construct and interpret clustered bar graphs for two categorical variables.

3) Construct and interpret scatterplots for two quantitative variables.

3.1 - 4

Crosstabulations

Tabular method for simultaneously summarizing the data for two categorical (qualitative) variables

Constructing a Crosstabulation

Step 1

Put the categories of one variable at the top of each column, and the categories of the other variable at the beginning of each row.

3.1 - 5

Crosstabulations

Steps for Constructing a Crosstabulation

Step 2

For each row and column combination, enter the number of observations that fall in the two categories.

Step 3

The bottom of the table gives the column totals, and the right-hand column gives the row totals.
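The three steps above can be sketched in code. This is a minimal illustration using only the Python standard library; the `responses` list is invented for the example and is not the Table 4.1 data, and the helper names `crosstab` and `format_crosstab` are hypothetical.

```python
from collections import Counter

# Hypothetical (career, gender) responses; NOT the textbook's Table 4.1 data.
responses = [
    ("Doctor", "Female"), ("Doctor", "Female"), ("Doctor", "Male"),
    ("Military Officer", "Male"), ("Military Officer", "Male"),
    ("Scientist", "Female"), ("Scientist", "Male"),
]

def crosstab(pairs):
    """Return (row_labels, col_labels, counts) for a list of (row, col) pairs."""
    rows = sorted({r for r, _ in pairs})   # Step 1: categories at the start of each row
    cols = sorted({c for _, c in pairs})   # Step 1: categories at the top of each column
    counts = Counter(pairs)                # Step 2: observations in each row/column cell
    return rows, cols, counts

def format_crosstab(pairs):
    """Render the crosstabulation as text, including row and column totals."""
    rows, cols, counts = crosstab(pairs)
    width = max(len(label) for label in rows + ["Total"]) + 2
    lines = ["".ljust(width) + "".join(c.rjust(8) for c in cols + ["Total"])]
    for r in rows:
        cells = [counts[(r, c)] for c in cols]
        # Step 3: the right-hand column gives the row totals.
        lines.append(r.ljust(width) + "".join(str(v).rjust(8) for v in cells + [sum(cells)]))
    # Step 3: the bottom row gives the column totals.
    col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
    lines.append("Total".ljust(width)
                 + "".join(str(v).rjust(8) for v in col_totals + [sum(col_totals)]))
    return "\n".join(lines)

print(format_crosstab(responses))
```

The row totals and column totals printed here are the frequency distributions of the two variables taken one at a time.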

3.1 - 6

Table 4.1 Prestigious career survey data set

3.1 - 7

Example 4.1 - Crosstabulation of the prestigious career survey

Construct a crosstabulation of career and gender.

3.1 - 8

Example 4.1 continued

Solution

Step 1

Crosstabulation given in Table 4.2.

Categories for the variable gender are at the top

Categories for the variable career are on the left.

Each student in the sample is associated with a certain cell

For example, a male student who reported Military Officer appears as one of the four students in the “Male” column and the “Military Officer” row.

3.1 - 9

Example 4.1 continued

Step 2

For each row and column combination, enter the number of observations that fall in the two categories.

This is shown in Table 4.2.

Step 3

“Total” column contains the sum of the counts of the cells in each row (category) of the career variable

This sum represents the frequency distribution for this variable.

3.1 - 10

Example 4.1 continued

Step 3 - continued

The “Total” row contains the sums of the counts of the cells in each column (category) of the gender variable

This sum represents the frequency distribution for this variable.

Thus, we see that crosstabulations contain the frequency distributions of each of the two variables.

3.1 - 11

Example 4.1 continued

Table 4.2 Crosstabulation of prestigious career survey data set

3.1 - 12

Example 4.1 continued

Use the crosstabulation to look for patterns

For example, does there appear to be a difference between the responses of males and females?

Most of the students who responded “Doctor” were females, and most of the students who responded “Military Officer” were males.

3.1 - 13

Clustered Bar Graphs

Useful for comparing two categorical variables

Used in conjunction with crosstabulations

Each set of bars in a clustered bar graph represents a single category of one variable

One can construct clustered bar graphs using either frequencies or relative frequencies
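As a rough sketch of the frequency versus relative-frequency choice, the code below draws a text-only clustered bar graph from a crosstabulation. The counts are invented for illustration (they are not Table 4.2), and the function names are hypothetical. Dividing by the column (gender) total is one assumed way of forming relative frequencies.

```python
# Hypothetical crosstabulation counts, counts[career][gender]. NOT Table 4.2.
counts = {
    "Doctor": {"Female": 6, "Male": 2},
    "Military Officer": {"Female": 1, "Male": 4},
}

def relative_frequencies(table):
    """Divide each count by its column (gender) total."""
    genders = sorted({g for row in table.values() for g in row})
    totals = {g: sum(row.get(g, 0) for row in table.values()) for g in genders}
    return {
        career: {g: row.get(g, 0) / totals[g] for g in genders}
        for career, row in table.items()
    }

def text_clustered_bars(table, scale=1):
    """One cluster of bars per career category, one bar per gender."""
    lines = []
    for career, row in table.items():
        lines.append(career)
        for gender in sorted(row):
            lines.append(f"  {gender:<7}" + "#" * round(row[gender] * scale))
    return "\n".join(lines)

print(text_clustered_bars(counts))                            # bars from frequencies
print(text_clustered_bars(relative_frequencies(counts), 20))  # bars from relative frequencies
```

Relative frequencies make the clusters comparable when the two groups have different sample sizes.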

3.1 - 14

Scatterplots

Used to summarize the relationship between two quantitative variables that have been measured on the same element

Graph of points (x, y) each of which represents one observation from the data set

One of the variables is measured along the horizontal axis and is called the x variable

The other variable is measured along the vertical axis and is called the y variable

3.1 - 15

Predictor Variable and Response Variable

The value of the x variable can be used to predict or estimate the value of the y variable.

The x variable is referred to as the predictor variable.

The y variable is called the response variable.

3.1 - 16

Scatterplot Terminology

Note the terminology in the caption to Figure 4.2.

When describing a scatterplot, always indicate the y variable first and use the term versus (vs.) or against the x variable.

This terminology reinforces the notion that the y variable depends on the x variable.

3.1 - 17

FIGURE 4.2

Scatterplot of sales price versus square footage.

3.1 - 18

Positive relationship

As the x variable increases in value, the y variable also tends to increase.

FIGURE 4.3 (a) Scatterplot of a positive relationship

3.1 - 19

Negative relationship

As the x variable increases in value, the y variable tends to decrease

FIGURE 4.3 (b) Scatterplot of a negative relationship

3.1 - 20

No apparent relationship

As the x variable increases in value, the y variable tends to remain unchanged

FIGURE 4.3 (c) Scatterplot of no apparent relationship

3.1 - 21

Example 4.4 - Relationship between lot size and price in Glen Ellyn, Illinois

Using Figure 4.2, investigate the relationship between lot square footage and lot price.

3.1 - 22

Example 4.4 continued

Solution

The scatterplot in Figure 4.2 most resembles Figure 4.3a, where a positive relationship exists between the variables. Thus, smaller lot sizes tend to be associated with lower prices, and larger lot sizes tend to be associated with higher prices.

Put another way, as the lot size increases, the lot price tends to increase as well.

3.1 - 23

Summary

Crosstabulation summarizes the relationship between two categorical variables.

A crosstabulation is a table that gives the counts for each row-column combination, with totals for the rows and columns.

Clustered bar graphs are useful for comparing two categorical variables and are often used in conjunction with crosstabulations.

For two numerical variables, scatterplots summarize the relationship by plotting all the (x, y) points.

3.1 - 24

4.2 Introduction to Correlation

Objective:

By the end of this section, I will be able to…

1) Calculate and interpret the value of the correlation coefficient.

3.1 - 25

Correlation Coefficient r

Measures the strength and direction of the linear relationship between two variables.

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

s_x is the sample standard deviation of the x data values.

s_y is the sample standard deviation of the y data values.
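A minimal sketch of this definition in Python follows. The data in `xs` and `ys` are invented for illustration, and the helper names `sample_std` and `correlation` are hypothetical, not taken from the textbook.

```python
import math

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation coefficient r computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * sample_std(xs) * sample_std(ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # 0.7746
```

For these made-up values the sum of cross-deviations is 6 and the two sums of squared deviations are 10 and 6, so r = 6 / sqrt(60).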

3.1 - 26

Example

Page 183-184

3.1 - 27

Example

3.1 - 28

Example

Positively correlated, negatively correlated, not correlated?

3.1 - 29

Example

ANSWER: positively correlated

3.1 - 30

Example

3.1 - 31

Example

do 20(b) first:

20(a)

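$$ \sum x = 2553, \qquad \sum y = 2583 $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{2553}{5} = 510.6, \qquad \bar{y} = \frac{\sum y}{n} = \frac{2583}{5} = 516.6 $$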

3.1 - 32

Example

20(c)

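$$ s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{481.2}{4}} \approx 10.97 $$

$$ s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{\frac{113.2}{4}} \approx 5.32 $$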

3.1 - 33

Example

20(d)

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{205.2}{(5 - 1)(10.968)(5.320)} = 0.8792 $$
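The arithmetic in 20(c) and 20(d) can be checked with a few lines of Python. The raw SAT scores do not appear in this transcript, so the sketch starts from the summary sums quoted on the slides; the variable names are illustrative.

```python
import math

# Summary values quoted on the slides for problem 20 (raw scores not shown here).
n = 5
sum_sq_x = 481.2    # sum of (x - x_bar)^2
sum_sq_y = 113.2    # sum of (y - y_bar)^2
sum_cross = 205.2   # sum of (x - x_bar)(y - y_bar)

s_x = math.sqrt(sum_sq_x / (n - 1))    # sample standard deviation of x
s_y = math.sqrt(sum_sq_y / (n - 1))    # sample standard deviation of y
r = sum_cross / ((n - 1) * s_x * s_y)  # correlation coefficient

print(round(s_x, 3), round(s_y, 3), round(r, 4))
```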

3.1 - 34

Equivalent Computational Formula for Calculating the Correlation Coefficient r

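$$ r = \frac{\sum xy - \left(\sum x \sum y\right)/n}{\sqrt{\sum x^2 - \left(\sum x\right)^2/n}\;\sqrt{\sum y^2 - \left(\sum y\right)^2/n}} $$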
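As a quick check that the computational formula agrees with the definition, the sketch below evaluates both on the same data. The data are invented for illustration and the function names are hypothetical.

```python
import math

def r_definition(xs, ys):
    """r from deviations about the means and the sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

def r_computational(xs, ys):
    """r from the raw sums only; no deviations are needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (math.sqrt(sum_x2 - sum_x ** 2 / n)
                   * math.sqrt(sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_definition(xs, ys), r_computational(xs, ys))
```

The two results should match up to floating-point rounding. The computational form is convenient by hand because it needs only five running sums.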

3.1 - 35

Calculate r Directly Using Calculator

Option 1

• Enter the data in two lists.

• Press STAT and select TESTS

• LinRegTTest is option F (scroll arrow up 3 places)

• Enter the names of the lists from step 1.

• Arrow down to Calculate and then press Enter

• The r value is the last value displayed; round this value to three decimal places

3.1 - 36

Calculate r Directly Using Calculator

Option 2 (page 193)

3.1 - 37

Example

Page 184 20(e)

Use the calculator to verify the answer in 20(d)

3.1 - 38

Interpreting the Correlation Coefficient r

1) Values of r close to 1 indicate a positive relationship between the two variables.

The variables are said to be positively correlated.

As x increases, y tends to increase as well.

3.1 - 39

Interpreting the Correlation Coefficient r

2) Values of r close to -1 indicate a negative relationship between the two variables.

The variables are said to be negatively correlated.

As x increases, y tends to decrease.

3.1 - 40

Interpreting the Correlation Coefficient r

3) Other values of r indicate the lack of either a positive or negative linear relationship between the two variables.

The variables are said to be uncorrelated

As x increases, y tends to neither increase nor decrease linearly.

3.1 - 41

Guidelines for Interpreting the Correlation Coefficient r

If the correlation coefficient between two variables is

greater than 0.7, the variables are positively correlated.

between 0.33 and 0.7, the variables are mildly positively correlated.

between –0.33 and 0.33, the variables are not correlated.

between –0.7 and –0.33, the variables are mildly negatively correlated.

less than –0.7, the variables are negatively correlated.
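These guidelines can be sketched as a small lookup function. The slide does not say which label applies when r falls exactly on a cutoff (±0.33 or ±0.7), so the boundary handling below is an assumption, as is the function name `interpret_r`.

```python
def interpret_r(r):
    """Label a correlation coefficient using the guideline cutoffs 0.33 and 0.7.

    Assumption: a value exactly on a cutoff is given the weaker label.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:
        return "mildly positively correlated"
    if r >= -0.33:
        return "not correlated"
    if r >= -0.7:
        return "mildly negatively correlated"
    return "negatively correlated"

print(interpret_r(0.8792))  # positively correlated
```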

3.1 - 42

Example

Page 184

3.1 - 43

continued

Solution

We found the correlation coefficient for the relationship between SAT I verbal and math scores to be r = 0.8792.

r = 0.8792 is close to 1.

We would therefore say that SAT I verbal and math scores are strongly positively correlated.

As verbal score increases, math score also tends to increase.

3.1 - 44

Example

ANSWER:

3.1 - 45

Example

Page 184

3.1 - 46

Example

Page 184

3.1 - 47

Example

ANSWER: positive

3.1 - 48

Example

Page 184

3.1 - 49

Example

ANSWER: somewhere in the middle

3.1 - 50

Example

Page 184

3.1 - 51

Example

ANSWER:

3.1 - 52

Common Error Interpreting Correlation

Correlation does not imply causality.

EXAMPLE: Umbrella sales are negatively correlated with attendance at baseball games in outdoor stadiums (that is, as umbrella sales increase, attendance at baseball games in outdoor stadiums tends to decrease). It is not correct to conclude that increased umbrella sales cause a decrease in attendance. Both are probably caused by a hidden variable: rainfall.
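The hidden-variable idea can be illustrated with made-up numbers. In the sketch below every figure is invented: rainfall rises across five days, umbrella sales rise with it, and attendance falls with it. The two outcomes end up negatively correlated with each other even though neither causes the other.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r from the computational formula."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    den = (math.sqrt(sum(x * x for x in xs) - sum(xs) ** 2 / n)
           * math.sqrt(sum(y * y for y in ys) - sum(ys) ** 2 / n))
    return num / den

# Hypothetical daily figures, invented purely for illustration.
rainfall_mm = [0, 2, 5, 10, 20]                     # the hidden variable
umbrella_sales = [6, 10, 21, 34, 66]                # rises with rainfall
attendance = [30100, 28000, 25800, 21500, 11900]    # falls with rainfall

print("umbrellas vs attendance:", round(correlation(umbrella_sales, attendance), 3))
print("rainfall vs umbrellas:  ", round(correlation(rainfall_mm, umbrella_sales), 3))
print("rainfall vs attendance: ", round(correlation(rainfall_mm, attendance), 3))
```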

3.1 - 53

Summary

Section 4.2 introduces the correlation coefficient r, a measure of the strength of linear association between two numeric variables.

Values of r close to 1 indicate that the variables are positively correlated.

Values of r close to –1 indicate that the variables are negatively correlated.

Values of r close to 0 indicate that the variables are not correlated.

3.1 - 54

4.3 Introduction to Regression

Objectives:

By the end of this section, I will be able to…

1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line.

2) Predict values of y for given values of x.

3.1 - 55

Interpreting the Slope of a Line

For a line with equation

$$ y = mx + b $$

we interpret a nonzero slope m as follows:

y increases (if m is positive) or decreases (if m is negative) by |m| units for every one-unit increase in x.

3.1 - 56

Interpreting the y-intercept of a Line

For a line with equation

$$ y = mx + b $$

we interpret the y-intercept b as follows:

The y value is b when the x value is 0.

3.1 - 57

Equation of the Regression Line

Approximates the relationship between two random variables x and y

The equation is

$$ \hat{y} = b_0 + b_1 x $$

where the regression coefficients are the slope, b1, and the y-intercept, b0.

The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.

3.1 - 58

Relationship Between Slope and Correlation Coefficient

The slope b1 of the regression line and the correlation coefficient r always have the same sign.

b1 is positive if and only if r is positive.

b1 is negative if and only if r is negative.

3.1 - 59

Regression coefficients b0 and b1

All of the quantities needed to calculate b0 and b1 have already been computed in the formula for r.

Numerators for b1 and r are exactly the same.

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} $$
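A minimal sketch of these two formulas in Python follows. The data are invented for illustration and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the regression line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Same numerator as in the formula for r.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    if sum_sq_x == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = cross / sum_sq_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 2.2 0.6
```

Because b0 is defined as y-bar minus b1 times x-bar, the fitted line always passes through the point (x-bar, y-bar).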

3.1 - 60

Example

Page 194, problems 10-12 use this table

3.1 - 61

Example

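$$ \sum x = 2553, \qquad \sum y = 2583 $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{2553}{5} = 510.6, \qquad \bar{y} = \frac{\sum y}{n} = \frac{2583}{5} = 516.6 $$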

3.1 - 62

Example

Page 194, problem 10(a)

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{205.2}{481.2} = 0.426 $$

3.1 - 63

Example

Page 194, problem 10(b)

$$ b_0 = \bar{y} - b_1 \bar{x} = 516.6 - 0.426(510.6) = 299.084 $$

3.1 - 64

Example

Page 194, problem 10(c)

$$ \hat{y} = b_0 + b_1 x $$

$$ \hat{y} = 299.084 + 0.426x $$
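The coefficients in problem 10 can be reproduced from the summary sums quoted on the slides (the raw scores do not appear in this transcript). The sketch assumes, as the slides appear to do, that the slope is rounded to three decimal places before the intercept is computed.

```python
# Summary values quoted on the slides for problem 10.
n = 5
sum_x, sum_y = 2553, 2583
sum_cross = 205.2   # sum of (x - x_bar)(y - y_bar)
sum_sq_x = 481.2    # sum of (x - x_bar)^2

x_bar = sum_x / n   # 510.6
y_bar = sum_y / n   # 516.6

# Assumption: round the slope first, then use the rounded slope for b0.
b1 = round(sum_cross / sum_sq_x, 3)
b0 = round(y_bar - b1 * x_bar, 3)

print(f"y-hat = {b0} + {b1}x")
```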

3.1 - 65

Calculator

1. Enter the data in two lists.

2. Make a scatter plot of the data (use 2nd Y= to get STAT PLOT, choose Plot1 On, first scatterplot icon, then zoom 9)

(your plot will look different)

3.1 - 66

Calculator

3. Plot the regression line. Choose:

4. STAT → CALC #4 LinReg(ax+b)

Include the parameters L1, L2, Y1.

NOTE: Y1 comes from VARS → YVARS, #Function, Y1

3.1 - 67

Calculator

5. Choose Y= and the equation for the regression line will be stored in Y1. Then choose GRAPH and the regression line will be plotted.

3.1 - 68

Calculator

6. Choose TRACE and you can see X and Y values on scatterplot or regression line

3.1 - 69

Example

Page 194, problem 11(a)

A slope of 0.426 means that the estimated SAT I Math score increases by 0.426 points for every increase of 1 point in the SAT I verbal score.

3.1 - 70

Example

Page 194, problem 11(b)

The y-intercept of 299.084 means that the estimated SAT I Math score is 299.084 when the SAT I Verbal score is 0.

3.1 - 71

Example

Page 194, problem 12(a)

Find ŷ when x = 500.

$$ \hat{y} = 299.084 + 0.426(500) = 512.084 \approx 512.1 $$

3.1 - 72

Example

Page 194, problem 12(b)

Find ŷ when x = 510.

$$ \hat{y} = 299.084 + 0.426(510) \approx 516.3 $$

3.1 - 73

Example

Page 194, problem 12(c)

Find ŷ when x = 490.

The x values in the data set range from 497 to 522. Since 490 is not in the range of the x values in the data set, it is not appropriate to use the regression equation in this case.
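The three parts of problem 12 can be sketched as one function that refuses to predict outside the observed x range. The coefficients and the range 497 to 522 come from the slides; the function name and the choice to raise an error are illustrative assumptions.

```python
# Regression equation and observed x range, as quoted on the slides.
B0, B1 = 299.084, 0.426
X_MIN, X_MAX = 497, 522

def predict_math(verbal):
    """Estimate the SAT I Math score from the SAT I Verbal score.

    Raises ValueError for x values outside the range of the data, where the
    regression equation should not be used.
    """
    if not X_MIN <= verbal <= X_MAX:
        raise ValueError(f"x = {verbal} is outside the data range {X_MIN} to {X_MAX}")
    return B0 + B1 * verbal

for x in (500, 510, 490):
    try:
        print(x, round(predict_math(x), 3))
    except ValueError as err:
        print(x, "not appropriate:", err)
```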

3.1 - 74

Summary

Section 4.3 introduces regression, where the linear relationship between two numerical variables is approximated using a straight line, called the regression line.

The equation of the regression line is written as

$$ \hat{y} = b_0 + b_1 x $$

where the regression coefficients are the y intercept, b0, and the slope, b1.

3.1 - 75

Summary

The regression equation can be used to make predictions about values of y for particular values of x.
