linear regression - mr. jones · 2019. 11. 21. · linear regression lesson 5-5 go online to see...



Page 1: Linear Regression - MR. JONES · 2019. 11. 21. · Linear Regression Lesson 5-5 Go Online to see how to use a graphing calculator with this example. Today’s Vocabulary best-fit

Copyright © McGraw-Hill Education

Linear Regression

Lesson 5-5

Go Online to see how to use a graphing calculator with this example.

Today’s Vocabulary

best-fit line, linear regression, correlation coefficient, residual

Learn Linear Regression and Best-Fit Lines

A calculator can find the line that most closely approximates data in a scatter plot, called the best-fit line. Linear regression is one algorithm used to find a precise line of fit for a set of data.

Calculators may also compute a number r called the correlation coefficient. This measure shows how well data are modeled by a linear equation. It will tell you if a correlation is positive or negative and how closely the equation is modeling the data. The closer the correlation coefficient is to 1 or –1, the more closely the equation models the data.
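As an illustration of what the calculator computes, here is a minimal Python sketch of the correlation coefficient r. The lesson does not give a formula, so this assumes the standard Pearson definition; the function name and the sample data are hypothetical.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data (assumed definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on a rising line and on a falling line
perfect_up = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])
perfect_down = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])
print(perfect_up, perfect_down)
```

Data that lie exactly on a rising line give r = 1, and data that lie exactly on a falling line give r = -1, which matches the statement that values close to 1 or -1 indicate a close fit.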

[Figure: three scatter plots. Weak Correlation (r = 0.02), Moderate Correlation (r = 0.72), Strong Correlation (r = -0.97)]

Example 1 Find a Best-Fit Line

BASEBALL The table shows Jackie Robinson’s total hits during each season of his major league career. Use a graphing calculator to write an equation for the best-fit line for the data. Then find and interpret the correlation coefficient.

Year 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956

Total Hits 175 170 203 140 185 157 159 120 81 98

Step 1 Enter the data. Before you begin, make sure that your Diagnostic setting is on. You can find this under the CATALOG menu. Press D and then scroll down and select DiagnosticOn. Then press [ENTER].

Study Tip

Correlation Coefficient The table shows a rule of thumb for determining how well the equation models the data based on the correlation coefficient.

Correlation Coefficient    Strength of Correlation

|r| ≥ 0.8                  Strong

0.5 ≤ |r| < 0.8            Moderate

|r| < 0.5                  Weak
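The rule of thumb can be written as a small function. This is only a sketch: the function name is hypothetical, and the thresholds are the ones from the Study Tip table.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the Study Tip rule of thumb."""
    magnitude = abs(r)  # strength depends on |r|, not on the sign
    if magnitude >= 0.8:
        return "Strong"
    if magnitude >= 0.5:
        return "Moderate"
    return "Weak"

# The three r-values from the scatter plots in this lesson
for r in (0.02, 0.72, -0.97):
    print(r, correlation_strength(r))
```

Because the function uses the absolute value, r = -0.97 is classified as Strong even though the correlation is negative.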

Think About It!

Write the following correlation coefficients in order from weakest to strongest.

0.85, 0.3, 1, -0.78, 0.54, -0.06, -0.9
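One way to check an ordering like this is to sort by absolute value, since the strength of a correlation depends on |r| and not on its sign. A short Python sketch (the variable names are hypothetical):

```python
# Correlation coefficients from the Think About It question
values = [0.85, 0.3, 1, -0.78, 0.54, -0.06, -0.9]

# Weakest to strongest means smallest |r| to largest |r|
ordered = sorted(values, key=abs)
print(ordered)
```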


Today’s Goals

● Write equations of best-fit lines using linear regressions.

● Determine how well functions fit sets of data.

Lesson 5-5 • Linear Regression 319

Answer (Think About It!): -0.06, 0.3, 0.54, -0.78, 0.85, -0.9, 1

THIS MATERIAL IS PROVIDED FOR INDIVIDUAL EDUCATIONAL PURPOSES ONLY AND MAY NOT BE DOWNLOADED, REPRODUCED, OR FURTHER DISTRIBUTED.


Enter the data by pressing [STAT] and selecting the Edit option. Let the year 1947 be represented by year ____. Enter the years since 1947 into List 1 (L1). These will represent the ____-values. Enter the total hits into List 2 (L2). These will represent the ____-values.

Step 2 Perform the regression. Perform the regression by pressing [STAT] and selecting the CALC option. Scroll down to LinReg (ax+b) and press [ENTER]. Make sure L1 is the Xlist and L2 is the Ylist. Then select Calculate.

Step 3 Interpret the results. Write the equation of the regression line by rounding the a and b values on the screen. The form that we chose for the regression was ax + b, so the equation is y = ____ + ____. The correlation coefficient is about ____, which means that the equation models the data ____. Its negative value means that as the years since 1947 increase, the total number of Jackie Robinson’s hits ____.

[Calculator screen with labels: slope, y-intercept, correlation coefficient]
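The calculator steps for Example 1 can be cross-checked in Python. This is a sketch that assumes LinReg (ax+b) is an ordinary least-squares fit; the function and variable names are hypothetical, and the data are the table values from the example.

```python
import math

def linear_regression(xs, ys):
    """Least-squares line y = ax + b and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # y-intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Years since 1947 (1947 is year 0) and Jackie Robinson's total hits
years_since_1947 = list(range(10))
total_hits = [175, 170, 203, 140, 185, 157, 159, 120, 81, 98]

a, b, r = linear_regression(years_since_1947, total_hits)
print(f"y = {a:.2f}x + {b:.2f}, r = {r:.4f}")
# prints: y = -10.32x + 195.22, r = -0.8022
```

The negative slope and negative r both reflect that the hit totals fall as the years since 1947 increase.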

Check

TEMPERATURE The table shows the average annual temperature for the top 10 most populous states in 2014.

Rank 1 2 3 4 5 6 7 8 9 10

Temperature (°F) 59.4 64.8 70.7 45.4 51.8 48.8 50.7 63.5 59 44.4

Part A Use a graphing calculator to write an equation for the best-fit line for the data. Round to the nearest hundredth. y = ____x + ____

Part B Find the correlation coefficient r. Round to the nearest hundredth. r = ____

Part C Based on your answer to Part B, does the equation model the data well? Yes or No?

Go Online You can complete an Extra Example online.

Math History Minute

One of the areas of interest of British statistician Florence Nightingale David (1909–1993), who was named after family friend Florence Nightingale, was the distribution of correlation coefficients. In 1938, she released a book entitled Tables of the Correlation Coefficient, for which all of the calculations were done on a hand-cranked mechanical calculator.

Use a Source

Choose another baseball player and research the total number of hits they have by season. Use a graphing calculator to write an equation for the best-fit line, and decide whether the equation models the data well.

Your Notes

320 Module 5 • Creating Linear Equations

Sample answer: Derek Jeter’s total hits per season can be modeled by y = −0.46x + 177.6. The correlation coefficient is about –0.05, which means that the equation does not model the data well.

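Answers (Example 1): 1947 is represented by year 0; L1 holds the x-values and L2 holds the y-values; the equation is y = -10.32x + 195.22; the correlation coefficient is about -0.8022, so the equation models the data well; the total number of hits decreases.

Answers (Check): Part A: y = -1.20x + 62.47; Part B: r = -0.41; Part C: no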


Go Online You can complete an Extra Example online.

Example 2 Use a Best-Fit Line

Best-fit lines can be used to estimate values that are not in the data. Recall that when we estimate values that are between known values, this is called linear interpolation. When we estimate a number outside the range of data, it is called linear extrapolation.

SHOPPING The table shows U.S. desktop online sales on Cyber Monday since 2009. Estimate the Cyber Monday sales in 2025.

Year 2009 2010 2011 2012 2013 2014 2015 2016

Sales (millions of dollars) 887 1028 1251 1465 1735 2038 2280 2671

Step 1 Graph the data. Enter the data from the table into the lists. Let ____ be represented by 0. Then the years since 2009 are the x-values. Let the ____ be the y-values. Graph the scatter plot. Turn on Plot1 under the STAT PLOT menu and choose the scatter plot type. Use L1 for the Xlist and L2 for the Ylist. Change the viewing window so that all data are visible by pressing [ZOOM] and then selecting ZoomStat.

Step 2 Perform the regression. Perform the regression using the data in the lists. The equation is about y = ____x + ____. The correlation coefficient is ____, which means that the equation models the data ____.

Step 3 Graph the best-fit line. Graph the best-fit line. Press [Y=] [VARS] and choose Statistics. From the EQ menu, choose RegEQ. Press [GRAPH].

Step 4 Extrapolate. Use the graph to predict the Cyber Monday sales. Change the viewing window to include the x-value to be evaluated, ____. Also increase Ymax to accommodate the increasing y-values. Press CALC 16 to find that when x = 16, y ≈ ____.

We can estimate that in 2025, Cyber Monday sales will be about $____.
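The extrapolation in Example 2 can be sketched in Python as follows. It assumes the calculator's regression is an ordinary least-squares fit; the function and variable names are hypothetical, and the data are the table values with 2009 as year 0.

```python
def fit_line(xs, ys):
    """Least-squares slope a and y-intercept b for y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Years since 2009 and Cyber Monday sales in millions of dollars
years_since_2009 = list(range(8))
sales = [887, 1028, 1251, 1465, 1735, 2038, 2280, 2671]

a, b = fit_line(years_since_2009, sales)

x_2025 = 2025 - 2009  # 16 years after 2009
# The target lies outside the observed x-values, so this is extrapolation
is_extrapolation = x_2025 < min(years_since_2009) or x_2025 > max(years_since_2009)
estimate = a * x_2025 + b

print(f"y = {a:.2f}x + {b:.2f}")
print(f"Estimated sales in 2025: about ${estimate:,.0f} million")
# prints: y = 254.51x + 778.58
#         Estimated sales in 2025: about $4,851 million
```

Since x = 16 is well beyond the last observed value of x = 7, the estimate depends on the assumption that the trend continues at a constant rate.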

Think About It!

Why is it helpful to define x as years since 2009 instead of years?

Study Tip

Assumptions Using a best-fit line to make predictions requires you to assume that the trend continues at a constant rate and that more people choose to shop on Cyber Monday each year.

[Calculator screens. Viewing windows: [-0.7, 7.7] scl: 1 by [583.72, 2974.28] scl: 1 (two screens); [-0.7, 17] scl: 1 by [583.72, 5500] scl: 1]


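Answers (Example 2): 2009 is represented by 0; the sales are the y-values; the equation is about y = 254.51x + 778.58; the correlation coefficient is 0.9935, so the equation models the data well; the x-value to be evaluated is 16, which corresponds to 2025; when x = 16, y ≈ 4851; the estimated sales are about $4,851,000,000.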

Sample answer: Because the data do not start until 2009, using years since 2009 makes it easier to graph.


Talk About It!

Why would a residual plot where the residuals are almost on the line y = 0 indicate a very good fit? Explain your reasoning.

Think About It!

Use a calculator to find the correlation coefficient of the best-fit line. Does the correlation coefficient also suggest a good fit? Justify your argument.

Check

SOCIAL MEDIA The table shows the number of daily users on a social media site in various years.

Year 2011 2012 2013 2014 2015

Daily Users (millions) 372 526 665 802 936

Use linear regression to estimate the number of daily users in millions on the site in 2030.

A. 3187.4 users B. 4591.4 users

C. 285,391.4 users D. 3047 users

Learn Residuals

When finding a best-fit line, not all data will lie on the line. The difference between an observed y-value and its predicted y-value on a regression line is called a residual. When residuals are plotted on a scatter plot, they can help assess how well the best-fit line describes the data. If there is no pattern in the residual plot, then the best-fit line is a good fit.

Example 3 Graph and Analyze a Residual Plot

THANKSGIVING The table shows the average price of a 10-person Thanksgiving dinner from 2004 to 2014. Determine whether the best-fit line models the data well by graphing a residual plot.

Year 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

Price ($) 35.68 36.78 38.10 42.26 44.61 42.91 43.47 49.20 49.48 49.04 49.41

Step 1 Find the best-fit line. Enter the data from the table into the lists. Let 2004 be represented by 0. Then the years since ____ are the x-values. Let the ____ be the y-values. Perform the linear regression using the data in the lists.

Step 2 Graph the residual plot. Turn on PLOT2 under the STAT PLOT menu and choose the scatter plot type. Use L1 for the Xlist and RESID for the Ylist. You can obtain RESID by pressing [LIST] and selecting RESID from the list of names. Graph the scatter plot of the residuals by pressing [ZOOM] and choosing ZoomStat. The residuals appear to be randomly scattered and centered about the line y = 0. Thus, the best-fit line seems to model the data ____.
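To see what a residual list such as RESID contains, the residuals can be computed directly as observed y minus predicted y. The Python sketch below assumes an ordinary least-squares fit and uses the table values as printed here; the function and variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope a and y-intercept b for y = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Years since 2004 and average dinner price in dollars
years_since_2004 = list(range(11))
prices = [35.68, 36.78, 38.10, 42.26, 44.61, 42.91,
          43.47, 49.20, 49.48, 49.04, 49.41]

a, b = fit_line(years_since_2004, prices)

# Residual = observed y-value minus predicted y-value on the line
residuals = [y - (a * x + b) for x, y in zip(years_since_2004, prices)]

for x, e in zip(years_since_2004, residuals):
    print(f"x = {x:2d}  residual = {e:+.2f}")
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), so what matters when judging the fit is whether the plotted residuals show a pattern, not whether they balance around y = 0.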

Go Online You can complete an Extra Example online.

Go Online to see how to use a graphing calculator with this example.

[-1, 1] scl: 1 by [-2.52, 3.21] scl: 1


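Answer (Check): D

Answers (Example 3): the x-values are the years since 2004; the prices are the y-values; the best-fit line seems to model the data well.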

Sample answer: If the residuals are almost on y = 0, then for every x-value, the difference between the observed y-value and its predicted y-value is almost 0. Thus, all of the data are very close to the best-fit line.

Sample answer: Yes; the correlation coefficient is about 0.9438, which means that the equation models the data well.
