ESSENTIAL QUESTION: HOW CAN WE USE MATH TO PREDICT THE FUTURE?
7.1 Fitting Data to a Line
Fitting a Line to Data
This goes by three different names: Least Squares Regression, Linear Regression, or Best Fit Line.
It involves fitting a line to a scatter plot and then finding that line's slope and y-intercept. You can then plug in any x-value to get a corresponding y-value, potentially predicting future data values that have not happened yet.
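As a sketch of what "finding the slope and y-intercept" means numerically, the Python below uses the standard least squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function names `least_squares_line` and `predict` and the sample data are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Plug an x-value into the fitted line to get a predicted y-value."""
    return slope * x + intercept


# Hypothetical data that lies exactly on y = 2x + 2
xs = [0, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 10]
a, b = least_squares_line(xs, ys)
print(a, b)                # 2.0 2.0
print(predict(a, b, 10))   # 22.0
```

Real data will not sit exactly on a line, but the same two formulas still give the line that fits best in the least squares sense.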
Types of Correlation: Correlation describes how closely the data points cluster around a line (points close to the line = good correlation; points spread out = bad correlation).
Positive slope, points close together
Negative slope, points close together
[Figure: scatter plot showing Good Negative Correlation]
[Figure: scatter plot showing Good Positive Correlation]
No Correlation: you can't really tell whether it is positive or negative.
- You cannot really draw a line that would fit the data.
- The data has a very weak r-value (close to 0) and potentially a high standard deviation for the "y" (output) values.
Types of Correlation Continued
[Figure: scatter plot showing No Correlation]
How do we use the calculator to find the best fit line/linear regression line/least squares regression line?
Plug the data into your lists:
- Press STAT, then Edit, to go to the lists.
- Make sure it is referencing L1 and L2.
- Enter the data in each list (when entering yearly data, always refer to the starting year as year zero).
After all data is entered:
- Press STAT, arrow Right to CALC, then press 4 (LinReg(ax+b)) and Enter to find the LinReg line: a is the slope and b is the y-intercept.
You can plug in future x-values to find future data points.
What does least squares regression mean?
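Here is a small sketch of the idea behind the name: a residual is the vertical distance between a data point and the line, and the least squares line is the one that makes the sum of the squared residuals as small as possible. The data and the comparison line y = x + 1 are hypothetical.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (actual y - predicted y)^2 over every data point."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares slope and intercept from the usual formulas
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

best = sum_squared_residuals(xs, ys, slope, intercept)   # about 2.4
eyeballed = sum_squared_residuals(xs, ys, 1.0, 1.0)      # line y = x + 1: 4.0
print(f"least squares line: y = {slope:.1f}x + {intercept:.1f}, SSR = {best:.2f}")
print(f"eyeballed line:     y = x + 1, SSR = {eyeballed:.2f}")
```

Any other line you try, including small nudges to the slope or intercept, gives a larger total than the least squares line.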
Median-Median Line
Find the mean, median, and standard deviation for each data set below:
A) 1, 5, 7, 486
B) 1, 5, 7, 12
Which data measure is unaffected by the outlier?
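One way to check your answers is Python's `statistics` module. This sketch reports both the sample and the population standard deviation because the problem does not say which one is meant.

```python
from statistics import mean, median, pstdev, stdev

data_sets = {
    "A": [1, 5, 7, 486],   # contains the outlier 486
    "B": [1, 5, 7, 12],
}

summary = {}
for name, data in data_sets.items():
    summary[name] = {
        "mean": mean(data),
        "median": median(data),
        "sample_sd": stdev(data),       # like Sx on a TI calculator
        "population_sd": pstdev(data),  # like σx on a TI calculator
    }
    print(name, {k: round(v, 2) for k, v in summary[name].items()})
```

Both medians come out to 6, while the mean and standard deviation of data set A are pulled far upward by the outlier 486.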
This is a line of best fit that is resistant to outliers, similar to the way the median is resistant to outliers in the data.
Which of the lines below would be the Median-Median line of the data: red or green?
Correlation and Causation
An r-value above 0.7 indicates a good positive correlation.
An r-value below -0.7 indicates a good negative correlation.
A good correlation does not necessarily imply causation.
Examples:
- Hours of study correlated with test grades
- Lower likelihood of cancer due to taking a certain pharmaceutical
- When Michael Turner rushes more than 20 times, the Falcons are 15-1
- etc.
Correlation means there is a good mathematical relationship, so we can use it to predict future values. Causation means that x caused y (or vice versa), which is rarely something the data alone can show.
When given a scenario, you can almost always argue that there is no causation.
Sometimes you can argue that a cause-and-effect relationship exists between the independent and dependent variables; if you have a good reason, the answer could be accepted.
HIGH Q!
Types of Samples
Simple Random Sample (SRS): the best type of sample; each data point has an equal chance of being chosen.
Self-Selected Sample: those in the population who choose to volunteer data are in the sample.
Convenience Sample: those in the population who are easiest to reach are in the sample.
Systematic Sample: a rule is used to sample, such as choosing every fourth person or every other data point (this is probably the second best of these options).
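The two sampling rules that can be carried out mechanically are easy to sketch with Python's `random` module. The population of 100 IDs and the function names are hypothetical, and starting the systematic sample at a random position is a common convention we are assuming, not something the slide specifies.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 IDs


def simple_random_sample(pop, n, rng):
    """Every member has the same chance; rng.sample draws without replacement."""
    return rng.sample(pop, n)


def systematic_sample(pop, step, rng):
    """Pick a random start among the first `step` members, then every step-th one."""
    start = rng.randrange(step)
    return pop[start::step]


rng = random.Random(0)  # fixed seed so the run is repeatable
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 4, rng))  # "every fourth person"
```

Self-selected and convenience samples have no comparable rule to code, which is part of why they are more likely to be biased.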
Bias
How do we determine bias in a sample? If a sample is biased, certain parts of the population are underrepresented.
Examples:
- Only sampling college students
- Internet surveys
- Measuring the average height and weight of American males and declaring this the average height and weight of all humans