Post on 15-Jan-2016
Stat 217 – Day 25
Regression
Last Time - ANOVA
When? Comparing 2 or more means (one categorical and one quantitative variable)
Research question
Null hypothesis: μ1 = μ2 = … = μI (no association between the two variables)
Alternative hypothesis: at least one μ differs (there is an association between the two variables)
Example (with 3 groups…)
Not significant Significant
How?
Compare differences in means vs. the natural variability in the data (s)
Compare test statistic to F distribution, p-value
Output: test statistic, p-value, ratio of variability between groups to variability within groups
Demo
Strong evidence (p-value = .03 < .05) that the type of disability affected the ratings, on average, of these 70 students
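To make the "variability between groups vs. variability within groups" idea concrete, here is a minimal sketch of the F statistic computed by hand. The three groups of ratings are hypothetical (they are not the class data from the demo), and the function name `anova_f` is just for illustration.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total sample size
    # Variability between groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability within groups: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical ratings for three groups (NOT the data from the demo)
groups = [[4, 5, 6], [5, 6, 7], [7, 8, 9]]
f_stat = anova_f(groups)
print(round(f_stat, 2))
```

A large F means the group means differ by more than the natural variability within groups would explain; the p-value then comes from comparing F to the F distribution.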
Technical Conditions
Technical conditions:
Randomness: random sampling or random assignment
Sample sizes: Normal populations
Equal standard deviations: Check ratio of sample standard deviations
Kinda need same shape and spread for a comparison of just means to be reasonable
Technical Conditions
1) Randomness: Random assignment
2) Each population follows a normal distribution
3) Each population has the same standard deviation: 1.794/1.482 < 2
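The equal standard deviations check can be sketched in a couple of lines. This assumes 1.794 and 1.482 are the largest and smallest sample standard deviations, and that the cutoff is the "ratio less than 2" rule shown on the slide; the function name `sd_ratio_ok` is made up for illustration.

```python
def sd_ratio_ok(sample_sds, max_ratio=2.0):
    """Return (ratio, passes) where ratio = largest SD / smallest SD."""
    ratio = max(sample_sds) / min(sample_sds)
    return ratio, ratio < max_ratio

# Sample standard deviations quoted on the slide
ratio, ok = sd_ratio_ok([1.794, 1.482])
print(round(ratio, 2), ok)  # 1.21 True
```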
Summary: Comparing several groups
Categorical response
H0: π1 = π2 = … = πI (no association between variables)
Ha: at least one differs (is an association between variables)
Is test statistic large? Chi-square test
Expands 2 sample z-test
Quantitative response
H0: μ1 = μ2 = … = μI (no association between variables)
Ha: at least one differs (is an association between variables)
Is test statistic large? ANOVA
Expands 2 sample t-test
Exam 2 comments
Pet owners and CPR
(a) Make sure to interpret the calculated interval
“55% of pet owners” – sample or population?
(b) Technical conditions: use the ones for categorical data
(c) See whether .5 is inside CI
(d) Interpretation of p-value: chance of data at least this extreme if null hypothesis is true
(e) Why is sample size information important? Sampling variability
Exam 2 comments
Anchoring
(a) Make sure it is clear which is which
(b) “TC met”, TOS applet with 2 means
(c) Chicago average estimate is 51K to 1.6 million higher than Green Bay average (direction!)
(d) What does it mean to say it’s significant? What is the actual conclusion to the research question?
Exam 2 comments
Lab 6
Exam 2 comments
Multiple choice
1. B
2. C – either is possible
3. B – small p-value eliminates “random chance” as a plausible explanation
4. B – it’s only unusual if she’s guessing (7s and 11s are only unusual for fair dice)
Extra Credit: More likely to get a value far from the mean with a smaller sample size (e.g., n = 1)
Next Topic: Two quantitative variables
Graphical summary
Numerical summary
Model to allow predictions
Inference beyond sample data
Activity 26-1 (p. 532)
Have a sample of 20 homes for sale in Arroyo Grande in 2007
Variable 1 = house price
Variable 2 = house size
Is there a relationship between these 2 variables?
Does knowing the house size help us predict its price?
1) Graphical summary: scatterplot
Price vs. size
1. Direction: Positive or negative?
2. Strength: How closely do the points follow the pattern?
3. Form: Linear?
Describing Scatterplots
Activity 26-3 (p. 536)
Positive None Negative
Strong Weak Strong
Direction, Strength, Form: Linear or not
2) Numerical summary: Correlation coefficient (Act 27-1)
.994 .889 .510 -.081 -.450 -.721 -.907
Temperatures vs. Month
Direction: positive then negative
Form: nonlinear
Strength: very strong
r = .257
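The correlation coefficient measures only the strength of linear association, which is why a very strong but nonlinear pattern can still have a small r. A minimal sketch of that point follows; the month and temperature values are hypothetical (a perfectly symmetric rise and fall, not the data behind r = .257), and `correlation` is an illustrative helper.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical "positive then negative" pattern (NOT the slide's data)
months = [1, 2, 3, 4, 5, 6, 7]
temps = [40, 50, 60, 70, 60, 50, 40]
r_curved = correlation(months, temps)

# Hypothetical perfectly linear pattern for comparison
r_linear = correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

print(round(r_curved, 3), round(r_linear, 3))
```

The curved data follow their pattern exactly, yet r is essentially 0, so always look at the scatterplot before trusting r.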
Example 1: Price vs. Size
r = .780
What do you learn from these numerical and graphical summaries?
Turn in, with partner: Activity 26-6 parts b, c, and e
For Thursday: Pre-lab for Lab 9
For Monday: Activity 26-7, HW 7
2) Guess the correlation
Applet
3) Model
IF it is linear, what line best summarizes the relationship? Demo
Moral: The “least squares regression line” minimizes the sum of the squared residuals
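As a rough sketch of what "minimizes the sum of the squared residuals" means, the code below computes the least squares slope and intercept from the standard formulas and compares the sum of squared residuals against one other candidate line. The data points and the competing line are hypothetical, and the function names are illustrative only.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (observed y - predicted y)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (NOT the house data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
ssr_best = sum_squared_residuals(xs, ys, a, b)
ssr_other = sum_squared_residuals(xs, ys, 2.0, 0.7)  # some other candidate line

print(round(a, 2), round(b, 2))
print(ssr_best < ssr_other)
```

Any other line tried in place of the candidate would also give a sum of squared residuals at least as large as the least squares line's.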
Interpreting the equation (p. 577)
a = intercept, b = slope
Slope = predicted change in response associated with a one-unit increase in the explanatory variable
Intercept = predicted value of response when explanatory variable = 0
y-hat = a + bx, where x = explanatory variable and y = response variable
3) Model?
Price-hat = 265222 + 169 size
Slope = each additional square foot in house size is associated with a $169 increase in predicted price (price per foot)
Be a little careful here, don’t sound too “causal”; I really do like the “predicted” in here
Intercept = a house of size zero (empty lot?) is predicted to cost $265,222
Be a little careful here, don’t have any houses in data set with size near 0…
Using the model
Price-hat = 265222 + 169 size
Predicted price for a 1250 square foot house?
Predicted price for a 3000 square foot house?
Extrapolation: Very risky to use regression equation to predict values far outside the range of x values used to derive the line!
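Plugging a size into the fitted equation is simple arithmetic; a small sketch is below. The intercept and slope come from the slide's equation, while the function name `predicted_price` is made up for illustration. The code does not check whether a size falls inside the range of the 20 sampled homes, so the extrapolation warning still applies.

```python
def predicted_price(size_sqft, intercept=265222, slope=169):
    """Predicted price (dollars) from Price-hat = 265222 + 169 * size."""
    return intercept + slope * size_sqft

print(predicted_price(1250))  # 476472
print(predicted_price(3000))  # 772222
```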
4) Is this relationship statistically significant?
Is it possible there is no relationship between house price and size in the population of all homes for sale at that time, and we just happened to coincidentally obtain this relationship in our random sample?
Or is this relationship strong enough to convince us it didn’t happen just by chance but reflects a genuine relationship in the population?
p. 605
Let β represent the slope of the population regression line
H0: β = 0; no relationship between price and size in population
Ha: β ≠ 0; is a relationship (β < 0 negative; β > 0 positive)
Idea: Want to compare the observed sample slope to zero. Does it differ more than we would expect by chance?
Assume β = 0
How many standard deviations away?
[Figure: variation in sample slopes, assuming a population slope of 0, with spread given by the standard error SE(b); where does our slope, 169, fall?]
Minitab
The regression equation is
Price = 265222 + 169 Size (sq ft)

Predictor       Coef      SE Coef   T      P
Constant        265222    42642     6.22   0.000
Size (sq ft)    168.59    31.88     5.29   0.000
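Annotations on the output: regression equation (add hat); a = 265222 (Constant Coef); b = 168.59 (Size Coef); SE(b) = 31.88; P is a two-sided p-value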
t = (observed slope - hypothesized slope) / (standard error of slope) = (b – 0)/SE(b) = (168.59 - 0)/31.88 = 5.29
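The T column in the Minitab output can be reproduced directly from the Coef and SE Coef columns. A minimal sketch follows, using the house-price values from the output; the function name `t_statistic` is illustrative. The p-value itself would need a t distribution, which is not computed here.

```python
def t_statistic(coef, se_coef, hypothesized=0.0):
    """Number of standard errors the observed coefficient is from the hypothesized value."""
    return (coef - hypothesized) / se_coef

t_slope = t_statistic(168.59, 31.88)      # Size (sq ft) row
t_intercept = t_statistic(265222, 42642)  # Constant row

print(round(t_slope, 2))      # 5.29
print(round(t_intercept, 2))  # 6.22
```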
Turn in, with partner: Price vs. pages: Interpret slope/evaluate p-value
For Tuesday: Activities 26-7, 28-5; be working on Lab 9 and HW 7
The regression equation is
Price = - 3.4 + 0.147 Pages

Predictor   Coef      SE Coef   T       P
Constant    -3.42     10.46     -0.33   0.746
Pages       0.14733   0.01925   7.65    0.000
Describing Scatterplots
Activity 26-6 (p. 539)
Positive, nonlinear, fairly strong
Causation?
Strength: How closely do the points follow the pattern?
Direction, Strength, Form: Linear or not
For Monday: Activities 26-7, 28-5; be working on Lab 9 and HW 7