Regression and Models with Multiple Factors
Ch. 17, 18
Scatter Plot

[Scatter plot of Mass (15–25) against Snout-Vent Length (70–80)]
Linear Regression

[Scatter plot of Mass (15–25) against Snout-Vent Length (70–80) with a fitted least-squares line]
Least-squares
• The method of least squares chooses the line that minimizes the sum of squared deviations along the Y-axis
• Y = bX + a
• The slope b is the average change in Y per unit change in X
• The slope is usually of greater interest than the intercept
• The slope is the covariance between X and Y divided by the variance of X – a positive slope implies a positive covariance, and a negative slope implies a negative covariance
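The covariance-over-variance relationship above can be checked numerically. Here is a minimal Python sketch with made-up length/mass numbers (not the course's newt data), computing the slope as cov(X, Y) / var(X) and the intercept from the means:

```python
# Hypothetical data, invented for illustration only
x = [70.0, 72.0, 75.0, 78.0, 80.0]   # snout-vent lengths
y = [15.0, 17.0, 19.0, 22.0, 24.0]   # masses

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Sample covariance of X and Y, and sample variance of X
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)

b = cov_xy / var_x   # least-squares slope
a = my - b * mx      # intercept: the line passes through (mean X, mean Y)
print(round(b, 3), round(a, 3))
```

Note that the sign of the slope is forced to match the sign of the covariance, since the variance of X is always positive.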
Linear Regression Uses
• Depict the relationship between two variables in an eye-catching fashion
• Test the null hypothesis of no association between the two variables – the test is whether the slope differs from zero
• Predict the average value of Y for a group of individuals with a given value of X – note that variation around the line can make it very difficult to predict the value for any single individual with much confidence
What are Residuals?
What are Residuals?
• Chimps have larger testes than predicted for their body weight – a positive residual
• Gorillas have smaller testes than predicted for their body weight – a negative residual
What are Residuals?
• In general, the residual is the individual’s departure from the value predicted by the model
• In this case the model is simple – the linear regression – but residuals also exist for more complex models
• For a model that fits better, the residuals will be smaller on average
• Residuals can be of interest in their own right, because they represent values that have been corrected for relationships that might be obscuring a pattern (e.g., the body weight-testes mass relationship)
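A residual is simply observed minus predicted, so the idea fits in a few lines. Everything below is hypothetical – the fitted line and the (body weight, testes mass) pairs are invented just to mirror the chimp/gorilla contrast:

```python
# A residual is the observed value minus the value predicted by the model.
def predicted(x, slope=0.5, intercept=0.0):
    """Hypothetical fitted line relating body weight to testes mass."""
    return slope * x + intercept

# Made-up (body weight, testes mass) observations
observations = {"chimp": (50.0, 30.0), "gorilla": (160.0, 70.0)}

residuals = {name: y - predicted(x) for name, (x, y) in observations.items()}
print(residuals)  # chimp sits above the line (+), gorilla below it (-)
```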
Strong Inference for Observational Studies
• Noticing a pattern in the data and reporting it represents a post hoc analysis
– This is not hypothesis testing
– The results, while potentially important, must be interpreted cautiously
• What can be done?
– Based on the post hoc observational study, construct a new hypothesis for a novel group or system that has not yet been studied
– For example, given the primate data, a reasonable prediction is that residual testes mass in deer will be associated with mating system
– Collect a new observational data set from the new group to test the hypothesis
Phylogenetic Pseudoreplication
http://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(10)00146-1
Performing Linear Regression in R
• Run the regression:

```r
reg_newt <- lm(mass ~ svl, data = males)
summary(reg_newt)
confint(reg_newt)  # get the CI for the slope
```

• Add the line to a plot:

```r
plot(mass ~ svl, data = males)
abline(reg_newt)
```

• See http://whitlockschluter.zoology.ubc.ca/r-code/rcode17 for code to draw the line's 95% CI on the graph
Interpreting the Output

```
Call:
lm(formula = mass ~ svl, data = males)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5081 -1.7670 -0.5128  1.6322  6.4942

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -31.48430    7.43453  -4.235 0.000108 ***
svl           0.68766    0.09806   7.013 8.71e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.514 on 46 degrees of freedom
Multiple R-squared:  0.5167, Adjusted R-squared:  0.5062
F-statistic: 49.18 on 1 and 46 DF,  p-value: 8.712e-09
```

• Slope: the `svl` estimate (0.68766)
• p-value for the slope: 8.71e-09
Least Squares Regression with 95% CI
[Scatter plot of Mass (15–25) against Snout-Vent Length (70–80) with the fitted regression line and dashed 95% confidence bands]
Code for That

```r
males <- read.csv("MaleNewtSampleData.csv")
plot(males$mass ~ males$svl, pch = 19,
     xlab = list("Snout-Vent Length", cex = 1.5),
     ylab = list("Mass", cex = 1.5))
box(lwd = 3)
axis(1, lwd = 3)
axis(2, lwd = 3)
reg_newt <- lm(mass ~ svl, data = males)
abline(reg_newt, lwd = 3, col = "maroon")
summary(reg_newt)

# Add confidence limits for the regression line
xpt <- seq(min(males$svl), max(males$svl), length.out = 100)
ypt <- data.frame(predict(reg_newt, newdata = data.frame(svl = xpt),
                          interval = "confidence"))
lines(ypt$lwr ~ xpt, lwd = 2, lty = 2)
lines(ypt$upr ~ xpt, lwd = 2, lty = 2)
```
Assumptions of Linear Regression
• The true relationship must be linear
• At each value of X, the distribution of Y is normal (i.e., the residuals are normal)
• The variance in Y is independent of the value of X
• Note that there are no assumptions about the distribution of X
Common Problems
• Outliers
– Regression is extremely sensitive to outliers
– The line will be drawn toward outliers, especially those extreme along the x-axis
– Consider performing the regression with and without the outliers
• Non-linearity
– Best detected by visually inspecting the plot and the fitted line
– Try a transformation to obtain linearity (often a log transformation)
• Non-normality of residuals
– Can be detected from a residual plot
– Possibly solved with a transformation
• Unequal variance
– Usually visible from a scatterplot or a residual plot
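The log-transformation fix for non-linearity can be sketched with toy data. Here the data follow a power law, y = 2·x^1.5 (invented for illustration), which is curved on the raw scale but exactly linear after logging both axes, with the slope recovering the exponent:

```python
import math

# Toy power-law data: y = 2 * x^1.5 (non-linear on the raw scale)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Log-transform both variables: log y = log 2 + 1.5 * log x, which is linear
lx = [math.log(xi) for xi in x]
ly = [math.log(yi) for yi in y]

# Least-squares slope on the log-log scale recovers the exponent 1.5
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
         / sum((a - mx) ** 2 for a in lx))
print(round(slope, 6))  # 1.5
```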
Residual Plot
[Residual plot: residuals(reg_newt) against svl (70–80), residuals ranging from about -4 to 6]

```r
plot(residuals(reg_newt) ~ svl, data = males, pch = 19)
```
Examples of Problems
Outlier Example
Multiple Explanatory Variables
• The reason ANOVA is so widely used is that it provides a framework to simultaneously test the effects of multiple factors
• ANOVA also makes it possible to detect interactions among the factors
• ANOVA is a special case of a general linear model
General Linear Models
• GLMs handle categorical factors and continuous factors in a single modeling framework
• ANOVA is a special case with just categorical explanatory variables
• Linear regression is a special case with just continuous explanatory variables
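The "special case" idea can be made concrete: both regression and ANOVA solve y = Xβ by least squares, and only the design matrix X differs. A small Python/NumPy sketch with invented data (not from the course):

```python
import numpy as np

# Same response vector analyzed two ways
y = np.array([3.0, 4.0, 5.0, 7.0, 8.0, 9.0])

# Regression design matrix: intercept column plus a continuous predictor
x_cont = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X_reg = np.column_stack([np.ones(6), x_cont])

# One-way ANOVA design matrix: intercept column plus a 0/1 dummy for group B
group_b = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X_anova = np.column_stack([np.ones(6), group_b])

beta_reg, *_ = np.linalg.lstsq(X_reg, y, rcond=None)
beta_anova, *_ = np.linalg.lstsq(X_anova, y, rcond=None)

print(beta_reg)    # [intercept, slope]
print(beta_anova)  # [mean of group A, difference of the group means]
```

The same solver handles both: the ANOVA coefficients are the group-A mean (4) and the group-mean difference (4), while the regression coefficients are an intercept and a slope.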
Two-factor ANOVA
Testing:
• The mean of "serum starved" versus "normal culture"
• The mean of "wild-type" versus "WTC911"
• The interaction (i.e., whether wild-type responds differently than WTC911 to the change in culturing conditions [serum starved versus normal culture])
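What the interaction term measures can be sketched directly: with two levels per factor, it is the "difference of differences" between cell means. The four cell means below are invented, borrowing the slide's labels only for illustration:

```python
# Hypothetical cell means for a 2x2 design (genotype x culture condition)
cell_means = {
    ("wild-type", "normal"):  10.0,
    ("wild-type", "starved"):  4.0,   # wild-type drops by 6 when starved
    ("WTC911",    "normal"):   9.0,
    ("WTC911",    "starved"):  8.0,   # WTC911 drops by only 1
}

# Effect of starvation within each genotype
wt_effect  = cell_means[("wild-type", "starved")] - cell_means[("wild-type", "normal")]
mut_effect = cell_means[("WTC911", "starved")]    - cell_means[("WTC911", "normal")]

# Interaction: difference of the differences (zero means parallel responses)
interaction = wt_effect - mut_effect
print(interaction)  # non-zero: the genotypes respond differently to starvation
```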
ANOVA with Blocking
• Analysis is similar to two-factor ANOVA (at least it was in 1995)
• Except:
– Typically do not test for an interaction term
– Don't care whether the block is significant
– Remember: you don't care about the block; you simply want to reduce the variance it introduces
Analysis of Covariance
• Used to test for a difference in means while correcting for a variable that is correlated with the response variable
– The slopes must not differ between the two groups
– In other words, the mean comparison is valid only if the interaction term is not significant
• Also used to compare the slopes of two regression lines
– If the interaction term is significant, the conclusion is that the slopes are different
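The slope-comparison use can be sketched as a single least-squares fit with a group-by-covariate interaction column, whose coefficient is the difference in slopes. The data below are fabricated so that group B's slope is exactly 1 larger than group A's:

```python
import numpy as np

# Fabricated data: group A follows y = 2 + 1*x, group B follows y = 1 + 2*x
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
g = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # 0 = A, 1 = B
y = np.where(g == 0, 2.0 + 1.0 * x, 1.0 + 2.0 * x)

# Design matrix: intercept, covariate, group dummy, interaction (g * x)
X = np.column_stack([np.ones_like(x), x, g, g * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

slope_diff = beta[3]  # interaction coefficient = difference in slopes
print(round(slope_diff, 6))  # 1.0
```

In a real analysis this coefficient would come with a standard error and a test of whether it differs from zero; here the point is only that the interaction column carries the slope difference.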
ANCOVA Example
Summary
• Statistical models can be quite complex, with potentially many factors and interaction terms
• The model is specified by something that looks like an equation: Y = μ + A + B + A*B
• General linear models allow you to combine categorical and continuous factors into a single model
• Your sample size will limit the complexity of the model
• For complex models, you will need to think about procedures to choose the best model (i.e., the model that best explains your observations)