More on the linear model
More on the linear model
Categorical predictors
[Plot: RT for men vs. women]

RT ~ Noise + Gender
resp ~ Condition
Demo

set.seed(666)
pred = c(rep(0, 20), rep(1, 20))
resp = c(rnorm(20, mean = 2, sd = 1),
         rnorm(20, mean = 2, sd = 1))
for(i in 1:10){
  resp = c(resp[1:20], resp[21:40] + 1)
  plot(resp ~ pred,
       xlim = c(-1, 2), ylim = c(0, 14), xaxt = "n", xlab = "")
  axis(side = 1, at = c(0, 1), labels = c("A", "B"))
  text(paste("mean B\nequals:", i, sep = "\n"), x = -0.5, y = 10, cex = 1.5, font = 2)
  abline(lm(resp ~ pred))
  Sys.sleep(1.25)
}
Deep idea:
A categorical difference between two groups can be
expressed as a line going from one group to the other
Continuous predictor
… units up
1 unit “to the right”
Categorical predictor
… units up
1 category “to the right”
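This "one category to the right" idea can be checked directly in R: with a two-level factor (dummy-coded 0/1 under the hood), the fitted slope is exactly the difference between the two group means. A minimal sketch with simulated data (all names and numbers are invented for illustration):

```r
# A categorical difference expressed as a line: with 0/1 dummy coding,
# the intercept is the mean of group A and the slope is mean(B) - mean(A)
set.seed(1)
group = factor(rep(c("A", "B"), each = 20))
y = c(rnorm(20, mean = 2), rnorm(20, mean = 5))
coef(lm(y ~ group))                            # (Intercept) and groupB
mean(y[group == "B"]) - mean(y[group == "A"])  # same value as the groupB slope
```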
[Plot: RT for F vs. M]
Output: categorical predictor

> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -39.649    2.999   44.806  155.646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  349.203      4.334   80.57   <2e-16 ***
genderM      205.885      6.129   33.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 61.29 on 398 degrees of freedom
Multiple R-squared:  0.7392,  Adjusted R-squared:  0.7386
F-statistic:  1128 on 1 and 398 DF,  p-value: < 2.2e-16
REFERENCE LEVEL: the intercept is the mean of the reference level (here, F)
But what happens…
… when I have more than two groups or categories?
> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -41.055    3.404   38.428  155.646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  349.203      4.228   82.59   <2e-16 ***
genderM      205.885      5.979   34.43   <2e-16 ***
genderI      203.983      5.979   34.11   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 59.79 on 597 degrees of freedom
Multiple R-squared:  0.724,  Adjusted R-squared:  0.7231
F-statistic: 783.1 on 2 and 597 DF,  p-value: < 2.2e-16
Output: three groups

Females = 349.203 (intercept)
Males   = 349.203 + 205.885
Infants = 349.203 + 203.983
REFERENCE LEVEL
Output: changing reference level

> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -41.055    3.404   38.428  155.646

Coefficients:
            Estimate Std. Error  t value Pr(>|t|)
(Intercept)  553.185      4.228  130.835   <2e-16 ***
genderF     -203.983      5.979  -34.114   <2e-16 ***
genderM        1.903      5.979    0.318     0.75
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 59.79 on 597 degrees of freedom
Multiple R-squared:  0.724,  Adjusted R-squared:  0.7231
F-statistic: 783.1 on 2 and 597 DF,  p-value: < 2.2e-16
Infants = 553.185 (intercept)
Females = 553.185 – 203.983
Males   = 553.185 + 1.903
Notice that nothing has really changed… it’s just a different perspective on the same data
REFERENCE LEVEL
In case you need it:

Releveling in R (relevel works on factors and returns a new factor, so assign the result):

myfactor = relevel(myfactor, ref = "mynew_reference_level")
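A quick sketch of what releveling does (the factor and its levels are invented for illustration): by default R takes the alphabetically first level as the reference, and relevel moves a chosen level to the front.

```r
# Default reference level is alphabetical: "F"
gender = factor(c("F", "M", "I", "F", "M", "I"))
levels(gender)                       # "F" "I" "M"
gender = relevel(gender, ref = "I")
levels(gender)                       # "I" "F" "M": "I" is now the reference
```

Refitting lm(RT ~ gender) after releveling gives the "different perspective on the same data" shown above: same fitted means, different intercept and contrasts.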
More on the linear model
Centering and standardization
Output: weird intercept

> summary(lm(familiarity ~ word_frequency))

Call:
lm(formula = familiarity ~ word_frequency)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2.790e+00  6.232e-01  -4.477 9.37e-06 ***
word_frequency  1.487e-04  1.101e-05  13.513  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared:  0.2683,  Adjusted R-squared:  0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16
Uncentered

> summary(lm(familiarity ~ word_frequency))

Call:
lm(formula = familiarity ~ word_frequency)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2.790e+00  6.232e-01  -4.477 9.37e-06 ***
word_frequency  1.487e-04  1.101e-05   13.51   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared:  0.2683,  Adjusted R-squared:  0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16
Centered

> summary(lm(familiarity ~ word_frequency.c))

Call:
lm(formula = familiarity ~ word_frequency.c)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.568e+00  7.598e-02   73.28   <2e-16 ***
word_frequency.c 1.487e-04  1.101e-05   13.51   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared:  0.2683,  Adjusted R-squared:  0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16
Centered, not scaled
Centered and scaled: the predictor is now in standard deviations
Centering vs. Standardization
• Centering = subtracting the mean of the data from the data
mydata = mydata - mean(mydata)
• Standardization = subtracting the mean of the data from the data and then dividing by the standard deviation
mydata = (mydata - mean(mydata))/sd(mydata)
Standardization can also be done in one step with scale(), which centers and then divides by the standard deviation (note: it returns a one-column matrix):

mydata = scale(mydata)
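A minimal sketch with toy numbers showing that the two-step arithmetic and scale() agree (as.vector() is needed only because scale() returns a matrix):

```r
x = c(2, 4, 6, 8)
x.c = x - mean(x)                    # centered: mean is now 0
x.z = (x - mean(x)) / sd(x)          # standardized: mean 0, sd 1
mean(x.c)                            # 0
sd(x.z)                              # 1
all.equal(as.vector(scale(x)), x.z)  # TRUE
```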
Centering vs. Standardization

• Centering often leads to more interpretable coefficients; it doesn’t change the metric

mydata = mydata - mean(mydata)

• Standardization gets rid of the metric: the data are then in standard units (standard deviations)

mydata = (mydata - mean(mydata))/sd(mydata)
Standardization is also often called z-scoring, and sometimes normalization (but you should not call it that).
“Standardization” is a linear transformation

… which means it doesn’t really do anything to your results

Linear Transformations:
• Seconds → Milliseconds
• Word Frequency → Word Frequency / 1000
• Centering, Standardization

None of these change the “significance”, only the metric of the coefficients
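This can be verified with simulated data (a sketch, not from the slides): rescaling the predictor by 1000 rescales the coefficient by 1000, but leaves the t value, and hence the p-value, untouched.

```r
# Linear transformations change the metric, not the significance
set.seed(42)
x = rnorm(100)
y = 2 * x + rnorm(100)
m1 = summary(lm(y ~ x))
m2 = summary(lm(y ~ I(x * 1000)))          # same model, predictor rescaled
m1$coefficients["x", "t value"]            # identical to the line below
m2$coefficients["I(x * 1000)", "t value"]  # ... only the estimate differs
```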
More on the linear model
Interactions
Winter & Bergen (2012)

“Usually (but not always) the interaction, if it is present, will be the most interesting thing going on.”
(Jack Vevea, UC Merced)
Main Effects vs. Interaction Effects

[Plots: RT (ms) for large vs. small pictures under Near-Sentence vs. Far-Sentence conditions. Panels: “One main effect”, “Two main effects”, “Interaction #1”, “Interaction #2”, “Interaction #3”, “Interaction #4”]
Visualizing interactions with continuous variables
Interpretation of Main Effects

If the interaction is significant, the interpretation of main effects is not straightforward
“The first three rules of statistics”
Michael Starbird
1. Draw a picture!
2. Draw a picture!
3. Draw a picture!
In R: How to include an interaction

Main effects only:
lm(RT ~ PrimeType + VowelType)

Main effects and interaction:
lm(RT ~ PrimeType*VowelType)
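In R’s formula language, a*b is shorthand for a + b + a:b, so the starred formula fits both main effects and their interaction. A sketch with simulated factors (factor names borrowed from the slide, levels and data invented):

```r
# "*" expands to main effects plus the ":" interaction term
set.seed(7)
d = expand.grid(PrimeType = c("identity", "unrelated"),
                VowelType = c("front", "back"))
d = d[rep(1:4, each = 25), ]         # 100 observations, fully crossed
d$RT = 500 + rnorm(100, sd = 50)
m1 = lm(RT ~ PrimeType + VowelType + PrimeType:VowelType, data = d)
m2 = lm(RT ~ PrimeType * VowelType, data = d)
all.equal(coef(m1), coef(m2))        # TRUE: the two formulas fit the same model
```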
That’s it (for now)