More on the linear model

Page 1: More on the linear model

More on the linear model

Categorical predictors

Page 2: More on the linear model

RT ~ Noise + Gender

[Figure: RT by Noise, with separate lines for men and women]

Page 3: More on the linear model

resp ~ Condition

Page 4: More on the linear model
Page 5: More on the linear model
Page 6: More on the linear model

Demo

set.seed(666)

pred = c(rep(0,20), rep(1,20))
resp = c(rnorm(20, mean=2, sd=1),
         rnorm(20, mean=2, sd=1))

for(i in 1:10){
  # shift group B up by 1 each iteration, so the group difference grows
  resp = c(resp[1:20], resp[21:40] + 1)
  plot(resp ~ pred,
       xlim=c(-1,2), ylim=c(0,14), xaxt="n", xlab="")
  axis(side=1, at=c(0,1), labels=c("A","B"))
  text(paste("mean B\nequals:", i, sep="\n"), x=-0.5, y=10, cex=1.5, font=2)
  # the fitted line connects the two group means:
  # its slope is the difference between mean(B) and mean(A)
  abline(lm(resp ~ pred))
  Sys.sleep(1.25)
}

Page 7: More on the linear model

Deep idea:

A categorical difference between two groups can be expressed as a line going from one group to another
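
To see this concretely, here is a minimal sketch (hypothetical toy data) of the 0/1 dummy coding R builds behind the scenes for a two-level factor:

# Toy example of treatment (dummy) coding for a two-level factor
gender = factor(c("F", "F", "M", "M"))
model.matrix(~ gender)
# The 'genderM' column is 0 for females and 1 for males, so the fitted
# slope for genderM is literally the line from one group mean to the other.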

Page 8: More on the linear model

Continuous predictor

… units up

1 unit “to the right”

Page 9: More on the linear model

Continuous predictor

… units up

1 unit “to the right”

Page 10: More on the linear model

Categorical predictor

… units up

1 category “to the right” (from F to M)

Page 11: More on the linear model

Output: categorical predictor

> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -39.649    2.999   44.806  155.646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  349.203      4.334   80.57   <2e-16 ***
genderM      205.885      6.129   33.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 61.29 on 398 degrees of freedom
Multiple R-squared: 0.7392,  Adjusted R-squared: 0.7386
F-statistic: 1128 on 1 and 398 DF,  p-value: < 2.2e-16
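
A quick sketch of reading these two coefficients off as group means (assuming the model above is stored as m):

m = lm(RT ~ gender)
predict(m, newdata = data.frame(gender = c("F", "M")))
# Female mean = 349.203 (the intercept); male mean = 349.203 + 205.885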

Page 12: More on the linear model

(Same model output as Page 11.)

Page 13: More on the linear model

REFERENCE LEVEL

Page 14: More on the linear model

But what happens…

… when I have more than two groups or categories?

Page 15: More on the linear model
Page 16: More on the linear model

Output: three groups

> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -41.055    3.404   38.428  155.646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  349.203      4.228   82.59   <2e-16 ***
genderM      205.885      5.979   34.43   <2e-16 ***
genderI      203.983      5.979   34.11   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 59.79 on 597 degrees of freedom
Multiple R-squared: 0.724,  Adjusted R-squared: 0.7231
F-statistic: 783.1 on 2 and 597 DF,  p-value: < 2.2e-16

Females = 349.203 (intercept)
Males   = 349.203 + 205.885
Infants = 349.203 + 203.983
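
The same arithmetic in R, assuming the three-group model above is stored as m (a sketch):

m = lm(RT ~ gender)
coef(m)["(Intercept)"]                       # Females: 349.203
coef(m)["(Intercept)"] + coef(m)["genderM"]  # Males:   349.203 + 205.885
coef(m)["(Intercept)"] + coef(m)["genderI"]  # Infants: 349.203 + 203.983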

Page 17: More on the linear model

REFERENCE LEVEL

Page 18: More on the linear model

Output: changing reference level

> summary(lm(RT ~ gender))

Call:
lm(formula = RT ~ gender)

Residuals:
     Min       1Q   Median       3Q      Max
-231.039  -41.055    3.404   38.428  155.646

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  553.185      4.228 130.835   <2e-16 ***
genderF     -203.983      5.979 -34.114   <2e-16 ***
genderM        1.903      5.979   0.318     0.75
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 59.79 on 597 degrees of freedom
Multiple R-squared: 0.724,  Adjusted R-squared: 0.7231
F-statistic: 783.1 on 2 and 597 DF,  p-value: < 2.2e-16

Infants = 553.185 (intercept)
Females = 553.185 – 203.983
Males   = 553.185 + 1.903

Notice that nothing has really changed… it’s just a different perspective on the same data

Page 19: More on the linear model

REFERENCE LEVEL

Page 20: More on the linear model

In case you need it:

Releveling in R

relevel(myvector, ref="mynew_reference_level")
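
For example, with the gender factor from the previous pages, a sketch of making the infants the reference level (using the variable names from these slides):

gender = relevel(gender, ref = "I")   # infants become the reference level
summary(lm(RT ~ gender))              # the intercept is now the infant mean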

Page 21: More on the linear model

More on the linear model

Centering and standardization

Page 22: More on the linear model
Page 23: More on the linear model

Output: weird intercept

> summary(lm(familiarity ~ word_frequency))

Call:
lm(formula = familiarity ~ word_frequency)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2.790e+00  6.232e-01  -4.477 9.37e-06 ***
word_frequency  1.487e-04  1.101e-05  13.513  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared: 0.2683,  Adjusted R-squared: 0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16

(The intercept is "weird" because it is the predicted familiarity for a word with a frequency of 0; centering the predictor, shown on the following pages, gives a more interpretable intercept.)

Page 24: More on the linear model
Page 25: More on the linear model
Page 26: More on the linear model
Page 27: More on the linear model

is now centered

Page 28: More on the linear model
Page 29: More on the linear model
Page 30: More on the linear model

Uncentered

> summary(lm(familiarity ~ word_frequency))

Call:
lm(formula = familiarity ~ word_frequency)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2.790e+00  6.232e-01  -4.477 9.37e-06 ***
word_frequency  1.487e-04  1.101e-05   13.51   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared: 0.2683,  Adjusted R-squared: 0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16

Page 31: More on the linear model

Centered

> summary(lm(familiarity ~ word_frequency.c))

Call:
lm(formula = familiarity ~ word_frequency.c)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5298 -1.2306 -0.0087  1.1141  4.6988

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.568e+00  7.598e-02   73.28   <2e-16 ***
word_frequency.c 1.487e-04  1.101e-05   13.51   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.699 on 498 degrees of freedom
Multiple R-squared: 0.2683,  Adjusted R-squared: 0.2668
F-statistic: 182.6 on 1 and 498 DF,  p-value: < 2.2e-16

Page 32: More on the linear model

Centered, not scaled

Page 33: More on the linear model

is now in standard deviations

Centered and scaled

Page 34: More on the linear model

Centering vs. Standardization

• Centering = subtracting the mean of the data from the data

mydata = mydata - mean(mydata)

• Standardization = subtracting the mean of the data from the data and then dividing by the standard deviation

mydata = (mydata - mean(mydata))/sd(mydata)

Page 35: More on the linear model

Centering vs. Standardization

• Centering = subtracting the mean of the data from the data

mydata = mydata - mean(mydata)

• Standardization = subtracting the mean of the data from the data and then dividing by the standard deviation

mydata = scale(mydata)

Page 36: More on the linear model

Centering vs. Standardization

• Centering = often leads to more interpretable coefficients; doesn’t change metric

mydata = mydata - mean(mydata)

• Standardization = gets rid of the metric (everything is then expressed in standard units)

mydata = (mydata - mean(mydata))/sd(mydata)

Standardization is also often called z-scoring, and sometimes normalization (but you shouldn't call it that).
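
A short sketch putting both transformations into R code, using the (hypothetical) variable names from the earlier familiarity example:

word_frequency.c = word_frequency - mean(word_frequency)   # centered: same metric, mean = 0
word_frequency.z = scale(word_frequency)                    # standardized: in standard deviations
# scale() returns a one-column matrix; as.numeric() turns it back into a plain vector
word_frequency.z = as.numeric(word_frequency.z)
summary(lm(familiarity ~ word_frequency.z))   # slope = familiarity change per SD of frequency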

Page 37: More on the linear model

“Standardization” is a linear transformation

… which means it doesn't really do anything to your results

Page 38: More on the linear model

Linear Transformations

• Seconds → Milliseconds
• Word Frequency → Word Frequency rescaled by 1000
• Centering, Standardization

None of these change the “significance”, only the metric of the coefficients.
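
A sketch of the seconds-to-milliseconds case (RT_seconds is a hypothetical variable name): the estimates and standard errors are multiplied by 1000, but the t values and p-values stay the same.

RT_ms = RT_seconds * 1000                     # linear transformation of the response
summary(lm(RT_seconds ~ gender))$coefficients
summary(lm(RT_ms ~ gender))$coefficients      # estimates and SEs x1000, same t and Pr(>|t|)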

Page 39: More on the linear model

More on the linear model

Interactions

Page 40: More on the linear model

Winter & Bergen (2012)

Page 41: More on the linear model

"Usually (but not always) the interaction, if it is present, will be the most interesting

thing going on."Jack Vevea,UC Merced

Page 42: More on the linear model

Main Effects

Interaction Effects

Page 43: More on the linear model

One main effect

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 44: More on the linear model

Two main effects

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 45: More on the linear model

Interaction #1

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 46: More on the linear model

Interaction #2

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 47: More on the linear model

Interaction #3

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 48: More on the linear model

Interaction #4

[Figure: RT (ms) for NearSent vs. FarSent, with separate lines for large and small pictures]

Page 49: More on the linear model

Visualizing interactions with continuous variables

Page 50: More on the linear model

Visualizing interactions with continuous variables

Page 51: More on the linear model

Visualizing interactions with continuous variables

Page 52: More on the linear model

Visualizing interactions with continuous variables

Page 53: More on the linear model

Visualizing interactions with continuous variables

Page 54: More on the linear model

Interpretation of Main Effects

If the interaction is significant, the interpretation of the main effects is not straightforward: with the default treatment coding, each "main effect" is estimated at the reference level of the other predictor, not averaged over its levels.

Page 55: More on the linear model
Page 56: More on the linear model

“The first three rules of statistics”

Michael Starbird

1. Draw a picture!

2. Draw a picture!

3. Draw a picture!

Page 57: More on the linear model

In R: How to include an interaction

Main effects only:

lm(RT ~ PrimeType + VowelType)

Main effects and interaction:

lm(RT ~ PrimeType*VowelType)
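
The * notation is shorthand: it expands to both main effects plus their interaction, so the two calls below fit the same model (a sketch using the slide's variable names):

lm(RT ~ PrimeType*VowelType)
lm(RT ~ PrimeType + VowelType + PrimeType:VowelType)   # equivalent, written out in full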

Page 58: More on the linear model
Page 59: More on the linear model
Page 60: More on the linear model
Page 61: More on the linear model
Page 62: More on the linear model

That's it (for now)