Stats 760: Lecture 2
© Department of Statistics 2013


Slide 2: Agenda

• R formulation
• Matrix formulation
• Least squares fit
• Numerical details – QR decomposition
• R parameterisations
  – Treatment
  – Sum
  – Helmert

Slide 3: R formulation

• Regression model: y ~ x1 + x2 + x3
• Anova model: y ~ A + B (A, B factors)
• Model with both factors and continuous variables: y ~ A*B*x1 + A*B*x2

What do these mean? How do we interpret the output?

Slide 4: Regression model

Mean of observation = b0 + b1x1 + b2x2 + b3x3

Estimate the b's by least squares, i.e. minimize

$$\sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - b_3 x_{i3} \right)^2$$

Slide 5: Matrix formulation

Arrange the data into a vector and a matrix:

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$$

Then minimise $(y - Xb)^T (y - Xb)$.

Slide 6: Normal equations

• The minimising b's satisfy

$$X^T X \hat\beta = X^T y$$

Proof:

$$(y - Xb)^T (y - Xb) = (y - X\hat\beta)^T (y - X\hat\beta) + (\hat\beta - b)^T X^T X (\hat\beta - b)$$

The second term is non-negative, and zero when $b = \hat\beta$.

Slide 7: Solving the equations

• We could calculate the matrix X^T X directly, but this is not very accurate (subject to roundoff errors). For example, when trying to fit polynomials, this method breaks down for polynomials of high degree

• Better to use the "QR decomposition", which avoids calculating X^T X

Slide 8: Solving the normal equations

• Use the "QR decomposition" X = QR
• X is n x p and must have "full rank" (no column is a linear combination of the other columns)
• Q is n x p "orthogonal" (i.e. Q^T Q = identity matrix)
• R is p x p "upper triangular" (all elements below the diagonal zero), all diagonal elements positive, so its inverse exists

Slide 9: Solving using QR

X^T X = R^T Q^T Q R = R^T R
X^T y = R^T Q^T y

Normal equations reduce to

R^T R b = R^T Q^T y

Premultiply by the inverse of R^T to get

R b = Q^T y

Triangular system, easy to solve
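These steps can be sketched directly in R (with made-up data for illustration; lm performs the same computation internally via the qr routines):

```r
# Least squares via QR, on made-up data (illustration only)
set.seed(1)
n <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2*x1 - x2 + rnorm(n)

X <- cbind(1, x1, x2)            # design matrix with a column of 1's
dec <- qr(X)                     # QR decomposition of X
Q <- qr.Q(dec); R <- qr.R(dec)
b <- backsolve(R, t(Q) %*% y)    # solve Rb = Q'y by back-substitution

# agrees with lm's coefficients
all.equal(as.vector(b), as.vector(coef(lm(y ~ x1 + x2))))
```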

Slide 10: Solving a triangular system

$$\begin{pmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$

Solve from the bottom row up (back-substitution):

$$b_3 = c_3 / r_{33}$$
$$b_2 = (c_2 - r_{23} b_3) / r_{22}$$
$$b_1 = (c_1 - r_{12} b_2 - r_{13} b_3) / r_{11}$$
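R exposes exactly this back-substitution as backsolve; a small sketch with a made-up 3 x 3 system:

```r
# Solve Rb = c for an upper-triangular R by back-substitution
R <- rbind(c(2, 1, 1),
           c(0, 3, 2),
           c(0, 0, 4))
cc <- c(7, 13, 8)
b <- backsolve(R, cc)
# b3 = 8/4 = 2, b2 = (13 - 2*2)/3 = 3, b1 = (7 - 1*3 - 1*2)/2 = 1
b    # 1 3 2
```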

Slide 11: A refinement

• We need Q^T y
• Solution: do the QR decomposition of [X, y]:

$$[X, y] = [Q, q] \begin{pmatrix} R & r \\ 0 & \rho \end{pmatrix}$$

so X = QR and y = Qr + \rho q. Since Q^T Q = I and Q^T q = 0,

$$Q^T y = Q^T Q r + \rho\, Q^T q = r$$

• Thus, solve Rb = r

Slide 12: What R has to do

When you run lm, R forms the matrix X from the model formula, then fits the model E(Y) = Xb

Steps:
1. Extract X and Y from the data and the model formula
2. Do the QR decomposition
3. Solve the equations Rb = r
4. The solutions are the numbers reported in the summary

Slide 13: Forming X

When all variables are continuous, it’s a no-brainer

1. Start with a column of 1’s

2. Add columns corresponding to the independent variables

It’s a bit harder for factors

Slide 14: Factors: one way anova

Consider the model y ~ a, where a is a factor having 3 levels, say.

In this case, we
1. Start with a column of ones
2. Add a dummy variable for each level of the factor (3 in all); the order is the order of the factor levels

Problem: the matrix has 4 columns, but the first is the sum of the last 3, so the columns are not linearly independent

Solution: Reparametrize!

Slide 15: Reparametrizing

• Let Xa be the last 3 columns (the 3 dummy variables)
• Replace Xa by XaC (i.e. Xa multiplied by C), where C is a 3 x 2 "contrast matrix" with the properties
  1. Columns of XaC are linearly independent
  2. Columns of XaC are linearly independent of the column of 1's

In general, if a has k levels, C will be k x (k-1)

Slide 16: The "treatment" parametrization

• Here C is the matrix

C = 0 0

1 0

0 1

(You can see the matrix in the general case by typing contr.treatment(k) in R, where k is the number of levels)

This is the default in R
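You can check both the contrast matrix and the columns it produces; a small sketch with a hypothetical 3-level factor:

```r
# Made-up 3-level factor, for illustration
a <- factor(rep(c("one", "two", "three"), each = 2),
            levels = c("one", "two", "three"))
contr.treatment(3)    # the C matrix shown above
model.matrix(~ a)     # column of 1's, then the dummies for levels 2 and 3
```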

Slide 17: Treatment parametrization (2)

• The model is E[Y] = Xb, where X is

1 0 0
1 0 0   <- observations at level 1
. . .
1 1 0
1 1 0   <- observations at level 2
. . .
1 0 1
1 0 1   <- observations at level 3
. . .

• The effect of the reparametrization is to drop the first column of Xa, leaving the others unchanged.

Slide 18: Treatment parametrization (3)

• Mean response at level 1 is b0

• Mean response at level 2 is b0 + b1

• Mean response at level 3 is b0 + b2

• Thus, b0 is interpreted as the baseline (level 1) mean

• The parameter b1 is interpreted as the offset for level 2 (difference between levels 1 and 2)

• The parameter b2 is interpreted as the offset for level 3 (difference between levels 1 and 3)


Slide 19: The "sum" parametrization

• Here C is the matrix

C =  1  0
     0  1
    -1 -1

(You can see the matrix in the general case by typing contr.sum(k) in R, where k is the number of levels)

To get this in R, you need to use the options function:

options(contrasts=c("contr.sum", "contr.poly"))

Slide 20: Sum parametrization (2)

• The model is E[Y] = Xb, where X is

1  1  0
1  1  0   <- observations at level 1
. . .
1  0  1
1  0  1   <- observations at level 2
. . .
1 -1 -1
1 -1 -1   <- observations at level 3
. . .

• The effect of this reparametrization is to drop the last column of Xa, and change the rows corresponding to the last level of a.

Slide 21: Sum parametrization (3)

• Mean response at level 1 is b0 + b1
• Mean response at level 2 is b0 + b2
• Mean response at level 3 is b0 - b1 - b2
• Thus, b0 is interpreted as the average of the 3 means, the "overall mean"
• The parameter b1 is interpreted as the offset for level 1 (difference between level 1 and the overall mean)
• The parameter b2 is interpreted as the offset for level 2 (difference between level 2 and the overall mean)
• The offset for level 3 is -b1 - b2

Slide 22: The "Helmert" parametrization

• Here C is the matrix

C = -1 -1
     1 -1
     0  2

(You can see the matrix in the general case by typing contr.helmert(k) in R, where k is the number of levels)
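The three C matrices on these slides can be compared side by side in R for k = 3:

```r
contr.treatment(3)   # drops level 1 (R's default)
contr.sum(3)         # last level gets the -1's
contr.helmert(3)     # each level contrasted with earlier levels
```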

Slide 23: Helmert parametrization (2)

• The model is E[Y] = Xb, where X is

1 -1 -1
1 -1 -1   <- observations at level 1
. . .
1  1 -1
1  1 -1   <- observations at level 2
. . .
1  0  2
1  0  2   <- observations at level 3
. . .

• The effect of this reparametrization is to change all the rows and columns.

Slide 24: Helmert parametrization (3)

• Mean response at level 1 is b0 - b1 - b2

• Mean response at level 2 is b0 + b1 - b2

• Mean response at level 3 is b0 + 2 b2

• Thus, b0 is interpreted as the average of the 3 means, the “overall mean”

• The parameter b1 is interpreted as half the difference between level 2 mean and level 1 mean

• The parameter b2 is interpreted as one third of the difference between the level 3 mean and the average of the level 1 and 2 means

Slide 25: Using R to calculate the relationship between b-parameters and means

The cell means satisfy $\mu = X\beta$, so

$$\beta = (X^T X)^{-1} X^T X \beta = (X^T X)^{-1} X^T \mu$$

Thus, the matrix (X^T X)^{-1} X^T gives the coefficients we need to find the b's from the m's

Slide 26: Example: One way model

• In an experiment to study the effect of carcinogenic substances, six different substances were applied to cell cultures.
• The response variable (ratio) is the ratio of damaged to undamaged cells, and the explanatory variable (treatment) is the substance

Slide 27: Data

ratio  treatment
0.08   control         + 49 other control obs
0.08   chloralhydrate  + 49 other chloralhydrate obs
0.10   diazapan        + 49 other diazapan obs
0.10   hydroquinone    + 49 other hydroquinone obs
0.07   econidazole     + 49 other econidazole obs
0.17   colchicine      + 49 other colchicine obs

Slide 28: lm output

> cancer.lm=lm(ratio~treatment, data=carcin.df)
> summary(cancer.lm)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               0.23660    0.02037  11.616  < 2e-16 ***
treatmentchloralhydrate   0.03240    0.02880   1.125  0.26158    
treatmentcolchicine       0.21160    0.02880   7.346 2.02e-12 ***
treatmentdiazapan         0.04420    0.02880   1.534  0.12599    
treatmenteconidazole      0.02820    0.02880   0.979  0.32838    
treatmenthydroquinone     0.07540    0.02880   2.618  0.00931 ** 
---

Residual standard error: 0.144 on 294 degrees of freedom
Multiple R-squared: 0.1903, Adjusted R-squared: 0.1766
F-statistic: 13.82 on 5 and 294 DF, p-value: 3.897e-12

Slide 29: Relationship between means and betas

> levels(carcin.df$treatment)
[1] "control"        "chloralhydrate" "colchicine"     "diazapan"
[5] "econidazole"    "hydroquinone"

cancer.lm=lm(ratio~treatment, data=carcin.df)
X<-model.matrix(cancer.lm)[c(1,51,101,151,201,251),]
coef.mat<-solve(t(X)%*%X)%*%t(X)
round(coef.mat)

                          1 51 101 151 201 251
(Intercept)               1  0   0   0   0   0
treatmentchloralhydrate  -1  1   0   0   0   0
treatmentcolchicine      -1  0   0   0   0   1
treatmentdiazapan        -1  0   1   0   0   0
treatmenteconidazole     -1  0   0   0   1   0
treatmenthydroquinone    -1  0   0   1   0   0

carcin.df[c(1,51,101,151,201,251),]
    ratio      treatment
1    0.08     colchicine
51   0.08        control
101  0.10       diazapan
151  0.10   hydroquinone
201  0.07    econidazole
251  0.17 chloralhydrate

Slide 30: Two factors: model y ~ a + b

To form X:

1. Start with column of 1’s

2. Add XaCa

3. Add XbCb

Slide 31: Two factors: model y ~ a * b

To form X:

1. Start with column of 1’s

2. Add XaCa

3. Add XbCb

4. Add XaCa: XbCb

(Every column of XaCa multiplied elementwise with every column of XbCb)
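This elementwise-product construction can be verified from model.matrix with made-up two-level factors:

```r
# Check: interaction column = product of the main-effect columns (made-up factors)
a <- factor(rep(c("a1", "a2"), times = 4))
b <- factor(rep(c("b1", "b2"), each = 4))
X <- model.matrix(~ a * b)
colnames(X)    # "(Intercept)" "aa2" "bb2" "aa2:bb2"
all(X[, "aa2:bb2"] == X[, "aa2"] * X[, "bb2"])   # TRUE
```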

Slide 32: Two factors: example

Experiment to study weight gain in rats

– Response is weight gain over a fixed time period

– This is modelled as a function of diet (Beef, Cereal, Pork) and amount of feed (High, Low)

– See coursebook Section 4.4

Slide 33: Data

> diets.df
   gain source level
1    73   Beef  High
2    98 Cereal  High
3    94   Pork  High
4    90   Beef   Low
5   107 Cereal   Low
6    49   Pork   Low
7   102   Beef  High
8    74 Cereal  High
9    79   Pork  High
10   76   Beef   Low
. . . 60 observations in all

Slide 34: Two factors: the model

• If the (continuous) response depends on two categorical explanatory variables, then we assume that the response is normally distributed with a mean depending on the combination of factor levels: if the factors are A and B, the mean at the i th level of A and the j th level of B is mij

• Other standard assumptions (equal variance, normality, independence) apply

Slide 35: Diagrammatically…

             Source = Beef   Source = Cereal   Source = Pork
Level=High        m11             m12              m13
Level=Low         m21             m22              m23

Slide 36: Decomposition of the means

• We usually want to split each "cell mean" up into 4 terms:
  – A term reflecting the overall baseline level of the response
  – A term reflecting the effect of factor A (row effect)
  – A term reflecting the effect of factor B (column effect)
  – A term reflecting how A and B interact.

Slide 37: Mathematically…

$$m_{ij} = m_{11} + (m_{i1} - m_{11}) + (m_{1j} - m_{11}) + \text{interaction}_{ij}$$

Overall baseline: m11 (mean when both factors are at their baseline levels)

Effect of i th level of factor A (row effect): mi1 - m11 (the i th level of A, at the baseline of B, expressed as a deviation from the overall baseline)

Effect of j th level of factor B (column effect): m1j - m11 (the j th level of B, at the baseline of A, expressed as a deviation from the overall baseline)

Interaction: what's left over (see next slide)

Slide 38: Interactions

• Each cell (except those in the first row and column) has an interaction:
  Interaction = cell mean - baseline - row effect - column effect
• If the interactions are all zero, then the effect of changing levels of A is the same for all levels of B
  – In mathematical terms, mij – mi'j doesn't depend on j
• Equivalently, the effect of changing levels of B is the same for all levels of A
• If the interactions are zero, the relationship between the factors and the response is simple

Slide 39: Fit model

> rats.lm<-lm(gain~source+level + source:level)
> summary(rats.lm)

Coefficients:
                        Estimate Std. Error   t value Pr(>|t|)    
(Intercept)            1.000e+02  4.632e+00    21.589  < 2e-16 ***
sourceCereal          -1.410e+01  6.551e+00    -2.152  0.03585 *  
sourcePork            -5.000e-01  6.551e+00    -0.076  0.93944    
levelLow              -2.080e+01  6.551e+00    -3.175  0.00247 ** 
sourceCereal:levelLow  1.880e+01  9.264e+00     2.029  0.04736 *  
sourcePork:levelLow   -3.052e-14  9.264e+00 -3.29e-15  1.00000    
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 14.65 on 54 degrees of freedom
Multiple R-Squared: 0.2848, Adjusted R-squared: 0.2185
F-statistic: 4.3 on 5 and 54 DF, p-value: 0.002299

Slide 40: Fitting as a regression model

Note that when using the treatment contrasts, this is equivalent to fitting a regression with dummy variables R2, C2, C3

R2 = 1 if obs is in row 2, zero otherwise

C2 = 1 if obs is in column 2, zero otherwise

C3 = 1 if obs is in column 3, zero otherwise

The regression is

Y ~ R2 + C2 + C3 + I(R2*C2) + I(R2*C3)
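A sketch of this equivalence, assuming the diets.df data frame from the Data slide (rows indexed by level, columns by source):

```r
# Treatment-contrast factor model vs. hand-built dummy regression
R2 <- as.numeric(diets.df$level == "Low")       # row 2 dummy
C2 <- as.numeric(diets.df$source == "Cereal")   # column 2 dummy
C3 <- as.numeric(diets.df$source == "Pork")     # column 3 dummy

fit1 <- lm(gain ~ source * level, data = diets.df)
fit2 <- lm(gain ~ R2 + C2 + C3 + I(R2*C2) + I(R2*C3), data = diets.df)
all.equal(as.vector(fitted(fit1)), as.vector(fitted(fit2)))   # same fitted values
```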

Slide 41: Re-label cell means, in data order

             Source = Beef   Source = Cereal   Source = Pork
Level=High        m1              m2               m3
Level=Low         m4              m5               m6

Slide 42: Using R to interpret parameters

> rats.lm<-lm(gain~source*level, data=diets.df)
> X<-model.matrix(rats.lm)[1:6,]
> coef.mat<-solve(t(X)%*%X)%*%t(X)
> round(coef.mat)
                       1  2  3  4  5  6
(Intercept)            1  0  0  0  0  0
sourceCereal          -1  1  0  0  0  0
sourcePork            -1  0  1  0  0  0
levelLow              -1  0  0  1  0  0
sourceCereal:levelLow  1 -1  0 -1  1  0
sourcePork:levelLow    1  0 -1 -1  0  1

(Columns correspond to the cell means m1–m6; rows are the betas.)

> diets.df[1:6,]
  gain source level
1   73   Beef  High
2   98 Cereal  High
3   94   Pork  High
4   90   Beef   Low
5  107 Cereal   Low
6   49   Pork   Low

Slide 43: X matrix: details (first six rows)

(Intercept) sourceCereal sourcePork levelLow sourceCereal:levelLow sourcePork:levelLow
     1           0           0         0              0                   0
     1           1           0         0              0                   0
     1           0           1         0              0                   0
     1           0           0         1              0                   0
     1           1           0         1              1                   0
     1           0           1         1              0                   1

Column blocks: col of 1's | XaCa | XbCb | XaCa:XbCb

Slide 44: Two factors: one continuous, one a factor

• Lathe example (330 Lecture 17)
• Consider an experiment to measure the rate of metal removal in a machining process on a lathe.
• The rate depends on the speed setting of the lathe (fast, medium or slow, a categorical measurement) and the hardness of the material being machined (a continuous measurement)

Slide 45: Data

   hardness setting rate
1       175    fast  138
2       132    fast  102
3       124    fast   93
4       141    fast  112
5       130    fast  100
6       165  medium  122
7       140  medium  104
8       120  medium   75
9       125  medium   84
10      133  medium   95
11      120    slow   68
12      140    slow   90
13      150    slow   98
14      125    slow   77
15      136    slow   88

Slide 46: Plot of rate versus hardness for different lathe speeds

[Figure: rate of metal removal plotted against hardness of metal; points labelled by speed setting: s = slow, m = medium, f = fast]

Slide 47: Non-parallel lines

• The model is (one regression per setting):

$$\text{rate} = \alpha_F + \beta_F\,\text{hardness} \quad (\text{fast})$$
$$\text{rate} = \alpha_M + \beta_M\,\text{hardness} \quad (\text{medium})$$
$$\text{rate} = \alpha_S + \beta_S\,\text{hardness} \quad (\text{slow})$$

Slide 48: Dummy variables for both parameters

We can combine these 3 equations into one by using "dummy variables". Define

med = 1 if setting = medium, 0 otherwise
slow = 1 if setting = slow, 0 otherwise
h.med = hardness x med
h.slow = hardness x slow

Then we can write the model as

$$\text{rate} = \alpha_F + (\alpha_M - \alpha_F)\,\text{med} + (\alpha_S - \alpha_F)\,\text{slow} + \beta_F\,\text{hardness} + (\beta_M - \beta_F)\,\text{h.med} + (\beta_S - \beta_F)\,\text{h.slow}$$

Setting med = slow = 0 recovers the "fast" equation, and similarly for the other two settings.

Slide 49: Fitting in R

The model formula for this non-parallel model is

rate ~ setting + hardness + setting:hardness

or, even more compactly, rate ~ setting * hardness

> summary(lm(rate ~ setting*hardness))
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             -12.18162   10.32795  -1.179   0.2684    
settingmedium           -30.15725   15.49375  -1.946   0.0834 .  
settingslow             -33.60120   19.58902  -1.715   0.1204    
hardness                  0.86312    0.07295  11.831 8.69e-07 ***
settingmedium:hardness    0.14961    0.11125   1.345   0.2116    
settingslow:hardness      0.10546    0.14356   0.735   0.4813    

Slide 50: X-matrix

   (Intercept) settingmedium settingslow hardness settingmedium:hardness settingslow:hardness

FAST
1       1   0   0   175     0     0
2       1   0   0   132     0     0
3       1   0   0   124     0     0
4       1   0   0   141     0     0
5       1   0   0   130     0     0

MEDIUM
6       1   1   0   165   165     0
7       1   1   0   140   140     0
8       1   1   0   120   120     0
9       1   1   0   125   125     0
10      1   1   0   133   133     0

SLOW
11      1   0   1   120     0   120
12      1   0   1   140     0   140
13      1   0   1   150     0   150
14      1   0   1   125     0   125
15      1   0   1   136     0   136

Slide 51: Building up the X-matrix

1. [1] y~1

2. [1, XaCa] y~setting

3. [1, XaCa, h] y~setting + hardness

4. [1, XaCa, h, XaCa : h]

y~setting + hardness + setting:hardness

Or equivalently, the last two blocks [h, XaCa : h] can be written as [1, XaCa] : h
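Assuming the lathe data sit in a data frame (called lathe.df here purely for illustration), the growth of X can be seen from the column names:

```r
# How X grows with the formula (lathe.df is a hypothetical name for the lathe data)
colnames(model.matrix(rate ~ 1, data = lathe.df))                  # [1]
colnames(model.matrix(rate ~ setting, data = lathe.df))            # [1, XaCa]
colnames(model.matrix(rate ~ setting + hardness, data = lathe.df)) # [1, XaCa, h]
colnames(model.matrix(rate ~ setting * hardness, data = lathe.df)) # adds XaCa:h
```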