linear regression with r 1

Linear Regression with R

2012-12-07 @HSPH

Kazuki Yoshida, M.D. MPH-CLE student

FREEDOM TO KNOW

1: Prepare data/specify model/read results

Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH

Previously in this group

- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous

This session

- Linear regression

Ingredients

Statistics

- Data preparation
- Model formula

Programming

- within()
- factor(), relevel()
- lm()
- formula = Y ~ X1 + X2
- summary()
- anova(), car::Anova()

Open R Studio

Create a new script and save it.

http://www.umass.edu/statdata/statdata/data/

lowbwt.dat

http://www.umass.edu/statdata/statdata/data/lowbwt.txt

http://www.umass.edu/statdata/statdata/data/lowbwt.dat

We will use the lowbwt dataset used in BIO213

lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)

Load dataset from web

header = TRUE to pick up variable names

skip = 4 to skip 4 rows
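To try header and skip without a network connection, here is a minimal sketch. It assumes a made-up file with four note lines above the header row; the file contents and the object name toy are illustrative only and are not the real lowbwt.dat.

```r
# Hypothetical stand-in file: four lines of notes, then a header row, then data
path <- tempfile(fileext = ".dat")
writeLines(c(
  "NOTE line 1", "NOTE line 2", "NOTE line 3", "NOTE line 4",
  "ID LOW BWT",
  "85 0 2523",
  "86 0 2551",
  "4  1 709"
), path)

# skip = 4 drops the notes; header = TRUE reads the next line as variable names
toy <- read.table(path, header = TRUE, skip = 4)
str(toy)
```

Without skip = 4, read.table() would try to parse the note lines as data.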

lbw[c(10,39), "BWT"] <- c(2655, 3035)

“Fix” dataset

Replace data points (10th and 39th rows, BWT column) to make the dataset identical to the BIO213 dataset

Lower case variable names

names(lbw) <- tolower(names(lbw))

Convert variable names to lower case

Put them back into variable names

See overview

library(gpairs)
gpairs(lbw)

Recoding

Changing and creating variables

dataset <- within(dataset, {
    _variable manipulations_
})

dataset (left of <-): name of newly created dataset (here replacing original)

dataset (inside within()): take dataset

_variable manipulations_: perform variable manipulation. You can specify by variable name only. No need for dataset$var_name
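A minimal sketch of the within() pattern on a toy data frame. The data frame dat and its columns are made up for illustration.

```r
# Toy data frame (made-up values)
dat <- data.frame(weight.lb = c(120, 150, 182), age = c(19, 33, 20))

dat <- within(dat, {
  # Columns are referred to by bare name; no dat$ prefix is needed
  weight.kg <- weight.lb * 0.4536
  teen      <- age < 20
})

names(dat)
```

The original columns are kept and the new variables are added to the returned data frame.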

lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

})

Categorize race and label: 1 to White, 2 to Black, 3 to Other

Numeric to categorical: element by element

1st will be reference

factor() to create categorical variable

Take race variable

Order levels 1, 2, 3

Make 1 reference level

Label levels 1, 2, 3 as White, Black, Other

Create new variable named race.cat
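The element-by-element recoding can be checked on a short vector. This is a sketch with made-up codes, not the lbw data.

```r
race <- c(1, 3, 2, 1, 2)   # made-up numeric codes

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # level order follows levels = 1:3
levels(race.cat)[1]     # the first level is the one lm() treats as reference
table(race, race.cat)   # cross-check that each code maps to the intended label
```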

Explained more in depth

Numeric to categorical: range to element

How breaks work

(-Inf, 0] None

(0, 2] Normal

(2, Inf] Many

1st will be reference

Reset reference level

Change reference level of ftv.cat variable from None to Normal
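A small sketch of cut() followed by relevel(), using made-up visit counts so that each interval can be checked by eye.

```r
ftv <- c(0, 1, 2, 3, 6)   # made-up visit counts

# Intervals are closed on the right by default: (-Inf, 0], (0, 2], (2, Inf]
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
as.character(ftv.cat)
levels.before <- levels(ftv.cat)

# Move "Normal" to the front so it becomes the reference level
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels.after <- levels(ftv.cat)
```

relevel() only changes the level order; the values themselves stay the same.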

## Dichotomize ptl
preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

Numeric to Boolean to Category

ptl < 1 to FALSE, then to “0”

ptl >= 1 to TRUE, then to “1+”

TRUE, FALSE vector created here (ptl >= 1), then given levels and labels
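The two steps can be separated to see the intermediate logical vector. This is a sketch with made-up values; the name is.preterm is introduced only for illustration.

```r
ptl <- c(0, 0, 1, 3)   # made-up counts

is.preterm <- ptl >= 1   # step 1: numeric to logical
preterm <- factor(is.preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))  # step 2

data.frame(ptl, is.preterm, preterm)
```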

Binary 0,1 to No,Yes

One-by-one method (inside within())

## Categorize smoke ht ui
smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))

Loop method

## Alternative to above
lbw[,c("smoke","ht","ui")] <- lapply(lbw[,c("smoke","ht","ui")], function(var) {
    factor(var, levels = 0:1, labels = c("No","Yes"))
})
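A sketch of the loop method on a toy data frame, to confirm that only the selected columns are converted. The data frame dat and the helper name binary.vars are made up for illustration.

```r
dat <- data.frame(smoke = c(0, 1, 1), ht = c(0, 0, 1), ui = c(1, 0, 0),
                  age = c(19, 33, 20))   # made-up values

binary.vars <- c("smoke", "ht", "ui")

# lapply() applies the same recoding to each selected column
dat[binary.vars] <- lapply(dat[binary.vars], function(var) {
  factor(var, levels = 0:1, labels = c("No", "Yes"))
})

sapply(dat, class)
```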

Model formula

outcome ~ predictor1 + predictor2 + predictor3

SAS equivalent: model outcome = predictor1 predictor2 predictor3;

In the case of t-test

age ~ zyg

age: continuous variable to be compared (variable to be explained)

zyg: grouping variable to separate groups (variable used to explain)

Y ~ X1 + X2

linear sum

- . : All variables except for the outcome
- + X2 : Add X2 term
- - 1 : Remove intercept
- X1:X2 : Interaction term between X1 and X2
- X1*X2 : Main effects and interaction term

Interaction term

Y ~ X1 + X2 + X1:X2

X1 + X2: main effects. X1:X2: interaction

Y ~ X1 * X2

X1 * X2: main effects & interaction
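One way to see how R expands these operators is to inspect terms() of a formula. This is a sketch; the object names and the tiny data frame are made up, and data is needed only for the "." shorthand.

```r
# terms() expands a formula without fitting anything
star.terms  <- attr(terms(Y ~ X1 * X2), "term.labels")
colon.terms <- attr(terms(Y ~ X1 + X2 + X1:X2), "term.labels")
star.terms    # same terms as colon.terms

# "- 1" switches the intercept off
no.int <- attr(terms(Y ~ X1 + X2 - 1), "intercept")

# "." needs a data frame to know which variables exist
dat <- data.frame(Y = 1:3, X1 = c(2, 5, 1), X2 = c(0, 1, 0))
dot.terms <- attr(terms(Y ~ ., data = dat), "term.labels")
dot.terms
```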

Y ~ X1 + I(X2 * X3)

On-the-fly variable manipulation

New variable (X2 times X3) created on-the-fly and used

I(): Inhibit formula interpretation. For math manipulation
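The difference between I(X2 * X3) and X2 * X3 shows up in the design matrix. This is a sketch on made-up data; model.matrix() is used here only to list the columns that lm() would estimate.

```r
dat <- data.frame(Y  = c(1, 4, 2, 8), X1 = c(1, 2, 3, 4),
                  X2 = c(2, 0, 1, 3), X3 = c(5, 1, 2, 2))   # made-up values

mm.I    <- model.matrix(Y ~ X1 + I(X2 * X3), data = dat)  # arithmetic product
mm.star <- model.matrix(Y ~ X1 + X2 * X3, data = dat)     # formula operator

colnames(mm.I)      # one product column
colnames(mm.star)   # X2, X3, and X2:X3
```

With I(), the product enters as a single predictor and no separate X2 or X3 main effects are added.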

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

Fit a model

lm.full

See model object

Call: command repeated

Coefficient for each variable

summary(lm.full)

See summary

Call: command repeated

Model F-test

Residual distribution

Dummy variables created

R^2 and adjusted R^2

Coef/SE = t

ftv.catNone: No 1st trimester visit people compared to Normal 1st trimester visit people (reference level)

ftv.catMany: Many 1st trimester visit people compared to Normal 1st trimester visit people (reference level)

race.catBlack: Black people compared to White people (reference level)

race.catOther: Other people compared to White people (reference level)
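The parts of the summary can also be pulled out as objects. This is a sketch on simulated data, not the lbw fit; the variables x, grp, y and the effect sizes are made up.

```r
set.seed(1)
n <- 60
dat <- data.frame(
  x   = rnorm(n),
  grp = factor(rep(c("Normal", "None", "Many"), each = 20),
               levels = c("Normal", "None", "Many"))   # "Normal" is the reference
)
dat$y <- 3000 + 50 * dat$x - 100 * (dat$grp == "None") + rnorm(n, sd = 200)

fit   <- lm(y ~ x + grp, data = dat)
coefs <- coef(summary(fit))   # coefficient table as a matrix

rownames(coefs)               # dummy variables: one per non-reference level
t.manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]   # Coef/SE = t
r2 <- summary(fit)$r.squared
```

The reference level ("Normal" here) has no row of its own; it is absorbed into the intercept.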

confint(lm.full)

Confidence intervals

Lower boundary

Upper boundary
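The boundaries from confint() for an lm fit can be reproduced by hand from the estimate, its standard error, and a t quantile. This is a sketch on simulated data (made-up variables x and y), using the default 95% level.

```r
set.seed(2)
dat <- data.frame(x = rnorm(40))
dat$y <- 1 + 2 * dat$x + rnorm(40)   # made-up relationship

fit <- lm(y ~ x, data = dat)

est  <- coef(summary(fit))[, "Estimate"]
se   <- coef(summary(fit))[, "Std. Error"]
crit <- qt(0.975, df = df.residual(fit))   # t quantile for a 95% interval

manual <- cbind(lower = est - crit * se, upper = est + crit * se)
ci     <- confint(fit)
ci
```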

anova(lm.full)

ANOVA table (type I)

Degrees of freedom

Sequential SS

Mean SS = SS/DF

F = Mean SS / Mean SS of residual

Type I = Sequential SS

- 1st term gets all in type I
- 2nd term gets all but the overlap with the 1st in type I
- Last term gets only what remains in type I
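Because type I SS are sequential, the order of terms matters when predictors overlap. This is a sketch on simulated data with two deliberately correlated predictors; all names and values are made up.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)        # correlated with x1, so the overlap is non-zero
y  <- 1 + x1 + x2 + rnorm(n)

a12 <- anova(lm(y ~ x1 + x2))    # x1 entered first
a21 <- anova(lm(y ~ x2 + x1))    # x1 entered last

a12[, "Sum Sq", drop = FALSE]
a21[, "Sum Sq", drop = FALSE]
```

x1 is credited with more SS when it enters first, while the residual SS and the total are the same in both orderings.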

library(car)
Anova(lm.full, type = 3)

ANOVA table (type III)

Degrees of freedom

Marginal SS

Multi-category variables tested as

Type III = Marginal SS

- 1st term gets only its margin in type III
- 2nd term gets only its margin in type III
- Last term gets only its margin in type III
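The idea of marginal SS can be illustrated without the car package. This sketch swaps in base R's drop1(), which tests each term after all the others; it is a stand-in for illustration, not car::Anova() itself, and the simulated model has no interaction terms. All names and values are made up.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)        # correlated predictors
y  <- 1 + x1 + x2 + rnorm(n)

fit  <- lm(y ~ x1 + x2)
marg <- drop1(fit, test = "F")                     # each term given all the others
marg.rev <- drop1(lm(y ~ x2 + x1), test = "F")     # same model, terms reordered

marg
seq.tab <- anova(fit)                              # type I table for comparison
t.sq <- coef(summary(fit))["x1", "t value"]^2      # squared t for a 1-df term
```

The marginal SS do not depend on term order, the last term has the same SS in the sequential and marginal tables, and for a single-df term the marginal F equals the squared t from summary().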

Comparison: Type I vs. Type III

library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))

Effect plot

Fix Y-axis values for all

Effect of a variable with other covariates set at average

Interaction

lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw)

age*lwt: Continuous * Continuous

age*ftv.cat: Continuous * Categorical

race.cat*preterm: Categorical * Categorical

This model is for demonstration purposes only.

Anova(lm.full.int, type = 3)

Degrees of freedom

Marginal SS

Interaction terms

plot(effect("age:lwt", lm.full.int))

lwt level

plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)

plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)
