
Page 1: Chapter 1: Linear Regression with One Predictor Variable

BSTT523: Kutner et al., Chapter 1

Chapter 1: Linear Regression with One Predictor Variable

also known as: Simple Linear Regression

Bivariate Linear Regression

Introduction:

Β· Functional relation between two variables:

π‘Œ = 𝑓(𝑋)

Value of X β‡’ Value of Y

Example: Β°F = 32Β° + (9/5)Β°C

is a deterministic relationship

1 value of X β‡’ 1 unique value of Y

Β· Statistical relation between two variables:

1 value of X β‡’ a distribution of values of Y

Y = Dependent / Response / Outcome Variable

X = Independent / Explanatory / Predictor Variable

Page 2: Chapter 1: Linear Regression with One Predictor Variable


Linear Equation: General equation for a straight line

π‘Œ = 𝑏0 + 𝑏1𝑋

𝑏0: Intercept = value of Y when X=0

𝑏1: Slope = change in Y per unit change in X

𝑏1 =π‘β„Žπ‘Žπ‘›π‘”π‘’ 𝑖𝑛 π‘Œβˆ’π‘£π‘Žπ‘™π‘’π‘’

π‘β„Žπ‘Žπ‘›π‘”π‘’ 𝑖𝑛 π‘‹βˆ’π‘£π‘Žπ‘™π‘’π‘’="π‘Ÿπ‘–π‘ π‘’"

"π‘Ÿπ‘’π‘›"

What if X increases by 1 unit?

π‘Œ = 𝑏0 + 𝑏1(𝑋 + 1) = {𝑏0 + 𝑏1𝑋} + 𝑏1

Y increases by 𝑏1 units

{ Other than linear: e.g. curvilinear π‘Œ = 𝑏0 + 𝑏1𝑋 + 𝑏2𝑋² }

Regression of Y on X

Β· Observe data points {(𝑋1, π‘Œ1), . . . , (𝑋𝑛, π‘Œπ‘›)}

Β· At each point 𝑋𝑖 there is a distribution of π‘Œπ‘–β€™s

Page 3: Chapter 1: Linear Regression with One Predictor Variable


Example: Y = head circumference (cm), X = gestational age (wks)

in a sample of 100 low birth weight infants

Qs: Does average head circumference change with gestational age?

What is the form of the relationship? (linear? curvilinear?)

How to estimate the relationship, given the data?

How to make predictions for new observations?

[Figure: scatterplot of Y = Head Circumference (cm) versus X = Gestational Age (weeks)]

Page 4: Chapter 1: Linear Regression with One Predictor Variable


Descriptive Data:

Β· Scatterplot

Β· Correlation Coefficients

Some examples:

Page 5: Chapter 1: Linear Regression with One Predictor Variable


Population Correlation Coefficient:

Random Variables X and Y with parameters πœ‡π‘‹, πœ‡π‘Œ, πœŽπ‘‹Β², πœŽπ‘ŒΒ²

𝜌 = πΆπ‘œπ‘£(𝑋, π‘Œ) / (πœŽπ‘‹ πœŽπ‘Œ) = 𝐸[(𝑋 βˆ’ πœ‡π‘‹)(π‘Œ βˆ’ πœ‡π‘Œ)] / (πœŽπ‘‹ πœŽπ‘Œ) ,  βˆ’1 ≀ 𝜌 ≀ +1

𝜌 measures the direction and strength of linear association

between X and Y

Maximum likelihood estimator of 𝜌 is

the Pearson Correlation Coefficient:

π‘Ÿ =βˆ‘(π‘‹π‘–βˆ’π‘‹)(π‘Œπ‘–βˆ’π‘Œ)

βˆšβˆ‘(π‘‹π‘–βˆ’π‘‹)2βˆ‘(π‘Œπ‘–βˆ’π‘Œ)

2=βˆ‘(π‘‹π‘–βˆ’π‘‹)(π‘Œπ‘–βˆ’π‘Œ)

(π‘›βˆ’1)π‘ π‘‹π‘ π‘Œ

Inference on 𝜌:

If X and Y are both Normal,

𝐻0: 𝜌 = 0 vs. π»π‘Ž: 𝜌 β‰  0

𝑇 =π‘Ÿβˆšπ‘›βˆ’2

√1βˆ’π‘Ÿ2 ~ 𝑑(π‘›βˆ’2) under 𝐻0: 𝜌 = 0

critical value = ±𝑑(π‘›βˆ’2,𝛼 2⁄ )
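A quick numerical sketch (not part of the original notes): the following Python snippet computes π‘Ÿ and the 𝑇 statistic by the formulas above on made-up data, then checks against scipy.stats.pearsonr.

```python
import numpy as np
from scipy import stats

x = np.array([24.0, 26.0, 28.0, 30.0, 32.0, 34.0])   # made-up gestational ages (wks)
y = np.array([22.5, 24.1, 26.0, 27.2, 29.0, 30.1])   # made-up head circumferences (cm)

n = len(x)
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2))

T = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # ~ t(n-2) under H0: rho = 0
p = 2 * stats.t.sf(abs(T), df=n - 2)         # two-sided p-value

print(r, T, p)
print(stats.pearsonr(x, y))                  # library check: same r and p
```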

Page 6: Chapter 1: Linear Regression with One Predictor Variable


Spearman Rank Correlation Coefficient:

For X or Y non-Normal

𝑅𝑋𝑖= rank of 𝑋𝑖

π‘…π‘Œπ‘–= rank of π‘Œπ‘–

𝑅̄𝑋 = π‘…Μ„π‘Œ = (𝑛 + 1)/2 , the means of the ranks 𝑅𝑋𝑖 and π‘…π‘Œπ‘–

Spearman rank correlation coefficient is

π‘Ÿπ‘  =βˆ‘(π‘…π‘‹π‘–βˆ’π‘…π‘‹)(π‘…π‘Œπ‘–βˆ’π‘…π‘Œ)

βˆšβˆ‘(π‘…π‘‹π‘–βˆ’π‘…π‘‹)2βˆ‘(π‘…π‘Œπ‘–βˆ’π‘…π‘Œ)

2 , βˆ’1 ≀ π‘Ÿπ‘  ≀ +1

𝐻0: There is no association between X and Y

π»π‘Ž: There is association between X and Y

𝑇 =π‘Ÿπ‘ βˆšπ‘›βˆ’2

√1βˆ’π‘Ÿπ‘ 2 ~ 𝑑(π‘›βˆ’2) under 𝐻0

critical value = ±𝑑(π‘›βˆ’2,𝛼 2⁄ )
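A sketch on the same made-up data as above: π‘Ÿπ‘  is just the Pearson correlation computed on the ranks, which scipy.stats.spearmanr confirms.

```python
import numpy as np
from scipy import stats

x = np.array([24.0, 26.0, 28.0, 30.0, 32.0, 34.0])   # same made-up data as above
y = np.array([22.5, 24.1, 26.0, 27.2, 29.0, 30.1])

rx = stats.rankdata(x)            # ranks R_Xi
ry = stats.rankdata(y)            # ranks R_Yi
r_s = np.corrcoef(rx, ry)[0, 1]   # Pearson correlation of the ranks

print(r_s)                        # here the ranks agree perfectly, so r_s = 1.0
print(stats.spearmanr(x, y))      # library check: same r_s, plus a p-value
```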

Page 7: Chapter 1: Linear Regression with One Predictor Variable


The Simple Linear Regression Model

π‘Œπ‘– = 𝛽0 + 𝛽1𝑋𝑖 + πœ€π‘– , i = 1, . . . ,n observations

π‘Œπ‘– value of response for ith observation

𝑋𝑖 value of predictor for ith observation

Population parameters (unknown):

𝛽0 Population intercept

𝛽1 Population regression coefficient

πœ€π‘– is ith random error term

Mean: 𝐸(πœ€π‘–) = 0

Variance: π‘‰π‘Žπ‘Ÿ(πœ€π‘–) = 𝜎²

Independence: πœ€π‘– and πœ€π‘— are uncorrelated for 𝑖 β‰  𝑗

Normality: πœ€π‘– ~ 𝑁(0, 𝜎²), i.i.d. for all i

β‡’ π‘Œπ‘– = 𝛽0 + 𝛽1π‘‹π‘–βŸ πΆπ‘œπ‘›π‘ π‘‘π‘Žπ‘›π‘‘

+ πœ€π‘–βŸπ‘…π‘Žπ‘›π‘‘π‘œπ‘š,𝑖.𝑖.𝑑.𝑁(0,𝜎2)

β‡’ 𝐸(π‘Œπ‘–) = 𝐸(𝛽0 + 𝛽1𝑋𝑖 + πœ€π‘–) = 𝛽0 + 𝛽1𝑋𝑖

π‘‰π‘Žπ‘Ÿ(π‘Œπ‘–) = π‘‰π‘Žπ‘Ÿ(𝛽0 + 𝛽1𝑋𝑖 + πœ€π‘–) = π‘‰π‘Žπ‘Ÿ(πœ€π‘–) = 𝜎2

β‡’ π‘Œπ‘– ~ 𝑁(πœ‡π‘Œ, 𝜎2) 𝑖. 𝑖. 𝑑.

where πœ‡π‘Œ = 𝛽0 + 𝛽1𝑋𝑖

Page 8: Chapter 1: Linear Regression with One Predictor Variable


How to obtain 𝛽̂0 and 𝛽̂1, estimates for 𝛽0 and 𝛽1?

Least Squares Estimators (LSE):

LSEs minimize the sum of squared deviations of π‘Œπ‘– from 𝐸(π‘Œπ‘–)

Least Squares Criterion:

𝑄 = βˆ‘α΅’β‚Œβ‚βΏ [π‘Œπ‘– βˆ’ 𝐸(π‘Œπ‘–)]Β² = βˆ‘α΅’β‚Œβ‚βΏ [π‘Œπ‘– βˆ’ (𝛽0 + 𝛽1𝑋𝑖)]Β²

Minimize Q: set first derivatives w.r.t. each parameter = 0

First derivatives are:

πœ•π‘„

πœ•π›½0= βˆ’2βˆ‘(π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖) (1)

πœ•π‘„

πœ•π›½1= βˆ’2βˆ‘π‘‹π‘–(π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖) (2)

Normal Equations: set (1)=0 and (2)=0; call solutions 𝛽̂0 and 𝛽̂1

βˆ’2 βˆ‘(π‘Œπ‘– βˆ’ 𝛽̂0 βˆ’ 𝛽̂1𝑋𝑖) = 0

βˆ’2 βˆ‘π‘‹π‘–(π‘Œπ‘– βˆ’ 𝛽̂0 βˆ’ 𝛽̂1𝑋𝑖) = 0

β‡’

βˆ‘π‘Œπ‘– = 𝑛𝛽̂0 + 𝛽̂1 βˆ‘π‘‹π‘–

βˆ‘π‘‹π‘–π‘Œπ‘– = 𝛽̂0 βˆ‘π‘‹π‘– + 𝛽̂1 βˆ‘π‘‹π‘–Β²

Page 9: Chapter 1: Linear Regression with One Predictor Variable


Solution to Normal Equations:

Least Squares Estimators (LSE):

οΏ½Μ‚οΏ½πŸ =βˆ‘(π‘Ώπ’Šβˆ’π‘Ώ)(π’€π’Šβˆ’π’€)

βˆ‘(π‘Ώπ’Šβˆ’π‘Ώ)𝟐

οΏ½Μ‚οΏ½πŸŽ = 𝒀 βˆ’ οΏ½Μ‚οΏ½πŸπ‘Ώ

Properties of LSE:

Unbiased estimators (accuracy)

𝐸(𝛽̂0) = 𝛽0 , 𝐸(𝛽̂1) = 𝛽1

Minimum variance (precision)

Robust against Normality assumption

Note:

functions are called β€œestimators”

calculated values from data are called β€œestimates”

Interpretation:

Intercept (𝛽̂0): value of Y when X = 0

(not always meaningful!)

Slope (𝛽̂1): average change in Y per unit increase in X

the "effect" of X on Y; the "regression coefficient"

Page 10: Chapter 1: Linear Regression with One Predictor Variable


Example: X = gestational age (wks), Y = head circumference (cm)

The formula for the least squares regression line is:

π‘ŒΜ‚ = 3.91 + 0.78X

Intercept: not meaningful! (extrapolation to X = 0 weeks)

Slope: For every increase of one week gestational age,

there is an increase of about 0.78 cm head circumference.
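A tiny sketch of this interpretation, plugging values into the reported fitted line:

```python
def y_hat(weeks):
    return 3.91 + 0.78 * weeks        # fitted line reported on this page

print(y_hat(28.0))                    # predicted head circumference at 28 wks: 25.75 cm
print(y_hat(29.0) - y_hat(28.0))      # one extra week -> +0.78 cm on average
```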

[Figure: scatterplot with the fitted line, Y = Head Circumference (cm) versus X = Gestational Age (weeks)]

Page 11: Chapter 1: Linear Regression with One Predictor Variable


Another approach:

Method of Maximum Likelihood

The MLE maximizes the likelihood function

(the likelihood of the observed data, given the model parameters)

Q. Under which parameter values are the observed sample data

most likely to occur?

[see explanation of MLE on p.27-29]

For simple linear regression:

πœ€π‘– = π‘Œπ‘– βˆ’ 𝛽0 βˆ’ 𝛽1𝑋𝑖 ~ 𝑁(0, 𝜎2)

β‡’ 𝑓(πœ€π‘–) =1

√2πœ‹πœŽπ‘’π‘₯𝑝 {βˆ’

1

2𝜎2(𝑦𝑖 βˆ’ 𝛽0 βˆ’ 𝛽1π‘₯𝑖)

2}

β‡’ likelihood = 𝐿 = ∏ 𝑓(πœ€π‘–)𝑛𝑖=1

β‡’ π‘™π‘œπ‘”π‘’πΏ = 𝑙𝑛 {1

(2πœ‹πœŽ2)𝑛 2⁄ 𝑒π‘₯𝑝 [βˆ’1

2𝜎2βˆ‘ (𝑦𝑖 βˆ’ 𝛽0 βˆ’ 𝛽1π‘₯𝑖)

2𝑛𝑖=1 ]}

= βˆ’π‘›

2𝑙𝑛(2πœ‹πœŽ2) βˆ’

1

2𝜎2βˆ‘ (𝑦𝑖 βˆ’ 𝛽0 βˆ’ 𝛽1π‘₯𝑖)

2𝑛𝑖=1

πœ•π‘™π‘œπ‘”π‘’πΏ

πœ•π›½0= 0 ,

πœ•π‘™π‘œπ‘”π‘’πΏ

πœ•π›½1= 0 ,

πœ•π‘™π‘œπ‘”π‘’πΏ

πœ•πœŽ= 0

β‡’ same solution as LSE (please prove for yourself!)

same nice properties
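A numerical illustration (not a proof): maximizing log𝑒𝐿 with a generic optimizer recovers the least squares estimates. The simulated data and the log-Οƒ parameterization are choices of this sketch, not of the notes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)             # same simulated data as above
X = rng.uniform(23, 35, size=100)
Y = 3.9 + 0.78 * X + rng.normal(0, 1.6, size=100)

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma2 = np.exp(2 * log_sigma)         # optimize log(sigma) so sigma stays > 0
    resid = Y - b0 - b1 * X
    return 0.5 * len(Y) * np.log(2 * np.pi * sigma2) + np.sum(resid**2) / (2 * sigma2)

fit = minimize(negloglik, x0=[0.0, 1.0, 0.0])   # minimize -log L
print(fit.x[:2])                           # (b0, b1) MLEs: agree with least squares
```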

Page 12: Chapter 1: Linear Regression with One Predictor Variable


After calculating the fitted regression line:

Fitted value π‘ŒΜ‚π‘– = 𝛽̂0 + 𝛽̂1𝑋𝑖

the point on the fitted line at the value 𝑋𝑖

Fitted Y-values are estimates of the Mean Response Function

π‘ŒΜ‚π‘– is an unbiased estimator of the mean response at 𝑋𝑖

The fitted line is an unbiased estimator of the mean response

function

Note: the point (𝑋̄, π‘ŒΜ„) is ALWAYS on the fitted regression line,

i.e.,

π‘ŒΜ„ = 𝛽̂0 + 𝛽̂1𝑋̄

Page 13: Chapter 1: Linear Regression with One Predictor Variable


The ith residual 𝑒𝑖:

𝑒𝑖 = π‘Œπ‘– βˆ’ �̂�𝑖 = π‘Œπ‘– βˆ’ (οΏ½Μ‚οΏ½0 + οΏ½Μ‚οΏ½1𝑋𝑖)

Β· it is the vertical distance between (𝑋𝑖 , π‘Œπ‘–) and (𝑋𝑖 , �̂�𝑖)

Β· it is the estimate of the ith error term, 𝑒𝑖 = πœ€οΏ½Μ‚οΏ½

Β· βˆ‘ 𝑒𝑖 = 0𝑛𝑖=1

Proof:

βˆ‘π‘’π‘– = βˆ‘[π‘Œπ‘– βˆ’ (οΏ½Μ‚οΏ½0 + οΏ½Μ‚οΏ½1𝑋𝑖)]

= βˆ‘π‘Œπ‘– βˆ’ 𝑛�̂�0 βˆ’ οΏ½Μ‚οΏ½1βˆ‘π‘‹π‘–

= 0 (by normal equation 1)
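A numerical check of this property (a sketch; the data setup repeats the earlier snippets so it runs on its own):

```python
import numpy as np

rng = np.random.default_rng(1)             # same simulated data as above
X = rng.uniform(23, 35, size=100)
Y = 3.9 + 0.78 * X + rng.normal(0, 1.6, size=100)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)                      # residuals e_i = Y_i - Y_hat_i
print(e.sum())                             # ~0 up to floating-point error
```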

Page 14: Chapter 1: Linear Regression with One Predictor Variable


Error Sum of Squares:

𝑆𝑆𝐸 = βˆ‘α΅’β‚Œβ‚βΏ (π‘Œπ‘– βˆ’ π‘ŒΜ‚π‘–)Β² = βˆ‘α΅’β‚Œβ‚βΏ 𝑒𝑖²

SSE is at its minimum when the residuals come from the LSE (equivalently, MLE) fit.

associated degrees of freedom 𝑑𝑓 = 𝑛 βˆ’ 2

(generally 𝑑𝑓 = 𝑛 βˆ’ 𝑝 where p = # of parameters in the model)

Mean Squared Error: unbiased estimator of 𝜎²

𝑀𝑆𝐸 = 𝑆𝑆𝐸/𝑑𝑓 = 𝑆𝑆𝐸/(𝑛 βˆ’ 2) ,  𝐸(𝑀𝑆𝐸) = 𝜎²

Page 15: Chapter 1: Linear Regression with One Predictor Variable


Example: X = gestational age and Y = head circumference

100 observations

scatterplot, fitted line, fitted values

[Figure: scatterplot with fitted line and fitted values, Y = Head Circumference (cm) versus X = Gestational Age (weeks)]

Page 16: Chapter 1: Linear Regression with One Predictor Variable


EXCEL: SUMMARY OUTPUT

Regression Statistics
Multiple R          0.780691936
R Square            0.609479899
Adjusted R Square   0.605495
Standard Error      1.590413353
Observations        100

ANOVA
              df    SS            MS          F          Significance F
Regression     1    386.8673658   386.8674    152.9474   1.00121E-21
Residual      98    247.8826342   2.529415
Total         99    634.75

              Coefficients   Standard Error   t Stat     P-value
Intercept     3.914264144    1.82914689       2.13994    0.034842
X Variable 1  0.780053162    0.063074406      12.36719   1E-21
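For comparison, a sketch of the same fit in Python via statsmodels, whose summary reports the same quantities (R Square, the ANOVA F, coefficient t tests). The arrays gestage and headcirc are hypothetical stand-ins here; the raw data are not reproduced in these notes.

```python
import numpy as np
import statsmodels.api as sm

# hypothetical stand-in data so the sketch runs; substitute the real arrays
rng = np.random.default_rng(1)
gestage = rng.uniform(23, 35, size=100)
headcirc = 3.9 + 0.78 * gestage + rng.normal(0, 1.6, size=100)

Xd = sm.add_constant(gestage)                 # adds the intercept column
print(sm.OLS(headcirc, Xd).fit().summary())   # R-square, ANOVA F, coefficient t tests
```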

Page 17: Chapter 1: Linear Regression with One Predictor Variable


SAS output:

The REG Procedure

Model: MODEL1

Dependent Variable: headcirc

Number of Observations Read 100

Number of Observations Used 100

Analysis of Variance

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              1         386.86737      386.86737     152.95    <.0001
Error             98         247.88263        2.52941
Corrected Total   99         634.75000

Root MSE 1.59041 R-Square 0.6095

Dependent Mean 26.45000 Adj R-Sq 0.6055

Coeff Var 6.01290

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1               3.91426           1.82915       2.14      0.0348
gestage       1               0.78005           0.06307      12.37      <.0001