Chapter 13
Multiple Regression Analysis
Introduction
• Multiple regression analysis is a method for relating two or
more independent variables to a dependent variable.
• Dependent variable: continuous (except with logistic
regression)
• Independent variables: either “continuous” or “categorical”
• For categorical variables, use dummy variables rather than the
actual character values.
Dummy variables
• A dummy variable takes only the values 0 or 1.
• For example, consider a categorical variable “W” with 3 values:
“Low”, “Average” and “High”.
• We create 2 dummy variables D1, D2 such that (D1, D2)=(1,0)
for “W=Low”, (D1, D2)=(0,1) for “W=Average” and (D1,
D2)=(0,0) for “W=High”. In general, a variable with c categories
needs c−1 dummy variables; the category coded all zeros is the
reference level.
• If all of the independent variables are categorical, you may use
the ANOVA method instead.
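The coding scheme above can be sketched in a few lines of Python. This is only an illustration: the function name dummy_code and its arguments are hypothetical, and "High" is taken as the reference level because the slide codes it as (0,0).

```python
# Sketch: reference-level (dummy) coding for a categorical variable.
# dummy_code is a hypothetical helper written for this example,
# not a function from any library.

def dummy_code(value, levels, reference):
    """Return one 0/1 indicator per non-reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(int(value == level) for level in levels if level != reference)

levels = ["Low", "Average", "High"]
for w in levels:
    # Prints: Low (1, 0), Average (0, 1), High (0, 0)
    print(w, dummy_code(w, levels, reference="High"))
```

Three levels give two indicators; the reference level is the one whose indicators are all zero.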
Multiple Regression
• A researcher is interested in the effects of scheduled exercise
and the use of a stimulant on weight loss.
• 24 subjects participated in the study.
• 3 levels of exercise (0, 5 and 10 hours/week)
• 4 levels of stimulant (100, 200, 300 and 400 mg/day)
• Each subject was randomly assigned to a level of exercise
and a level of stimulant such that 2 subjects are in each of the
3×4 (=12) possible combinations of exercise and stimulant.
• After 3 weeks of participation, a measure of weight loss (post
weight − pre weight) is obtained for each subject.
• The results are given in the data set “weight loss.txt”.
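One way such a balanced random assignment could be produced is sketched below. The slides do not say how the randomisation was actually carried out, so the function balanced_assignment, the subject ids 1 to 24 and the seed are all assumptions made for illustration.

```python
import itertools
import random

def balanced_assignment(subject_ids, exercise_levels, stimulant_levels,
                        per_cell, seed=None):
    """Randomly assign subjects so every (exercise, stimulant) cell
    receives exactly per_cell subjects."""
    cells = list(itertools.product(exercise_levels, stimulant_levels))
    subjects = list(subject_ids)
    if len(subjects) != len(cells) * per_cell:
        raise ValueError("number of subjects must equal cells x per_cell")
    rng = random.Random(seed)
    rng.shuffle(subjects)  # random order of subjects
    # Each cell appears per_cell times in the list of slots
    slots = [cell for cell in cells for _ in range(per_cell)]
    return dict(zip(subjects, slots))

# 3 exercise levels x 4 stimulant levels x 2 subjects per cell = 24 subjects
assignment = balanced_assignment(range(1, 25), [0, 5, 10],
                                 [100, 200, 300, 400], per_cell=2, seed=2137)
for subject in sorted(assignment):
    print(subject, assignment[subject])
```

Shuffling the subjects and then filling a fixed list of slots guarantees the 2-per-cell balance, which independent random draws per subject would not.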
Multiple Regression Model
• Model: Yi = β0 + β1x1i + β2x2i + ϵi, where ϵi ∼ N(0, σ²)
independently for i = 1, ..., n.
• Test H0: β1 = β2 = 0 against H1: βj ≠ 0 for some
j = 1, 2.
• Test statistic: F = MSR/MSE. Under H0, F ∼ F(k, n−k−1), where k
is the number of independent variables in the model (i.e. k=2 and
n=24 in this example).
• Reject H0 at the 5% significance level if Fobserved > F0.05(2, 21)
or p-value < 0.05.
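To make the test statistic concrete, the sketch below fits a two-predictor model by least squares and computes F = MSR/MSE with MSR = SSR/k and MSE = SSE/(n−k−1). It uses only the Python standard library. The helper names solve and overall_f are hypothetical, and the eight observations are made-up toy values chosen so the arithmetic is easy to check by hand; they are not the weight-loss data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def overall_f(rows, y):
    """Least-squares fit with intercept; returns (beta, F, df1, df2)."""
    n, k = len(y), len(rows[0])
    X = [[1.0, *r] for r in rows]          # add the intercept column
    p = k + 1
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)                 # normal equations X'X b = X'y
    fitted = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in X]
    ybar = sum(y) / n
    ssr = sum((f - ybar) ** 2 for f in fitted)            # regression SS
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # error SS
    f_stat = (ssr / k) / (sse / (n - k - 1))              # MSR / MSE
    return beta, f_stat, k, n - k - 1

# Toy data (made up): a 2x2 layout with 2 observations per cell
rows = [(-1, -1), (-1, -1), (-1, 1), (-1, 1),
        (1, -1), (1, -1), (1, 1), (1, 1)]
y = [9, 7, 9, 7, 13, 11, 13, 11]
beta, f_stat, df1, df2 = overall_f(rows, y)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")
```

For this toy data SSR = 32 and SSE = 8, so F = (32/2)/(8/5) = 10 on (2, 5) degrees of freedom. The observed F would then be compared with the tabulated critical value, as in the rejection rule above.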
Multiple Regression: SAS
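options ls=75 ps=40 nodate pageno=1;
* Read the data, skipping the header line of the file;
data ex13_1;
infile "F:\ST2137\lecdata\ex13_1.txt" firstobs=2;
input id dosage exercise loss;
run;
* Regress loss on dosage and exercise;
* p prints predicted values, r requests a residual analysis;
proc reg data=ex13_1;
title "Regression Example: Weight loss";
model loss=dosage exercise / p r;
run;
quit;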
Remark:
“proc reg”, “proc anova” and “proc glm” are “interactive”
procedures: each remains active until a new procedure is submitted
or until a “quit” statement is submitted.
Multiple Regression: SAS output
Remarks
• The overall model is significant at the 5% level: the F test in
the ANOVA table has p-value=0.0005.
• The regression coefficient for “dosage” is not statistically
significantly different from 0 (p-value=0.8151).
• One may consider fitting a new regression model with
“dosage” eliminated, to refine the estimate of the “exercise” effect.
Multiple Regression: R
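> # Read the data; the first line of the file holds the column names
> ex13.1 = read.table("F:/ST2137/ex13_1.txt", header=TRUE)
> # Fit loss on dosage and exercise
> model1 = lm(loss ~ dosage + exercise, data=ex13.1)
> summary(model1)  # coefficient estimates, t tests and the overall F test
> anova(model1)    # sequential ANOVA table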
Multiple Regression: SPSS
• “Analyze” → “Regression” → “Linear”
• Move “loss” to the Dependent panel and “dosage” and
“exercise” to the Independent(s) panel.
• Click “OK”.
Multiple Regression: SPSS output