-
8/12/2019 Regression Analysis(Cases 1-3)
Regression Analysis
Presenter: Muhammad Akram Naseem ([email protected])
Research Centre for Training and Development (RCTD)
3/10/2014 Unlock the Potential of Data Analysis
-
Model Building
Model
A mathematical expression of a theory is known as a model.
Types of Models
1. Exact Models (Mathematical Models)
2. Inexact Models (Statistical Models)
-
Exact Models (Mathematical Models)
An expression in which the output can be determined exactly from the input(s) is known as an exact model, e.g.:
Chemical formula of water: H2O
Chemical formula of glucose: C6H12O6
Area of a circle = $\pi r^2$ (r = radius)
-
Inexact Models
(Statistical Models)
An expression in which the output cannot be determined exactly from some input(s) is known as a statistical model, e.g.:
1. Fertility of land cannot be determined exactly by the amount of rainfall alone.
2. CGPA (marks) of students cannot be determined exactly by their study hours.
3. Sale of ice cream cannot be determined exactly by daily temperature alone.
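The contrast between the two model types can be sketched in a few lines of Python. The circle-area function is an exact model: the same radius always returns the same area. The ice-cream function is a statistical model with hypothetical coefficients (intercept 20, slope 3 per degree) and a random disturbance that stands in for everything temperature does not explain. Because of that disturbance, the same temperature can produce different sales.

```python
import math
import random


def circle_area(radius: float) -> float:
    """Exact (mathematical) model: the output is fully determined by the input."""
    return math.pi * radius ** 2


def ice_cream_sales(temperature: float, rng: random.Random) -> float:
    """Statistical model: sales = alpha + beta * temperature + disturbance.

    alpha = 20 and beta = 3 are hypothetical values for illustration only.
    The Gaussian disturbance stands in for every factor temperature does not capture.
    """
    alpha, beta = 20.0, 3.0
    disturbance = rng.gauss(0.0, 5.0)
    return alpha + beta * temperature + disturbance


rng = random.Random(42)
print(circle_area(2.0), circle_area(2.0))  # same input, same output every time
print(ice_cream_sales(30.0, rng), ice_cream_sales(30.0, rng))  # same input, different outputs
```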
-
Different Forms of Statistical Models
Linear Models
1. Simple Linear Models
2. Multiple Linear Models
Nonlinear Models
1. Polynomial Models
2. Reciprocal Models
3. Logarithmic Models
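The forms listed above can be written as small Python functions. The exact equations are an assumption, because the slide only names the forms. They follow the common textbook versions: polynomial Y = α + β1X + β2X² + ..., reciprocal Y = α + β(1/X), and log-log ln Y = α + β ln X.

```python
import math


def simple_linear(x, alpha, beta):
    # Y = alpha + beta * X
    return alpha + beta * x


def multiple_linear(xs, alpha, betas):
    # Y = alpha + beta1*X1 + beta2*X2 + ... + betak*Xk
    return alpha + sum(b * x for b, x in zip(betas, xs))


def polynomial(x, alpha, betas):
    # Y = alpha + beta1*X + beta2*X^2 + ...
    return alpha + sum(b * x ** (power + 1) for power, b in enumerate(betas))


def reciprocal(x, alpha, beta):
    # Y = alpha + beta * (1/X); X must be nonzero
    return alpha + beta / x


def log_log(x, alpha, beta):
    # ln Y = alpha + beta * ln X, i.e. Y = exp(alpha) * X**beta; X must be positive
    return math.exp(alpha + beta * math.log(x))


print(polynomial(2.0, 1.0, [3.0, 4.0]))  # 1 + 3*2 + 4*2^2 = 23.0
```

All three "nonlinear" forms are nonlinear in X but still linear in the parameters α and β (the log form after taking logs). They can therefore still be estimated by ordinary least squares on transformed variables.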
-
Linearity Determination
Graphic Form
Fig-1: Linear graph of Y = a + bx
Fig-2: Linear graph of Y = a + bx
-
Linearity Determination
Graphic Form
Fig-3: Quadratic (nonlinear) graph
-
Linearity Determination
Linear with respect to variables and parameters:
$Y = \alpha + \beta X + \varepsilon$
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
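What matters for least squares is linearity in the parameters. A quadratic model is nonlinear in X, but it can still be fitted as a linear regression on the columns 1, X and X². The sketch below uses hypothetical noise-free data generated from Y = 2 + 0.5X + 3X², so the fitted coefficients should come back as those three values. It uses NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data generated exactly from Y = 2 + 0.5*X + 3*X^2 (no noise),
# so least squares should recover those three parameters.
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x + 3.0 * x ** 2

# Design matrix: a column of ones for the intercept, then X and X^2.
# The model is nonlinear in X but linear in the parameters, so OLS applies.
design = np.column_stack([np.ones_like(x), x, x ** 2])
coef, residuals, rank, _ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # approximately [2.0, 0.5, 3.0]
```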
-
Assumptions of the Classical Linear
Regression Model (CLRM)
1. The regression model is linear in the parameters: $Y = \alpha + \beta X + u$
2. X values are fixed in repeated sampling (X is assumed to be nonstochastic).
3. Zero mean value of the disturbance $u_i$, i.e. $E(u_i) = 0$
4. Homoscedasticity, or equal variance of $u_i$: $\mathrm{Var}(u_i) = \sigma^2$
-
Assumptions of the Classical Linear
Regression Model (CLRM), continued
5. No autocorrelation between the disturbances: the correlation between any two disturbances $u_i$ and $u_j$ ($i \neq j$) is zero, $\mathrm{Cov}(u_i, u_j) = 0$
6. Zero covariance between $u_i$ and $X_i$: $E(u_i X_i) = 0$
7. The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations n must be greater than the number of explanatory variables.
8. Variability in X values: the X values in a given sample must not all be the same, and Var(X) must be a finite positive number.
9. The regression model is correctly specified; that is, there is no specification bias or error.
10. There is no perfect multicollinearity; that is, there is no perfect linear relationship among the explanatory variables.
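Three of these assumptions (7, 8 and 10) concern only the observed X data, so they can be checked directly before running a regression. The rest involve the unobservable disturbances. The helper below, `check_data_assumptions`, is a hypothetical sketch of such a check using NumPy. It treats "no perfect multicollinearity" as full column rank of the design matrix (intercept column plus the X columns).

```python
import numpy as np


def check_data_assumptions(X: np.ndarray) -> dict:
    """Check CLRM assumptions 7, 8 and 10 on an n-by-k matrix of explanatory variables.

    Assumptions about the disturbances u_i cannot be verified from X alone.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept column + k explanatory variables
    return {
        # 7: more observations than parameters (k slopes + 1 intercept)
        "enough_observations": n > k + 1,
        # 8: every explanatory variable varies within the sample
        "x_varies": bool(np.all(X.var(axis=0) > 0)),
        # 10: no perfect multicollinearity <=> design matrix has full column rank
        "no_perfect_multicollinearity": bool(np.linalg.matrix_rank(design) == k + 1),
    }


X_ok = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]], dtype=float)
X_bad = np.column_stack([X_ok[:, 0], 2 * X_ok[:, 0]])  # second column is exactly 2 * first
print(check_data_assumptions(X_ok))   # all True
print(check_data_assumptions(X_bad))  # perfect multicollinearity detected
```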
-
Regression Analysis
The dependence of one variable on a single variable or on more than one variable is known as regression.
Simple Regression
The dependence of one variable on a single variable is known as simple regression.
Multiple Regression
The dependence of one variable on more than one variable is known as multiple regression.
-
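Regression Analysis
Simple Regression
1- Blood pressure (Y) depends on age (X)
$Y = \alpha + \beta X + \varepsilon$
2- CGPA of students (Y) depends on study hours (X)
$Y = \alpha + \beta X + \varepsilon$
3- Production of a certain crop (Y) depends on the amount of fertilizer used (X)
$Y = \alpha + \beta X + \varepsilon$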
-
Regression Analysis
$Y = \alpha + \beta X + \varepsilon$
$Y$: dependent variable
$X$: independent variable
$\alpha$: Y-intercept
$\beta$: slope of the line, also called the regression coefficient or rate of change
$\varepsilon$: residual term
-
Regression Analysis
Purpose of Regression Analysis
1. To find the rate of change of the dependent variable with respect to the independent variable(s)
2. To estimate the dependent variable on the basis of the independent variable(s)
-
Figure: Regression lines for different values of $\beta$ in $Y = \alpha + \beta X + \varepsilon$: (a) $\beta > 0$, (b) $\beta < 0$, (c) $\beta = 0$
-
Simple Regression Analysis
(Case-1)
Case: Blood pressure depends on age
Dependent variable: Blood pressure (B.P)
Independent variable: Age (X)
Model: $B.P = \alpha + \beta\,\text{Age} + \varepsilon$
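Least-squares estimates of the slope (b) and intercept (a):
$$b_{yx} = \frac{\frac{\sum xy}{n} - \left(\frac{\sum x}{n}\right)\left(\frac{\sum y}{n}\right)}{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
$$a = \bar{y} - b_{yx}\,\bar{x}$$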
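The slide's least-squares formulas for b and a translate directly into plain Python. The function name `simple_regression` and the example data are hypothetical. The example points lie exactly on y = 2 + 3x, so the formulas must return a = 2 and b = 3.

```python
def simple_regression(x, y):
    """Least-squares slope b and intercept a, using the sums-based formulas."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 observations")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    numerator = sum_xy / n - (sum_x / n) * (sum_y / n)
    denominator = sum_x2 / n - (sum_x / n) ** 2
    if denominator == 0:
        raise ValueError("all x values are equal (no variability in X)")
    b = numerator / denominator
    a = sum_y / n - b * (sum_x / n)  # a = y-bar - b * x-bar
    return a, b


a, b = simple_regression([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
print(a, b)  # 2.0 3.0
```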
-
Scatter Diagram
Figure: scatter diagram of B.P (vertical axis, 110 to 150) against AGE (horizontal axis, 10 to 80)
-
Simple Regression Analysis
(Case-1)
1. Click on Analyze
2. Click on Linear (under Regression)
3. Move the desired variables into the Dependent and Independent(s) boxes
4. Click on OK
-
Simple Regression Analysis
(Case-1)
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.965 .930 .926 2.178
ANOVA(b)
Model Sum of Squares df Mean Square F Sig.
Regression 1078.29 1 1078.29 227.268 0.00
Residual 80.658 17 4.745
Total 1158.97 18
a. Predictors: (Constant), Age
b. Dependent Variable: B.P
R Square (.930) shows the explanatory power of the model: age explains about 93% of the variation in B.P.
The p-value (Sig. = 0.00) suggests that the model is significant.
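Every Model Summary statistic can be rebuilt from the two ANOVA sums of squares, the number of observations n and the number of explanatory variables k. The helper `model_summary` below is a hypothetical sketch. It is checked against the Case-1 numbers (total df = 18, so n = 19 and k = 1). It also matches the Case-2 output later in this deck.

```python
import math


def model_summary(ss_reg, ss_res, n, k):
    """Recompute Model Summary / ANOVA statistics from the sums of squares.

    n = number of observations, k = number of explanatory variables.
    """
    ss_tot = ss_reg + ss_res
    df_reg, df_res = k, n - k - 1
    r2 = ss_reg / ss_tot
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
    return {
        "R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / df_res,
        "Std. Error of the Estimate": math.sqrt(ms_res),
        "F": ms_reg / ms_res,
    }


# Case-1: Regression SS = 1078.29, Residual SS = 80.658, n = 19, k = 1 (Age)
for name, value in model_summary(1078.29, 80.658, n=19, k=1).items():
    print(f"{name}: {value:.3f}")
```

The results agree with the SPSS table up to rounding (R ≈ 0.965, R Square ≈ .930, Adjusted R Square ≈ .926, Std. Error ≈ 2.178, F ≈ 227.27). The printed Total SS (1158.97) differs slightly from 1078.29 + 80.658 only because the table entries are rounded.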
-
Output tables
1- Model Summary table
2- ANOVA table
3- Coefficients table
-
Simple Regression Analysis
(Case-1)
Coefficients
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Term B Std. Error Beta t Sig.
(Constant) 112.216 1.401 80.097 0.00
Age 0.447 0.030 0.965 15.075 0.00
The p-value suggests that the explanatory variable (Age) is significant.
Estimated model:
B.P = 112.216 + 0.447 Age
On average, B.P increases by 0.447 for each additional year of age.
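The standardized Beta (0.965) equals R (0.965) in the Case-1 output. This is not a coincidence. With a single predictor, the standardized coefficient b·(s_x/s_y) equals the Pearson correlation r. The sketch below checks this with NumPy on hypothetical age and blood-pressure values, because the Case-1 data file is not reproduced here.

```python
import numpy as np

# Hypothetical ages and blood pressures (not the Case-1 data set)
age = np.array([25, 32, 41, 47, 53, 60, 68], dtype=float)
bp = np.array([121, 126, 130, 133, 136, 139, 143], dtype=float)

b, a = np.polyfit(age, bp, deg=1)            # unstandardized slope and intercept
beta = b * age.std(ddof=1) / bp.std(ddof=1)  # standardized coefficient (Beta)
r = np.corrcoef(age, bp)[0, 1]               # Pearson correlation between age and bp

print(round(b, 3), round(a, 3), round(beta, 4), round(r, 4))  # beta and r agree
```

The t column is B divided by its standard error. The printed t for Age (15.075) is slightly different from 0.447 / 0.030 only because B and Std. Error are rounded in the table.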
-
Practice: Case-1
An experiment was conducted to study the impact of heart rate (X) on anxiety (Y). The data relate to 12 normal adults and are given in the SPSS file.
Estimate the model:
$Y = \alpha + \beta X + \varepsilon$
-
Simple Regression Analysis
(Case-2)
In Case-2 our objective is to find the impact of a categorical (binary) explanatory variable on a quantitative dependent variable, and to show how the analysis is performed and how the findings are interpreted.
Dependent variable: Marks
Independent variable: Gender
-
Multiple Regression
1- Savings of a household (Y) depend on monthly income (X1), size of family (X2), and so on
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
2- CGPA of students (Y) depends on study hours (X1), IQ (X2), and so on
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
3- Production of a certain crop (Y) depends on the amount of fertilizer used (X1), water (X2), and so on
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
-
Simple Regression Analysis
(Case-2)
In regression analysis, whenever an explanatory variable is categorical, we introduce dummy variables.
Number of dummy variables = number of categories - 1
In Case-2 our explanatory variable is gender (male, female), which has two categories, so we introduce one dummy variable (D) with the following coding scheme:
Female = 1, Male = 0
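Why does a 0/1 dummy give an interpretable model? Regressing Y on an intercept and the dummy reproduces the group averages exactly. The intercept is the mean of the group coded 0 (male). The slope is the difference between the female and male means. The NumPy sketch below shows this on hypothetical marks, because the Case-2 data file is not reproduced here.

```python
import numpy as np

# Hypothetical marks and gender labels (not the Case-2 data set)
gender = ["male", "female", "male", "female", "male", "female", "female", "male"]
marks = np.array([70, 66, 68, 69, 65, 67, 70, 71], dtype=float)

# Coding scheme from the slide: Female = 1, Male = 0
d = np.array([1.0 if g == "female" else 0.0 for g in gender])

design = np.column_stack([np.ones_like(d), d])  # [intercept, D]
(intercept, slope), *_ = np.linalg.lstsq(design, marks, rcond=None)

male_mean = marks[d == 0].mean()
female_mean = marks[d == 1].mean()
print(intercept, male_mean)            # intercept = average mark of the male (reference) group
print(slope, female_mean - male_mean)  # slope = female average minus male average
```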
-
Simple Regression Analysis
(Case-2)
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.056 .003 -.007 5.484
ANOVA(b)
Model Sum of Squares df Mean Square F Sig.
Regression 9.213 1 9.213 0.306 0.58
Residual 2947.547 98 30.077
Total 2956.760 99
a. Predictors: (Constant), gender
b. Dependent Variable: Marks
R Square (.003) shows the explanatory power of the model: gender explains only about 0.3% of the variation in marks.
The p-value (Sig. = 0.58) suggests that the model is not significant.
-
Simple Regression Analysis
(Case-2)
Coefficients
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Term B Std. Error Beta t Sig.
(Constant) 68.455 0.739 92.569 0.00
gender -0.610 1.102 -0.056 -0.553 0.581
The p-value (Sig. = 0.581) suggests that the explanatory variable (gender) is not significant.
Estimated model:
Marks = 68.455 - 0.610 gender
-
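Simple Regression Analysis
(Case-2)
Marks = 68.455 - 0.610 gender
Average marks of male students (gender = 0):
Marks = 68.455 - 0.610(0) = 68.455 (1)
Average marks of female students (gender = 1):
Marks = 68.455 - 0.610(1) = 67.845 (2)
The difference (2) - (1) = -0.610: on average, female students score 0.610 marks less than male students, but this difference is not significant (Sig. = 0.581).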
-
Practice: Case-2
Use the Case2.sav data file to determine the impact of gender on the salary of employees of an organization.
Dependent variable: Salary
Independent variable: Gender
Estimate the model:
$Y = \alpha + \beta X + \varepsilon$
-
Simple Regression Analysis
(Case-3)
In Case-3 our objective is to find the impact of a multi-category explanatory variable on a quantitative dependent variable, and to show how the analysis is performed and how the findings are interpreted.
File used: Case3.sav
Dependent variable: Salary
Independent variable: Job category
-
Simple Regression Analysis
(Case-3)
In Case-3 our explanatory variable is job category, which has three categories (Clerical, Custodial, Manager).
We make two dummy variables (number of categories - 1), taking one category as the reference (base) category:
D1: clerical = 1, otherwise = 0
D2: custodial = 1, otherwise = 0
Manager is the reference category.
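The coding scheme above can be sketched as a small Python function. The name `job_dummies` and the lower-case category spellings are hypothetical. The reference category (manager) is the one coded (0, 0), so it gets no dummy of its own.

```python
def job_dummies(category: str) -> tuple[int, int]:
    """Return (D1, D2) for one employee; 'manager' is the reference category."""
    category = category.strip().lower()
    if category not in {"clerical", "custodial", "manager"}:
        raise ValueError(f"unknown job category: {category!r}")
    d1 = 1 if category == "clerical" else 0   # D1: clerical = 1, otherwise 0
    d2 = 1 if category == "custodial" else 0  # D2: custodial = 1, otherwise 0
    return d1, d2


for job in ["Clerical", "Custodial", "Manager"]:
    print(job, job_dummies(job))
# Clerical (1, 0)
# Custodial (0, 1)
# Manager (0, 0)
```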
-
Simple Regression Analysis
(Case-3)
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 0.805 .649 .647 10144.651
ANOVA(b)
Model Sum of Squares df Mean Square F Sig.
Regression 8.943E10 2 4.472E10 434.502 0.00
Residual 4.847E10 471 1.029E8
Total 1.379E11 473
a. Predictors: (Constant), D1, D2
b. Dependent Variable: Salary
R Square (.649) shows the explanatory power of the model: job category explains about 65% of the variation in salary.
The p-value (Sig. = 0.00) suggests that the model is significant.
-
Simple Regression Analysis
(Case-3)
Coefficients
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Term B Std. Error Beta t Sig.
(Constant) 63977.798 1106.872 57.801 0.00
D1 -36138.018 1228.281 -0.897 -29.422 0.00
D2 -33038.909 2244.280 -0.449 -14.721 0.00
The p-values suggest that the explanatory variables D1 and D2 are significant.
Estimated model: Salary = 63978 - 36138 D1 - 33039 D2
-
Simple Regression Analysis
(Case-3)
Estimated model:
Salary = 63978 - 36138 D1 - 33039 D2
Average salary of clerks: 63978 - 36138(1) - 33039(0) = 27840
Average salary of custodians: 63978 - 36138(0) - 33039(1) = 30939
Average salary of managers: 63978 - 36138(0) - 33039(0) = 63978
Figure: Average salary of different job categories (bar chart): Clerical 27840, Custodian 30939, Manager 63978
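The average salaries follow mechanically from the Coefficients table. Plug each category's dummy codes into the estimated model. The short Python sketch below does this with the unrounded B values from the Case-3 output and rounds the results to whole units for comparison with the bar chart.

```python
# Unstandardized coefficients (B) from the Case-3 Coefficients table
b0, b1, b2 = 63977.798, -36138.018, -33038.909

# Dummy codes (D1, D2) for each job category; manager is the reference category
codes = {"Clerical": (1, 0), "Custodial": (0, 1), "Manager": (0, 0)}

means = {job: b0 + b1 * d1 + b2 * d2 for job, (d1, d2) in codes.items()}
for job, salary in means.items():
    print(f"{job}: {round(salary)}")
# Clerical: 27840
# Custodial: 30939
# Manager: 63978
```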
-
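Multiple Regression
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
$Y$: dependent variable
$X_1, X_2, \ldots, X_k$: independent variables
$\alpha$: intercept
$\beta_1, \beta_2, \ldots, \beta_k$: regression coefficients
$\varepsilon$: residual term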
-
Multiple Regression
To find the impact of age and weight on blood pressure, a random sample of 20 patients is collected and analyzed:
$BP = \alpha + \beta_1\,\text{Age} + \beta_2\,\text{Weight} + \varepsilon$
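SPSS estimates α, β1 and β2 by ordinary least squares. The same fit can be sketched with NumPy's `numpy.linalg.lstsq`. The patient values below are hypothetical, and BP is generated exactly as -10 + 0.7·Age + 1.0·Weight with no noise, so the fit should recover those three numbers. These are not the 20-patient results shown in the next tables.

```python
import numpy as np

# Hypothetical patients (not the 20-patient data set)
age = np.array([45, 50, 38, 62, 55, 48, 70, 41], dtype=float)
weight = np.array([80, 92, 75, 88, 95, 70, 85, 78], dtype=float)
bp = -10 + 0.7 * age + 1.0 * weight  # noise-free, so OLS recovers the parameters

design = np.column_stack([np.ones_like(age), age, weight])  # columns: 1, Age, Weight
(alpha, beta1, beta2), *_ = np.linalg.lstsq(design, bp, rcond=None)
print(alpha, beta1, beta2)  # approximately -10, 0.7, 1.0
```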
-
Multiple Regression
1. Click on Analyze
2. Click on Linear (under Regression)
3. Move the desired variables into the Dependent and Independent(s) boxes
4. Click on OK
-
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 1.00 0.99 0.99 0.53
ANOVA
Model Sum of Squares df Mean Square F Sig.
Regression 555.18 2 277.59 978.25 0.00
Residual 4.82 17 0.28
Total 560.00 19
R Square (0.99) shows the explanatory power of the model: age and weight together explain about 99% of the variation in BP.
The p-value (Sig. = 0.00) suggests that the model is significant.
-
Coefficients
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Term B Std. Error Beta t Sig.
(Constant) -16.58 3.01 -5.51 0.00
Age 0.71 0.05 0.33 13.23 0.00
Weight 1.03 0.03 0.82 33.15 0.00
The p-values suggest that both explanatory variables (Age and Weight) are significant.
Estimated model:
BP = -16.58 + 0.71 Age + 1.03 Weight
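Once the model is estimated, it can be used to predict BP for a new patient. The function below simply encodes the estimated equation. The patient (age 50, weight 90 in the data set's units) is hypothetical.

```python
def predict_bp(age: float, weight: float) -> float:
    """Predicted BP from the estimated model BP = -16.58 + 0.71*Age + 1.03*Weight."""
    return -16.58 + 0.71 * age + 1.03 * weight


# Hypothetical patient: age 50, weight 90
print(round(predict_bp(50, 90), 2))  # 111.62
```

Each coefficient is a partial effect. Holding weight fixed, one extra year of age raises predicted BP by 0.71. Holding age fixed, one extra unit of weight raises it by 1.03.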
-
Practice
The data given in the SPSS file were collected from a simple random sample of 20 hypertensive patients.
Y = mean arterial blood pressure (mmHg)
X1 = age (years)
X2 = weight (kg)
X3 = body surface area (sq m)
X4 = duration of hypertension (years)
X5 = basal pulse (beats/min)
X6 = measure of stress