Regression Analysis (Cases 1-3)

Upload: saeedawais47

Post on 03-Jun-2018


  • 8/12/2019 Regression Analysis(Cases 1-3)

    1/42

    Regression Analysis

    Presenter: Muhammad Akram Naseem ([email protected])

    Research Centre for Training and Development (RCTD)

    3/10/2014, Unlock the Potential of Data Analysis

    Model Building

    Model: a mathematical way of expressing a theory is known as a model.

    Types of models:

    1. Exact models (mathematical models)
    2. Inexact models (statistical models)

    Exact Models (Mathematical Models)

    An expression in which the output can be determined exactly from the input(s) is known as an exact model, e.g.

    Chemical formula of water: H2O
    Chemical formula of glucose: C6H12O6
    Area of a circle: $A = \pi r^2$, where r is the radius

    Inexact Models (Statistical Models)

    Expressions in which the output cannot be determined exactly from the input(s) are known as statistical models, e.g.

    1. The fertility of land cannot be determined exactly by the amount of rainfall alone.
    2. Students' CGPA (marks) cannot be determined exactly by their study hours.
    3. Ice-cream sales cannot be determined exactly by the daily temperature alone.
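A minimal Python sketch of the distinction, assuming a made-up ice-cream model (the intercept 50, slope 4 and disturbance spread of 10 are hypothetical, not taken from any data set): the exact model returns the same output every time for a given input, while the statistical model adds a random disturbance u, so repeated calls with the same temperature give different sales.

```python
import math
import random


def circle_area(radius: float) -> float:
    """Exact model: the output is fully determined by the input."""
    return math.pi * radius ** 2


def ice_cream_sales(temperature: float, rng: random.Random) -> float:
    """Statistical model with hypothetical coefficients: a systematic part
    alpha + beta * temperature plus a random disturbance u."""
    alpha, beta = 50.0, 4.0        # hypothetical intercept and slope
    u = rng.gauss(0.0, 10.0)       # disturbance: mean 0, standard deviation 10
    return alpha + beta * temperature + u


rng = random.Random(1)
print(circle_area(2.0), circle_area(2.0))                      # identical values
print(ice_cream_sales(30.0, rng), ice_cream_sales(30.0, rng))  # two different values
```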

    Different Forms of Statistical Models

    Linear models:
    1. Simple linear models
    2. Multiple linear models

    Non-linear models:
    1. Polynomial models
    2. Reciprocal models
    3. Logarithmic models

    Linearity Determination: Graphic Form

    [Fig-1: straight line Y = a + bX, X from 1 to 10, Y axis from -6 to 0]

    [Fig-2: straight line Y = a + bX, X from 1 to 10, Y axis from 0 to 8]

    [Fig-3: quadratic curve, X from 1 to 20, Y axis from -300 to 600]

    Linearity Determination

    Linear with respect to both variables and parameters:

    Simple: $Y = \alpha + \beta X + u$

    Multiple: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$

    Assumptions of the Classical Linear Regression Model (CLRM)

    1. The regression model is linear in the parameters: $Y = \alpha + \beta X + u$.
    2. X values are fixed in repeated sampling (X is assumed to be nonstochastic).
    3. Zero mean value of the disturbance $u_i$: $E(u_i) = 0$.
    4. Homoscedasticity, or equal variance of $u_i$: $\operatorname{Var}(u_i) = \sigma^2$.

    Assumptions of the CLRM (continued)

    5. No autocorrelation between the disturbances: for any two disturbances $u_i$ and $u_j$ ($i \neq j$) the correlation is zero, $\operatorname{Cov}(u_i, u_j) = 0$.
    6. Zero covariance between $u_i$ and $X_i$: $\operatorname{Cov}(u_i, X_i) = E(u_i X_i) = 0$.
    7. The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations n must be greater than the number of explanatory variables.
    8. Variability in X values: the X values in a given sample must not all be the same, so $\operatorname{Var}(X)$ must be a finite positive number.
    9. The regression model is correctly specified; alternatively, there is no specification bias or error.
    10. There is no perfect multicollinearity, that is, no perfect linear relationship among the explanatory variables.
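Assumptions 7, 8 and 10 concern the sample itself, so they can be checked directly before fitting; the assumptions about the disturbances (3 to 6) cannot be verified this way because u is unobserved. A minimal NumPy sketch, where the function name `check_design` and the example ages and weights are hypothetical:

```python
import numpy as np


def check_design(X: np.ndarray) -> dict:
    """Check CLRM assumptions 7, 8 and 10 for an n-by-k matrix X of
    explanatory variables (an intercept column is added here)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])   # columns: 1, X1, ..., Xk
    n_params = k + 1                            # alpha plus k slopes
    return {
        "n_exceeds_params": n > n_params,                          # assumption 7
        "x_varies": bool(np.all(X.var(axis=0) > 0)),               # assumption 8
        "no_perfect_multicollinearity":
            bool(np.linalg.matrix_rank(design) == n_params),       # assumption 10
    }


age = np.array([30.0, 40.0, 50.0, 60.0, 70.0])      # hypothetical ages
weight = np.array([70.0, 80.0, 75.0, 90.0, 85.0])   # hypothetical weights

print(check_design(np.column_stack([age, weight])))      # all True
print(check_design(np.column_stack([age, 2.0 * age])))   # multicollinearity check fails
```

The rank test is used because a perfect linear relationship among the columns (here the second column is exactly twice the first) makes the design matrix lose rank, which is precisely what assumption 10 rules out.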

    Regression Analysis

    The dependence of one variable on one or more other variables is known as regression.

    Simple regression: the dependence of one variable on a single other variable.

    Multiple regression: the dependence of one variable on more than one other variable.

    Regression Analysis: Simple Regression

    1. Blood pressure (Y) depends on age (X): $Y = \alpha + \beta X + u$
    2. Students' CGPA (Y) depends on study hours (X): $Y = \alpha + \beta X + u$
    3. Production of a certain crop (Y) depends on the amount of fertilizer used (X): $Y = \alpha + \beta X + u$

    Regression Analysis

    $Y = \alpha + \beta X + u$, where

    Y: dependent variable
    X: independent variable
    α: Y-intercept
    β: slope of the line, also called the regression coefficient or rate of change
    u: residual term

    Regression Analysis

    Purpose of regression analysis:

    1. To find the rate of change
    2. To estimate the dependent variable on the basis of the independent variable(s)

    [Figure: regression lines $Y = \alpha + \beta X + u$ for different values of β: (a) β > 0, (b) β < 0, (c) β = 0]

    Simple Regression Analysis (Case-1)

    Case: blood pressure depends on age.

    Dependent variable: blood pressure (B.P)
    Independent variable: age (x)
    Model: $\text{B.P} = \alpha + \beta\,\text{Age} + u$

    The estimates $b$ of β and $a$ of α are

    $$b_{yx} = \frac{\frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n}}{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}, \qquad a = \bar{y} - b_{yx}\,\bar{x}$$
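The two formulas translate directly into code. A minimal plain-Python sketch, using a small hypothetical data set rather than the B.P file (which is not reproduced here); `simple_ols` is an illustrative name:

```python
def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (a, b_yx) using the formulas above:
    b = (sum(xy)/n - mean(x)*mean(y)) / (sum(x^2)/n - mean(x)^2),  a = mean(y) - b*mean(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    mean_x2 = sum(xi * xi for xi in x) / n
    denominator = mean_x2 - mean_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal (CLRM assumption 8 fails)")
    b = (mean_xy - mean_x * mean_y) / denominator
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The guard on the denominator mirrors assumption 8: if every x is the same, the slope is undefined.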

    Scatter Diagram

    [Figure: scatter plot of B.P (vertical axis, 110 to 150) against AGE (horizontal axis, 10 to 80)]

    Simple Regression Analysis (Case-1)

    Steps in SPSS:

    1. Click on Analyze, then Regression.
    2. Click on Linear.
    3. Move the desired variables into the Dependent and Independent(s) boxes.
    4. Click on OK.

    Simple Regression Analysis (Case-1)

    Model Summary
    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       0.965   .930       .926                2.178

    ANOVA
    Model        Sum of Squares   df   Mean Square   F         Sig.
    Regression   1078.29          1    1078.29       227.268   0.00
    Residual     80.658           17   4.745
    Total        1158.97          18
    a. Predictors: (Constant), Age
    b. Dependent Variable: B.P

    R Square gives the explanatory power of the model: age explains about 93% of the variation in B.P.
    The p-value (Sig. = 0.00) suggests that the model is significant.
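Every entry in these two tables except Sig. follows from the two sums of squares, the sample size and the number of predictors. A minimal Python sketch (the function name is illustrative) reproduces the Case-1 values, taking n = 19 from the total df of 18 and k = 1 predictor:

```python
import math


def regression_summary(ss_reg: float, ss_res: float, n: int, k: int) -> dict:
    """Rebuild Model Summary / ANOVA statistics from the sums of squares.
    n = number of observations, k = number of predictors (excluding the constant)."""
    ss_tot = ss_reg + ss_res
    r2 = ss_reg / ss_tot                              # explanatory power
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # penalised for the number of predictors
    ms_reg = ss_reg / k                               # regression df = k
    ms_res = ss_res / (n - k - 1)                     # residual df = n - k - 1
    return {
        "R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": adj_r2,
        "F": ms_reg / ms_res,
        "Std. Error of the Estimate": math.sqrt(ms_res),
    }


case1 = regression_summary(ss_reg=1078.29, ss_res=80.658, n=19, k=1)
for name, value in case1.items():
    print(f"{name}: {value:.3f}")
```

The results agree with the tables to rounding. With the multiple regression figures (555.18, 4.82, n = 20, k = 2) the same function gives R Square of about 0.99 and a standard error of about 0.53; F comes out near 979 rather than 978.25, presumably because the printed sums of squares are rounded.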

    Output tables

    1. Model Summary table
    2. ANOVA table
    3. Coefficients table

    Simple Regression Analysis (Case-1)

    Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
                 B          Std. Error         Beta                        t        Sig.
    (Constant)   112.216    1.401                                          80.097   0.00
    Age          0.447      0.030              0.965                       15.075   0.00

    The p-value (Sig. = 0.00) suggests that the explanatory variable is significant.

    Estimated model: B.P = 112.216 + 0.447 Age
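Once estimated, the model serves the two purposes of regression analysis: the slope is the rate of change, and the equation estimates B.P for a given age. A minimal sketch using the coefficients above; `estimate_bp` is an illustrative name and the age of 50 is an arbitrary value inside the observed age range (about 10 to 80):

```python
def estimate_bp(age: float) -> float:
    """Estimated B.P from the fitted Case-1 model B.P = 112.216 + 0.447 Age."""
    return 112.216 + 0.447 * age


print(estimate_bp(50))                    # about 134.566
print(estimate_bp(51) - estimate_bp(50))  # about 0.447: change in B.P per extra year of age
```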

    Practice: Case-1

    An experiment was conducted to study the impact of heart rate (X) on anxiety (Y). The data relate to 12 normal adults and are given in the SPSS file.

    Estimate the model $Y = \alpha + \beta X + u$.

    Simple Regression Analysis (Case-2)

    In Case 2 our objective is to learn the impact of a categorical (binary) explanatory variable on a quantitative dependent variable: how the analysis is performed and how the findings are interpreted.

    Dependent variable: Marks
    Independent variable: Gender

    Multiple Regression

    1. Household saving (Y) depends on monthly income (X1), family size (X2) and so on:
       $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$
    2. Students' CGPA (Y) depends on study hours (X1), IQ (X2) and so on:
       $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$
    3. Production of a certain crop (Y) depends on the amount of fertilizer used (X1), water (X2) and so on:
       $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$

    Simple Regression Analysis (Case-2)

    In regression analysis, whenever an explanatory variable is categorical, we introduce dummy variables:

    Number of dummy variables = number of categories - 1

    In Case 2 the explanatory variable is gender (male, female), which has two categories, so we introduce one dummy variable (D) with the following coding scheme:

    Female = 1, Male = 0

    Simple Regression Analysis (Case-2)

    Model Summary
    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       0.056   .003       -.007               5.484

    ANOVA
    Model        Sum of Squares   df   Mean Square   F       Sig.
    Regression   9.213            1    9.213         0.306   0.58
    Residual     2947.547         98   30.077
    Total        2956.760         99
    a. Predictors: (Constant), gender
    b. Dependent Variable: Marks

    R Square gives the explanatory power of the model: gender explains only about 0.3% of the variation in marks.
    The p-value (Sig. = 0.58) suggests that the model is not significant.

    Simple Regression Analysis (Case-2)

    Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
                 B          Std. Error         Beta                        t        Sig.
    (Constant)   68.455     0.739                                          92.569   0.00
    gender       -0.610     1.102              -0.056                      -0.553   0.581

    The p-value (Sig. = 0.581) suggests that the explanatory variable is not significant.

    Estimated model: Marks = 68.455 - 0.610 gender

    Simple Regression Analysis (Case-2)

    Marks = 68.455 - 0.610 gender

    Average marks of male students (gender = 0):
    Marks = 68.455 - 0.610(0) = 68.455   (1)

    Average marks of female students (gender = 1):
    Marks = 68.455 - 0.610(1) = 67.845   (2)

    The difference (2) - (1) = -0.610: the gender coefficient is the difference between the average marks of female and male students.
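The same interpretation holds for any least-squares fit on a 0/1 dummy: the intercept equals the mean of the category coded 0 and the slope equals the difference between the two category means. A minimal plain-Python check on six hypothetical students (the marks are made up for illustration, and `fit_dummy` is an illustrative name):

```python
from statistics import mean


def fit_dummy(d: list[int], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope of y on a 0/1 dummy d
    (deviation form, algebraically equivalent to the Case-1 formulas)."""
    d_bar, y_bar = mean(d), mean(y)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    b = sxy / sxx
    a = y_bar - b * d_bar
    return a, b


# Hypothetical students; coding scheme from Case 2: female = 1, male = 0
gender = ["male", "male", "male", "female", "female", "female"]
marks = [70, 66, 68, 67, 65, 69]
d = [1 if g == "female" else 0 for g in gender]

a, b = fit_dummy(d, marks)
print(a, b)  # intercept = male mean (68), slope = female mean - male mean (-1)
```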

    Practice: Case 2

    Use the Case 2.sav data file to determine the impact of gender on the salary of the employees of an organization.

    Dependent variable: Salary
    Independent variable: Gender

    Estimate the model $Y = \alpha + \beta X + u$.

    Simple Regression Analysis (Case-3)

    In Case 3 our objective is to learn the impact of a multi-categorical explanatory variable on a quantitative dependent variable: how the analysis is performed and how the findings are interpreted.

    File used: Case3.sav
    Dependent variable: Salary
    Independent variable: Job category

    Simple Regression Analysis (Case-3)

    In Case 3 the explanatory variable is job category, which has three categories (clerical, custodial, manager).

    We create two dummy variables, taking one category as the reference (base) category:

    D1: clerical = 1, otherwise = 0
    D2: custodial = 1, otherwise = 0
    Manager is the reference category (D1 = D2 = 0).
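The coding rule (number of dummies = number of categories - 1, with one category left out as the reference) can be sketched in plain Python; `make_dummies` and the four example rows are hypothetical, not taken from Case3.sav:

```python
def make_dummies(values: list[str], categories: list[str], reference: str) -> dict[str, list[int]]:
    """One 0/1 column per non-reference category, so k categories give k - 1 dummies."""
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference}")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


jobs = ["clerical", "manager", "custodial", "clerical"]   # hypothetical rows
dummies = make_dummies(jobs, ["clerical", "custodial", "manager"], reference="manager")
D1, D2 = dummies["clerical"], dummies["custodial"]
print(D1)  # [1, 0, 0, 1]
print(D2)  # [0, 0, 1, 0]
```

Leaving the reference category out avoids perfect multicollinearity: with a dummy for every category the dummies would always sum to 1, duplicating the constant.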

    Simple Regression Analysis (Case-3)

    Model Summary
    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       0.805   .649       .647                10144.651

    ANOVA
    Model        Sum of Squares   df    Mean Square   F         Sig.
    Regression   8.943E10         2     4.472E10      434.502   0.00
    Residual     4.847E10         471   1.029E8
    Total        1.379E11         473
    a. Predictors: (Constant), D1, D2
    b. Dependent Variable: Salary

    R Square gives the explanatory power of the model: job category explains about 65% of the variation in salary.
    The p-value (Sig. = 0.00) suggests that the model is significant.

    Simple Regression Analysis (Case-3)

    Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
                 B            Std. Error       Beta                        t         Sig.
    (Constant)   63977.798    1106.872                                     57.801    0.00
    D1           -36138.018   1228.281         -0.897                      -29.422   0.00
    D2           -33038.909   2244.280         -0.449                      -14.721   0.00

    The p-values suggest that the explanatory variables D1 and D2 are significant.

    Estimated model: Salary = 63977 - 36138 D1 - 33038 D2

    Simple Regression Analysis (Case-3)

    Estimated model: Salary = 63977 - 36138 D1 - 33038 D2

    Average salary of clerks:            63977 - 36138(1) - 33038(0) = 27839
    Average salary of custodial staff:   63977 - 36138(0) - 33038(1) = 30939
    Average salary of managers:          63977 - 36138(0) - 33038(0) = 63977

    [Bar chart: Average salary of different job categories: Clerical 27840, Custodian 30939, Manager 63978 (bar labels computed from the unrounded coefficients)]
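The same arithmetic in Python: a minimal sketch that plugs each category's dummy codes into the estimated (truncated) coefficients above; the constant names and the dictionary layout are illustrative:

```python
# Estimated Case-3 coefficients (truncated to whole units, as in the estimated model)
INTERCEPT, B_D1, B_D2 = 63977, -36138, -33038

# Dummy codes (D1, D2) for each job category; manager is the reference
CODES = {"clerical": (1, 0), "custodial": (0, 1), "manager": (0, 0)}


def average_salary(d1: int, d2: int) -> int:
    """Salary = 63977 - 36138 D1 - 33038 D2 evaluated at the given codes."""
    return INTERCEPT + B_D1 * d1 + B_D2 * d2


averages = {job: average_salary(*codes) for job, codes in CODES.items()}
print(averages)  # {'clerical': 27839, 'custodial': 30939, 'manager': 63977}
```

The reference category's average is the intercept itself, and each dummy coefficient is the gap between that category's average and the managers' average.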

    Multiple Regression

    $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$, where

    Y: dependent variable
    X1, ..., Xk: independent variables
    α: intercept
    β1, ..., βk: regression coefficients
    u: residual term

    Multiple Regression

    To study the impact of age and weight on blood pressure, a random sample of 20 patients is collected and analyzed:

    $\text{BP} = \alpha + \beta_1\,\text{Age} + \beta_2\,\text{Weight} + u$
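SPSS estimates this model by least squares. As an illustration outside SPSS, the same kind of estimates can be obtained with NumPy's `np.linalg.lstsq` on a design matrix that has a column of ones for α. In this minimal sketch the five patients are hypothetical and were constructed to satisfy BP = 5 + 0.5 Age + 1.0 Weight exactly, so the fit should recover those coefficients; `fit_multiple` is an illustrative name:

```python
import numpy as np


def fit_multiple(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimates [alpha, beta_1, ..., beta_k] for y = alpha + X @ beta + u."""
    design = np.column_stack([np.ones(len(y)), X])      # column of ones for alpha
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)   # also returns residuals, rank, s
    return coef


# Hypothetical patients built to satisfy BP = 5 + 0.5*Age + 1.0*Weight exactly
age = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
weight = np.array([70.0, 80.0, 75.0, 90.0, 85.0])
bp = 5.0 + 0.5 * age + 1.0 * weight

print(fit_multiple(np.column_stack([age, weight]), bp))  # approximately [5.0, 0.5, 1.0]
```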

    Multiple Regression

    Steps in SPSS:

    1. Click on Analyze, then Regression.
    2. Click on Linear.
    3. Move the desired variables into the Dependent and Independent(s) boxes (here BP as dependent; Age and Weight as independents).
    4. Click on OK.

    Model Summary
    Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
    1       1.00   0.99       0.99                0.53

    ANOVA
    Model        Sum of Squares   df   Mean Square   F        Sig.
    Regression   555.18           2    277.59        978.25   0.00
    Residual     4.82             17   0.28
    Total        560              19

    R Square gives the explanatory power of the model: age and weight together explain about 99% of the variation in BP.
    The p-value (Sig. = 0.00) suggests that the model is significant.

    Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
                 B         Std. Error          Beta                        t       Sig.
    (Constant)   -16.58    3.01                                            -5.51   0.00
    Age          0.71      0.05                0.33                        13.23   0.00
    Weight       1.03      0.03                0.82                        33.15   0.00

    The p-values suggest that both explanatory variables are significant.

    Estimated model: BP = -16.58 + 0.71 Age + 1.03 Weight
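In a multiple regression each coefficient is a rate of change with the other variables held fixed: with age unchanged, one extra unit of weight raises the estimated BP by 1.03. A minimal sketch using the estimated model above; `estimate_bp` is an illustrative name and the patient values (age 50, weight 90) are hypothetical:

```python
def estimate_bp(age: float, weight: float) -> float:
    """Estimated BP from the fitted model BP = -16.58 + 0.71 Age + 1.03 Weight."""
    return -16.58 + 0.71 * age + 1.03 * weight


print(estimate_bp(50, 90))                        # about 111.62
print(estimate_bp(50, 91) - estimate_bp(50, 90))  # about 1.03: weight effect, age held fixed
```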

    Practice

    The data given in the SPSS file were collected using a simple random sample of 20 hypertensive patients.

    Y = mean arterial blood pressure (mmHg)
    X1 = age (years)
    X2 = weight (kg)
    X3 = body surface area (m²)
    X4 = duration of hypertension (years)
    X5 = basal pulse (beats/min)
    X6 = measure of stress
