Upload: serban-zodian

Post on 04-Apr-2018


    UNICEF Workshop on Global Study, 18th to 28th August 2008

    Centre for Global Health, Population, Poverty and Policy (GHP3)

    Fitting Linear Regression in SPSS and Output Interpretation

    Tuesday 26th August

    The aim of this workshop is to introduce you to fitting linear regression in SPSS. It
    uses the DHS from Ghana, although the techniques shown are the same for all datasets.
    This worksheet and the data associated with the workshop are all available on the
    course website.

    At the end of this session you should be able to:

    - fit a simple linear regression model in SPSS
    - understand how to create dummy variables for use in linear regression with
      categorical explanatory variables
    - interpret output from linear regression analyses

    1. Simple Linear Regression: Continuous Explanatory Variables

    First of all, download the dataset from the course website at
    www.southampton.ac.uk/socsci/ghp3/course/material.html to your desktop. The
    dataset that will be used for this session is the same as for Computer Workshop 3. It
    is a reduced version of the Ghana DHS 2003, with a line for each child aged under 5
    years old in the selected households.

    Open SPSS in the usual way and open up the dataset.

    In the first part of the workshop we will be looking at the relationship between birth
    weight and weight-for-age z-score. The hypothesis is that the lower the birth weight,
    the lower the weight-for-age z-score against the reference population. We will start
    with some data manipulation, followed by exploratory analyses, and then move on to
    the simple linear regression.

    It is always extremely important to get a feel for the data before you rush headlong
    into some complicated statistical analysis.


    1. Select Analyze | Descriptive Statistics | Explore. The following dialogue
       box should appear.

    2. Transfer the variable Wt/A Standard deviation to the Dependent List box
       by clicking the right arrow next to the box, and then click on the OK button.

    3. The output will appear in the right-hand pane of the Output Viewer window.
       Scroll through this output carefully and note what SPSS has produced. The
       default output will include the mean and standard deviation for your data, a
       95% confidence interval for the population mean, a stem-and-leaf plot and a
       boxplot. The stem-and-leaf plot is useful as it enables us to see whether the
       distribution of our response variable (weight-for-age) is highly skewed or not;
       in this case it is not! However, it is clear from the boxplot that there are
       some strange values, with a score of about 1000.

    4. There are a number of children who have had their weight-for-age flagged.
       This is because the values for weight-for-age for those children are outside
       acceptable ranges: the measurement for height may have been incorrect.
       These are coded as 9998, but are included in the analysis at the moment. We
       need to change this (and while we are doing this we will change other
       variables like this as well).


    5. Go to Transform | Recode into Same Variables and recode values 9996 and
       9998 into System-missing for Height-for-age, Weight-for-age,
       Weight-for-height and birth weight. If you have forgotten how to recode
       variables please ask.

    6. Rerun the Explore command and study the results again. The results have
       changed by a large amount.
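
The effect of steps 5 and 6 can be illustrated outside SPSS. The sketch below is plain Python rather than SPSS syntax, and the scores in it are made up for illustration (they are not taken from the Ghana DHS file). Only the flag codes 9996 and 9998 come from the worksheet.

```python
from statistics import mean

# Flag codes that the worksheet recodes to System-missing.
FLAG_CODES = {9996, 9998}


def recode_flags_to_missing(values, flag_codes=FLAG_CODES):
    """Return a copy of values with flagged codes replaced by None (missing)."""
    return [None if v in flag_codes else v for v in values]


def mean_ignoring_missing(values):
    """Mean of the non-missing values only."""
    return mean([v for v in values if v is not None])


# Hypothetical weight-for-age scores; two children carry the 9998 flag.
raw_scores = [-150, -90, 9998, -120, 30, 9998, -200]

clean_scores = recode_flags_to_missing(raw_scores)

mean_before = mean(raw_scores)                     # flags treated as real scores
mean_after = mean_ignoring_missing(clean_scores)   # flags excluded

print(f"Mean with flags included: {mean_before:.1f}")
print(f"Mean after recoding:      {mean_after:.1f}")
```

Even two flagged values are enough to drag the mean far away from the plausible range, which is why the Explore output changes so much once they are set to missing.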

    7. We can investigate the relationship between weight-for-age and birth weight
       by looking at the correlation between the two variables. Correlation is usually
       calculated between two continuous variables. A correlation of 1 indicates
       perfect positive correlation (as one variable increases, the other increases
       by an exactly proportional amount), while a correlation of -1 indicates perfect
       negative correlation (as one variable increases, the other decreases by an
       exactly proportional amount). A correlation of 0 indicates no linear
       relationship between the two variables.

    - Go to Analyze | Correlate | Bivariate. The following box appears.


    - Place Wt/A Standard Deviations and Birth weight in the right-hand
      Variables box, as shown above. Click OK. The following table is
      produced in the output.

    Correlations

                                                        Wt/A Standard    Birth weight
                                                        deviations       (kilos - 3 dec.)
    Wt/A Standard deviations       Pearson Correlation  1.000            .109**
                                   Sig. (2-tailed)                       .002
                                   N                    3094.000         837
    Birth weight (kilos - 3 dec.)  Pearson Correlation  .109**           1.000
                                   Sig. (2-tailed)      .002
                                   N                    837              974.000

    **. Correlation is significant at the 0.01 level (2-tailed).

    - The correlation between Weight-for-age Standard deviation and birth
      weight is 0.109. This is not that high, but the p-value (in the Sig.
      (2-tailed) row) is 0.002. This is below 0.05 (for a 5% test) and thus is
      significant at the 5% level. Thus there is evidence of a weak positive
      linear relationship between the two variables. Also note that the number of
      children included in this correlation is only 837. Many children do not have
      a recorded birth weight, and some do not have a weight-for-age (the children
      without a weight-for-age include those who have died between birth and the
      survey).
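
To make the calculation behind the Correlations table concrete, here is a minimal Python sketch (not SPSS syntax). It assumes that a pair is used only when both values are present, which is why N can fall from 3094 to 837. The function names and the small dataset are hypothetical; only r = 0.109 and N = 837 are taken from the table above.

```python
import math


def pearson_pairwise(xs, ys):
    """Pearson r using only the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


def t_statistic_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data: None marks a missing value.
birth_weight = [2500, 3000, None, 3500, 4000, 3200]
wfa_score = [-180, -120, -90, None, -60, -100]

r, n_pairs = pearson_pairwise(birth_weight, wfa_score)
print(f"r = {r:.3f} from {n_pairs} complete pairs")

# Values reported in the worksheet's Correlations table.
t = t_statistic_for_r(0.109, 837)
print(f"t = {t:.2f}")
```

With the rounded correlation of 0.109 the t statistic comes out at about 3.17, well above 1.96, which is consistent with the small p-value SPSS reports.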

    8. It is now time for the simple linear regression. Select Analyze | Regression |
       Linear. The linear regression dialogue box appears.

    9. Our dependent variable is Wt/A Standard deviations, so place this into the
       Dependent box. We are predicting weight-for-age using Birth Weight, so
       place birth weight into the Independent(s) box.

    10. Click OK.


    11. The following output is produced:

    Variables Entered/Removed(b)

    Model  Variables Entered                 Variables Removed  Method
    1      Birth weight (kilos - 3 dec.)(a)  .                  Enter

    a. All requested variables entered.
    b. Dependent Variable: Wt/A Standard deviations

    This table simply states the variables in the model and the selection method chosen.

    Model Summary

    Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
    1      .109(a)  .012      .011               120.660

    a. Predictors: (Constant), Birth weight (kilos - 3 dec.)

    The results indicate the correlation (0.109, as seen before) and the R Square. The
    R Square indicates how much variation is explained: here birth weight explains only
    about 1.2% of the variation in weight-for-age, which is not much!

    ANOVA(b)

    Model          Sum of Squares  df   Mean Square  F      Sig.
    1  Regression  145568.173      1    145568.173   9.999  .002(a)
       Residual    1.216E7         835  14558.868
       Total       1.230E7         836

    a. Predictors: (Constant), Birth weight (kilos - 3 dec.)
    b. Dependent Variable: Wt/A Standard deviations

    Do not worry about this box!
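
For readers who would like to see how the Model Summary and ANOVA figures fit together, the following Python sketch recomputes them from the numbers printed in the ANOVA box. It uses the standard textbook relationships (R Square is the regression sum of squares divided by the total, F is the ratio of the mean squares). The variable names are chosen for illustration, and the residual sum of squares is rebuilt from the mean square because SPSS prints it only as 1.216E7.

```python
import math

# Figures copied from the ANOVA box in the worksheet.
ss_regression = 145568.173
ms_residual = 14558.868
df_regression = 1
df_residual = 835

# SPSS prints the residual sum of squares only as 1.216E7; rebuild it.
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
r = math.sqrt(r_square)
n = df_regression + df_residual + 1                     # 837 children
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
f_statistic = (ss_regression / df_regression) / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"R = {r:.3f}, R Square = {r_square:.3f}, Adjusted = {adjusted_r_square:.3f}")
print(f"F = {f_statistic:.3f}, Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The recomputed values agree with the printed output to the number of decimals shown, so the three boxes are simply different views of the same fit.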


    Coefficients(a)

                                       Unstandardized          Standardized
                                       Coefficients            Coefficients
    Model                              B          Std. Error   Beta           t         Sig.
    1  (Constant)                      -146.216   18.112                      -8.073    .000
       Birth weight (kilos - 3 dec.)   .017       .005         .109           3.162     .002

    a. Dependent Variable: Wt/A Standard deviations

    The final box, labelled Coefficients, gives the results of the analysis. Each of the
    columns is explained below:

    - Unstandardized Coefficients B: This shows the values of the numbers in the
      linear regression equation.
        o The constant term is -146.2, indicating that a child who weighs 0g at birth
          (impossible, but this is the theory) is predicted to have a weight-for-age
          score of -146.2. DHS files store these z-scores multiplied by 100, so this
          corresponds to about 1.46 standard deviations below the reference
          population.
        o The coefficient for birth weight is 0.017. For every gram increase in birth
          weight, the weight-for-age score increases by 0.017, which is about 17
          points (0.17 standard deviations) for every extra kilogram.
    - Unstandardized Coefficients Std. Error: This is the standard error for the
      coefficient; it is used in the calculation of significance.
    - Standardized Coefficients Beta: Do not worry about this!
    - t: This is the t-test to see if the coefficients are significantly different from 0.
      It is B divided by its standard error. An absolute value over 1.96 indicates
      significance at the 5% level.
    - Sig.: This is the p-value. If it is under 0.05 then the variable is significant at
      the 5% level. The value we have here is 0.002, which is highly significant. There
      is a significant relationship between birth weight and weight-for-age.
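
As a worked illustration of how the coefficients are used, the short Python sketch below plugs two birth weights into the fitted equation. The birth weights of 2500g and 3500g are hypothetical examples, and the conversion to standard deviations assumes that the score is stored as the z-score multiplied by 100, as is usual in DHS files.

```python
# Coefficients copied from the worksheet's Coefficients table.
CONSTANT = -146.216
BIRTH_WEIGHT_SLOPE = 0.017   # change in score per gram of birth weight


def predicted_score(birth_weight_grams):
    """Predicted weight-for-age score (assumed to be in hundredths of a standard deviation)."""
    return CONSTANT + BIRTH_WEIGHT_SLOPE * birth_weight_grams


for grams in (2500, 3500):   # hypothetical birth weights
    score = predicted_score(grams)
    print(f"{grams} g: score {score:.1f}, i.e. {score / 100:.2f} standard deviations")

# The t column is B divided by its standard error (shown here for the constant).
t_constant = CONSTANT / 18.112
print(f"t for the constant: {t_constant:.3f}")
```

Because the printed slope is rounded to .017, dividing it by its rounded standard error (.005) gives 3.4 rather than the 3.162 shown in the table. SPSS calculates t from the unrounded values.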

    2. Simple Linear Regression: Categorical Explanatory Variables

    1. The procedure for conducting linear regression when there are categorical
       explanatory variables is slightly different, as you need to create dummy
       variables, as explained earlier. If you do not do this, the results that you
       obtain will not be valid. We will look at the relationship between wealth index
       and weight-for-age standard deviations.


    2. Firstly, do some exploratory analysis. One way to do this with categorical
       variables is to calculate the mean weight-for-age score (Wt/A Standard
       deviations) for each wealth quintile. To do this:

    - Go to Analyze | Compare Means | Means
    - Place Wt/A Standard Deviations in the Dependent List
    - Put Wealth index into the Independent List box
    - Click OK. The following results should be produced:

    Report

    Wt/A Standard deviations

    Wealth index  Mean     N     Std. Deviation
    Poorest       -135.55  1031  127.879
    Poorer        -113.66  694   122.574
    Middle        -110.86  556   117.847
    Richer        -94.47   425   112.391
    Richest       -68.28   388   117.536
    Total         -112.12  3094  123.417

    - There are large differences in weight-for-age by wealth. The average for the
      poorest quintile is -135.55, while for the richest it is -68.28. As wealth
      increases, weight-for-age against the reference population also increases.

    3. We will now recreate this analysis by conducting linear regression. But first,
       we will need to create dummy variables for the wealth index.

    - Four new variables need to be created, as wealth has five categories
      (remember that the number of dummy variables needed is one less than the
      number of categories!)
    - Go to Transform | Recode into Different Variables
    - Place Wealth index into the central box. On the right-hand side, under
      Output Variable, enter Poorest in the Name box and label this
      Dummy variable for Poorest Wealth Quintile. Click Change.


    - Click Continue and then OK. A new variable is created called poorest.

    4. You now need to create three more dummy variables for the other categories of
       wealth. To do this, go to Transform | Recode into Different Variables
       and follow the process above for Poorer, Middle and Richer. Each time
       you will need to recode a different value to be the dummy (for instance for
       Middle, all those with a 3 in the original dataset need to be recoded as a 1,
       and all other values as a 0). Please ask if you are confused!

       Alternatively, use the syntax to do this automatically. A file is included on the
       website for you to use to create your dummy variables.
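
The recoding logic can also be written out in a few lines of code. The sketch below is Python, not the SPSS syntax file from the course website, and it assumes the wealth index is coded 1 (Poorest) to 5 (Richest). The worksheet only confirms that Middle is coded 3, so the other codes are an assumption, as are the function and variable names.

```python
# Assumed coding of the wealth index (1 = Poorest ... 5 = Richest).
# Richest (5) gets no dummy: it is the reference category.
WEALTH_CODES = {"poorest": 1, "poorer": 2, "middle": 3, "richer": 4}


def make_dummies(wealth_index, codes=WEALTH_CODES):
    """Return one 0/1 dummy list per category; missing (None) stays missing."""
    return {
        name: [None if w is None else int(w == code) for w in wealth_index]
        for name, code in codes.items()
    }


wealth = [1, 3, 5, 2, None, 4, 3]   # hypothetical children
dummies = make_dummies(wealth)

for name, column in dummies.items():
    print(f"{name:8s}", column)
```

A child in the Richest quintile has 0 on all four dummies, which is what makes Richest the baseline that the other quintiles are compared against in the regression.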

    5. Now the linear regression can be run. Go to Analyze | Regression |
       Linear. The regression from the previous analysis will still be there. The
       Dependent variable remains the same, Wt/A Standard deviations, but the
       Independent variables are now different.

       Remove Birth weight from the Independent(s) box. Enter instead the four
       dummy variables: Poorest, Poorer, Middle and Richer.


       Click OK.

    6. Four boxes are produced, as before. Below is the final box, labelled
       Coefficients.

    Coefficients(a)

                                                   Unstandardized          Standardized
                                                   Coefficients            Coefficients
    Model                                          B          Std. Error   Beta           t         Sig.
    1  (Constant)                                  -68.284    6.173                       -11.062   .000
       Dummy variable for poorest wealth quintile  -67.262    7.242        -.257          -9.288    .000
       Dummy variable for poorer wealth quintile   -45.381    7.707        -.153          -5.888    .000
       Dummy variable for middle wealth quintile   -42.576    8.043        -.132          -5.294    .000
       Dummy variable for richer wealth quintile   -26.189    8.537        -.073          -3.068    .002

    a. Dependent Variable: Wt/A Standard deviations

    You will see that all of the variables are highly significant! This is seen in the
    final column, Sig., which shows the p-value. This indicates that the mean for each
    of the other wealth quintiles differs significantly from that of the reference
    category, the Richest quintile, which is represented by the Constant.

    The value for the constant is -68.284, which is the same as seen previously for
    the mean weight-for-age score for the Richest quintile!

    For the poorest quintile the average score is -68.284 - 67.262 = -135.546. The
    same as before! For all the wealth quintiles the results mirror the results seen
    before.
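
The same check can be carried out for every quintile. The Python sketch below adds each dummy coefficient to the constant and sets the result beside the Compare Means figure. All numbers are copied from the two tables in this worksheet, while the dictionary and function names are made up for the illustration.

```python
# Coefficients from the regression on the four wealth dummies.
CONSTANT = -68.284   # mean score for the reference category (Richest)
DUMMY_COEFFICIENTS = {
    "Poorest": -67.262,
    "Poorer": -45.381,
    "Middle": -42.576,
    "Richer": -26.189,
}

# Means reported earlier by Compare Means.
COMPARE_MEANS = {
    "Poorest": -135.55,
    "Poorer": -113.66,
    "Middle": -110.86,
    "Richer": -94.47,
    "Richest": -68.28,
}


def group_mean(quintile):
    """Constant plus the quintile's dummy coefficient (0 for the reference group)."""
    return CONSTANT + DUMMY_COEFFICIENTS.get(quintile, 0.0)


for quintile, reported in COMPARE_MEANS.items():
    print(f"{quintile:8s} regression {group_mean(quintile):9.3f}   Compare Means {reported:8.2f}")
```

Each regression-based mean matches the Compare Means value to within rounding, which confirms that a regression on dummy variables alone simply reproduces the group means.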

    3. Multiple Linear Regression

    You may be wondering why we bothered doing the regression on weight-for-age and
    wealth when we can get the results simply using the Compare Means command.
    The reason is to show the differences when more than one variable is added into the
    model at the same time.

    We have seen that birth weight and wealth are related to weight-for-age when the
    simple bivariate analysis is conducted. But what happens if we analyse them together?


    Birth weight is highly related to wealth: infants born to poorer households are likely
    to be lighter than infants born to richer households. So is the relationship between
    wealth and weight-for-age only due to the relationship with birth weight, given that
    those of a lighter birth weight are likely to remain below the norm throughout
    childhood?

    To test this we enter the variables into the model together.

    1. Go to Analyze | Regression | Linear. The previous regression variables
       will still be contained in the different boxes.

    2. Click on Birth Weight and place it into the Independent(s) box, alongside
       the wealth quintile dummy variables.

    3. Click OK. The final table in the output is copied below.

    Coefficients(a)

                                                   Unstandardized          Standardized
                                                   Coefficients            Coefficients
    Model                                          B          Std. Error   Beta           t         Sig.
    1  (Constant)                                  -119.658   18.412                      -6.499    .000
       Dummy variable for poorest wealth quintile  -83.830    14.066       -.220          -5.960    .000
       Dummy variable for poorer wealth quintile   -37.202    12.418       -.115          -2.996    .003
       Dummy variable for middle wealth quintile   -42.243    12.494       -.130          -3.381    .001
       Dummy variable for richer wealth quintile   -39.684    11.140       -.138          -3.562    .000
       Birth weight (kilos - 3 dec.)               .018       .005         .120           3.491     .001

    a. Dependent Variable: Wt/A Standard deviations

    The results have changed! Partly this is due to a different sample being used
    (only those with a birth weight AND a wealth quintile are included in the
    analysis) but it is also due to having both variables in the model at one time.

    All the variables are still significant in the model, although after taking
    account of birth weight the difference between richest and poorest actually
    increases (the coefficient for the poorest quintile moves from -67.262 to
    -83.830). This shows that even though birth weight is significantly related to
    weight-for-age, there is a very large effect of wealth after the birth on
    weight-for-age.
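
One way to read the combined model is to compare predicted scores for children who have the same birth weight but belong to different wealth quintiles. The Python sketch below does this with the coefficients from the table above. The birth weight of 3000g is a hypothetical example, and the names used are for illustration only.

```python
# Coefficients from the model containing wealth dummies and birth weight.
CONSTANT = -119.658
BIRTH_WEIGHT_SLOPE = 0.018   # change in score per gram of birth weight
WEALTH_EFFECT = {
    "Poorest": -83.830,
    "Poorer": -37.202,
    "Middle": -42.243,
    "Richer": -39.684,
    "Richest": 0.0,          # reference category
}


def predicted_score(quintile, birth_weight_grams):
    """Predicted weight-for-age score for a given quintile and birth weight."""
    return CONSTANT + WEALTH_EFFECT[quintile] + BIRTH_WEIGHT_SLOPE * birth_weight_grams


birth_weight = 3000   # hypothetical birth weight in grams
for quintile in WEALTH_EFFECT:
    print(f"{quintile:8s} {predicted_score(quintile, birth_weight):9.3f}")

gap = predicted_score("Poorest", birth_weight) - predicted_score("Richest", birth_weight)
print(f"Poorest minus Richest at the same birth weight: {gap:.3f}")
```

Whatever birth weight is chosen, the gap between the poorest and richest quintiles stays at -83.830, because the model holds birth weight constant when comparing wealth groups.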


    4. The analysis can be extended to include other variables, such as Type of
       Place of Residence, Educational Level and Place of Delivery.
       However, all of these are categorical variables, so remember to recode
       them into dummy variables first!

    Exercises

    1. Conduct multiple linear regression on Weight-for-age Standard deviations,
       including as explanatory variables birth weight, wealth index, urban/rural and
       highest educational level of the parent.

    2. Conduct multiple linear regression on Weight-for-Height, using the same
       variables as in Exercise 1. Are there any obvious differences that you can see?
       What is the relationship between wealth and weight-for-height after
       controlling for the other variables?