Kofi Nyanteng: Cleaning and Screening Data Using SPSS


Upload: kofi-kyeremateng-nyanteng

Post on 08-Jan-2017


Yacar-Yacara Consults – 2015 +2330501377013

YACAR-YACARA CONSULTS [email protected]

Screening and Cleaning Data

Before you start to analyse your data it is essential that you check your data set for errors. It is very easy to make mistakes when entering data, and some errors can invalidate the entire analysis. For example, entering 35 when you mean to enter 3 can distort the results of the analysis.

The data screening process involves a number of steps:

Step 1: Checking for errors. First, you need to check each of your variables for scores that are out of range (i.e. not within the range of possible scores).

Step 2: Finding the error in the data file. Second, you need to find where in the data file this error occurred (i.e. which case is involved).

Step 3: Correcting the error in the data file. Finally, you need to correct the error in the data file itself.

Step 1: Checking for errors

When checking for errors you are primarily looking for values that fall outside the range of possible values for a variable. For example, if sex is coded 1=male, 2=female, you should not find any scores other than 1 or 2 for this variable.

Scores that fall outside the possible range can distort your statistical analyses, so it is very important that all these errors are corrected before you start your analyses.

To check for errors you will need to inspect the frequencies for each of your variables. This includes all of the individual items that make up the scales. Errors must be corrected before total scores for these scales are calculated.
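The range check described above can also be sketched outside SPSS. The following is a minimal Python illustration, not part of the SPSS procedure: the codebook, the marital codes and the four cases are all hypothetical values made up for the example.

```python
# Hypothetical codebook: the set of valid codes for each variable.
CODEBOOK = {
    "sex": {1, 2},            # 1 = male, 2 = female (as in the text)
    "marital": {1, 2, 3, 4},  # assumed codes, for illustration only
}

# Hypothetical data file: one dict per case; None marks a missing value.
cases = [
    {"id": 1, "sex": 1, "marital": 2},
    {"id": 2, "sex": 3, "marital": 1},    # 3 is out of range for sex
    {"id": 3, "sex": 2, "marital": None},
    {"id": 4, "sex": 2, "marital": 35},   # 35 typed instead of 3
]


def out_of_range(cases, codebook):
    """Return (case id, variable, value) for every value not in the codebook."""
    problems = []
    for case in cases:
        for variable, valid_codes in codebook.items():
            value = case.get(variable)
            # Missing values are skipped here; they are checked separately.
            if value is not None and value not in valid_codes:
                problems.append((case["id"], variable, value))
    return problems


errors = out_of_range(cases, CODEBOOK)
for case_id, variable, value in errors:
    print(f"case {case_id}: {variable} = {value} is outside the possible range")
```

Each reported case still has to be checked against the original questionnaire before the value is changed.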


Checking categorical variables

From the main menu at the top of the screen click on Analyze, then Descriptive Statistics, then Frequencies.

Choose the variables that you wish to check (e.g. sex, marital, educ).

Click on the arrow button to move these into the Variables box.

Click on the Statistics button. Tick Minimum and Maximum in the Dispersion section.

Click on Continue and then on OK.

There are two parts to the output. The first table provides a summary of each of the variables you requested. The remaining tables give you a breakdown, for each variable, of the range of responses (these are listed using the value label, rather than the code number that was used).

Check your minimum and maximum values to see whether they are within the range of possible scores on that variable. Check this against your codebook.

Check the number of valid cases and missing cases. Missing cases sometimes arise because an empty case row has been created accidentally at the bottom of the data file. If this occurs, open your Data Editor window, move down to the empty case row, click in the shaded area where the case number appears and press Delete on your keyboard.

Rerun the Frequencies procedure to get the correct values.
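As a rough illustration of what the Frequencies output summarises, the sketch below counts valid and missing cases, reports the minimum and maximum, and tabulates each code. It is plain Python rather than SPSS, and the list of sex codes is hypothetical.

```python
from collections import Counter

# Hypothetical codes for the variable sex (1 = male, 2 = female).
# None stands for a missing value; 3 is a deliberate data entry error.
sex = [1, 2, 2, 1, 3, None, 2, 1]

valid = [value for value in sex if value is not None]
n_missing = len(sex) - len(valid)
frequencies = Counter(valid)

print(f"Valid: {len(valid)}  Missing: {n_missing}")
print(f"Minimum: {min(valid)}  Maximum: {max(valid)}")
for code in sorted(frequencies):
    percent = 100 * frequencies[code] / len(valid)
    print(f"code {code}: {frequencies[code]} cases ({percent:.1f}% of valid)")
```

Here the maximum of 3 is the warning sign: the codebook allows only 1 and 2 for this variable.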

Checking continuous variables

From the menu at the top of the screen click on Analyze, then Descriptive Statistics, then Descriptives.

Click on the variables that you wish to check (e.g. age). Click on the arrow button to move them into the Variables box.

Click on the Options button. You can ask for a range of statistics; the main ones at this stage are mean, standard deviation, minimum and maximum. Click on the statistics you wish to generate.

Click on Continue, and then on OK.


Check the minimum and maximum values. Do these make sense? Does the mean score also make sense?
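The same sanity check on a continuous variable might look like the following Python sketch. The ages and the possible range of 18 to 82 are hypothetical; in practice the range comes from your codebook.

```python
import statistics

# Hypothetical ages; 230 is a typing error (perhaps 23 was intended).
ages = [23, 31, 45, 28, 82, 19, 37, 230]

# Assumed possible range for age, taken from an imaginary codebook.
AGE_RANGE = (18, 82)

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": statistics.mean(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value}")

# Values outside the possible range need to be traced back to the source.
suspect = [age for age in ages if not AGE_RANGE[0] <= age <= AGE_RANGE[1]]
print("Out of range:", suspect)
```

A single wrong value is enough to pull the mean well above what the rest of the sample suggests, which is why the mean is worth checking along with the minimum and maximum.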

Steps 2 and 3: Finding and correcting the error in the data file

How can you find out where the mistake is in your data set?

Make sure that the Data Editor window is open and on the screen in front of you with the data showing.

Click on the variable name of the variable in which the error has occurred (e.g. sex). Click once to highlight the column.

Click on Edit from the menu across the top of the screen. Click on Find.

In the Search for box, type in the incorrect value that you are looking for (e.g. 3).

Click on Find Next. SPSS will scan through the file and will stop at the first occurrence of the value that you specified. Take note of the ID number of this case (shown in the ID column of that row, usually the first column). You will need this to check your records or questionnaires to find out what the value should be.

Click on Find Next again to continue searching for other cases with the same incorrect value. You may need to do this a number of times before you reach the end of the data set.

To correct the error, place the cursor in the cell, make sure that it is highlighted and then type in the correct value.
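The find-and-correct routine above can be pictured as two small operations: list the ID of every case holding the incorrect value, then overwrite each one with the value confirmed from the questionnaire. The Python sketch below uses hypothetical cases and corrections, and the helper names find_cases and correct_value are invented for the example.

```python
# Hypothetical data file: each case has an ID and a code for sex.
cases = [
    {"id": 101, "sex": 1},
    {"id": 102, "sex": 3},  # incorrect value
    {"id": 103, "sex": 2},
    {"id": 104, "sex": 3},  # incorrect value
]


def find_cases(cases, variable, value):
    """Return the IDs of all cases where the variable equals the value."""
    return [case["id"] for case in cases if case[variable] == value]


def correct_value(cases, case_id, variable, new_value):
    """Overwrite one value for the case with the given ID."""
    for case in cases:
        if case["id"] == case_id:
            case[variable] = new_value
            return True
    return False  # no case with that ID


ids_to_check = find_cases(cases, "sex", 3)
print("Check the questionnaires for cases:", ids_to_check)

# Assumed corrections, as if read from the original questionnaires.
corrections = {102: 2, 104: 1}
for case_id, value in corrections.items():
    correct_value(cases, case_id, "sex", value)

remaining = find_cases(cases, "sex", 3)
print("Cases still holding the incorrect value:", remaining)
```

Searching again after the corrections, as in the last two lines, confirms that no case still holds the incorrect value.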


Descriptive Statistics

Descriptive statistics have a number of uses. These include:

i. To describe the characteristics of your sample in the Method section of your report;

ii. To check your variables for any violation of the assumptions underlying the statistical techniques that you will use to address your research questions; and

iii. To address specific research questions.

Tools

Frequencies

Descriptives

Explore

Cross tabulation

Ratio

Frequencies (mostly for categorical variables)

Click on Analyze, then Descriptive Statistics, then Frequencies.

Select the categorical variables you are interested in (e.g. sex) and move them into the Variables box.

Click on the Statistics button. In the Dispersion section tick Minimum and Maximum.

Click on Continue and then OK.

Descriptives (mostly for continuous variables)

Click on Analyze, then Descriptive Statistics, then Descriptives.

Select all the continuous variables for which you wish to obtain descriptive statistics (e.g. age).

Click on the arrow button to move them into the Variables box.

Click on the Options button.

Click on mean, standard deviation, minimum and maximum.

Click on Continue, and then OK.


Explore

Explore gives a detailed descriptive analysis of the variables and presents a range of information about them.

It provides results for:

Mean

Median

Variance

Percentiles

Kurtosis

Skewness

etc.

Click on Analyze, then Descriptive Statistics, then Explore.

Select all the variables that you wish to explore (e.g. age).

Click on the arrow button to move them into the Dependent List box.

Click on the Statistics button.

Tick the descriptive statistics you want (e.g. Descriptives, Percentiles).

Click on Continue, and then OK.
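To make the statistics listed for Explore concrete, here is a minimal Python sketch that computes them for a hypothetical list of ages. The helper moment_shape is invented for this example and uses the simple moment-based definitions of skewness and excess kurtosis. SPSS applies sample-size adjustments to both, so its figures will usually differ slightly from these.

```python
import statistics

# Hypothetical ages, for illustration only.
ages = [19, 23, 28, 31, 37, 45, 52, 60]


def moment_shape(values):
    """Return (skewness, excess kurtosis) from unadjusted central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


mean = statistics.mean(ages)
median = statistics.median(ages)
variance = statistics.variance(ages)         # sample variance (n - 1)
quartiles = statistics.quantiles(ages, n=4)  # 25th, 50th, 75th percentiles
skewness, kurtosis = moment_shape(ages)

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Variance: {variance:.2f}")
print(f"Quartiles: {quartiles}")
print(f"Skewness: {skewness:.3f}")
print(f"Kurtosis (excess): {kurtosis:.3f}")
```

A skewness near 0 suggests a roughly symmetric distribution, which is one of the assumptions worth checking before choosing a statistical technique.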

Cross Tabulation

Cross tabulation is used to examine the relationship between categorical variables. It gives a descriptive presentation of the relationship between the variables.

Click on Analyze, then Descriptive Statistics, then Crosstabs.

Select the categorical variables you are interested in. Move one into the Row(s) box (e.g. sex) and the other into the Column(s) box (e.g. marital).

Click on the Cells button. In the Percentages section tick Row, Column or Total as required.

Click on Continue and then OK.
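A cross tabulation is simply a count of cases for every combination of two categorical variables. The Python sketch below builds one for sex by marital status with row percentages. The six cases and the category labels are hypothetical.

```python
from collections import Counter

# Hypothetical cases with two categorical variables.
cases = [
    {"sex": "male", "marital": "single"},
    {"sex": "male", "marital": "married"},
    {"sex": "female", "marital": "married"},
    {"sex": "female", "marital": "married"},
    {"sex": "female", "marital": "single"},
    {"sex": "male", "marital": "single"},
]

# Count the cases in each (row, column) cell of the table.
table = Counter((case["sex"], case["marital"]) for case in cases)

rows = sorted({case["sex"] for case in cases})
columns = sorted({case["marital"] for case in cases})

print("sex".ljust(10) + "".join(column.ljust(16) for column in columns))
for row in rows:
    counts = [table[(row, column)] for column in columns]
    row_total = sum(counts)
    cells = [f"{count} ({100 * count / row_total:.0f}%)" for count in counts]
    print(row.ljust(10) + "".join(cell.ljust(16) for cell in cells))
```

The row percentages show how marital status is distributed within each sex, which is usually easier to compare than the raw counts.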


Kofi Kyeremateng Nyanteng

YACAR-YACARA CONSULTS

+233243637096

+233501377013