Page 1: Cleaning your data - wps.pearsoned.com.auwps.pearsoned.com.au/wps/media/objects/8264/8462810/Wilson... · Cleaning your data Using SPSS to clean your data The data cleaning process

Cleaning your data

Using SPSS to clean your data

The data cleaning process demands careful consideration, as it will significantly affect the final statistical results. The entire process is guided by the preliminary plan of data analysis, which was formulated in the research design phase.

Cleaning the data requires consistency checks and treatment of missing responses, generally done through SPSS. Consistency checks serve to identify data that are out of range, logically inconsistent or have extreme values. Missing responses must be treated carefully to minimise their adverse effects, either by assigning a suitable value (a neutral or an imputed value) or by discarding them methodically (casewise or pairwise deletion). Missing responses pose problems if they make up a significant proportion of the total (more than 10 percent).
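The missing-response treatments described above can be illustrated outside SPSS with a short Python sketch. It is an illustration under stated assumptions, not the procedure SPSS uses: each respondent is a dictionary with hypothetical keys (`id`, `q1`, `q2`, `q3`), `None` marks a missing response, and the 10 percent figure is applied as a fixed cut-off.

```python
from statistics import mean

# Hypothetical sample data: None marks a missing response.
cases = [
    {"id": 1, "q1": 4, "q2": 5, "q3": None},
    {"id": 2, "q1": None, "q2": 6, "q3": 3},
    {"id": 3, "q1": 5, "q2": 7, "q3": 4},
    {"id": 4, "q1": 3, "q2": 2, "q3": 2},
]
VARIABLES = ["q1", "q2", "q3"]


def missing_share(cases, var):
    """Proportion of cases with no response for `var`."""
    return sum(1 for c in cases if c[var] is None) / len(cases)


def flag_problem_variables(cases, variables, threshold=0.10):
    """Variables whose share of missing responses exceeds the threshold (10% by default)."""
    return [v for v in variables if missing_share(cases, v) > threshold]


def casewise_delete(cases, variables):
    """Casewise (listwise) deletion: keep only cases complete on every variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]


def pairwise_values(cases, var_a, var_b):
    """Pairwise deletion: keep every case that is complete on the two variables in use."""
    return [(c[var_a], c[var_b]) for c in cases
            if c[var_a] is not None and c[var_b] is not None]


def impute(cases, var, value=None):
    """Fill missing responses with a given neutral value, or with the variable mean by default."""
    if value is None:
        value = mean(c[var] for c in cases if c[var] is not None)
    return [dict(c, **{var: value}) if c[var] is None else c for c in cases]


print(flag_problem_variables(cases, VARIABLES))
print([c["id"] for c in casewise_delete(cases, VARIABLES)])
print(pairwise_values(cases, "q1", "q2"))
print(impute(cases, "q1", value=4)[1])  # 4 is the neutral midpoint of a seven-point scale
```

Casewise deletion discards the whole respondent, so it shrinks the sample fastest; pairwise deletion keeps more data, but different statistics may then rest on different subsets of respondents.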

Using SPSS to clean your data

• Click on the SPSS icon and open up SPSS. You will notice that there are two views, “variable” view and “data” view. Data view is generally used when you are analysing your data.
• Click on the tab “data view”.

At the top of the page you will notice the menus file, edit, data, transform, analyse, graphs, utilities, add-ons, window and help. If you ever get stuck, just click on help.
• Click on “analyse”.
• Click on “descriptive statistics”.
• Click on “descriptives”.
• Move all of the variables except “case” into the right-hand box labelled “variables”.
• Click on the “options” button.
• Click on “mean”, “standard deviation”, “minimum”, “maximum”, “kurtosis” and “skewness”.
• Click “continue”.
• Click “OK”.
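To see what the Descriptives procedure is calculating, the sketch below reproduces its columns for one variable in Python. It assumes the responses are a plain list of numeric codes with no missing values, and it uses the bias-adjusted skewness and kurtosis formulas that appear to match SPSS's output. The example responses are rebuilt from the frequency counts for the “Sporty U is the one I am most likely to buy” question shown later in this guide; the `descriptives` helper is our own.

```python
from statistics import mean, stdev


def descriptives(values):
    """N, minimum, maximum, mean, standard deviation, skewness and kurtosis for one variable."""
    n = len(values)  # the kurtosis formula needs n > 3
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    z = [(x - m) / s for x in values]
    # Bias-adjusted skewness (G1) and excess kurtosis (G2); a normal curve gives values near 0.
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"N": n, "Minimum": min(values), "Maximum": max(values), "Mean": m,
            "Std. Deviation": s, "Skewness": skewness, "Kurtosis": kurtosis}


# Responses rebuilt from the frequency counts (code: number of respondents).
counts = {1: 6, 2: 18, 3: 29, 4: 87, 5: 52, 6: 18, 7: 6}
values = [code for code, k in counts.items() for _ in range(k)]

for name, stat in descriptives(values).items():
    print(f"{name}: {stat:.3f}" if isinstance(stat, float) else f"{name}: {stat}")
```

Run as written, the printed values round to the figures SPSS reports for that question (mean 4.11, standard deviation 1.240, skewness -.204, kurtosis .280).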

Now you will receive some output.

Reading your descriptive statistics output.

The first column is the “label” or survey question. The second column shows the number of responses entered for that variable (n = 216). The third and fourth columns show the minimum and maximum statistics. These are the numbers entered in data view. You should check that the minimum and maximum values are within the range of the values on the survey. For instance, if the survey measured the question using a seven-point Likert scale, the values should lie between 1 and 7, and values outside this range indicate data that have been entered incorrectly. A 22 or a 23, for example, is outside the range, and those surveys need to be checked for the correct value. This is why we label each survey with a unique ID and enter it into SPSS: it makes data entry errors easy to identify and correct.
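A range check like the one described above is easy to automate. The following Python sketch is a hypothetical illustration: the record layout, the `id` key and the `out_of_range` helper are our own, and valid codes are assumed to run from 1 to 7.

```python
def out_of_range(records, var, low=1, high=7):
    """Return (survey ID, value) pairs for entries outside [low, high], skipping missing ones."""
    return [(r["id"], r[var]) for r in records
            if r[var] is not None and not (low <= r[var] <= high)]


# Hypothetical records keyed by each survey's unique ID.
records = [
    {"id": 101, "q1": 4},
    {"id": 102, "q1": 22},    # probably a keying error for 2
    {"id": 103, "q1": 7},
    {"id": 104, "q1": 23},
    {"id": 105, "q1": None},  # missing, handled separately
]

print(out_of_range(records, "q1"))  # [(102, 22), (104, 23)]
```

The returned IDs point straight to the paper surveys that need to be checked.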

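The fifth column (mean) is a measure of central tendency and the sixth (standard deviation) is a measure of dispersion. The seventh and eighth columns (skewness and kurtosis, each reported with its standard error) provide information regarding “normality”.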

Wilson_Cleaning.indd 1Wilson_Cleaning.indd 1 3/8/09 10:24:47 AM3/8/09 10:24:47 AM

Marketing Research: An Integrated Approach

Descriptive Statistics

| Variable | N | Minimum | Maximum | Mean | Std. Deviation | Skewness | Std. Error | Kurtosis | Std. Error |
|---|---|---|---|---|---|---|---|---|---|
| When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy | 216 | 1 | 7 | 4.11 | 1.240 | -.204 | .166 | .280 | .330 |
| Yes, I am likely to buy this brand of sport shoe in the future | 216 | 1 | 7 | 3.87 | 1.168 | -.444 | .166 | .706 | .330 |
| I am more likely to buy Sporty U than any other brand of sport shoe | 216 | 2.00 | 7.00 | 4.7546 | 1.25382 | -.626 | .166 | .360 | .330 |
| Valid N (listwise) | 216 | | | | | | | | |
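Skewness and kurtosis are easier to judge alongside their standard errors. One common rule of thumb (not part of the original guide) divides each statistic by its standard error and treats a ratio beyond about ±1.96 as a sign of non-normality at the 5 percent level. The Python sketch below applies that rule to the table above; the standard-error formulas are the usual ones for a sample of size n and reproduce the .166 and .330 shown for n = 216, while the short variable labels are our own.

```python
import math


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# Skewness and kurtosis copied from the Descriptive Statistics table (n = 216).
results = {
    "Most likely to buy Sporty U": (-0.204, 0.280),
    "Likely to buy in future": (-0.444, 0.706),
    "More likely than other brands": (-0.626, 0.360),
}

N = 216
for label, (skew, kurt) in results.items():
    z_skew = skew / se_skewness(N)
    z_kurt = kurt / se_kurtosis(N)
    # Rule of thumb: |z| above about 1.96 suggests a departure from normality at the 5% level.
    verdict = "worth a closer look" if max(abs(z_skew), abs(z_kurt)) > 1.96 else "roughly normal"
    print(f"{label}: z(skew) = {z_skew:.2f}, z(kurt) = {z_kurt:.2f} -> {verdict}")
```

With large samples this ratio flags even small departures from normality, so treat it as a prompt to inspect the histogram rather than as a verdict.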

Using SPSS to clean your data continued

• Click on “analyse”.
• Click on “descriptive statistics”.
• Click on “frequencies”.
• Move all of the variables except “case” into the right-hand box labelled “variables”.
• Click on the “statistics” button.
• Under the measures of central tendency, click on “mean”, “median” and “mode”.
• Under the measures of dispersion, click on “standard deviation” and “variance”.
• Under the measures of distribution, click on “skewness” and “kurtosis”.
• Click “continue”.
• Click on “charts”.
• Click on “histograms” and select the option to show a normal curve on the histogram.
• Click “continue”.
• Click “OK”.
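The statistics box of the Frequencies output can be mirrored in Python with the standard `statistics` module. This is a sketch under stated assumptions: responses are numeric codes with `None` for a missing answer, the `frequency_statistics` helper is our own, and the example data are rebuilt from the frequency counts for the “Sporty U is the one I am most likely to buy” question. Skewness and kurtosis are left out to keep it short.

```python
from statistics import mean, median, multimode, stdev, variance


def frequency_statistics(values):
    """Statistics box of the Frequencies output; None marks a missing response."""
    valid = [v for v in values if v is not None]
    return {
        "N Valid": len(valid),
        "N Missing": len(values) - len(valid),
        "Mean": mean(valid),
        "Median": median(valid),
        # SPSS shows the smallest value when several share the highest frequency.
        "Mode": min(multimode(valid)),
        "Std. Deviation": stdev(valid),
        "Variance": variance(valid),
    }


counts = {1: 6, 2: 18, 3: 29, 4: 87, 5: 52, 6: 18, 7: 6}
values = [code for code, k in counts.items() for _ in range(k)]
print(frequency_statistics(values))
```

For these data the median and mode are both 4 and the variance is about 1.537, in line with the Statistics table for that question.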

Now you will receive some output.

Reading your frequency statistics output.

You will receive a range of output. The first table contains the statistics for the variables.

In this table the first column shows each statistic and the subsequent columns show the variables. It gives a different view from the descriptive statistics output, showing missing data as well as measures of central tendency, dispersion and distribution.

Statistics

| | Yes, I am likely to buy this brand of sport shoe in the future | When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy | I am more likely to buy Sporty U than any other brand of sport shoe |
|---|---|---|---|
| N Valid | 216 | 216 | 216 |
| N Missing | 0 | 0 | 0 |
| Mean | 3.87 | 4.11 | 4.7546 |
| Median | 4.00 | 4.00 | 5.0000 |
| Mode | 4 | 4 | 5.00 |
| Std. Deviation | 1.168 | 1.240 | 1.25382 |
| Variance | 1.363 | 1.537 | 1.572 |
| Skewness | -.444 | -.204 | -.626 |
| Std. Error of Skewness | .166 | .166 | .166 |
| Kurtosis | .706 | .280 | .360 |
| Std. Error of Kurtosis | .330 | .330 | .330 |

The next output you will receive is a frequency table for each variable. The label of the variable is displayed above its frequency table. The first column shows the value labels. The second column shows the frequency, or count, of responses for each value label. The third, fourth and fifth columns relate to percentages. The third column (Percent) is the percentage of all cases, including any with missing data. The fourth (Valid Percent) excludes missing cases, so it reflects the percentage of those who answered the question. If there are no missing data, the Percent and Valid Percent columns will be the same. The final column, Cumulative Percent, adds up the valid percentages cumulatively from the first value label down.

When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy

| | Value label | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Very strongly disagree with | 6 | 2.8 | 2.8 | 2.8 |
| | Strongly disagree with | 18 | 8.3 | 8.3 | 11.1 |
| | Disagree with | 29 | 13.4 | 13.4 | 24.5 |
| | Neither disagree or agree with | 87 | 40.3 | 40.3 | 64.8 |
| | Agree with | 52 | 24.1 | 24.1 | 88.9 |
| | Strongly agree with | 18 | 8.3 | 8.3 | 97.2 |
| | Very strongly agree with | 6 | 2.8 | 2.8 | 100.0 |
| | Total | 216 | 100.0 | 100.0 | |
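The frequency table itself follows from a simple count. The Python sketch below is illustrative: the `frequency_table` helper is hypothetical, codes use `None` for missing responses, and the value labels and counts are copied from the table above. The running total here is built from the valid percentages.

```python
from collections import Counter


def frequency_table(values, value_labels):
    """Rows of (label, frequency, percent, valid percent, cumulative percent) plus the missing count.

    `values` holds numeric codes, with None for a missing response; `value_labels`
    maps each code to its label in the order the rows should appear.
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for code, label in value_labels.items():
        freq = counts[code]
        percent = 100 * freq / total             # share of all cases, missing included
        valid_percent = 100 * freq / len(valid)  # share of cases that answered
        cumulative += valid_percent              # running total of valid percent
        rows.append((label, freq, round(percent, 1), round(valid_percent, 1), round(cumulative, 1)))
    return rows, total - len(valid)


labels = {1: "Very strongly disagree with", 2: "Strongly disagree with", 3: "Disagree with",
          4: "Neither disagree or agree with", 5: "Agree with", 6: "Strongly agree with",
          7: "Very strongly agree with"}
counts = {1: 6, 2: 18, 3: 29, 4: 87, 5: 52, 6: 18, 7: 6}
values = [code for code, k in counts.items() for _ in range(k)]

rows, missing = frequency_table(values, labels)
for row in rows:
    print(*row, sep=" | ")
print("Missing:", missing)
```

Appending a couple of `None` entries to `values` shows the Percent and Valid Percent columns separating once missing data appear.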

The next piece of output is a visual of the data: a histogram with a normal distribution curve overlaid. This chart also gives the mean, standard deviation and the number of participants who responded.
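SPSS draws the histogram graphically. As a rough, text-only stand-in (our own illustration, not SPSS output), the Python sketch below prints one bar per scale point beside the count a normal curve with the same mean and standard deviation would predict, then the mean, standard deviation and N that the SPSS chart also reports. The `text_histogram` helper and its `per_mark` parameter are hypothetical.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def text_histogram(values, per_mark=2):
    """One bar per scale point (one '#' per `per_mark` respondents) beside the normal-curve count."""
    n, m, s = len(values), mean(values), stdev(values)
    curve = NormalDist(m, s)
    observed = Counter(values)
    lines = []
    for code in range(min(values), max(values) + 1):
        # Normal-curve count for a bar spanning half a scale point either side of the code.
        expected = n * (curve.cdf(code + 0.5) - curve.cdf(code - 0.5))
        bar = "#" * round(observed[code] / per_mark)
        lines.append(f"{code} | {bar:<45} observed {observed[code]:3d}, normal curve {expected:5.1f}")
    lines.append(f"Mean = {m:.2f}, Std. Dev. = {s:.3f}, N = {n}")
    return lines


counts = {1: 6, 2: 18, 3: 29, 4: 87, 5: 52, 6: 18, 7: 6}
values = [code for code, k in counts.items() for _ in range(k)]
print("\n".join(text_histogram(values)))
```

For this question the middle point (4) stands noticeably taller than the normal curve predicts, which is consistent with the positive kurtosis (.280) in the output.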

From the frequency information you should be able to identify whether there are missing data and the extent to which data are missing. You should also be able to identify any values that have been entered incorrectly by checking them against the value labels, and you should start to get a “feel” for your data and whether they are normally distributed, by looking at the skewness and kurtosis statistics as well as by visually inspecting the histogram.
