Cleaning your data - Using SPSS to clean your data

Post on 19-Jul-2018


<ul><li><p>Cleaning your data</p><p>Using SPSS to clean your data</p><p>The data cleaning process demands careful consideration, as it will significantly affect the final statistical results. The entire process is guided by the preliminary plan of data analysis, which was formulated in the research design phase.</p><p>Cleaning the data requires consistency checks and treatment of missing responses, both generally done in SPSS. Consistency checks identify data that are out of range, logically inconsistent or extreme in value. Missing responses are treated carefully to minimise their adverse effects, either by assigning a suitable value (neutral or imputed) or by discarding them methodically (casewise or pairwise deletion). Missing responses pose problems if they make up a significant proportion of the total (more than 10 percent).</p><p>Using SPSS to clean your data</p><p>Click on the SPSS icon and open up SPSS.</p><p>You will notice that there are two views, variable view and data view. Data view is generally used when you are analysing your data.</p><p>Click on the data view tab.</p><p>At the top of the page you will notice the menus file, edit, data, transform, analyse, graphs, utilities, add-ons, windows and help. If you ever get stuck, just click on help.</p><p>Click on analyse.</p><p>Click on descriptive statistics.</p><p>Click on descriptives.</p><p>Except for the variable case, move all of the variables into the right-hand box labelled variables.</p><p>Click on the options button.</p><p>Click on mean, standard deviation, minimum, maximum, kurtosis and skewness.</p><p>Click continue.</p><p>Click OK.</p><p>Now you will receive some output.</p><p>Reading your descriptive statistics output</p><p>The first column is the label or survey question.</p><p>The second column shows the number of responses entered for that variable (n=216).</p><p>The third and fourth columns show the minimum and maximum statistics. These are the numbers entered in data view. 
You should check that the minimum and maximum values are within the range of the values on the survey. For instance, if the survey measured the question using a seven-point Likert scale, the values should lie between 1 and 7, and values outside this range indicate data that have been entered incorrectly. A 22 or a 23, for example, is outside the range, and the surveys concerned need to be checked for the correct value (this is why each survey is labelled with a unique ID that is also entered into SPSS, so data entry errors are easy to trace and correct).</p><p>The fifth, sixth, seventh and eighth columns (mean, standard deviation, skewness and kurtosis) measure central tendency, dispersion and the shape of the distribution respectively, and provide information regarding normality.</p><p>Wilson_Cleaning.indd 1 3/8/09 10:24:47 AM</p></li><li><p>Marketing Research: An Integrated Approach</p><p>Descriptive Statistics</p>
<table>
<tr><th rowspan="2"></th><th>N</th><th>Minimum</th><th>Maximum</th><th>Mean</th><th>Std. Deviation</th><th colspan="2">Skewness</th><th colspan="2">Kurtosis</th></tr>
<tr><th>Statistic</th><th>Statistic</th><th>Statistic</th><th>Statistic</th><th>Statistic</th><th>Statistic</th><th>Std. Error</th><th>Statistic</th><th>Std. Error</th></tr>
<tr><td>When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy</td><td>216</td><td>1</td><td>7</td><td>4.11</td><td>1.240</td><td>.204</td><td>.166</td><td>.280</td><td>.330</td></tr>
<tr><td>Yes, I am likely to buy this brand of sport shoe in the future</td><td>216</td><td>1</td><td>7</td><td>3.87</td><td>1.168</td><td>.444</td><td>.166</td><td>.706</td><td>.330</td></tr>
<tr><td>I am more likely to buy Sporty U than any other brand of sport shoe</td><td>216</td><td>2.00</td><td>7.00</td><td>4.7546</td><td>1.25382</td><td>.626</td><td>.166</td><td>.360</td><td>.330</td></tr>
<tr><td>Valid N (listwise)</td><td>216</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
</table>
<p>Using SPSS to clean your data continued</p><p>Click on analyse.</p><p>Click on descriptive statistics.</p><p>Click on frequencies.</p><p>Except for the variable case, move all of the variables into the right-hand box labelled variables.</p><p>Click on the statistics button.</p><p>You will see measures of central tendency: click on mean, median and mode.</p><p>You will see measures of dispersion: click on standard deviation and variance.</p><p>You will see measures of distribution: click on skewness and kurtosis.</p><p>Click continue.</p><p>Click on charts.</p><p>Click on histogram with normal curve.</p><p>Click continue.</p><p>Click OK.</p><p>Now you will receive some output.</p><p>Reading your frequency statistics output</p><p>You will receive a range of output. The first table relates to the statistics for the variables.</p><p>The first column shows the statistic and the subsequent columns show the variables. This table gives a different view of the data and shows missing data as well as measures of central tendency, dispersion and distribution.</p></li><li><p>Statistics</p>
<table>
<tr><th colspan="2"></th><th>Yes, I am likely to buy this brand of sport shoe in the future</th><th>When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy</th><th>I am more likely to buy Sporty U than any other brand of sport shoe</th></tr>
<tr><th rowspan="2">N</th><th>Valid</th><td>216</td><td>216</td><td>216</td></tr>
<tr><th>Missing</th><td>0</td><td>0</td><td>0</td></tr>
<tr><th colspan="2">Mean</th><td>3.87</td><td>4.11</td><td>4.7546</td></tr>
<tr><th colspan="2">Median</th><td>4.00</td><td>4.00</td><td>5.0000</td></tr>
<tr><th colspan="2">Mode</th><td>4</td><td>4</td><td>5.00</td></tr>
<tr><th colspan="2">Std. Deviation</th><td>1.168</td><td>1.240</td><td>1.25382</td></tr>
<tr><th colspan="2">Variance</th><td>1.363</td><td>1.537</td><td>1.572</td></tr>
<tr><th colspan="2">Skewness</th><td>.444</td><td>.204</td><td>.626</td></tr>
<tr><th colspan="2">Std. Error of Skewness</th><td>.166</td><td>.166</td><td>.166</td></tr>
<tr><th colspan="2">Kurtosis</th><td>.706</td><td>.280</td><td>.360</td></tr>
<tr><th colspan="2">Std. Error of Kurtosis</th><td>.330</td><td>.330</td><td>.330</td></tr>
</table>
<p>The next tables you will receive relate to the frequency of each variable. There is one frequency table per variable, with the label of the variable displayed above it. The first column of the frequency table shows the value labels. The second column shows the frequency, or count, of responses for each value label. The third, fourth and fifth columns are percentages. The third column (percent) expresses each count as a percentage of all cases, including any with missing data. The fourth column (valid percent) expresses each count as a percentage of the non-missing responses only. If there is no missing data, the percent and valid percent columns will be the same. The final column (cumulative percent) is the running total of the valid percent column.</p><p>When considering the various sport shoe brands on the market, Sporty U is the one I am most likely to buy</p>
<table>
<tr><th colspan="2"></th><th>Frequency</th><th>Percent</th><th>Valid Percent</th><th>Cumulative Percent</th></tr>
<tr><th rowspan="8">Valid</th><td>Very strongly disagree with</td><td>6</td><td>2.8</td><td>2.8</td><td>2.8</td></tr>
<tr><td>Strongly disagree with</td><td>18</td><td>8.3</td><td>8.3</td><td>11.1</td></tr>
<tr><td>Disagree with</td><td>29</td><td>13.4</td><td>13.4</td><td>24.5</td></tr>
<tr><td>Neither disagree or agree with</td><td>87</td><td>40.3</td><td>40.3</td><td>64.8</td></tr>
<tr><td>Agree with</td><td>52</td><td>24.1</td><td>24.1</td><td>88.9</td></tr>
<tr><td>Strongly agree with</td><td>18</td><td>8.3</td><td>8.3</td><td>97.2</td></tr>
<tr><td>Very strongly agree with</td><td>6</td><td>2.8</td><td>2.8</td><td>100.0</td></tr>
<tr><td>Total</td><td>216</td><td>100.0</td><td>100.0</td><td></td></tr>
</table>
<p>The next piece of output is a visual of the data: a histogram with a normal distribution curve overlaid. 
This visual also gives the mean, the standard deviation and the number of participants who responded.</p></li><li><p>From the frequency information you should be able to identify whether there is missing data and how much is missing. You should also be able to identify any values that have been entered incorrectly by looking at the value labels, and you should start to get a feel for your data and whether it is normally distributed by looking at the skewness and kurtosis information as well as inspecting the histogram.</p></li></ul>
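
For readers who want to cross-check the SPSS output outside SPSS, the range check and the frequency table described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the SPSS procedure: it uses the standard library alone, rebuilds the 216 responses from the counts in the frequency table for the first Sporty U item, and the names `COUNTS`, `expand`, `out_of_range` and `frequency_table` are hypothetical helpers introduced here. The out-of-range value 22 and the single missing answer at the end are made-up entries added purely to show the checks firing.

```python
from collections import Counter
from statistics import mean, stdev

# Counts per scale point, copied from the frequency table for the item
# "When considering the various sport shoe brands on the market, ...".
COUNTS = {1: 6, 2: 18, 3: 29, 4: 87, 5: 52, 6: 18, 7: 6}
VALID_RANGE = range(1, 8)  # seven-point Likert scale: 1 to 7


def expand(counts):
    """Turn {value: count} into a flat list of individual responses."""
    return [value for value, n in counts.items() for _ in range(n)]


def out_of_range(responses, valid=VALID_RANGE):
    """Return (case_index, value) pairs for non-missing values off the scale."""
    return [(i, v) for i, v in enumerate(responses)
            if v is not None and v not in valid]


def frequency_table(responses):
    """Rows of (value, frequency, percent, valid percent, cumulative percent).

    Percent uses all cases; valid and cumulative percent use only the
    non-missing responses (missing answers are represented as None).
    """
    total = len(responses)
    valid = [v for v in responses if v is not None]
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / total, 1),
            round(100 * counts[value] / len(valid), 1),
            round(100 * running / len(valid), 1),
        ))
    return rows


responses = expand(COUNTS)
print(len(responses), round(mean(responses), 2), round(stdev(responses), 3))
for row in frequency_table(responses):
    print(row)

# Hypothetical data-entry slips: one mistyped 22 and one missing answer.
dirty = responses + [22, None]
print(out_of_range(dirty))
print(f"missing: {dirty.count(None) / len(dirty):.1%}")
```

Run on the rebuilt responses, the mean and standard deviation round to 4.11 and 1.240, in line with the descriptive statistics table, and the percent columns reproduce the frequency table. In the `dirty` list the mistyped 22 is reported together with its position, which plays the same role as the unique survey ID used to trace a data entry error back to the paper survey.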
