Data Construction and Data Analysis for Survey Research
RAYMOND KENT
© Raymond Kent 2001
All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.
No paragraph of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1T 4LP.
Any person who does any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.
The author has asserted his right to be identified as the author of this work in accordance with the Copyright, Designs and Patents Act 1988.
First published 2001 by PALGRAVE MACMILLAN, Houndmills, Basingstoke, Hampshire RG21 6XS and 175 Fifth Avenue, New York, N.Y. 10010. Companies and representatives throughout the world.
PALGRAVE MACMILLAN is the global academic imprint of the Palgrave Macmillan division of St. Martin's Press, LLC and of Palgrave Macmillan Ltd. Macmillan is a registered trademark in the United States, United Kingdom and other countries. Palgrave is a registered trademark in the European Union and other countries.
ISBN 978-0-333-76306-3 ISBN 978-1-137-08944-1 (eBook) DOI 10.1007/978-1-137-08944-1
This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources.
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data Kent, Raymond A.
Data construction and data analysis for survey research / Raymond Kent. p.cm
Includes bibliographical references and index. ISBN 978-0-333-76306-3 1. Sociology-Statistical methods. 2. Sociology-Data processing. 3. Social sciences-Research. 4. Social surveys-Statistical methods. 5. Market surveys-Statistical methods. I. Title. HM535.K45 2001 301'.07'27-dc21 2001036165
10 9 8 7 6 5 4 3 2 10 09 08 07 06 05 04 03
Copy-edited and typeset by Povey-Edmondson Tavistock and Rochdale, England
Contents
List of Tables
List of Figures
Preface
1 Introduction
  Learning objectives
  The scope of this book
  The structure of the book
  How to study this book
  The survey in market and social research
  Survey objectives
  Types of survey
  When to use (and when not to use) surveys
  Summary
  Further reading
  References
Part I CONSTRUCTING SURVEY DATA: GETTING GOOD QUALITY DATA
Introduction to Part I
2 Designing the Data Matrix
  Learning objectives
  Introduction
  What are 'data'?
  The process of data construction
  The format of the data matrix
  The design of the data matrix
    The specification, number and selection of respondents
    The variables
    The process of measurement
    Scaling
  The quality of data
  Background discussion: the current 'theory' of measurement
  Summary
  Exercises
  Points for discussion
  Further reading
  References
3 Filling the Data Matrix
  Learning objectives
  Introduction
  Survey design and execution
  Population specification
  Frame error
  Questionnaire design
  Non-response
  Response errors
  Interviewer errors
  Editing
  Coding
  Data entry
  Using computer packages
  Entering data on SPSS
  Saving your work
  Introduction to the table tennis study
  Summary
  Exercises
  Points for discussion
  Further reading
  References
Part II ANALYSING SURVEY DATA: CHOOSING THE RIGHT DATA ANALYSIS TECHNIQUES
Introduction to Part II
  Analysis objectives
  Scale type
  The number of variables
  Choosing data analysis techniques
  Further reading
4 Tables and Charts for Categorical Variables
  Learning objectives
  Introduction
  Univariate frequency tables
  Bivariate crosstabulation
  Three-way and n-way tables
  Bar charts and pie charts
  Using SPSS Frequencies, Graphs, Crosstabs and Recode procedures
    Frequencies
    Graphs and charts
    Recode
  Summary
  Exercises
  Points for discussion
  Further reading
5 Tables and Charts for Interval Variables
  Learning objectives
  Introduction
  Frequency tables for interval data
  Metric tables
  Histograms and line graphs
  Scattergrams
  Using SPSS Histogram, Line and Scatter
  Summary
  Exercises
  Points for discussion
  Further reading
6 Summarising Categorical Variables
  Learning objectives
  Introduction
  Univariate data summary
  Bivariate data summary
  Rank correlation
  Using SPSS Crosstabs: Statistics
  Summary
  Exercises
  Points for discussion
  Further reading
7 Summarising Interval Variables
  Learning objectives
  Introduction
  Univariate data summary
    Central tendency
    Dispersion
    Distribution shape
    Percentile values
  Bivariate data summary
    Correlation and regression
    Spearman's rho
  Multivariate procedures
    Multiple regression
    Factor analysis
    Cluster analysis
    Multidimensional scaling
  Using SPSS
  Summary
  Exercises
  Points for discussion
  Further reading
8 Sampling and the Concept of Error
  Learning objectives
  Introduction
  When we need to take samples
  Sample selection
  Sample design
  Sampling in practice
  Sampling errors
    Systematic error
    Random error
  The concept of error
  Total survey error
  Controlling error
  Summary
  Exercises
  Points for discussion
  Further reading
  References
9 Making Inferences from Samples: Categorical Variables
  Learning objectives
  Introduction
  Estimation
  Testing hypotheses for statistical significance
    Univariate hypotheses
    Bivariate hypotheses
    Multivariate hypotheses
  Statistical inference and bivariate data summaries
  Using SPSS
  Summary
  Exercises
  Points for discussion
  Further reading
  References
10 Making Inferences from Samples: Interval Variables
  Learning objectives
  Introduction
  Estimation
  Testing the null hypothesis
    Univariate hypotheses
    Bivariate hypotheses
    Comparing means: the analysis of variance
    The statistical significance of correlation
  The significance test controversy
  Using SPSS
  Summary
  Exercises
  Points for discussion
  Further reading
11 Evaluating Hypotheses and Explaining Relationships
  Learning objectives
  Introduction
  What is an 'hypothesis'?
  Should hypotheses be stated in advance of undertaking the research?
  Evaluating hypotheses
  Analysing and explaining relationships between variables
    What is an 'explanation'?
    Causal analysis
    Providing understanding
    Dialectical analysis
  Summary
  Exercises
  Points for discussion
  Further reading
Part III ANALYSING SURVEY DATA: KNOWING HOW TO HANDLE YOUR DATA
Introduction to Part III
12 Handling Your Data Matrix
  Learning objectives
  Introduction
  Upgrading and downgrading scales
  Data dredging
  How many cases are needed and what size of sample should be taken?
  Strategies for coping with relatively few cases
  Strategies for analysing summated rating scales
  The reliability and validity of summated ratings
  What do you do with 'don't know's?
  Missing values
  Handling multiple response items
  Can you use statistical inference on non-random samples?
  Using SPSS
    Using Compute
    Using Basic Tables
    Using Define Variable/Missing Values
    Using Multiple Response
    Using Reliability Analysis
  Background discussion: the interpretation of Cronbach's coefficient alpha
  Summary
  Exercises
  Points for discussion
  Further reading
  References
13 Analysing Open-ended Questions
  Learning objectives
  Introduction
  Treating responses as qualitative data
  Coding the data
  Open versus closed questions
  Summary
  Exercises
  Points for discussion
  Further reading
Appendix 1: The Table Tennis Questionnaire
Appendix 2: Using Pinpoint
Appendix 3: SPSS Release 10.0
Glossary
References
Index
List of Tables
2.1 Adult press readership by title
2.2 Students by identification number
2.3 Respondents by sex
2.4 Respondents by ethnic group
2.5 Respondents' agreement with the statement: 'This is a first class service'
2.6 Ranking of 6 students by performance in Maths and English
2.7 Household size
2.8 Age distribution of respondents
2.9 Some results from a survey
3.1 Errors in data entry
4.1 A frequency table for a binary variable
4.2 A frequency table of ordinal data
4.3 A multi-variable frequency table: respondents by sex, social class and age
4.4 Table 4.2 regrouped
4.5 A crosstabulation of other household players by sex of respondent
4.6 A crosstabulation of 'where table tennis was first played' by 'age'
4.7 Table 4.5 expressed as column percents
4.8 Age began playing table tennis and other household players layered by whether or not they were encouraged to take up the sport
4.9 An SPSS 'Frequencies' output
4.10 'Play frequency' by whether anybody else in the household plays
4.11 Column percentages: 'how many times played per week' by whether anybody else in the household plays
5.1 A frequency table for age began playing table tennis
5.2 Table 5.1 regrouped into two categories
5.3 Age groups by sex of respondent
6.1 A univariate table for categorical variables
6.2 How many times per week respondents play crosstabulated by whether anybody else in the household plays
6.3 Chi-square and Cramer's V for Table 6.2
6.4 SPSS output for Phi and Cramer's V
6.5 SPSS Chi-square output
7.1 Measures of central tendency for 'age began'
7.2 Measures of dispersion for 'age began'
7.3 A frequency distribution for 'age began'
7.4 SPSS measures of distribution shape
7.5 Percentile values for 'agebegan'
7.6 Scores of A-D on two tests
7.7 The calculation of r from Table 7.6
7.8 SPSS regression coefficients
7.9 SPSS Pearson correlation output
7.10 SPSS Spearman's rho output
7.11 Multiple regression: 'Spend' regressed on 'Enjoyment', 'Social benefits', 'Competition' and 'Health and fitness'
7.12 SPSS model constants for multiple regression
7.13 A correlation matrix
7.14 Factor loading on two factors
7.15 Similarity rankings of six multiples
7.16 SPSS regression output: variables entered
7.17 SPSS regression output: model summary
7.18 SPSS coefficients
9.1 Critical values of Chi-square
9.2 Importance of social benefits by age groups
9.3 Division played in by whether anybody else in the household plays
9.4 A goodness-of-fit test using Chi-square
10.1 Mean importance of social benefits by presence of other household players
10.2 ANOVA of perceived importance of benefits by whether anybody else in the household plays table tennis
10.3 Using SPSS 'Explore' to generate confidence intervals
11.1 Type of wine consumed by income
11.2 Wine consumed by income, controlling for age
11.3 Wine consumed by income, controlling for social class
12.1 Percentage sampling errors: maximum variability at 50:50
12.2 Satisfaction with various elements of playing table tennis
12.3 Mean satisfaction score for elements of playing table tennis
12.4 Total satisfaction scores by sex of player
12.5 Satisfaction with practice facilities by whether anybody else in the household plays table tennis
12.6 Satisfaction with practice facilities by whether anybody else in the household plays table tennis
12.7 Satisfaction with practice facilities by whether anybody else in the household plays, excluding 'neither' category
12.8 Analysis of missing values: univariate
12.9 Analysis of missing values: bivariate
12.10 A multiple response question
12.11 Crosstabulating a multiple response by another variable
13.1 Social grade and socio-economic classification
13.2 Analysis of an open-ended question
List of Figures
2.1 An open-ended question
2.2 A data matrix on SPSS
2.3 A survey data matrix
2.4 A coded data matrix
2.5 A coded single-answer question
2.6 A multiple-response question
2.7 The problem of measurement
2.8 A summated rating scale: customer satisfaction with service provided
2.9 A Likert measurement
2.10 A semantic differential
2.11 A snake diagram
2.12 Summary of scale types
3.1 Telephone survey contact outcomes
3.2 The 'Data Editor' window
3.3 The completed data matrix
3.4 The 'Define Variable' dialog box
II.1 Factors determining choice of technique
4.1 Bar chart: the age distribution of players
4.2 A horizontal bar chart: importance of perceived social benefits
4.3 A stacked bar chart: importance of perceived social benefits by sex of player
4.4 A clustered bar chart: importance of perceived social benefits by sex of player
4.5 Importance of aspects of playing table tennis
4.6 The social benefits of playing table tennis by individual case
4.7 A pie chart
4.8 Summaries of separate variable as a pie chart
4.9 The SPSS 'Frequencies' dialog box
4.10 The 'Crosstabs' dialog box
4.11 The 'Recode into Different Variables' dialog box
4.12 'Old and New Values'
4.13 The new recoded variable
5.1 A bar chart for continuous interval data
5.2 A histogram for continuous interval data
5.3 A bar chart for discrete interval data
5.4 A histogram for discrete interval data
5.5 A line graph of Figure 5.1
5.6 A scattergram of 'spend' by 'agebegan'
5.7 A histogram of 'age began'
5.8 A line graph of 'agebegan'
5.9 The importance of various aspects of playing table tennis
5.10 'Agebegan' plotted by sex of respondent
5.11 A scatterplot with separate markers for males and females
5.12 A matrix scatterplot with three variables
6.1 The 'Crosstabs: Statistics' dialog box
7.1 The distribution of 'agebegan'
7.2 Three normal distributions with differing parameters
7.3 The normal distribution
7.4 Scattergram of X on Y
7.5 A histogram for 'spend'
7.6 A multi-dimensional map based on ranking in Table 7.15
7.7 The SPSS 'Frequencies: Statistics' dialog box
9.1 A sampling distribution of sample size n
10.1 Critical regions: two-tail test
10.2 Critical region: one-tail test
11.1 A 'spurious' relationship
12.1 A scattergram of 'agebegan' by total satisfaction score
12.2 A scattergram of 'spend' by total satisfaction score
12.3 The 'Compute Variable' dialog box
A3.1 SPSS Release 10.0 'Variable View'
A3.2 Completed 'Variable View' for Figure 3.3
Preface
The idea for this book was occasioned by an uncounted, but certainly large, number of students (and in some cases colleagues) who over the years have come to see me and asked, 'OK, so now I've collected all my data, what do I do next?' or 'I've looked at a lot of statistics books, but I don't know which statistics to use to analyse my data' or 'I've used a set of 5-point rating scales, but how do I analyse the results?' or 'I've run off all these tables, but I've no idea how to make sense of them' or 'Do I add the neutral category to the agree or to the disagree group?' or 'What do I do with the don't knows?' or 'My sample is not really a random one, do I have to calculate tests of significance?' or 'Can I use a spreadsheet to analyse my survey data?'
This book is addressed to all these students - and to their tutors and supervisors who will no doubt appreciate having some literature that they can recommend. It should assist tutors and lecturers to give students some good advice. It might also help market and social researchers who have sought in vain for help on the 'nitty gritty' of analysing data from surveys they have undertaken.
The present structure of the book owes a lot to six anonymous reviewers who made many useful suggestions (although one of them did wonder whether I had recently had a disagreement with a statistician, given some of the jibes I had made at their expense in earlier drafts). Although I have toned down some of the comments, I could not bring myself to expunge them all, so I expect some flak from that quarter. If any of the reviewers read the final version of this book, I hope they will recognise that many of their comments have been taken on board. Needless to say, the reviewers did not all express the same viewpoint, so it was not possible to incorporate all their suggestions.
University of Stirling RAYMOND KENT
The author and publishers are grateful for permission given by SPSS, St Andrews House, West Street, Surrey, to produce screen shots of a range of SPSS windows. This book is not sponsored or approved by SPSS and any errors are in no way the responsibility of SPSS.