1 Introduction to SPSS Data types and SPSS data entry and analysis

Upload: wesley-small

Post on 22-Dec-2015

Introduction to SPSS

Data types and SPSS data entry and analysis

In this session

What does SPSS look like?
Types of data (revision)
Data Entry in SPSS
Simple charts in SPSS
Summary statistics
Contingency tables and crosstabulations
Scatterplots and correlations
Tests of differences of means

SPSS/PASW

Aspects of SPSS

Menus, especially Analyze and Graphs
Spreadsheet view of data
  Rows are cases (people, respondents etc.)
  Columns are variables
Variable view of data
  Shows detail of each variable type

Questionnaire Data Coding

In SPSS

We change ticks etc. on a questionnaire into numbers

One number for each variable for each case
How we do this depends on the type of variable/data

Types of data

Nominal
Ranked
Scales/measures
Mixed types
Text answers (open ended questions)

Nominal (categorical)

Order is arbitrary
e.g. sex, country of birth, personality type, yes or no.
Use numeric in SPSS and give value labels.

(e.g. 1=Female, 2=Male, 99=Missing)

(e.g. 1=Yes, 2=No, 99=Missing)

(e.g. 1=UK, 2=Ireland, 3=Pakistan, 4=India, 5=other, 99=Missing)

Ranks or Ordinal

In order, 1st, 2nd, 3rd etc.
e.g. status, social class
Use numeric in SPSS with value labels

E.g. 1=Working class, 2=Middle class, 3=Upper class

E.g. Class of degree, 1=First, 2=Upper second, 3=Lower second, 4=Third, 5=Ordinary, 99=Missing

Measures, scales

1. Interval - equal units e.g. IQ

2. Ratio - equal units, true zero on scale e.g. height, income, family size, age
  Makes sense to say one value is twice another

Use numeric (or comma, dot or scientific) in SPSS

E.g. family size, 1, 2, 3, 4 etc.
E.g. income per year, 25000, 14500, 18650 etc.

Mixed type

Categorised data
  Actually ranked, but used to identify categories or groups
  e.g. age groups = ratio data put into groups
Use numeric in SPSS and use value labels.
  E.g. Age group, 1=‘Under 18’, 2=‘18-24’, 3=‘25-34’, 4=‘35-44’, 5=‘45-54’, 6=‘55 or greater’

Text answers

E.g. answers to open-ended questions
Either enter text as given (Use String in SPSS)
Or code or classify answers into one of a small number of types. (Use numeric/nominal in SPSS)

Data Entry in SPSS

Video by Andy Field

Frequency counts

Used with categorical and ranked variables
e.g. gender of students taking Health and Illness option

e.g. Number of GCSEs passed by students taking Health and Illness option

Central Tendency

Mean = average value
  Sum of all the values divided by the number of values

Mode = the most frequent value in a distribution
  (N.B. it is possible to have 2 or more modes, e.g. bimodal distribution)

Median = the half-way value, or the value that divides the ordered distribution in the middle
  The middle score when scores are ordered
  N.B. need to put values into order first
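
The three definitions can be checked on a small set of scores. This is a minimal sketch using Python's standard `statistics` module rather than SPSS; the five scores are illustrative, not the class data.

```python
import statistics

# Five illustrative scores (hypothetical, small enough to check by hand)
scores = [1, 2, 3, 3, 4]

mean = statistics.mean(scores)        # sum of the values / number of values
median = statistics.median(scores)    # middle value once the scores are sorted
modes = statistics.multimode(scores)  # a list, because there can be 2 or more modes

print(mean, median, modes)
```

`multimode` returns every most-frequent value, so a bimodal set such as 1, 1, 2, 2, 3 gives two modes.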

Dispersion and variability

Quartiles
  The three values that split the sorted data into four equal parts.
  Second quartile = median.
  Lower quartile = median of lower half of the data
  Upper quartile = median of upper half of the data
  Need to order the individuals first
  One quarter of the individuals fall between each pair of adjacent quartiles
  Inter-quartile range = upper quartile minus lower quartile (covers the middle half of the individuals)
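
The median-of-halves rule above can be sketched as follows. The function name `quartiles_by_halves` and the ages are hypothetical, and the sketch assumes that the middle value is left out of both halves when the number of cases is odd. It needs at least two values. Statistical packages may use slightly different rules, so their quartiles can differ a little.

```python
import statistics

def quartiles_by_halves(values):
    """Return (lower quartile, median, upper quartile) by the median-of-halves rule."""
    ordered = sorted(values)            # the data must be put in order first
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]         # values below the middle
    upper_half = ordered[n - half:]     # values above the middle (middle value dropped if n is odd)
    return (statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half))

ages = [22, 18, 30, 19, 25, 19, 21, 20]   # hypothetical ages, not the class data
q1, q2, q3 = quartiles_by_halves(ages)
print(q1, q2, q3, q3 - q1)                # last value is the inter-quartile range
```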

Used on Box Plot

[Figure: box plot of age of Health and Illness students, with the upper quartile, median and lower quartile marked]

Variance
Average of the squared deviations from the mean

Score   Mean   Deviation   Squared Deviation
1       2.6    -1.6        2.56
2       2.6    -0.6        0.36
3       2.6     0.4        0.16
3       2.6     0.4        0.16
4       2.6     1.4        1.96
Total                      5.20

5.20 is the Sum of Squares
This depends on number of individuals so we divide by n (5)
Gives 1.04 which is the variance
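
The arithmetic in the table can be reproduced step by step. This is a sketch in Python, not SPSS output; the variable names are illustrative. It also shows the n - 1 version for contrast, since some software reports that one instead.

```python
import statistics

scores = [1, 2, 3, 3, 4]                       # the five scores from the table
mean = sum(scores) / len(scores)               # 2.6
squared_devs = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_devs)             # about 5.20
variance_n = sum_of_squares / len(scores)      # divide by n, as above: about 1.04

# Dividing by n - 1 instead gives the sample variance (about 1.30)
variance_n_minus_1 = statistics.variance(scores)

print(round(sum_of_squares, 2), round(variance_n, 2), round(variance_n_minus_1, 2))
```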

Standard Deviation

The variance has one problem: it is measured in units squared.

This isn’t a very meaningful metric so we take the square root value.

This is the Standard Deviation
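
As a quick check, taking the square root of the variance of the five scores 1, 2, 3, 3, 4 (variance 1.04 when dividing by n) gives a standard deviation of about 1.02, in the same units as the scores. A minimal Python sketch:

```python
import math
import statistics

scores = [1, 2, 3, 3, 4]                   # five illustrative scores
variance = statistics.pvariance(scores)    # 1.04, in squared units (divides by n)
sd = math.sqrt(variance)                   # back in the original units

print(round(sd, 2))
```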

Using SPSS

‘Analyze > Descriptive Statistics > Explore’ menu.
Gives mean, median, SD, variance, min, max, range, skew and kurtosis.
Can also produce stem and leaf, and histogram.

Charts in SPSS

Use ‘Chart Builder’ from ‘Graphs’ menu or the ‘Legacy Dialogs’ submenu
And/or double click chart to edit it.
  E.g. double click to edit bars (e.g. to change from colour to fill pattern).
Do this in SPSS first before cut and paste to Word
Label the chart (in SPSS or in Word)

Stem and leaf plots

e.g. age of students taking Health and Illness option

Good at showing
  distribution of data
  outliers
  range

Stem and leaf plots e.g.

Box Plot

Box Plot

Fill colour changed. N.B. numbers refer to case numbers.

Histograms and bar charts

Length/height of bar indicates frequency

Histogram

Fill pattern suitable for black and white printing

Changing the bin size

Bin size made smaller to show more bars

Pie chart

angle of segment indicates proportion of the whole

Pie Chart

Shadow and one slice moved out for emphasis

Analysing relationships

Contingency tables or crosstabulations
  Compares nominal/categorical variables
  But can include ordinal variables
  N.B. table contains counts (= frequency data)
  One variable on horizontal axis
  One variable on vertical axis
  Row and column total counts known as marginals

Example

In the Health and Illness class, are women more likely to be under 21 than men?

Crosstabulations

e.g.

Use column and row percentages to look for relationships

SPSS output

Chi-square (χ²)

Cross tabulations and Chi-square are tests that can be used to look for a relationship between two variables:

When the variables are categorical so the data are nominal (or frequency).

For example, if we wanted to look at the relationship between gender and age.

There are several different types of Chi-square (χ²); we will be using the 2 x 2 Chi-square.
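
To show what the 2 x 2 statistic measures, here is a minimal Python sketch using the shortcut formula for a 2 x 2 table. The function name `chi_square_2x2` and the counts are hypothetical. No continuity correction is applied, so the result corresponds to the uncorrected Pearson chi-square.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)   # product of the four marginals
    chi2 = numerator / denominator
    # With 1 degree of freedom the upper-tail probability has a closed form
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = women, men; columns = under 21, 21 and over
chi2, p = chi_square_2x2(20, 10, 10, 20)
print(round(chi2, 2), round(p, 4))
```

A table with identical rows, e.g. 10, 10, 10, 10, gives a chi-square of 0: no relationship at all.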

2x2 Chi-square results in SPSS

Another example

The Bank employees data

Bank Employees: Chi-Square tests

Chi-Square analysis on SPSS

http://www.youtube.com/watch?v=Ahs8jS5mJKk 4m15s

http://www.youtube.com/watch?v=IRCzOD27NQU

From 6m:30s to 9m:50s

http://www.youtube.com/watch?v=532QXt1PM-Q&feature=plcp&context=C3ba91a4UDOEgsToPDskJ-ABupdp-Yfvuf4j4fJGzV 12m30s

Low values in cells

Get SPSS to output expected values
Look where these are <5
Consider recoding to combine cols or rows
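
Expected values come from the marginals: row total times column total, divided by the grand total. The sketch below computes them for a hypothetical table and flags the cells below 5. The function name `expected_counts` and the counts are assumptions for illustration.

```python
def expected_counts(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of observed counts
observed = [[12, 5, 3],
            [18, 10, 2]]
expected = expected_counts(observed)

# Flag cells (row index, column index) whose expected count is below 5
low_cells = [(i, j) for i, row in enumerate(expected)
             for j, value in enumerate(row) if value < 5]
print(expected)
print(low_cells)
```

Here both low cells sit in the third column, which suggests combining that column with a neighbouring one.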

Tabulating questionnaire responses

Categorical survey data often “collapsed” for purposes of data analysis

Original category   Frequency   Collapsed category   Frequency
White British       284         White                304
White Irish         7
Other White         13
Indian              40          South Asian          105
Pakistani           32
Bangladeshi         33
Chinese             16          Chinese              16
Black British       30          Black                44
Afro-Caribbean      12
African             2

An analysis on a sample of 2 (e.g. Black African) would not have been very meaningful!
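
The collapsing in the table amounts to a recode followed by a recount. A minimal Python sketch of the same step, using the frequencies from the table (the dictionary names are illustrative):

```python
from collections import Counter

# Frequencies of the original categories, taken from the table
original = {
    "White British": 284, "White Irish": 7, "Other White": 13,
    "Indian": 40, "Pakistani": 32, "Bangladeshi": 33,
    "Chinese": 16,
    "Black British": 30, "Afro-Caribbean": 12, "African": 2,
}

# Recoding scheme: original category -> collapsed category
recode = {
    "White British": "White", "White Irish": "White", "Other White": "White",
    "Indian": "South Asian", "Pakistani": "South Asian", "Bangladeshi": "South Asian",
    "Chinese": "Chinese",
    "Black British": "Black", "Afro-Caribbean": "Black", "African": "Black",
}

collapsed = Counter()
for category, count in original.items():
    collapsed[recode[category]] += count   # add each original count to its new group

print(dict(collapsed))
```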

Recoding variables

http://www.youtube.com/watch?v=uzQ_522F2SM&feature=related

Ignore t-test for now 6m11s

http://www.youtube.com/watch?v=FUoYZ_f6Lxc

Uses old version of SPSS, no submenu now. 6m

Scatterplots and correlations

Looks for association between variables, e.g.
  population size and GDP
  crime and unemployment rates
  height and weight

Both variables must be rank, interval or ratio (scale or ordinal in SPSS).

Thus cannot use variables like gender, ethnicity, town of birth, occupation.

Scatterplots

e.g. age (in years) versus Number of GCSEs

Interpretation

As Y increases X increases

Called correlation

Regression line model in red

Correlation measures association not causation

The older the child the better s/he is at reading
The less your income the greater the risk of schizophrenia

Height correlates with weight
  But weight does not cause height
  Height is one of the causes of weight (also body shape, diet, fitness level etc.)

Numbers of ice creams sold is correlated with the rate of drowning
  Ice creams do not cause drowning (nor vice versa)
  Third variable involved: people swim more and buy more ice creams when it’s warm

Scatterplot in SPSS

Use Graphs menu

http://www.youtube.com/watch?v=74BjgPQvIEg 8m34s

http://www.youtube.com/watch?v=blfflA-34pQ&feature=related 4m04s

http://www.youtube.com/watch?v=UVylQoG4hZM 1m50s, ignore polynomial regression

Modifying the Scatterplot

http://www.youtube.com/watch?v=803YCYA2AoQ&feature=related 4m04s

http://www.youtube.com/watch?v=vPzvuMuVXk8&feature=related 3m40s

If mixed data sets

Change point icon and/or colour to see different subsets.

Overall data may have no relationship but subsets might.

E.g. show male and female respondents.
Use Chart Builder

Correlation

Correlation coefficient = measure of strength of relationship, e.g. Pearson’s r

varies from -1 to +1: a size between 0 and 1, with a plus or minus sign for the direction
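
To make the coefficient concrete, here is a minimal Python sketch that computes Pearson's r from its definition. The function name `pearson_r` and the height and weight values are hypothetical; SPSS produces the same statistic from its own menus.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: how x and y vary together, divided by how much each varies alone."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical heights (cm) and weights (kg)
heights = [150, 160, 165, 170, 180]
weights = [52, 58, 63, 70, 77]
r = pearson_r(heights, weights)
print(round(r, 3))
```

Perfectly straight-line data such as x = 1, 2, 3 and y = 2, 4, 6 give r = 1; reversing y gives r = -1.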

Positive correlation

as x increases, y increases

r = 0.7

Negative correlation

as x increases, y decreases

r = -0.7

Strong correlation (i.e. close to 1)

r = 0.9

Weak correlation (i.e. close to 0)

r = 0.2

Interpretation cont.

r² is a measure of degree of variation in one variable accounted for by variation in the other.

E.g. If r=0.7 then r²=0.49 i.e. just under half the variation is accounted for (rest accounted for by other factors).

If r=0.3 then r²=0.09 so 91% of the variation is explained by other things.

Significance of r

SPSS reports if r is significant at α=0.05
N.B. this is dependent on sample size to a large extent.
  Other things being equal, larger samples are more likely to be significant.
Usually, size of r is more important than its significance

Pearson’s r in SPSS

http://www.youtube.com/watch?v=loFLqZmvfzU 6m57s

Parametric and non-parametric

Some statistics rely on the variables being investigated following a normal distribution.
  Called parametric statistics

Others can be used if variables are not distributed normally
  Called non-parametric statistics

Pearson’s r is a parametric statistic
Kendall’s tau and Spearman’s rho (rank correlation) are non-parametric.

Assessing normality

Produce histogram and normal plot

Use statistical test

SPSS provides two formal tests for normality: Kolmogorov-Smirnov (K-S) and Shapiro-Wilk (S-W)
But, there is debate about K-S
  Extremely sensitive to departure from normality
  May erroneously imply parametric test not suitable, especially in small sample
So, always use a histogram as well.

Often can use parametric tests

Parametric tests (e.g. Pearson’s r) are robust to departures from normality
  Small, non-normal samples OK
But use non-parametric if
  Data are skewed (questionnaire data often is)
  Data are bimodal

Spearman’s rho

http://www.youtube.com/watch?v=r_WQe2c-ISU From 4.14 to 4.56

http://www.youtube.com/watch?v=POkFi5vKvI8&feature=fvwrel 6m16s

So far…

Looked at relationships between nominal variables

Gender vs age group

Looked at relationships between scale variables

Height vs. Weight

Now combine the two
Groups vs a scale variable

E.g. Gender vs income

Reminder – IV vs DV

IV = independent variable
  What makes a difference, causes effects, is responsible for differences.

DV = dependent variable
  What is affected by things, what is changed by the IV.

Gender vs income. Gender = IV, income = DV
  So we investigate the effect of gender on income

Example 1: Age group vs. no. of GCSEs

Using the Health and Illness class data
Age group defines 2 groups
  Under 21
  21 and over
Just two groups
  Can use independent samples t-test
  Independent because the two groups consist of different people.
  t-test compares the means of the 2 groups.
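
What the t-test computes can be sketched by hand. The Python below calculates Student's t for two independent groups, assuming equal variances. The function name `independent_t` and the GCSE counts are hypothetical, not the class data. The p-value is left to SPSS, which looks it up from the t distribution with the degrees of freedom shown.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Pooled variance: both sample variances weighted by their degrees of freedom
    pooled = ((n_a - 1) * statistics.variance(group_a) +
              (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b) / standard_error   # difference of means in standard-error units
    df = n_a + n_b - 2
    return t, df

# Hypothetical numbers of GCSEs for the two age groups
under_21 = [6, 7, 8, 5, 9]
over_21 = [4, 5, 3, 6, 2]
t, df = independent_t(under_21, over_21)
print(round(t, 2), df)
```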

Difference of means

Do under 21s have more or fewer GCSEs than 21 and overs?

Means are different (6.44 & 4.28) but is that significant?

Levene’s test: no significant difference between the group variances, therefore assume equal variances

Means are statistically significantly different

Parametric vs non-parametric

Just as in the case of correlations, there are both kinds of tests.

Need to check if DV is normally distributed.
  Do this visually
  Also use statistical tests

Tests for normality

Kolmogorov-Smirnov and Shapiro-Wilk
  If n>50 use KS
  If n≤50 use SW
Null hypothesis is ‘data are normally distributed’.
  So if p<0.05 then data are significantly different from a normal distribution: use non-parametric tests
  If p≥0.05 then no significant difference: use parametric tests
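
The two rules of thumb above can be written as a pair of small decision functions. This is only a sketch of the slide's rule, with hypothetical function names; it does not run the normality tests themselves.

```python
def choose_normality_test(n):
    """Rule of thumb from the slide: K-S for n > 50, S-W for n <= 50."""
    return "Kolmogorov-Smirnov" if n > 50 else "Shapiro-Wilk"

def choose_test_family(p_value, alpha=0.05):
    """p < alpha rejects 'data are normally distributed', so use non-parametric tests."""
    return "non-parametric" if p_value < alpha else "parametric"

print(choose_normality_test(30), choose_test_family(0.20))
print(choose_normality_test(120), choose_test_family(0.01))
```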

Checking normality

Produce histogram of DV
Tick box to undertake statistical test
Interpret results.

t-test

Identify your two groups.
Determine what values in the data indicate those two groups (e.g. 1=female, 2=male)
Select Analyze: Compare Means: Independent samples t-test

http://www.youtube.com/watch?v=_KHI3ScO8sc 9m40s

Mann-Whitney U test

Use this when comparing two groups and the DV is not normally distributed

http://www.youtube.com/watch?v=7iTvv3m9d_g 3m45s
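
The test works on ranks rather than raw scores, which is why skewed data do not upset it. A minimal Python sketch of the U statistic, with tied scores given their average rank. The function name `mann_whitney_u` and the scores are hypothetical, and the significance level is left to SPSS.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of U_a and U_b, using average ranks for ties."""
    combined = sorted(group_a + group_b)
    # Average rank for each distinct value (ranks start at 1)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[x] for x in group_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical skewed scores for two independent groups
group_1 = [1, 2, 2, 3, 10]
group_2 = [4, 5, 6, 7, 30]
print(mann_whitney_u(group_1, group_2))
```

A small U means the two groups barely overlap in rank; U = 0 means every score in one group is below every score in the other.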

Comparing 3 or more groups

ANOVA = Analysis of Variance
Analyze: Compare Means: One-way ANOVA

http://www.youtube.com/watch?v=wFq1b3QjI1U 4m04s

Useful to get table of means (descriptives) and means plots from ANOVA options.
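
The F value compares the variation between the group means with the variation inside the groups. A minimal Python sketch of that calculation; the function name `one_way_anova_f` and the scores are hypothetical, and the p-value is left to SPSS.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups: how far scores lie from their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(groups)
print(round(f_value, 2), df_b, df_w)
```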

ANOVA Means and F value

ANOVA Means Plot
