Quantitative Methods in Social Research 2010/11 Week 2 (Morning) A novice’s guide to quantitative analysis

Upload: sawyer-pay

Post on 01-Apr-2015



Examining quantitative data

• Quantitative measures are typically referred to as variables.

• Some variables are generated directly via the data generation process, but other, derived variables may be constructed from the original set of variables later on.

• As the next slide indicates, variables are frequently referred to in more specific ways.
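As a minimal sketch of what constructing a derived variable can look like, the following builds an age-group category from an age variable. The respondents, the cut-points and the name `age_group` are all hypothetical, invented for illustration.

```python
# Hypothetical respondents: 'age' is an original variable collected in the survey.
respondents = [
    {"id": 1, "age": 23},
    {"id": 2, "age": 41},
    {"id": 3, "age": 67},
]


def age_group(age):
    """Derive an age-group category from age in years (illustrative cut-points)."""
    if age < 30:
        return "Under 30"
    if age < 60:
        return "30-59"
    return "60 or over"


# Add the derived variable alongside the original one.
for person in respondents:
    person["age_group"] = age_group(person["age"])

print([p["age_group"] for p in respondents])
```

The original variable is kept, so the derived categories can always be rebuilt with different cut-points later.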


Cause(s) and effect…?

• Often, one variable (and occasionally more than one variable) is viewed as being the dependent variable.

• Variables which are viewed as impacting upon this variable, or outcome, are often referred to as independent variables.

• However, for some forms of statistical analysis, independent variables are referred to in more specific ways (as can be seen in the menus of SPSS for Windows).


Levels of measurement (Types of quantitative data)

• A nominal variable relates to a set of categories such as ethnic groups or political parties which is not ordered.

• An ordinal variable relates to a set of categories in which the categories are ordered, such as social classes or levels of educational qualification.

• An interval-level variable relates to a ‘scale’ measure, such as age or income, that can be subjected to mathematical operations such as averaging.
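One practical consequence of the level of measurement is which summary is meaningful. The sketch below uses small hypothetical datasets (the party labels, qualification levels and ages are invented) to show a mode for a nominal variable, a median category for an ordinal variable, and a mean for an interval-level variable.

```python
from statistics import mean, median_low, mode

# Hypothetical data, invented for illustration only.
party = ["Party A", "Party B", "Party A", "Party C", "Party A"]   # nominal
qualification_levels = ["None", "GCSE", "A level", "Degree"]      # ordered, low to high
qualification = ["GCSE", "Degree", "None", "A level", "GCSE"]     # ordinal
age = [23, 35, 41, 58, 67]                                        # interval-level

# Nominal: categories can only be counted, so the mode is the natural summary.
modal_party = mode(party)

# Ordinal: categories can be ranked, so a median category is meaningful.
ranks = [qualification_levels.index(q) for q in qualification]
median_qualification = qualification_levels[median_low(ranks)]

# Interval-level: arithmetic such as averaging is meaningful.
mean_age = mean(age)

print(modal_party, median_qualification, mean_age)
```

Averaging the party labels, by contrast, would have no meaning, which is why the distinction matters when choosing an analysis.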


How many variables?

• The starting point for statistical analyses is typically an examination of the distributions of values for the variables of interest. Such examinations of variables one at a time are a form of univariate analysis.

• Once a researcher moves on to looking at relationships between pairs of variables she or he is engaging in bivariate analyses.

• … and if they attempt to explain why two variables are related with reference to another variable or variables they have moved on to a form of multivariate analysis.


Looking at categorical variables

• For nominal/ordinal variables this largely means looking at the frequencies of each category, often pictorially using, say, bar-charts or pie-charts.

• It is usually easier to get a sense of the relative importance of the various categories if one converts the frequencies into percentages!


Example of a frequency table

Place met marital or cohabiting partner

                                          Frequency      %
At school, college or university                872   12.4
At/through work                                1405   19.9
In a pub/cafe/restaurant/bar/club              2096   29.7
At a social event organised by friend(s)       1055   14.9
Other                                          1631   23.1
TOTAL                                          7059  100.0
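The conversion from frequencies to percentages in this table can be sketched as follows. The counts are copied from the table; the variable names are illustrative.

```python
# Frequencies copied from the table above.
frequencies = {
    "At school, college or university": 872,
    "At/through work": 1405,
    "In a pub/cafe/restaurant/bar/club": 2096,
    "At a social event organised by friend(s)": 1055,
    "Other": 1631,
}

total = sum(frequencies.values())

# Each percentage is the category frequency as a share of all cases.
percentages = {place: 100 * n / total for place, n in frequencies.items()}

for place, pct in percentages.items():
    print(f"{place:<42}{frequencies[place]:>6}{pct:>7.1f}")
print(f"{'TOTAL':<42}{total:>6}{100.0:>7.1f}")
```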


Example of a pie-chart

[Pie chart with segments: At school, college or university; At/through work; In a pub/cafe/restaurant/bar/club; At a social event organised by friend(s); Other]


Looking at ‘scale’ variables

• For interval-level data, the appropriate visual summary of a distribution is a histogram. Examining it allows the researcher to assess whether it is reasonable to assume that the quantity of interest has a particular distributional shape (and whether it exhibits skewness).
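A histogram groups values into bins of equal width and counts the cases in each bin. The sketch below does this as plain text for a set of hypothetical ages (invented for illustration), and compares the mean with the median as one informal check for skewness.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical ages, invented for illustration only.
ages = [19, 22, 23, 25, 27, 28, 31, 33, 34, 38, 42, 47, 55, 63, 78]
bin_width = 10

# Assign each value to the lower edge of its bin, e.g. 23 -> 20, 47 -> 40.
bin_counts = Counter((a // bin_width) * bin_width for a in ages)

# Print one row of '#' marks per bin: a sideways histogram.
for lower in sorted(bin_counts):
    print(f"{lower:>3}-{lower + bin_width - 1:<3} {'#' * bin_counts[lower]}")

# A mean noticeably above the median is one informal sign of positive skew.
print(mean(ages), median(ages))
```

Here the long tail of older ages pulls the mean above the median, which is the pattern a positively skewed histogram would show.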


Example of a histogram


Description or inference?

• Descriptive statistics summarise relevant features of a set of values.

• Inferential statistics help researchers decide whether features of quantitative data from a sample can be safely concluded to be present in the population.

• Generalizing from a sample to a population is part of the process of statistical inference.

• One objective may be to produce an estimate of the proportion of people in the population with a particular characteristic, i.e. a process of estimation.


What makes inference difficult?

• Inferences about a population can have their credibility undermined by the sampling-related bias that may be present in a non-random sample.

• Even if there is no bias of this sort, samples differ from populations because of sampling error, i.e. the amount a quantity in a random sample differs from the corresponding quantity in the population.

• A pattern or difference in a sample may thus be solely an artefact of sampling error, i.e. the pattern or difference has been induced by ‘noise’ rather than reflecting something genuine in the population.


The value of random sampling

• We can sample from a population in various ways (e.g. we could select the first ten women and ten men we meet to make a gender comparison), but some ways (including this one!) may lead to biases arising from the sampling process.

• However, in a random sample, in which:
– all members of the population of interest have some chance of being included,
– their inclusion or exclusion is by chance alone, and
– the chance of the inclusion of each population member can be established,
there is no scope for bias through sampling, only for sampling error.


The value of knowing things about sampling error

• Random samples thus allow us to restrict the sources of error in sample data to sampling error alone, i.e. instead of:

Observed (sample) quantities = Population quantities +/- Sampling error +/- Bias

• We have:

Observed (sample) quantities = Population quantities +/- Sampling error

• So, if we know something about how much sampling error there is likely to be, we can use this (together with our sample data) to infer things about the population quantities.
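The difference between the two equations can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is hypothetical (10,000 simulated values), and the "biased" design is an invented one in which only the higher-valued half of the population can ever be selected.

```python
import random
from statistics import mean

rng = random.Random(2010)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values (e.g. weekly earnings in pounds).
population = [rng.gauss(400, 100) for _ in range(10_000)]
population_mean = mean(population)

# A biased sampling frame: only the higher-valued half can ever be selected.
cutoff = sorted(population)[len(population) // 2]
reachable = [x for x in population if x >= cutoff]


def average_error(frame, n=100, repeats=500):
    """Average of (sample mean - population mean) over many samples from frame."""
    errors = [mean(rng.sample(frame, n)) - population_mean for _ in range(repeats)]
    return mean(errors)


random_error = average_error(population)  # sampling error only: averages out near 0
biased_error = average_error(reachable)   # sampling error plus bias: does not average out

print(round(random_error, 1), round(biased_error, 1))
```

Individual random samples still miss the population mean, but their errors cancel out on average; the errors from the biased frame all point the same way.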


…but how do we know about it?

• Sampling error is the inaccuracy in sample data that arises because we have a sample rather than the whole population. If we are lucky, the amount of sampling error is small (especially if we have a reasonably large sample), but there is always a small chance, even in a random sample, that our sample has an ‘odd’ composition, and the sampling error is thus large.

• Fortunately, statistical theory allows us to estimate the kinds of quantities of sampling error that are likely to have occurred in a given situation; more precisely, it allows us to establish a frequency distribution for the possible amounts of sampling error that we may have in our sample, and hence quantify how likely it is that our sample results are (more than) a given amount wrong...
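Statistical theory gives this frequency distribution analytically; the sketch below approximates it by brute force instead, drawing many random samples from a hypothetical population (invented for illustration) and counting how often the sample mean is more than a given amount wrong. The threshold of 20 units is arbitrary.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population, invented for illustration only.
population = [rng.gauss(400, 100) for _ in range(10_000)]
population_mean = mean(population)


def sampling_errors(n, repeats=2000):
    """Sampling error of the mean for many independent random samples of size n."""
    return [mean(rng.sample(population, n)) - population_mean for _ in range(repeats)]


def share_more_wrong_than(errors, amount):
    """Proportion of samples whose mean is more than `amount` away from the truth."""
    return sum(abs(e) > amount for e in errors) / len(errors)


errors_25 = sampling_errors(25)
errors_100 = sampling_errors(100)

# How often is the sample mean more than 20 units wrong, for each sample size?
share_25 = share_more_wrong_than(errors_25, 20)
share_100 = share_more_wrong_than(errors_100, 20)
print(share_25, share_100)
```

The larger samples are wrong by more than the threshold far less often, which is the sense in which sampling error can be quantified even though the error in any one sample is unknown.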


What determines sampling error? An example

• The amount of sampling error, on average, reflects the size of the sample (with the amount typically being less in proportional terms for a bigger sample) and also reflects how diverse the quantity of interest is.

• Estimating average earnings: average sampling error
– For a sample of 25 men: £79.0
– For a sample of 25 women: £29.4
– For a sample of 100 men: £39.5
– For a sample of 100 women: £14.7
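These figures are consistent with the usual standard error of a mean, the standard deviation divided by the square root of the sample size: quadrupling the sample from 25 to 100 halves the error, and the more diverse group (men's earnings) has the larger error. In the sketch below, the standard deviations of £395 and £147 are an assumption, back-calculated from the figures above; the source does not state them.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation over root n."""
    return sd / sqrt(n)


# ASSUMPTION: these standard deviations are back-calculated from the slide's
# figures (79.0 * sqrt(25) and 29.4 * sqrt(25)); the source does not state them.
sd_men, sd_women = 395.0, 147.0

for label, sd in (("men", sd_men), ("women", sd_women)):
    for n in (25, 100):
        print(f"For a sample of {n} {label}: £{standard_error(sd, n):.1f}")
```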


Looking at the relationship between two categorical variables

If two variables are nominal or ordinal, i.e. categorical, we can look at the relationship between them in the form of a cross-tabulation, using percentages to summarize the pattern. (Typically, if there is one variable that can be viewed as depending on the other, i.e. a dependent variable, and the categories of this variable make up the columns of the cross-tabulation, then it makes sense to have percentages that sum to 100% across each row; these are referred to as row percentages).


An example of a cross-tabulation (from Jamieson et al., 2002#)

                             Both ‘permanent’  Both ‘try and see’  Different answers  TOTAL
Cohabiting without marriage  15 (48%)          4 (13%)             12 (39%)           31 (100%)
Cohabited and then married   16 (67%)          1 (4%)              7 (29%)            24 (100%)
Married without cohabiting   9 (100%)          0 (0%)              0 (0%)             9 (100%)

‘When you and your current partner first decided to set up home or move in together, did you think of it as a permanent arrangement or something that you would try and then see how it worked?’

# Jamieson, L. et al. 2002. ‘Cohabitation and commitment: partnership plans of young men and women’, Sociological Review 50.3: 356–377.


Alternative forms of percentage

• In the following example, row percentages allow us to compare outcomes between the categories of an independent variable.

• However, we can also use column percentages to look at the composition of each category of the dependent variable.

• In addition, we can use total percentages to look at how the cases are distributed across combinations of the two variables.


Example Cross-tabulation II: Row percentages

Class origin * Class destination Crosstabulation

                                            Class destination
Class origin                                Service  Intermediate  Working    Total
Service       Count                             730           323      189     1242
              % within Class origin           58.8%         26.0%    15.2%   100.0%
Intermediate  Count                             857          1140     1108     3105
              % within Class origin           27.6%         36.7%    35.7%   100.0%
Working       Count                             786          1385     2916     5087
              % within Class origin           15.5%         27.2%    57.3%   100.0%
Total         Count                            2373          2848     4213     9434
              % within Class origin           25.2%         30.2%    44.7%   100.0%

Derived from: Goldthorpe, J.H. with Llewellyn, C. and Payne, C. (1987). Social Mobility and Class Structure in Modern Britain (2nd Edition). Oxford: Clarendon Press.


Example Cross-tabulation II: Column percentages

Class origin * Class destination Crosstabulation

                                            Class destination
Class origin                                Service  Intermediate  Working    Total
Service       Count                             730           323      189     1242
              % within Class destination      30.8%         11.3%     4.5%    13.2%
Intermediate  Count                             857          1140     1108     3105
              % within Class destination      36.1%         40.0%    26.3%    32.9%
Working       Count                             786          1385     2916     5087
              % within Class destination      33.1%         48.6%    69.2%    53.9%
Total         Count                            2373          2848     4213     9434
              % within Class destination     100.0%        100.0%   100.0%   100.0%


Example Cross-tabulation II: Total percentages

Class origin * Class destination Crosstabulation

                                            Class destination
Class origin                                Service  Intermediate  Working    Total
Service       Count                             730           323      189     1242
              % of Total                       7.7%          3.4%     2.0%    13.2%
Intermediate  Count                             857          1140     1108     3105
              % of Total                       9.1%         12.1%    11.7%    32.9%
Working       Count                             786          1385     2916     5087
              % of Total                       8.3%         14.7%    30.9%    53.9%
Total         Count                            2373          2848     4213     9434
              % of Total                      25.2%         30.2%    44.7%   100.0%
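The three forms of percentage differ only in what each count is divided by. As a sketch, the following recomputes all three from the counts in the class origin by class destination cross-tabulation; the helper name `pct` is illustrative.

```python
# Counts from the Class origin * Class destination cross-tabulation above.
classes = ["Service", "Intermediate", "Working"]
counts = [
    [730, 323, 189],    # origin: Service
    [857, 1140, 1108],  # origin: Intermediate
    [786, 1385, 2916],  # origin: Working
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Row percentages: each cell as a share of its origin class (rows sum to 100).
row_pct = [[pct(c, row_totals[i]) for c in row] for i, row in enumerate(counts)]
# Column percentages: each cell as a share of its destination class.
col_pct = [[pct(c, col_totals[j]) for j, c in enumerate(row)] for row in counts]
# Total percentages: each cell as a share of all cases.
total_pct = [[pct(c, grand_total) for c in row] for row in counts]

for name, table in (("Row", row_pct), ("Column", col_pct), ("Total", total_pct)):
    print(name, "percentages")
    for origin, row in zip(classes, table):
        print(f"  {origin:<13}", row)
```

Row percentages answer "where do people from each origin end up?", column percentages answer "where did the people in each destination come from?", and total percentages describe the sample as a whole.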


Test statistics

• How can we summarise the pattern in the first of the preceding sample-based cross-tabulations, so that we can assess how much evidence there is that it is not a coincidence, i.e. something akin to a ‘face in a cloud’? (Setting aside the possibility of bias...)

• If we can draw the conclusion that there is too much evidence of a pattern or difference for it to be likely to be a coincidence, then we can (reasonably confidently) conclude that there is a pattern or difference in the population.

• In general, statistical inference operates via the construction of test statistics, which quantify the evidence that there is a difference or relationship in such a way that it can be assessed how likely an observed difference or relationship in a sample is to have occurred purely as a consequence of sampling error, rather than as a reflection of a difference or relationship in the population.
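One widely used test statistic for a cross-tabulation is Pearson's chi-square, which compares the observed counts with the counts that would be expected if the two variables were unrelated. The slide does not name a particular statistic, so its use here is an assumption for illustration; the counts are those of the class origin by class destination table.

```python
# Counts from the Class origin * Class destination cross-tabulation.
observed = [
    [730, 323, 189],    # origin: Service
    [857, 1140, 1108],  # origin: Intermediate
    [786, 1385, 2916],  # origin: Working
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if origin and destination were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: squared gaps between observed and expected counts,
# each scaled by the expected count, summed over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 1), degrees_of_freedom)
```

The larger the statistic, the further the sample is from what 'no relationship' would produce, and so the more evidence there is of a pattern.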


Hello to the p-value!

• For any test statistic, the crunch question is how likely it is that (at least) that much evidence of a difference or relationship would have been generated solely by sampling error. The probability of this is referred to as the p-value.

• The p-value is also often referred to as the significance value, with significance testing being the process of identifying whether the evidence provided by a test statistic is statistically significant, i.e. unlikely to have been generated solely by sampling error.

• Different forms of statistical analysis use a range of different test statistics, but the p-value always has the same meaning.

• By convention, a p-value below 0.05 (i.e. less than 5%, or 1 in 20) is regarded as small enough to infer that the pattern is not a coincidence.
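As a sketch of how a test statistic is turned into a p-value, the following assumes a chi-square statistic with 4 degrees of freedom (the case for a 3 by 3 cross-tabulation), for which the upper-tail probability has a simple closed form. The choice of statistic and the three example values are assumptions for illustration; other test statistics need other distributions.

```python
from math import exp


def chi_square_p_value_df4(statistic):
    """Upper-tail probability of a chi-square distribution with 4 degrees of freedom.

    For 4 degrees of freedom the tail area is exp(-x/2) * (1 + x/2).
    """
    half = statistic / 2
    return exp(-half) * (1 + half)


# Hypothetical test statistics, chosen for illustration only.
for statistic in (2.0, 9.488, 20.0):
    p = chi_square_p_value_df4(statistic)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"chi-square = {statistic}: p = {p:.4f} ({verdict} at the 0.05 convention)")
```

A statistic of about 9.488 sits almost exactly on the conventional 0.05 boundary for 4 degrees of freedom; smaller statistics give larger p-values, i.e. less evidence against coincidence.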


Extending the social mobility example: the value of multivariate analysis

• Could patterns of class mobility be explained via a third variable: (the role of) education?

• Might the impact of class of origin on class of destination have diminished over time? (i.e. changed with respect to a third variable)

• The latter possibility would involve an interaction effect, i.e. the impact of one variable varying according to the level of another variable.
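An interaction effect can be sketched with a small worked example. The counts below are hypothetical, invented for illustration, and are not taken from Goldthorpe et al.; the two cohorts stand in for 'time' as the third variable.

```python
# Hypothetical counts, invented for illustration only (not from Goldthorpe et al.).
# For each birth cohort and class of origin: (number reaching the service class, total).
cohorts = {
    "Earlier cohort": {"Service": (60, 100), "Working": (15, 100)},
    "Later cohort": {"Service": (55, 100), "Working": (30, 100)},
}


def origin_gap(cohort):
    """Percentage-point gap in reaching the service class, service vs working origin."""
    s_yes, s_n = cohort["Service"]
    w_yes, w_n = cohort["Working"]
    return 100 * s_yes / s_n - 100 * w_yes / w_n


gaps = {name: origin_gap(data) for name, data in cohorts.items()}
print(gaps)

# If the gap differs between cohorts, the effect of origin depends on cohort:
# that dependence is what an interaction effect means.
interaction = gaps["Earlier cohort"] - gaps["Later cohort"]
print(interaction)
```

In this invented example the origin gap narrows from the earlier to the later cohort, so the impact of class of origin varies with the level of the third variable.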