
Upload: alexis-newton

Post on 28-Mar-2015


My contact details • Colin Gray

• Room S2 (Thursday mornings, especially)

• E-mail address: psy045@abdn.ac.uk

• Telephone: (27) 2234

• A rapid response to any queries assured!


This afternoon’s programme

• 1:30 – 3:30pm Simple descriptive statistics.

• 3:30 – 4:00pm A break for coffee.

• 4:00 – 4:45pm Finding probabilities.


SESSION 1

Describing data


Kinds of data


Univariate, bivariate and multivariate data sets

• We can classify data according to the number of measured variables in the data set.

• If there is one measured variable, we have a UNIVARIATE data set.

• If there are two measured variables, we have a BIVARIATE data set.

• If there are three or more measured variables, we have a MULTIVARIATE data set.


Levels of measurement

• There are three levels of measurement:

1. Scale, interval or continuous.

2. Ordinal.

3. Nominal.


Scale data

• Measures on an independent scale with units. Heights, weights, performance scores and IQs are all scale data. So also are counts, such as the number of Hits. Each score has ‘stand-alone’ meaning.


Ordinal data

• Data in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed.

• Rarely would a researcher collect data in the form of ranks. But there are hidden issues here. Some would argue that ratings are really ordinal data (with ties) and should be treated as such in statistical analysis.


Nominal data

• Assignments to categories (so-many males, so-many females). Nominal data may be coded numerically, but the numbers are arbitrary LABELS, as when John receives a 1 for Sex, while Jane receives a 2.

• Nominal data are not really measurements at all.


Experimental versus correlational research

• In a true experiment such as a randomised clinical trial, the researcher manipulates one variable, the INDEPENDENT VARIABLE (IV), with a view to demonstrating that it has a causal effect upon the DEPENDENT VARIABLE (DV).

• The DV is measured during the course of the experiment.

• In correlational research, ALL variables are measured as they occur in the people studied.


Comparison

• Experimental research usually results in univariate data sets.

• The statistical analysis usually involves COMPARISON of scores obtained under the different experimental conditions.

• For example, performance under an active condition might be compared with performance under a control condition.


Association

• Correlational research results in bivariate or multivariate data sets.

• Here, the interest centres on the possible existence of statistical ASSOCIATIONS among the variables measured.

• If watching screened violence promotes actual violence, we should find that those who watch the most screened violence tend to be the most violent, those who watch the least tend to be the least violent, and so on.


Uses of statistics

1. We use statistics to SUMMARISE and DESCRIBE our data.

2. We use statistics to CONFIRM patterns in our data. One aspect of this process of confirmation is the making of statistical TESTS.


A simple two-group experiment

• The experimenter wants to show that ingestion of caffeine improves shooting accuracy, as measured by number of Hits.

• Participants are randomly assigned to one of the two conditions.

• All participants shoot at the same target.


Results of the Caffeine experiment


The raw data

• The table shows the RAW DATA, that is, the ORIGINAL SCORES achieved by the participants.

• From inspection, it seems that the Caffeine group tended to have higher scores.

• With larger data sets, however, it can be very difficult to see what’s going on merely from inspection.


Distribution

• The DISTRIBUTION of a variable is a table or diagram showing the relative FREQUENCIES, over the entire range, with which different values occur.

• A good first move in a statistical analysis is to draw a graph of the distribution.


Distributions of the Caffeine and Placebo data


Three important aspects of a distribution

1. Its LEVEL or CENTRAL TENDENCY.

2. The SPREAD or DISPERSION of scores around the centre.

3. The SHAPE of the distribution.


Different central tendencies

• The scores of the Caffeine group TEND to be higher than do the scores of the Placebo group. The two distributions differ in LEVEL or CENTRAL TENDENCY.

• There is, however, considerable overlap: some participants in the Placebo condition outperformed some of those in the Caffeine condition.


Individual differences

• In the Caffeine distribution, values are densest around 13; whereas in the Placebo distribution, values are densest around 9.

• But there is a huge RANGE in performance.

• The worst performer (who scored 2) was in the Caffeine group; the best (who scored 20) was in the Placebo group.


Central tendency: the “average”

• An average is a measure of level or central tendency, the “typical” value.

• It is clear from inspection of the figure that the average score of the Caffeine distribution should be higher than the average score of the Placebo distribution.

• There are several different measures of the “average” of a set of scores.


The mean

• The MEAN of a set of scores is the sum of their values divided by the number of scores.

• If X is a score and n is the number of scores, the mean M is:

$$M = \frac{\sum X}{n}$$


Example

• The mean of the scores 10, 1, 3, 4 and 2 is …
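As a quick check of the definition, the calculation can be sketched in Python. Only the five scores come from the slide; the variable names are illustrative.

```python
# Mean = sum of the scores divided by the number of scores.
scores = [10, 1, 3, 4, 2]
n = len(scores)
mean = sum(scores) / n
print(mean)  # 4.0
```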


The two group means


Deviation scores

• A deviation score d is a score from which the mean has been subtracted.

• Deviation scores have the very important property that they sum to zero.

• Therefore, their mean is also zero.


Centring

• Column X contains raw scores, centred on their mean value of 2.

• Place the deviation scores d in the next column. This operation is known as CENTRING and is common in regression analysis.

• The new values are now centred on zero, rather than the mean of the original values.
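Centring can be sketched in a few lines of Python. The scores below are hypothetical (the slide's table is not reproduced here); they were chosen only so that their mean is 2, as on the slide.

```python
# Hypothetical raw scores (not the slide's table), chosen to have a mean of 2.
X = [0, 1, 2, 3, 4]
M = sum(X) / len(X)

# Deviation scores: subtract the mean from every raw score.
d = [x - M for x in X]

print(M)       # 2.0
print(d)       # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(d))  # 0.0
```

The deviation scores sum to zero, so their mean is zero as well.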


The mean as the ‘centre of gravity’

• The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point.

• We can see (because this distribution is symmetrical) that the mean of this distribution is 3.


Outliers

• Often data sets contain scores that are atypical of the distribution as a whole.

• Such an atypical score is known as an OUTLIER.

• With small data sets, outliers can have marked effects upon the values of some statistics.

• Such statistics can become UNREPRESENTATIVE of the data as a whole.


An outlier (20 hits) exerts ‘leverage’ upon the value of the mean.


Other measures of ‘the average’

• There are other measures of the average or central tendency which are more ROBUST to the influence of outliers.

• Two such measures are the MEDIAN and the MODE.
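To see what robustness means in practice, here is a minimal sketch using Python's standard `statistics` module. The hit counts are hypothetical, not the data from the slides.

```python
from statistics import mean, median, mode

# Hypothetical hit counts, not the data from the slides.
scores = [8, 9, 9, 10, 11]
with_outlier = scores + [20]  # add one atypical score

for label, data in (("without outlier", scores), ("with outlier", with_outlier)):
    print(label, mean(data), median(data), mode(data))
```

With these made-up numbers, the single outlier pulls the mean up from 9.4 to about 11.2, while the median only moves from 9 to 9.5 and the mode stays at 9.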


The median

• The MEDIAN of a distribution is the MIDDLE number. It is the value below which 50% of the distribution lies.

• The medians of the scores in the Placebo and Caffeine groups are, respectively, 9 and 12.5.


Points about the median

• Notice that, for the Placebo group, the median does not have the value of any of the actual scores.

• With symmetrical distributions, the median and the mean have similar values.


The mode

• The MODE is the MOST FREQUENT value.

• For the Placebo and Caffeine groups, the values of the mode are 8 and 13, respectively.

• All three measures of central tendency or level, therefore, agree that the Caffeine group typically performed at a higher level than did the Placebo group.


Comparison of the three measures

• The mean is the basis of classical statistical theory, because it has many useful mathematical properties.

• The median is useful for exploring data sets, particularly in comparison with the mean. With an extremely asymmetrical distribution, the median is arguably a truer measure of level in the data as a whole.

• The mode is seldom used.


Properties of the mean

• We have seen that deviations about the mean sum to zero.

• The sum of the SQUARES of deviations about the mean is a MINIMUM, that is, it is smaller than the sum of squared deviations about any other value.


A property of the median

• The sum of ABSOLUTE deviations about the MEDIAN is also a minimum.

• But absolute values are less useful mathematically.
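Both minimum properties (squared deviations about the mean, absolute deviations about the median) can be checked numerically. The sketch below uses hypothetical scores and a simple grid search over candidate centres; the function names are illustrative only.

```python
# Hypothetical scores, used only to illustrate the two minimum properties.
scores = [1, 2, 3, 4, 10]
mean = sum(scores) / len(scores)            # 4.0
median = sorted(scores)[len(scores) // 2]   # 3 (odd number of scores)

def sum_sq(c):
    """Sum of squared deviations about the value c."""
    return sum((x - c) ** 2 for x in scores)

def sum_abs(c):
    """Sum of absolute deviations about the value c."""
    return sum(abs(x - c) for x in scores)

# Try a grid of candidate centres from 0 to 10 in steps of 0.5.
candidates = [i / 2 for i in range(21)]
best_sq = min(candidates, key=sum_sq)
best_abs = min(candidates, key=sum_abs)
print(best_sq, best_abs)  # 4.0 3.0
```

The squared-deviation criterion picks out the mean (4), the absolute-deviation criterion picks out the median (3).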


A second scenario

• The scores of both groups cluster around the same value: 12. Since the distributions are completely symmetrical, the mean of either is clearly 12.

• In the Caffeine distribution, however, the scores are more widely SPREAD OUT or DISPERSED than those of the Placebo group.


The simple range

• The SIMPLE RANGE is the highest score minus the lowest score.

• So, for the Placebo group in Scenario 2, the simple range is (15 – 9) = 6 score units.

• For the Caffeine group, the simple range is (18 – 6) = 12 score units.

• On this measure of dispersion, therefore, the Caffeine distribution shows twice as much spread or dispersion of scores around the mean.


A problem with the simple range

• The simple range statistic only uses TWO scores out of the whole distribution.

• Should those particular scores be highly atypical of the distribution, the range may not reflect the true spread of scores about the mean of the distribution. The data from the original scenario (left) exemplify this situation.


Other range statistics

• Nevertheless, the simple range can be a very useful statistic when you are EXPLORING a data set.

• Also available are more complex RANGE STATISTICS (the interquartile range, the semi-interquartile range) which use more of the information in a data set than does the simple range.


The variance and the standard deviation (SD)

• The VARIANCE (s²) and the STANDARD DEVIATION (s or SD) are also measures of dispersion.

• Both statistics use the values of ALL the scores in the distribution.


Deviation scores again

• The DEVIATION SCORE is the building block from which the variance and SD are calculated.

• Could the mean deviation serve as a measure of spread?

• No, because deviations about the mean sum to zero. So the mean deviation is also zero, whatever the spread of your data.


Squared deviations

• The sum of the SQUARED deviations is always either positive (when scores have different values) or zero (if all the scores have the same value).

• If there is any variability in the scores at all, the sum of the squared deviations will have a positive value.


Formula for the variance

• The Greek letter sigma (Σ) is used to indicate that you are to obtain the deviation of each score from the mean, square it, then add up all the squared deviations.

• The sample variance s² is close to being the MEAN SQUARED DEVIATION.

• The value 1 is subtracted from n in order to improve the sample variance as an estimate of the spread of values in the population.
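Written out from the description above, with X a score, M the mean and n the number of scores (the same symbols used for the mean), the sample variance would be:

$$s^2 = \frac{\sum (X - M)^2}{n - 1}$$

The numerator is the sum of squared deviations; dividing by n − 1 rather than n is what makes the result only "close to" the mean squared deviation.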


Applying the formula


Variance of the Caffeine scores in Scenario 1


Adding a constant

• Adding a constant of ten to every score in the Caffeine group simply shifts the whole distribution ten units to the right.

• So the new mean will be the old one plus ten: new mean = 11.90 + 10 = 21.90

• The SPREAD of the scores, however, will be unaltered, so the variance and the SD will have the same values as before.


Multiplying by a constant

• Multiplying each score by a constant of ten not only increases the mean by a factor of ten, but also increases the SPREAD of the scores about the new mean.

• The new mean will be ten times the old one.

• The new variance will be ten SQUARED, that is one hundred, times the old variance.

• The new SD will be ten times the old one.
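The effects of adding and of multiplying by a constant of ten can be verified with Python's `statistics` module. The scores here are hypothetical, not the Caffeine data from the slides.

```python
from statistics import mean, variance, stdev

# Hypothetical scores, not the Caffeine data from the slides.
scores = [9, 11, 12, 13, 15]
shifted = [x + 10 for x in scores]  # add a constant of ten
scaled = [x * 10 for x in scores]   # multiply by a constant of ten

for label, data in (("original", scores), ("plus 10", shifted), ("times 10", scaled)):
    # variance() and stdev() use the sample formulas (n - 1 denominator).
    print(label, mean(data), variance(data), stdev(data))
```

For these made-up scores the mean is 12 and the variance is 5. Adding ten gives a mean of 22 with the variance still 5; multiplying by ten gives a mean of 120 and a variance of 500, so the SD is ten times larger.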


Adding and multiplying scores by a constant of ten


Examples


Effect of centring

• When you centre scores by subtracting the mean, the mean becomes zero.

• The variance, however, remains unaltered.


Interpreting the variance

• The simple range statistic has the merit of being in the same units as the raw data.

• The variance, since it is based on the squares of the deviations, is in SQUARED UNITS and is therefore difficult to interpret.

• If you take the (positive) square root of the variance, you have the STANDARD DEVIATION, which is in the original units of measurement.


The standard deviation is the positive square root of the variance

• We found that the variance of the scores of the Caffeine group was 10.73.

• To obtain the standard deviation, we take the square root of 10.73, which is 3.28.

• The square root operation restores the measure of spread to the original measurement units: we can say that the standard deviation is 3.28 hits.


Tables of results

• As well as means, always include the standard deviations.


Vulnerability of variance and SD to outliers

• We have seen that the mean is vulnerable to the leverage exerted by outliers.

• This is true, a fortiori, of the variance, because it is based on the sum of the SQUARES of deviations from the mean.

• The leverage effect is NOT removed by taking the square root of the variance to obtain the standard deviation.


Standard or z scores

• A standard or z score is a special kind of deviation score which expresses a value as so-many standard deviations above or below the mean (0):

$$z = \frac{X - M}{s}$$


Mean and SD of z scores

• Their mean is always zero (because they are deviation scores).

• Their variance and standard deviation are 1.
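These two properties are easy to confirm numerically. The sketch below standardises a set of hypothetical scores (not taken from the slides) using the sample SD.

```python
from statistics import mean, stdev

# Hypothetical scores, not taken from the slides.
scores = [9, 11, 12, 13, 15]
M = mean(scores)
s = stdev(scores)  # sample SD, with n - 1 in the denominator

# Standard scores: deviation from the mean, in SD units.
z = [(x - M) / s for x in scores]

print([round(v, 2) for v in z])
print(round(mean(z), 10), round(stdev(z), 10))
```

Apart from floating-point rounding, the z scores have a mean of 0 and an SD (and hence a variance) of 1.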


Advantage of z scores

• Scores in different units (heights and weights) cannot be directly compared.

• But when someone’s weight has a z score of –1 (one SD below the mean (0)) and their height has a z score of +2 (two SDs above the mean), we can say that this person is tall and thin.

• If we can make additional assumptions about the distribution, knowledge of z scores is even more informative.


Distribution shape

• We have measured the AVERAGE and the SPREAD of the Caffeine and Placebo distributions.

• We noted that both distributions were (at least approximately) SYMMETRICAL.

• There are circumstances in which that would not be the case.


A disappointing result

• The mean for the Caffeine group is only very slightly greater than the Placebo mean.

• But note that both means are near the top of the scale (20).

• And notice how small the SDs are.


Ceiling effect

• The scores of both groups are bunched around the top of the scale.

• Any possible effect of caffeine intake has been masked by a CEILING EFFECT.

• The task chosen was TOO EASY for the participants.

• No conclusions about the effects of ingestion of caffeine can be drawn from these data.


Another disappointing result

• Again the Caffeine mean is only slightly greater than the Placebo mean.

• But both means are near the bottom of the scale (zero).

• Once again, note the small SDs.


Floor effect

• The scores of both groups are bunched around the bottom of the scale.

• The task was too difficult.

• No conclusions about the effects of ingestion of caffeine can be drawn from these data either.


Skewness

• In both Scenarios 3 and 4, the distributions are asymmetric or SKEWED.

• When a distribution has a tail to the left, it is said to be NEGATIVELY SKEWED; when it has a tail to the right, it is POSITIVELY SKEWED.

• When there is a ceiling effect, the distributions are negatively skewed; when there is a floor effect, they are positively skewed.
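The sign of the skew can be illustrated with a simple skewness statistic. The slides do not give a formula, so the mean cubed z score used below is an assumption (one common definition among several), and the scores are hypothetical.

```python
from statistics import mean, pstdev

def skewness(data):
    """Mean cubed z score (one common, simple definition of skewness)."""
    m, s = mean(data), pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

# Hypothetical scores on a 0-20 scale, not the slides' data.
ceiling = [20, 20, 19, 19, 19, 18, 18, 17, 15, 12]  # bunched near the top
floor = [20 - x for x in ceiling]                    # mirror image, near zero

print(skewness(ceiling) < 0)  # True: tail to the left
print(skewness(floor) > 0)    # True: tail to the right
```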


Screen violence and actual violence

• Does screened violence promote actual violence?

• Ethical and practical considerations may rule out direct manipulation of the amount of violent material that children watch.

• It may be more feasible to measure children on the amount of screen violence they watch and upon their actual violence.


Correlation

• A statistical ASSOCIATION or CORRELATION is a tendency for events or values to occur together.

• If exposure to screen violence promotes actual violence, we should expect those who watch more violence to be more violent and those who watch less violence to be less violent.

• Such a POSITIVE ASSOCIATION would be at least consistent with the hypothesis.


A scatterplot

• Here is a picture of the results of our study.

• In this SCATTERPLOT, each point represents one of the children.

• Richard got a score of 2 on Exposure and 4 on Actual.

• John got 9 on Exposure and 8 on Actual.

• Jim got scores of 5 on both Exposure and Actual.



A strong positive correlation

• When the shape of a scatterplot is a narrow ellipse like this, a strong correlation is indicated.

• The results of the study are consistent with the hypothesis.


A negative correlation?

• Does the number of complaints made against GPs vary inversely with the average length of their appointments?

• The following scatterplot supports this hypothesis.


A strong negative correlation


Scatterplot indicating no association

• When the cloud of points is circular, there is NO ASSOCIATION between the variables.


Linear functions

• Y is a LINEAR FUNCTION of X if the graph of Y upon X is a straight line.

• For example, temperature in degrees Fahrenheit is a linear function of temperature in degrees Celsius.
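The Fahrenheit example can be written as a straight-line function with a slope of 9/5 and an intercept of 32. The function name below is illustrative.

```python
def celsius_to_fahrenheit(c):
    """Linear function of c: slope 9/5, intercept 32."""
    return 9 / 5 * c + 32

# Equal steps in Celsius give equal steps in Fahrenheit: a straight line.
for c in (0, 50, 100):
    print(c, celsius_to_fahrenheit(c))
```

Each 50-degree step in Celsius adds the same 90 degrees in Fahrenheit (32, 122, 212), which is what makes the graph a straight line.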


The Pearson correlation

• The PEARSON CORRELATION (r) is designed to measure the strength of a supposed linear relationship between two variables.

• A correlation can only take values within the range from –1 to +1, inclusive.

• The closer the value of a correlation to unity (forgetting the sign), the STRONGER the linear association.


Formula for the Pearson correlation

• There are several equivalent formulae. Here is the simplest.

• Transform X and Y to standard scores z.

• Divide the sum of the products of the pairs of standard scores by (n – 1).
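The two steps above can be sketched directly in Python. The function name is illustrative, and the data are largely hypothetical: the first three pairs echo the scores quoted for Richard, Jim and John, but the remaining pairs are invented, so the result is not the value obtained from the full violence data.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sum of products of paired z scores, divided by (n - 1)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample SDs (n - 1 denominator)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Partly hypothetical Exposure and Actual scores (see the note above).
exposure = [2, 5, 9, 4, 7]
actual = [4, 5, 8, 3, 7]

r = pearson_r(exposure, actual)
print(round(r, 3))
```

Using the sample SD (n − 1 denominator) for the z scores is what makes dividing by n − 1 give a value between –1 and +1.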


The calculation of r for the violence data

• The value of r (.892) is high and positive, consistent with the appearance of the scatterplot.


Centring again

• What is the effect upon the value of r when the variables involved are centred?

• There is no effect.

• In fact, no linear transformation of either variable (or both variables) will change the ABSOLUTE value of r.

• Suppose you measure the heights and weights of 100 people in inches and pounds and find that the correlation is +.6. If you convert the heights and weights to centimetres and grams, respectively, the correlation is still +.6.

• Merely subtracting their mean from the values of each variable leaves the correlation unchanged.


Reversing the slope

• If you multiply all the scores on one variable by –1, you will change the slope of the scatterplot; but the absolute value of r will remain the same.
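The claims about centring, changes of units and multiplying by –1 can all be checked in one short sketch. The heights and weights are hypothetical, and `pearson_r` is an illustrative helper that computes r from deviation scores.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical heights (inches) and weights (pounds).
heights_in = [62, 65, 68, 70, 74]
weights_lb = [120, 150, 140, 180, 175]

r = pearson_r(heights_in, weights_lb)
# Change of units: inches to centimetres, pounds to grams.
r_metric = pearson_r([h * 2.54 for h in heights_in],
                     [w * 453.592 for w in weights_lb])
# Centring one variable on its mean.
r_centred = pearson_r([h - mean(heights_in) for h in heights_in], weights_lb)
# Multiplying one variable by -1 reverses the slope.
r_flipped = pearson_r([-h for h in heights_in], weights_lb)

print(round(r, 6), round(r_metric, 6), round(r_centred, 6), round(r_flipped, 6))
```

The first three values agree; the last has the same absolute value with the sign reversed.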


Centring in regression

• We have seen that centring does not change the variance of a variable in the data set.

• Nor does centring change the correlations among the variables.

• Centring is used in several multivariate procedures in order to help the algorithm to find a unique solution.

Question

• We have been told of a bivariate data set, from which the calculated Pearson correlation is ZERO: r = 0.

• From this information alone, can we conclude that the two variables are independent, that is, there is no association between them?

• The answer is NO!

The scatterplot

• There is a perfect, but nonlinear association between the two variables.

• Yet the Pearson correlation is zero.
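A small made-up example shows how this can happen. The data below are an assumption for illustration (the slide's own data are not reproduced here): Y is exactly the square of X, with X symmetric about zero.

```python
import math

def pearson_r(x, y):
    """Pearson r, in a form algebraically equivalent to the standard-score formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # Y is completely determined by X, but not linearly

r = pearson_r(x, y)
print(r)  # zero: the positive and negative cross-products cancel exactly
```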

Anscombe’s data set

• Many years ago, Francis Anscombe (The American Statistician, 1973) published a famous paper warning readers of the pitfalls awaiting the unwary user of information about correlations.

• There were four bivariate data sets, all of which produced a Pearson correlation with a value of +.82.

An elliptical scatterplot

• This is fine.

• The elliptical scatterplot indicates that there is indeed a basically linear relationship between variable Y1 and variable X1.

A non-linear relationship

• There is actually a perfect association between variable Y2 and variable X1.

• This relationship, however, is non-linear and is understated by the value of r.

An understatement by r

• There is a substantial correlation.

• The scatterplot, however, is not elliptical.

• Basically there is a perfect linear relationship between Y3 and X1.

• The outlier (a typo?) has depressed the value of r.

Anscombe’s rule

• When you examine a scatterplot (something you should ALWAYS do when interpreting a correlation), ask yourself the following question:

“Would the removal of one or two points at random affect the basically elliptical shape of the scatterplot? If the shape would remain essentially the same, the value of r accurately reflects the association between the variables”.

In summary …

• The Pearson correlation r is a measure of the strength of a supposed LINEAR relationship between 2 variables.

• It is one of the most widely used of statistical measures; but it is also one of the most misused.

• Wherever possible, a value of r should be interpreted in the context of the scatterplot.

Have we really gathered evidence for the hypothesis that viewing screened violence increases actual violence?

A famous dictum

CORRELATION does not imply CAUSATION

A causal model

• The scientific hypothesis implies this CAUSAL MODEL.

• The results are CONSISTENT with the hypothesis.

Another causal model

• The child’s violent tendencies and appetite for violence lead to his watching violent programmes as often as possible.

• This model is also consistent with the data.

Yet another causal model

• NEITHER variable causes the other.

• Both are determined by the behaviour of the child’s parents.

Direction of causality

• Returning to the caffeine experiment, it would be ridiculous to suggest that shooting accuracy determines the group to which one is assigned.

• In the violence study, however, which was of CORRELATIONAL, rather than EXPERIMENTAL design, the direction of causation is uncertain.

• Indeed, at least three possible MODELS OF CAUSATION are consistent with the results.

A background variable

• Perhaps neither Exposure nor Actual violence causes the other.

• Perhaps they are caused by a background parental behaviour variable.

• We have data on such a variable.

• The background variable correlates highly with both Exposure and Actual violence.

Partial correlation

A PARTIAL CORRELATION is what remains of a Pearson correlation between two variables when the influence of a third variable has been removed, or PARTIALLED OUT.
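One common way to compute a partial correlation uses the three pairwise Pearson correlations. The sketch below applies that standard first-order formula to simulated data in which a made-up background variable drives both of the other variables. The variable names, the simulated numbers and the helper functions are all assumptions for illustration; they are not the violence data.

```python
import math
import random

def pearson_r(x, y):
    """Pearson r, in a form algebraically equivalent to the standard-score formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z partialled out (first-order formula)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(2)  # arbitrary seed
n = 5000

# Simulated 'third party' model: the background variable causes both others.
background = [random.gauss(0, 1) for _ in range(n)]
exposure = [b + random.gauss(0, 0.5) for b in background]
violence = [b + random.gauss(0, 0.5) for b in background]

r_ev = pearson_r(exposure, violence)
r_eb = pearson_r(exposure, background)
r_vb = pearson_r(violence, background)
r_partial = partial_r(r_ev, r_eb, r_vb)

print(round(r_ev, 2), round(r_partial, 2))
```

In this simulation the ordinary correlation between the two variables is high, but it all but disappears once the background variable is partialled out.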

The partial correlation

• The partial correlation fails to reach significance.

• Now that we have taken the background variable into consideration, we see that there is no significant correlation between Exposure and Actual violence.

• It appears that, of the three possible causal models, the ‘third party’ model gives the most convincing account of these data.

Coffee break

Histograms

• A HISTOGRAM is useful for displaying the distribution of a large data set.

• Here is a histogram of the heights of 1000 men.

Heights of 1000 men

Features of a histogram

• The entire range of variation (shown on the x-axis) is divided into CLASS INTERVALS.

• The heights of the bars are proportional to the FREQUENCIES of values (y-axis) falling within the class intervals represented by the bases of the bars.

• The bars touch each other, indicating the CONTINUOUS variation of the variable.

A normal distribution

Salaries in the US

• Many variables have asymmetrical distributions.

Skewness = 2.13

Measuring skewness

• Asymmetry or skewness is measured with a statistic which I shall call simply ‘Skewness’.

• (Skewness is a complex measure, involving the cube of the deviations of the scores about their mean.)

• PASW will calculate the value of Skewness for any distribution.

• If the value of Skewness is positive, the distribution is positively skewed; a negative value indicates negative skewness.
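To make the idea of cubed deviations concrete, here is a minimal sketch of the simplest moment-based skewness measure. It is an assumption that this form is close enough for illustration: statistical packages typically report a version adjusted for sample size, so their values may differ slightly. The three samples are simulated, not the distributions shown in the slides.

```python
import random
import statistics

def skewness(values):
    """Mean cubed deviation from the mean, divided by the cubed SD."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # mean squared deviation
    m3 = sum((v - m) ** 3 for v in values) / n  # mean cubed deviation
    return m3 / m2 ** 1.5

random.seed(3)  # arbitrary seed
symmetric = [random.gauss(0, 1) for _ in range(10_000)]
right_skewed = [random.expovariate(1.0) for _ in range(10_000)]  # long right tail
left_skewed = [-v for v in right_skewed]                         # mirror image

for name, data in [("symmetric", symmetric),
                   ("positively skewed", right_skewed),
                   ("negatively skewed", left_skewed)]:
    print(name, round(skewness(data), 2))
```

Cubing keeps the sign of each deviation, so a long right tail gives a positive value and a long left tail a negative one.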

Skewness of three distributions

Relative frequency as an area

• The area of a bar is the proportion of values within the range of its base.

• The green area is the proportion of heights between 70 inches and 75 inches.

Proportion between 65” and 75”

Proportion of heights either below 65” or above 75”.

Unity

• All values lie within the total range.

• The area of the green bars is 100% or unity.

Populations and samples

• We have some scores on shooting accuracy from the caffeine trial.

• The POPULATION of such scores is the reference set, that is, the infinite set of all possible scores.

• Our data are merely a subset or SAMPLE from the population.

Theoretical populations or distributions

• In these talks, the term “population” always refers to a theoretical distribution.

• For example the 1000 men’s heights are a sample from a theoretical NORMAL population whose mean is 69” and whose standard deviation is 2.59”.

• This NORMAL distribution is symmetrical and bell-shaped.

Statistics versus parameters

• STATISTICS are characteristics of SAMPLES.

• PARAMETERS are characteristics of populations.

Notational convention

• Roman letters denote statistics such as our sample means and SDs.

• Greek letters denote the corresponding population characteristics or parameters.

Two parameters

• There is an infinitely large family of normal distributions.

• To specify a normal distribution you must assign values to TWO parameters:

1. The mean

2. The standard deviation

The height population

Probability

Probability

• The PROBABILITY of an event is a measure of its likelihood, which can take values from zero (an impossible event) to unity (a certainty).

• There have been several definitions of probability.

• All of them raise serious philosophical questions.

An ‘event’

• An EVENT is the outcome of an experiment of chance, such as rolling a die, tossing a coin – or running a psychological experiment.

• Chance is an important factor in the outcome of an experiment.

• Joe, Fred and Mary participated this time; but Anne, Jim and Fiona could easily have done so – and their scores would certainly have been different.

Classical ‘probability’

• “The first impetus came from a situation in which the dissolute nobility of France were competing in a race to ruin at the gaming tables” (Hogben, 1967; p.551).

• In 1654, Pascal and Fermat analysed the gambling strategies of one particular nobleman.

• Their approach was to determine the number of ways an outcome (such as a particular hand in cards) could occur in comparison with the total number of possibilities.

Classical definition of a probability

• The probability of an event is the NUMBER OF WAYS in which the event can occur, divided by the TOTAL NUMBER OF OUTCOMES.

• Roll a die.

• What is the probability of a six?

• There is ONE way of getting a six. There are SIX possible outcomes.

• So the probability of a six is 1/6.

More examples

• Roll a die. What is the probability of an even number?

• That could happen in three ways: 2 spots, 4 spots or 6 spots.

• So the probability is 3/6 = ½.

• What is the probability of a seven? There is NO WAY in which that could happen, so the probability is 0/6 = 0 (indicating an IMPOSSIBILITY).

• A number between 1 and 6, inclusive? That event could happen in six ways, so the probability is 6/6 = 1 (indicating a CERTAINTY).
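The die examples above can be checked by simple counting. In this sketch the function name `classical_probability` is a hypothetical helper, and exact fractions are used so that the answers appear as 1/6, 1/2, 0 and 1.

```python
from fractions import Fraction

def classical_probability(event, outcomes):
    """Number of outcomes in which the event occurs, over the total number of outcomes."""
    outcomes = list(outcomes)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

die = range(1, 7)  # the six faces, assumed equally likely

p_six = classical_probability(lambda face: face == 6, die)
p_even = classical_probability(lambda face: face % 2 == 0, die)
p_seven = classical_probability(lambda face: face == 7, die)
p_any = classical_probability(lambda face: 1 <= face <= 6, die)

print(p_six, p_even, p_seven, p_any)
```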

A formula for classical probability

• If an experiment of chance has N possible outcomes and an event E can occur in n ways, then the probability of E is p(E) = n/N.

A problem with the classical definition

• The classical definition is circular.

• The outcomes that are counted must be assumed to be “equally likely”, which (by implication) presses the term ‘probability’ into service for its own definition.

The empirical definition of a probability

• The empirical definition of a probability is implicit in the notion of a FAIR coin.

• A fair coin is one that, IN THE LONG RUN, shows heads half the time.

• This “convergence”, which is a special case of what I shall call simplistically “The law of large numbers”, is an empirical fact.

• It cannot, however, be proved “analytically”, that is, by mathematical deduction.

Interpretation of a probability

• If a coin is ‘fair’, the probability of a head is ½.

• This does not mean that if I toss the coin 100 times, I shall get 50 heads.

• Nor does it mean that if I toss the coin a million times, I shall get close to half a million heads.

• But with a million tosses, the proportion of heads will be closer to ½ than it would be if I were to toss the coin 10 times, 100 times or 1000 times.

• A probability is a PROPORTION to which we can get as close as desired by taking a sample of sufficient size.
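A simulated coin gives a feel for this. The sketch below is an illustration only: the ‘coin’ is Python's pseudo-random number generator with an arbitrary seed, so the proportions for the small runs will differ from one seed to another.

```python
import random

random.seed(4)  # arbitrary seed

def proportion_of_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

results = {n: proportion_of_heads(n) for n in (10, 100, 1000, 1_000_000)}

for n, proportion in results.items():
    print(f"{n:>9} tosses: proportion of heads = {proportion:.4f}")
```

The short runs can stray some way from ½; the run of a million tosses should not.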

Health events

• A HEALTH EVENT is an uncertain occurrence, such as acute appendicitis, admission to a dental clinic - or death.

• ADVERSE events are those occurring after admission to hospital.

• The likelihood of such events occurring is quantified as proportions obtainable from the records over a period of time.

• These proportions are thus EMPIRICAL PROBABILITIES.

The laws of large numbers

• You can make a sample resemble the population as closely as you like by making it sufficiently large.

• So small samples from the same population can show considerable variation; whereas very large samples show little variation.

Example

• I draw five samples of size ten from a normal population with mean zero and standard deviation 1. (The STANDARD normal distribution.)

• I then draw five samples of size one million from the same population.
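The same exercise can be sketched in code by comparing the sample means rather than the histograms. Two assumptions are made here: the large samples have 100,000 values each (a smaller stand-in for the one million mentioned above, chosen only to keep the run short), and the seed is arbitrary.

```python
import random
import statistics

random.seed(5)  # arbitrary seed

def sample_means(n_samples, size):
    """Means of n_samples independent samples from the standard normal distribution."""
    return [statistics.fmean(random.gauss(0, 1) for _ in range(size))
            for _ in range(n_samples)]

small = sample_means(5, 10)
large = sample_means(5, 100_000)

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)

print("means of samples of size 10:     ", [round(m, 3) for m in small])
print("means of samples of size 100,000:", [round(m, 3) for m in large])
```

The means of the small samples scatter widely around zero; those of the large samples are almost indistinguishable from the population mean.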

Size ten versus size one million

Large samples and populations

• With the lower histograms, you are looking at the population, rather than at samples.

• … relative frequencies become PROBABILITIES.

• Visualise the probability of a value within a specified interval as the area under the curve of the theoretical distribution between the limits of the interval.

Relative frequency becomes probability

Probability distribution

• When we take a measurement such as a person’s height, we assume we have performed an experiment of chance.

• We have sampled from a theoretical population.

• Since areas under the curve represent probabilities, theoretical distributions are known as PROBABILITY DISTRIBUTIONS.

Random variable or variate

• A RANDOM VARIABLE or VARIATE is a variable that takes values in an unpredictable way.

• The values of a random variable make up a theoretical distribution or population, i.e., a probability distribution.

• Let X be a value selected at random from a normal population with mean 69 and standard deviation 2.59. The variable X is a normal random variable or normal VARIATE.

Cumulative probability

• The cumulative probability of a value from a distribution is the probability of a value less than or equal to that value.

• The cumulative probability of 75 is .99; the cumulative probability of 70 is .65 .

Cumulative probability of 75”

Cumulative probability of 70”

Probability of a height in the range from 70 to 75 inches

• Just subtract the cumulative probability of 70 from the cumulative probability of 75.
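The subtraction can be sketched with Python's standard library, assuming the height population described earlier (normal, mean 69 inches, standard deviation 2.59 inches). The variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=2.59)  # theoretical height population (inches)

p_below_75 = heights.cdf(75)  # cumulative probability of 75 inches
p_below_70 = heights.cdf(70)  # cumulative probability of 70 inches
p_between = p_below_75 - p_below_70

print(round(p_below_75, 2), round(p_below_70, 2), round(p_between, 2))
# prints 0.99 0.65 0.34
```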

Percentiles

• A PERCENTILE is the value below which a specified proportion of the distribution lies.

• The 90th percentile is the value below which 90% of values lie.

• The 10th percentile is the value below which 10% of values lie.

• The 50th percentile (the MEDIAN) is the value below which 50% of values lie.

The 30th and 70th percentiles

• The green areas are the cumulative probabilities of the 30th and 70th percentile values.

The median is the 50th percentile

• The cumulative probability of the median or middle value is .50.

95% of the distribution

• 95% of ANY distribution lies between the 2.5th percentile and the 97.5th percentile.

• BELOW the 2.5th percentile lie .025 (2.5%) of the scores.

• ABOVE the 97.5th percentile lie .025 (2.5%) of the scores.

• Outside those limits lie .025+.025 = .05 (5%) of the scores.

95% of ANY continuous distribution lies between the 2.5th and 97.5th percentiles

Normal distribution

• A NORMAL DISTRIBUTION is symmetrical and bell-shaped.

• If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.

The 95th percentile

• NINETY-FIVE per cent of values lie BELOW 1.64 standard deviations above the mean.

• (Because of the symmetry of the normal distribution, we can also say that 95% of values lie ABOVE the value that is 1.64 standard deviations BELOW the mean, i.e., mean – 1.64×SD.)

• These statements apply only to the normal distribution.
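The multipliers 1.96 and 1.64 are percentiles of the standard normal distribution, and can be recovered from the inverse of the cumulative probability. In this sketch the height population (mean 69, SD 2.59) is reused as an example, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

z_975 = z.inv_cdf(0.975)  # 97.5th percentile, about 1.96
z_95 = z.inv_cdf(0.95)    # 95th percentile, about 1.64

# The same percentiles for the height population (inches).
heights = NormalDist(mu=69, sigma=2.59)
lower = heights.inv_cdf(0.025)  # 2.5th percentile
upper = heights.inv_cdf(0.975)  # 97.5th percentile
central = heights.cdf(upper) - heights.cdf(lower)

print(round(z_975, 2), round(z_95, 2))
print(round(lower, 1), round(upper, 1), round(central, 2))
```

The central 95% of heights lies between mean – 1.96×SD and mean + 1.96×SD.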

The 95th percentile of a normal distribution

The standard normal variable z

• Let X be a normal variable with mean μ and SD σ.

• Let z be defined by the formula z = (X – μ)/σ.

• z is also normally distributed, and is known as the STANDARD NORMAL VARIABLE.

Mean and standard deviation of the standard normal distribution

• We have seen that the effect of standardising scores is to centre the distribution on zero and produce a variance and standard deviation of 1.

• Thus the standard normal distribution has a mean of zero and an SD of 1.

Standard normal curve

Any normal distribution can be transformed to the standard normal distribution by subtracting the mean from each value and dividing the difference by the standard deviation.

The standard normal distribution

Questions about probability

• Questions about the probabilities of ranges of values of a normally distributed random variable can always be rephrased in terms of the standard normal distribution.

• Just convert the raw values to z scores by subtracting the mean and dividing by the standard deviation.

A question about IQ

• The IQ measure has an approximately normal distribution, with a mean of 100 and a standard deviation of 15.

• If 1000 people are drawn at random from the population, how many of them can we expect to have IQs greater than 130?

Solution

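• Transform 130 to z: z = (130 – 100)/15 = 2.

• Since z = 2 is close to 1.96, a proportion of about .025 lies above it (the exact tail area is .023). So we can expect roughly 23 to 25 people in a thousand to have IQs of 130 or more.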
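The exact tail area can be sketched with the standard library. The figure of .025 comes from treating z = 2 as if it were 1.96; the exact value for z = 2 is a little smaller. The variable names are illustrative.

```python
from statistics import NormalDist

mean_iq, sd_iq = 100, 15

z = (130 - mean_iq) / sd_iq        # standard score of an IQ of 130
p_above = 1 - NormalDist().cdf(z)  # area under the standard normal curve above z
expected_in_1000 = 1000 * p_above

print(z, round(p_above, 4), round(expected_in_1000, 1))
```

This gives an expected count of about 23 people in a thousand.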

Taking samples

• Suppose I take 16 people’s IQs and calculate the mean. It might be 95.1 . I take another 16 people and find that their mean is 102.6 .

• I draw a total of 4000 samples, calculating the value of the mean each time.

• The means will vary considerably, but not so much as the original distribution of IQs.
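This sampling exercise can be imitated by simulation. The sketch assumes that IQ is exactly normal with mean 100 and SD 15, and uses an arbitrary seed, so the individual means will not match the 95.1 and 102.6 quoted above.

```python
import random
import statistics

random.seed(6)  # arbitrary seed

def mean_of_sample(size):
    """Mean IQ of one random sample of the given size."""
    return statistics.fmean(random.gauss(100, 15) for _ in range(size))

means = [mean_of_sample(16) for _ in range(4000)]

grand_mean = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

print(round(grand_mean, 1), round(sd_of_means, 2))
```

The 4000 means centre on 100, but their standard deviation is only about a quarter of the SD of the IQs themselves.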

The mean is a random variable

• A random variable X is one whose values are not predictable. One can only assign probabilities to ranges of its values.

• A statistic such as the mean M, since its value depends upon the values of X selected for the sample, is also a random variable or variate.

• The variate M has a distribution of its own.

Sampling distribution

• The probability distribution of a STATISTIC (such as the mean or the variance) is known as its SAMPLING DISTRIBUTION.

• If X is normally distributed, then so is M.

• If we can specify the sampling distribution of M by giving a value to its SD, we can assign probabilities to ranges of values for M.

The IQ distribution

Drawing to scale

• If I request a histogram of the sampling distribution of the mean, it will look similar to the histogram of IQ.

• But if I ask for BACK-TO-BACK HISTOGRAMS, we can compare the two distributions drawn to the same scale.

• In the following figure, the distribution on the right is the sampling distribution of the mean.

Back-to-back histograms

Shape of the sampling distribution

• It’s narrower than the original distribution.

• The standard deviation has been much reduced.

• The areas of both distributions are the same (unity, 100%, or a probability of one).

• But values of the mean are particularly thick on the ground in the region of the population mean value of the IQ, that is 100.

Sampling distribution of the mean

Standard error of the mean

• The STANDARD ERROR of a statistic is the standard deviation of its SAMPLING or PROBABILITY distribution.

• It is called the standard “error” because, if a sample value were to be used as an estimate of the corresponding parameter (the population mean), the estimate would be, to at least some degree, wide of the mark.

Standard error of the mean

• If we draw samples of size n from a normal distribution with mean μ and standard deviation σ, the standard error of the mean σM is given by σM = σ/√n.

Standard error of the mean

σM = σ/√n = 15/√16 = 3.75

Sample size

• As the sample size n increases, the denominator of the formula increases and the standard error of the mean is reduced.

• The distribution becomes taller and narrower.

• The effect of increasing the size of the sample is to reduce the dispersion or variance of the sampling distribution of the mean.
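The effect of sample size can be seen by evaluating the formula for the IQ example (σ = 15) at two sample sizes. The function name `standard_error` is a hypothetical helper.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of n."""
    return sigma / math.sqrt(n)

se_16 = standard_error(15, 16)
se_64 = standard_error(15, 64)

print(se_16, se_64)  # prints 3.75 1.875
```

Quadrupling the sample size halves the standard error, because n enters the formula through its square root.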


Effect of increasing the sample size n

[Figure: the IQ distribution and the sampling distributions of the mean for n = 16 and n = 64, all centred on μ.]


Referring to z

• A question about a range of values of ANY normally distributed variable can always be translated into a question about a range of values of the standard normal variable z.

• Just subtract the mean and divide by the standard deviation.

• BUT if your question is about a range of values for the MEAN, you must divide by the STANDARD ERROR, not the original population SD.


Question

• If I select 9 IQs at random and take their mean M, what is the probability that M is at least 110?


Convert values to z

• This question is about a mean, so we must refer to the sampling distribution of the mean.

• The standard error of the mean is 15 divided by the square root of 9, that is, 5.

• If M = 110, z = (110 – 100)/5 = 2.

• So we want the probability of a value of z of more than 2.
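The conversion above can be sketched in a few lines of Python, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later) as the source of normal-curve areas. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 9

sem = sigma / math.sqrt(n)           # standard error of the mean: 5.0
z = (110 - mu) / sem                 # z for a sample mean of 110: 2.0
p_upper = 1 - NormalDist().cdf(z)    # area of the standard normal curve above z

print(z, round(p_upper, 4))
```

The exact upper-tail area beyond z = 2 is a little under .023; the familiar figure of .025 belongs to z = 1.96.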


Referring to the standard normal distribution


Answer


Important!

• If your question is about MEANS, divide by the STANDARD ERROR OF THE MEAN σM, not the standard deviation of the original population.


Question

• If I select a sample of size n = 16 from the IQ population, what is the probability that the mean lies between 92.5 and 100?


Convert values to z

• The question is about a mean, so we must use the standard error of the mean to find the z values.

• The SEM is 15 divided by the square root of 16 (that is, by 4), which gives 3.75.

• So z = (92.5 – 100)/3.75 = –2.

• For 100, z = 0.


Referring to the standard normal distribution

• About 95% of values lie between –2 and +2 (more exactly, between –1.96 and +1.96).

• So the green area, between z = –2 and z = 0, is about 47.5%.

• The probability is approximately .475.
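The same answer can be obtained without the rule of thumb. This is a minimal sketch, assuming `statistics.NormalDist` from the Python standard library; it works directly with the sampling distribution of the mean rather than converting to z first.

```python
from statistics import NormalDist

# Sampling distribution of the mean for n = 16: mean 100, SEM = 15 / 4 = 3.75.
sampling_dist = NormalDist(mu=100, sigma=15 / 16 ** 0.5)

# Probability that the sample mean lies between 92.5 and 100.
p_between = sampling_dist.cdf(100) - sampling_dist.cdf(92.5)
print(round(p_between, 4))
```

The exact area between z = –2 and z = 0 is about .477, slightly more than the .475 given by treating ±2 as the 95% limits.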


Two populations

• Suspend your disbelief and suppose that two barrels each contain millions of tickets, on each of which is the value of an IQ. So each barrel contains a normal distribution with mean 100 and SD 15.

• I draw a sample of size 16 from each barrel and calculate the means M1 and M2.

• I also calculate the difference M1 – M2 and put it in a third barrel.

• The process is repeated millions of times.

• The third barrel now contains the sampling distribution of the DIFFERENCE (between means).

• The sampling distribution of the difference is also normal.
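The barrel experiment can be imitated on a computer. The following is a rough simulation sketch, not part of the original slides: it assumes 20,000 repetitions stand in for "millions", uses Python's `random.gauss` to draw IQs, and fixes an arbitrary seed so that the run is repeatable.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, chosen only for repeatability


def sample_mean(n, mu=100, sigma=15):
    """Draw n IQs from one barrel and return their mean."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))


# Fill the "third barrel" with differences M1 - M2 (n = 16 per sample).
differences = [sample_mean(16) - sample_mean(16) for _ in range(20_000)]

mean_diff = statistics.fmean(differences)
sd_diff = statistics.stdev(differences)
print(round(mean_diff, 2), round(sd_diff, 2))
```

The mean of the simulated differences should be close to zero, and their standard deviation estimates the standard error of the difference.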


Barrels


Another random variable

• We have seen that the sample mean M is a random variable, whose probability distribution is the sampling distribution of the mean.

• The difference between means M1 – M2 is also a random variable.

• Its probability distribution is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means).


Sampling distribution of the difference (between means)


Variance of the difference

• We have seen that the sample means M1 and M2 are random variables.

• They are INDEPENDENT random variables – separate barrels.

• The variance of the sum of, OR DIFFERENCE BETWEEN, two independent random variables is the sum of their separate variances. (Remember that a variance cannot be negative.)


Sampling variance of the difference

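• Sampling variance of means from the first barrel: $\sigma_{M_1}^2 = \sigma^2 / n$

• From the second: $\sigma_{M_2}^2 = \sigma^2 / n$

• Sampling variance of M1 – M2: $\sigma_{M_1 - M_2}^2 = \sigma_{M_1}^2 + \sigma_{M_2}^2 = 2\sigma^2 / n$

• Standard error of the difference between means: $\sigma_{M_1 - M_2} = \sqrt{2\sigma^2 / n}$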
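The variance-sum rule translates directly into a small function. This is a sketch under the assumptions of the barrel example (two independent samples); the name `se_difference` and its parameters are introduced here for illustration, and the function allows the two populations to have different SDs and sample sizes even though the example uses identical ones.

```python
import math


def se_difference(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard error of M1 - M2 for two independent samples."""
    var_m1 = sigma1 ** 2 / n1           # sampling variance of M1
    var_m2 = sigma2 ** 2 / n2           # sampling variance of M2
    return math.sqrt(var_m1 + var_m2)   # variances add, even for a difference


se_diff = se_difference(15, 16, 15, 16)
print(round(se_diff, 4))  # 5.3033
```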


Standard error of the difference


In our example, $\sigma_{M_1 - M_2} = \sqrt{\dfrac{15^2}{16} + \dfrac{15^2}{16}} = \sqrt{28.125} \approx 5.3033$


Question

• I draw a sample of size 16 from each of two identical IQ distributions, with mean 100 and SD 15.

• What is the probability that the difference (M1 – M2) is at least +10.61?

• What is the probability of a difference in EITHER direction?


Answer

• The question is about a difference between means, so we must refer to the sampling distribution of the difference.

• We have found that the standard error of the difference is 5.3033 .

• As usual, we convert the value to z:

• z = (10.61 – 0)/5.3033 ≈ +2.

• So we want the probability of a value of z at least as great as +2.


Referring to the standard normal distribution

• We know that .025 (2.5%) of the distribution lies above z = 1.96 (2 approx).

• So the probability of a difference greater than +10.61 is .025.

• The probability of a difference this large in EITHER direction is .025 × 2 = .05.
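For comparison, the one-tailed and two-tailed probabilities can be computed without rounding z. This is a minimal sketch, assuming `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Standard error of the difference for two samples of 16 from IQ populations.
se_diff = math.sqrt(15 ** 2 / 16 + 15 ** 2 / 16)

# Under identical populations the expected difference is 0.
z = (10.61 - 0) / se_diff

p_one_tail = 1 - NormalDist().cdf(z)   # difference of at least +10.61
p_two_tail = 2 * p_one_tail            # at least 10.61 in either direction
print(round(z, 2), round(p_one_tail, 3), round(p_two_tail, 3))
```

The exact values come out at roughly .023 and .045. The figures .025 and .05 are the conventional approximations obtained by treating z = 2 as if it were 1.96.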


Summary

• The three most important properties of a distribution are LEVEL, SPREAD and SHAPE.

• Several measures of these properties were discussed.

• The notion of population was introduced and the notion of probability introduced in that context.

• The concept of a sampling distribution was introduced.

• The sampling distributions of the mean and of the difference between means were discussed.

• Questions about the probabilities of ranges of values for the mean and difference between means can be answered with reference to the standard normal distribution.


Appendix

PROBABILITY


An experiment of chance

• An EXPERIMENT OF CHANCE is a procedure with an uncertain outcome, such as tossing a coin or rolling a die.

• The classical notion of PROBABILITY arises in the context of an experiment of chance.


The sample space

• Consider an experiment of chance in which a coin is tossed and a die is rolled.

• There are twelve possible outcomes, which can be set out in an array called a SAMPLE SPACE (S).

• Each outcome is known as an ELEMENTARY EVENT.

• The number of elementary events, n(S), is 12.
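The sample space for this experiment is small enough to list in full. The following sketch uses `itertools.product` to pair every face of the coin with every face of the die; the labels "H" and "T" are chosen here for illustration.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Each elementary event is a (coin, die) pair.
sample_space = list(product(coin, die))
n_S = len(sample_space)

for outcome in sample_space:
    print(outcome)
print("n(S) =", n_S)  # n(S) = 12
```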


Drawing of the sample space


Drawing of an event space


The classical definition revisited

• Let E be “a one or a two on the die”.

• Then n(E) = 4.

• Following the classical definition of a probability, Prob(E) = n(E)/n(S) = 4/12 = 1/3.
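The classical definition, the number of elementary events in E divided by the number in S, can be checked by counting. This is a minimal sketch for the coin-and-die experiment, using `fractions.Fraction` so that the result stays an exact ratio; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair.
sample_space = list(product("HT", range(1, 7)))

# Event E: "a one or a two on the die" (with either face of the coin).
event_E = [(c, d) for (c, d) in sample_space if d in (1, 2)]

prob_E = Fraction(len(event_E), len(sample_space))
print(len(event_E), len(sample_space), prob_E)  # 4 12 1/3
```

E contains four elementary events rather than two because each die score can occur with a head or with a tail.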


Complementary events

• Two events are complementary if they are

1. Mutually exclusive;

2. Exhaustive.

• If E is “a one or a two on the die”, the event “not E”, which is denoted by Ē, is “any other number on the die”.

• Events E and Ē are complementary: they have no common outcome points and they exhaust the possibilities.


Probabilities of complementary events

• If E and Ē are complementary events, their probabilities, p and q, respectively, sum to one.

• So p + q = 1; p = 1 – q; q = 1 – p.


Mutually exclusive events

• Two events, A and B, are said to be MUTUALLY EXCLUSIVE if the probability of their joint occurrence is zero.

• In terms of S, the event spaces of A and B have no elementary outcome points in common.

• For example, if A is “a six on the die” and B is “a one or a two on the die”, A and B are mutually exclusive.


Two mutually exclusive events


The exclusive OR rule

• If A and B are two mutually exclusive events, the probability of either occurring, that is, Prob(A or B), is the sum of their separate probabilities: Prob(A or B) = Prob(A) + Prob(B).


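In our example, with A “a six on the die” and B “a one or a two on the die”, Prob(A or B) = 2/12 + 4/12 = 6/12 = 1/2.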
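The OR rule can be verified by counting outcome points for the two mutually exclusive events described earlier, A (“a six on the die”) and B (“a one or a two on the die”). This is a sketch for the coin-and-die sample space; the set names are illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product("HT", range(1, 7)))
n_s = len(sample_space)

event_a = {(coin, die) for coin, die in sample_space if die == 6}
event_b = {(coin, die) for coin, die in sample_space if die in (1, 2)}

# The simple addition rule applies only when the events share no outcomes.
mutually_exclusive = event_a.isdisjoint(event_b)

prob_a = Fraction(len(event_a), n_s)                  # 2/12
prob_b = Fraction(len(event_b), n_s)                  # 4/12
prob_a_or_b = Fraction(len(event_a | event_b), n_s)   # counted directly

print(mutually_exclusive, prob_a_or_b, prob_a + prob_b)  # True 1/2 1/2
```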


Independent events

• Two events A and B are INDEPENDENT if the occurrence of either has no effect upon the probability of the occurrence of the other.

• For example, if A is “a head” and B is “a six”, A and B are independent.


AND rule for independent events

• If events A and B are independent, the probability of their joint occurrence Prob(A and B) is the product of their separate probabilities.

• In our example, with A “a head” and B “a six”, Prob(A and B) = 1/2 × 1/6 = 1/12.
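The AND rule can be checked in the same way for A (“a head”) and B (“a six”). The sketch below counts the joint outcomes directly in the coin-and-die sample space and compares the result with the product of the separate probabilities; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product("HT", range(1, 7)))
n_s = len(sample_space)

event_head = {(coin, die) for coin, die in sample_space if coin == "H"}
event_six = {(coin, die) for coin, die in sample_space if die == 6}

prob_head = Fraction(len(event_head), n_s)               # 6/12 = 1/2
prob_six = Fraction(len(event_six), n_s)                 # 2/12 = 1/6
prob_joint = Fraction(len(event_head & event_six), n_s)  # only (H, 6)

print(prob_joint, prob_head * prob_six)  # 1/12 1/12
```

The counted joint probability equals the product, as independence requires; for dependent events the two would differ.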


References

• Hogben, L. (1967). Mathematics for the million. London: Pan Books. Chapter 12. The Algebra of Choice and Chance.

• Ross, S. (1976). A first course in probability. New York: Macmillan. Pages 20 onwards.

• Woodroofe, M. (1975). Probability with Applications. Tokyo: McGraw-Hill Kogakusha. Chapter 2 - page 38 in particular.