Transcript
Page 1: Distributions & Descriptive statistics Dr William Simpson Psychology, University of Plymouth

•Distributions & Descriptive statistics

•Dr William Simpson•Psychology, University of Plymouth

Defining and measuring variables

Independent & dependent variables

• Independent variable: something we manipulate in an experiment

• Dependent variable: something we measure

• By manipulating the IV, we expect to produce a change in the DV

Scales of measurement

• Variables are classified according to the type of scale

– The type of analysis depends on the type of scale

• Worst to best: Nominal, ordinal, interval, ratio

Nominal

• Nominal data: assign categorical labels to observations
• Not really measurement
• E.g. male/female; married/single/widowed/divorced
• Numbers on football jerseys

Ordinal

• Ordinal data: values can be ranked (ordered). Categorical but rankable
• E.g. small, medium, large; movie rating 1-5; Likert scale
• Can only be ranked. A rating scale is not like cm: the difference between one pair of adjacent ratings is not necessarily the same as the difference between another pair

• Adding a response of "strongly agree" (5) to two responses of "disagree" (2) would give us a mean of 3, but what is the meaning of that number?

Interval

• Interval data: ordinary measurement, e.g. temperature
• Unlike ordinal data, we can say the difference between 1 and 2 deg C is the same as the difference between 4 and 5 deg C

Ratio

• Ordinary measurements, but with an absolute, non-arbitrary zero point
• E.g. weight, length: any scale must start at zero
• deg C: not ratio, because 0 is arbitrarily set at the freezing point of water

Discrete & continuous variables

• Variables measured on interval & ratio scales are further identified as either:

– discrete: integers, no intermediate values. E.g. number of Smarties in a box

– continuous: measurable to any level of accuracy. E.g. weight of Smarties contents

Frequency distributions

• We have a pile of scores
• Not all scores are equally likely
• How were the scores distributed?

• Subjects were timed (in sec) while completing a problem-solving task:
• 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

Stem & leaf

• Two components: the stem and the leaf
• In the problem-solving example, stem = ones, leaf = tenths
• Stems range between 5 and 9

• 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

5|98
6|821
7|6347
8|1182
9|2

• Key: 9|2 means 9.2

• Heights in cm: 154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157, 149, 146
• Put 2 digits in the stem; split each stem into leaves 0-4 and leaves 5-9

13|969
14|334323
14|87796
15|431
15|67
16|24

• Key: 13|6 means 136

• GSR values: 23.25, 24.13, 24.76, 24.81, 24.98, 25.31, 25.57, 25.89, 26.28, 26.34, 27.09
• Round each value to one decimal place, so the last 2 digits become a single leaf

23|3
24|188
25|0369
26|33
27|1

• Key: 23|3 means 23.3

Histogram

• Alternative way to look at a distribution
• It is like a version of stem-and-leaf turned 90 deg

Example

• Time to complete task (min):
• 8 2 6 12 9 14 1 7 7 9 11 8 12 10 5 7 10 9 10 11 4 8 2 11 10 11 13 13 14 11 13 10 12 13 5 16 11 17 10 6 13 11 5 9 12 14 8 2 12 4

•Sort scores into about 10 or so bins (similar to stem in stem-and-leaf)

• Decide on sensible bins
• Count the number of observations in each bin (length of each leaf in stem-and-leaf)
• This number in each bin is called the frequency

time (min)   frequency
0-1          1
2-3          3
4-5          5
6-7          5
8-9          8
10-11        13
12-13        10
14-15        3
16-17        2
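The binning and counting steps can be sketched in R. This is a minimal sketch that assumes the task-time scores from the example and bins of width 2 starting at 0; the names `breaks`, `labels`, `bins` and `freq` are illustrative and do not come from the slides.

```r
# Task-completion times (min) from the example
x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
       4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
       10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)

# Bin edges 0, 2, ..., 18 give the bins 0-1, 2-3, ..., 16-17
breaks <- seq(0, 18, by = 2)
labels <- paste(breaks[-length(breaks)], breaks[-1] - 1, sep = "-")

# cut() assigns each score to a bin (left edge included, right edge excluded);
# table() then counts the scores in each bin
bins <- cut(x, breaks = breaks, right = FALSE, labels = labels)
freq <- table(bins)
print(freq)
```

The counts in `freq` are the frequencies that become the bar heights of the histogram.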

• This table is then used to make the histogram
• Histogram is a bar chart with frequency on the y axis and score on the x axis
• Sometimes done other ways, e.g. connect the dots (frequency distribution polygon)

[Figure: histogram of the task times; x axis: Time (min), y axis: Frequency]

in R

x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11, 4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17, 10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)
hist(x)      # histogram
stem(x)      # stem-and-leaf display
boxplot(x)   # box plot

Probability distributions

• Histogram is an estimate of the true probability distribution
• Many theoretical probability distributions exist
• They are the basis of statistical models used to make inferences about the population

Binomial distribution

• Binomial distribution is a discrete distribution
• The binomial distribution applies when:

– there is a series of n trials (e.g., 10 coin tosses)

– only 2 possible outcomes per trial

– outcomes are mutually exclusive (head or tail)

– outcome of each trial is independent of the others

• The binomial distribution gives the chance of getting each total number of ‘successes’ after doing all the (binary) trials of the experiment
• E.g. it gives the chance of getting 1, 2, or 3 girls after giving birth to 6 children
• p = p(success) = p(girl) = 0.5 each trial
• q = p(failure) = p(boy) = 1-p = 0.5
• n = number of trials = 6

• The probability distribution where n = 6 and the probability of each outcome is 0.5 on each trial looks like:

[Figure: binomial distribution; x axis: number of girls, y axis: probability]
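The heights of the bars in this distribution can be computed with R's built-in `dbinom()`. This is a minimal sketch for the girls example (n = 6, p = 0.5); the variable names `k` and `prob` are illustrative.

```r
n <- 6      # number of trials (children)
p <- 0.5    # probability of a girl on each trial

# dbinom() gives the probability of k successes for each k = 0, 1, ..., n
k <- 0:n
prob <- dbinom(k, size = n, prob = p)

# One row per possible number of girls
print(data.frame(girls = k, probability = round(prob, 4)))
```

Calling `barplot(prob, names.arg = k)` afterwards would draw the probabilities as a bar chart, with number of girls on the x axis.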

• For any probability distribution, the y-axis is given by a formula
• For the binomial, it looks like this:
  $P(k) = \binom{n}{k} p^{k} q^{n-k}$
• k successes in n trials; $\binom{n}{k}$ is the binomial coefficient
• You don’t need to know it
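For readers who want to see the binomial formula at work, here is a minimal sketch that writes it out by hand, choose(n, k) * p^k * q^(n-k), and checks it against R's built-in `dbinom()`. The helper name `binom_prob` is hypothetical.

```r
# Probability of k successes in n trials, written out by hand.
# choose(n, k) is the binomial coefficient; q = 1 - p is the chance of failure.
binom_prob <- function(k, n, p) {
  q <- 1 - p
  choose(n, k) * p^k * q^(n - k)
}

# Chance of exactly 3 girls among 6 children: 20/64 = 0.3125
print(binom_prob(3, n = 6, p = 0.5))

# The hand-written formula agrees with dbinom() for every k from 0 to 6
print(all.equal(binom_prob(0:6, 6, 0.5), dbinom(0:6, size = 6, prob = 0.5)))
```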

Normal distribution

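• Continuous probability distribution
• Every probability distribution’s y-axis is given by a formula
• For the normal distribution, the y-axis (probability density) is:
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean and $\sigma$ is the standard deviation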
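The standard formula for the normal density, exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), can be sketched in R and compared with the built-in `dnorm()`. The helper name `normal_density` and the example values are illustrative assumptions.

```r
# Normal probability density written out by hand.
# mu is the mean, sigma is the standard deviation.
normal_density <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# Evaluate the density at a few points of a standard normal (mean 0, SD 1)
x <- seq(-3, 3, by = 0.5)
print(round(normal_density(x), 4))

# The hand-written formula agrees with dnorm()
print(all.equal(normal_density(x), dnorm(x)))
```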

Descriptive statistics

• We have a pile of scores
• Have made stem-and-leaf, histogram
• Want to summarise further: descriptive statistics

1. Centre (location)

•What is the ‘typical’ score? If you were to make a prediction for a new score, what would it be?

a) Mean (average)

•Mean = sum(x)/n

Mean as balance point

• Imagine that each observation is a toy block
• Place the blocks on a ruler; the position (1, 2, etc inches) represents the value
• The balance point is the mean

• 1 2 2 3: mean = 2
• 1 2 2 5: mean = 2.5
• 1 2 2 9: mean = 3.5
• Mean is pulled towards extreme observation (outlier)

b) Median

• Median is the middle score; 50th percentile
• Useful when extreme scores (outliers) lie in one tail of the distribution (skewed)

Calculate the median

• Sort scores
• If odd n, median is the middle value
• If even n, median is the mean of the 2 middle values
• 25 13 9 18 1 -> 1 9 13 18 25; median = 13
• 25 13 9 18 -> 9 13 18 25; median = (13+18)/2 = 15.5
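The rule for odd and even n can be sketched as a small R function. The helper `median_by_hand` is hypothetical and only spells out the steps; in practice R's built-in `median()` gives the same answer.

```r
# Median following the steps on the slide (illustrative helper, not base R)
median_by_hand <- function(scores) {
  s <- sort(scores)                # step 1: sort the scores
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd n: the middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])   # even n: mean of the 2 middle values
  }
}

print(median_by_hand(c(25, 13, 9, 18, 1)))   # 13
print(median_by_hand(c(25, 13, 9, 18)))      # 15.5
```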

Median and outliers

• 1 2 2 3
• 1 2 2 5
• 1 2 2 9
• Median = 2 in all cases
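A short R check makes the contrast with the mean concrete for these three data sets. This is only a sketch; the names `sets`, `means` and `medians` are illustrative.

```r
# The three small data sets, differing only in the largest score
sets <- list(c(1, 2, 2, 3), c(1, 2, 2, 5), c(1, 2, 2, 9))

means   <- sapply(sets, mean)     # pulled upwards as the outlier grows
medians <- sapply(sets, median)   # unaffected by the outlier

print(means)     # 2.0 2.5 3.5
print(medians)   # 2 2 2
```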

c) Mode

• Mode is the most frequently occurring score
• Mean should really be used only for interval/ratio data. Mode is good otherwise
• E.g. mean movie rating: not really sensible. Mode is sensible
• Sometimes no unique mode exists (e.g. bimodal)

•Bimodality can be due to mixture of two different populations (e.g. male and female)

• Mean = 9.36, Median = 10, Mode = 11

[Figure: histogram of time to complete task (min); x axis: Time (min), y axis: Frequency]

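mean(x)
median(x)
# R has no built-in function for the statistical mode, so define one:
# it returns the most frequent value (the first one found if there are ties)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Mode(x)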

Likert scale

• E.g. Brief Psychiatric Rating Scale (BPRS)
• Interview + observations of patient's behaviour over preceding 2–3 days
• Each item scored 0-7

• Suppose we have a new treatment
• Does it reduce anxiety?
• Define “anxiety” as score on Q2

• We use BPRS on lots of patients
• Compare treatment and placebo
• How? Find mean(treatment) vs mean(placebo)?

NO

• The numbers 0-7 are not really numbers!
• They have only rank (order) info
• Ordinal

• The “numbers” are really ordered labels: “normal”, “a bit anxious”, … , “extremely anxious”

• They lack a quantitative distance between them; calculating a mean level of anxiety for the group is not really appropriate

• It makes sense to find the mode
• Most frequently occurring anxiety score

• It makes sense to measure the median: person in the middle of the group in terms of anxiety, with half the responses below and the other half above
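Finding the mode and median of ordinal ratings can be sketched in R as follows. The ratings are made-up illustrative data, not taken from any study, and the variable names are hypothetical; an odd number of patients is used so that the median is an actual rating and no averaging is needed.

```r
# Hypothetical anxiety ratings (0-7) for 11 patients; made-up data
anxiety <- c(2, 3, 3, 1, 5, 3, 4, 2, 6, 3, 2)

# Mode: the most frequently occurring rating
counts <- table(anxiety)
mode_rating <- as.integer(names(counts)[which.max(counts)])

# Median: the rating of the middle patient once the ratings are put in order
median_rating <- sort(anxiety)[(length(anxiety) + 1) / 2]

print(c(mode = mode_rating, median = median_rating))
```

Both summaries use only the counts and the order of the ratings, never the distance between them.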

Example

• Family-Focused Treatment Versus Individual Treatment for Bipolar Disorder: Results of a Randomized Clinical Trial

• J. Consulting & Clinical Psychology, 2003, 71, 482– 492

“The psychiatrist made ratings of compliance on a 7-point Likert scale ranging from full compliance (1) to discontinued medication against medical advice (7)” p.486

• “On the whole, the participants were quite compliant with their medication, with at least 78% of the patients scoring within the compliant range at each assessment point” p.489

• The earlier description (p.486) must contain a mistake: 1 is bad, 7 is good compliance

• For each 3-month follow-up period, participants were placed in one of the following clinical outcome categories:

(a) relapse, defined as a rating of 6 or 7 on the BPRS/SADS-C core symptoms of depression (depressed mood, loss of interest), mania (hostility, elevated mood, grandiosity), or psychosis (unusual thought content, suspiciousness, hallucinations, conceptual disorganization) and at least two ancillary symptoms (suicidality, guilt, sleep disturbance, appetite disturbance, lack of energy, negative evaluation, discouragement, increased energy activity), or

(b) nonrelapse, defined as a score of 5 or below on all relevant BPRS/SADS-C core symptoms during the 3-month interval

2. Spread (dispersion)

• Measure of centre (e.g. mean) tells what value we expect
• Measure of spread tells how close a value will typically be to the centre

a) Interquartile range

• Interquartile range (IQR) is the distance between the cutoff for the bottom 25% of scores and the cutoff for the top 25% of scores

Quartiles

• Quartiles divide the data into quarters
• The median (Q2) divides the data into 2 piles (50% above, 50% below)
• Q1 is the cutoff below which fall the bottom 25% of scores
• Q3 is the cutoff below which fall the bottom 75%

– Q1 has 25% of scores below it, Q2 has 50% (i.e. it is the median), and Q3 has 75% of scores below it (25% above)

Finding quartiles

1. Sort the data
2. Find the median = Q2 = value that splits the data into two equal piles, half below it and half above
3. Q1 = median of lower half
4. Q3 = median of upper half
5. IQR = Q3 – Q1

x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11, 4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17, 10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)
x <- sort(x); x
# 1 2 2 2 4 4 5 5 5 6 6 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 16 17

• n = 50
• Q2 = (x[25] + x[26])/2 = (10 + 10)/2 = 10
• Q1 = x[13] = 7
• Q3 = x[38] = 12
• IQR = Q3 - Q1 = 12 - 7 = 5
• We expect scores near 10; the middle 50% of scores span 5 points (7 to 12)
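The median-of-halves procedure can be sketched in R on the same task-time data. The variable names (`half`, `lower`, `upper`, `q1`, `q2`, `q3`, `iqr`) are illustrative. With an odd n this sketch leaves the middle score out of both halves, which is one common convention and an assumption here, since the slides only work an even-n case.

```r
# Task-completion times (min), sorted
x <- sort(c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
            4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
            10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4))

n <- length(x)      # 50 scores
half <- n %/% 2     # number of scores in each half

lower <- x[1:half]              # bottom half of the sorted scores
upper <- x[(n - half + 1):n]    # top half of the sorted scores

q1 <- median(lower)   # median of lower half
q2 <- median(x)       # overall median
q3 <- median(upper)   # median of upper half
iqr <- q3 - q1

print(c(Q1 = q1, Q2 = q2, Q3 = q3, IQR = iqr))   # 7 10 12 5
```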

in R

fivenum(x)   # 1 7 10 12 17 = min, Q1, Q2, Q3, max
IQR(x)       # 5

boxplot(x)

b) Standard deviation

• Each point is some distance away from the mean
• Each distance from the mean is a deviation
• Deviation = score - mean

• Each deviation contributes to the spread of the data about the mean
• Is the total spread just the sum of the deviations, then?
• No. Mean is a balance point, so positive and negative deviations cancel out
• Can find a “sort of” average or “typical” deviation if we get rid of the signs

“Average” deviation

• Average deviation actually is zero because signs cancel. Need to get rid of signs
• Idea: square each deviation, average, then take (positive) square root [RMS, root mean square]
• That is the standard deviation!

Calculating the SD

• Find the deviations
• Square them
• Find the average
• Take the square root to undo the squaring
• In symbols:
  $SD = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• The denominator is N (whole population) or n-1 (sample)
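The steps for the SD can be followed one at a time in R. This is a minimal sketch on a small illustrative data set; the variable names are hypothetical. Note that R's built-in `sd()` and `var()` divide by n-1.

```r
scores <- c(1, 2, 2, 3)   # small illustrative data set

# Step 1: deviations from the mean (they sum to zero)
deviations <- scores - mean(scores)
print(sum(deviations))

# Step 2: square them to get rid of the signs
squared <- deviations^2

# Steps 3 and 4: average the squares, then take the square root
n <- length(scores)
sd_N      <- sqrt(sum(squared) / n)         # dividing by N
sd_nminus <- sqrt(sum(squared) / (n - 1))   # dividing by n-1

print(c(sd_N = sd_N, sd_nminus = sd_nminus))

# The n-1 version matches sd(), and its square matches var()
print(all.equal(sd_nminus, sd(scores)))
print(all.equal(sd_nminus^2, var(scores)))
```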

c) variance

•Variance = SD squared

•Useful for ANOVA (ANalysis Of VAriance)

Likert scale

• These “numbers” are not really numbers
• Therefore cannot do operations like subtraction, division, sqrt
• Use IQR

Statistical Inference

• Usually we are interested in more than describing or summarising the numbers we have on hand
• E.g. have a sample, calculate mean. What is the mean of the larger population?
• E.g. have done an experiment, means differ. Is this a fluke or “real”?

• The data we have on hand are samples from some (real or theoretical) population

• We want to make inferences about population

Summary

• IV, DV
• Nominal, ordinal, interval, ratio
• Continuous, discrete
• Stem & leaf, histogram
• Probability distribution
• Mean, median, mode
• IQR, SD, variance
