Summary Statistics & Confidence Intervals
Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal NHS Foundation Trust
[email protected] 2064567
Timetable
Time        Task
60 mins     Presentation
20 mins     Coffee Break
90 mins     Practical Tasks in IT Room
Outline
• Sampling
• Summary statistics
• Confidence intervals
• Statistics Packages
‘Population’ and ‘Sample’
• We study a population of interest. Usually we would like to know the typical value and spread of an outcome measure in that population.
• Collecting data from the entire population is usually impossible or inefficient/expensive, so we take a sample (even census data can have missing values).
• Sample must be representative of population.
• Randomise!
E.g. Randomised Controlled Trial (RCT)
[Diagram: POPULATION → SAMPLE → RANDOMISATION → GROUP 1 and GROUP 2 → OUTCOME measured in each group]
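The randomisation step in the diagram can be sketched in a few lines of Python. This is a minimal illustration of simple randomisation into two equal groups, not the allocation procedure of any particular trial; the participant IDs, the `randomise` helper and the seed are all hypothetical.

```python
import random


def randomise(participants, seed=None):
    """Shuffle participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant IDs, for illustration only
sample = [f"P{i:02d}" for i in range(1, 21)]
group_1, group_2 = randomise(sample, seed=1)

print("Group 1:", sorted(group_1))
print("Group 2:", sorted(group_2))
```

Because allocation depends only on the shuffle, neither the researcher nor the participant influences who ends up in which group.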
Types of Data
Categorical
Example:
• Yes/No
• Blood Group
Graphs:
• Bar Chart
• Pie Chart
Summary:
• Frequency (n)
• Proportion (%)
Numerical/Continuous
Example:
• Weight
• Pain Score
Graphs:
• Histogram
• Box and Whisker Plot
Summary:
• Mean & Standard Deviation (SD)
• Median & Inter-quartile range (IQR)
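As a small sketch of the categorical summary (frequency and proportion), the snippet below tabulates a made-up list of blood groups. The data are hypothetical and exist only to show the calculation.

```python
from collections import Counter

# Hypothetical blood groups for 10 patients (illustration only, not real data)
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "A", "O", "O"]

counts = Counter(blood_groups)  # frequency (n) of each category
n = len(blood_groups)

for group, freq in counts.most_common():
    # proportion (%) = frequency divided by the total number of patients
    print(f"{group}: n = {freq} ({100 * freq / n:.0f}%)")
```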
Types of Average
(‘Average’ - a number which typifies a set of numbers)
• Mean = Total divided by n
• Median = Middle value
• Mode = Most common value/group (rarely used)
Types of Average - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
Mean = (1 + 3 + 4 + … + 10) ÷ 10 = 5.8
Median = (6 + 7) ÷ 2 = 6.5
Mode = 7
(The median lies between the 5th and 6th ordered values.)
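The same three averages can be checked with Python's standard `statistics` module. This is just one convenient way to reproduce the arithmetic above; the variable names are illustrative.

```python
import statistics

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

mean = statistics.mean(pain_scores)      # total divided by n
median = statistics.median(pain_scores)  # n is even, so the average of the 5th and 6th ordered values
mode = statistics.mode(pain_scores)      # most common value

print(mean, median, mode)  # 5.8 6.5 7
```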
Mean or Median?
[Figure: two histograms, one roughly Normally distributed and one skewed]
Roughly Normally distributed:
• Mean or median
• Mean by convention
Skewed:
• Median
• Less affected by extreme values
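To see why the median is less affected by extreme values, the sketch below takes the pain scores from the earlier example and replaces the largest score with 100. The extreme value is hypothetical, as if a score had been mis-recorded.

```python
import statistics

pain_scores = [1, 3, 4, 5, 6, 7, 7, 7, 8, 10]  # ordered pain scores from the earlier example
with_extreme = pain_scores[:-1] + [100]        # hypothetical: largest score mis-recorded as 100

for label, data in [("original", pain_scores), ("with extreme value", with_extreme)]:
    print(f"{label}: mean = {statistics.mean(data):.1f}, median = {statistics.median(data):.1f}")
```

The mean is pulled up by the single extreme value, while the median stays where it was because the middle two ordered values are unchanged.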
Variation and Spread
• Standard Deviation (‘SD’)
- Roughly the average distance of values from the mean
- Use alongside mean
• Inter-Quartile Range (‘IQR’)
- Range in which the middle 50% of the data lie (middle 50% when ordered)
- Use alongside median
• Range
- Highest and lowest value
- Possibly quote in addition to SD/IQR
Types of Variation - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
SD = 2.6
IQR = (3.75, 7.25)
Range = (1, 10)
(The quartiles lie between the 2nd and 3rd, and between the 8th and 9th, ordered values; the median lies between the 5th and 6th.)
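These measures of spread can be reproduced with the standard `statistics` module. Note that several conventions exist for calculating quartiles; the default ("exclusive") method of `statistics.quantiles` happens to match the figures quoted here, and other software may give slightly different quartiles.

```python
import statistics

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

sd = statistics.stdev(pain_scores)                   # sample SD (divides by n - 1)
q1, q2, q3 = statistics.quantiles(pain_scores, n=4)  # quartiles, default "exclusive" method
low, high = min(pain_scores), max(pain_scores)

print(f"SD = {sd:.1f}")            # SD = 2.6
print(f"IQR = ({q1}, {q3})")       # IQR = (3.75, 7.25)
print(f"Range = ({low}, {high})")  # Range = (1, 10)
```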
Standard Error
• Not the same as standard deviation.
• Calculated from a measure of variability and the sample size (for a mean, SE = SD ÷ √n).
• Used to construct confidence intervals.
• Not very informative when given alongside statistics or as error bars on a plot.
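As a worked sketch for the most common case, the standard error of a mean is the SD divided by the square root of the sample size. The snippet applies this to the pain score data used earlier.

```python
import math
import statistics

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

sd = statistics.stdev(pain_scores)     # variability of individual values
se = sd / math.sqrt(len(pain_scores))  # standard error of the mean = SD / sqrt(n)

print(f"SD = {sd:.2f}, SE = {se:.2f}")  # SD = 2.62, SE = 0.83
```

The SD describes how much individual values vary; the SE describes how precisely the sample mean estimates the population mean, and it shrinks as the sample size grows.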
Sample statistic is the best guess of the (true) population value
• E.g. Sample mean is the best estimate of mean in population.
• The mean is likely to be different if we take a new sample from the population.
• We know that the estimate is unlikely to be exactly right.
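A short simulation can make this concrete. The sketch below builds a hypothetical population of simulated pain scores (not real data), draws five samples of 10, and prints each sample mean next to the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 pain scores on a 0-10 scale (simulated, not real data)
population = [rng.randint(0, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Draw five separate samples of 10 and record each sample mean
sample_means = [statistics.mean(rng.sample(population, 10)) for _ in range(5)]

print(f"Population mean: {true_mean:.2f}")
print("Sample means:", [round(m, 1) for m in sample_means])
```

Each sample mean is a reasonable guess at the population mean, but the guesses differ from sample to sample.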
Confidence Intervals (CIs)
• Confidence interval = “range of values that we can be confident will contain the true value of the population”.
• The “give or take a bit” for best estimate.
• Convention is to use a 95% confidence interval (‘95% CI’).
• But this also means that about 5% of intervals constructed this way will not contain the true value.
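One common way to build a 95% CI for a mean is "estimate ± multiplier × standard error". The sketch below applies this to the pain score data, assuming a t-based interval; the multiplier 2.262 is the 97.5th percentile of the t distribution with 9 degrees of freedom, taken from standard tables rather than computed here.

```python
import math
import statistics

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

n = len(pain_scores)
mean = statistics.mean(pain_scores)
se = statistics.stdev(pain_scores) / math.sqrt(n)  # standard error of the mean

# Assumption: t-based interval. 2.262 is the 97.5th percentile of the
# t distribution with n - 1 = 9 degrees of freedom (value from standard tables).
t_crit = 2.262

lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"Mean = {mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")  # Mean = 5.8, 95% CI 3.9 to 7.7
```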
Example: Legislation for smoke-free workplaces and health of bar workers in Ireland: before and after study (Allwright et al; BMJ Oct 2005)

Outcome                              Before (N=138)   After (N=138)   Difference (95% CI)
Salivary cotinine (nmol/l), median   29.0             5.1             -22.7 (-26.7 to -19.0)
Any respiratory symptoms, n (%)      90 (65%)         67 (49%)        -16.7 (-26.1 to -7.3)
Runny nose/sneezing, n (%)           61 (44%)         48 (35%)        -9.4 (-19.8 to 0.9)
Example: Supplementary feeding with either ready-to-use fortified spread or corn-soy blend in wasted adults starting antiretroviral therapy in Malawi (MacDonald et al; BMJ May 2009)
“After 14 weeks, patients receiving fortified spread had a greater increase in BMI and fat-free body mass than those receiving corn-soy blend: 2.2 (SD 1.9) v 1.7 (SD 1.6) (difference 0.5, 95% confidence interval 0.2 to 0.8), and 2.9 (SD 3.2) v 2.2 (SD 3.0) kg (difference 0.7 kg, 0.2 to 1.2 kg), respectively.”
Example: Sample size matters
What proportion of patients attending clinic are satisfied?

Sample size   Number satisfied   Proportion satisfied   95% CI for proportion
10            7                  70%                    35% to 93%
25            18                 72%                    50% to 88%
50            35                 70%                    55% to 82%
100           70                 70%                    60% to 79%
1000          700                70%                    67% to 73%
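The table does not say how its intervals were calculated. The sketch below assumes an exact (Clopper-Pearson) binomial interval, found by simple bisection using only the standard library; under that assumption the output should land close to the tabulated limits. The helper names `binom_cdf` and `exact_ci` are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, level=0.95):
    """Exact (Clopper-Pearson) CI for a proportion of k successes out of n."""
    alpha = 1 - level

    def solve(f, target):
        # f(p) decreases as p increases; bisect for the p where f(p) == target
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which observing k or more successes has probability alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which observing k or fewer successes has probability alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# (sample size, number satisfied) pairs from the table
for n, k in [(10, 7), (25, 18), (50, 35), (100, 70), (1000, 700)]:
    lower, upper = exact_ci(k, n)
    print(f"n = {n:4d}: {k / n:.0%} satisfied, 95% CI {lower:.0%} to {upper:.0%}")
```

The proportion stays at about 70% throughout, but the interval narrows steadily as the sample size increases.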
Example: % confidence matters
What proportion of patients attending clinic are satisfied?
Sample size = 50
No. satisfied = 35
Proportion satisfied = 70%

90% CI   58% to 81%
95% CI   55% to 82%
99% CI   51% to 85%
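The widening with higher confidence comes from the multiplier applied to the standard error. The sketch below uses the simple Normal approximation for a proportion, which is an assumption made here for illustration; it is probably not the method behind the figures above, so the limits may differ by a percentage point or two.

```python
import math
from statistics import NormalDist

n, satisfied = 50, 35
p = satisfied / n
se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion

intervals = {}
for level in (0.90, 0.95, 0.99):
    # z multiplier for a two-sided interval: roughly 1.645, 1.960 and 2.576
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    intervals[level] = (p - z * se, p + z * se)
    lower, upper = intervals[level]
    print(f"{level:.0%} CI: {lower:.0%} to {upper:.0%} (multiplier {z:.3f})")
```

Asking for more confidence means a larger multiplier and therefore a wider interval from the same data.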
p-values vs. Confidence Intervals
• p-value:
- Weight of evidence to reject null hypothesis
- No clinical interpretation
• Confidence Interval:
- Can be used to reject null hypothesis
- Clinical interpretation
- Effect size
- Direction of effect
- Precision of population estimate
So… it’s not all about p-values!
• For some hypotheses, the p-value and the CI will both indicate whether or not to reject the hypothesis.
• A CI will also provide an estimate, as well as a range for that estimate.
• General medical journals prefer CIs.
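As a sketch of how a CI can stand in for a hypothesis test, the snippet below takes the differences and 95% CIs quoted in the Allwright et al. example and checks whether each interval excludes the null value of 0 (no change after the legislation). The `excludes_null` helper is illustrative.

```python
# Differences and 95% CIs as quoted in the Allwright et al. example:
# outcome -> (estimate, lower limit, upper limit)
results = {
    "Salivary cotinine (nmol/l)": (-22.7, -26.7, -19.0),
    "Any respiratory symptoms (%)": (-16.7, -26.1, -7.3),
    "Runny nose/sneezing (%)": (-9.4, -19.8, 0.9),
}

NULL_VALUE = 0.0  # "no difference" between before and after


def excludes_null(lower, upper, null=NULL_VALUE):
    """True if the interval lies entirely on one side of the null value."""
    return not (lower <= null <= upper)


for outcome, (estimate, lower, upper) in results.items():
    verdict = "excludes" if excludes_null(lower, upper) else "includes"
    print(f"{outcome}: {estimate} ({lower} to {upper}) {verdict} 0")
```

An interval that excludes 0 points the same way as a small p-value, while also showing the size, direction and precision of the effect. The interval for runny nose/sneezing includes 0, so the data are compatible with no change.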
Statistical Packages

Package: SPSS
Summary Statistics:
• Not user-friendly
• Gives a large choice of statistics to calculate
Confidence Intervals:
• Doesn’t provide a CI for some key comparative statistics, e.g. simple percentage

Package: StatsDirect
Summary Statistics:
• One right-click
• Will produce a set of 20 or so of the most commonly used statistics
Confidence Intervals:
• Provides a CI for most statistics
Thanks for listening!