![Page 1: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/1.jpg)
Complex Survey Samples
Explaining the Miracle: Statistics and Analysis in Public Health
APHEO Conference 2007, October 14-16, 2007
Susan Bondy, Department of Public Health Sciences, University of Toronto
![Page 2: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/2.jpg)
Outline
• Goals of complex survey analysis
• What is simple, what is complex
– Issues and implications of complexities
• Working with software
• Tips for working with expert analysts
![Page 3: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/3.jpg)
What we report from surveys
• Descriptive statistics
– Mean, median, counts, totals
• Measures of difference, association and effect
– % diff, risk diff, OR, RR, rho, etc.
• Always reported with expression of variance
– Margin of Error (MOE or +/- part)
– Confidence intervals
– Point estimate versus variance
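As an illustration of the point estimate versus its expression of variance, here is a minimal sketch of the textbook margin of error for a proportion. It assumes a simple random sample; the function names and the example figures (20% from 2000 respondents) are hypothetical and not taken from any real survey.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% CI for a proportion,
    assuming a simple random sample of size n."""
    return z * math.sqrt(p * (1.0 - p) / n)

def confidence_interval(p, n, z=1.96):
    """Point estimate plus and minus the margin of error."""
    moe = margin_of_error(p, n, z)
    return (p - moe, p + moe)

# Hypothetical example: 20% prevalence estimated from 2000 respondents.
p_hat, n = 0.20, 2000
moe = margin_of_error(p_hat, n)
low, high = confidence_interval(p_hat, n)
print(f"estimate = {p_hat:.3f}, MOE = +/-{moe:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Under a complex design this simple-random-sample formula is only a starting point; the later slides explain why it tends to be too optimistic.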
![Page 4: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/4.jpg)
Meet two users of survey data
The Describer The Modeller
![Page 5: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/5.jpg)
The describer
• Population inference is #1
– ALWAYS need true pop’n rep. samples
• Sometimes just descriptive statistics (rates)
• Interest in comparisons:
– Monitoring and surveillance (e.g., across time, space, sub-populations)
– Consistency is important
The modeller
• Hypothesis tests are #1
• Analyses simulate controlled experiments
– Rarely need true pop’n rep. samples
• Interest in comparison:
– Replication of experiments
– Differences between studies more interesting
– Extending and testing theory
![Page 6: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/6.jpg)
Complex samples
![Page 7: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/7.jpg)
Simple Random Sample
• Selection into sample is entirely at random
• Each member of pop has same chance of being in the sample
• No strata, no clusters, self-weighting
• Statistically efficient (all observations are independent – tightest margins of error)
![Page 8: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/8.jpg)
Complex designs
1. Selection by cluster
2. Stratification
3. Probability sample weights
4. Finite population correction
• Worst of all:
– Mishmashes of all the above
– & where you can’t have the information
![Page 9: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/9.jpg)
Cluster sampling
![Page 10: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/10.jpg)
Cluster sampling
• E.g., people by FAMILY, students by CLASS, teeth by MOUTH, etc.
• Now WELL recognized as a problem
– Non-independence means loss of statistical power (variance understated, if ignored)
• Need:
– New statistics textbooks
– More expensive software
…will return to software options
![Page 11: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/11.jpg)
Sample logistic results

| Variable | Estimate | Model-based 95% CI | Linearized 95% CI | DEFF |
|---|---|---|---|---|
| Sex | 1.6 | (1.4 - 1.8) | (1.3 - 2.0) | 1.4 |
| Grade | 1.5 | (1.4 - 1.5) | (1.4 - 1.5) | 1.7 |
| Region 2 | 1.4 | (0.9 - 1.7) | (0.9 - 2.1) | 1.5 |
| Region 3 | 1.3 | (1.1 - 1.7) | (0.9 - 1.9) | 1.9 |
| Region 4 | 1.2 | (0.9 - 1.5) | (0.8 - 1.9) | 1.8 |
![Page 12: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/12.jpg)
Repeat after me:
“Failure to account for non-independence of observations, in the analysis, will always result in an underestimation of variances”
• Confidence intervals narrower…
• p-values smaller…
• results ‘less conservative’…
… than they should be
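To put a rough number on how much variance is understated, one widely used approximation (Kish's design effect for equal-sized clusters) is DEFF = 1 + (m - 1) × ρ, where m is the cluster size and ρ the intraclass correlation. The sketch below applies it; the class size, correlation and naive standard error are hypothetical values chosen only for illustration.

```python
import math

def cluster_design_effect(avg_cluster_size, icc):
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

def adjusted_standard_error(naive_se, deff):
    """Inflate a simple-random-sample standard error by sqrt(DEFF)."""
    return naive_se * math.sqrt(deff)

# Hypothetical survey: 2000 students sampled in classes of 25,
# with an intraclass correlation of 0.05.
n = 2000
deff = cluster_design_effect(25, 0.05)
effective_n = n / deff            # how many independent observations n is "worth"
naive_se = 0.010                  # hypothetical SE that ignores clustering
corrected_se = adjusted_standard_error(naive_se, deff)

print(f"DEFF = {deff:.2f}, effective n = {effective_n:.0f}")
print(f"naive SE = {naive_se:.4f}, corrected SE = {corrected_se:.4f}")
```

Even a small intraclass correlation matters when clusters are large, which is why the naive confidence intervals come out too narrow.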
![Page 13: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/13.jpg)
Stratification
![Page 14: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/14.jpg)
What is: stratification?
• Division of the target population into groups or layers from which samples are drawn
• e.g., Plan for reports on
– Youth
– Smaller pop’n regions
![Page 15: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/15.jpg)
Goals of stratification
1. For PLANNED descriptions of sub-populations
• E.g., regions, age-groups
2. For design correction:
• To prevent extreme unrepresentativeness
• e.g., empty groups; extreme weights
3. To improve precision of the overall (or full pop) estimates
Implications…
![Page 16: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/16.jpg)
Stratification and WEIGHTS
They come as a pair
![Page 17: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/17.jpg)
Impact of weights in analysis
• Impacts precision – a huge DEFF issue
• Other model problems
– E.g., can create highly influential observations
• Restricts software and analysis choices
When, why of weights?
![Page 18: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/18.jpg)
Repeat after me:
“You knew clustering affected variance estimates and had to be taken into account…
Sometimes WEIGHTS have an even bigger bad effect on precision!”
Always use software and procedures specific to complex survey data, even when weighting is your only complexity.
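A quick way to see how much precision unequal weights can cost is Kish's approximate design effect due to weighting, DEFF_w = n × Σw² / (Σw)². The sketch below is only a rough diagnostic and not a substitute for survey software; the two sets of weights are hypothetical.

```python
def kish_weight_deff(weights):
    """Kish's approximate design effect from unequal weights:
    n * sum(w^2) / (sum(w))^2. Equals 1.0 when all weights are equal."""
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return n * sum_sq / (total * total)

# Two hypothetical samples of 8 people representing the same population of 80.
equal_weights = [10.0] * 8
unequal_weights = [1.0] * 4 + [19.0] * 4

deff_equal = kish_weight_deff(equal_weights)
deff_unequal = kish_weight_deff(unequal_weights)
print(f"equal weights:   DEFF = {deff_equal:.2f}")
print(f"unequal weights: DEFF = {deff_unequal:.2f}, "
      f"effective n = {len(unequal_weights) / deff_unequal:.1f}")
```

In this made-up case the highly variable weights leave 8 respondents worth fewer than 5 independent observations, with no clustering involved at all.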
![Page 19: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/19.jpg)
But wait a minute, I’ve been told unweighted is sometimes better
![Page 20: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/20.jpg)
Scenario A
[Figure labels: People up-weighted; People down-weighted]
Weighted or unweighted is same slope!
![Page 21: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/21.jpg)
Scenario B
Something correlated with relative weights is associated with a different slope
[Figure: Readiness to quit versus Exposure to materials; groups labelled “Low educ.” and “Over educated”; line labelled “Weighted”]
![Page 22: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/22.jpg)
Scenario C
[Figure: Annoyance ratings (%) versus Distance from airport (km); lines labelled “Weighted slope” and “Unweighted slope”]
![Page 23: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/23.jpg)
Scenario C
[Figure: Annoyance ratings (%) versus Distance from airport (km); line labelled “Weighted or unweighted curve”]
![Page 24: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/24.jpg)
Modeller’s adage
• If weighted and unweighted differ, then both are wrong
• There must be a complex relationship, or better model, to find and describe
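The scenarios can be mimicked with a small weighted least-squares sketch. All numbers are made up: in the first dataset everyone follows the same line, so the weights cannot change the slope; in the second, the up-weighted group has a steeper slope, so weighted and unweighted answers diverge. The helper `weighted_slope` is a hypothetical name, not a function from any survey package.

```python
def weighted_slope(x, y, w=None):
    """Slope of a (weighted) least-squares line of y on x."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

x = [0, 1, 2, 3, 0, 1, 2, 3]
weights = [5, 5, 5, 5, 1, 1, 1, 1]   # first four people up-weighted

# Same relationship for everyone (y = 2x + 1).
y_same = [2 * xi + 1 for xi in x]
slope_a_unweighted = weighted_slope(x, y_same)
slope_a_weighted = weighted_slope(x, y_same, weights)

# Up-weighted group has slope 3, down-weighted group has slope 1.
y_diff = [3 * xi for xi in x[:4]] + [1 * xi for xi in x[4:]]
slope_b_unweighted = weighted_slope(x, y_diff)
slope_b_weighted = weighted_slope(x, y_diff, weights)

print("same slope everywhere:", slope_a_unweighted, slope_a_weighted)
print("slope differs by group:", slope_b_unweighted, round(slope_b_weighted, 3))
```

When the two answers differ, as in the second dataset, that is the signal to look for the variable that is related to both the weights and the slope.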
![Page 25: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/25.jpg)
Public health epidemiologists are always DESCRIBERS
![Page 26: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/26.jpg)
Scenario B
Something correlated with relative weights is associated with a different slope
[Figure: Readiness to quit versus Exposure to materials; groups labelled “Low educ.” and “Over educated”]
Pop’n weighted is TRUE population estimate of ‘net’ or ‘average’ effect
![Page 27: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/27.jpg)
Model all possible interactions with age, sex and geography strata?
Yes,
– Do look for effect modification where there are good grounds (show net and specific data)
No,
– In hundreds of age*sex*region strata, some random variation by chance
– In large samples lots of meaningless interactions can be detected
– Pop average effect is still pop average effect
![Page 28: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/28.jpg)
Message so far…
Can never ignore:
– Cluster sampling
– Weighting
So, HOW to analyze data?
![Page 29: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/29.jpg)
The 2 methods most commonly used for complex survey variance estimation
“Taylor-Series”, aka “Linearized”, variance estimation
“Bootstrap”
Usually achieved using bootstrap replicate resampling weights
![Page 30: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/30.jpg)
Taylor Series
Complex linear equations to estimate corrected variance for every estimate
• Requires assumptions about data!
– Normal distribution assumptions
– Large sample sizes
• Very difficult for user to know:
– When limits are being pushed
– When procedure is accepted or controversial
• Requires full design information
• Even more ‘approximate’ with more complex designs
![Page 31: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/31.jpg)
Using “Taylor-series” type software
1) Use syntax (or even boxes) to declare the following:
• Weight variable
• Stratification variable
• Group unit for cluster sampling
– Primary sampling unit or PSU
• (Ignore requests for finite population info)
2) Run your analysis as available in software
• Using only ‘special’ commands for complex samples
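To make the three declared pieces concrete (weight, stratum, PSU), here is a minimal sketch of a linearized variance for a weighted mean. It is not the code of any of the packages discussed; it assumes PSUs are treated as sampled with replacement within strata, with no finite population correction, and the records and the name `linearized_mean` are hypothetical.

```python
import math
from collections import defaultdict

def linearized_mean(records):
    """Weighted mean and Taylor-linearized standard error.

    records: iterable of (stratum, psu, weight, y) tuples.
    Assumes with-replacement sampling of PSUs within strata
    (no finite population correction).
    """
    records = list(records)
    total_weight = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_weight

    # Linearized contribution of each observation, summed within its PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_weight

    # Between-PSU variability within each stratum, added across strata.
    variance = 0.0
    for totals in psu_totals.values():
        z = list(totals.values())
        n_psu = len(z)
        if n_psu < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(z) / n_psu
        variance += n_psu / (n_psu - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return mean, math.sqrt(variance)

# Hypothetical data: one stratum, three classes (PSUs) of two students each.
clustered = [
    ("A", "class1", 1.0, 1.0), ("A", "class1", 1.0, 1.0),
    ("A", "class2", 1.0, 5.0), ("A", "class2", 1.0, 5.0),
    ("A", "class3", 1.0, 3.0), ("A", "class3", 1.0, 3.0),
]
# Same observations, but wrongly treating every student as their own PSU.
ignoring_clusters = [(s, i, w, y) for i, (s, _, w, y) in enumerate(clustered)]

mean_c, se_clustered = linearized_mean(clustered)
mean_n, se_naive = linearized_mean(ignoring_clusters)
print(f"mean = {mean_c:.1f}")
print(f"SE declaring PSUs = {se_clustered:.3f}, SE ignoring PSUs = {se_naive:.3f}")
```

The point estimate is identical either way; only the standard error changes once the PSU is declared, which is the pattern the survey commands in real software follow.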
![Page 32: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/32.jpg)
Survey estimates
• Prevalence = 13.0 (95% CI = 10.0-16.0)
• Odds ratio = 2.1 (95% CI = 1.6-4.0)
Usual weighted point estimate
Variance calculated from a formula; substituted in things like CIs
![Page 33: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/33.jpg)
Bootstrap variance weights
• Sampling variability “observed”, not calculated from a fixed formula
– Felt to reflect “true” sampling variability,
– As due to chance alone if the survey were really repeated an infinite number of times
• Virtually free of assumptions
– Tends to be more appropriate and conservative when assumptions for linearization fail
• Very broadly applicable
![Page 34: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/34.jpg)
Creation of BRR weights
• Someone takes a lot of random COMPLEX sub-samples of the full survey dataset (~500 times)
• The full algorithm for pop’n weighting is applied to each sub-sample
– When obs not in sample, weight=zero
– Rest re-weighted to reflect pop’n again
• RESULT
– 500 weights,
– When applied to full dataset, simulates taking 500 samples again
![Page 35: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/35.jpg)
Bootstrapping (with weights)
• Point estimates taken from full sample
– Mean = 13.0
• Same point estimate taken from 500 B.S. samples
• Observed variability in 500 B.S. estimates becomes variance for mean of 13.0.
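The three steps above can be sketched in a few lines. This toy example uses 4 made-up replicate weight sets instead of 500, and it averages squared deviations around the full-sample estimate, which is one common convention; the documentation for a given survey's bootstrap weights should be checked for the exact formula. All data and function names are hypothetical.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def bootstrap_variance(values, full_weights, replicate_weights,
                       estimator=weighted_mean):
    """Full-sample estimate plus the observed variance of the same
    estimate recomputed with each set of replicate weights."""
    theta = estimator(values, full_weights)
    replicates = [estimator(values, rw) for rw in replicate_weights]
    variance = sum((r - theta) ** 2 for r in replicates) / len(replicates)
    return theta, variance

# Hypothetical data: 4 respondents, full-sample weights, 4 replicate weight sets.
values = [10.0, 12.0, 14.0, 16.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 1.0, 1.0],   # a weight of zero means "not in this sub-sample"
    [0.0, 2.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
]

theta, variance = bootstrap_variance(values, full_weights, replicate_weights)
se = math.sqrt(variance)
print(f"mean = {theta:.1f}, bootstrap SE = {se:.2f}, "
      f"95% CI = ({theta - 1.96 * se:.2f}, {theta + 1.96 * se:.2f})")
```

Because the estimator is passed in as a function, the same loop works for a prevalence, a regression coefficient or an odds ratio, which is what makes replicate weights so broadly applicable.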
![Page 36: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/36.jpg)
Survey estimates
• Prevalence = 13.0 (95% CI = 10.0-16.0)
• Odds ratio = 2.1 (95% CI = 1.6-4.0)
Usual weighted point estimate
Variance reflects OBSERVED variance in 500 estimates of prev. and OR.
![Page 37: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/37.jpg)
Software options (more?)

| Software | Variance estimation | Notes |
|---|---|---|
| Epi Info | Linearized estimation only | Limited analysis options |
| SPSS | Linearized estimation | Several analyses available |
| Stata | Linearized or BS weights | Good range of ‘canned’ complex analyses |
| SAS | Linearized | Means, prop., linear and logistic (more in v10) |
| Wesvar | Linearized or BS weights | |
| Statistics Canada Bootvar | BS weights | Bonus output: CV and suppression rules. Somewhat limited analysis options |
![Page 38: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/38.jpg)
Beware
• Stick to procedures custom-designed for complex survey samples
– Will handle weights properly
– Will give useful statistics, such as DEFF
• Bootstrapping without a set of BS weights
– If you aren’t screaming in pain, you haven’t got it right
![Page 39: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/39.jpg)
Tips for working in partnership
1. Get a geek to generate lots of useful sets of BS Weights for your survey
• e.g., your favourite standard pop’n
• Does take expertise, but done once benefits many, many users
2. Get a nerd to do only your variance corrections for you
• Use your favourite software and keep very detailed programs (recodes, restrictions, etc.)
• Have them repeat very defined results tables
![Page 40: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/40.jpg)
Table 4. Estimated precision of estimates resulting from an overall sample of 1000 residents from each of two strata of one Health Unit. Fictional smoking survey.

Percent daily smokers: percent (95% CI), whole sample. Number of cigs/day: mean (95% CI), daily smokers only. Each cell shows the sample size, then the estimate.

| Area | % daily smokers, all ages | % daily smokers, 15-24 | % daily smokers, 25+ | Cigs/day, all ages | Cigs/day, 15-24 | Cigs/day, 25+ |
|---|---|---|---|---|---|---|
| Health Unit | 2000*: 20% ±1.7 | 400*: 20 ±3.8 | 1600*: 20 ±1.9 | 380*: 17 ±0.9 | 76*: 17 ±2.1 | 304*: 17 ±1.0 |
| Rural sector | 1000: 20 ±2.4 | 200: 20 ±5.3 | 800: 20 ±2.7 | 190: 17 ±1.3 | 38: 17 ±3.0 | 152: 17 ±1.5 |
| Urban sector | 1000: 20 ±2.4 | 200: 20 ±5.3 | 800: 20 ±2.7 | 190: 17 ±1.3 | 38: 17 ±3.0 | 152: 17 ±1.5 |
Embargoed
Not for release: Preliminary analyses pending adjustment of variance estimates to account for complex survey design
![Page 41: Complex Survey Samples Explaining the Miracle: Statistics and Analysis in Public Health APHEO Conference 2007, October 14-16, 2007 Susan Bondy, Department](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649e315503460f94b21f03/html5/thumbnails/41.jpg)
Q & A