SADC Course in Statistics Preparing & presenting epidemiological information: I (Session 07)

Upload: paige-silva

Post on 28-Mar-2015

Learning Objectives

At the end of this session, you will be able to

• access and browse vast amounts of web-based epidemiological material

• explain, and at a basic level discuss, a number of broad terms used to describe the quality of epidemiological and other data

• recognise issues about inaccuracy in the ascertainment of binary data

Reference Material: 1

Interested in epidemiology and want to learn more? One free way is to type “epidemiology supercourse” into a web search engine. Several sites give access to:

“a global repository of lectures on public health and prevention targeting educators across the world. Supercourse has a network of over 42500 scientists in 174 countries who are sharing for free a library of over 3232 lectures in 26 languages. The concept of the Supercourse and its lecture style has been described as the Global Health Network University.”

Reference Material: 2

N.B. One of the sites is the South African Medical Research Council’s

N.B. Site includes 2 best-selling books, “Statistics at Square One” & “Epidemiology for the Uninitiated” ~ read or download the whole of these books ~ free of charge.

N.B. Some lectures are wordy and easy to follow, but others are more cryptic or more medically scientific.

Epidemiological data in general

Most data are reported as graphs, charts and tables of standard sorts. The subject matter makes these epidemiological, but the reporting methods are standard (see journal papers, epi. books, supercourse etc. for endless examples).

Quality and appropriateness often need to be checked – our concern here.

The critical approach

To give a good summary/presentation of some data, you need to ensure it is well-enough explained that you have answered reasonable questions about your data.

Some will be specific: about the study location, reasons why the epidemiological problem is of significance in that place etc.

Many will be generic as below: these are Q’s you should ask about other people’s study write-ups as well as answering in your own!

General concept of reliability

This is about individual measurements. There are several concepts/measurements. One, used where there is no objective measurement, is “inter-rater reliability”: this concerns measures of agreement between trained observers as to the scores they give to a set of subjects e.g. students’ essays on a set theme. Test/retest reliability concerns whether different versions of the same attainment test give the students essentially the same grades or ranks.
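As an illustration of a "measure of agreement" between two observers, the sketch below computes Cohen's kappa, which compares observed agreement with the agreement expected by chance. Kappa is one common choice and is not named in the session itself; the function name `cohen_kappa` and the essay grades are hypothetical, made up purely for the example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters gave the same score
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own score frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades given by two trained markers to the same ten essays
grades_a = ["A", "B", "B", "C", "A", "B", "C", "C", "A", "B"]
grades_b = ["A", "B", "C", "C", "A", "B", "C", "B", "A", "B"]
kappa = cohen_kappa(grades_a, grades_b)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates agreement well beyond chance; a value near 0 indicates agreement no better than chance.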

Repeatability/reproducibility: 1

Especially in industrial settings, the reliability idea is extended: repeatability concerns whether the same observer, using the same methods & instruments, would get essentially the same answer measuring the same thing. If not, the measurement itself is weak.

Reproducibility concerns whether different observers (e.g. different labs) using differing methods & instruments get very nearly the same answer. If yes, the measurement is quite robust.

Repeatability/reproducibility: 2

Note that in more complex statistical studies, variance of measurements is broken down into components of variation attributed to separable sources e.g. method, observer, instrument, laboratory variability OR in a survey, interviewer, community, ethnic group, & other effects.

This uses a form of the general technique called “analysis of variance” – see higher modules.
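A minimal sketch of the idea, assuming the simplest possible case: one source of variation (laboratory) and the same number of readings from each lab. It splits the variance into a between-lab and a within-lab component using the one-way analysis of variance mean squares. The function name `variance_components` and the readings are hypothetical; real studies with several sources need the fuller methods of the higher modules.

```python
from statistics import mean


def variance_components(groups):
    """Balanced one-way layout: return (between-group, within-group) variance estimates."""
    k = len(groups)       # number of groups, e.g. laboratories
    n = len(groups[0])    # readings per group
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups, each with the same number (>= 2) of readings")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    # Mean squares from the one-way analysis of variance table
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (k * (n - 1))
    # Method-of-moments estimate, truncated at zero if the estimate is negative
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


# Hypothetical readings of the same specimen: 3 labs, 4 repeat readings each
labs = [
    [10.1, 9.9, 10.0, 10.0],
    [10.6, 10.4, 10.5, 10.5],
    [9.6, 9.4, 9.5, 9.5],
]
between_lab, within_lab = variance_components(labs)
print(f"between-lab variance = {between_lab:.4f}, within-lab variance = {within_lab:.4f}")
```

In this made-up data the between-lab component dominates: each lab is repeatable on its own, but the labs do not reproduce one another's results.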

Use of reliability

For a given scenario, need a relevant plausible check on measurement reliability to be devised, explained and used. No single standard method exists.

Often check against a “gold standard” that is too expensive to use all the time e.g. self-administered depression questionnaire results are checked for a sample of patients vs. gold-standard diagnoses after full-scale examination by a trained psychiatrist.

General concept of validity

It is fairly obvious what constitutes an adequate measurement of “height” of a child standing up straight. Instruments and procedures will affect accuracy, but the concept is clear. A more “abstract” idea will be harder e.g. a set of questionnaire measures to assess “user satisfaction” with local peri-natal care provision. Do all concerned agree that the set of Qs covers all aspects of what may (dis-)satisfy a user ~ in a brief, but comprehensive & balanced way?

Use of validity

Much more a “social science” concept than reliability. The question, “Does the measurement system properly reflect what it should?” raises other Qs e.g. “... According to whom?” For example, a poor community’s ideas of wealth/poverty may not contain the same elements as an expert economist’s.

Often need a multi-disciplinary, consultative check for measures of relatively “big” abstract ideas.

Accuracy vs. precision

Repeated measures of weight on a scale may be very precise (and therefore repeatable) if scale gives almost same measurement each time an object is weighed, but can still be consistently wrong if not calibrated correctly.

If a measurement is precise and gives virtually the correct answer each time, it is accurate. Checking this requires reference to a gold standard to know what is correct.
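The distinction can be put into numbers with a small sketch: the spread of repeated readings reflects precision, while the average error against a gold standard reflects calibration. The function name `bias_and_spread`, the 50.0 kg reference weight and the readings are all hypothetical, chosen only to show a scale that is precise but not accurate.

```python
from statistics import mean, stdev


def bias_and_spread(readings, true_value):
    """Return (bias, spread): mean error against the gold standard, and the sample SD."""
    return mean(readings) - true_value, stdev(readings)


# Hypothetical repeat weighings (kg) of a 50.0 kg reference weight on a miscalibrated scale
readings = [51.9, 52.0, 52.1, 52.0, 52.0]
bias, spread = bias_and_spread(readings, true_value=50.0)
print(f"bias = {bias:+.2f} kg, spread (SD) = {spread:.2f} kg")
```

Here the spread is small (precise, repeatable) but the bias is about 2 kg, so the scale is consistently wrong until it is recalibrated.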

Critiquing a set of measurements

Human and veterinary epidemiologists seldom measure one thing on a subject.

The overall quality of the profile of measurements is often looked at to ensure readings are mutually consistent and fit with the selection criteria e.g. a parent’s perception of a child’s development and health is looked at along with height, weight and age.

Sampling

Often a main variable in an epidemiological study will be zero/one and sample size will need to be very large OR sometimes very detailed study has to be restricted to quite a small sample. Each poses questions about how sample was chosen.

Often not well described or justified e.g. small study claims to describe a whole country though in fact conducted in 1 or 2 tiny areas!

Large samples

Often a sample of hundreds or thousands is treated as if it were a simple random sample, though collected in a stratified and/or clustered fashion, maybe even a number of arbitrarily selected convenience samples. Ask what was the sample design, how was this taken into account in the analysis and write-up, what if anything that sample can claim to “represent”, ways in which it might be biased or untypical.

Errors in binary data

For any data, including binary, there are possible errors in sampling from the wrong frame, sampling biases and so on, but measurement error for observations of one binary variable reduces to false negatives and false positives. The rates at which these arise can only be assessed using "the usual" data collection procedures on samples whose Yes/No status is incontrovertibly known i.e. gold standard.

See the table below for concepts.

Example

Consider a population of 100,000 which in reality has 5% seropositive for HIV. All members are screened using a second-generation ELISA assay whose sensitivity is 98% [i.e. out of 100 positive individuals the test will on average detect 98] and whose specificity is 99% [i.e. out of 100 negative individuals the test will on average correctly classify 99 as negative].

THE INFECTION STATUS

| Test result | Infected | Not infected | Total |
| --- | --- | --- | --- |
| Positive | True positive (A) | False positive (B) | All test positives (A+B) |
| Negative | False negative (C) | True negative (D) | All test negatives (C+D) |
| Total | All truly infected (A+C) | All truly not infected (B+D) | Total population (A+B+C+D) |

Measures of data quality: 1

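A general outcome measure is prevalence:

$$\text{prevalence} = \frac{\text{all truly infected}}{\text{total population of interest}} = \frac{A + C}{A + B + C + D}$$

Genuine quality measures include sensitivity:

$$\text{sensitivity} = \frac{\text{true positives}}{\text{all truly infected}} = \frac{A}{A + C}$$

and specificity:

$$\text{specificity} = \frac{\text{true negatives}}{\text{all truly non-infected}} = \frac{D}{B + D}$$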

Measures of data quality: 2

Positive Predictive Value (PPV):

$$\text{PPV} = \frac{\text{true positives}}{\text{all test positives}} = \frac{A}{A + B}$$

Negative Predictive Value (NPV):

$$\text{NPV} = \frac{\text{true negatives}}{\text{all test negatives}} = \frac{D}{C + D}$$

See expected numbers for the example in the table below. No statistical variation in these!
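The five measures can be gathered into one small function of the four cell counts A, B, C and D. This is a sketch only: the function name `screening_measures` is hypothetical, and the counts passed in are the expected numbers implied by the session's example (100,000 people, 5% prevalence, 98% sensitivity, 99% specificity).

```python
def screening_measures(a, b, c, d):
    """Measures from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    total = a + b + c + d
    return {
        "prevalence": (a + c) / total,   # all truly infected / total population
        "sensitivity": a / (a + c),      # true positives / all truly infected
        "specificity": d / (b + d),      # true negatives / all truly non-infected
        "ppv": a / (a + b),              # true positives / all test positives
        "npv": d / (c + d),              # true negatives / all test negatives
    }


# Expected counts implied by the worked example
measures = screening_measures(a=4900, b=950, c=100, d=94050)
for name, value in measures.items():
    print(f"{name}: {value:.2%}")
```

Note that sensitivity and specificity divide by the column totals (true status), whereas PPV and NPV divide by the row totals (test result).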

THE INFECTION STATUS

| ELISA result | Infected | Not infected | Total |
| --- | --- | --- | --- |
| Positive | True positive: 4,900 (A) | False positive: 95,000 - 94,050 = 950 (B) | All ELISA positive: 5,850 (A+B) |
| Negative | False negative: 100 (C) | True negative: 94,050 (D) | All ELISA negative: 94,150 (C+D) |
| Total | All truly infected: 5,000 (A+C) | All truly not infected: 100,000 - 5,000 = 95,000 (B+D) | Total population: 100,000 (A+B+C+D) |
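The expected numbers in this table follow directly from the assumed population size, prevalence, sensitivity and specificity. A minimal sketch of that arithmetic is given below; the function name `expected_counts` is hypothetical, and the results are expected values only, with no sampling variation.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected 2x2 cell counts (A, B, C, D) for a screened population."""
    infected = population * prevalence        # A + C
    not_infected = population - infected      # B + D
    a = infected * sensitivity                # true positives
    c = infected - a                          # false negatives
    d = not_infected * specificity            # true negatives
    b = not_infected - d                      # false positives
    return a, b, c, d


# Figures from the example: 100,000 people, 5% seropositive,
# sensitivity 98%, specificity 99%
a, b, c, d = expected_counts(100_000, 0.05, 0.98, 0.99)
print(f"A = {a:,.0f}, B = {b:,.0f}, C = {c:,.0f}, D = {d:,.0f}")
print(f"all test positives = {a + b:,.0f}, all test negatives = {c + d:,.0f}")
```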

Example results

Prevalence = 5000/100,000 = 5%

Sensitivity = 4900/5000 = 98%

Specificity = 94050/95000 = 99%

~ all as assumed & built into the table.

PPV = A/(A+B) = 4900/5850 = 83.8%*

NPV = D/(C+D) = 94,050/94150 = 99.89%

*so about 1/6th of all positives are false positives because of large proportion of uninfected in the overall population.
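To see how strongly the PPV depends on the proportion of uninfected people, the sketch below recomputes it for the same test at several prevalences. Only the 5% figure comes from the example; the other prevalences, and the function name `ppv`, are hypothetical and included purely for comparison.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value as a function of prevalence."""
    true_pos = prevalence * sensitivity                # proportion who are true positives
    false_pos = (1 - prevalence) * (1 - specificity)   # proportion who are false positives
    return true_pos / (true_pos + false_pos)


# Same test (sensitivity 98%, specificity 99%) at different prevalences;
# only 5% is taken from the example, the rest are illustrative
for prevalence in (0.001, 0.01, 0.05, 0.20):
    print(f"prevalence {prevalence:.1%}: PPV = {ppv(prevalence, 0.98, 0.99):.1%}")
```

The rarer the infection, the larger the share of test positives that are false, even though the test itself is unchanged.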

Example conclusions

These results come from thinking about what might be expected given certain supposed rates in the population. The true rates are generally not knowable in reality, but the results show the need for caution in interpreting real binary data!

If these were the real figures we would expect, on further examination, to find that about 1/6 of the positives from initial testing were NOT real cases. Concepts of error/accuracy are as important here as for measurement data!

Practical work follows to ensure learning objectives are achieved…