how to read a paper

How to read a paper?

The actual title is…

How to read, analyze, and criticize a study.

Some Basic Concepts

Null hypothesis

The null hypothesis (H0) represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved.

For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug.

We would write H0: there is no difference between the two drugs on average.

The alternative hypothesis (H1) is a statement of what a statistical hypothesis test is set up to establish.

The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "Reject H0 in favor of H1" or "Do not reject H0"; we never conclude "Reject H1", or even "Accept H1".

If we conclude "Do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of H1.

Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.

Statistical Significance

Statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance.

In statistical testing, a result is deemed statistically significant if it is unlikely to have occurred by chance, and hence provides enough evidence to reject the null hypothesis.

As used in statistics, significant does not mean important or meaningful, as it does in everyday speech.

The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value.

The p-value is the probability of observing data at least as extreme as that observed, given that the null hypothesis is true. If the obtained p-value is small then it can be said that either the null hypothesis is false or an unusual event has occurred.

Statistical power:

The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is false.

Power analysis can be used to calculate the minimum sample size required so that one can detect an effect of a given size.

Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size.
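As a rough illustration of both uses of power analysis, the sketch below applies the standard normal approximation for a two-sided comparison of two group means. The function names, the standardized effect size (difference divided by standard deviation), and the defaults of α = 0.05 and 80% power are assumptions made for this example; they do not come from any particular study.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a standardized effect.

    Normal approximation for a two-sided, two-sample comparison of means;
    an exact t-test calculation gives a slightly larger number.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def detectable_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest standardized effect detectable with n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n_per_group)


if __name__ == "__main__":
    print(sample_size_per_group(0.5))   # medium effect
    print(sample_size_per_group(0.2))   # small effect needs far more subjects
    print(round(detectable_effect_size(63), 3))
```

Halving the effect size roughly quadruples the required sample, which is why small studies can only detect large effects.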

p-value

The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, if the null hypothesis were true.

One often "rejects the null hypothesis" when the p-value is less than the significance level α (Greek alpha), which is often 0.05 or 0.01.

When the null hypothesis is rejected, the result is said to be statistically significant.

The p-value is not the probability that the null hypothesis is true.

The p-value is not the probability that a finding is "merely by chance".

The p-value is not the probability of falsely rejecting the null hypothesis.

The p-value is not the probability that a replicating experiment would not yield the same conclusion.

1 − (p-value) is not the probability of the alternative hypothesis being true.
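To make the definition concrete, here is a minimal sketch that computes a two-sided p-value for the difference between two event rates with a pooled two-proportion z-test. The function name and the counts (40 of 100 versus 30 of 100) are hypothetical, chosen only for illustration, and the normal approximation assumes reasonably large groups.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for H0: both groups share the same event rate.

    Pooled z-test (normal approximation). Not suitable for very small
    groups, or when the pooled rate is exactly 0 or 1.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a statistic at least this extreme if H0 were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Same 10-point difference, two different sample sizes.
    print(two_proportion_p_value(40, 100, 30, 100))
    print(two_proportion_p_value(400, 1000, 300, 1000))
```

The same observed difference is not significant at α = 0.05 with 100 subjects per group but is highly significant with 1000 per group, so the p-value reflects sample size as much as effect size.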

Odds Ratio

The odds ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group.

An odds ratio of 1 indicates that the condition or event under study is equally likely to occur in both groups.

An odds ratio greater than 1 indicates that the condition or event is more likely to occur in the first group.

And an odds ratio less than 1 indicates that the condition or event is less likely to occur in the first group.
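The three cases above can be checked with a small calculation. This is a minimal sketch; the function name and the counts are hypothetical and only illustrate the arithmetic of odds (events divided by non-events) and their ratio.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    odds_1 = events_1 / non_events_1
    odds_2 = events_2 / non_events_2
    return odds_1 / odds_2


if __name__ == "__main__":
    # Group 1: 20 events, 80 without. Group 2: 10 events, 90 without.
    print(odds_ratio(20, 80, 10, 90))   # greater than 1
    print(odds_ratio(10, 90, 10, 90))   # exactly 1: same odds in both groups
    print(odds_ratio(10, 90, 20, 80))   # less than 1
```

Note that an odds ratio of 2.25 here corresponds to event rates of 20% versus 10%, a risk ratio of 2; odds ratios and risk ratios only agree closely when the event is rare.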

Confidence Interval

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter.

The estimated range is calculated from a given set of sample data.

Confidence intervals are usually calculated at a confidence level of 95%, but we can also produce 90%, 99%, or 99.9% intervals.

The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter.

Example: the difference between the two groups in FVL is 7.7%, with a range of 2 – 13.4%.

How confident are we that the true difference lies within this range? 95%
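The slide does not give the group sizes behind the FVL example, so the sketch below uses made-up counts to show how an interval of this kind is typically obtained for a difference between two proportions. It uses the simple Wald (normal approximation) interval; the function name, the counts, and the choice of method are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Difference in event rates (a minus b) with a Wald confidence interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical counts: 40/100 events versus 30/100 events.
    for level in (0.90, 0.95, 0.99):
        diff, low, high = diff_proportion_ci(40, 100, 30, 100, level)
        print(f"{level:.0%} CI: {diff:.3f} ({low:.3f} to {high:.3f})")
```

A higher confidence level gives a wider interval, and with these counts the 95% interval includes zero, so the data are compatible with no difference at all.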

Step 1) What kind of study are we reading?

Meta-analysis

Contrasting and combining results from different studies, in the hope of identifying patterns among study results, sources of disagreement among those results, or other interesting relationships that may come to light in the context of multiple studies.

Randomized controlled trial (RCT)

A specific type of scientific experiment, and the preferred design for a clinical trial.

RCTs are often used to test the efficacy of various types of intervention within a patient population.

RCTs may also provide an opportunity to gather useful information about adverse effects, such as drug reactions.

Case-control study

Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition/disease (the 'cases') with patients who do not have the condition/disease but are otherwise similar (the 'controls').

Cohort study

A cohort is a group of people who share a common characteristic or experience within a defined period (e.g., are born, are exposed to a drug or vaccine or pollutant, or undergo a certain medical procedure).

Cohort studies can be retrospective or prospective.

Cross-sectional study

Observation of all of a population (or a representative sample), at one specific point in time.

They differ from case-control studies in that they aim to provide data on the entire population under study, whereas case-control studies typically include only individuals with a specific characteristic.

Cross-sectional studies are descriptive studies.

What’s the point?

Strength of evidence.

Level of strength of evidence, according to the National Guideline Clearinghouse.

Level Source

1++ High-quality meta-analyses, systematic reviews of randomised controlled trials with a very low risk of bias.

1+ RCTs with a low risk of bias.

1- RCTs with a high risk of bias.

2++ High-quality case–control or cohort studies, with a very low risk of bias.

2+ Well-conducted case–control or cohort studies with a low risk of bias.

2- Case–control or cohort studies with a high risk of bias.

3 Non-analytical studies (e.g., case reports, case series)

4 Expert opinion, formal consensus

Level of strength of evidence, according to the National Guideline Clearinghouse.

Level Source

1 Meta-analysis.

2 RCT.

3 Case–control or cohort studies.

4 Non-analytical studies (e.g., case reports, case series).

5 Expert opinion, formal consensus.

Step 2) Study design

Study aims, primary, and secondary outcome measures.

Study endpoints are formulated properly in the initial study protocol.

Scientific hypotheses under investigation should be pre-specified and explicitly mentioned.

Report blinding if possible (single, double, triple).

Use and report randomization.

Report initial equality of baseline characteristics and comparability of study groups.

Report any dropouts and losses to follow-up.

Step 3) Data: Types, Suitable Statistical tests.

Data: non-parametric (nominal, ordinal) or parametric (interval, ratio).

Variable: discrete or continuous.

Nominal data

Categories are given a particular name.

Male / Female

Yes / No

Order is not important.

Binomial, multinomial

Ordinal data

Similar to nominal data, but categories are ordered logically.

Equal to, less than, or greater than

Small, medium, large

Easy, intermediate, difficult

Interval data

Data have a meaningful order.

Equal intervals between measurements represent equal changes in the quantity of whatever is being measured.

Data have no natural zero.

An example is the Celsius scale of temperature. In the Celsius scale, there is no natural zero, so we cannot say that 70°C is twice as hot as 35°C.

Ratio data

Ratio data has all the qualities of interval data (natural order, equal intervals) plus a natural zero point.

Examples: height, weight, length.

It can be said meaningfully that a length of 10 m is double a length of 5 m.

Any variable measured, in any type of data, can be either:

Discrete, if only a countable set of values is possible (whole numbers).

Continuous, which makes up the rest of numerical data (can have fractions).

Make sure that suitable statistical tests were used for the collected data.

Step 4) Statistics

The only science that enables different experts using the same figures to draw different conclusions.

Evan Esar (1899–1995) American writer.

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli

British Prime Minister (1804 – 1881)

If your experiment needs statistics, you ought to have done a better experiment.

Ernest Rutherford, (1871 – 1937) was a British chemist and physicist who became known as the father of nuclear physics.

I can prove anything by statistics except the truth.

George Canning (1770 – 1827) British statesman and politician.

Pitfalls of statistics…

1- Survey

A survey at KSMC reports that 1 out of 10 employees is a smoker…

How many were surveyed?

Did they all respond?

Male to female ratio.

Occupation, age group, education.

What were the questions?

Did they understand the questions?

Who questioned them?

Bias

2- Risk Reduction

[Figure: bar chart of event rate (%), Y axis from 0 to 45, with bars at 40% and 30%.]

Event rate: 40% versus 30%

Absolute Reduction = 10%

Relative Reduction = 25%
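The two figures for the 40% versus 30% event rates come from the same pair of numbers, as the short sketch below shows. The function name is hypothetical, and the second pair of rates (4% versus 3%) is an invented comparison added only to show why the relative figure alone can mislead.

```python
def risk_reduction(control_rate, treated_rate):
    """Return (absolute reduction, relative reduction) between two event rates."""
    absolute = control_rate - treated_rate
    relative = absolute / control_rate
    return absolute, relative


if __name__ == "__main__":
    # Event rates from the slide: 40% versus 30%.
    absolute, relative = risk_reduction(0.40, 0.30)
    print(f"absolute {absolute:.0%}, relative {relative:.0%}")

    # Invented rarer event: 4% versus 3% has the same relative reduction.
    absolute, relative = risk_reduction(0.04, 0.03)
    print(f"absolute {absolute:.0%}, relative {relative:.0%}")
```

Both comparisons can be advertised as a "25% reduction", yet the absolute benefit is ten times smaller in the second case.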

3- Average

SAP of fifteen subjects (mmHg): 90, 98, 110, 111, 113, 113, 121, 125, 127, 128, 130, 132, 136, 140, 165

The average is?

113

121.8

122.6

125

ALL ARE CORRECT!

Average

Mean: summation of all results, divided by their number.

Median: the middle value of arranged data.

Mode: most frequently recurring value.

Truncated mean: the same as the mean, after omitting the two extremes.
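The four "averages" for the fifteen SAP readings can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `truncated_mean` is a helper written for this example, and it drops exactly one lowest and one highest value, matching the definition above.

```python
from statistics import mean, median, mode

# SAP readings (mmHg) of the fifteen subjects from the slide.
sap = [90, 98, 110, 111, 113, 113, 121, 125, 127, 128, 130, 132, 136, 140, 165]


def truncated_mean(values):
    """Mean after omitting the single lowest and single highest value."""
    trimmed = sorted(values)[1:-1]
    return mean(trimmed)


averages = {
    "mode": mode(sap),
    "truncated mean": round(truncated_mean(sap), 1),
    "mean": round(mean(sap), 1),
    "median": median(sap),
}

if __name__ == "__main__":
    for name, value in averages.items():
        print(f"{name}: {value}")
```

A paper that reports "the average" without saying which one leaves the reader unable to tell which of these four numbers is meant.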

4- Multiple testing

In 1980, researchers at Duke randomized 1073 heart disease patients into two groups, but treated the groups equally.

Not surprisingly, there was no difference in survival.

Then they divided the patients into 18 subgroups based on prognostic factors.

In a subgroup of 397 patients (with three-vessel disease and an abnormal left ventricular contraction), survival of those in “group 1” was significantly different from survival of those in “group 2” (p<.025).

How could this be since there was no treatment?

(Lee et al. “Clinical judgment and statistics: lessons from a simulated randomized trial in coronary artery disease,” Circulation, 61: 508-515, 1980.)

The difference resulted from the combined effect of small imbalances in the subgroups.

A significance level of 0.05 means that your false positive rate for one test is 5%.

If you run more than one test, your false positive rate will be higher than 5%.
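How much higher can be estimated under a simplifying assumption. The sketch below computes the chance of at least one false positive across several tests, assuming the tests are independent and every null hypothesis is true. Subgroup analyses like the 18 in the Duke example are usually not independent, so the result is a rough guide rather than an exact figure for that study; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for n_tests in (1, 5, 18):
        rate = familywise_error_rate(n_tests)
        print(f"{n_tests} tests: {rate:.1%}")
```

Under these assumptions, 18 tests at the 0.05 level give roughly a 60% chance of at least one spurious "significant" finding.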

5- Reporting measurements with unnecessary precision

Many numbers do not need to be reported with full precision.

If a patient weighs 60 kg, reporting the weight as 60.18 kg adds only confusion, even if the measurement was that precise.

Thus, rounding numbers to two significant digits improves communication.

6- Reporting only p-values for results

For main results, report the absolute difference between groups (relative or percent differences can be misleading).

Report the 95% confidence interval for the difference, instead of, or in addition to, p-values.

“The effect of the drug on lowering diastolic blood pressure was statistically significant (P< 0.05)”

Here, the size of the drop is not given, so its clinical importance is not known.

Also, P could be 0.049; statistically significant (at the 0.05 level) but so close to 0.05 that it should probably be interpreted similarly to a P value of 0.051, which is not statistically significant.

The use of an arbitrary cut point, such as 0.05, to distinguish between “significant” and “nonsignificant” results is one of the problems of interpreting P values.

Better: “The drug lowered diastolic blood pressure by a mean of 18 mmHg, from 110 to 92 mmHg (95% CI = 2 to 34 mmHg; P = 0.02).”

7- Unnecessarily reporting baseline statistical comparisons in randomized trials

In a true randomized trial, each patient has an equal probability of being assigned to either the treatment or the control group.

Thus, any differences between groups at baseline are, by definition, the result of chance.

Therefore, significant differences in baseline data do not indicate bias (as they might in other research designs).

8- Not defining “normal” or “abnormal” when reporting diagnostic test results

The importance of either a positive or a negative diagnostic test result depends on how “normal” and “abnormal” are defined.

In fact, “normal” has at least six definitions in medicine.

9- Confusing the “units of observation” when reporting and interpreting results

The “unit of observation” is what is actually being studied.

Problems occur when the unit is something other than the patient.

For example, in a study of 50 eyes, how many patients are involved? What does a 50% success rate mean?

10- Percentage Vs Absolute

Sometimes statistics are given in absolute terms in large number studies.

They are given in percentages in small number studies.

Statistics are better reported both ways.

We might hear that Blanko Corp. laid off 32 people or we might hear that they laid off 25% of their workforce.

Typically a news source will try to make the number sound as dramatic as it can, so if Blanko is a huge company - say it has 200,000 employees - the source might find it more impressive to say it laid off 20,000 people rather than 10% of the workforce.

If Blanko is small, say 100 employees, it sounds more impressive to say they laid off 10% rather than just 10 people.

Suppose improvement in a study was 10%.

If the study included 2000 patients, it is more impressive to report 200 improved cases rather than 10%.

But if the study included only 50 patients, it is more dramatic to report 10 % improvement, rather than just 5 patients.

Step 5) Graphs

[Figure: anatomy of a graph, showing the title, categories A to E, an axis scaled from 10 to 70, and the source of data.]

Can graphs be manipulated to mislead you?

Title:

“Great reduction of mortality rates with drug (X), compared to (Z)”

versus

“Comparison of mortality rates between drugs (X) and (Z)”

Y axis:

[Figures: two versions of the Y axis.]

In a study on 3 antibiotics (A, B, C), the rate of infection with a certain organism was:

A: 972 cases

B: 984 cases

C: 955 cases

[Figure: “Misleading graph of NO differenc…”, for A, B, and C.]

[Figure: “Misleading graph of: Difference”.]

What difference do you see between the two graphs?

[Figure: with line plot.]

[Figure: 3D graphs of mortality rates.]

[Figure: data on X axis.]

This graph, comparing causes of death, is misleading because it leaves out deaths from heart disease, cancer, and stroke.

The omission tends to make smoking look much more harmful by comparison.

Step 6) Discussion and conclusion

Statistical significance.

Does not guarantee clinical significance.

Does not imply a cause-effect relationship.

Small differences between large groups can be statistically significant but clinically meaningless.

Failure to discuss sources of potential bias and confounding factors.

Statistical Insignificance.

Lack of statistical significance is not proof of the absence of an effect.

Results that are not statistically significant should not be interpreted as “evidence of no effect,” but as “no evidence of effect.”

In studies with low statistical power, results that are not statistically significant are not negative, they are inconclusive.

Large differences between small groups can be clinically important but not statistically significant.

Watch out for…

Drawing conclusions not supported by the study data

Significance claimed without data analysis or statistical test mentioned.

Missing discussion of the problem of multiple significance testing if done.

My message…

When reading a study, read it from A to Z.

Don’t just read the title and conclusion.

Read it more than one time, then sit back, and think about it.
