
Mark E. Nunnally, MD, FCCM

Co-Director, Critical Care Fellowship and Associate Professor in the Department of Anesthesia and Critical Care
University of Chicago Medical Center
Chicago, Illinois

GRADE Methodology Expert
Contributing Author, “Surviving Sepsis Campaign: International Guidelines for Management of Sepsis and Septic Shock: 2012”

Making GRADE work: a how-to for guidelines authors

Mark E. Nunnally, MD, FCCM
Associate Professor
Department of Anesthesia & Critical Care
The University of Chicago

Course objectives I

• Translate evidence into graded recommendations.

• Identify the features that reduce or increase the quality of evidence.

Course objectives II

• Appraise clinical data to determine quality of evidence.

• Integrate quality of evidence for an intervention with costs, the balance between desirable and undesirable effects and values to determine the strength of a recommendation.

Contents

• GRADE- why?
• Transparency and Certainty
• The Guidelines process: a methodologist’s perspective
• GRADE- components
• Summary

Conflict of interest.

I am a GRADE advisor for the Surviving Sepsis Campaign

Conflict of interest.

I am also only a consultant. YOU are the experts.

WHY GRADE?

Many guidelines, little standardization
Some inform…
Some restrict…
All claim to be evidence-based…

…how can we be certain a guideline is supported by the evidence?

…how can we be certain its recommendations will hold over time?

…how relevant is the recommendation to the things that matter to me?

Should we rate evidence?

• ‘Quality’ is a diluted term
• Quality is a continuum
• Decisions are always somewhat arbitrary
• ‘Experts’ and clinicians don’t always share the same view
  – This is one reason evidence and recommendations should be separate.

Should we rate evidence?

• You need some reference
• Simplicity
• Transparency
• Vividness

Grading of Recommendations Assessment, Development and Evaluation

• http://www.gradeworkinggroup.org/
• International consensus document
• Template for systematic reviews, recommendations

TRANSPARENCY AND CERTAINTY

QOE- Philosophical Bent

• We are going to make recommendations that we (or others) will subsequently change.
  – GRADE lets us:
    • try to define how likely that is
    • communicate our certainty in any effect
    • translate findings to clinical realities, by accounting for the costs, tradeoffs and effort behind following a recommendation

Example- Glycemic Control

• 2001: Van den Berghe publishes sentinel article: NEJM 2001, 345

• 2003-2008: Guidelines, protocols, quality metrics proposed

• 2009: NICE-SUGAR
• 2009-present: Re-write or retire

Be Explicit

• What are the data?
• What are their limitations?
• How easy is it to do something?
• How confident are you in recommending?

The guidelines process: a methodologist’s perspective

Getting from evidence to guidelines

Evidence Hierarchy
• Experience
• Reports
• Observational Studies
• RCTs
• Meta-analyses

Guidelines Hierarchy
• Clinical biases
• Experience-based tendencies
• Cost analyses
• Decision analyses
• Formal Guidelines

Not all guidelines are created equal

[Diagram: the GRADE process, from question to recommendation]

1. Formulate question (PICO).
2. Select outcomes (Outcome 1 to Outcome 4) and rate the importance of each: Critical, Important, Critical, Not important.
3. Systematic review (outcomes across studies): Evidence Profile (GRADEpro) with a pooled estimate of effect for each outcome.
4. Rate the quality of evidence for each outcome: High | Moderate | Low | Very low
  – Start: RCT high; observational low
  – Rate down: 1. risk of bias 2. inconsistency 3. indirectness 4. imprecision 5. publication bias
  – Rate up: 1. large effect 2. dose-response 3. antagonistic bias
5. Rate overall quality of evidence across outcomes.
6. Formulate recommendations: for or against an action; strong or weak (strength).
  – Strong or weak: quality of evidence; balance benefits/downsides; values and preferences; resource use (cost)
  – Wording: “We recommend…” | “Clinicians should…” (strong); “We suggest…” | “Clinicians might…” (weak)
  – Unambiguous; clear implications for action; transparent (values & preferences statement)

Question: PICO
Evidence: Summarize
Judge: QOE, SOR

THE QUESTION

PICO

• Population
  – Ventilated patients, APACHE scores
• Intervention
  – Medicine, therapy, education, systems intervention
• Comparison
  – High (how high) versus low (how low) tidal volume
• Outcome
  – FBI: mortality (at what follow-up), LOS, VAP
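One way to keep a PICO question explicit is to write it down as a small record before searching for evidence. The sketch below is only an illustration: the class name `PicoQuestion`, its fields, and the sentence template are assumptions for this example, not part of GRADE or GRADEpro.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PicoQuestion:
    """Hypothetical container for the four PICO elements."""
    population: str
    intervention: str
    comparison: str
    outcomes: List[str]

    def as_question(self) -> str:
        # Assumed sentence template; any clear phrasing would do.
        return (
            f"In {self.population}, does {self.intervention} "
            f"compared with {self.comparison} affect "
            f"{', '.join(self.outcomes)}?"
        )


question = PicoQuestion(
    population="ventilated patients",
    intervention="low tidal volume",
    comparison="high tidal volume",
    outcomes=["mortality", "LOS", "VAP"],
)
print(question.as_question())
```

Writing the comparison and the outcomes out in full makes it harder to leave "how high" or "at what follow-up" unstated.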

Rating outcomes

• 7-9: critical [death, disability or both]
• 4-6: important [skin breakdown, sepsis]
• 1-3: limited [ileus, ICU stay]
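The 1-9 scale above maps directly onto three bands. A minimal sketch of that mapping follows; the function name `outcome_importance` and the choice to reject scores outside 1-9 are assumptions made for this example.

```python
def outcome_importance(score: int) -> str:
    """Map a 1-9 importance rating to the band used on the slide."""
    if not 1 <= score <= 9:
        # Assumption: scores outside the scale are treated as errors.
        raise ValueError("importance score must be between 1 and 9")
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important"
    return "limited"


# Hypothetical panel ratings for three outcomes.
ratings = {"death": 9, "skin breakdown": 5, "ileus": 2}
for outcome, score in ratings.items():
    print(outcome, "->", outcome_importance(score))
```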

THE EVIDENCE

Collect evidence

• Be thorough
  – Use explicit search strategies
  – Decide on published v unpublished data
• Consider gray literature in some cases
  – Proceedings papers
  – Abstracts
  – Clinicaltrials.gov
  – ALWAYS consider comparator

Assembling Evidence is Hard
Data have to be summarized to inform

GRADE pragmatic approach

• Get a good meta-analysis (MA)
• If no MA, identify main studies
• If possible, do your own MA
• If no MA, describe main studies/results
  – Be explicit (inclusion/exclusion, flaws)
• Keep the link between recommendation and evidence

Meta Analysis-the Good and the Bad

Don’t GRADE everything

• No plausible alternative
  – Surveying for infection, resuscitating shock, practicing quality improvement
• Recommend to consider
  – As opposed to not considering?
• Statements lacking specificity
  – Intervention, Comparison, relevant Outcomes (good and bad)

JUDGING

Judge Evidence and Recommendation

• Unique to GRADE
• Related, but distinct
• Recommendation must take clinical realities into account
  – Costs
  – Burdens
  – Benefits/risks
  – Values

Recommendations Have 2 Components:

• Strength
• Direction

GRADE COMPONENTS

Entering the GRADE meat-grinder

• RCT- High quality
• Observational study- Low quality
• Expert report- Very Low quality

Grade Down

Study Limitations/Risk of Bias

• Bias definition: 1. Unequal distribution of risk factors (confounders) across study groups. 2. Factors that systematically change study effects to result in a directional change in the signal.

Risk of Bias

• GRADE treats bias by individual outcomes
  – Pain scores- strong effect if unblinded
  – Mortality- effect of blinding less clear
  – Loss to follow-up for different outcome windows

• With multiple studies and different risks of bias, quality should be judged by the relative contribution of studies to the confidence in the effect.

Risk of Bias

• Blinding
  – Patient, clinician, data assessor
• Concealment of allocation
• Intention-to-treat principle
  – Absence negates the balance from randomization

Risk of Bias

• Stopping early for benefit, especially if trials have < 500 events
  – Bassler D, et al. JAMA 2010;303(12):1180-7
• Selective outcome reporting
  – Only positive outcomes, composite results only, or lack of pre-specified outcomes
• Loss to follow-up
  – Significance relates to # of events

Risk of bias- Observational Studies

• Prognosis can differ
• Groups can have multiple differences:
  – Time
  – Place
  – Population
  – Co-morbidity

This is why observational studies typically enter as “Low” quality of evidence

Grade Down

Inconsistency

• Definition: 1. Heterogeneity. 2. Lack of similarity of point estimates or confidence intervals. 3. Variable findings unexplained by a priori hypotheses. 4. Subgroup effects that cannot be sufficiently explained.

Inconsistency

• Generally, effects are looked at in relative terms, rather than absolute
  – Subgroups may have different baseline rates, but similar relative effects

Inconsistency

• Inconsistency can come from study diversity:
  – Populations
  – Interventions
  – Outcomes
  – Study methods

• Credible inconsistency may lead to split recommendations

Basic assessments of inconsistency

• Point estimates vary widely
• Little or no CI overlap
• Test of heterogeneity (χ²) shows a low p value (P ≤ 0.10 may be sufficient)
• I² is large
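The last two checks can be computed from per-study effect estimates and their standard errors. The sketch below uses the standard fixed-effect (inverse-variance) formulas for Cochran's Q and I² = (Q − df)/Q × 100%, truncated at zero. The function name and the example numbers are assumptions for illustration; a p value for Q would come from a chi-squared distribution with df degrees of freedom, which is left out here to avoid extra dependencies.

```python
def heterogeneity(effects, standard_errors):
    """Return (pooled effect, Cochran's Q, degrees of freedom, I-squared %).

    effects: per-study estimates on an additive scale (e.g. log relative risk).
    standard_errors: the matching standard errors.
    """
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of variability beyond what chance alone explains.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i_squared


# Hypothetical log relative risks from four studies.
pooled, q, df, i2 = heterogeneity([-0.30, -0.10, 0.25, -0.45],
                                  [0.12, 0.15, 0.20, 0.18])
print(f"pooled={pooled:.3f}  Q={q:.2f}  df={df}  I2={i2:.0f}%")
```

Even a large I² matters only if the variability would change a clinical decision, as the next slide notes.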

Context

• It is only significant inconsistency if the variability would influence a clinical decision
  – If point estimates and CIs favor treatment over costs/burdens/side effects, no need to downgrade

Inconsistency

• Example: low-dose steroids in sepsis
  – 6 studies, 3 high baseline mortality, 3 low, with difference in effect
    • Patel GP. Am J Respir Crit Care Med 2012;185:133-139

Grade Down

Indirectness

• Definition: 1. Evidence does not directly compare to the clinical question of interest. 2. Differing patients, interventions, comparisons or outcomes in available studies necessitate extrapolation of evidence to question being addressed.

Indirectness

• Examples:
  – Animal studies: downgrade 1 or 2 levels, in general, but consider the relevance of the data (toxicity v therapeutic benefit)
  – If drug A>B and B>C, is A>C?
  – Low-fat diet: US versus French population
    • Setting, co-“interventions,” genetics
  – Surrogate outcomes: blood pressure control versus cardiovascular events

Indirectness

• Example:
  – H2RA and PPI and C. difficile infection: observational study, not direct to critically ill patients, but with interesting effect: Very Low QOE
    • Leonard J et al. Am J Gastroenterol 2007;102:2047

Grade Down

Imprecision

• Definition: 1. High impact of random error on evidence quality. 2. Wide range of results to be expected from repetitive study. 3. Wide range in which the truth likely lies.

Imprecision

• Driven by # of events and by degree of effect
• 95% confidence intervals may encompass harm and benefit
  – Taken in the context of the recommendation
• More important: 95% CIs embrace absolute values that reduce our confidence in a recommendation

Use absolute effects

Toxicity

Imprecision

• Example:
  – NE v Vasopressin: mortality CI wide, spanned RR = 1
    • For ventricular arrhythmias, RR 0.47 (0.38, 0.58), but 21 events: FRAGILE
  – H2RA and pneumonia: unable to exclude harm
  – Negative factors may require tighter CIs:
    • Side effects/toxicity
    • Burdens/costs

Grade Down

Publication Bias

• Definition: 1. Studies with statistically significant results more likely to be counted than negative studies. 2. Smaller, high-effect studies disproportionately impact published literature. 3. Published commercially-funded studies are more likely to be positive.

Publication Bias

• Publication: positive (+) studies > negative (–) studies (RR 1.78)
  – Hopewell S, The Cochrane Database of Systematic Reviews, 2007.
• Negative (–) studies: delayed, obscure publication
• Positive (+) studies: duplicate publication
• Small studies, industry sponsor ⇒ ↑ publication bias

Publication Bias

• How to detect? It’s more difficult than one might think.
  – Look for:
    • Small trials
    • Conflicts in authors/study sponsors
    • Duplications
    • Abstracts, grey literature with negative findings
    • Unpublished data
  – Ideally, we would trend MAs over time

Publication Bias: Pooled Estimate

Publication Bias: Selective Publication; Greater Study Limitations; More Restrictive/Responsive Population

Publication Bias

Publication Bias-Testing

• Tests of asymmetry
• Imputing missing information
• Repeated MA over time
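The slide does not name a particular asymmetry test. As one illustration, the sketch below computes the intercept from Egger's regression, which regresses each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero suggests small-study asymmetry. The function name and example data are assumptions, and a complete test would also need the intercept's standard error and a p value, which this sketch omits.

```python
def egger_intercept(effects, standard_errors):
    """Intercept of Egger's regression: (effect / SE) regressed on (1 / SE)."""
    if len(effects) != len(standard_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    precision = [1.0 / se for se in standard_errors]
    standardized = [y / se for y, se in zip(effects, standard_errors)]
    n = len(precision)
    x_mean = sum(precision) / n
    y_mean = sum(standardized) / n
    sxx = sum((x - x_mean) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(precision, standardized))
    slope = sxy / sxx
    # Ordinary least squares intercept.
    return y_mean - slope * x_mean


# Hypothetical log relative risks: smaller studies (larger SE) show bigger effects.
effects = [-0.1, -0.3, -0.6, -0.9]
standard_errors = [0.1, 0.2, 0.3, 0.4]
print(f"Egger intercept: {egger_intercept(effects, standard_errors):.2f}")
```

With few studies such tests have little power, which fits the caution above that detection is more difficult than one might think.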

Publication Bias-Addressing the Problem

• Thorough research
  – Gray Literature
  – FDA submissions
  – Abstracts, proceedings
  – Author Contact
• Clinicaltrials.gov
  – N.B: only for RCTs, not observational studies

Grade Up

Moving Up- Examples

• Very strong, consistent association; no plausible confounders, up 2 grades
  – insulin in diabetic ketoacidosis
  – antibiotics in septic shock
• Strong, consistent association with no plausible confounders, up 1 grade
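The starting levels, the five reasons to grade down and the reasons to grade up can be put together as simple arithmetic on a four-step scale. The sketch below is only an illustration of that bookkeeping: the function name, the dictionary inputs, and the clamping at "High" and "Very low" are assumptions for this example, and the judgment of how many levels to move remains the panel's.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]

# Starting points from the slides: RCT high, observational low, expert report very low.
START = {"rct": 3, "observational": 1, "expert report": 0}


def rate_quality(design, downgrades=None, upgrades=None):
    """Return the quality of evidence for one outcome.

    downgrades / upgrades: dicts mapping a reason (e.g. "inconsistency",
    "large effect") to the number of levels moved (1 or 2).
    """
    level = START[design]
    level -= sum((downgrades or {}).values())
    level += sum((upgrades or {}).values())
    # Assumption: the result is clamped to the four-level scale.
    level = max(0, min(len(LEVELS) - 1, level))
    return LEVELS[level]


# RCT evidence with serious inconsistency.
print(rate_quality("rct", downgrades={"inconsistency": 1}))
# Observational evidence that is also indirect.
print(rate_quality("observational", downgrades={"indirectness": 1}))
# Observational evidence with a very strong, consistent association.
print(rate_quality("observational", upgrades={"large effect": 2}))
```

Keeping the reasons as named entries, not a single number, preserves the transparency GRADE asks for: a reader can see why the rating moved.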

How to get GRADEpro on your computer?

• Cochrane IMS website
• http://www.cc-ims.net/revman/gradepro/download
• Google ‘gradepro’



GRADE output: Summary of Findings

GRADE output: Evidence Profile

Question: Should longer term (7 day) low dose (up to 300 mg/day of hydrocortisone) glucocorticosteroids be used in severe sepsis and septic shock?
Settings: ICU
Bibliography: Annane 2009

Outcome: Mortality, 28 days
• Quality assessment
  – No of studies: 12
  – Design: randomised trials
  – Limitations: no serious limitations
  – Inconsistency: serious [1]
  – Indirectness: no serious indirectness
  – Imprecision: no serious imprecision
  – Other considerations: none
• Summary of findings
  – No of patients: glucocorticosteroids 236/629 (37.5%); control 264/599 (44.1%)
  – Relative effect (95% CI): RR 0.84 (0.72 to 0.97)
  – Absolute effect: 71 fewer per 1000 (from 13 fewer to 123 fewer)
• Quality: MODERATE
• Importance: CRITICAL [2]

Outcome: GI bleeding
• Quality assessment
  – No of studies: 3
  – Design: randomised trials
  – Limitations: (not shown)
  – Inconsistency: no serious inconsistency [3]
  – Indirectness: no serious indirectness
  – Imprecision: serious [4]
  – Other considerations: none
• Summary of findings
  – No of patients: glucocorticosteroids 65/827 (7.9%); control 56/767 (7.3%)
  – Relative effect (95% CI): RR 1.12 (0.81 to 1.53)
  – Absolute effect: 9 more per 1000 (from 14 fewer to 39 more)
• Quality: MODERATE
• Importance: IMPORTANT

Outcome: Superinfections
• Quality assessment
  – No of studies: 45
  – Design: randomised trials
  – Limitations: (not shown)
  – Inconsistency: no serious inconsistency [6]
  – Indirectness: no serious indirectness
  – Imprecision: no serious imprecision [7]
  – Other considerations: none
• Summary of findings
  – No of patients: glucocorticosteroids 184/983 (18.7%); control 170/934 (18.2%)
  – Relative effect (95% CI): RR 1.01 (0.82 to 1.25)
  – Absolute effect: 2 more per 1000 (from 33 fewer to 46 more)
• Quality: HIGH
• Importance: IMPORTANT

Footnotes
[1] Meta-regression examining the effect of severity of illness (baseline mortality) on efficacy suggested an effect: p value 0.04 using fixed effect and 0.06 using random effect model. JAMA 2009;302:1643-1645.
[2] Reported for all trials
[3] I² = 0
[4] RR up to 1.53
[5] need to check
[6] I² = 8%
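The absolute effects in the evidence profile follow from the control-group risk and the relative risk: absolute difference per 1000 = 1000 × baseline risk × (RR − 1). The sketch below reproduces the profile's figures from its own counts. The function name is an assumption, and taking the control event rate as the baseline risk is the assumption that makes the numbers match.

```python
def absolute_effect_per_1000(control_events, control_total, relative_risk):
    """Events per 1000 patients gained (+) or avoided (-) with treatment.

    Assumes the control-group event rate is the baseline risk.
    """
    baseline_risk = control_events / control_total
    return round(1000 * baseline_risk * (relative_risk - 1))


def describe(per_1000):
    direction = "fewer" if per_1000 < 0 else "more"
    return f"{abs(per_1000)} {direction} per 1000"


# Mortality, 28 days: control 264/599, RR 0.84 (0.72 to 0.97).
for rr in (0.84, 0.97, 0.72):
    print("mortality, RR", rr, "->", describe(absolute_effect_per_1000(264, 599, rr)))

# GI bleeding: control 56/767, RR 1.12 (0.81 to 1.53).
for rr in (1.12, 0.81, 1.53):
    print("GI bleeding, RR", rr, "->", describe(absolute_effect_per_1000(56, 767, rr)))
```

The same relative risk gives a smaller absolute effect when baseline risk is lower, which is why imprecision is best judged on absolute values.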

Final QOE

• High: A, ++++, ↑↑↑↑
• Moderate: B, +++-, ↑↑↑
• Low: C, ++--, ↑↑
• Very Low: D, +---, ↑

Alternate QOE interpretation

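• High- Further research very unlikely to change confidence in the estimate of effect
• Moderate- Further research likely to have an important impact on confidence
• Low- Further research very likely to have an important impact on confidence
• Very Low- Any estimate of effect is very uncertain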

Separate QOE and Strength of Recommendation

• Evidence: high or low quality?
  – likelihood that estimates are true and adequate
• Recommendation: weak or strong?
  – confidence that following the recommendation will cause more good than harm

GRADE’s defining feature

Factors- STRONG vs WEAK

STRONG to stakeholders

• Patient: most people would want it
• Clinician: most should receive, uniform behavior
• Policymaker: adopt as policy, use as quality indicator

WEAK to stakeholders

• Patient: many people would not want it
• Clinician: help patient make a balanced decision
  – decision aid might be needed
• Policymaker: debate
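The two components of a recommendation, strength and direction, map onto the wording convention from the process diagram: “We recommend…” for strong and “We suggest…” for weak. A minimal sketch of that convention follows. The function name and the “against” phrasing are assumptions for this example, and the sample actions are hypothetical.

```python
def recommendation_statement(action, strength, direction="for"):
    """Build a recommendation sentence from strength and direction."""
    verbs = {"strong": "recommend", "weak": "suggest"}
    if strength not in verbs:
        raise ValueError("strength must be 'strong' or 'weak'")
    if direction not in ("for", "against"):
        raise ValueError("direction must be 'for' or 'against'")
    stem = f"We {verbs[strength]}"
    if direction == "against":
        # Assumed phrasing for recommendations against an action.
        stem += " against"
    return f"{stem} {action}."


print(recommendation_statement("early antibiotics in septic shock", "strong"))
print(recommendation_statement("routine use of therapy X", "weak", "against"))
```

Fixing the verb to the strength keeps the statement unambiguous for the patient, clinician and policymaker readings listed above.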

Final Strength of Recommendations

[Diagram repeated: the GRADE process, from formulating the question (PICO) through rating the quality of evidence for each outcome to formulating strong or weak recommendations for or against an action.]

Useful Resources

• BMJ: GRADE series
  – GRADE Introduction: BMJ 2008;336:924-926
  – Overview of Quality of Evidence: BMJ 2008;336:995-998
  – Translating Evidence to Recommendations: BMJ 2008;336:1049-1051
  – How to handle disagreements in guidelines panels: BMJ 2008;337:a744

Useful Resources II

• Journal of Clinical Epidemiology
  – GRADE Guidelines Series: 1-9. 2011
  – April, 2011 (64(4)): parts 1-4
    • Intro, framing the question and outcomes, rating quality of evidence, risk of bias
  – December, 2011 (64(12)): parts 5-9
    • Publication bias, imprecision, inconsistency, indirectness, rating up
