
About consensus in medicine

Rodrigo Arriagada

[email protected]@ki.se

Karolinska Institutet and University Hospital, Radiumhemmet, Stockholm, Sweden

Institut Gustave-Roussy (IGR), Villejuif, and Université Paris-Sud, France

III Chilean Breast Cancer Consensus, Coquimbo, Chile

Consensus
Definition

- Latin: “consentire”
- Agreement
- Wide meaning:
  - Political (base of democracy)
  - Social (base of conviviality)
  - Sexual (base of reproduction)
- Also in medicine?

Consensus in science
Meaningless

- Science is certainly not based on consensus
- On the contrary: as more than 95% of ordinary scientists do not really understand the theories of relativity or quantum mechanics, these theories would have had to be considered false
- However, these theories have conditioned most of the scientific and technological progress of the 20th century

Consensus in science?
Certainly not

- Consensus ≠ Science
- What about medicine?
- Is it more related to science?
- Or more related to politics?
- Or more related to social or sexual activities?
- This is the DEBATE

Consensus in medicine?
Why not?

- Medicine is a heterogeneous human activity
- For more than a century it has been partially based on science
- However, other main factors:
  - Personal experience
  - Superstition or partial ignorance
  - Empathies
  - Economics
    - Number of consultations or examinations
    - Scientific travels and social activities

Medicine based on science
It has been a long way

- Not only based on observations (Antiquity, Middle Ages)
- But on hypotheses and serious testing at different levels of knowledge (going deeper and deeper)
- C. Darwin (The Origin of Species, 1859)
- R. Virchow (Die Cellularpathologie, 1858): “Every cell is derived from a pre-existing cell”
  - Tumours are derived from pre-existing cells in the body

Medicine based on science
It has been a long way

- G. Mendel (Laws of inheritance through genes, 1866)
- W. Flemming (Chromosomes, 1879)
- J. Watson and F. Crick (DNA double helix, 1953)
- Molecular biology

A. Brahme (KI), 2003

Medicine based on science
Biology → Medicine

- The acquired biological knowledge should then be tested in clinical series
- Importance of statistical methodology
- First randomised trials started in the UK (streptomycin)
- Danger of historical series or retrospective studies
- One clear hypothesis → strict testing
- Not all doctors like this (long) procedure

Non-randomised studies
Prognostic and predictive factors

• Non-randomised series may be used for prognostic studies
  - Large numbers
  - Use of correct and well-described multivariate analyses
  - Exclude treatment-related factors
• Predictive factors (of a treatment effect)
  - Can only be analysed in a large randomised material
  - Or after pre-operative treatment, if tumour response is the main criterion

Non-randomised studies
Quantification of treatment effect

Anticoagulant treatments in CV disease

Type of study        Mortality reduction (SD)
Historical control   62% (4)
Concurrent control   34% (7)
Randomised trial     22% (8)

Hypothesis-generating studies in subgroups
Example: Good vs poor adherers

• Coronary Drug Project Research Group:
  - Role of clofibrate in men
  - 5-yr mortality: 20.0% (1,103 treated) vs 20.9% (2,789 placebo). P = 0.55
• However:
  - Mortality in good adherers (≥ 80% of dose) vs poor adherers in the clofibrate group: 15.0% vs 24.6%. P = 0.00011
• It was not the original hypothesis: “fishing”

N Engl J Med 303: 1038, 1980
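The overall comparison on this slide can be roughly checked from the quoted percentages and group sizes. The sketch below uses a pooled two-proportion z-test, which is an assumption: the slide does not say which test the original authors used, and the function name is purely illustrative. It gives a p-value of about 0.5, in line with the reported non-significant P = 0.55.

```python
import math

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| > |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# Figures quoted on the slide: 5-yr mortality, clofibrate vs placebo
p_all = two_proportion_p_value(0.200, 1103, 0.209, 2789)
print(f"clofibrate vs placebo, all patients: p = {p_all:.2f}")
```

The adherence subgroups cannot be checked the same way, because the slide does not give the number of patients in each subgroup.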

Hypothesis-generating studies in subgroups
Example: Good vs poor adherers

• What happened to the placebo patients?
• 5-yr mortality
  - Good adherers: 15.1%
  - Poor adherers: 28.3%
  - P = 4.7 x 10^-16

N Engl J Med 303: 1038, 1980

Hypothesis-generating studies in subgroups
Good vs poor responders

• Many trials in oncology involve comparisons between “responders” and “non-responders”
• If differences are found, they are taken as a proof of treatment effect
• Response is an attribute defined post hoc by a process related to “fishing expeditions”

Subgroup analysis in randomised trials
Aspirin trial: number of deaths

Sign              Aspirin   Placebo   P value
Gemini or Libra   150       147       NS
Others            654       869       < 10^-7
Total deaths      804       1016      < 10^-6
Total N           8587      8600
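The astrological subgroup is a reminder that splitting a trial into many subgroups almost guarantees a spurious finding somewhere. As a minimal sketch, assuming each subgroup is tested independently at the conventional 0.05 level (an idealisation; the function name is hypothetical), the chance of at least one false positive grows quickly with the number of subgroups:

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests reaches p < alpha."""
    return 1 - (1 - alpha) ** n_tests

# e.g. one test per sign of the zodiac would be 12 tests
for n_tests in (1, 2, 6, 12):
    risk = prob_at_least_one_false_positive(n_tests)
    print(f"{n_tests:2d} subgroup tests: {risk:.0%} risk of a spurious 'effect'")
```

With twelve subgroups the risk is close to one in two, even when the treatment effect is the same in every subgroup.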

TREATMENT EVALUATION
Gold standard: Phase III randomised trials

Scientific evaluation requires that the only difference between two comparable populations be the treatment under evaluation.

Only randomised allocation ensures a balance of all (known and unknown) prognostic factors between the study populations.

TREATMENT EVALUATION
Control of statistical errors

• Alpha or type I error: To conclude that a treatment is beneficial when actually it is not: False positive

• Beta or type II error: To conclude that a treatment is ineffective when actually it is effective: False negative

• Statistical power: 1 - beta

TREATMENT EVALUATION
Standard deviation (SD)

• 1 SD (p = 0.3): almost a coin toss
• 2 SD (p = 0.05): dice, a double six
  - The effect is only suggested; this is only moderately good for the results of a randomised study
• 3 SD (p = 0.003): dice, four sixes in a row
  - Very difficult to obtain by chance
• 4 SD (p = 0.0001)
  - Scientific corroboration
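The correspondence between a number of SDs and a p-value can be reproduced with the standard normal distribution. The sketch below assumes two-sided p-values, which the slides do not state explicitly, and also computes the dice probabilities used as analogies, which are only rough equivalents.

```python
import math

def two_sided_p(n_sd):
    """Two-sided p-value for a result lying n_sd standard deviations from zero."""
    return math.erfc(n_sd / math.sqrt(2))

for n_sd in (1, 2, 3, 4):
    print(f"{n_sd} SD: p = {two_sided_p(n_sd):.5f}")

# Dice analogies from the slides
p_double_six = (1 / 6) ** 2   # two sixes in a row
p_four_sixes = (1 / 6) ** 4   # four sixes in a row
print(f"double six: {p_double_six:.4f}, four sixes: {p_four_sixes:.5f}")
```

The exact values are about 0.32, 0.046, 0.0027 and 0.00006, so the figures on the slides are rounded.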

False positive or negative results

• Until recent years, doctors cared only about false positive results
  - The conventional 0.05 is still dangerous
  - One chance in twenty of being wrong
  - Less than 1 percent is more comfortable
• What about false negative results?
  - Doctors thought that they were not relevant to medical care
  - e.g. adjuvant tamoxifen

Tamoxifen in early breast cancer
Randomised trials

Number of trials by result (columns: Deleterious NS, Beneficial NS, Beneficial +++)

N deaths < 100: 3, 15
N deaths 100 - 250: 6
N deaths 251 - 750: 2, 2

Relative and absolute effects on survival
Examples of typical effects

Site     Treatment                      HR     Survival benefit
Breast   Tamoxifen                      0.83   6.2% at 10 yr
Breast   Chemotherapy                   0.84   6.3% at 10 yr
SCLC     Thoracic RT                    0.86   5.4% at 3 yr
NSCLC    Cisplatin-based chemotherapy   0.87   4% at 2 yr

Moderate treatment effects matter
Example of multiplicative effects in early breast cancer

• Relative mortality (1.00 = no effect):
  - 0.90 with adjuvant radiotherapy in N+
  - 0.70 with adjuvant tamoxifen x 5 yrs in ER+
  - 0.80 with adjuvant chemotherapy in young N+ pts
• In total: 0.50, i.e. mortality halved
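The total follows from multiplying the three relative mortalities. The short sketch below assumes that the three effects are independent and all apply to the same patient (young, N+, ER+); the variable names are illustrative only.

```python
from math import prod

# Relative mortality with each adjuvant treatment, as quoted on the slide
relative_mortality = {
    "radiotherapy (N+)": 0.90,
    "tamoxifen x 5 yrs (ER+)": 0.70,
    "chemotherapy (young N+)": 0.80,
}

combined = prod(relative_mortality.values())
print(f"combined relative mortality: {combined:.3f}")
print(f"overall mortality reduction: {1 - combined:.0%}")
```

Three individually moderate effects therefore combine into a relative mortality of about 0.50.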

Coming back to consensus in medicine

- Science-based medicine does not constitute a problem
- Except if results are ambiguous or heterogeneous → controversies
- With this information it is possible to define areas of consensus and controversy

Procedures

- However, most information is not scientifically produced or cleanly tested
- So → definition of the “state of the art”
- Then other factors intervene:
  - Politics and social factors
  - Economics
  - Personal interests
  - Fashion

Is consensus dangerous?
Potentially it could be

• Regarding radiotherapy indications (conclusion) in the 2007 St Gallen consensus:
• “I personally feel that these kinds of recommendations apparently not based on sound science could be dangerous, as they are sometimes and in some places followed worldwide without further discussion”

Arriagada R. Ann Oncol, Nov 2007

Is consensus dangerous?
Potentially less dangerous / answer

• Regarding radiotherapy indications (conclusion) in the 2007 St Gallen consensus:
• “recommendations such as those of the St Gallen Panel, on the basis of the discussions and consensus among an international panel of opinion leaders, are arguably less “dangerous” than assertions by individuals upon selected studies”

Goldhirsch A. et al. Ann Oncol, Nov 2007

Is consensus dangerous?
“Assertions by individuals upon selected studies”

• “Selected studies”:
  - Largest randomised trials on post-mastectomy radiotherapy
  - Large breast cancer database
  - Worldwide randomised evidence *

* Still unpublished

Is consensus dangerous?
“International panel of opinion leaders”

• “Selected opinion leaders”:
  - Certainly recognised experts in their fields
  - However, how were they chosen by the organisers?
    - Friendship
    - Close collaborators
    - Compromise
    - Similar points of view and similar personal biases

→ SELECTION BIAS

Unbiased consensus
Proposal

• Randomly select 40-50 experts from among 2,000-3,000 in the whole world
• Constitute 4 groups of 10-12 randomised experts to work on the consensus
• Compare the 4 consensus statements to define real areas of consensus and controversy
• “Opinion leaders”: recognition that this is a matter of opinion and not science-based
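The proposed selection could be carried out with a simple random draw. The sketch below is only an illustration: the pool of 2,500 experts, the group size of 12, the seed and the function name are all assumptions chosen within the ranges given on the slide.

```python
import random

def draw_panels(expert_pool, n_groups=4, group_size=12, seed=None):
    """Draw n_groups disjoint panels of group_size experts at random from the pool."""
    rng = random.Random(seed)
    chosen = rng.sample(expert_pool, n_groups * group_size)  # sampling without replacement
    return [chosen[i * group_size:(i + 1) * group_size] for i in range(n_groups)]

# Hypothetical worldwide pool of 2,500 recognised experts
pool = [f"expert_{i:04d}" for i in range(2500)]
panels = draw_panels(pool, seed=2009)

for number, panel in enumerate(panels, start=1):
    print(f"panel {number}: {len(panel)} experts, e.g. {panel[0]}")
```

Because the draw is made without replacement, no expert sits on two panels, so the four consensus statements are produced independently.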

Consensus and systematic reviews
Recent examples

• Thresholds for therapies: St Gallen 2009 *
• ASCO guideline update on pharmacologic interventions for breast cancer risk reduction **
• ASTRO consensus statement on APBI ***
• Danish systematic review on APBI ****
• British review: BCS, systemic therapies and radiotherapy indications *****

* Goldhirsch A et al. Ann Oncol, June 17, 2009
** Visvanathan K et al. J Clin Oncol 27: 3235-58, 2009
*** Smith BD et al. Int J Radiat Oncol 74: 987-1001, 2009
**** Offersen BV et al. Radiother Oncol 90: 1-13, 2009
***** Mannino M et al. Radiother Oncol 90: 14-22, 2009

St Gallen 2009 Consensus
Thresholds for therapies

• Panel of 43 experts
• Emphasis on targeting adjuvant systemic therapies according to subgroups defined by predictive markers
• ER and HER2 must be reliably and accurately measured
• Proliferation markers, including those identified in multigene array analyses, were recognised as important

Goldhirsch A et al. Ann Oncol, June 17, 2009

St Gallen 2009 Consensus
Thresholds for therapies

Treatment      Indication        Comments
Hormonal       Any ER+           ER- & PR+ probably artefactual
Anti-HER2      ASCO/CAP HER2+    May use clinical trial definition
Chemotherapy   HER2+             HT + anti-HER2 without CT in ER+++: logical but unproven
               Triple (-)        No proven alternative
               ER+, HER2-        Depending on risk (Table 3)

Goldhirsch A et al. Ann Oncol, June 17, 2009

St Gallen 2009 Consensus
ER+, HER2 (-): relative indications

Factors          HT + CT         Not useful     HT alone
ER & PR          Lower levels                   Higher levels
Hist. grade      3               2              1
Proliferation    High            Intermediate   Low
Nodes            4 or more N+    1-3 N+         N(-)
PVI              Extensive                      Absence
pT size          > 5             2.1 - 5        <= 2
Gene signature   High score      Intermediate   Low
Pt preference    All available                  To avoid side effects

Goldhirsch A et al. Ann Oncol, June 17, 2009

Accelerated Partial Breast Irradiation (APBI)
Consensus / Reviews

Consensus        Indication                             Groups
ASTRO *          Clinical practice                      “Suitable”, “Cautionary”, “Unsuitable”
Danish **        Inclusion criteria in APBI protocols   Comparability among studies is low; more questions emerge than answers
St. Gallen ***   It should still be considered experimental

* Smith BD et al. Int J Radiat Oncol 74: 987-1001, 2009
** Offersen BV et al. Radiother Oncol 90: 1-13, 2009
*** Goldhirsch A et al. Ann Oncol, June 17, 2009

Fashion
“À la mode” is generally attractive

What is now fashionable?
Personalised medicine

• Enormous recent progress in molecular biology
• Hypothesis:
  - Each patient is different and should receive a different, appropriate treatment schedule
• However,
  - Do we know enough about each difference?
  - Do we know how to determine the appropriate personalised treatment?
  - Do we know how to test them?

Personalised medicine
Why is it fashionable?

• Advances in molecular biology are very attractive
• Personalised medicine fits well with the quite old concept of applying a different treatment to each (type of) patient
• It can certainly be tested for a limited number of characteristics (ER, HER2, N-/N+, older/younger pts)
• Only if the molecular targets were perfectly known would it be possible to test the treatment effect in smaller groups (but always groups) of patients
• However, our knowledge is still limited

PREDICTION

All to know for a small price

Personalised medicine
Example: genomic signatures

• Different genomic signatures have been produced in different small series
• Major advantages:
  - Aesthetic presentations
  - Publications in high impact factor (IF) journals
  - Not to speak of commercial interests
• However, there are extremely few genes in common among the genomic signatures tested so far
• Is our knowledge solid enough at this time?

Definition of new predictive factors
BIG/TRANSBIG: MINDACT
US Intergroup: TAILORx trial

• Thousands of N- patients to be included
• Europe: Amsterdam signature
• US: Oncotype
• Randomising adjuvant CT and HT
• Expensive +++
• Feasibility: ongoing

Prognostic signature challenges
C. Sotiriou (Brussels)

• 10-20% discordance between labs
• Molecular classification: suboptimal reproducibility
• Fine-tuning needed
• Very small gene overlap
• Some validations
• Most prognostic genes are markers of proliferation

Statistical challenges related to micro-chips

Hopes and false positive results

S. Michiels, S. Koscielny, T. Boulet, C. Hill

Biostatistics and Epidemiology Department

16 April 2007

Example of clustering

• Breast cancers are divided into several subgroups (Perou, Nature 2000)
• Definition of a distance (e.g. Euclidean)
• The two nearest tumours are grouped, repeatedly, until a tree is obtained. Branches are regrouped according to arbitrary cut-offs
• The main groups are those constituting a molecular classification: HER2+, HER2- and ER+, and HER2- and ER-

Clustering
Main flaws

• The tree is always constructed, even if the data are not informative (noise)
• No consensus about the definition of subgroups:
  - Genes to be used
  - Measurement technique
  - Distance to be applied for sub-grouping
  - Distance limits at which to cut branches
• Discovery method?
  - Clustering appears rather as a validation tool for an already established classification (Michiels S et al. Brit J Cancer, 2007)
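The first flaw can be illustrated with a toy example. The sketch below applies agglomerative clustering with a Euclidean distance to pure random noise; single linkage, the data sizes and the function names are assumptions made for brevity, not the procedure of any cited study. A complete tree is produced even though the data contain no structure at all.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(profiles):
    """Single-linkage agglomerative clustering; returns the list of merges (the tree)."""
    clusters = {i: [i] for i in range(len(profiles))}
    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # distance between clusters = distance between their closest members
                d = min(euclidean(profiles[p], profiles[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# 20 hypothetical "tumours" x 50 "genes" of pure Gaussian noise
rng = random.Random(0)
noise = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(20)]
merges = agglomerate(noise)
print(f"{len(merges)} merges: a full tree was built from uninformative data")
```

Any set of n profiles always yields n - 1 merges, so the existence of a tree with apparent branches says nothing about whether real subgroups exist.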

Gene identification
Pioneer study (van’t Veer et al, Nature 2002)

• Identification of 70 genes defining the 5-year risk of metastases in breast cancer
• N: 78 N(-) patients, 34 M+ at 5 years

False positive issue

• DNA microarray analysing 20,000 genes
• Play of chance: 1,000 genes (5% x 20,000) will wrongly appear to be related to prognosis (p < 0.05), or 200 genes with p < 0.01
• Several criteria have been proposed to adjust the p values (Dalmasso, Broet, Moreau. RESP 2004)
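The arithmetic above can be checked by simulation. The sketch below assumes that none of the 20,000 genes is related to prognosis, so that every p-value is uniformly distributed between 0 and 1. It also applies a Bonferroni threshold, which is used here only as the simplest familiar adjustment; it is not necessarily one of the criteria proposed in the cited reference.

```python
import random

n_genes = 20_000

# Expected number of false positives at each threshold, as on the slide
for alpha in (0.05, 0.01):
    print(f"p < {alpha}: about {round(n_genes * alpha)} genes expected by chance")

# Simulation under the null hypothesis: p-values are uniform on [0, 1)
rng = random.Random(1)
p_values = [rng.random() for _ in range(n_genes)]
n_hits = sum(p < 0.05 for p in p_values)

# Bonferroni adjustment (an assumption for illustration): divide alpha by the number of tests
bonferroni_threshold = 0.05 / n_genes
n_hits_bonferroni = sum(p < bonferroni_threshold for p in p_values)

print(f"simulated 'significant' genes at p < 0.05: {n_hits}")
print(f"still 'significant' after Bonferroni: {n_hits_bonferroni}")
```

The price of such a strict adjustment is a loss of power, which is one reason why other criteria have been proposed.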

Impact of the sample size

• Re-analysis of seven pilot studies among those considered the most important in oncology for genomic signatures
• Published 1995-2003, with at least 60 patients
• Aims:
  - To quantify the reproducibility of molecular signatures
  - To evaluate the prognostic performance of these signatures
  - To study the relation between the number of patients in the learning sample and the proportion of misclassifications

Michiels S et al, Lancet 2005

Study strategy

• Leave-many-out cross-validation (Geisser. J Amer Stat 1975)
• The number of patients in the learning sample is varied; for each learning-sample size, 500 random splits
• Same a priori prediction rule as used by van’t Veer et al.
• For each learning sample, search for the group of 50 genes most correlated with prognosis
• Estimation of the proportion of misclassifications with each signature in the sample made up of the remaining patients

Michiels S et al, Lancet 2005
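The strategy can be sketched on synthetic data. Everything in the example below is an assumption made for illustration: the simulated expression values, the reduced numbers of genes and splits (20 instead of 500, to keep it fast), and the nearest-centroid rule, which stands in for the prediction rule of van’t Veer et al. The function names are hypothetical.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def misclassification_rate(expr, outcome, train_size, n_top, n_splits, rng):
    """Repeated random splits: select genes on the learning sample, test on the rest."""
    n, n_genes = len(expr), len(expr[0])
    errors = tested = 0
    for _ in range(n_splits):
        order = rng.sample(range(n), n)
        train, test = order[:train_size], order[train_size:]
        y = [outcome[i] for i in train]
        if len(set(y)) < 2:
            continue  # both outcomes are needed in the learning sample
        # Rank genes by |correlation| with outcome, using the learning sample only
        scores = [abs(correlation([expr[i][g] for i in train], y))
                  for g in range(n_genes)]
        top = sorted(range(n_genes), key=scores.__getitem__, reverse=True)[:n_top]
        # Nearest-centroid rule (a stand-in for the original prediction rule)
        centroids = {}
        for label in (0, 1):
            rows = [expr[i] for i in train if outcome[i] == label]
            centroids[label] = [statistics.fmean(r[g] for r in rows) for g in top]
        for i in test:
            dist = {label: sum((expr[i][g] - c) ** 2 for g, c in zip(top, cen))
                    for label, cen in centroids.items()}
            errors += min(dist, key=dist.get) != outcome[i]
            tested += 1
    return errors / tested

# Synthetic data: 78 patients (34 with metastases), 200 genes, 10 weakly informative
rng = random.Random(42)
outcome = [1] * 34 + [0] * 44
expr = [[rng.gauss(0.8 * outcome[i] if g < 10 else 0.0, 1.0) for g in range(200)]
        for i in range(78)]

rates = {}
for size in (20, 40, 60):
    rates[size] = misclassification_rate(expr, outcome, size, n_top=50,
                                         n_splits=20, rng=rng)
    print(f"learning sample of {size}: misclassification {rates[size]:.2f}")
```

The key design point is that gene selection is repeated inside every split, so the patients used to estimate the misclassification rate never contribute to the choice of the signature.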

Genomics: General aspects

[Figure: proportion of misclassifications (y-axis, 0.2 to 0.6) against training set size (x-axis, 20 to 100), one panel per study: Iizuka, Pomeroy, Bhattacharjee, Beer, van 't Veer, Yeoh, Rosenwald]

Prediction quality

• Proportion of misclassifications according to the number of patients in the learning sample (van’t Veer, 2002)

[Figure: rate of misclassification (y-axis, 0.2 to 0.5) against size of the learning sample (x-axis, 20 to 100), with 95% CI. Mean rate of misclassification: 31%]

Explained variability of the pioneer study

Data of the validation 1 study: n = 234 (55 with distant metastases)

Survival without distant metastasis, Cox model (Dunkler, Michiels, Schemper. Eur J Cancer, 2007)

Model                                                         R2
Model without factors                                         0%
Model with only conventional factors                          16% (±5%)
Model with only molecular signature                           12% (±4%)
Model with conventional factors AND the molecular signature   19% (±5%)
Added value of the molecular signature                        3%

Predictive factors of the effect of systemic treatments
Conclusions

• Useful predictive factors: HR, HER2
• However, they explain only a part of the variability
• They give probabilities and are not deterministic
• Biomics signatures: hopes and a large field of research
• Clinical application: only robust results

Coming back to consensus in medicine
Is consensus useful?

Certainly yes

• May spare reading the original papers
• Better quality of life for doctors, families and maybe friends
• Avoids major mistakes → patient advantage
• Useful if considered as guidelines or recommendations, but not as laws

Consensus limitations
Statistical versus clinical significance

• Some trial(s) may show a statistically and clinically significant difference
• Some other (large) trials may show a statistical difference but of limited clinical impact
• In the latter case, toxicity and quality of life should be evaluated very carefully
• Full information to the patient

Consensus limitations
Statistical versus clinical significance

Some trial(s) may show a statistically and clinically significant difference (HERA)

Piccart M et al. N Engl J Med 353: 1659-72, 2005

Consensus limitations
Statistical versus clinical significance

Some other (large) trials may show a statistical difference but of limited clinical impact

743 patients surviving to 4 CT cycles + bevacizumab

       Bevacizumab + Erlotinib   Bevacizumab + placebo   HR
EFS    4.76                      3.75                    0.722 (p = 0.0012)

[Figure: progression-free survival curves, proportion without event (0.0 to 1.0) against time (0 to 21 months), Bev+Erlotinib (n=370) vs Bev+Placebo (n=373). HR = 0.722 (0.592-0.881), log-rank P = 0.0012]

Nb. of patients at risk:
Bev+Placebo     373 142 58 27 15 6 18 0
Bev+Erlotinib   370 178 81 43 20 6 3 1

Miller et al. abstr 2002, ASCO 2009

Evaluation +++: cost, toxicity
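The contrast between statistical and clinical significance can be made concrete with the figures on this slide. The sketch below recovers an approximate p-value from the hazard ratio and its confidence interval, assuming a 95% interval that is symmetric on the log scale, and compares it with the absolute gain, assuming the two EFS values are medians expressed in months (the slide does not label them).

```python
import math

hr, ci_low, ci_high = 0.722, 0.592, 0.881   # figures quoted on the slide

# Standard error of log(HR), assuming a 95% CI symmetric on the log scale
se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z = math.log(hr) / se_log_hr
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided

# Absolute difference between the two arms (assumed to be medians in months)
gain = 4.76 - 3.75

print(f"z = {z:.2f}, approximate p = {p_value:.4f}")
print(f"absolute gain: {gain:.2f} (assumed months)")
```

The approximate p-value is close to the reported log-rank P = 0.0012, a convincing statistical result, while the absolute gain is about one unit on the time axis. That gap is why cost and toxicity weigh so heavily in the evaluation.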

Consensus in Medicine
Conclusions

• Consensus ≠ science
• Consensus in medicine may be dangerous
• Consensus in medicine may be useful
• Especially if based on intensive previous work and full discussion in small groups of specialists
• It is safe if not related to politics, economics, social or sexual activities (i.e. other kinds of consensus)
• Useful and protective for patients if considered as guidelines or recommendations
• Need for regular updating
• Safe knowledge is elusive

Is consensus based on safe knowledge?

??

What is safe knowledge?

The cosmos is rather stochastic and probably not deterministic

Don’t be anxious, it is not new!

We will do better next time!