About consensus in medicine
Rodrigo Arriagada
[email protected]@ki.se
Karolinska Institutet and University Hospital, Radiumhemmet, Stockholm, Sweden
Institut Gustave-Roussy (IGR), Villejuif, and Université Paris-Sud, France
III Chilean Breast Cancer Consensus, Coquimbo, Chile
Consensus
Definition
- Latin: “consentire”
- Agreement
- Wide meaning:
  - Political (base of democracy)
  - Social (base of conviviality)
  - Sexual (base of reproduction)
- Also in medicine?
Consensus in science
Meaningless
- Science is certainly not based on consensus
- On the contrary: as more than 95% of ordinary scientists do not really understand the theories of relativity or quantum mechanics, these theories would have had to be considered false
- However, these theories have conditioned most of the scientific and technological progress of the 20th century
Consensus in science?
Certainly not
- Consensus ≠ Science
- What about medicine?
- Is it more related to science?
- Or more related to politics?
- Or more related to social or sexual activities?
- This is the DEBATE
Consensus in medicine?
Why not?
- Medicine is a heterogeneous human activity
- For more than a century, partially based on science
- However, other main factors:
  - Personal experience
  - Superstition or partial ignorance
  - Empathies
  - Economics
    - Number of consultations or examinations
    - Scientific travels and social activities
Medicine based on science
It has been a long way
- Not only based on observations (Antiquity, Middle Ages)
- But hypotheses and serious testing at different levels of knowledge (going deeper and deeper)
- C. Darwin (The Origin of Species, 1859)
- R. Virchow (Die Cellularpathologie, 1858): “Every cell is derived from a pre-existing cell”
  Tumours are derived from pre-existing cells in the body
Medicine based on science
- G. Mendel (Laws of inheritance through genes, 1866)
- W. Flemming (Chromosomes, 1879)
It has been a long way
- J. Watson and F. Crick (DNA double helix, 1953)
Molecular biology
Medicine based on science
- The acquired biological knowledge should move to be tested in clinical series
- Importance of statistical methodology
Biology → Medicine
- First randomised trials started in the UK (Streptomycin)
- Danger of historical series or retrospective studies
- One clear hypothesis → strict testing
- Not all doctors like this (long) procedure
Non-randomised studies
Prognostic and predictive factors
• Non-randomised series may be used for prognostic studies
  - Large numbers
  - Use of correct and well-described multivariate analyses
  - Exclude treatment-related factors
• Predictive factors (of a treatment effect)
  - Can only be analysed in a large randomised material
  - Or after pre-operative treatment if tumour response is the main criterion
Non-randomised studies
Quantification of treatment effect
Anticoagulant treatments in CV disease
Type of study        Mortality reduction (SD)
Historical control   62% (4)
Concurrent control   34% (7)
Randomized trial     22% (8)
Hypothesis-generating studies in subgroups
Example: Good vs poor adherers
• Coronary Drug Project Research Group:
  - Role of clofibrate in men
  - 5-yr mortality: 20.0% (1,103 treated) vs 20.9% (2,789 placebo). P = 0.55
• However:
  - Mortality in good adherers (≥ 80% of dose): 15.0% vs 24.6%. P = 0.00011
• It was not the original hypothesis: “fishing”
N Engl J Med 303: 1038, 1980
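The overall comparison on this slide can be checked approximately from the quoted percentages and group sizes. The sketch below assumes a pooled two-proportion z-test; the original report may have used a different test, so the result should land near, not exactly on, the quoted P = 0.55. The function name is illustrative only.

```python
import math

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # erfc(z / sqrt(2)) is the two-sided tail area of the standard normal
    return math.erfc(z / math.sqrt(2))

# 5-yr mortality: 20.0% of 1,103 treated vs 20.9% of 2,789 placebo
overall = two_proportion_p_value(0.200, 1103, 0.209, 2789)
print(f"clofibrate vs placebo, all patients: p = {overall:.2f}")
```

The adherer subgroups cannot be re-tested this way because their sizes are not given on the slide.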
Hypothesis-generating studies in subgroups
Example: Good vs poor adherers
• What happened to placebo patients?
• 5-yr mortality
  - Good adherers: 15.1%
  - Bad adherers: 28.3%
  - P = 4.7 x 10^-16
N Engl J Med 303: 1038, 1980
Hypothesis-generating studies in subgroups
Good vs poor responders
• Many trials in oncology involve comparisons between “responders” and “non-responders”
• If differences are found, they are taken as a proof of treatment effect
• Response is an attribute defined post hoc by a process related to “fishing expeditions”
Subgroup analysis in randomised trials
Aspirin trial: number of deaths
Sign              Aspirin   Placebo   P value
Gemini or Libra   150       147       NS
Others            654       869       < 10^-7
Total deaths      804       1016      < 10^-6
Total N           8587      8600
TREATMENT EVALUATION
Gold standard: Phase III randomised trials
Scientific evaluation requires that the only difference between two comparable populations be the evaluated treatment
Only randomised allocation ensures a balance of all (known and unknown) prognostic factors between the study populations
TREATMENT EVALUATION
Control statistical errors
• Alpha or type I error: To conclude that a treatment is beneficial when actually it is not: False positive
• Beta or type II error: To conclude that a treatment is ineffective when actually it is effective: False negative
• Statistical power: 1 - beta
TREATMENT EVALUATION
Standard deviation (SD)
• 1 SD (p = 0.3): Almost a coin toss
• 2 SD (p = 0.05): Dice, a double six
The effect is only suggested; this is moderately good for the results of a randomised study
TREATMENT EVALUATION
Standard deviation (SD)
• 3 SD (p = 0.003):
  - Dice: four sixes in a row
  - Very difficult to obtain by chance
• 4 SD (p = 0.0001)
  - Scientific corroboration
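The correspondence between a number of SDs and a p-value can be reproduced under the assumption of a two-sided test on a normally distributed statistic. The sketch also prints the dice probabilities used as analogies, which match the p-values only in order of magnitude. The helper name is illustrative.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal beyond |z| SDs."""
    return math.erfc(abs(z) / math.sqrt(2))

for sd in (1, 2, 3, 4):
    print(f"{sd} SD: p = {two_sided_p(sd):.5f}")

# Dice analogies from the slides (rough comparisons, not exact equivalents)
dice = {"double six": (1 / 6) ** 2, "four sixes in a row": (1 / 6) ** 4}
for name, prob in dice.items():
    print(f"{name}: probability = {prob:.5f}")
```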
False positive or negative results
• Until recent years, doctors worried only about false positive results
  - The conventional 0.05 is still dangerous
  - One chance out of twenty of being wrong
  - Less than 1 percent is more comfortable
• What about false negative results?
  - Doctors thought that they were not relevant to medical care
  - e.g. Adjuvant tamoxifen
Tamoxifen in early breast cancer
Randomised trials
N deaths    Deleterious NS    Beneficial NS    Beneficial +++
< 100 3 15
100 - 250 6
251 - 750 2 2
Relative and absolute effects on survival
Examples of typical effects
Site     Treatment                      HR     Survival benefit
Breast   Tamoxifen                      0.83   6.2% at 10 yr
Breast   Chemotherapy                   0.84   6.3% at 10 yr
SCLC     Thoracic RT                    0.86   5.4% at 3 yr
NSCLC    Cisplatin-based chemotherapy   0.87   4% at 2 yr
Moderate treatment effects matter
Example of multiplicative effects in early breast cancer
• Mortality reduction:
  - 0.90 adjuvant radiotherapy in N+
  - 0.70 adjuvant tamoxifen x 5 yrs in ER+
  - 0.80 adjuvant chemotherapy in young N+ pts
• In total: 0.50 mortality reduction
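The total on this slide follows from multiplying the three relative risks. This minimal sketch assumes the effects are independent and all apply to the same patient (young, N+, ER+); the dictionary keys are labels chosen here for illustration.

```python
import math

# Relative risks of death quoted on the slide
relative_risks = {
    "adjuvant radiotherapy (N+)": 0.90,
    "adjuvant tamoxifen x 5 yrs (ER+)": 0.70,
    "adjuvant chemotherapy (young N+)": 0.80,
}

# Under independence, combined relative risk is the product of the factors
combined = math.prod(relative_risks.values())
print(f"combined relative risk = {combined:.3f}")
print(f"overall mortality reduction = {1 - combined:.0%}")
```

Each treatment alone gives a 10 to 30% reduction, yet together they roughly halve mortality, which is the point of the slide.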
Coming back to consensus in medicine
Procedures
- Science-based medicine does not constitute a problem
- Except if results are ambiguous or heterogeneous → controversies
- With this information it is possible to define areas of consensus and controversy
- However, most information is not scientifically produced or cleanly tested
- So → definition of the “state of the art”
- Then other factors intervene:
  - Politics & social
  - Economics
  - Personal interests
  - Fashion
Is consensus dangerous?
Potentially it could be
• Regarding radiotherapy indications (conclusion) in the 2007 St Gallen consensus:
• “I personally feel that these kinds of recommendations apparently not based on sound science could be dangerous, as they are sometimes and in some places followed worldwide without further discussion”
Arriagada R. Ann Oncol, Nov 2007
Is consensus dangerous?
Potentially less dangerous / answer
• Regarding radiotherapy indications (conclusion) in the 2007 St Gallen consensus:
• “recommendations such as those of the St Gallen Panel, on the basis of the discussions and consensus among an international panel of opinion leaders, are arguably less “dangerous” than assertions by individuals upon selected studies”
Goldhirsch A. et al. Ann Oncol, Nov 2007
Is consensus dangerous?
“Assertions by individuals upon selected studies”
• “Selected studies”:
• Largest randomised trials on post-mastectomy radiotherapy
• Large breast cancer database
• Worldwide randomised evidence *
* Still unpublished
Is consensus dangerous?
“International panel of opinion leaders”
• “Selected opinion leaders”:
• Certainly recognised experts in their fields
• However, how were they chosen by the organisers?
  - Friendship
  - Close collaborators
  - Compromise
  - Similar points of view and similar personal biases
→ SELECTION BIAS
Unbiased consensus
Proposal
• Randomly select 40-50 experts from among 2,000-3,000 worldwide
• Form 4 groups of 10-12 randomly allocated experts to work on the consensus
• Compare the 4 consensus statements to define the real areas of consensus and controversy
• “Opinion leaders”: recognition that this is a matter of opinion and not science-based
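The proposal above amounts to a simple random draw followed by random allocation to panels. A minimal sketch, assuming a hypothetical registry of 2,500 expert identifiers and 4 panels of 11 (both within the ranges given on the slide); the names `draw_panels` and `expert_XXXX` are invented for illustration.

```python
import random

def draw_panels(expert_ids, n_groups=4, group_size=11, seed=None):
    """Randomly pick n_groups * group_size experts and split them into panels."""
    rng = random.Random(seed)
    # sample() draws without replacement, so nobody sits on two panels
    chosen = rng.sample(expert_ids, n_groups * group_size)
    return [chosen[i * group_size:(i + 1) * group_size] for i in range(n_groups)]

# Hypothetical worldwide registry of experts (placeholder identifiers)
experts = [f"expert_{i:04d}" for i in range(2500)]
panels = draw_panels(experts, seed=2009)
for number, panel in enumerate(panels, start=1):
    print(f"panel {number}: {len(panel)} experts, e.g. {panel[0]}")
```

Because the order returned by `sample` is itself random, slicing the draw into consecutive blocks is a valid random allocation to the four groups.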
Consensus and systematic reviews
Recent examples
• Thresholds for therapies: St Gallen 2009 *
• ASCO guideline update on pharmacologic interventions for breast cancer risk reduction **
• ASTRO consensus statement on APBI ***
• Danish systematic review on APBI ****
• British review: BCS, systemic therapies and radiotherapy indications *****
* Goldhirsch A et al. Ann Oncol, June 17, 2009
** Visvanathan K et al. J Clin Oncol 27: 3235-58, 2009
*** Smith BD et al. Int J Radiat Oncol 74: 987-1001, 2009
**** Offersen BV et al. Radiother Oncol 90: 1-13, 2009
***** Mannino M et al. Radiother Oncol 90: 14-22, 2009
St Gallen 2009 Consensus
Thresholds for therapies
• Panel of 43 experts
• Emphasis on targeting adjuvant systemic therapies according to subgroups defined by predictive markers
• ER and HER2 must be reliably and accurately measured
• Proliferation markers, including those identified in multigene array analyses, were recognised as important
* Goldhirsch A et al. Ann Oncol, June 17, 2009
St Gallen 2009 Consensus
Thresholds for therapies
Treatment      Indication                        Comments
Hormonal       Any ER+                           ER- & PR+ probably artefactual
Anti-HER2      ASCO/CAP HER2+                    May use clinical trial definition
Chemotherapy   HER2+                             HT + anti-HER2 without CT in ER+++, logical but unproven
               Triple (-)                        No proven alternative
               ER+, HER2-: depending on risk     Table 3
Goldhirsch A et al. Ann Oncol, June 17, 2009
St Gallen 2009 Consensus
ER+, HER2 (-): relative indications
Factors          HT + CT         Not useful     HT alone
ER & PR          Lower levels                   Higher levels
Hist. grade      3               2              1
Proliferation    High            Intermediate   Low
Nodes            4 N+ or more    1-3 N+         N(-)
PVI              Extensive                      Absence
pT size          > 5             2.1 - 5        <= 2
Gene signature   High score      Intermediate   Low
Pt preference    All available                  To avoid side effects
Goldhirsch A et al. Ann Oncol, June 17, 2009
Accelerated Partial Breast Irradiation (APBI)
Consensus / Reviews
Consensus        Indication                             Groups
ASTRO *          Clinical practice                      “Suitable”, “Cautionary”, “Unsuitable”
Danish **        Inclusion criteria in APBI protocols   Comparability among studies is low; more questions emerge than answers
St. Gallen ***   It should still be considered experimental
* Smith BD et al. Int J Radiat Oncol 74: 987-1001, 2009
** Offersen BV et al. Radiother Oncol 90: 1-13, 2009
*** Goldhirsch A et al. Ann Oncol, June 17, 2009
Fashion
“À la mode” is generally attractive
What is now fashionable?
Personalised medicine
• Enormous recent progress in molecular biology
Hypothesis:
  - Each patient is different and should receive a different appropriate treatment schedule
However,
  - Do we know enough about each difference?
  - Do we know how to determine appropriate personalised treatment?
  - Do we know how to test them?
Personalised medicine
Why is it fashionable?
• Advances in molecular biology are very attractive
• Personalised medicine goes well with the quite old concept of applying a different treatment for each (type of) patient
• It can certainly be tested for a limited number of characteristics (ER, HER2, N-/N+, older/younger pts)
• Only if the molecular targets were perfectly known would it be possible to test the treatment effect in smaller groups (but always groups) of patients
• However, our knowledge is still limited
Personalised medicine
Example: genomic signatures
• Different genomic signatures have been produced in different small series
• Major advantages:
  - Aesthetic presentations
  - Publications in high impact factor journals
  - Not to speak of commercial interests
• However, there are extremely few genes in common among the genomic signatures tested so far
• Is our knowledge solid enough at this time?
Definition of new predictive factors
BIG/TRANSBIG: Mindact
US Intergroup TAILORx trial
• Thousands of N- patients to be included
• Europe: Amsterdam signature
• US: Oncotype
• Randomising adjuvant CT and HT
• Expensive +++
• Feasibility: ongoing
Prognostic signature challenges
C. Sotiriou (Brussels)
• 10 - 20% discordance between labs
• Molecular classification: suboptimal reproducibility
• Fine-tuning needed
• Very small gene overlap
• Some validations
• Most prognostic genes are markers of proliferation
Statistical challenges related to micro-chips
Hopes and false positive results
S. Michiels, S. Koscielny, T. Boulet, C. Hill
Biostatistics and Epidemiology Department
16 April 2007
Example of clustering
• Breast cancers are divided into several subgroups (Perou, Nature 2000)
• Definition of a distance (e.g. Euclidean)
• The two nearest tumours are grouped, repeatedly, until a tree is obtained. Branches are regrouped according to arbitrary cut-offs
• The main groups are those constituting a molecular classification: HER2+, HER2- and ER+, and HER2- and ER-
Clustering
Main flaws
• The tree is always constructed (noise) even if the data are not informative
• No consensus about the definition of subgroups
  - Genes to be used
  - Measurement technique
  - Distance to be applied for sub-grouping
  - Distance limits to cut branches
• Discovery method?
• Clustering appears rather as a validation tool of an already established classification (Michiels S et al. Brit J Cancer, 2007)
Gene identification
Pioneer study (van’t Veer et al, Nature 2002)
• Identification of 70 genes predicting the 5-year risk of metastases in breast cancer
• N: 78 patients N(-), 34 M+ at 5 years
False positive issue
• DNA micro-chip analysing 20,000 genes
• Play of chance: 1,000 genes (5% x 20,000) will wrongly appear to be related to prognosis (p < 0.05), or 200 genes with p < 0.01
• Several criteria have been proposed to adjust the p values (Dalmasso, Broet, Moreau. RESP 2004)
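The arithmetic behind these false positive counts can be sketched as follows, together with a small simulation of 20,000 genes with no true link to prognosis. The Bonferroni threshold is shown only as one classical adjustment; it is an assumption chosen for illustration, not necessarily one of the criteria proposed in the cited paper. Function names are illustrative.

```python
import random

N_GENES = 20_000

def expected_false_positives(n_tests, alpha):
    """Expected number of tests with p < alpha when no gene is truly related."""
    return n_tests * alpha

def simulate_null_hits(n_tests, alpha, seed=0):
    """Count p < alpha among uniform p-values (what a pure-noise chip yields)."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))

for alpha in (0.05, 0.01):
    expected = expected_false_positives(N_GENES, alpha)
    observed = simulate_null_hits(N_GENES, alpha)
    print(f"alpha = {alpha}: expected {expected:.0f}, simulated {observed}")

# Bonferroni: per-gene threshold keeping the chance of any false positive <= 5%
bonferroni_threshold = 0.05 / N_GENES
print(f"Bonferroni per-gene threshold: {bonferroni_threshold:.1e}")
```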
Impact of the sample size
• Re-analysis of seven pilot studies among those considered the most important in oncology for genomics signatures
• Published 1995 – 2003, with at least 60 patients
• Aims
  - To quantify the reproducibility of molecular signatures
  - To evaluate the prognostic performance of these signatures
  - To study the relation between the number of patients in the learning samples and the proportion of misclassification
Michiels S et al, Lancet 2005
Study strategy
• Leave-many-out cross-validation (Geisser. J Amer Stat 1975)
• The number of patients in the learning sample is varied; for each learning-sample size, 500 random splits are drawn
• Same a priori prediction rule as used by van’t Veer et al.
• For each learning sample, search for the group of 50 genes most correlated with the prognosis
• Estimation of the proportion of misclassification of each signature in the sample made up of the other patients
Michiels S et al, Lancet 2005
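The strategy above can be sketched as repeated random splits with gene selection done inside each learning sample. This is only an illustration under stated assumptions: the data are synthetic pure noise, the classifier is a nearest-centroid rule standing in for the van’t Veer prediction rule (which is not reproduced here), and 20 splits are used instead of 500 to keep it fast. All names are hypothetical.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def misclassification_rate(expr, outcome, train_size, n_genes=50,
                           n_splits=20, seed=0):
    """Mean validation error over random learning/validation splits.

    expr[i][g] is the expression of gene g in patient i; outcome[i] is 0 or 1.
    """
    rng = random.Random(seed)
    patients = list(range(len(outcome)))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(patients)
        train, valid = patients[:train_size], patients[train_size:]
        y_train = [outcome[i] for i in train]
        if len(set(y_train)) < 2:
            continue  # need both outcomes in the learning sample
        # Select genes using learning-sample patients only
        scored = [(abs(correlation([expr[i][g] for i in train], y_train)), g)
                  for g in range(len(expr[0]))]
        genes = [g for _, g in sorted(scored, reverse=True)[:n_genes]]
        # Nearest-centroid rule (assumed stand-in for the published rule)
        centroids = {
            label: [statistics.fmean([expr[i][g] for i in train
                                      if outcome[i] == label]) for g in genes]
            for label in (0, 1)
        }
        wrong = 0
        for i in valid:
            dist = {label: sum((expr[i][g] - c) ** 2
                               for g, c in zip(genes, centroids[label]))
                    for label in (0, 1)}
            wrong += min(dist, key=dist.get) != outcome[i]
        errors.append(wrong / len(valid))
    return statistics.fmean(errors) if errors else float("nan")

# Synthetic example: 80 patients, 200 genes, no real signal at all
rng = random.Random(42)
outcome = [i % 2 for i in range(80)]
expr = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(80)]
noise_rate = misclassification_rate(expr, outcome, train_size=40)
print(f"misclassification on pure noise: {noise_rate:.2f}")
```

On pure noise the rate should hover around one half, however good the selected genes look inside the learning sample; selecting genes on all patients before splitting would hide this.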
Genomics: General aspects
[Figure: prediction quality with 95% CI. Proportion of misclassifications (y-axis, 0.2 to 0.6) versus training set size (x-axis, 20 to 100), one panel per study: Iizuka, Pomeroy, Bhattacharjee, Beer, van 't Veer, Yeoh, Rosenwald]
• Proportion of misclassification according to the number of patients in the learning sample (van’t Veer, 2002)
[Figure: misclassification rate (y-axis, 0.2 to 0.5) versus learning sample size (x-axis, 20 to 100), with 95% CI]
Mean rate of misclassification: 31%
Explained variability of the pioneer study
Data of the validation 1 study: n = 234 (55 with distant metastases)
Survival without distant metastasis, Cox model (Dunkler, Michiels, Schemper. Eur J Cancer, 2007)
Model                                                          R2
Model without factors                                          0%
Model with only conventional factors                           16% (±5%)
Model with only molecular signature                            12% (±4%)
Model with conventional factors AND the molecular signature    19% (±5%)
Added value of the molecular signature                         3%
Predictive factors of the effect of systemic treatments
Conclusions
• Useful predictive factors: HR, HER2
• However, they explain only a part of the variability
• They give probabilities and are not deterministic
• Biomics signatures: Hopes and a large field of research
• Clinical application: only robust results
Coming back to consensus in medicine
Is consensus useful?
Certainly yes
• May spare doctors from reading the original papers
• Better quality of life for doctors, families and maybe friends
• Helps avoid major mistakes → patient advantage
• Useful if considered as guidelines or recommendations but not as laws
Consensus limitations
Statistical versus clinical significance
• Trial(s) may show a significant statistical and clinical difference
• Some other (large) trials may show a statistical difference but of limited clinical impact
• In the latter case, toxicity and quality of life should be evaluated very carefully
• Full information to the patient
Consensus limitations
Statistical versus clinical significance
Trial(s) may show a significant statistical and clinical difference (HERA)
Piccart M et al. N Engl J Med 353: 1659-72, 2005
Consensus limitations
Statistical versus clinical significance
Some other (large) trials may show a statistical difference but of limited clinical impact
743 patients surviving to 4 CT cycles + bevacizumab
       Bevacizumab + Erlotinib   Bevacizumab + placebo   HR
EFS    4.76                      3.75                    0.722 (p = 0.0012)
[Figure: progression-free survival curves, proportion without event (0.0 to 1.0) versus months (0 to 21); Bev+Erlotinib (n=370) vs Bev+Placebo (n=373); HR = 0.722 (0.592-0.881), log-rank P = 0.0012]
Nb. of patients at risk:
Months          0     3     6    9    12   15   18   21
Bev+Placebo     373   142   58   27   15   6    18   0
Bev+Erlotinib   370   178   81   43   20   6    3    1
Miller et al. abstr 2002 ASCO 2009
Evaluation +++: cost, toxicity
Consensus in Medicine
Conclusions
• Consensus ≠ science
• Consensus in medicine may be dangerous
• Consensus in medicine may be useful
• Especially if based on intensive previous work and full discussion in small groups of specialists
• It is safe if not related to politics, economics, social or sexual activities (i.e. other kinds of consensus)
• Useful and protective for patients if considered as guidelines or recommendations
• Need for regular updating
• Safe knowledge is elusive
What is safe knowledge?
The cosmos is rather stochastic and probably not deterministic
Don’t be anxious, it is not new!