the alchemy of clinical trials
This is a post-peer-review version of an article published in Biosocieties. The definitive publisher-authenticated version, Will, C. (2007) The alchemy of clinical trials. Biosocieties, 2, 1, 85-100, is available online at: http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1002728&fileId=S1745855207005078
Biosocieties, Special Issue, March 2007
An elusive evidence base? The construction and governance of randomised controlled trials.
The alchemy of clinical trials
Catherine M Will
Contact details:
Catherine M Will
Department of Sociology
University of Sussex
Falmer
East Sussex
BN1 9SN
Email. [email protected]
Tel. (until Easter 2007) 01223 334617
Fax. 01223 335993
Word count: 7000 plus abstract (120), acknowledgements (112)
and references (1238).
Abstract
This paper considers the complex construction of randomised
controlled trials that lie behind the rhetoric of the gold
standard. Drawing on insights from Science and Technology
Studies and empirical material, I argue that trials in the
field of cardiovascular disease prevention are constituted
as ‘research’ rather than ‘science’. In these examples,
control emerges out of a dual concern with practices of
purification and involvement of aspects of the world outside
the experiment, as the designers of trials aspire to
relevance as well as rigour. Such strategies of
‘contextualisation’ mean that trials are more like alchemy
than assay, proceeding by complex moves to transform ‘base’
matters rather than distilling elements of clinical
practice.
Keywords
clinical trial; RCT; contextualisation; boundary work;
transaction space
Introduction
This paper considers the complex construction of randomised
controlled trials (RCTs) by placing empirical material
alongside insights from Science and Technology Studies.
Sociological responses to evidence-based medicine (EBM) have
often expressed concerns about the application of a
hierarchy of evidence, with the RCT at its apex, to the
world of the clinic. They draw attention to the continued
use and value of observational research, expertise and
experience, existing practice and patient preference in
shaping care, yet often leave the RCT itself under-examined.
My narrow focus allows a more careful discussion of the ways
in which trials themselves generate categories of evidence
and practice in one field of medicine.
The title borrows from the conclusion of a recent paper
by De Vries and Lemmens where the authors argue that ‘the
alchemy of the clinical trial transforms EBM from a
challenger to a protector of corporate agendas’ (2006: 2704)
and identify what they call the ‘structural’ and ‘cultural’
sources of bias in the design of research in Dutch
obstetrics. Such detailed studies clearly chime with my own
interest in understanding the production of knowledge in
medicine, but rather than seek out bias in this article I am
interested in the ways in which modifications of the ‘pure’
world of the experiment may also be seen as strengthening
the evidence it is intended to produce. Here I take
inspiration from recent work charting the importance of
‘contextualisation’ in research (Nowotny, Scott &
Gibbons, 2001).
These authors argue that clinical trials may become
examples of a more contextualised science when patients are
able to modify the design and organisation of medical
research, as happens in the cases discussed by writers such
as Callon and Rabeharisoa (2003) and Epstein (1996). Callon
and Rabeharisoa’s work has also been used by Latour (1998)
to argue that there is a shift from ‘science’ associated
with certainty and detachment, to ‘research’, which is
animated by involvement. Though the clinical trials examined
here are not the subject of patient activism, nonetheless
they do proceed by involving healthcare staff, objects and
aspects of daily organisation. As examples of public
experiments (Schaffer, 2005) the emphasis is still on the
method rather than the identity of those who apply it. But
this performance is not concentrated on the typical tasks of
assay: extracting and classifying separate elements. Instead
the method is used to accommodate continuing uncertainty
about the contents and context for specific clinical proofs,
and gains strength as much from incorporation as
purification.
The trials I use to illustrate this claim include
studies of pharmaceutical products and what have become known as
‘complex interventions’ (Campbell et al, 2000) intended to
help prevent cardiovascular disease in the UK population.
These have combined limited knowledge about lifestyle
factors such as overweight, diet and exercise from the 1970s
and 1980s with the increasing use of both old and new drugs
designed to lower blood pressure and cholesterol. The
challenge is now seen as coordinating and effectively
applying some of these ideas to patients at particular risk,
whether they have already had heart disease (secondary
prevention) or not (primary or primordial prevention).
Cardiology itself has been the site of much of the
stabilisation of the RCT methodology in the last four
decades, especially through large-scale drug trials. In the
UK RCTs were also used relatively early to adjudicate on the
issue of the location of care, in work on the effectiveness
of the coronary care unit under the guidance of Archie
Cochrane (discussed below). However, the designers of the
studies considered in this paper continue to struggle to
produce what they would understand as robust evidence. This
is especially true when they try to apply the methodology to
further questions of organisation and improving the delivery
and application of lifestyle modification alongside
preventive medication. Yet I will argue that trials of such
complex interventions are not unusual in requiring the
coordination of hybrid groups of actors and objects,
assembled together out of what might otherwise be seen as
distinct modes of policy, practice and research.
As a result we should be wary of seeing RCTs as
experiments in specific disciplines, of simply following them to
see how they are received by particular groups or looking to
see if they are applied to clinical practice. Instead,
trials are necessarily constituted through and with current
practice and personnel in a number of domains, and become a
kind of ‘transaction space’ between different collectives
(Nowotny, Scott & Gibbons, 2001). I attempt here to chart
some of the characteristics of this space and the
accommodations that may be performed within it. In
particular I look to see how these may work to re-animate
rather than restrict the complex relations that make up
clinical practice, since trials are designed with an eye to
relevance as well as rigour. It is the difficulty of pulling
off this balance between methodological adequacy and
useability (Moreira, 2005) – or internal and external
validity – that makes trials more like research than
science, and as much alchemy as assay.
The meaning of a method
Statements that the randomised controlled trial is ‘the gold
standard’ of clinical research depend on two interwoven
methodological strategies: the use of a control or
comparative approach to testing an intervention, and of
randomisation to the groups being compared. Increasingly,
randomisation is required for admissibility of evidence to
processes of further evaluation and regulation such as
systematic review, meta-analysis or guideline writing, as
professional groups and policy makers accept the position of
the RCT at the apex of evidence in healthcare. Though the
randomised comparison has therefore become the signature of
a confident empiricism, it may also be seen as a modest
response to the limits of the experimenter’s ability.
Randomised allocation and the use of single or double
blinding are solutions to the problem of unconscious as well
as conscious bias among the trial personnel and patients
(Marks, 2000), while contemporaneous controls are viewed as
necessary because the experimenter cannot halt the flow of
events outside the study and therefore cannot themselves
guarantee a true comparison (Richards, 1991).
Nevertheless, the proponents of trials have seen them
as a way of improving control over healthcare in general,
managing the pace of both therapeutic innovation and growing
demand. For Archie Cochrane, often viewed as the father of
evidence-based medicine in the UK, the RCT was the best way
of ‘producing order out of … chaos’ in the National Health
Service (1972: 11), promoting ‘effectiveness and
efficiency’, the title of his famous monograph. Cochrane was
determined to persuade doctors that the methods of
randomisation and control should be seen as ‘scientific
techniques’ (1972: 67), though practitioners were often ‘too
easily bemused by evidence from what appears to be a more
basic science,’ (1972: 30) whether that was patho-physiology
or biochemistry. His confidence in these simple strategies
was clear: ‘in the sort of applied research needed the
hypotheses about effectiveness, place of treatment, length
of stay, are readymade. The technique will nearly always be
a RCT,’ (1972: 80).[1]
[1] I am grateful to Dr Ann Kelly for drawing my attention to this passage.
Cochrane’s confidence in the method and the
identification of the research question is striking. Yet as
De Vries and Lemmens (2006) point out, there is considerable
play of judgement around the choice of the object of
randomisation (which is formulated within the research
question) and of the measurements taken to stand for benefit
or harm. Such questions may become the subject of heated
debates between different clinical groups (Jones, 2000) or
between statisticians and physicians (Marks, 1997). Thus
histories of clinical research draw attention to the ways in
which trials emerge out of conflict or collaboration between
different disciplinary groups, as well as working as means
of managing interactions between doctors, researchers and
pharmaceutical companies, or demanding patients and
governments. The design of the trial is therefore often a
much more complicated matter than Cochrane’s quote might
lead us to expect. As I have learned from ethnographic
engagement with a team carrying out a complex intervention
trial since October 2006, organising an RCT requires careful
negotiations between the requirements of rigour and
relevance, which I seek here to illustrate with examples
from published discussions of trial design. Within these it
becomes clear that agreement on the bare bones of research
methodology is not enough to deliver what Porter (1995)
describes as ‘procedural objectivity’. In the next three
sections I introduce several strategies through which
trialists attempt to strengthen their results by redrawing
the boundaries between research and practice, before
returning to the notion of transaction in my final
discussion.
Making sense of the setting of the trial –
varieties of selection
Trials need to have clear relations to practice in order to
be relevant to doctors. This understanding is built rather
explicitly into the model by which pharmaceutical products
are tested in studies categorised as lying between phase I
and IV. Phase I trials are seen as adjudicating on the
safety of drugs, usually in healthy volunteers; phase II and
III trials are used to estimate ‘efficacy’ and the way the
drug works in particular groups of patients. Finally, phase
IV trials are used to judge ‘effectiveness’, the expected
outcomes when used in the target population in the clinic.
Even so, the completion of such trials often sparks arguments
about the degree to which their results can be extrapolated
to groups who may not have been included in research, for
example women or the elderly. At the same time, phase IV
trials function as a kind of marketing, allowing companies
to encourage doctors to familiarise themselves with the
product in the context of the studies, with the opportunity
of creating impressive statistics for use in publicity. This
is an additional reason for companies to run such trials in
important markets, such as the United States, even when they
have clinical evidence from other countries. Companies cite
differences in diagnostic cultures, clinical organisation
and ethnic mix as reasons for carrying out such
location-specific research, but these claims about external validity
are not always convincing or well-understood (Rothwell,
2005). In this section I will focus on ways in which the
setting of the trial is connected with particular selections
of the patients or clinicians to be included, and the
implications of this for the data produced.
Even trials recognised as phase IV pharmacological
studies may heavily select participants, generally
justifying such actions as a way to improve the likelihood
of finding a strong relationship between the hoped-for
benefit and the drug. The designers of an early cholesterol-
lowering trial in the UK, the West of Scotland Coronary
Prevention Study, celebrated the virtues of a relatively
unselected male population identified in a particular site,
for increasing both the power and clinical relevance of the
trial.
Stable hypertension, controlled diabetes mellitus
and angina pectoris (not requiring hospitalization
within the previous year) are not exclusion
criteria since men with these conditions would
theoretically experience benefit from cholesterol
lowering therapy over the period of the trial… We
feel that the inclusion of such subjects is
appropriate since they would not normally be
included in a Secondary Prevention Study. This
term ‘primary prevention’ is a misnomer in this
type of study, since all subjects will have
coronary artery disease to a greater or lesser
degree. The partition of all men aged 45-64 into
those who are appropriate for ‘primary’ or
‘secondary’ prevention is, to some extent,
arbitrary. (West of Scotland Coronary Prevention
Study Group, 1992: 852)
The design allowed the study’s sponsor to market their
product as a primary prevention drug in ‘otherwise healthy
patients with elevated cholesterol’ (Bristol Myers Squibb
marketing material, BMJ 25/11/1995) at the conclusion of the
trial.
Another Phase IV trial, the Heart Protection Study of
vitamins and cholesterol-lowering for patients with clinical
evidence of existing vascular disease or diabetes similarly
claimed clinical applicability, but grounded that in the
possibilities of diagnostic classification rather than its
absence.
Such people are at particularly high risk of heart
attacks and strokes, and so have the most to gain
from a reduction in their risk [its website
claimed]… they are also easily identified from
existing medical records, so it should be
straightforward for doctors to use the findings
for the care of their patients. (Heart Protection
Study, 2002a: para. 7)
Yet the Study also had a long list of exclusion criteria
relating to contact between the trial and the patient or
doctor in question and reported that only ‘compliant
individuals… were randomly allocated. Of those who entered
the run in, 36% were not subsequently randomised [and] 26%
chose not to enter the trial or did not seem likely to be
compliant for 5 years,’ (Heart Protection Study, 2002b: 10).
Despite these selections, the organisers celebrated the
variety of patients included in the trial, which enrolled
women and the elderly in unusual numbers across the UK. ‘The
study’s size and the wide range of high-risk patients
included, means that doctors now have evidence that is
uniquely clear and reliable,’ (Medical Research Council,
2001). Claims to generalisability may thus be based on
different choices of patient selection in phase IV studies,
grounded in either an implicit assumption of diversity
captured in a pragmatic sample or an explicit and
disciplined hunt for variety.
A similar choice is evident in trials of complex
interventions, which are less easy to identify with any
specific phase. The ICR/Oxcheck study, reporting the year
before the West of Scotland results were released, accessed
a similar number of participants in a trial designed to
‘assess the effectiveness of health checks by nurses in
reducing risk factors for cardiovascular disease in patients
from general practice’, recruiting from just five urban
general practices in Bedfordshire. Randomisation between
being offered the intervention of a nurse check up or not
happened early in the process, among 11,000 men and women
who responded to an initial questionnaire. Despite the small
number of practices, this was celebrated as ‘stopp[ing] the
intervention from being too unrealistic by covering a high
proportion of the practice population rather than a selected
few’ and ensuring that ‘the results could be generally
applied to practices in the United Kingdom,’ (Muir et al,
1994: 311). However, those who moved out of the area over the
course of the study were excluded in each of two consecutive
analyses, presumably because the data was inaccessible, so
that the report in 1994 was based on results from just over
6000, and in 1995 on just over 4000 participants (Muir et
al, 1994 and 1995). A similar RCT, the Family Heart Study,
recruited and retained rather larger numbers of patients
from 28 practices ‘throughout Britain’ which were ‘chosen
according to specific demographic criteria’ (Wood et al,
1994: 313). As with the Heart Protection Study, the value of
this trial was located in its careful identification of a
range of subjects, rather than concentration on a few at
selected sites. Even then, the study needed additional audit
to support this claim to generalisability. In the first year
of the work quality assurance ‘processes showed that one
nurse ha[d] departed from a number of protocol
requirements,’ and this practice was excluded from the trial
(Wood et al, 1994: 315).
Randomisation alone is not enough to discipline the
actions of healthcare professionals in the trial, just as
extra care may be taken to ensure compliant patients. Though
randomisation manages possible bias at treatment allocation,
trials also rely on detailed research protocols (such as
those studied in Berg, 1997) as an attempt to guide other
actions. Unwillingness of clinical staff to apply these
protocols may itself lead to serious difficulties of
recruitment (Garcia, Elbourne & Snowdon, 2004). This has been
formalised in some situations as a failure of the ethical
requirement of ‘clinical equipoise’: a setting where there
is genuine professional uncertainty about the benefits of
the experimental intervention. Under this arrangement,
healthcare professionals are asked to select themselves for
active participation in the trial. Yet as with patients, the
relevance of these results is not guaranteed by a lack of
selection; indeed, considerable efforts may be made to select
varied and complex populations as an alternative route to
relevance. Though Cochrane celebrated randomisation as an
approach which allowed you ‘not to worry about the
characteristics of the patients’ (1972: 22), these examples
show considerable care going into questions about the type
of involvement asked of patients and other participants.
Modelling the intervention – bracketing in action
Trial designers perform a second balancing act, between the
need for their experimental intervention to be stable and
visible in the active arm and tractable and detachable
enough to be withheld in the control. We have already seen
how doubts about the durability of the screening programme
endangered the results of the Family Heart Study. A quote
from ICR/Oxcheck illustrates the difficulty of defining this
kind of intervention. ‘Nurses were instructed to counsel
patients about risk factors, with the emphasis on
ascertaining the patients’ view on change and negotiating
priorities and targets for risk reduction.’ In this case,
the threat to the study was specifically described in terms
of ‘contamination of the control group by contact between
patients and by demand for health checks from patients
assigned to be controls,’ (Muir et al, 1994: 309-310). By
recruiting participants from within practices, rather than
across them, the trial faced difficulty with its
assumption that they would act as individuals. In
comparison, in the Family Heart Study whole practices were
randomised together to the active or control groups and
within them families were made the object of study.
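The difference between the two designs can be made concrete with a small sketch. It is illustrative only: the practice and patient identifiers, the function names and the fixed seed are assumptions made for the example, and it does not reproduce the allocation procedure of either trial.

```python
import random

ARMS = ("intervention", "control")


def randomise_individuals(practices, rng):
    """Allocate each patient separately, so arms can mix within a practice."""
    return {
        patient: rng.choice(ARMS)
        for patients in practices.values()
        for patient in patients
    }


def randomise_clusters(practices, rng):
    """Allocate whole practices, so every patient in a practice shares an arm."""
    allocation = {}
    for practice, patients in practices.items():
        arm = rng.choice(ARMS)  # one draw per practice, not per patient
        for patient in patients:
            allocation[patient] = arm
    return allocation


# Hypothetical practices and patients, for illustration only.
practices = {
    "practice_a": ["a1", "a2", "a3"],
    "practice_b": ["b1", "b2", "b3"],
    "practice_c": ["c1", "c2", "c3"],
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
by_individual = randomise_individuals(practices, rng)
by_cluster = randomise_clusters(practices, rng)
```

Under individual allocation, neighbours in the same practice may end up in different arms, which is where contact between them can blur the comparison. Allocation by practice removes that possibility at the cost of treating the practice, rather than the person, as the unit of comparison.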
Pharmacological trials make similar accommodations to
social relations imagined in the process of design. For
example, the clinical relationship may be carefully set
apart from research ones, so that general practitioners or
referring physicians can veto the entry of individual
patients into the trial. While designers sometimes ask
practitioners to refrain from prescribing additional
medication that is the same or similar to that in the trial,
others propose that the research relation is adding to
rather than replacing the primary clinical one and allow
concurrent treatment. Both approaches were tried in the
Heart Protection Study, which switched to allowing
additional cholesterol reduction about halfway through.
This move may be celebrated as a way of further increasing
the heterogeneity of participants in the trial, so that it
includes subjects taking a range of other medication, but
works by bracketing rather than selection.
We noted above that randomisation is a way of managing
unknown aspects of the clinical context beyond simple bias.
Yet trial designers also have to model this context in order
to take decisions about the boundaries between the
intervention and normal care. This modelling often involves
citation from laboratory work in biochemistry or pathology.
The result is the dilution of the intense empiricism of the
RCT, disrupting the cautionary contrast made by Cochrane
between what might seem ‘more basic science’ and clinical
experiments. In the case of non-pharmacological treatments,
designers may rely on psychology, organisational theory or
even sociology to support their choice of intervention
(Campbell et al, 2000). The models that lie behind trials
may therefore gain weight from scientific credentials
accrued elsewhere (Latour, 1987), but this is not always
made clear.
Despite such potential allies, trialists from Cochrane
onwards have tried to restrict explicit reference to theory
in their activities. One of the results is the preference
for defining outcomes in terms of mortality, the ultimate
hard end-point. Despite the emphasis on the empirical value
of such a simple outcome, mortality may also be understood
as a measure of research rather than science through the
claim to ‘clinical significance’ (regardless of how people
measure the benefit or disbenefit of different illness
states, they are thought to agree on valuing life) and ease
of adjudication in the real world (different diagnostic
preferences are unlikely to alter the ways different doctors
define death).
Pharmacological trials in cardiovascular prevention may
well achieve this standard. In the West of Scotland Study,
‘pravastatin produced a significant reduction in the risk of
the combined primary end point of definite nonfatal
myocardial infarction and death from coronary heart
disease,’ (Shepherd et al, 1995: 1301). Where using
mortality as an outcome is not possible, trials rely on
chains of assumed, or surrogate, relationships that are
hybrid, in the sense of mixing the material body and the
motivation of the patient, practitioner or policy maker.
Again the Family Heart Study is a good example of the kind
of leaps this requires.
‘If reductions of 0.1 mmol/l in blood cholesterol
and 1.5 mm Hg in diastolic blood pressure (half
that observed) but no reduction in cigarette
smoking were therefore attributed to this
programme… using information from reviews of the
effects of blood pressure and cholesterol on the
risk of coronary heart disease… and making the
crucial and untested assumption that the changes
in risk factors would be maintained long term, we
estimate the long term proportionate reduction in
coronary heart disease risk to be 12%… If the
screening and intervention programme used in this
trial were implemented in the same way by every
general practice in the country, and if such
programmes achieved the same reductions in risk
factors (which were then maintained long term) and
if this was translated into prevention of
myocardial infarction and saving of lives, the
overall impact on the population burden of
coronary heart disease would be small,’ (Wood et
al, 1994: 319).
Other trials make a virtue out of producing more qualitative
effects that are close to the individual. Cupples and
McKnight (1994) claim success in ‘lessening restriction of
everyday activities’ for their health education intervention
in secondary care, ‘despite having no significant effect on
objective cardiovascular risk factors’, while Campbell et al
(1998) report ‘significant improvements in six of eight
health status domains (all functioning scales, pain,
wellbeing and general health)’ thanks to ‘secondary
prevention clinics run by nurses’ but do not report on their
subjects’ blood pressure, cholesterol and other
physiologically defined risk factors. This move need not be
seen simply as a retreat from the demands of the scientific
trial format, but also as a reworking of the trial in ways
that appeal to the general practice audience (as well as
potentially to patients: see Epstein, 1996 for a discussion
of AIDS activists’ attempts to influence the use of surrogate
markers in new studies).
Another approach legitimates accounts that ignore
variation in the application of the protocol at the level of
the doctor or patient and ignores the question of the
stability or tractability of the intervention. The
‘intention to treat’ analysis means that the results of the
trial are presented for the original randomisation around a
single object and thus defines effectiveness as a measure of
benefit achieved in spite of, rather than without, the
messiness of clinical relationships and activities. This
might include imperfect compliance of patients or
practitioners, as discussed in the previous section, but
also the unpredictability of drugs or tools in different
hands and bodies. Like the preference for mortality or
quality of life as an outcome, the intention to treat
analysis works through further strategies of bracketing
(after Riles, 2001), which operate within the apparent
transparency provided in the trial. Unlike them, it does not
fall back on citation to fill in the gaps and seems to
resist the temptation to model the intervention, relying
instead on a kind of epistemological containment. Here trial
designers again present themselves as modest reporters on
research that overlays rather than reassembles the
complexity of practice, and yet both the use of mortality
and intention to treat work as a kind of alchemy to
strengthen rather than weaken their results.
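The contrast between an intention to treat analysis and one restricted to compliant participants can be illustrated with a minimal sketch. The data, class and function names are hypothetical and are not drawn from any of the trials discussed; the point is only to show which participants each analysis counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    assigned: str   # arm allocated at randomisation: "active" or "control"
    adhered: bool   # whether the allocated protocol was actually followed
    event: bool     # whether the outcome event (for example death) occurred


def event_rate(group):
    """Proportion of participants in the group who had the event."""
    return sum(p.event for p in group) / len(group)


def intention_to_treat(participants):
    """Event rate per arm, analysed by original allocation only."""
    return {
        arm: event_rate([p for p in participants if p.assigned == arm])
        for arm in ("active", "control")
    }


def per_protocol(participants):
    """Event rate per arm, counting only those who followed the protocol."""
    return {
        arm: event_rate(
            [p for p in participants if p.assigned == arm and p.adhered]
        )
        for arm in ("active", "control")
    }


# Hypothetical records: two non-adherent participants in the active arm
# both experience the event.
participants = (
    [Participant("active", True, False)] * 6
    + [Participant("active", True, True)] * 2
    + [Participant("active", False, True)] * 2
    + [Participant("control", True, False)] * 6
    + [Participant("control", True, True)] * 4
)

itt = intention_to_treat(participants)  # active 4/10, control 4/10
pp = per_protocol(participants)         # active 2/8, control 4/10
```

In this invented example the intention to treat rates are identical across arms, while the analysis restricted to adherent participants shows an apparent benefit. Reporting by original allocation keeps the messiness of compliance inside the result rather than filtering it out.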
Embedding trials in practice – proceeding by
incorporation
The final strategy has elements in common with both the
previous ones, but takes us into new territory. Relevance is
assured through recruitment, but here of objects and systems
rather than people or organisations. Without being
celebrated, such techniques support claims of
generalisability while avoiding compromising the intense
empiricism of the RCT by incorporating elements of practice
that are not seen as requiring, nor yet refusing,
explanation.
For pharmacological trials, the most obvious example of
this is the drug in pill form, which can be quickly inserted
into existing practices of prescription and (possible)
compliance or concordance. Trials of complex interventions
are also increasingly arranged around existing
organisational forms and objects in the clinic. The SHIP
trial (Jolly et al, 1999) introduced nurses into secondary
prevention, but asked them to coordinate current services
rather than institute new clinics, as happened in the
ICR/Oxcheck and Family Heart studies. More strikingly the
POST trial of secondary prevention (Feder et al, 1999)
intervened by sending postal prompts to patients and GPs
suggesting appointments for follow-up care. Another group
reported on the presentation of guidelines to improve
processes of care via pre-existing software in general
practices chosen ‘because their computer systems were
extensively used.’ Nevertheless the team acknowledged
difficulties in producing robust results and interpreted
these in terms of a failure to adapt to the imagined
realities of the field. ‘Computerised support systems for
decision-making must be integrated into the clinical
workflow. They must present the right information, in the
right format, at the right time, without requiring special
effort,’ (Eccles et al, 2002: 941).
These strategies recall the ‘technogovernance’ described
by May et al (2006) who discuss the ways in which decision
support tools may mediate between evidence-based and
patient-centred approaches to medicine, allowing doctors to
share uncertainty. Even counselling has been made amenable
to the RCT under this apparently permissive regime. Thus a
trial concluded that ‘counselling directed at behavioural
and attitudinal change may produce greater changes than
traditional educational approaches to health promotion,
particularly when tailored to the individual’s readiness to
change,’ (Steptoe et al, 1999: 943).
RCTs have also reached towards accommodation between
the fields of practice and policy (as well as practice and
research). Baker et al used existing guidelines from the
North of England Development Project on angina, which were
then reformatted to encourage improved processes of care.
The trial designers further defined one of their outcomes as
‘the proportion of patients with raised blood pressure
managed in accordance with the British Hypertension
Society’s guidelines’ (2003: 284). Like models, guidelines
may come with credibility attached. A more opportunistic
approach is expressing baseline data and outcomes in terms
of ongoing audit parameters. Thus the POST trial claimed
success in terms of the ‘measurement and recording of
coronary risk factors’ in patient records. In ASSIST the
effects of ‘a register and recall system’ to either the
doctor or the nurse and ‘audit’ alone were evaluated in
relation to a primary outcome defined as ‘adequate
assessment of three risk factors’ (Moher et al, 2001), a
standard derived in turn from a modernisation plan contained
in the National Service Framework on Coronary Heart Disease
(Department of Health, 2000).
These measures of process suggest a rather different
solution to the need for relevance than the careful
recording of physiological measures in the ICR/Oxcheck and
Family Heart studies. Designers are much less preoccupied
with the possible ‘scientific’ aspects of their work than
with improving practice in line with policy. Though the
decision aids and guidelines used in such research are
potentially controversial and often judged ineffective, they
may nevertheless both give and gain stability in the trial.
The observation supports the work of Cambrosio et al (2006)
on the emergence of ‘regulatory objectivity’ through the
tools and standards crossing research and regulation in
oncology, as well as analysis by Timmermans and Berg (1997,
2003) on standardisation and the use of protocols in
medicine. In these cardiovascular examples, using such
practical ‘ready-mades’ for information management and
reporting allows trial designers to make the RCT amenable to
an imagined ‘real world’, in a move logically prior to the
remaking of that practice in implementing trial results
(cf. Berg, 1997).
Measures of process and the use of existing systems for
data collection also bring the RCT closer to practice
through the activities of audit and evaluation. Partly as a
result of this, the distinction between research and audit
has become problematic for institutions within the National
Health Service. Because research requires additional ethical
oversight, healthcare practitioners are increasingly likely
to define their investigations as audit and stress their
everyday nature as an integral part of the work of
management and care, but the boundaries are unclear. In one
of the most prominent efforts to resolve this uncertainty,
the Central Office for Research Ethics Committees has issued
guidance that proposes that one of the key differences is
randomisation. According to this, research may involve
randomised allocation to comparison groups: audit and health
services evaluation will not do so (COREC, 2006). However,
this distinction is not always used in practice, and other
attempts fall back on differences discerned between the aims
of these activities. Thus COREC defines research as ‘the
attempt to derive generalisable new knowledge’ in comparison
with the efforts of audit and evaluation to produce mere
‘information’ on local care.
There is some irony here. Efforts to improve the
contextualisation of the RCT as research moves from phase I
to IV are already equated with improving the
generalisability of the results from small groups in well-
controlled settings. The previous sections of this paper
have emphasised that selection goes on in phase IV, but on
different grounds, while the methodology may be used to
allow bracketing of the less controlled aspects of practice,
or incorporation of its most stable elements. The addition
of audit and service evaluation further disrupts the
original continuum by proposing that surrendering to local
aims is both a categorical shift and a step back
towards a more narrow yet robust conclusion. However, as I
note above, these attempts are still unstable and contested
on practical and political grounds, as well as
epistemological ones.
From the other side, proponents of Evidence-Based Policy
are even less able to insist on the dominance of the RCT
than supporters of EBM, and regulation is frequently
grounded in descriptive studies or audit as evidence of what
can be done with the resources at hand. In the case of the
government, the local nature of such studies is often not
seen as problematic and they are asked to work simply as
‘examples of good practice which could be applied more
widely,’ (Department of Health 2000: 11). Yet both the
National Service Framework on Coronary Heart Disease and the
independent Campbell Collaboration also celebrate examples
of enthusiasm and involvement as improving the adequacy of
research and development. The Collaboration exists to review
and disseminate evidence relating to social interventions of
various kinds. It embodies one of the more ‘purist’
conceptions of evidence with its emphasis on the RCT, or
non-randomised studies of high quality, but nevertheless
advocates managing bias ‘through a variety of approaches such as
abiding by high standards of scientific evidence, ensuring
broad participation, and avoiding conflicts of interest,’
(Campbell Collaboration, undated. Emphasis added).
Discussion: puzzling at the boundaries
In the past two decades students of science and
technology have followed Latour (1988) and others in
becoming increasingly wary of providing their own
definitions of ‘science’. Instead there has been growing
interest in mapping descriptions provided from within
disciplines claiming the title, as exemplified in a now
classic paper by Gieryn (1983). Such descriptions may in
turn be used as a measure against which the claims of
scientific practice can be characterised and even shown
wanting. The approach is encapsulated in studies of bias,
such as the one quoted in my introduction (De Vries and
Lemmens, 2006), and may be a very useful starting point for
critical accounts of particular fields of medical research.
However, the approach of this paper, following Gieryn, has
been rather to look to see when and how such boundaries may
be important for a specific set of informants. Such a
strategy produces some interesting findings.
While the methodology of the RCT continues to be
claimed as scientific, accounts of doing trials are more
likely to be described as research, with cheerful
acknowledgement of the importance of application and
estimation and limited claims to knowledge based on
empirical rather than theoretical foundations. The designers
of RCTs have a complicated relationship with explanation and
with other preoccupations of academic disciplines, though
they seek to use their empiricism to produce general
knowledge. Trials may even be described as audit or
evaluation if by doing so designers can avoid additional
regulatory burdens, though they must balance this concern
with their wish to achieve recognition through publication
in journals.
In designing their RCTs, trialists tend not to worry
about the categories of science or research in themselves.
Though they must clearly define the focus of their
investigation, a process that leads by implication to the
definition of other aspects as ‘context’, they show less
interest in what Knorr Cetina (1999) describes as ‘the work
of boundary maintenance with regard to the natural and
everyday order,’ (1999: 44) than in achieving relevance
through strategic connections with that order. Are these
efforts examples of ‘boundary work’ – the term suggested by
Gieryn to describe his own cases?[2] If we would wish to
restrict this concept to work around science, it might be
appropriate to drop it at once. A second and more compelling
reason is perhaps that the term was originally used to
describe attempts to maintain and extend a division, even
though the repertoire with which this was done was flexibly
applied. In contrast, I have tried to demonstrate work
undertaken to cross, as well as maintain, a line that is
more likely to be seen as a problem than a strength of
medical research that uses the basic components of the RCT
methodology.
[2] A fuller definition of this concept is ‘an effective ideological style for protecting professional
autonomy: public scientists construct a boundary between the production of scientific knowledge and its
consumption by non-scientists’ (Gieryn 1983: 789).
I started this paper with Cochrane’s enthusiasm for the
rigour of the randomised controlled trial. He also praised
its ‘wide applicability’ (Cochrane and Blythe, 1989: 159),
but in a recent discussion of the problem of external
validity, Rothwell reminds us that Cochrane was well aware
of the potential distance between trial and practice,
describing a ‘gulf between measurements based on RCTs and
benefit… in the community,’ (2005: 82). More recently this
‘gulf’ has been reinvented as an implementation ‘gap’, where
rather than practice being inferior to the contemporary
trial because of differences of resource and attention,
practice is viewed as lagging behind the results of trials.
This perception has been given much greater relevance by the
use of RCTs beyond the direct clinical sphere for the
assessment of cost-effectiveness by organisations like the
National Institute for Health and Clinical Excellence as
Cochrane’s ideas have been incorporated into health service
management and regulation.
The paper has described ways in which researchers set
out to narrow this gap by increasing the generalisability or
applicability of RCT results through design strategies
modifying the pure world of the experiment, among which I
have singled out selection, bracketing and incorporation. A
number of the trial designs considered here use the ability
of persons or objects to exist in two or more different
spheres at the same time to support such strategies.
Examples might include general practices as units of care;
protocols; mortality records; questionnaires; prompts and
decision aids; the organisational structures of audit and
other metrological regimes. Their identification owes much
to Star and Griesemer’s well-known account of ‘boundary
objects’ as ‘plastic enough to adapt to local needs and the
constraints of several parties employing them, yet robust
enough to maintain a common identity across sites,’ (1989:
508). Unfortunately, this does not tell us much about the
dynamics or effects of these relations, nor the moments when
participants resist the explicit enrolment or elucidation of
separate elements of practice in favour of bracketing
unexplored assemblages. To explore those issues, it is
helpful to develop an understanding of RCTs as a
‘transaction space’ (Nowotny, Scott & Gibbons, 2001: 97).
Without specifically describing elements of my own
developing ethnography, the best account I can find of the
trial as a zone of transaction comes from Cochrane’s
autobiography. The story comes from his discussion of his
growing interest in the 1960s in trials of the place of
treatment. Rather ruefully he recalls the breakdown of
randomisation in the first of these trials, comparing
inpatient and outpatient treatment of varicose veins, as a
result of the additional purification carried out by a
surgical registrar ‘abstracting “interesting” cases from one
side,’ (Cochrane and Blythe, 1989: 208). His next project
was to be a trial of coronary care units versus home care
for people who had suffered heart attacks. The origins of
this study are identified in his own research interest,
concern from someone at the Department of Health and Social
Security (DHSS) about the cost of these units and the
clinical doubts expressed by a cardiac consultant in
Bristol. Cochrane set out to enrol the DHSS and Medical
Research Council (MRC) in a discussion about the ethics of a
study at a national level. Meanwhile he started talking to
local general practitioners and cardiac consultants in
Cardiff, who ‘seemed interested and not particularly
antagonistic, particularly as there was a chance of their
getting a new coronary care unit through the plan,’ (1989:
209). A meeting of the joint DHSS and MRC Committee provided
the first confrontation between elites in research and
cardiology, but gave approval for the trial. ‘The very next
day the cardiac consultants made it clear to me that
whatever Lord Platt had said there was going to be no
randomised controlled trial of coronary care units in
Cardiff. I was at first disbelieving, then furious, and
cursed the day I had become a professor in the hope of
better cooperation with consultants. Later I became
fascinated by the psychology underlying the decision,’
(1989: 210).
Cochrane’s story neatly encapsulates the ways in which
different actors bring different aspirations for the trial
to the table. Though these agendas may nevertheless be
played out under the umbrella of the RCT methodology,
Cochrane was clearly well aware of the difficulties of
design and organisation facing any single trial. In the
process of designing a study, even within the apparently
rigid methodological stipulations, the RCT thus becomes a
‘transaction space’ in which all groups have something to
bring and take away in a project shaped throughout by
context (Nowotny, Scott & Gibbons, 2001: 141). This meshing
of networks as well as materials cannot be reduced to bias:
without it knowledge production could not go ahead. Even in
this 1960s example, authoritative ethical approval is the
prerequisite for such a project, but Cochrane is confident
initially that the trial methodology can also accommodate
the diverse networks he describes. Though randomisation is
designed to eliminate bias, the trial cannot go ahead
without working on and through recognisable emotions,
passions and ideology (Latour, 1998). Similarly in the
contemporary trials discussed in this paper, researchers
enrol diverse assemblages of policy-makers, doctors, nurses,
patients who must all be ‘interested’ in at least part of
the processes captured in research. Along with the
incorporation of ready-made units of organisation and
measurement, such selection allows trialists to produce
knowledge that is contextualised as a condition of its
elaboration. The need for ethical approval is thus only one
of the sites where social concerns shape the nature of the
experiment, despite the absence in the trials discussed in
this paper of particularly active patients or communities.
However, not all contextualisation relies on processes
of making things explicit. Other actors and objects are
involved in research but are made real and allowed to remain
attached to practice precisely by being excluded from the
twin techniques of randomisation and control. This paradox
was debated in a conversation between Alvan Feinstein and a
Dr Pinsky reported in the Proceedings of a Workshop on
Evaluation of Therapy, which took place in September 1983.
Commenting on a paper on trial methodology Feinstein argued
that the ‘messiness’ of the situation (with difficulties
selecting patients, holding interventions constant etc.)
means that the RCT, even with intention to treat analysis,
becomes meaningless. However, for Pinsky, ‘in the very
situation in which there is so much messiness, you are
suggesting the use of a messy technique. I would propose to
you that such a situation is just the type in which a messy
technique will yield no answer at all.’ Responding,
Feinstein argues that they should be more ambitious to study
‘the total world instead of that world that we can
conveniently capture,’ (Discussion following Brown 1984:
350). The conversation recalls Law on the need to
acknowledge ‘mess’ in social scientific research. His
solution, like Feinstein’s, is to accept more ‘messy’
techniques, yet he is also alive to the power of what he
calls ‘manifest’ rather than ‘Othered’ absence in accounts
of methodology, which seems to resonate with the strategies
of bracketing and incorporation discussed here (2004: 14).
Researchers in the field of cardiovascular research continue
debates on these issues as much through their own design
choices as through such self-consciously methodological
discussions, but the connection also reminds us that these
decisions may be relevant for our own research practice.
Conclusion
Randomised controlled trials have many things in common with
other tests, including assays. They share a quality of
public demonstration explored by Schaffer (2005) and the
importance of staged evaluation of the ingredients or
quality of objects. A substantial note of formality is
introduced by the demands made for randomisation and control
that appear to effect what Latour (1993) would understand as
purification. Yet other aspects of trial design expose the
difficulties of defining both the content and context of the
experiment and acknowledge the limits of scientific
practice. In particular, the designers of RCTs are careful
to use selection and incorporation as ways of claiming
relevance as well as rigour, and fully exploit the
possibilities that randomisation opens up for strategic
retreats from comprehensive control as well as from
explanation. Though the RCT continues to be celebrated as a
bulwark against bias, complex sets of persons and things
must be involved and interested in the production of
knowledge. In the process the categories of science,
research and practice are reworked in the struggle for
persuasive evidence. The result is the ‘alchemy of the RCT’,
characterised by the ritual invocation of randomisation and
control as tools to transform these imperfect materials into
the stuff of certainty.
Acknowledgements
This is a development of a paper first presented at the
European Association for the Study of Science and Technology
meeting in Paris 2004. I would particularly like to thank
the conveners of a session on Evidence in Practice for the
stimulus to consider these issues, and among them Tiago
Moreira for his helpful comments and encouragement. I am
also grateful to the three anonymous reviewers for their
useful suggestions on the paper in its current form. The
research was made possible by two awards, one for work on my
PhD and a second for postdoctoral study (ESRC R42200134004
and ESRC/MRC PTA 037270093) in the field of cardiovascular
research and innovation. (112 words)
References
Baker, R., Fraser, R.C., Stone, M., Lambert, P., Stevenson, K. and Shiels, C. (2003). ‘Randomised controlled trial of the impact of guidelines, prioritised review criteria and feedback on implementation of recommendations for angina and asthma.’ British Journal of General Practice, 53, 284-291.
Berg, M. (1997). Rationalizing Medical Work: Decision-Support Techniques and Medical Practices. The MIT Press: Cambridge MA and London.
Brown, BW. (1984). ‘The randomized clinical trial.’ (Printed with following discussion) Statistics in Medicine 3, 307-311.
Callon M. and Rabeharisoa V. (2003). ‘Research in the wild and the shaping of new social identities.’ Technology in Society, 25, 193-204.
Cambrosio A, Keating P, Schlich T and Weisz G (2006) ‘Regulatory Objectivity and the Generation and Management of Evidence in Medicine,’ Social Science & Medicine, 63, 189-99.
Campbell Collaboration (undated) About the Campbell Collaboration [Accessed online 18 January 2007] www.campbellcollaboration.org/About.asp
Campbell, N, Thain, J, Deans, H, George, R, Lewis D., Rawles, JM, Squair, JL. (1998). ‘Secondary prevention clinics for coronary heart disease: randomised trial of effect on health.’ BMJ 316, 1434-7.
Campbell, M, Fitzpatrick, R, Haines, A, Kinmouth A-L, Sandercock P, Spiegelhalter D and Tyrer D (2000). ‘Framework for design and evaluation of complex interventions to improve health.’ BMJ, 321, 694-6.
Cochrane, A. (1972). Effectiveness and Efficiency. Random Reflections on Health Services. Nuffield Provincial Hospitals Trust: London.
Cochrane, A. and Blythe, M. (1989). One Man’s Medicine. An autobiography of Professor Archie Cochrane. The British Medical Journal: London.
COREC (2006). Differentiating audit, service evaluation and research. [Accessed online 18 January 2007] www.corec.org.uk applicants/help/docs/Audit_or_Research_table.pdf.
Cupples, M.E. and McKnight, A. (1994). ‘Randomised controlled trial of health promotion in general practice for patients at high cardiovascular risk.’ BMJ, 309, 993-6.
De Vries, R. and Lemmens, T. (2006). ‘The social and cultural shaping of medical evidence: Case studies from pharmaceutical research and obstetric science.’ Social Science & Medicine 62, 11, 2694-2706.
Department of Health (2000). National Service Framework on Coronary Heart Disease. HMSO: London.
Eccles M., McColl E, Steen N, Rousseau N, Grimshaw J., Parkin D, Purves I. (2002). ‘Effect of computerised evidence-based guidelines on management of asthma and angina in adults in primary care: cluster randomised controlled trial.’ BMJ, 315, 941
Epstein, S. (1996). Impure science: AIDS, activism and the politics of knowledge. Berkeley: University of California Press.
Feder, G., Griffiths, C., Eldridge S, Spence M. (1999). ‘Effect of postal prompts to patients and general practitioners on the quality of primary care after a coronary event (POST): randomised controlled trial.’ BMJ 318, 1522-6.
Garcia, J., Elbourne, D. and Snowdon, C. (2004). ‘Equipoise: a case study of the views of clinicians involved in two neonatal trials.’ Clinical Trials 4, 1, 170-178.
Gieryn, T. F. (1983). ‘Boundary-work and the demarcation of science from nonscience: strains and interests in professional ideologies of scientists.’ American Sociological Review 48, 6, 781-795.
Heart Protection Study Collaborative Group (2002a). Questions and answers. [Accessed online 18 January 2007] www.ctsu.ox.ac.uk/~hps/June02QandA.shtml
Heart Protection Study Collaborative Group (2002b). ‘MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20,536 high-risk individuals: a randomised placebo-controlled trial.’ The Lancet 360, 7-22.
Jolly K., Bradley F., Sharp S., Smith H., Thompson S, Kinmouth A-L and Mant D. on behalf of the SHIP collaborative group (1999). ‘Randomised controlled trial of follow up care in general practice of patients with myocardial infarction and angina: final results of the Southampton heart integrated care project (SHIP).’ BMJ 318, 706-11.
Jones D.S. (2000) Vision of a cure: visualization, clinical trials, and controversies in cardiac therapeutics, 1968-1998. Isis 91, 3, 504-541
Knorr Cetina K. (1999). Epistemic cultures. How the sciences make knowledge. Harvard University Press: Cambridge MA.
Latour, B. (1987). Science in Action. Open University Press: Milton Keynes, UK
Latour B. (1988) The Pasteurisation of France. Harvard University Press, Cambridge MA and London, England.
Latour, B. (1993). We have never been modern. Harvard University Press: Cambridge MA. Translated Catherine Porter.
Latour, B. (1998). ‘From the world of science to the world of research?’ Science 280, 208-209.
Law, J. (2004). After method. Mess in social science research. Routledge: London and New York.
Marks, H.M. (1997). The Progress of Experiment. Science and Therapeutic Reform in The United States, 1900-1990. Cambridge University Press: Cambridge.
Marks, H.M. (2000). ‘Trust and Mistrust in the Marketplace: Statistics and Clinical Research, 1945-1960.’ History of Science 38, 343-355.
May, C., Rapley, T., Moreira, T., Finch, T. and Heaven, B. (2006). ‘Technogovernance: evidence, subjectivity and the clinical encounter in primary care medicine.’ Social Science & Medicine 62, 4, 1022-1030.
Medical Research Council (2001) Press release for the Heart Protection Study. [Accessed online 17 August 2005] www.mrc.ac.uk/txt/index/public-interest/public-news_centre/public-press_office/public-press_releases_2001/public-13_november_2001b.htm
Moher M., Yudkin P, Wright, L., Turner R., Fuller A, Schofield T. Mant D. for the Assessment of Implementation Strategies (ASSIST) trial collaborative group (2001). ‘Cluster randomised controlled trial to compare three methods of promoting secondary prevention of coronary heart disease in primary care.’ BMJ 322, 1-7.
Moreira, T. (2005). ‘Diversity in clinical guidelines: the role of repertoires of evaluation.’ Social Science & Medicine 60, 9, 1975-1985.
Muir J., Mant D., Jones L., Yudkin P., on behalf of the Imperial Cancer Research Fund OXCHECK study group (1994). ‘Effectiveness of health checks conducted by nurses in primary care: results of the OXCHECK study after one year.’ BMJ 308, 308-12.
Nowotny, H., Scott, P., & Gibbons, M. (2001). Re-thinking science: knowledge and the public in an age of uncertainty. Cambridge: Polity Press.
Porter, T. (1995). Trust in numbers. The pursuit of objectivity in science and public life. Princeton University Press: Princeton New Jersey.
Richards, E. (1991). Vitamin C and Cancer: Medicine or Politics? Macmillan: London.
Riles, A. (2001). The network inside out. University of Michigan Press, Ann Arbor.
Rothwell, P. (2005). ‘Treating individuals 1. External validity of randomised controlled trials: “To whom do the results of this trial apply?”’ The Lancet 365, 82-93.
Schaffer, S. (2005). ‘Public Experiments’ in Making Things Public. Atmospheres of Democracy, ed. by B. Latour and P Weibel. The MIT Press: Cambridge MA and London, England. 298-307.
Shepherd J., Cobbe S.M, Ford I., Isles C.G., Lorimer A.R., Macfarlane P.W., McKillop, J.H. and Packard C.J. for the West of Scotland Coronary Prevention Study Group (1995). ‘Prevention of coronary heart disease with pravastatin in men with hypercholesterolaemia.’ New England Journal of Medicine 335, 1001-9
Star, S.L. and Griesemer, J.R. (1989). ‘Institutional ecology, ‘translations’ and boundary objects: amateurs and professionals in Berkeley’s museum of vertebrate zoology.’ Social Studies of Science 19, 3, 387-420.
Steptoe, A., Doherty, S., Rink, E., Kerry S., Kendrick T and Hilton S. (1999). ‘Behavioural counselling in general practice for the promotion of healthy behaviour among adults at increased risk of coronary heart disease: randomised trial.’ BMJ 319, 943-8.
Timmermans, S. and Berg, M. (2003). The Gold Standard. The Challenge of Evidence-Based Medicine and Standardization in Health Care. Temple University Press: Philadelphia.
Timmermans S and Berg M (1997). ‘Standardisation in Action: Achieving Local Universality through Medical Protocols.’ Social Studies of Science 27, 2, 273-305.
West of Scotland Coronary Prevention Study Group (1992). A coronary primary prevention study of Scottish men aged 45-64 years: trial design. Journal of Clinical Epidemiology 45, 849-860.
Wood, D.A., Kinmouth, A-L., Davies G A., Yarwood J., Thompson S.G., Pyke, S.D.M., Kok Y., Cramb R., Le Guen, C., Marteau T.M., Durrington P.N. for the Family Heart Study Group (1994). ‘Randomised controlled trial evaluating cardiovascular screening and intervention in general practice: principal results of British family heart study.’ BMJ 308, 313-320.