The alchemy of clinical trials

This is a post-peer-review version of an article published in Biosocieties. The definitive publisher-authenticated version, Will, C. (2007) The alchemy of clinical trials. Biosocieties. 2, 1, 85-100, is available online at: http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1002728&fileId=S1745855207005078

Biosocieties, Special Issue, March 2007

An elusive evidence base? The construction and governance of randomised controlled trials.

The alchemy of clinical trials

Catherine M Will

Contact details:

Catherine M Will
Department of Sociology
University of Sussex
Falmer
East Sussex
BN1 9SN

Email. [email protected]

Tel. (until Easter 2007) 01223 334617
Fax. 01223 335993

Word count: 7000 plus abstract (120), acknowledgements (112)

and references (1238).

Abstract

This paper considers the complex construction of randomised

controlled trials that lie behind the rhetoric of the gold

standard. Drawing on insights from Science and Technology

Studies and empirical material, I argue that trials in the

field of cardiovascular disease prevention are constituted

as ‘research’ rather than ‘science’. In these examples,

control emerges out of a dual concern with practices of

purification and involvement of aspects of the world outside

the experiment, as the designers of trials aspire to

relevance as well as rigour. Such strategies of

‘contextualisation’ mean that trials are more like alchemy

than assay, proceeding by complex moves to transform ‘base’

matters rather than distilling elements of clinical

practice.

Keywords

clinical trial; RCT; contextualisation; boundary work;

transaction space

Introduction

This paper considers the complex construction of randomised

controlled trials (RCTs) by placing empirical material

alongside insights from Science and Technology Studies.

Sociological responses to evidence-based medicine (EBM) have

often expressed concerns about the application of a

hierarchy of evidence, with the RCT at its apex, to the

world of the clinic. They draw attention to the continued

use and value of observational research, expertise and

experience, existing practice and patient preference in

shaping care, yet often leave the RCT itself under-examined.

My narrow focus allows a more careful discussion of the ways

in which trials themselves generate categories of evidence

and practice in one field of medicine.

The title borrows from the conclusion of a recent paper

by De Vries and Lemmens where the authors argue that ‘the

alchemy of the clinical trial transforms EBM from a

challenger to a protector of corporate agendas’ (2006: 2704)

and identify what they call the ‘structural’ and ‘cultural’

sources of bias in the design of research in Dutch

obstetrics. Such detailed studies clearly chime with my own

interest in understanding the production of knowledge in

medicine, but rather than seek out bias in this article I am

interested in the ways in which modifications of the ‘pure’

world of the experiment may also be seen as strengthening

the evidence it is intended to produce. Here I take

inspiration from recent work charting the importance of

‘contextualisation’ in research (Nowotny, Scott &

Gibbons, 2001).

These authors argue that clinical trials may become

examples of a more contextualised science when patients are

able to modify the design and organisation of medical

research, as happens in the cases discussed by writers such

as Callon and Rabeharisoa (2003) and Epstein (1996). Callon

and Rabeharisoa’s work has also been used by Latour (1998)

to argue that there is a shift from ‘science’ associated

with certainty and detachment, to ‘research’, which is

animated by involvement. Though the clinical trials examined

here are not the subject of patient activism, nonetheless

they do proceed by involving healthcare staff, objects and

aspects of daily organisation. As examples of public

experiments (Schaffer, 2005) the emphasis is still on the

method rather than the identity of those who apply it. But

this performance is not concentrated on the typical tasks of

assay: extracting and classifying separate elements. Instead

the method is used to accommodate continuing uncertainty

about the contents and context for specific clinical proofs,

and gains strength as much from incorporation as

purification.

The trials I use to illustrate this claim include

studies of pharmaceutical products and what have become known as

‘complex interventions’ (Campbell et al, 2000) intended to

help prevent cardiovascular disease in the UK population.

These have combined limited knowledge about lifestyle

factors such as overweight, diet and exercise from the 1970s

and 1980s with the increasing use of both old and new drugs

designed to lower blood pressure and cholesterol. The

challenge is now seen as coordinating and effectively

applying some of these ideas to patients at particular risk,

whether they have already had heart disease (secondary

prevention) or not (primary or primordial prevention).

Cardiology itself has been the site of much of the

stabilisation of the RCT methodology in the last four

decades, especially through large-scale drug trials. In the

UK RCTs were also used relatively early to adjudicate on the

issue of the location of care, in work on the effectiveness

of the coronary care unit under the guidance of Archie

Cochrane (discussed below). However, the designers of the

studies considered in this paper continue to struggle to

produce what they would understand as robust evidence. This

is especially true when they try to apply the methodology to

further questions of organisation and improving the delivery

and application of lifestyle modification alongside

preventive medication. Yet I will argue that trials of such

complex interventions are not unusual in requiring the

coordination of hybrid groups of actors and objects,

assembled together out of what might otherwise be seen as

distinct modes of policy, practice and research.

As a result we should be wary of seeing RCTs as

experiments in specific disciplines, of simply following to

see how they are received by particular groups or looking to

see if they are applied to clinical practice. Instead,

trials are necessarily constituted through and with current

practice and personnel in a number of domains, and become a

kind of ‘transaction space’ between different collectives

(Nowotny, Scott & Gibbons, 2001). I attempt here to chart

some of the characteristics of this space and the

accommodations that may be performed within it. In

particular I look to see how these may work to re-animate

rather than restrict the complex relations that make up

clinical practice, since trials are designed with an eye to

relevance as well as rigour. It is the difficulty of pulling

off this balance between methodological adequacy and

useability (Moreira, 2005) – or internal and external

validity – that makes trials more like research than

science, and as much alchemy as assay.

The meaning of a method

Statements that the randomised controlled trial is ‘the gold

standard’ of clinical research depend on two interwoven

methodological strategies: the use of a control or

comparative approach to testing an intervention, and of

randomisation to the groups being compared. Increasingly,

randomisation is required for admissibility of evidence to

processes of further evaluation and regulation such as

systematic review, meta-analysis or guideline writing, as

professional groups and policy makers accept the position of

the RCT at the apex of evidence in healthcare. Though the

randomised comparison has therefore become the signature of

a confident empiricism, it may also be seen as a modest

response to the limits of the experimenter’s ability.

Randomised allocation and the use of single or double

blinding are solutions to the problem of unconscious as well

as conscious bias among the trial personnel and patients

(Marks, 2000), while contemporaneous controls are viewed as

necessary because the experimenter cannot halt the flow of

events outside the study and therefore cannot themselves

guarantee a true comparison (Richards, 1991).

Nevertheless, the proponents of trials have seen them

as a way of improving control over healthcare in general,

managing the pace of both therapeutic innovation and growing

demand. For Archie Cochrane, often viewed as the father of

evidence-based medicine in the UK, the RCT was the best way

of ‘producing order out of … chaos’ in the National Health

Service (1972: 11), promoting ‘effectiveness and

efficiency’, the title of his famous monograph. Cochrane was

determined to persuade doctors that the methods of

randomisation and control should be seen as ‘scientific

techniques’ (1972: 67), though practitioners were often ‘too

easily bemused by evidence from what appears to be a more

basic science,’ (1972: 30) whether that was patho-physiology

or biochemistry. His confidence in these simple strategies

was clear: ‘in the sort of applied research needed the

hypotheses about effectiveness, place of treatment, length

of stay, are readymade. The technique will nearly always be

a RCT,’ (1972: 80).1

Cochrane’s confidence in the method and the

identification of the research question is striking. Yet as

De Vries and Lemmens (2006) point out, there is considerable

play of judgement around the choice of the object of

randomisation (which is formulated within the research

question) and of the measurements taken to stand for benefit

or harm. Such questions may become the subject of heated

debates between different clinical groups (Jones, 2000) or

between statisticians and physicians (Marks, 1997). Thus

histories of clinical research draw attention to the ways in

which trials emerge out of conflict or collaboration between

different disciplinary groups, as well as working as means

of managing interactions between doctors, researchers and

pharmaceutical companies, or demanding patients and

governments. The design of the trial is therefore often a

much more complicated matter than Cochrane’s quote might

lead us to expect. As I have learned from ethnographic

1 I am grateful to Dr Ann Kelly for drawing my attention to this passage.

engagement with a team carrying out a complex intervention

trial since October 2006, organising an RCT requires careful

negotiations between the requirements of rigour and

relevance, which I seek here to illustrate with examples

from published discussions of trial design. Within these it

becomes clear that agreement on the bare bones of research

methodology is not enough to deliver what Porter (1995)

describes as ‘procedural objectivity’. In the next three

sections I introduce several strategies through which

trialists attempt to strengthen their results by redrawing

the boundaries between research and practice, before

returning to the notion of transaction in my final

discussion.

Making sense of the setting of the trial –

varieties of selection

Trials need to have clear relations to practice in order to

be relevant to doctors. This understanding is built rather

explicitly into the model by which pharmaceutical products

are tested in studies categorised as lying between phase I

and IV. Phase I trials are seen as adjudicating on the

safety of drugs, usually in healthy volunteers; phase II and

III trials are used to estimate ‘efficacy’ and the way the

drug works in particular groups of patients. Finally, phase

IV trials are used to judge ‘effectiveness’, the expected

outcomes when used in the target population in the clinic.

Even so, the completion of such trials often sparks arguments

about the degree to which their results can be extrapolated

to groups who may not have been included in research, for

example women or the elderly. At the same time, phase IV

trials function as a kind of marketing, allowing companies

to encourage doctors to familiarise themselves with the

product in the context of the studies, with the opportunity

of creating impressive statistics for use in publicity. This

is an additional reason for companies to run such trials in

important markets, such as the United States, even when they

have clinical evidence from other countries. Companies cite

differences in diagnostic cultures, clinical organisation

and ethnic mix as reasons for carrying out such location

specific research, but these claims about external validity

are not always convincing or well-understood (Rothwell,

2005). In this section I will focus on ways in which the

setting of the trial is connected with particular selections

of the patients or clinicians to be included, and the

implications of this for the data produced.

Even trials recognised as phase IV pharmacological

studies may heavily select participants, generally

justifying such actions as a way to improve the likelihood

of finding a strong relationship between the hoped-for

benefit and the drug. The designers of an early cholesterol-

lowering trial in the UK, the West of Scotland Coronary

Prevention Study, celebrated the virtues of a relatively

unselected male population identified in a particular site,

for increasing both the power and clinical relevance of the

trial.

Stable hypertension, controlled diabetes mellitus

and angina pectoris (not requiring hospitalization

within the previous year) are not exclusion

criteria since men with these conditions would

theoretically experience benefit from cholesterol

lowering therapy over the period of the trial… We

feel that the inclusion of such subjects is

appropriate since they would not normally be

included in a Secondary Prevention Study. This

term ‘primary prevention’ is a misnomer in this

type of study, since all subjects will have

coronary artery disease to a greater or lesser

degree. The partition of all men aged 45-64 into

those who are appropriate for ‘primary’ or

‘secondary’ prevention is, to some extent,

arbitrary. (West of Scotland Coronary Prevention

Study Group, 1992: 852)

The design allowed the study’s sponsor to market their

product as a primary prevention drug in ‘otherwise healthy

patients with elevated cholesterol’ (Bristol Myers Squibb

marketing material, BMJ 25/11/1995) at the conclusion of the

trial.

Another Phase IV trial, the Heart Protection Study of

vitamins and cholesterol-lowering for patients with clinical

evidence of existing vascular disease or diabetes, similarly

claimed clinical applicability, but grounded that in the

possibilities of diagnostic classification rather than its

absence.

Such people are at particularly high risk of heart

attacks and strokes, and so have the most to gain

from a reduction in their risk [its website

claimed]… they are also easily identified from

existing medical records, so it should be

straightforward for doctors to use the findings

for the care of their patients. (Heart Protection

Study, 2002a: para. 7)

Yet the Study also had a long list of exclusion criteria

relating to contact between the trial and the patient or

doctor in question and reported that only ‘compliant

individuals… were randomly allocated. Of those who entered

the run in, 36% were not subsequently randomised [and] 26%

chose not to enter the trial or did not seem likely to be

compliant for 5 years,’ (Heart Protection Study, 2002b: 10).

Despite these selections, the organisers celebrated the

variety of patients included in the trial, which enrolled

women and the elderly in unusual numbers across the UK. ‘The

study’s size and the wide range of high-risk patients

included, means that doctors now have evidence that is

uniquely clear and reliable,’ (Medical Research Council,

2001). Claims to generalisability may thus be based on

different choices of patient selection in phase IV studies,

grounded in either an implicit assumption of diversity

captured in a pragmatic sample or an explicit and

disciplined hunt for variety.

A similar choice is evident in trials of complex

interventions, which are less easy to identify with any

specific phase. The ICR/Oxcheck study, reporting the year

before the West of Scotland results were released, accessed

a similar number of participants in a trial designed to

‘assess the effectiveness of health checks by nurses in

reducing risk factors for cardiovascular disease in patients

from general practice’ recruiting from just five urban

general practices in Bedfordshire. Randomisation between

being offered the intervention of a nurse check up or not

happened early in the process, among 11,000 men and women

who responded to an initial questionnaire. Despite the small

number of practices, this was celebrated as ‘stopp[ing] the

intervention from being too unrealistic by covering a high

proportion of the practice population rather than a selected

few’ and ensuring that ‘the results could be generally

applied to practices in the United Kingdom,’ (Muir et al,

1994: 311). However, those who moved out of the area over the

course of the study were excluded in each of two consecutive

analyses, presumably because the data was inaccessible, so

that the report in 1994 was based on results from just over

6000, and in 1995 on just over 4000 participants (Muir et

al, 1994 and 1995). A similar RCT, the Family Heart Study,

recruited and retained rather larger numbers of patients

from 28 practices ‘throughout Britain’ which were ‘chosen

according to specific demographic criteria’ (Wood et al,

1994: 313). As with the Heart Protection Study the value of

this trial was located in its careful identification of a

range of subjects, rather than concentration on a few at

selected sites. Even then, the study needed additional audit

to support this claim to generalisability. In the first year

of the work quality assurance ‘processes showed that one

nurse ha[d] departed from a number of protocol

requirements,’ and this practice was excluded from the trial

(Wood et al, 1994: 315).

Randomisation alone is not enough to discipline the

actions of healthcare professionals in the trial, just as

extra care may be taken to ensure compliant patients. Though

randomisation manages possible bias at treatment allocation,

trials also rely on detailed research protocols (such as

those studied in Berg, 1997) as an attempt to guide other

actions. Unwillingness of clinical staff to apply these

protocols may itself lead to serious difficulties of

recruitment (Garcia, Elbourne & Snowdon, 2004). This has been

formalised in some situations as a failure of the ethical

requirement of ‘clinical equipoise’: a setting where there

is genuine professional uncertainty about the benefits of

the experimental intervention. Under this arrangement,

healthcare professionals are asked to select themselves for

active participation in the trial. Yet as with patients, the

relevance of these results is not guaranteed by a lack of

selection; indeed, considerable efforts may be made to select

varied and complex populations as an alternative route to

relevance. Though Cochrane celebrated randomisation as an

approach which allowed you ‘not to worry about the

characteristics of the patients’ (1972: 22), these examples

show considerable care going into questions about the type

of involvement asked of patients and other participants.

Modelling the intervention – bracketing in action

Trial designers perform a second balancing act, between the

need for their experimental intervention to be stable and

visible in the active arm and tractable and detachable

enough to be withheld in the control. We have already seen

how doubts about the durability of the screening programme

endangered the results of the Family Heart Study. A quote

from ICR/Oxcheck illustrates the difficulty of defining this

kind of intervention. ‘Nurses were instructed to counsel

patients about risk factors, with the emphasis on

ascertaining the patients’ view on change and negotiating

priorities and targets for risk reduction.’ In this case,

the threat to the study was specifically described in terms

of ‘contamination of the control group by contact between

patients and by demand for health checks from patients

assigned to be controls,’ (Muir et al, 1994: 309-310). By

recruiting participants from within practices, rather than

across them, the trial faced a difficulty with its

assumption that they would act as individuals. In

comparison, in the Family Heart Study whole practices were

randomised together to the active or control groups and

within them families were made the object of study.

Pharmacological trials make similar accommodations to

social relations imagined in the process of design. For

example, the clinical relationship may be carefully set

apart from research ones, so that general practitioners or

referring physicians can veto the entry of individual

patients into the trial. While designers sometimes ask

practitioners to refrain from prescribing additional

medication that is the same or similar to that in the trial,

others propose that the research relation is adding to

rather than replacing the primary clinical one and allow

concurrent treatment. Both approaches were tried in the

Heart Protection Study, which switched to allowing

additional cholesterol reduction about halfway through.

This move may be celebrated as a way of further increasing

the heterogeneity of participants in the trial, so that it

includes subjects taking a range of other medication, but

works by bracketing rather than selection.

We noted above that randomisation is a way of managing

unknown aspects of the clinical context beyond simple bias.

Yet trial designers also have to model this context in order

to take decisions about the boundaries between the

intervention and normal care. This modelling often involves

citation from laboratory work in biochemistry or pathology.

The result is the dilution of the intense empiricism of the

RCT, disrupting the cautionary contrast made by Cochrane

between what might seem ‘more basic science’ and clinical

experiments. In the case of non-pharmacological treatments,

designers may rely on psychology, organisational theory or

even sociology to support their choice of intervention

(Campbell et al, 2000). The models that lie behind trials

may therefore gain weight from scientific credentials

accrued elsewhere (Latour, 1987), but this is not always

made clear.

Despite such potential allies, trialists from Cochrane

onwards have tried to restrict explicit reference to theory

in their activities. One of the results is the preference

for defining outcomes in terms of mortality, the ultimate

hard end-point. Despite the emphasis on the empirical value

of such a simple outcome, mortality may also be understood

as a measure of research rather than science through the

claim to ‘clinical significance’ (regardless of how people

measure the benefit or disbenefit of different illness

states, they are thought to agree on valuing life) and ease

of adjudication in the real world (different diagnostic

preferences are unlikely to alter the ways different doctors

define death).

Pharmacological trials in cardiovascular prevention may

well achieve this standard. In the West of Scotland Study,

‘pravastatin produced a significant reduction in the risk of

the combined primary end point of definite nonfatal

myocardial infarction and death from coronary heart

disease,’ (Shepherd et al, 1995: 1301). Where using

mortality as an outcome is not possible, trials rely on

chains of assumed, or surrogate, relationships that are

hybrid, in the sense of mixing the material body and the

motivation of the patient, practitioner or policy maker.

Again the Family Heart Study is a good example of the kind

of leaps this requires.

‘If reductions of 0.1 mmol/l in blood cholesterol

and 1.5 mm Hg in diastolic blood pressure (half

that observed) but no reduction in cigarette

smoking were therefore attributed to this

programme… using information from reviews of the

effects of blood pressure and cholesterol on the

risk of coronary heart disease… and making the

crucial and untested assumption that the changes

in risk factors would be maintained long term, we

estimate the long term proportionate reduction in

coronary heart disease risk to be 12%… If the

screening and intervention programme used in this

trial were implemented in the same way by every

general practice in the country, and if such

programmes achieved the same reductions in risk

factors (which were then maintained long term) and

if this was translated into prevention of

myocardial infarction and saving of lives, the

overall impact on the population burden of

coronary heart disease would be small,’ (Wood et

al, 1994: 319).

Other trials make a virtue out of producing more qualitative

effects that are close to the individual. Cupples and

McKnight (1994) claim success in ‘lessening restriction of

everyday activities’ for their health education intervention

in secondary care, ‘despite having no significant effect on

objective cardiovascular risk factors’, while Campbell et al

(1998) report ‘significant improvements in six of eight

health status domains (all functioning scales, pain,

wellbeing and general health)’ thanks to ‘secondary

prevention clinics run by nurses’ but do not report on their

subjects’ blood pressure, cholesterol and other

physiologically defined risk factors. This move need not be

seen simply as a retreat from the demands of the scientific

trial format, but also as a reworking of the trial in ways

that appeal to the general practice audience (as well as

potentially to patients: see Epstein, 1996 for a discussion

of AIDS activists’ attempts to influence the use of surrogate

markers in new studies).

Another approach legitimates accounts that ignore

variation in the application of the protocol at the level of

the doctor or patient and ignores the question of the

stability or tractability of the intervention. The

‘intention to treat’ analysis means that the results of the

trial are presented for the original randomisation around a

single object and thus defines effectiveness as a measure of

benefit achieved in spite of, rather than without, the

messiness of clinical relationships and activities. This

might include imperfect compliance of patients or

practitioners, as discussed in the previous section, but

also the unpredictability of drugs or tools in different

hands and bodies. Like the preference for mortality or

quality of life as an outcome, the intention to treat

analysis works through further strategies of bracketing

(after Riles, 2001), which operate within the apparent

transparency provided in the trial. Unlike them, it does not

fall back on citation to fill in the gaps and seems to

resist the temptation to model the intervention, relying

instead on a kind of epistemological containment. Here trial

designers again present themselves as modest reporters on

research that overlays rather than reassembles the

complexity of practice, and yet both the use of mortality

and intention to treat work as a kind of alchemy to

strengthen rather than weaken their results.

Embedding trials in practice – proceeding by

incorporation

The final strategy has elements in common with both the

previous ones, but takes us into new territory. Relevance is

assured through recruitment, but here of objects and systems

rather than people or organisations. Without being

celebrated, such techniques support claims of

generalisability while avoiding compromising the intense

empiricism of the RCT by incorporating elements of practice

that are not seen as requiring, nor yet refusing,

explanation.

For pharmacological trials, the most obvious example of

this is the drug in pill form, which can be quickly inserted

into existing practices of prescription and (possible)

compliance or concordance. Trials of complex interventions

are also increasingly arranged around existing

organisational forms and objects in the clinic. The SHIP

trial (Jolly et al, 1999) introduced nurses into secondary

prevention, but asked them to coordinate current services

rather than institute new clinics, as happened in the

ICR/Oxcheck and Family Heart studies. More strikingly the

POST trial of secondary prevention (Feder et al, 1999)

intervened by sending postal prompts to patients and GPs

suggesting appointments for follow-up care. Another group

reported on the presentation of guidelines to improve

processes of care via pre-existing software in general

practices chosen ‘because their computer systems were

extensively used.’ Nevertheless the team acknowledged

difficulties in producing robust results and interpreted

these in terms of a failure to adapt to the imagined

realities of the field. ‘Computerised support systems for

decision-making must be integrated into the clinical

workflow. They must present the right information, in the

right format, at the right time, without requiring special

effort,’ (Eccles et al, 2002: 941).

These strategies recall the ‘technogovernance’ described

by May et al (2006) who discuss the ways in which decision

support tools may mediate between evidence-based and

patient-centred approaches to medicine, allowing doctors to

share uncertainty. Even counselling has been made amenable

to the RCT under this apparently permissive regime. Thus a

trial concluded that ‘counselling directed at behavioural

and attitudinal change may produce greater changes than

traditional educational approaches to health promotion,

particularly when tailored to the individual’s readiness to

change,’ (Steptoe et al, 1999: 943).

RCTs have also reached towards accommodation between

the fields of practice and policy (as well as practice and

research). Baker et al used existing guidelines from the

North of England Development Project on angina, which were

then reformatted to encourage improved processes of care.

The trial designers further defined one of their outcomes as

‘the proportion of patients with raised blood pressure

managed in accordance with the British Hypertension

Society’s guidelines’ (2003: 284). Like models, guidelines

may come with credibility attached. A more opportunistic

approach is expressing baseline data and outcomes in terms

of ongoing audit parameters. Thus the POST trial claimed

success in terms of the ‘measurement and recording of

coronary risk factors’ in patient records. In ASSIST the

effects of ‘a register and recall system’ to either the

doctor or the nurse and ‘audit’ alone were evaluated in

relation to a primary outcome defined as ‘adequate

assessment of three risk factors’ (Moher et al, 2001), a

standard derived in turn from a modernisation plan contained

in the National Service Framework on Coronary Heart Disease

(Department of Health, 2000).

These measures of process suggest a rather different

solution to the need for relevance than the careful

recording of physiological measures in the ICR/Oxcheck and

Family Heart studies. Designers are much less preoccupied

with the possible ‘scientific’ aspects of their work than

with improving practice in line with policy. Though the

decision aids and guidelines used in such research are

potentially controversial and often judged ineffective, they

may nevertheless both give and gain stability in the trial.

The observation supports the work of Cambrosio et al (2006)

on the emergence of ‘regulatory objectivity’ through the

tools and standards crossing research and regulation in

oncology, as well as analysis by Timmermans and Berg (1997,

2003) on standardisation and the use of protocols in

medicine. In these cardiovascular examples, using such

practical ‘ready-mades’ for information management and

reporting allows trial designers to make the RCT amenable to

an imagined ‘real world’, in a move logically prior to the

remaking of that practice in implementing trial results

(cf. Berg, 1997).

Measures of process and the use of existing systems for

data collection also bring the RCT closer to practice

through the activities of audit and evaluation. Partly as a

result of this, the distinction between research and audit

has become problematic for institutions within the National

Health Service. Because research requires additional ethical

oversight, healthcare practitioners are increasingly likely

to define their investigations as audit and stress their

everyday nature as an integral part of the work of

management and care, but the boundaries are unclear. In one

of the most prominent efforts to resolve this uncertainty,

the Central Office for Research Ethics Committees has issued

guidance that proposes that one of the key differences is

randomisation. According to this, research may involve

randomised allocation to comparison groups: audit and health

services evaluation will not do so (COREC, 2006). However,

this distinction is not always used in practice, and other

attempts fall back on differences discerned between the aims

of these activities. Thus COREC defines research as ‘the

attempt to derive generalisable new knowledge’ in comparison

with the efforts of audit and evaluation to produce mere

‘information’ on local care.

There is some irony here. Efforts to improve the

contextualization of the RCT as research moves from phase I

to IV are already equated with improving the

generalisability of the results from small groups in well-

controlled settings. The previous sections of this paper

have emphasized that selection goes on in phase IV, but on

different grounds, while the methodology may be used to

allow bracketing of the less controlled aspects of practice,

or incorporation of its most stable elements. The addition

of audit and service evaluation further disrupts the

original continuum by proposing that surrendering to local

aims is both a categorical shift and a step back

towards a narrower yet robust conclusion. However, as I

note above, these attempts are still unstable and contested

on practical and political grounds, as well as

epistemological ones.

From the other side, proponents of Evidence-Based Policy

are even less able to insist on the dominance of the RCT

than supporters of EBM, and regulation is frequently

grounded in descriptive studies or audit as evidence of what

can be done with the resources at hand. In the case of the

government, the local nature of such studies is often not

seen as problematic and they are asked to work simply as

‘examples of good practice which could be applied more

widely,’ (Department of Health 2000: 11). Yet both the

National Service Framework on Coronary Heart Disease and the

independent Campbell Collaboration also celebrate examples

of enthusiasm and involvement as improving the adequacy of

research and development. The Collaboration exists to review

and disseminate evidence relating to social interventions of

various kinds. It embodies one of the more ‘purist’

conceptions of evidence with its emphasis on the RCT, or

non-randomised studies of high quality, but nevertheless

advocates managing bias ‘through a variety of approaches such as

abiding by high standards of scientific evidence, ensuring

broad participation, and avoiding conflicts of interest,’

(Campbell Collaboration, undated. Emphasis added).

Discussion: puzzling at the boundaries

In the past two decades students of science and

technology have followed Latour (1988) and others in

becoming increasingly wary of providing their own

definitions of ‘science’. Instead there has been growing

interest in mapping descriptions provided from within

disciplines claiming the title, as exemplified in a now

classic paper by Gieryn (1983). Such descriptions may in

turn be used as a measure against which the claims of

scientific practice can be characterised and even shown

wanting. The approach is encapsulated in studies of bias,

such as the one quoted in my introduction (De Vries and

Lemmens, 2006), and may be a very useful starting point for

critical accounts of particular fields of medical research.

However, the approach of this paper, following Gieryn, has

been rather to look to see when and how such boundaries may

be important for a specific set of informants. Such a

strategy produces some interesting findings.

While the methodology of the RCT continues to be

claimed as scientific, accounts of doing trials are more

likely to describe them as research, with cheerful

acknowledgement of the importance of application and

estimation and limited claims to knowledge based on

empirical rather than theoretical foundations. The designers

of RCTs have a complicated relationship with explanation and

with other preoccupations of academic disciplines, though

they seek to use their empiricism to produce general

knowledge. Trials may even be described as audit or

evaluation if by doing so designers can avoid additional

regulatory burdens, though they must balance this concern

with their wish to achieve recognition through publication

in journals.

In designing their RCTs, trialists tend not to worry

about the categories of science or research in themselves.

Though they must clearly define the focus of their

investigation, a process that leads by implication to the

definition of other aspects as ‘context’, they show less

interest in what Knorr Cetina describes as ‘the work

of boundary maintenance with regard to the natural and

everyday order’ (1999: 44) than in achieving relevance

through strategic connections with that order. Are these

efforts examples of ‘boundary work’ – the term suggested by

Gieryn to describe his own cases?2 If we wish to

restrict this concept to work around science, it might be

appropriate to drop it at once. A second and more compelling

reason is perhaps that the term was originally used to

describe attempts to maintain and extend a division, even

though the repertoire with which this was done was flexibly

applied. In contrast, I have tried to demonstrate work

undertaken to cross, as well as maintain, a line that is

more likely to be seen as a problem than a strength of

medical research that uses the basic components of the RCT

methodology.

2 A fuller definition of this concept is ‘an effective ideological style for protecting professional autonomy: public scientists construct a boundary between the production of scientific knowledge and its consumption by non-scientists’ (Gieryn 1983: 789).

I started this paper with Cochrane’s enthusiasm for the

rigour of the randomised controlled trial. He also praised

its ‘wide applicability’ (Cochrane and Blythe, 1989: 159),

but in a recent discussion of the problem of external

validity, Rothwell reminds us that Cochrane was well aware

of the potential distance between trial and practice,

describing a ‘gulf between measurements based on RCTs and

benefit… in the community,’ (2005: 82). More recently this

‘gulf’ has been reinvented as an implementation ‘gap’, where

rather than practice being inferior to the contemporary

trial because of differences of resource and attention,

practice is viewed as lagging behind the results of trials.

This perception has been given much greater relevance by the

use of RCTs beyond the direct clinical sphere for the

assessment of cost-effectiveness by organisations like the

National Institute for Health and Clinical Excellence as

Cochrane’s ideas have been incorporated into health service

management and regulation.

The paper has described ways in which researchers set

out to narrow this gap by increasing the generalisability or

applicability of RCT results through design strategies

modifying the pure world of the experiment, among which I

have singled out selection, bracketing and incorporation. A

number of the trial designs considered here use the ability

of persons or objects to exist in two or more different

spheres at the same time to support such strategies.

Examples might include general practices as units of care;

protocols; mortality records; questionnaires; prompts and

decision aids; the organisational structures of audit and

other metrological regimes. Their identification owes much

to Star and Griesemer’s well-known account of ‘boundary

objects’ as ‘plastic enough to adapt to local needs and the

constraints of several parties employing them, yet robust

enough to maintain a common identity across sites,’ (1989:

508). Unfortunately, this does not tell us much about the

dynamics or effects of these relations, nor the moments when

participants resist the explicit enrolment or elucidation of

separate elements of practice in favour of bracketing

unexplored assemblages. To explore those issues, it is

helpful to develop an understanding of RCTs as a

‘transaction space’ (Nowotny, Scott & Gibbons, 2001: 97).

Without specifically describing elements of my own

developing ethnography, the best account I can find of the

trial as a zone of transaction comes from Cochrane’s

autobiography. The story comes from his discussion of his

growing interest in the 1960s in trials of the place of

treatment. Rather ruefully he recalls the breakdown of

randomisation in the first of these trials, comparing

inpatient and outpatient treatment of varicose veins, as a

result of the additional purification carried out by a

surgical registrar ‘abstracting “interesting” cases from one

side,’ (Cochrane and Blythe, 1989: 208). His next project

was to be a trial of coronary care units versus home care

for people who had suffered heart attacks. The origins of

this study are identified in his own research interest,

concern from someone at the Department of Health and Social

Security (DHSS) about the cost of these units and the

clinical doubts expressed by a cardiac consultant in

Bristol. Cochrane set out to enrol the DHSS and Medical

Research Council (MRC) in a discussion about the ethics of a

study at a national level. Meanwhile he started talking to

local general practitioners and cardiac consultants in

Cardiff, who ‘seemed interested and not particularly

antagonistic, particularly as there was a chance of their

getting a new coronary care unit through the plan,’ (1989:

209). A meeting of the joint DHSS and MRC Committee provided

the first confrontation between elites in research and

cardiology, but gave approval for the trial. ‘The very next

day the cardiac consultants made it clear to me that

whatever Lord Platt had said there was going to be no

randomised controlled trial of coronary care units in

Cardiff. I was at first disbelieving, then furious, and

cursed the day I had become a professor in the hope of

better cooperation with consultants. Later I became

fascinated by the psychology underlying the decision,’

(1989: 210).

Cochrane’s story neatly encapsulates the ways in which

different actors bring different aspirations for the trial

to the table. Though these agendas may nevertheless be

played out under the umbrella of the RCT methodology,

Cochrane was clearly well aware of the difficulties of

design and organisation facing any single trial. In the

process of designing a study, even within the apparently

rigid methodological stipulations, the RCT thus becomes a

‘transaction space’ in which all groups have something to

bring and take away in a project shaped throughout by

context (Nowotny, Scott & Gibbons, 2001: 141). This meshing

of networks as well as materials cannot be reduced to bias:

without it knowledge production could not go ahead. Even in

this 1960s example, authoritative ethical approval is the

prerequisite for such a project, but Cochrane is confident

initially that the trial methodology can also accommodate

the diverse networks he describes. Though randomisation is

designed to eliminate bias, the trial cannot go ahead

without working on and through recognisable emotions,

passions and ideology (Latour, 1998). Similarly in the

contemporary trials discussed in this paper, researchers

enrol diverse assemblages of policy-makers, doctors, nurses

and patients, who must all be ‘interested’ in at least part of

the processes captured in research. Along with the

incorporation of ready-made units of organisation and

measurement, such selection allows trialists to produce

knowledge that is contextualised as a condition of its

elaboration. The need for ethical approval is thus only one

of the sites where social concerns shape the nature of the

experiment, despite the absence in the trials discussed in

this paper of particularly active patients or communities.

However, not all contextualisation relies on processes

of making things explicit. Other actors and objects are

involved in research but are made real and allowed to remain

attached to practice precisely by being excluded from the

twin techniques of randomisation and control. This paradox

was debated in a conversation between Alvan Feinstein and a

Dr Pinsky reported in the Proceedings of a Workshop on

Evaluation of Therapy, which took place in September 1983.

Commenting on a paper on trial methodology Feinstein argued

that the ‘messiness’ of the situation (with difficulties

selecting patients, holding interventions constant etc.)

means that the RCT, even with intention to treat analysis,

becomes meaningless. However, for Pinsky ‘in the very

situation in which there is so much messiness, you are

suggesting the use of a messy technique. I would propose to

you that such a situation is just the type in which a messy

technique will yield no answer at all.’ Responding,

Feinstein argues that they should be more ambitious to study

‘the total world instead of that world that we can

conveniently capture,’ (Discussion following Brown 1984:

350). The conversation recalls Law on the need to

acknowledge ‘mess’ in social scientific research. His

solution, like Feinstein’s, is to accept more ‘messy’

techniques, yet he is also alive to the power of what he

calls ‘manifest’ rather than ‘Othered’ absence in accounts

of methodology, which seems to resonate with the strategies

of bracketing and incorporation discussed here (2004: 14).

Researchers in the field of cardiovascular research continue

debates on these issues as much through their own design

choices as through such self-consciously methodological

discussions, but the connection also reminds us that these

decisions may be relevant for our own research practice.

Conclusion

Randomised controlled trials have many things in common with

other tests, including assays. They share a quality of

public demonstration explored by Schaffer (2005) and the

importance of staged evaluation of the ingredients or

quality of objects. A substantial note of formality is

introduced by the demands made for randomisation and control

that appear to effect what Latour (1993) would understand as

purification. Yet other aspects of trial design expose the

difficulties of defining both the content and context of the

experiment and acknowledge the limits of scientific

practice. In particular, the designers of RCTs are careful

to use selection and incorporation as ways of claiming

relevance as well as rigour, and fully exploit the

possibilities that randomisation opens up for strategic

retreats from comprehensive control as well as from

explanation. Though the RCT continues to be celebrated as a

bulwark against bias, complex sets of persons and things

must be involved and interested in the production of

knowledge. In the process the categories of science,

research and practice are reworked in the struggle for

persuasive evidence. The result is the ‘alchemy of the RCT’,

characterised by the ritual invocation of randomisation and

control as tools to transform these imperfect materials into

the stuff of certainty.

Acknowledgements

This is a development of a paper first presented at the

European Association for the Study of Science and Technology

meeting in Paris 2004. I would particularly like to thank

the conveners of a session on Evidence in Practice for the

stimulus to consider these issues, and among them Tiago

Moreira for his helpful comments and encouragement. I am

also grateful to the three anonymous reviewers for their

useful suggestions on the paper in its current form. The

research was made possible by two awards, one for work on my

PhD and a second for postdoctoral study (ESRC R42200134004

and ESRC/MRC PTA 037270093) in the field of cardiovascular

research and innovation. (112 words)

References

Baker, R., Fraser, R.C., Stone, M., Lambert, P., Stevenson, K. and Shiels, C. (2003). ‘Randomised controlled trial of the impact of guidelines, prioritised review criteria and feedback on implementation of recommendations for angina and asthma.’ British Journal of General Practice, 53, 284-291.

Berg, M. (1997). Rationalizing Medical Work: Decision-Support Techniques and Medical Practices. The MIT Press: Cambridge MA and London.

Brown, BW. (1984). ‘The randomized clinical trial.’ (Printed with following discussion) Statistics in Medicine 3, 307-311.

Callon M. and Rabeharisoa V. (2003). ‘Research in the wild and the shaping of new social identities.’ Technology in Society, 25, 193-204.

Cambrosio A, Keating P, Schlich T and Weisz G (2006) ‘Regulatory Objectivity and the Generation and Management of Evidence in Medicine,’ Social Science & Medicine, 63, 189-99.

Campbell Collaboration (undated) About the Campbell Collaboration [Accessed online 18 January 2007] www.campbellcollaboration.org/About.asp

Campbell, N, Thain, J, Deans, H, George, R, Lewis D., Rawles, JM, Squair, JL. (1998). ‘Secondary prevention clinics for coronary heart disease: randomised trial of effect on health.’ BMJ 316, 1434-7.

Campbell, M, Fitzpatrick, R, Haines, A, Kinmonth A-L, Sandercock P, Spiegelhalter D and Tyrer D (2000). ‘Framework for design and evaluation of complex interventions to improve health.’ BMJ, 321, 694-6.

Cochrane, A. (1972). Effectiveness and Efficiency. Random Reflections on Health Services. Nuffield Provincial Hospitals Trust: London.

Cochrane, A. and Blythe, M. (1989). One Man’s Medicine. An autobiography of Professor Archie Cochrane. The British Medical Journal: London.

COREC (2006). Differentiating audit, service evaluation and research. [Accessed online 18 January 2007] www.corec.org.uk/applicants/help/docs/Audit_or_Research_table.pdf.

Cupples, M.E. and McKnight, A. (1994). ‘Randomised controlled trial of health promotion in general practice for patients at high cardiovascular risk.’ BMJ, 309, 993-6.

De Vries, R. and Lemmens, T. (2006). ‘The social and cultural shaping of medical evidence: Case studies from pharmaceutical research and obstetric science.’ Social Science & Medicine 62, 11, 2694-2706.

Department of Health (2000). National Service Framework on Coronary Heart Disease. HMSO: London.

Eccles M., McColl E, Steen N, Rousseau N, Grimshaw J., Parkin D, Purves I. (2002). ‘Effect of computerised evidence-based guidelines on management of asthma and angina in adults in primary care: cluster randomised controlled trial.’ BMJ, 315, 941

Epstein, S. (1996). Impure science: AIDS, activism and the politics of knowledge. Berkeley: University of California Press.

Feder, G., Griffiths, C., Eldridge S, Spence M. (1999). ‘Effect of postal prompts to patients and general practitioners on the quality of primary care after a coronary event (POST): randomised controlled trial.’ BMJ 318, 1522-6.

Garcia, J., Elbourne, D. and Snowdon, C. (2004). ‘Equipoise: a case study of the views of clinicians involved in two neonatal trials.’ Clinical Trials 4, 1, 170-178.

Gieryn, T. F. (1983). ‘Boundary-work and the demarcation of science from nonscience: strains and interests in professional ideologies of scientists.’ American Sociological Review 48, 6, 781-795.

Heart Protection Study Collaborative Group (2002a). Questions and answers. [Accessed online 18 January 2007] www.ctsu.ox.ac.uk/~hps/June02QandA.shtml

Heart Protection Study Collaborative Group (2002b). ‘MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20,536 high-risk individuals: a randomised placebo-controlled trial.’ The Lancet 360, 7-22.

Jolly K., Bradley F., Sharp S., Smith H., Thompson S, Kinmonth A-L and Mant D. on behalf of the SHIP collaborative group (1999). ‘Randomised controlled trial of follow up care in general practice of patients with myocardial infarction and angina: final results of the Southampton heart integrated care project (SHIP).’ BMJ 318, 706-11.

Jones D.S. (2000) Vision of a cure: visualization, clinical trials, and controversies in cardiac therapeutics, 1968-1998. Isis 91, 3, 504-541

Knorr Cetina K. (1999). Epistemic cultures. How the sciences make knowledge. Harvard University Press: Cambridge MA.

Latour, B. (1987). Science in Action. Open University Press: Milton Keynes, UK

Latour B. (1988) The Pasteurisation of France. Harvard University Press, Cambridge MA and London, England.

Latour, B. (1993). We have never been modern. Harvard University Press: Cambridge MA. Translated Catherine Porter.

Latour, B. (1998). ‘From the world of science to the world of research?’ Science 280, 208-209.

Law, J. (2004). After method. Mess in social science research. Routledge: London and New York.

Marks, H.M. (1997). The Progress of Experiment. Science and Therapeutic Reform in The United States, 1900-1990. Cambridge University Press: Cambridge.

Marks, H.M. (2000). ‘Trust and Mistrust in the Marketplace: Statistics and Clinical Research, 1945-1960.’ History of Science 38, 343-355.

May, C., Rapley, T., Moreira, T., Finch, T. and Heaven, B. (2006). ‘Technogovernance: evidence, subjectivity and the clinical encounter in primary care medicine.’ Social Science & Medicine 62, 4, 1022-1030.

Medical Research Council (2001) Press release for the Heart Protection Study. [Accessed online 17 August 2005] www.mrc.ac.uk/txt/index/public-interest/public-news_centre/public-press_office/public-press_releases_2001/public-13_november_2001b.htm

Moher M., Yudkin P, Wright, L., Turner R., Fuller A, Schofield T. Mant D. for the Assessment of Implementation Strategies (ASSIST) trial collaborative group (2001). ‘Cluster randomised controlled trial to compare three methods of promoting secondary prevention of coronary heart disease in primary care.’ BMJ 322, 1-7.

Moreira, T. (2005). ‘Diversity in clinical guidelines: the role of repertoires of evaluation.’ Social Science & Medicine 60, 9, 1975-1985.

Muir J., Mant D., Jones L., Yudkin P., on behalf of the Imperial Cancer Research Fund OXCHECK study group (1994). ‘Effectiveness of health checks conducted by nurses in primary care: results of the OXCHECK study after one year.’ BMJ 308, 308-12.

Nowotny, H., Scott, P., & Gibbons, M. (2001). Re-thinking science: knowledge and the public in an age of uncertainty. Cambridge: Polity Press.

Porter, T. (1995). Trust in numbers. The pursuit of objectivity in science and public life. Princeton University Press: Princeton New Jersey.

Richards, E. (1991). Vitamin C and Cancer: Medicine or Politics? Macmillan: London.

Riles, A. (2001). The network inside out. University of Michigan Press, Ann Arbor.

Rothwell, P. (2005). ‘Treating individuals 1. External validity of randomised controlled trials: “To whom do the results of this trial apply?”’ The Lancet 365, 82-93.

Schaffer, S. (2005). ‘Public Experiments’ in Making Things Public. Atmospheres of Democracy, ed. by B. Latour and P Weibel. The MIT Press: Cambridge MA and London, England. 298-307.

Shepherd J., Cobbe S.M, Ford I., Isles C.G., Lorimer A.R., Macfarlane P.W., McKillop, J.H. and Packard C.J. for the West of Scotland Coronary Prevention Study Group (1995). ‘Prevention of coronary heart disease with pravastatin in men with hypercholesterolaemia.’ New England Journal of Medicine 335, 1001-9

Star, S.L. and Griesemer, J.R. (1989). ‘Institutional ecology, ‘translations’ and boundary objects: amateurs and professionals in Berkeley’s museum of vertebrate zoology.’ Social Studies of Science 19, 3, 387-420.

Steptoe, A., Doherty, S., Rink, E., Kerry S., Kendrick T and Hilton S. (1999). ‘Behavioural counselling in general practice for the promotion of healthy behaviour among adults at increased risk of coronary heart disease: randomised trial.’ BMJ 319, 943-8.

Timmermans, S. and Berg, M. (2003). The Gold Standard. The Challenge of Evidence-Based Medicine and Standardization in Health Care. Temple University Press: Philadelphia.

Timmermans S and Berg M (1997). ‘Standardisation in Action: Achieving Local Universality through Medical Protocols.’ Social Studies of Science 27, 2, 273-305.

West of Scotland Coronary Prevention Study Group (1992). A coronary primary prevention study of Scottish men aged 45-64 years: trial design. Journal of Clinical Epidemiology 45, 849-860.

Wood, D.A., Kinmonth, A-L., Davies G A., Yarwood J., Thompson S.G., Pyke, S.D.M., Kok Y., Cramb R., Le Guen, C., Marteau T.M., Durrington P.N. for the Family Heart Study Group (1994). ‘Randomised controlled trial evaluating cardiovascular screening and intervention in general practice: principal results of British family heart study.’ BMJ 308, 313-320.
