
EP103 WebBoard Frequently Asked Questions / Questions of Particular Interest

Block 1 (sessions 1 to 3)

Session 1: Developing a study protocol

Cinderella topics

Q: “On page 50 of chapter 3 "Developing the research question" by Crombie and Davies, the Cinderella Topics are mentioned. I can't understand what they mean by that.”

A: “The term 'Cinderella topic' is something that you will see relatively frequently in health, for example with regard to health care, diseases, risk factors, etc, so it is important to understand what it means. It refers to a low status or profile (like Cinderella's status in the fairy tale). It highlights the fact that a certain health issue is relatively neglected: its impact is underrated and it receives insufficient funding and attention from researchers. When talking about a 'Cinderella health issue', some authors will say things like 'It is time for Cinderella to go to the ball', meaning that more attention, funds, time, etc should be allocated to this particular issue.”

Session 2: Writing a study protocol

Participatory rural appraisal

Q: “In the eye service protocol, they explain the Participatory Rural Appraisal (PRA) approach. Is this something important for epidemiology in developing countries?”

A: “PRA was developed in the mid-eighties in Thailand, in the context of agricultural research in developing countries. It consists of a combination of techniques for community studies, which are used to assess health needs. Emphasis is placed on empowering local people, who are helped in identifying their own problems, and then in developing, implementing and evaluating their solutions. It relies on visual, flexible and creative data collection methods which may be suitable for less literate populations – examples are Venn diagramming and community mapping. A number of qualitative methods are also used, such as focus group discussions, semi-structured interviews and direct observation; qualitative methods are covered in Session 10 of EP103. PRA provides rich contextual data using people’s own points of view and priorities, which help in the interpretation of other quantitative data. Chapter 10 of Smith and Morrow's book also includes a section on PRA.”


Session 3: Ethical issues in epidemiological research

Ethics and morals

Q: “What are morals? How are they different from ethics?”

A: “First of all, ethics comes within the discipline of ‘philosophy’ which is often translated as the ‘love of wisdom’ or the ‘love of truth’. The skeleton of philosophy contains several large bones:

• metaphysics: investigating the underlying nature and structure of reality as a whole
• epistemology: ‘what is knowledge’
• logic
• philosophy of mind: what is the human mind
• political philosophy: what would utopia be like?
• aesthetics and ethics: how should we live and why should we live like that, what is good and bad, what is ‘happiness’

Two definitions of ‘ethics’ used by some researchers are: ‘Ethics is asking how we ought to live’ (Socrates, a philosopher) and ‘Ethics is to do with learning to live together’ (Bonhoeffer, a theologian). From the point of view of health and health policy, ethics is contained within any question we ask about ‘HOW’ we do things (eg how we establish a TB control programme) and also in any statement that contains an ‘OUGHT’ (eg he ought to use direct observation of treatment for treating TB patients). Morals pertain to good or bad conduct. If you are thinking about morals, you are thinking about what is considered good or bad by a person, community, organisation, society etc.”

Ethics and rights

Q: “How do ethics and rights relate to each other? Is there a difference between an ethical issue (as a common good) and a rights issue (for the individual)? E.g. both have legal aspects.”

A: “Broadly speaking, ethics (and morals) are about how we ought to live, and why (as Socrates put it). Ethics involves many different philosophical concepts and theories oriented around this subject. On the other hand rights are, in the main, thought of as entitlements conferred by society, which usually need a legal basis to have substance. The language of rights has become confusing, and is often loosely applied. For example, individuals often talk about their right to something (healthcare, knowledge and so forth) without it really being clear what this means, or whether saying it carries any weight.”

Legal basis of ethics

Q: “Is there a legal basis for research ethics?”

A: “There is no set of specific legal rules/laws to cover health care research (and thus epidemiological research) in the UK. However, researchers need to comply with a set of ethical ‘principles/standards’ – a more concrete way of ‘applying’ ethics – before any
research project is started (Declaration of Helsinki). In practice, this becomes mandatory because funding agencies, research institutions, pharmaceutical companies, etc require that researchers comply with these principles, and this is checked by an ethical committee. A related point mentioned in the red CIOMS booklet (page 41) is that ‘Ethical review committees generally have no authority to impose sanctions on investigators who violate ethical standards. However, they should be required to report to institutional or governmental authorities any serious or continuing non-compliance with ethical standards as they are reflected in protocols that they have approved. Failure to submit a protocol to the committee should be considered a violation of ethical standards’.”

Equipoise

Q: “Where does clinical equipoise fit into the question of ethical consent? If we do not know whether something helps or harms, would it be ethical to continue giving someone a treatment outside of a well defined and pragmatic clinical trial? Does the idea of equipoise include the notion that the intervention is not statistically superior to standard treatment or is just conventional wisdom sufficient to allow a trial to go forward?”

A: “Equipoise is a difficult but very important concept in medical research, both when planning the study and with regard to informed consent. Before agreeing to take part in a study, potential participants must be fully informed of (as cited in the CIOMS red booklet, pages 14-15): 1) the benefits that might reasonably be expected to result to the subject or to others as an outcome of the research; 2) any foreseeable risks or discomfort to the subject associated with participation in the research; 3) any alternative procedures or courses of treatment that might be as advantageous to the subject as the procedure or treatment being tested. At the planning stage, the balance between risks and benefits must be considered very carefully. Smith and Morrow, in their book 'Field trials of health interventions', describe this with some useful examples (pages 79-80, section 3.2, and page 82, section 4.1). The Declaration of Helsinki (particularly points 16-19) also addresses this topic. In a clinical trial, for the position of equipoise to be satisfied the researcher must be uncertain whether the new treatment is actually better than the standard to which it is being compared. Equally, the researcher must be reasonably sure the subject is receiving treatment at least as good as that which he/she would have received had he/she not been enrolled. Statistics will play a part in judging both these aspects of equipoise, along with conventional wisdom and consensus opinion, i.e. interpretation of existing evidence alongside professional opinion and common sense. Each prospective trial has to be judged separately because each differs with regard to what evidence is available, what previous studies have been done, and so forth.”

Information sheets: how to make sure they are understandable

Q: “In generating these for patients and making them available to them, how does one make sure that they are written in language patients can understand? Does one carry out a pilot
study as for questionnaires? Are they reviewed by social scientists to look for culturally appropriate terms and correct translations?”

A: “According to the CIOMS guidelines, the investigator has a duty to (among others):
- communicate to the prospective subject all the information necessary for adequately informed consent
- give the prospective subject full opportunity and encouragement to ask questions
- seek consent only after the prospective subject has adequate knowledge of the relevant facts and of the consequences of participation, and has had sufficient opportunity to consider whether to participate
- renew the informed consent of each subject if there are material changes in the conditions or procedures of the research
The information sheets (and also the informed consent sheets) have to be reviewed by an institutional review board/ethics committee. These committees will often ask for changes in the information sheets. However, not all committees have the chance to include people from the general community (who might more easily find bits of information that should be made clearer or that are missing). One way to deal with this is to pre-test the information sheets on people from the general community who have similar characteristics to potential participants (e.g., only women/men, specific ethnic group, age group, etc). In subsequent pilot work, you will then be able to assess the information sheet within the specific study population.”

Acceptability of an action

Q: “The three different approaches to acceptability (goal-, duty- and rights-based approaches, p. 3.7) are all culturally defined. We are then left with trying to analyze ethics using ‘rationality’ and ‘logic’ as the text suggests. This, of course, could be criticized as being a ‘western’ approach.”

A: “These comments are completely fair. Biomedical ethics, and by extension research ethics, do come out of western moral philosophy, and have been criticised for western bias. Everything is of course culturally defined, but what is suggested in the question is that the three frameworks are shaped by (or loaded with) western morality. A more balanced approach needs to consider culturally local values, or value systems (e.g., theological ethical systems, Buddhist ethics, Islamic ethics, etc). This does not, however, mean that everything dissolves into a void of pure moral relativism. Western moral theories/systems - of which there are many - have universalistic threads, which is not the same as saying everyone everywhere should adopt them. But, if you look at morality in different cultures you will find similar themes (prohibition on at-will killing, rules about lying etc.), albeit with local variations. So the frameworks are not a bad starting point, but are not definitive, and context is highly relevant.”

TB protocol: masking field workers; confidentiality

Q: “I was puzzled by a point in the ethical considerations of the TB protocol. It says that HIV status will not appear on records given to the field workers. But this study includes only HIV positive subjects. Does that mean that the field workers are not aware of the question asked in that study?”


A: “From the information provided in the protocol, it is impossible to say whether or not all field workers knew of the hypothesis being tested in the study; if the field workers knew what the study was about, then they would know that everyone was HIV positive. So we do not know whether the fact that the HIV status will not appear on records is an attempt to mask field workers – that appears to be the issue you are concerned with. However, not writing HIV status on the records does go some way towards protecting confidentiality. For instance, what if other clinic staff saw the notes, or if the notes went missing? By not writing HIV status on the records, you ensure that only study staff, aware of the research question, are also aware of the patients’ HIV status.”

Trials of zidovudine to prevent perinatal HIV transmission: various issues

NB: Several ethical issues related to the trials of zidovudine to reduce perinatal HIV transmission were raised. Some sections of the Declaration of Helsinki (as updated in October 2000) relevant for the issues raised are listed below:

In the ‘INTRODUCTION’ of the document it says:
5. In medical research on human subjects, considerations related to the well-being of the human subject should take precedence over the interests of science and society.
7. In current medical practice and in medical research, most prophylactic, diagnostic and therapeutic procedures involve risks and burdens.
8. Medical research is subject to ethical standards that promote respect for all human beings and protect their health and rights. Some research populations are vulnerable and need special protection. The particular needs of the economically and medically disadvantaged must be recognized. Special attention is also required for those who cannot give or refuse consent for themselves, for those who may be subject to giving consent under duress, for those who will not benefit personally from the research and for those for whom the research is combined with care.

In the section ‘BASIC PRINCIPLES FOR ALL MEDICAL RESEARCH’ it says:
10. It is the duty of the physician in medical research to protect the life, health, privacy, and dignity of the human subject.
13. The researcher should also submit to the (ethical) committee, for review, information regarding funding, sponsors, institutional affiliations, other potential conflicts of interest and incentives for subjects.
18. Medical research involving human subjects should only be conducted if the importance of the objective outweighs the inherent risks and burdens to the subject. This is especially important when the human subjects are healthy volunteers.
19. Medical research is only justified if there is a reasonable likelihood that the populations in which the research is carried out stand to benefit from the results of the research.


In the section ‘ADDITIONAL PRINCIPLES FOR MEDICAL RESEARCH COMBINED WITH MEDICAL CARE’, it says:
30. At the conclusion of the study, every patient entered into the study should be assured of access to the best proven prophylactic, diagnostic and therapeutic methods identified by the study.

Exit strategy for research projects

Q: “When we leave the research field, are there general principles about how we should leave?”

A: “It is important to be concerned about what happens when a research study ends. Simply turning your back on study participants may represent a lack of respect and consideration for their dignity. The Declaration of Helsinki says: ‘Ethics in research involves respecting the health and dignity of the human being. Protecting their well being should take precedence over the interests of science and society’. This means the research population should be the researchers’ number one priority! Most important, perhaps, is that you ‘should not start a study in a population, unless you are confident that they will very likely benefit from the results of the research’ (another citation from the Declaration of Helsinki). Thus, the researcher should be concerned that the benefits from the results of the research will reach the study population. Moreover, the results of the research should become available to those who participate. They should be communicated in a simple manner so that they may be easily understood.”

Independent monitoring of trial ethics

Q: “I believe that there should be an independent group to debrief after a major research project. The group should speak to the subjects and ask whether any unreasonable pressure was brought to bear. They could examine the way the study concluded in the location as well. A statement in the written consent on the agreement to end participation at any time is all very well, but is it ever assessed? I think the pressure to keep subjects in some trials must be enormous. Would there be any potential problems in such a review?”

A: “This point is related to the monitoring of ethical conduct during a study. The problem of research misconduct is now discussed openly in medical journals, as well as in the press. The red booklet from the CIOMS (International Guidelines for Biomedical Research Involving Human Subjects) does not provide clear guidelines as to how to monitor the application of ethical principles in research. However, it says: ‘The preferred methods of control include cultivation of an atmosphere of mutual trust, and education and support to promote in investigators and in sponsors the capacity for ethical conduct of research (p. 41).’ Thus, prevention is a very important aspect. But how about monitoring? The role of ethics committees in that matter is often limited to reviewing annual progress reports, approving amendments, and assessing reports of adverse effects. However, in other countries (e.g., Canada), the ethics committees have an obligation to monitor research, and this is specified when ethical approval is granted. In the USA, a lot of debate (and some highly publicised cases) has led the government to
develop a set of standardised procedures to be followed when an allegation is made about a study sponsored by some governmental agencies. But the debate about a more generally applicable set of rules continues. The suggestion of independent reviews at the end of studies is interesting. This could, for example, be organised by a national research body for research misconduct (if it exists). This body could decide whether reviews should be done only when allegations are made or in random samples of studies. To read about the current situation in the UK, you could go to the following papers (available from www.bmj.com): Farthing, Horton, Smith. Research misconduct: Britain's failure to act. BMJ 2000; 321: 1485-1486. Blunt, Savulescu, Watson. Meeting the challenges facing research ethics committees: some practical suggestions. BMJ 1998; 316: 58-61.”

Block 2 (sessions 4 to 7)

Session 4: Introduction to sampling methods

Session 5: More complex sampling methods

When to use cluster sampling?

Q: “I would like to know when to use cluster sampling. Page 5.8 says when it is preferable to use cluster sampling, but I really do not understand it.”

A: “Cluster sampling can be used for different reasons, for example cost and practicality. It would be cheaper and more manageable to enrol a number of classrooms of children in an intervention study to increase fruit and vegetable intake than to recruit a few children each from many more classrooms. It would also be less costly, and again more manageable, to get data on the socioeconomic status of a city if you sample small areas of the city (the interviewer can concentrate in turn on each area) than if you pick individuals at random (the interviewer has to move around much more).”

Stratified random sampling vs. cluster sampling

Q: “From both the workbook and Kirkwood chapter 23, I believed that cluster sampling involves including all sample units within the selected clusters in your sample. However, the first two optional articles in our reader (Bener & Al-Ketbi, Bennett et al.) talk about cluster sampling where only a portion of sample units were randomly selected from within each cluster. These two papers actually seem to be describing stratified random sampling.”

A: “Cluster sampling is defined in our material as a one stage sampling design in which the sampling units are groups (or clusters) of enumeration units (individuals). It may also be called single-stage or one-stage cluster sampling; you may come across these terms in the literature - be aware of variations in the terminology. It implies taking a sample of clusters and selecting all enumeration units within each cluster. On the other hand, multi-stage sampling refers to a sampling design with two or more sampling stages. Any probability sampling method may be used in any stage, including the sampling of clusters in one or more stages of the process. In this sense, multi-stage sampling may be regarded as an
extension of (one-stage) cluster sampling, which involves more than one stage. The paper by Bener and Al-Ketbi describes a multi-stage sampling technique, which they refer to as ‘multi-stage stratified cluster sampling’. It is ‘multi-stage’ sampling because it involves multiple sampling stages. Three strata were used (Abu-Dhabi Emirate, Dubai Emirate and Al-Ain City), thus the term ‘stratified’. The term ‘cluster’ refers to the fact that clusters were selected in successive stages, i.e. schools, classes. Note that both strata and clusters refer to groups of individuals. The WHO EPI survey sampling scheme is often referred to as an example of cluster sampling, as it is by Bennett et al. You could also classify it as a multistage sampling design, as the sample selection involves more than one stage. Moreover, the paper ‘Estimation of Design Effects in Cluster Surveys’ by Katz, Scott and Zeger, also in the reader, refers to ‘cluster sampling or random allocation of clusters’. They continue: ‘A simple random, systematic, or stratified sample of clusters can be selected. Individuals need be identified only within the selected clusters. Within these clusters, all individuals or a random or systematic sample of all individuals can be selected for the study’. Thus, this paper on cluster surveys also seems to use the term ‘cluster sampling’ in a wider context, referring to the fact that clusters such as villages, communities and health centres are used in the sampling process. No account is taken of whether a one-stage or multi-stage sampling technique is used. Whichever sampling technique is used (and whatever it is called), the important thing is whether you end up with a probability sample, which allows you to make inferences about the population you draw the sample from. Another issue is whether the sample was selected with probability proportional to size. What is important is to be as clear as possible when describing the methods used, so as to make sure that whoever reads them knows exactly what you mean.”
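To make the distinction concrete, here is a minimal Python sketch (hypothetical village data, not taken from any of the papers discussed above) contrasting one-stage cluster sampling, where every individual in each selected cluster is included, with a two-stage design, where individuals are subsampled within each selected cluster:

import random

random.seed(2005)

# Hypothetical sampling frame: 30 villages (clusters) of varying size
villages = {f"village_{i}": [f"v{i}_person_{j}" for j in range(random.randint(20, 60))]
            for i in range(30)}

# One-stage cluster sampling: sample 5 villages, then include ALL of their members
one_stage = [p for v in random.sample(sorted(villages), 5) for p in villages[v]]

# Two-stage sampling: sample 5 villages, then a simple random sample of 10 people in each
two_stage = [p for v in random.sample(sorted(villages), 5)
             for p in random.sample(villages[v], 10)]

print(len(one_stage), len(two_stage))   # the two-stage sample has exactly 5 x 10 = 50 people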

Unequal size first-stage units

Q: “In the example in the study guide, we have 14 schools and have to select 4 schools (the first-stage units) with probability proportional to size. Why calculate the cumulative frequency?”

A: “In this example, ‘probability proportional to size’ means that you want to make sure that the chance of a school being selected is proportional to its size (how many children it has). Your final goal is that each child in each school has the same chance of being selected. One option would be to use simple random sampling. In that case you would give each child a number and randomly select the required number of children. But this might mean that you would select a few children in several schools… and this might be impractical (i.e. having fieldworkers obtaining data from only a few children from several schools that may be very far from each other). So you decide to first select only 4 schools using probability proportional to size. You have 14 potential schools for a total of 7031 children. The first school has 950 children. If you allocate (at least in theory based on the number of children) each child in this school a number (number 1 to 950), you will thus give the school an overall probability of being selected equal to 950/7031=0.135 or 13.5%. The second school with 410 children (numbered 951 to 1360 as you don’t want the same numbers happening twice) will have a probability of being selected equal to 410/7031=0.058 or 5.8%. And so
on until the last school which has 725 children (numbered 6307 to 7031) and thus a probability of being selected equal to 725/7031=0.103 or 10.3%. Can you see that by allocating each child a number what you are in fact doing is building a cumulative frequency? Can you also see that each school will have a chance of being selected proportional to its size? The first school has a 13.5% chance of being selected because it includes 950 children. The second school has a 5.8% chance of being selected because it includes only 410 children. And so on. Using the list of all children (1-7031) you will then select 4 numbers at random and see to which school they correspond. In the example the first random number is 0282, thus school 1 (282 falls between 1 and 950). The second is 2521, thus school 5 (2521 falls between 2119 and 2585). The third is 3815, thus school 8. The fourth is 6138, thus school 13. You have thus selected 4 schools with probability proportional to size.”
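The cumulative-frequency procedure described above is easy to mechanise. A minimal Python sketch follows; the school sizes are hypothetical (only the 950, 410 and 725 quoted above are taken from the example), so treat it as an illustration of the method rather than a reproduction of the study-guide figures.

import random
from bisect import bisect_left

random.seed(1)

# Hypothetical enrolments for 14 schools (only 950, 410 and 725 appear in the example above)
sizes = [950, 410, 300, 460, 470, 650, 420, 515, 610, 380, 350, 405, 390, 725]

# Cumulative totals: school i covers the child numbers from cum[i-1]+1 up to cum[i]
cum = []
total = 0
for s in sizes:
    total += s
    cum.append(total)

# Draw random child numbers between 1 and the grand total and map each back to a school,
# until 4 distinct schools have been selected with probability proportional to size
selected = set()
while len(selected) < 4:
    child_number = random.randint(1, total)
    selected.add(bisect_left(cum, child_number))   # index of the school whose range contains it

print("Selected schools (numbered from 1):", sorted(s + 1 for s in selected))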

PPS and self-weighting sampling methods

Q: “(1) Regarding the multistage sampling example from the ‘Immunity in Swedish Population’ paper: why do they select a parish once only? Much of what I have been reading suggests that it would be inappropriate to not resample from a parish if the PPS sampling interval, starting at a random number, suggests that it should be sampled twice. (2) Given that one county was smaller and contributed less, are they obliged to weight up that county's results? (3) All counties had the same number selected from each parish, hence the claim that the distribution is self-weighting. What does self-weighting actually mean?”

A: “The recommended procedure when sampling is to select units at all stages except the last with probability proportional to size, and at the last stage to take an equal number of sub-units from each selected unit (in this case persons within parishes). If you multiply the selection probabilities of all the stages, then everything cancels out, each person having the same overall probability of selection. This is what is meant by a self-weighting sampling method – no weighting is necessary during the statistical analysis of the data. (1) It is correct, as suggested, to take two samples from a parish if it is selected twice. (2) If one county is smaller, i.e., there are fewer parishes than intended, then it should be upweighted indeed.”
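A quick way to see why such a design is self-weighting is to multiply the stage-specific selection probabilities. The sketch below uses entirely hypothetical numbers (three parishes, two drawn with probability proportional to size, ten persons taken per selected parish):

# Hypothetical figures: parish sizes, 2 parishes drawn by PPS, 10 persons per selected parish
parish_sizes = {"A": 1000, "B": 4000, "C": 5000}
total_persons = sum(parish_sizes.values())
parishes_drawn = 2
persons_per_parish = 10

for name, size in parish_sizes.items():
    p_parish = parishes_drawn * size / total_persons   # PPS: chance the parish is drawn
    p_within = persons_per_parish / size                # SRS of persons within a drawn parish
    print(name, p_parish * p_within)                    # overall probability for each person

# Every person ends up with probability parishes_drawn * persons_per_parish / total_persons
# (here 2 * 10 / 10000 = 0.002), so no weighting is needed in the analysis.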

PPS: why only applied to first sampling unit?

Q: “Why is it that we do not apply probability proportional to size to the second sampling units (SSU) as we do for the first sampling units (FSU)? Suppose we need a sample of 200 children from 4 schools, and the number of children per school is as follows: 50, 200, 500 and 1000. Why take the same number of pupils in each school, why not take samples in proportion to the number of children in each school? Suppose one takes 50 in each school. The sampling fractions would then be 100%, 25%, 10% and 5% respectively. It appears there is a great potential for bias in this approach, for example when the smallest
school has mainly pupils from a certain religion or different culture. Would these children not be relatively over-represented in the sample?”

A: “When first stage units are of unequal sizes, you have to apply probability proportional to size at the first sampling level, that is, when sampling schools. If a school is small, it will have less chance of being selected. Thus, a small school (e.g. one that has pupils from a certain religion) will have less chance of being selected. However, it is possible that ‘by chance’ this school is selected. To avoid imbalances in your example, you would increase the number of FSUs from 4 to, say, 20, and decrease the number of SSUs per FSU from 50 to 10. With PPS, the smaller the school, the less likely it is to be sampled at all and, if sampled, the fewer times it is likely to appear in the sample. Given that the difference in school size is taken into account at the first stage, the number of SSUs should be equal in all FSUs, and SSUs should be sampled by simple random sampling. Using PPS at both stages would lead to an over-correction of the difference in school sizes and thus to bias. Let’s take another example to illustrate probability proportional to size. Let’s say you want to estimate, in your region, the proportion of individuals who have ever seen a doctor because they were victims of street violence, by sampling cities within the region. Suppose that in your region there are several small cities and a few large cities. In a simple random sample, the size of the cities is not taken into account, and a typical sample of cities will contain mostly small cities. Would you then get a representative estimate of the situation? Probably not, as street violence and thus illness caused by it will be heavily influenced by the size of the cities, most likely being higher in large urban areas. Thus, you should be able to improve on the simple random sample by giving the large cities a greater chance to appear in the sample, and you can accomplish this by ‘sampling with probabilities proportional to size’. However, let’s assume that you want to study how a certain outcome varies specifically with religion in children. This is different from the examples above – in these examples, the main objective was to obtain a valid prevalence estimate in a population. If the objective is to investigate the association between an outcome and religion, you may need to take steps to ensure that the final study sample includes a sufficiently high number of children from each religion – and not just let chance do it for you. In this hypothetical situation, let’s assume that 80% of the pupils are from a certain religion and 20% are from another religion. You might consider one of the following sampling strategies:

A. If schools can be stratified by religion:
1) stratify schools by religion (e.g., 2 strata)
2) within each stratum, select a number of schools using probability proportional to size (e.g., take 4 schools within each stratum)
3) within each selected school, take a simple random sample of children (same number in each school; e.g., take 100 children within each school)
Thus, the final sample would have 800 children, 400 from each religion. The advantage of this method, compared with taking a simple random sample of children within each of the two strata (stratified simple random sample), is that you do not have to have a list of all children in all the schools (sampling frame).


B. If schools cannot be classified by religion and assuming that there is a similar mix of religions within each school:
1) select a number of schools using probability proportional to size (e.g., select 16 schools)
2) within each school, stratify children according to religion (e.g., 2 strata)
3) within each stratum, take a simple random sample of children from each religion (e.g. 25 children in each stratum)
Thus, the final sample would have 800 children, 400 from each religion. Remember that in both scenarios, the sample of 800 children will not be representative of the ‘total’ population of children, as children from the less common religion have been over-sampled. But in this example, this was not the objective of sampling – the objective was to examine a possible association between religion and a health outcome. In this study unit, you read about the main sampling methods that exist. However, this example shows how sampling strategies can be combined so that the needs of each specific study can be met.”

Bener paper: PPS

Q: “Re: paper by Bener et al. on cigarette smoking among high school boys in the United Arab Emirates. The paper describes how the first-stage sampling units (high schools) were stratified by geographic area, then selected by simple random sample. As explained in Kirkwood and the workbook, this is only appropriate if the number of boys in each high school is about the same. The paper then says that second-stage units (classes) were randomly selected in each of the selected schools. Then, the number of students selected in each class was proportional to the total number of students (why not just the boys, or are all high schools segregated by sex in the UAE?) in each school. So, it seems that the authors recognized the importance of using probability proportional to size sampling when the 1st stage units are of different sizes. However, the method they used wouldn’t result in each high school boy in the study population having an equal chance of being selected. There was nothing done at the first stage to ensure that schools with the most boys had greater chances of being selected.”

A: “Unfortunately, the authors were not very clear on how they went about the sampling procedures. We do not know anything about the number and size of schools and classes, nor about the number and proportion of schools sampled. There is also no information on whether boys and girls go to different schools in the UAE. If this is the case, then the number of boys and number of students mean the same thing - the authors might have just used the term students to refer to boys, but we do not know if this is the case. However, the number of students per school was probably variable, as the authors mentioned that ‘the number of students [selected] in each school was in proportion to the total number of students in each school’. From what the authors wrote, they appear to have applied ‘epsem’ (equal probability of
selection method) at all stages of selection, except that they also stratified schools according to the geographic area. Therefore, irrespective of size, each school had a similar chance of being selected; each class within the schools also had a similar chance of selection. At the end, every student may have had an equal probability of being selected, as the number of students selected in each school was proportional to the total number of students in each school. Note that the probability of selection of a multistage sample is equal to the product of the probabilities of selection at each sampling stage. Having no information at all about the number of students in each school could perhaps have justified this approach. However, there are some disadvantages of applying epsem at all stages of selection in a multistage design (when the primary sampling units vary in size), compared to sampling with probability proportional to size (PPS). As small schools had the same probability of selection as large schools, this means that large schools could have easily been kept out of the sample and therefore been under-represented. This is unlikely to happen when sampling with probability proportional to size is used. Other disadvantages of using epsem include less precise estimates and less control over the final sample size. If you are interested, you could try to get hold of ‘The Encyclopedia of Biostatistics, by Armitage P and Colton T (1998) New York, John Wiley & Sons’. It goes into some detail (pages 3945 to 3950) on sampling with PPS and epsem.”

Q: “Why is it, in the example given of unequal size first stage units in multi-stage sampling, that after the schools have been selected, the same number of pupils is sampled from each selected school? Isn't it better to allow a school to be selected only once and to take a sample proportional to the size of the school, like they do for parishes in the Svensson paper?”

A: “The number of students taken in each school was indeed proportional to the size of each school. This is what the authors wrote in the last sentence of the first paragraph of ‘Data collection procedure’. This was how they managed to give each student in the sampling frame an equal chance of being selected for the study.”

Number of first and second stage units

Q: “How do we determine the optimum number of First Stage Units and then the optimum number of Second Stage Units within each First Stage Unit?”

A: “There are no good rules that always hold for determining the optimum numbers of first stage units (FSU) and second stage units (SSU). Each case must be studied on its own, but pilot surveys with various FSU/SSU sizes might help point you in the correct direction (if this is feasible). Reviewing the literature to see what other researchers have done is also very useful. But in the end, a balance between the size of the FSU and SSU usually must be achieved. Your choice will most likely be influenced by costs and practical issues, and of course by the availability of a sampling frame that could be used. Often, we see that individuals within a small FSU are physically close together (e.g., school, city block) and hence tend to have similar characteristics. In this situation, the amount of information about a population parameter may not increase substantially if new individuals are taken within a FSU. Since measurements cost money, a researcher would waste money by choosing too many people within a homogeneous FSU. However, situations may arise in which individuals within a FSU are very different from one another. In such cases, you would need a larger number of people within a few FSUs to get a good
estimate of a population parameter. Let’s take an example. Let’s say that you would like to conduct a survey of university students to know their opinion about smoking in public areas. If students from a university have similar opinions on the question but opinions differ widely from university to university, then the sample should contain a few students (SSU) from many universities (FSU). However, if the opinions vary greatly within each university, then the survey should include many representatives (SSU) from each of a few universities (FSU). Time and travel expenses will often influence your decision. For example, if you would like to perform a trial of iron supplementation on infant mortality in rural Nepal, you might decide to select only a few villages (FSU) and several women (SSU) from the villages instead of selecting a few women from several villages. This would simplify the tasks of the fieldworkers and reduce travelling time and costs. As you can see, deciding on the best strategy to use is not easy, although in several situations, costs and practicalities with fieldwork will guide your choice.” NB: note that standard sample size calculations assume simple random sampling, and that adjustments have to be made for multi-stage sampling (ref. design factor etc).

Session 6: Size of a study

Desired precision or power as determinant of sample size?

Q: “When the two approaches to sample size calculation are presented, the example given is for a study to estimate a single proportion (% vaccinated). And the questions for this activity suggest that you would use the precision method for studies that estimate a single rate, proportion or mean, whereas you would use the power method if you’re comparing two rates, proportions or means. But later on, other examples deal with using the precision method in studies that are comparing two rates, proportions or means.”

A: “It is not the case that the precision method of sample size calculation is used only for studies which estimate a single rate/proportion or mean while the power method is used for comparing two rates/proportions or means. The precision method can be used either for a single proportion/rate/mean or when comparing two rates/proportions/means. For instance, if you want a precise estimate (that is, an estimate with a narrow confidence interval) of the proportion of children vaccinated in a population, then you use the precision method. If, however, you have a study looking at an intervention, then you might use the power or precision method. If you already know that the new drug has an effect, then you will not be interested in testing the null hypothesis and again finding out that it has an effect. What you are really interested in is finding out the magnitude of the effect that it has. You will then use the precision method when calculating sample size, to have as narrow a range as possible around the measure of effect (your measure of effect can in principle be a risk, rate or odds difference, or a risk, rate or odds ratio). If, however, you don’t know whether or not your treatment has an effect, then you want to be sure that your sample size is large enough for you to detect an effect if it is there. You want the sample size to be large enough so that the test of the null hypothesis (that there is
no difference between the treatments) will be significant if there is a true difference. You would then use the power method of sample size calculation. In summary: if you are concerned about the width of a confidence interval, then you will opt for the precision method; if you are concerned to find a significant difference, you will opt for the power method. Smith & Morrow, particularly section 2.3 on page 45, discuss further the choice of criterion for calculating sample size.”
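As a rough illustration of the two criteria (these are the standard normal-approximation formulae, not formulae quoted from the session), the sketch below computes, for a comparison of two proportions, the per-group sample size needed (a) to detect the difference with 90% power at the 5% significance level, and (b) to estimate the difference with a 95% confidence interval of a chosen half-width:

from math import ceil

Z_ALPHA = 1.96   # two-sided 5% significance level / 95% confidence interval
Z_BETA = 1.28    # 90% power

def n_power(p1, p2):
    # power criterion: detect a true difference p1 - p2 as statistically significant
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((Z_ALPHA + Z_BETA) ** 2 * variance / (p1 - p2) ** 2)

def n_precision(p1, p2, half_width):
    # precision criterion: 95% CI for the difference of roughly +/- half_width
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(Z_ALPHA ** 2 * variance / half_width ** 2)

print(n_power(0.40, 0.30))            # e.g. detect a drop from 40% to 30%
print(n_precision(0.40, 0.30, 0.05))  # e.g. estimate that drop to within +/- 5 percentage points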

Desired precision of estimate as one determinant of sample size (1)

Q: “When calculating the sample size to determine the prevalence of a variable in one population, I am asked to define the precision. How can I know the precision before actually doing the survey?”

A: “What you need to specify is the precision of your estimate that is needed, e.g. to take a decision. Imagine you run a camp with 20,000 refugees. Your stocks and facilities are sufficient to cope with, say, 10% of severely malnourished people, but if there are 20%, you had better ask your HQ for more resources. You decide to conduct a survey in a representative sample of the refugee population and ask yourself: if the true prevalence in the whole population is 20%, how precise do I need the survey result to be to help me decide whether I should contact HQ or not? If you choose the precision to be +/- 15%, then the required sample size is very small (n=28; for all calculations in this example: size of the population=20,000, expected prevalence=20%, design effect=1, alpha risk=5%) - but the survey result may not be of great help: with 95% certainty, your survey would find a prevalence between 5 and 35%, so you would still not know with enough confidence that reality is closer to 20% than to 10%. On the other hand, the result may not need to be as precise as +/- 0.5% - this is likely to demand too many resources for your survey given its purpose (n=11029). Perhaps a precision of +/- 3% (n=661) would serve the purpose of the survey, that is to decide whether you can run the camp with available resources or whether you need an input from HQ? With 95% certainty, your survey would estimate the prevalence to be between 17 and 23%, if the true value is 20% - that allows you to decide whether you can cope alone or whether you need help from HQ. Thus, the precision is always determined by the purpose of the survey.”
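The figures quoted in this answer can be reproduced with the usual formula for estimating a single proportion with a given precision, plus a finite population correction for the camp of 20,000. The Python sketch below is only an illustration of that Epi-Info-style calculation, not code from the course.

from math import ceil

def n_single_proportion(p, d, population=None, z=1.96):
    # n to estimate a prevalence p with precision +/- d at the 95% confidence level,
    # optionally applying a finite population correction
    n0 = z ** 2 * p * (1 - p) / d ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)

# Refugee camp example: expected prevalence 20%, population 20,000
for d in (0.15, 0.03, 0.005):
    print(d, n_single_proportion(0.20, d, population=20_000))
# -> 28, 661 and 11029, matching the sample sizes quoted above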

Desired precision of estimate as one determinant of sample size (2)

Q: “Is it inherent in the nature of the confidence interval for a mean that one can manipulate it to get the sample size needed - in other words, can I specify any (within reason) value for f and get the n required? As long as the equation includes the requisite 1.96 factor, this will give me 95% intervals?”


A: “In principle yes – the question, however, is how large you can afford the 95%CI to be, or how narrow you need it to be, depending on the purpose of your study. All other factors equal, a narrower confidence interval – that is, a more precise estimate – requires a larger sample size.”

Q: “However the situation is not so clear in rate and risk ratios, where the logarithmic nature means that the error factor f is a multiplier and divider of the ratio R. The situation is more complicated, as the distance between the point estimate and the lower CI limit is different from the distance to the upper CI limit.”

A: “For risk ratios and rate ratios, the situation is similar, the only difference being that the symmetry is on the log scale: CI widths are symmetric about the log risk ratio or log rate ratio.”

Error factor for risk ratio

Q: “When calculating the error factor ‘f’ from a desired confidence interval (CI) for a risk ratio R, CI defined as R/f to Rf, do we get a different value depending on whether we use the upper limit ‘Rf’ or the lower limit ‘R/f’ to obtain f? Using the example on p. 6.9 of the course notes: for an estimate of R to within 0.1: Rf=0.33+0.1=0.43, f=0.43/0.33 =1.3. But if R/f = 0.33-0.1=0.23, f=0.33/0.23=1.43. Basically I suppose, R-R/f is not always going to be equal to Rf-R.”

A: “It is true that R-R/f tends not to be equal to Rf-R – the limits of a CI around a risk ratio have the same distance to the risk ratio on the logarithmic scale, not on the non-logarithmic scale. Therefore, the CI around 0.33 should not be expected to be 0.23 and 0.43, and deriving f from such limits will not give the same result; the formulation ‘an estimate of R to within 0.1’ is somewhat imprecise. If the upper limit of the CI is supposed to be 0.33+0.1, the lower limit will be 0.33/[(0.33+0.1)/0.33]=0.25, not 0.33-0.1=0.23. The error factor derived from the upper limit 0.43 is then 0.43/0.33=1.3, as is the error factor from the lower limit 0.33/0.25=1.3. However, if you get such a rough instruction as ‘an estimate of R to within 0.1’, one way to deal with it would be to obtain two different values for f, based on an upper CI limit of 0.43, and a lower CI limit of 0.23, to do a sample size calculation on both values for f and to use the larger sample size to be on the safe side. Here: f(upper CI limit)=1.3, n=1388, f(lower CI limit)=1.43, n=747; you would now opt for f and n based on the upper CI limit (f=1.3, n=1388), as this gives the larger sample size.”
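To see the asymmetry numerically, the short sketch below (not from the course notes) recomputes the two candidate error factors and shows that a confidence interval of the form R/f to Rf, which is symmetric on the log scale, cannot run from 0.23 to 0.43:

from math import log

R, delta = 0.33, 0.1

f_upper = (R + delta) / R     # 0.43 / 0.33 = 1.30
f_lower = R / (R - delta)     # 0.33 / 0.23 = 1.43

print(round(R / f_upper, 2), round(R * f_upper, 2))   # CI with f = 1.30: 0.25 to 0.43
print(round(R / f_lower, 2), round(R * f_lower, 2))   # CI with f = 1.43: 0.23 to 0.47
print(round(log(f_upper), 3), round(log(f_lower), 3)) # half-widths on the log scale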

Expected prevalence as one determinant of sample size

Q: “What would happen if I want to estimate the prevalence of a disease with a certain precision but I underestimate or overestimate the likely value of this prevalence in the population?”

A: “If the real prevalence is very different from what you had expected, the survey’s estimate may be more or less precise than needed; your survey may be over- or underpowered. Your ‘best guess’ for the expected prevalence will either come from your own experience, or that of your colleagues, or from the literature. You could also use a worst case approach: if you think that the true prevalence is likely to be 20%, but could also be 10%
or 30%, then you could calculate your sample size on the basis of these three assumptions and take the largest sample that comes out of your calculations (n=377, 661, 858, when size of the population=20000, desired precision=3%, design effect=1, alpha risk=5%)”.
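A minimal sketch of this worst-case approach, using the same precision formula with a finite population correction as in the earlier sketch (again just an illustration, not course code):

from math import ceil

def n_prevalence(p, d=0.03, population=20_000, z=1.96):
    # precision formula for a single proportion with a finite population correction
    n0 = z ** 2 * p * (1 - p) / d ** 2
    return ceil(n0 / (1 + (n0 - 1) / population))

sizes = {p: n_prevalence(p) for p in (0.10, 0.20, 0.30)}
print(sizes)                 # -> {0.1: 377, 0.2: 661, 0.3: 858}, as quoted above
print(max(sizes.values()))   # plan for the worst case: 858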

Sample size calculation in multistage sampling

Q: “When I use multi-stage sampling or cluster sampling methods, how do I calculate the number of primary sampling units to be sampled - how do I break down my sample size between primary sampling units and secondary sampling units?”

A: “The sample size calculations that are taught in EP103 are all based on the assumption that individuals are selected by simple random sampling from the source population. If you intend to use multistage sampling, however, your next question should not be how to distribute this number of sampling units to the first and second sampling stage, but by which factor the sample size needs to be increased to take the multistage sampling into account. If you apply multistage sampling, e.g. cluster sampling, you will usually get individuals that are more similar to each other than they would be if they were not grouped together: pupils from the same school, children in the same family or from the same neighbourhood tend to share features to some extent; they are more similar to each other than you would expect by picking them individually at random from a comprehensive population register. This design effect due to cluster sampling needs to be taken into account by increasing the sample size by an appropriate amount. The factor by which you need to increase the sample size (the design effect) will itself depend on the variables you are studying (e.g. will be higher if person-to-person transmission of a pathogen is involved), but also, to come back to your question, on the size of the clusters. The smaller the cluster size, the smaller the design effect, the less you have to increase your sample size. That means: better to have many small clusters than few big ones. You will make the clusters as small as you can, taking limitations by research logistics into account. Exact calculations of the design effect are complicated and involve a bit of guesswork. In early days, people would often just double the sample size, but this is no longer considered appropriate. If you wish to know more, have a look at Bennett S, Woods T, Liyanage WM, Smith DL. A simplified method for cluster-sample surveys of health in developing countries. Wld.Hlth.Statist.Quart. 1991;44:98-106, or: Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomised trials. International Journal of Epidemiology 1999;28:319-326. In practice, expert statistical advice often needs to be sought if you want to do multistage sampling.”
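One commonly used approximation, which is not given in the answer above and should be treated as an assumption here, is to inflate the simple-random-sampling sample size by a design effect of 1 + (m - 1) x ICC, where m is the average cluster size and ICC is the intra-cluster correlation coefficient. The sketch below shows how quickly the required size grows with cluster size, illustrating the 'many small clusters rather than few big ones' advice:

from math import ceil

def cluster_adjusted_n(n_srs, cluster_size, icc):
    # inflate an SRS-based sample size by the approximate design effect 1 + (m - 1) * icc
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_srs * design_effect), design_effect

n_srs = 600          # hypothetical sample size from a simple-random-sampling calculation
icc = 0.02           # hypothetical intra-cluster correlation
for m in (10, 30, 100):
    n_adj, deff = cluster_adjusted_n(n_srs, m, icc)
    print(f"cluster size {m}: design effect {deff:.2f}, adjusted sample size {n_adj}")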

Sample size calculation for comparing two proportions in the same population, e.g. in a before-and-after intervention study

Q: “How do you calculate a sample size when you are comparing proportions before and after an intervention in the same population?”


A: “One way of doing this is by treating the pre- and post-intervention measurements as matched pairs and analysing them using McNemar’s test. The formula for the calculation of the sample size for such a study design can be found in a book by James Schlesselman, entitled ‘Case Control Studies: Design, Conduct, Analysis’, published by Oxford University Press 1982, page 160. However, it should be noted that a before-and-after study may not be the best study design. For most intervention studies you should really aim at having a control group, as using just one group makes it difficult to attribute any change in the outcome to the intervention, and not to a confounding variable. If it is impossible to have a control group – e.g. when the intervention consists of TV spots and it is difficult to control who will see it and who won’t – the before-and-after design may be the only option. This requires making several measurements before and several measurements after the intervention, so that you don’t obtain only data points before-and-after, but stable trends before-and-after. Such a design is also called an ‘interrupted time-series design’.”

Sample size calculation for not normally distributed continuous variables (1)

Q: “How can I calculate the size for a sample if the continuous data I am going to use cannot be assumed to be normally distributed?”

A: “The standard formulae for sample size / power calculations for continuous variables assume (approximate) normality in the data distribution. If this assumption does not hold, it may be possible to normalise the distribution by transforming the data. The most frequently used transformation is the logarithmic transformation, but there are many others. If transformation does not make the distribution normal enough, you could perform simulations. By simulating the non-normal data, you attempt to find a suitable sample size / power by trial and error.”
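As an illustration of the transformation route, here is a sketch using the standard per-group formula for comparing two means, n = 2 x sd^2 x (z_alpha + z_beta)^2 / delta^2, applied on the log scale to hypothetical pilot data; it is not code from the course.

from math import ceil, log
from statistics import stdev

Z_ALPHA, Z_BETA = 1.96, 0.84   # 5% two-sided significance, 80% power

def n_per_group(sd, delta):
    # per-group sample size to detect a difference in means of size delta
    return ceil(2 * sd ** 2 * (Z_ALPHA + Z_BETA) ** 2 / delta ** 2)

# Hypothetical right-skewed pilot measurements (e.g. antibody titres); log-transform first
pilot = [5, 8, 12, 15, 20, 40, 80, 160, 320, 640]
sd_log = stdev([log(x) for x in pilot])

# Difference to detect on the log scale: here a 1.5-fold ratio of geometric means
print(n_per_group(sd_log, log(1.5)))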

Sample size calculation for not normally distributed continuous variables (2)

Q: “In power and precision calculation for difference in means, we assume that the item measured will have a normal distribution? What if this is not the case?”

A: “In that situation there are a few options: 1. Tests such as the t-test are quite robust, meaning that even if the data do not look very normally distributed, the test is still valid, and the usual sample size calculation is OK. 2. If data are obviously non-normal, you could transform the data so that they follow a more normal distribution. We often use the log transformation for this. If the data are normal on a log scale then you can go ahead and do the sample size calculation as usual. 3. If there is no clear transformation which normalises the data, you might want to think about using it as a categorical variable rather than a continuous one. For example, if you have immune response data with a lot of zeros and some positive values, the data are not normally distributed; you might be interested in a difference in the proportion of positive responders between two groups. So your main analysis would be a comparison of two proportions, and you can do a sample size calculation for this. 4. If the data are not normally distributed, and you can't easily transform them to be normal, you may do a non-parametric analysis. This is because if the data are not normally distributed you will not want to use the mean as an outcome measure - probably you will use the median instead. There are special methods to compare two medians, but power calculations for these are quite complex.”
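For option 3 – turning the question into a comparison of two proportions – a sketch of the usual normal-approximation sample size formula is given below (illustrative Python, hypothetical proportions of ‘positive responders’; no continuity correction):

# Hedged sketch of the standard two-proportion sample size formula
# (normal approximation): p1 and p2 are the anticipated proportions of
# "responders" in the two groups.
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.9):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(n_per_group_two_proportions(0.30, 0.45))   # hypothetical proportions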

Assumption of similar standard deviations when estimating sample size for the comparison of two means

Q: “Why do we assume similar standard deviations (SD) when estimating the required sample size for comparing means?”

A: “This assumption is made because the parametric statistical tests to compare means rely on it, too. Often, we don’t know whether SD will be similar in both groups, so for practical reasons, we assume it, but if the information is available from previous studies, one should definitely use it.”
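For reference, a sketch of the standard formula for comparing two independent means (hypothetical values). It assumes a common SD; if previous studies suggest the SDs differ, the term 2*sd**2 in the numerator can be replaced by sd1**2 + sd2**2:

# Sketch of the usual sample size formula for two independent means,
# assuming a common standard deviation in both groups.
from math import ceil
from scipy.stats import norm

def n_per_group_two_means(delta, sd, alpha=0.05, power=0.9):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (sd * z / delta) ** 2)

print(n_per_group_two_means(delta=5, sd=12))     # hypothetical difference and SD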

Sample size for comparison of independent vs. paired means

Q: “Can we use the same sample size formula to compare independent means (i.e., from two different populations) also for paired means?”

A: “A different formula needs to be used when one wants to compare paired means (e.g. comparison of blood lipids before and after a low fat diet in one group of individuals, or comparisons in a matched case-control study). Whenever data are paired – means, proportions etc -, special formulae for the sample size calculation, and special statistical tests need to be used.”
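The paired counterpart uses the SD of the within-pair differences rather than the SD of the raw measurements; a minimal sketch with hypothetical values:

# Sketch of the paired-means sample size calculation: the input sd_diff is the
# standard deviation of the WITHIN-PAIR differences.
from math import ceil
from scipy.stats import norm

def n_pairs_paired_means(delta, sd_diff, alpha=0.05, power=0.9):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil((sd_diff * z / delta) ** 2)

print(n_pairs_paired_means(delta=5, sd_diff=8))  # hypothetical values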

Sample size for showing no difference

Q: “Is it possible to calculate the sample size needed to show that there is no difference between two groups?”

A: “Showing that there is no difference between two groups would often be a very useful thing indeed. But try to calculate a sample size for the comparison of two proportions with Epi-Info or Stata, and make the difference between the proportions ever smaller: the sample size will tend to infinity… So the simple answer is: No, it is not possible to calculate a sample size to prove there is no difference. It is, however, possible to calculate a sample size to show that any difference is less than a certain amount. What you need to consider is how big a difference between the two groups you judge to be unimportant for public health, to be clinically irrelevant etc, and how big a difference is important. You can then use this difference to calculate a sample size, so that if the two groups (treatments) really are the same, the confidence interval for the difference will not include the difference you judge to be important with a probability of 80 or 90%. The smaller you set the critical difference, the larger the required sample size will be. In the end, there is usually a balance that needs to be reached between the ‘ideal’ and the ‘practical’ sample size. Most textbooks on clinical trials discuss sample sizes for such ‘equivalence trials’.”
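Following the confidence-interval argument above, a rough sketch of such a calculation for two proportions might look like this (hypothetical values; textbooks on equivalence trials give more refined formulae, often based on one-sided tests):

# Hedged sketch: sample size per group so that, if the two treatments are truly
# identical (common proportion p), the 95% CI for the difference will exclude the
# "important" difference delta with the requested probability.
from math import ceil
from scipy.stats import norm

def n_per_group_equivalence(p, delta, alpha=0.05, power=0.9):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * p * (1 - p) * (z / delta) ** 2)

print(n_per_group_equivalence(p=0.60, delta=0.10))   # hypothetical values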

Two-sided vs. one-sided z-value for the level of significance

Q: “Why are we using a two sided z-value for alpha, the significance level, when calculating a sample size for a requested power?”

A: “Using a two-sided z-value allows for a difference between A and B to work both ways, that is, A>B, but also B>A. Say, for example, that you are comparing the effects of Drug A (the standard treatment) and Drug B (a new treatment). You may believe that Drug B is better than Drug A, but you would also wish to detect whether it has a deleterious effect, so you want to detect any effect it has, in either direction. You would therefore want to use a two-sided significance test. Very occasionally, you may use a one-sided test - for example, if the new drug you are using may have no effect or a beneficial effect, but you know for certain that it can not have a deleterious effect.”

Session 7: Randomization

Eligibility and randomisation

Q: “Should we replace randomised individuals who do not fit the inclusion criteria?”

A: “All epidemiology textbooks strongly suggest that randomisation to a study group should be done only AFTER it has been ascertained both that an individual is eligible for entry into the study AND that he/she is prepared to participate in the study. So it is important to make sure that the potential participant meets all your inclusion criteria BEFORE randomising him/her to one of the groups.”

Restricted randomisation (block randomisation)

Q: “What additional advantage does block randomisation have over simple, unrestricted randomisation?”

A: “The main advantage of block randomisation is that it helps to bring balance in the number of individuals allocated to each group compared with unrestricted randomization (e.g. tossing a coin). This is important as we would not like to have a study with a much larger number of participants in one group compared to the other group(s). Another advantage is that it can help to bring balance in the sequence of allocation between the groups. For example, let’s say that you want to do a trial with 20 people. You have 2 treatment groups: A and B. You will allocate the 20 persons into the 2 groups. You estimate that it will take about 8 weeks to recruit the 20 participants. You decide to select 10 random numbers from a random number table; these will be allocated to group A and the others to group B. Let’s say that the random numbers are (these are numbers that I truly got!): 3, 16, 12, 18, 1, 8, 17, 7, 15, 19. This means that the 1st, 3rd, 7th, 8th, 12th, 15th, 16th, 17th, 18th and 19th participants would be allocated to group A and the others to group B. One can see that most of the participants in group B will be allocated early on and that most of the participants in group A will be allocated later. This may not have any importance in some studies but in others it might be very important, for example, in studies that stop along the way because of some important negative effects found in some study participants (thus you need to know which treatment gives these effects and thus need to break the code and perform some analyses). One would not want in that case to have a large imbalance in the allocation between the groups as this might make the statistical analyses difficult. Similarly, in some studies interim analyses are planned. For example, one decides in advance to analyse data halfway through to see if the study should be stopped (because of very positive effects for example). Thus one would need to make sure that each group has a similar number of participants. These are examples when block randomisation is useful because for each given 4 or 10 or 20 participants (for example), one can be sure to have a balance in the allocation of participants between the groups.”
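A small sketch of how such a blocked allocation sequence could be generated (illustrative Python; a block size of 4 is just an example, and in practice block sizes are often varied and kept hidden from the field team):

# Illustrative sketch: a blocked allocation sequence for two groups, so that after
# every complete block equal numbers have been allocated to A and B.
import random

def blocked_sequence(n_participants, block_size=4, seed=3):
    assert block_size % 2 == 0
    random.seed(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = ['A'] * (block_size // 2) + ['B'] * (block_size // 2)
        random.shuffle(block)               # random order within each block
        sequence.extend(block)
    return sequence[:n_participants]

print(''.join(blocked_sequence(20)))         # e.g. 'ABBA...' balanced after every block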

Stratified randomisation

Q: “I would like to know what is the purpose and the effect of stratified randomisation”.

A: “Stratified randomisation is a method that helps achieve comparability between the study groups for certain important characteristics. The characteristic has to be correlated with subsequent response or outcome. For example, if the response to a certain drug is known to vary between men and women, you will want to make sure that you have (approximately) the same number of men and women in the control (placebo) and intervention groups (if for example the response was known to be strong in men and weak in women, you would be overestimating the effect if – by chance – men were mostly assigned to the intervention group and women to the control group). A block randomisation strategy within each stratum (rather than simple randomisation) will then ensure that you obtain the same number of men (or women) in the control and intervention groups. Of course the final number of men (or women) in the control and intervention groups might differ slightly depending on the number of persons recruited, the number who might have been allocated to a group but who never started the study, and block size (you might not finish a block when you have recruited all your study participants). This is why it is usually said that you will achieve an ‘approximate’ balance of important characteristics.”

Q: “Suppose that I would like to carry out an intervention trial, comparing treatment A with treatment B. I will assign the treatment by blocked stratified randomization because I know that two characteristics have important effects on the outcome, for example sex and age. (1) I understood that I should make a list of random numbers for each stratum. If my required sample is 100 patients, should I make a list of 25 numbers for each stratum? The question is: how many random numbers should there be for each stratum? (2) Suppose that I have previous data about the whole population; these data tell me that the proportion of males is 20% and the proportion of patients >70 years is 30%. I need 100 patients, and I hope that my sample will be representative of the whole population. I decide to use stratified randomization. If I make a separate list for each stratum I will fill the stratum of females aged under 70 faster than the other strata. Should I stop including patients in this stratum and wait for patients who fit the other strata? What is the right procedure? Should I think of the concept of probability proportional to size?”

A: “(1) Your required sample size is 100. You have 2 treatments: A and B. You want to stratify by sex (males/females) and age (I assume young/old). So you have 4 strata: young males, old males, young females and old females. Let’s assume that you will perform block randomisation within each stratum with block sizes of 4. So, as you said, for each stratum you will prepare random sequences of allocation. You can prepare as many sequences as you think will be necessary (plus some extra in case you need them). For example: young males: BBAA AABB ABBA …; old males: AABB ABAB BBAA …; etc. If you want to have an equal number of participants in each stratum, then yes, you will need to prepare a list of at least 7 blocks of 4 (28 individuals randomised). You would then recruit participants until you have found 25 persons for each stratum. As you say, you would have to wait until each stratum is completed. (2) However, remember that the objective of block randomisation is to ensure that within each stratum, you will have the same number of individuals in treatment A as in treatment B. Not necessarily that you will have the same number of individuals in each stratum. In a real-life situation, the study population will most probably not be evenly distributed over the strata (i.e., you would not expect the same number of old males/young males/young women/old women). Think for example of a trial related to cardiovascular diseases for which more men than women might be eligible. As a result some strata might get fewer subjects, and this might just be fine and reflect the study population. This will depend on your study objectives. Finally, you wonder if you should use ‘probability proportional to size’. We can assume that if you recruit individuals ‘as they come’ the final sample will reflect the study population (although the response rate in some strata might be higher than in other strata). But this might not be the case. Thus, if you know the ‘true’ distribution of individuals among strata in the study population, you could decide to ensure that your final sample will reflect this distribution. But this is not a requirement, it is only a suggestion. Once again, you will have to be guided by the objectives of the study.”
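A short sketch of how the per-stratum lists described above could be prepared (illustrative Python, using blocks of 4 and the four strata from the example):

# Sketch of stratified block randomisation: a separate blocked allocation list is
# prepared for each stratum, so A and B stay balanced WITHIN each stratum whatever
# the stratum sizes turn out to be.
import random

def blocked_list(n_blocks, block_size=4):
    out = []
    for _ in range(n_blocks):
        block = ['A'] * (block_size // 2) + ['B'] * (block_size // 2)
        random.shuffle(block)
        out.append(''.join(block))
    return out

random.seed(4)
strata = ['young males', 'old males', 'young females', 'old females']
lists = {s: blocked_list(n_blocks=7) for s in strata}   # 7 blocks of 4 = 28 slots each
for s, blocks in lists.items():
    print(f"{s:14s}: {' '.join(blocks)}")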


Block 3 (session 8 to 11)

Session 8: Methods of data collection 1

Questionnaire design: the order of questions

Q: “Is there a general outline that one can follow when making a questionnaire? For example: 1) General instructions, 2) Demographic factors, 3) Questions on possible aetiological factors, 4) Questions on possible confounding factors.”

A: “There is no one ideal way to organise a questionnaire. All will depend on the study population, topic covered, time and place of the interview, who the interviewer is, etc. However, including general instructions first, particularly if the questionnaire is to be self-administered, is a very good idea. Then, there is some debate. What most researchers will agree with is the idea that threatening questions should be placed at the end, when a good relationship has been built with the interviewer, or the interviewee has warmed up to the questionnaire. Some will argue that starting with questions on demographic factors is appropriate as they are often relatively easy to answer. Others will say that several ‘demographic’ questions, particularly on income, can be threatening for some respondents and should thus be placed at the end of the questionnaire; following this logic, the questionnaire should start with the topics that are really important to the study as they may also be the more interesting ones for the respondents. If the whole content of the questionnaire could be perceived as threatening (e.g., a study of sexual behaviour) the investigators will have to find a way to soften the impact of the questions (e.g. by introducing the topic and alternating with other less threatening questions). To differentiate between aetiological and confounding factors is probably meaningless to the interviewee, so it is not obvious why one should wish to order questions like this.”

Questions on attitudes and beliefs

Q: “What is the difference between attitudes and beliefs? For instance, in session 8, activity 4 page 8.5, question 6, I thought the question ‘Do you think eye camps are useful’ was investigating a belief not an attitude, because it is asking about something that the person says whether for him/her it is true or not.”

A: “While everyone agrees that attitudes and beliefs are different, there is quite a bit of debate around what the differences actually are and how best to explain them. For example, some social psychologists explain the difference as follows: ‘an attitude is something which we can observe, measure and perhaps modify, while a belief is a more subjective concept and is related to values, ideas which we take on from socialising in the outer world. In a sense, social attitudes are mechanisms for the transmission of social beliefs’. People’s beliefs and attitudes can be shaped and reshaped by forces like race, ethnicity, culture, gender, environment, exposures and life experiences. In questionnaire design, these factors need to be taken into account, and the most appropriate words, ideas, phrases and concepts chosen to suit the population we are interested in. Preparatory qualitative work might help to reach this objective.


Regarding the question ‘Do you think that eye-camps are useful?’, we would tend to see this as a question about an attitude, for the following reasons: - it relates to what the person says he/she wants or thinks - it is something that could be modified relatively easily (e.g. by explaining to the interviewee why the camps are useful) - it does not refer so much to personal values, which are less easily modifiable (e.g. the belief that it is not worth treating elderly people).”

Data collection form etc.

Q: What is the difference between “data collection form”, “data abstraction form”, “record abstraction protocol” and “record abstraction form”?

A: “The terms are similar but slightly different, which can lead to confusion. One has to remember that this is about existing records, i.e. data that have been recorded for a purpose other than the epidemiological study (e.g., mortality records, birth certificates, medical records, etc.). Now, information is supposed to be obtained from these records (e.g., cause of death, date of birth, etc.). First, in order to collect the information properly and in a standardised way, one has to use a record abstraction protocol. This is a document that will be prepared by the investigators and that will describe in detail how the information should be collected from the existing records. This is important, for example, because there might be cases where the information is missing (so what should be done?), unclear (e.g. bad handwriting – what should be done?), incomplete (what should be done?), or where there are several answers where only one should have been provided (e.g. several causes of death – what should be done?). It is thus important that the person collecting the information knows exactly what to do in each circumstance. Then the person employed to collect the data will use special forms to enter the required information. These forms could be called a data collection form, a data abstraction form or a record abstraction form. The first term is of course more general than the other two as it simply refers to ‘a form used to collect data’ (thus any type of data collected from existing records or from any other source). The other two terms are more specific as they refer to a form used for the collection of information that is abstracted from existing records.”

Session 9: Methods of data collection 2

Telescoping of exposure

Q: “What is ‘telescoping of exposure’, mentioned in activity 3, table 2?”

A: “Telescoping can be defined as the tendency of respondents to report events which actually took place before the time period being studied. For example, if you ask a respondent how many times he/she went to the doctor during the last 6 months, he/she may include in the count -without realising it- visits that he/she did before that time period (e.g. visits he/she did between 6 months and 12 months before). Records, as opposed to interview data, include the exact dates of the visits, thus avoiding telescoping.”


Healthy worker effect

Q: “In session 9, activity 4, task 1, the feedback mentions the ‘healthy worker effect’ as potential source of bias. But in this particular study, the comparison is between two groups of workers exposed or non-exposed to radiation, and not to the general population, so is the healthy worker effect important in this context?”

A: “The study described in the paper by Najarian and Colton is a proportional mortality study. The number of deaths from cancer in nuclear workers is compared with the expected number of cancer deaths estimated from age-specific proportions of deaths in the reference population (USA White males in this study). You can see how the calculations are performed by looking at pages 85-86 in your book from Hennekens and Buring (Epidemiology in Medicine). In that sense, the comparison group is the general population and thus the healthy worker effect becomes an issue. For example, if we assume that nuclear workers are overall ‘healthier’ than the reference population, they might be less likely to die from cardiovascular diseases, the most important cause of death in the USA. As a result, the proportion of deaths from cancer may be higher in nuclear workers than in the general population, while incidence rates are the same in both groups.”

Using dead controls for dead cases?

Q: “Re: Armstrong et al, p. 227, on using dead controls for dead cases. Why may using dead controls lead to biased estimates of the prevalence of exposure? Does this mean their use is likely to underestimate the potential effect of the exposure of interest on the outcome of interest? And why is selecting living controls and obtaining proxy responses for them an acceptable alternative? It may get around the bias issue, but doesn't it just introduce other potential ‘differences’?”

A: “Let’s take the example of a case-control study to examine whether a low fruit and vegetable intake is related to heart disease. The starting point in case-control studies is the definition of a group of people with a particular disease or condition. Say, cases are individuals who died from a heart attack. The second step is to select suitable controls without the disease or condition and representing the population from which the cases originated. Why would using dead controls potentially lead to biased estimates of the prevalence of exposure (here low fruit and vegetable intake)? This is because the death of the controls may be related to the exposure of interest, and the controls would then not truly represent the population the cases are coming from. In this example, one would not want to select individuals whose causes of death are also related to low fruit and vegetable intake (for example different types of cancer including oesophageal, lung and stomach cancer). This problem is similar to the problem of selecting hospital controls for hospital cases (see FE16, page 5, sheet 3). As controls and cases become more similar in terms of their exposure status than they should be, this may lead to underestimating the effect of low fruit and vegetable intake on mortality from a heart attack. Armstrong suggests using living controls and obtaining proxy responses from them. By selecting living controls, the sample of controls may be more representative of the population from which the cases come. By selecting proxy respondents for both the cases and controls one tries to make the potential sources of measurement bias as similar as possible for the two groups.”

Effect of non-response on OR (optional material)

NB: The material presented in this section goes beyond the material covered in EP103, and beyond what is expected in the exam. The issues of non-response that are essential in EP103 are those covered in session 11.

Q: “In [the optional] activity 9, Nelson in his/her paper on "Proxy Respondents" writes on page 216 line 14 (repeated on page 217 line36) that if the item non response rate differs with respect to exposure or disease alone the value of the odds ratio is unbiased. How could this be?”

A: “The effect of non-response on measures of association is rather complex. Consider a hypothetical population of 100,000 persons in which: 1) 30% of the population is exposed and 10% of exposed individuals develop the disease over a defined period of time; 2) 70% of the population is unexposed and 5% of unexposed individuals develop the disease.

Exposure      Disease: Yes    Disease: No    Total
Yes           A = 3000        B = 27000      30000
No            C = 3500        D = 66500      70000
Total         6500            93500          100000

The risk ratio (RR) in the population is (A/(A+B)) / (C/(C+D)) = (3000/30000) / (3500/70000) = 2.00. The odds ratio (OR) is AD/BC = (3000 × 66500) / (27000 × 3500) = 2.11. Let’s call the fractions sampled from each of the four combinations of exposure and disease categories (cells A, B, C, D in the table) fa, fb, fc, and fd. When there is no selection bias, then fa=fb=fc=fd. For example, if a 1% sample of the population of 100,000 persons is selected, then on average fa=fb=fc=fd=0.01 and the estimates of RR and OR (from the sample) are identical to those obtained from the whole population. However (in most cases), if the selection fractions are not equal, the expected values of the RR and OR are: RR = [faA(fcC+fdD)] / [fcC(faA+fbB)] and OR = (faA × fdD) / (fbB × fcC). Using these equations, if the selection fractions differ with respect to exposure only (fa=fb and fc=fd), then the values of both the RR and the OR are unbiased. For example, if fa=fb=0.95 and fc=fd=0.5, then RR = (2850/28500)/(1750/35000) = 2.00 and OR = (2850/25650)/(1750/33250) = 2.11. Thus the size of the exposed and unexposed groups does not affect the expected value of RR and OR, provided that each group is representative of the population in terms of its disease experience subsequent to selection.


If the selection fractions differ with respect to disease only (fa=fc and fb=fd) then the RR is biased but the OR is not (this is what Nelson says). But for diseases affecting only a small proportion of the population, the magnitude of the bias for the RR is small for plausible differences in the selection fractions. Let’s say that fa=fc=0.80 and fb=fd=0.40; then RR = (2400/13200)/(2800/29400) = 1.91, which is only slightly lower than the population value of 2.00, even with a twofold difference in selection fractions for diseased and non-diseased individuals, and OR = (2400/10800)/(2800/26600) = 2.11, which is the same as the population OR. The type of selection bias having the most serious impact on study results is the one in which the selection fractions do not vary only with respect to exposure, or only with respect to disease, but instead vary according to specific combinations of exposure and disease (which is also what Nelson says). Suppose for example that individuals who have both the exposure and develop the disease are less likely to be selected (e.g. smokers who will develop lung cancer), with fa=0.50 and fb=fc=fd=0.90. Then the expected RR=1.16 and OR=1.17. Both values are quite different from the population values. Things can get even more complex, so better stop here…”
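The arithmetic above is easy to check with a few lines of Python (purely illustrative, using the hypothetical population table from this answer):

# Sketch reproducing the arithmetic above: expected RR and OR after applying
# selection fractions fa, fb, fc, fd to the four exposure/disease cells.
A, B, C, D = 3000, 27000, 3500, 66500   # exposed diseased, exposed not, unexposed diseased, unexposed not

def expected_rr_or(fa, fb, fc, fd):
    a, b, c, d = fa * A, fb * B, fc * C, fd * D
    rr = (a / (a + b)) / (c / (c + d))
    orr = (a * d) / (b * c)
    return round(rr, 2), round(orr, 2)

print(expected_rr_or(1, 1, 1, 1))             # no selection bias: RR 2.00, OR 2.11
print(expected_rr_or(0.95, 0.95, 0.5, 0.5))   # differs by exposure only: both unbiased
print(expected_rr_or(0.80, 0.40, 0.80, 0.40)) # differs by disease only: RR 1.91, OR 2.11
print(expected_rr_or(0.50, 0.90, 0.90, 0.90)) # joint exposure/disease: both biased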

Session 10: Methods of data collection 3

Acceptance/refusal rate in qualitative research

Q: “In the session describing advantages and inconveniences of qualitative methods in epidemiology, there was no mention of the acceptance rate. I would think that refusal might be more frequent in qualitative studies, especially in focus groups, because: - it is more time consuming - it is more sensitive than a structured interview or questionnaire - some people might be too shy to participate in focus groups. I assume that non-respondent bias might be important in some qualitative epidemiological studies.”

A: “The question gets at some fundamental aspects of qualitative research. a) Refusal/acceptance rates are obviously important because only certain people will enter ANY study, qualitative or quantitative. Some of the decisions to be made about which methods to use, however, have to be made on the basis of what type of information is desired. Some data are more sensibly collected in quantitative surveys (e.g. population levels of disease, age structure of population etc.) and some in qualitative work (e.g. meanings, understandings, sensitive topics). b) ‘Bias’ in the statistical sense is a bit of a red herring regarding qualitative work because we rarely take a random sample in the way that would be done in a quantitative study. Respondents for interviews (group or one-on-one) are often chosen for particular characteristics, and further respondents are subsequently interviewed in light of the preceding interviews. In addition, some populations are particularly unsuitable for random sampling e.g. IV drug users – it is difficult if not impossible to create a sampling frame and there are probably too few for it to be worth taking a random population sample in the hope of finding them. In this type of situation, we have to rely on sampling where we have

no idea about the size of the response bias (although we can sometimes guess at the direction of the bias). c) A related point to b) above is that some people are simply more articulate and/or willing to talk than others. Interviewing a strict random sample of the population for qualitative work is likely to yield a number of respondents who are either unwilling or unable to talk on the topic you want, even if they agree to participate. Sometimes the silences in themselves are ‘data’, but generally, unforthcoming interviewees provide less useful data than those prepared to talk at greater length. Of course, you have to be aware of possible ‘confounders’ e.g. political pressure that may be silencing a sector of the community. This latter concern of course also applies to quantitative research. d) Regarding your point that someone may be more willing to undergo a structured, quantitative interview than an unstructured one: If the issues investigated are sensitive e.g. drug use, political affiliation, sexual activity etc., it is perfectly true that many people will not participate. Nevertheless, if this is the case, we must also question the validity of quantitative investigations into the same topic. It is often possible to obtain more reliable information in an in-depth interview about ‘quantitative’ elements of behaviour e.g. number of sexual partners, condom use, than can be obtained using highly structured questionnaires where misleading answers can be given either on purpose (“no, I've never had sex”), or because of misunderstanding the question. In my own work, initial answers given in qualitative interviews e.g. about contraceptive use, have often been changed completely before the end of the interview following probes and clarification on both sides. e) A more philosophical point relating to the above: as in quantitative research, any hope of discovering ‘absolute truth’ is rather naive, particularly when examining nebulous concerns such as attitudes and beliefs. Responses will depend on the phrasing of questions, the relationship between interviewer and interviewee, the time of day, the meaning of particular words to the interviewer and the interviewee (not necessarily shared), the physical location of the interview etc. The advantage of qualitative research is that it allows you to probe and to examine concepts in detail. Participants are not limited to the set responses that may lead to reporting bias in quantitative surveys. The problem with reporting bias in quantitative surveys is that it often goes unrecognised, and the potential complexity of responses is ignored. For example, the dynamics of the process of interviewing are frequently disregarded (e.g. an interview with a teenage boy about his sexual activity is likely to be different if it is conducted by a middle-aged woman compared with a young man nearer the interviewee's own age). In other words, response bias per se should not be our only concern: we must also consider reporting ‘bias’ – which is almost inevitable if you take the view that there is an absolute reality to be discovered (not my personal view!). f) In general, many of the issues about bias and reliability in qualitative research also apply to quantitative research. The difference is sometimes just that the issues are more transparent in qualitative research. The important thing to remember is that by choosing one method over another, you will be measuring different things.
So for instance, drug trials are likely to require large, quantitative studies to measure specific outcomes such as mortality rates. On the other hand, people’s experiences of side effects, or their adherence to therapies may better be illuminated using qualitative methods. This is why researchers are increasingly looking for ways of using both qualitative and quantitative methods and

‘triangulating’ the evidence, i.e. using the results of each method to strengthen the final conclusions of the study. A good reference for this type of issue is Martyn Hammersley (1998) ‘Reading Ethnographic Research’. London: Longman.”

Participant observation: systematic but unstructured?

Q: “In one reference text, ‘participant observation’ is defined as ‘qualitative observation’, that is as ‘systematic recording of behaviours, actions etc in naturally occurring settings’. In another reference text, ‘participant observation’ is explained as ‘unstructured observation where the investigator is a participant’. How can the same research method be systematic and unstructured at the same time?”

A: “Participant observation is systematic because relevant observations are systematically collected, documented and coded. It is unstructured because the observer does not have a predetermined list of detailed events to register - he/she remains open and flexible, keeping the research question in mind. And it is (one type of) a qualitative method because it generates qualitative data.”

Session 11: Field organization and quality control

Adjustment of sample size for non-response

Q: “If we do not replace non-responders, we would lose power for the study, but if we replace them, there is a risk of selection bias. So is it better to take a bigger sample size than calculated, to include the proportion of non-responders?”

A: “By replacing non-responders or adjusting the estimated required sample size for potential non-response rates, you would help ensure that you have sufficient power to detect a difference if one truly exists. However, even if you replace non-responders or if you start with a larger sample of potential responders, your risk of selection bias due to non-response will not be reduced, as your final sample will only be made up of responders. The key issue is thus to try to maximise response rates!”

How to deal with non-responders

Q: “I am doing a survey by sending questionnaires by post. I chose cases, and I took controls (2 per case) matched for sex and date of birth. My question is how to deal with non-responders: - Should I do a matched analysis, keeping only pairs of cases and controls who did respond? And if yes, should I try to replace non-respondent controls by selecting other controls? - Should I do a comparison of both groups (non paired), with just the mention of proportion of responders in each group, maybe comparing the responders and non-responders?”


A: “As this is a matched design you should perform a matched analysis. The sort of matching that you are referring to introduces a bias by making the cases more similar to the controls than they would normally be. So this bias must be taken into account by performing a matched analysis. In more advanced statistical study units this will be covered. With respect to non-responders, the following could be done: 1) Contact the non-responders repeatedly (at least 3 times) by post or phone, 2) Recruit other controls for non-responders if this is necessary for sample size reasons, 3) Compare the responders and non-responders in terms of age, sex and other socioeconomic determinants (if available) and hope that they are comparable, 4) Perform some form of sensitivity analysis, assuming that, say, all non-responders had the risk factor, to determine what influence this might have on the effect, 5) Differentiate between those who refuse to participate and those who could not be reached, 6) Discuss these 5 points very self-critically in the discussion section of the paper.”

How to correct for selection bias due to non-response

Q: “In the effect of selection bias due to non-response, I don’t understand formula 11.1 in Armstrong’s book, page 303. How could I obtain (or calculate) the OR in the population?”

A: “In order to obtain the odds ratio in the population (ORP), you have to modify the formula slightly: ORP= (PbPc)/(PaPd) × ORR So you need to know: - the OR from the respondents (that’s what you obtain from data analysis) - the response proportion in individuals with the risk factor and the disease (Pa) - the response proportion in individuals with the risk factor but without the disease (Pb) - the response proportion in individuals without the risk factor but with the disease (Pc) - the response proportion in individuals without the risk factor and without the disease (Pd) The ORR is easily obtainable from your study results. But in order to obtain Pa, Pb, Pc and Pd, you need to have information on the risk factor and disease status from the non-respondents. This is not readily available and you would need to contact at least a sample of non-respondents (ideally representative of all non-respondents) and get their collaboration to provide this information.”
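A small worked sketch of this correction (hypothetical response proportions, purely illustrative):

# Sketch of the correction described above: the odds ratio among respondents, ORr,
# is rescaled by the four response proportions Pa..Pd.
def or_population(or_respondents, pa, pb, pc, pd):
    """pa..pd: response proportions in the exposed/diseased, exposed/non-diseased,
    unexposed/diseased and unexposed/non-diseased groups (hypothetical values)."""
    return (pb * pc) / (pa * pd) * or_respondents

# e.g. exposed diseased people respond more readily than the other three groups:
print(or_population(or_respondents=2.5, pa=0.9, pb=0.7, pc=0.7, pd=0.7))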

Consent form: mention severity of side-effects?

Q: “In the example consent form given on page 11.8 (Zambart chemoprophylaxis study), it is mentioned that ‘side-effects have been explained to the patient’. Would it be appropriate to include in the consent form that the participant understands that side-effects can be severe, and possibly (though infrequently) life-threatening?”

A: “Potential participants should know about possible side effects and about their rights to potential compensation - if this is appropriate for the study. This is part of the informed consent. In the ‘International Guidelines for Biomedical Research Involving Human Subjects’ (EP103 reader), it is said on page 14 (guideline 2) that before requesting an individual’s consent to participate in a study, the investigator must provide the individual with information in a language that he/she is capable of understanding. This includes, among a number of points, ‘any foreseeable risks or discomfort to the subject, associated with participation in the research’. However, there is an additional note saying that: ‘In the case of complex research projects it may be neither feasible nor desirable to inform prospective subjects fully about every possible risk. However, they must be informed of all risks that a reasonable person would consider material to making a decision about whether to participate. An investigator’s judgment about what risks are to be considered material should be reviewed and approved by the ethical review committee.’ In that sense, if the risk of death is ‘real’ (you would want it to be documented), the participants should know about it before deciding to participate or not in the study. It may be difficult to tell a patient that he/she may die or have serious side effects by taking part in the research, but if the risks are not justified, then one should not be doing the study anyway. I assume that when you do a trial, you believe the benefits outweigh the risks, and therefore that it is worth taking the risks. Thus, there is no reason why the potential participants should not know about this. Having said that, this information need not be in the consent form. Actually, it is better placed in the ‘Information Sheet’ so that the consent form is not overloaded with information (have a look at your study unit reader - section related to Session 3 - to see various examples of information sheets and consent forms). However, you would include in the consent form a sentence saying that the person has read (or was read) the information sheet.”

Informed consent: costs of treating side effects

Q: “Since I presume any side-effects will be treated by the researchers, should we mention to the participants that costs of treating side-effects will be borne by the researchers?”

A: “The CIOMS brochure also mentions that the individuals should know ‘whether the subject or the subject’s family or dependants will be compensated for disability or death resulting from such injury’. You can read further details of the obligation to provide economic compensation in the event of death or disability resulting from specified types of research-related injury in Guideline 13 on pages 36-37.”


Block 4 (session 12 to 16)

Session 12: Data processing

Coding of categorical variables

Q: “When should a categorical variable be coded as ‘numeric’ instead of ‘text’?”

A: “There is no fixed rule for this, and some people will prefer to have as many categorical variables directly coded (and entered) as ‘numeric’ as possible; others will prefer to enter such data as text (=string). What is important is to try to limit data entry error. Thus the format of the questionnaire needs to parallel the format of the data entry sheet (e.g. if sex is coded as M and F on the questionnaire, it might be easier to enter ‘M’ or ‘F’ during data entry. If the number 1 is circled for males and number 2 is circled for females, then it might be easier to enter sex as a numeric variable). In some statistical packages, e.g. Stata, having codes as numbers is more practical during analysis than having them as text. Remember that even if you enter codes as text variables, you can always transform them into numeric variables during the analyses.”

Session 13: Describing and presenting data

Normality of a distribution

Q: “(1) What is the difference between a normal distribution and a symmetric distribution? (2) Is normality not best analysed by looking at the mean and median, and if they are close then the distribution is normal/symmetric? (3) What should a cdf plot look like in a perfectly symmetric/normal distribution?”

A: “(1) Both terms - normal and symmetric - are used here as synonyms, although this is strictly speaking not correct: a normal curve is a curve that is symmetrical about the mean AND bell-shaped. The normal distribution is also called the Gaussian distribution. (2) To assess whether the distribution is normal, you can compare the mean and median, as well as the mode. If they are close, this suggests that the distribution is symmetric, and therefore fulfils an important condition for being normal. However, you can also use the other techniques described in Session 13 to assess whether a distribution is normal. You can look at the percentiles (Activity 6), or produce graphs (Activity 7). (3) For a normal distribution, the cumulative frequency distribution obtained using the command cdf in Stata should give you a curve that would look similar to figure 6 on page 13.12, but with a longer tail on the left hand side – also the cdf plot must be symmetrical.”

Univariate analysis

Q: “What exactly does the term ‘univariate analysis’ mean – analysis of one variable? It is often used for 2x2 tables on an exposure and an outcome, so for the analysis of two variables.”

A: “In session 14, the term ‘univariate analysis’ is used in the assessment of the relationship between an outcome variable and ‘one’ exposure variable. This terminology is often used like that in epidemiological analyses, but some statistics textbooks and statisticians will prefer to use the term ‘univariate analysis’ for the description of one variable at a time, and ‘bivariate analysis’ when looking at 2 variables at a time (e.g., one outcome and one exposure, or 2 exposure variables, etc). Multivariate analysis in epidemiology often refers to the analysis of two or more exposure (= explanatory = independent) variables with one outcome (= dependent) variable, as happens in multiple (linear) or logistic regression; in a strict statistical sense, multivariate analysis means the study of how several outcome variables vary together (Kirkwood, 2nd ed., p. 106). The co-existence of various meanings of the same term is unfortunate, but a reality we will have to live with for a while yet.”

Session 14: Data analysis: estimation and hypothesis testing

Using rates or risks for multiple events

Q: “In session 14, activity 6, I have some problem understanding Smith & Morrow, p 307-8: If I understand well, they say that when each individual is at risk of experiencing events more than once during the study period, it is better to use the incidence rate. It is possible to convert it to incidence risk, but doing this, we lose information and thus power to detect differences between groups. But then they say that the analysis is not straightforward and that to overcome this problem, we can use the conversion to risk or exclude the individual from the time the first event occurs and use rate calculations: doesn’t this comment contradict what is said just above? I don’t see the advantage of using rates in this case, because if we use risk, we can make different categories to refine the analysis (ex: “at least 1 event”, “2-3 events”, “more than 3”…). We can then compare the categories in a ‘stratified’ way, let’s say: the risk ratio of getting at least 1 event, the risk ratio of getting 2-3 events, the risk ratio of getting > 3 events: it is maybe a little bit tedious, but is that way of doing it correct or not?”

A: “Smith & Morrow describe a problem when you deal with a disease which may occur more than once during the observation time; they offer a range of analytical approaches for such a situation, and point out that there is a trade-off between simplicity and power of the analysis. This is a very typical trade-off in epidemiological analysis, where you have the choice between precise and powerful but complex approaches, and simple but not-that-exact and less powerful ones. Here, simple approaches would be to calculate the risk of having one or more events, or to calculate the rate based on the first event only. The complex approach would consider all events, but would take into account the change in susceptibility (which may increase or decrease or remain the same) to suffer a second event once the individual has suffered a first one. The first event may have weakened the individual, so the susceptibility to experience another event may be higher than in an individual who has not experienced an event previously. On the other hand, the individual may have acquired partial immunity through the first event, so the susceptibility to experience another event may be lower than in an individual who has not experienced an event previously. This variation in susceptibility after the first event is the complexity which one would like to avoid; the problem is, this variation is often not predictable. In conclusion: there are several acceptable approaches, and no clear right and wrong. It is, however, important that one demonstrates understanding of the underlying complexity.


With respect to the analysis approach you suggest, there are two comments to make: firstly, the categories you suggest are not mutually exclusive. Each observation should belong to exactly one category without ambiguity. Categories like ‘one event, 2-3 events, more than 3 events’ are mutually exclusive, but the categories ‘at least 1 event, 2-3 events, >3 events’ are not: any observation with more than 0 events would fit into the first category OR into the second/third one respectively. This must be avoided. By the way: categories should also be comprehensive, that is, for each observation there should be one category into which it fits - so you would need a fourth category for ‘zero events’. Secondly, to calculate three risk ratios for the comparison of the disease experience of two populations may not be technically wrong, but it is certainly something you would rather wish to avoid, unless you have clear and justifiable reasons for going down that road. The comparison of two populations should result in ONE risk ratio/odds ratio/rate ratio, not in a series of ratios (unless there is interaction, which is a separate issue altogether). Is it wrong to be more cumbersome than necessary? Perhaps not wrong, but it’s not good either.”

Session 15: Confounding and stratification

Differences between CI computed by STATA and reported in the course notes/calculated by hand

Q: “As I worked through Activity 3, I found that when asked to calculate odds ratios for smoking, ovarian cancer and oral contraceptive use in Stata, the odds ratios I get are the same as in the Workbook but the 95% confidence intervals are slightly different. The difference is very small, and doesn't change the conclusions drawn from the confidence intervals (although the Stata confidence intervals are always wider), but I was just wondering why it happened - does Stata use a slightly different method to calculate them?”

A: “The confidence intervals in the workbook have been obtained by the Cornfield approximation, while STATA computes exact CIs by default. STATA offers Cornfield CIs as an option, as in: ‘cci 35 42 63 41, cornfield’ The STATA help function ‘help cci’ explains: ‘cornfield requests that the Cornfield approximation be used for calculating the standard error of the odds ratio. Otherwise, standard errors are obtained as the square root of the variance of the score statistic or exactly in the case of cc and cci.’ Exact CIs are preferred if you have the computing power to obtain them. Otherwise, Cornfield approximate CIs will do.”

Attributable fractions from case-control studies

Q: “In activity 3, task 2, I am confused when I see the words ‘attr. frac.ex.’, ‘attr. frac.pop.’, ‘prev.frac. ex.’ and ‘prev. frac. pop.’ as I thought one could not calculate incidence risks or incidence rates in case-control studies. How do they calculate these values then?”


A: “A case-control study cannot deliver absolute frequencies of disease or exposure (risks, rates, odds) in the source population because it’s the researcher who decides on the composition of the study population: whether he opts for 1 control per case or 2 or 3 controls per case will obviously impact on the frequency of disease in the study population, and if disease is associated with exposure, on the frequency of exposure in the study population, too. These frequencies in the study population are therefore completely artificial and have in general nothing to do with the frequencies of disease or exposure in the source population. However, the case-control study can provide information on the relative frequencies of exposure in the diseased or non-diseased, and the relative frequency of disease in the exposed or non-exposed, expressed as an odds ratio. And this allows one to calculate the fraction of disease in the EXPOSED which is attributable to the exposure (and therefore preventable by removal of the exposure, prev. frac. ex), and also the fraction of disease in the whole POPULATION which is attributable to the exposure (and therefore preventable by removal of the exposure, prev. frac. pop). These fractions reflect the fact that an exposure is rarely the only cause of a disease – while smoking is undoubtedly a risk factor for lung cancer, it is not the only one: if everybody were a never-smoker, lung cancer would continue to occur, so the preventable fraction in smokers and in the whole population will be <1. You can try to vary the case/control ratio and you’ll see that the preventable fractions (and the odds ratio) are always the same – they depend on the strength of the association, not on absolute frequencies of disease or exposure: ‘cci 35 42 63 41’ ‘cci 35 42 126 82’ ‘cci 35 42 189 123’”
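As a purely illustrative sketch (using the rare-disease approximation OR ≈ RR and using the controls to estimate the prevalence of exposure in the source population – assumptions commonly made when deriving such fractions from case-control data), prevented fractions for a protective exposure can be computed as follows:

# Hedged sketch: prevented fractions accompanying an OR below 1, using the counts
# quoted above for 'cci 35 42 63 41' (exposed cases, unexposed cases,
# exposed controls, unexposed controls).
def prevented_fractions(cases_exp, cases_unexp, ctrls_exp, ctrls_unexp):
    odds_ratio = (cases_exp * ctrls_unexp) / (cases_unexp * ctrls_exp)
    pf_exposed = 1 - odds_ratio                        # prevented fraction in the exposed
    p_exposed = ctrls_exp / (ctrls_exp + ctrls_unexp)  # exposure prevalence (from controls)
    pf_population = p_exposed * pf_exposed             # prevented fraction in the population
    return odds_ratio, pf_exposed, pf_population

print(prevented_fractions(35, 42, 63, 41))   # OR about 0.54
print(prevented_fractions(35, 42, 126, 82))  # doubling the controls leaves it unchanged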

Positive and negative confounding

Q: “Activity 4 says that use of oral contraceptive is a positive confounder since controlling for contraceptive decreases the OR: Crude OR = 0.54, MH OR = 0.95. I think controlling for confounding has increased the OR from 0.54 to 0.95, and it is therefore a negative confounder since use of oral contraceptive made the association weaker (0.54) than it really is (0.95).”

A: “The strength of an association is expressed by the distance of the OR from unity (OR=1) - on the logarithmic scale! That means that an OR=2 and an OR=1/2=0.5 express the same strength of association. An OR=3 marks a stronger association than an OR=2; an OR=0.33 marks a stronger association than an OR=0.5. Researchers are free to express an association by an OR above or below 1, as long as they interpret the OR correctly. Imagine a study which looks into the association of bottle-feeding and diarrhoeal disease in infants. Researchers found that bottle-feeding is a risk factor with an OR of 2 attached to it - they could also have said that breastfeeding is a protective factor with an OR of 0.5 attached to it: the message is the same. There is, however, a certain preference for ORs>1, because they are easier to understand intuitively for colleagues less expert in epidemiology. Coming back to your example: if the crude OR is 0.54 and the adjusted OR is 0.95, then confounding has made the association look stronger than it is in reality, as the removal of confounding takes the OR towards unity. This is what we call ‘positive confounding’. A ‘negative confounder’ is a variable which makes the association between exposure and outcome look weaker than it really is; the removal of negative confounding pushes the OR farther away from unity. Note that the feedback for activity 15.4 does NOT say “controlling for contraceptive use decreases the OR” - it correctly says: “controlling for contraceptive use decreases the effect of smoking on ovarian cancer” - by taking the OR closer to unity, increasing the OR numerically.”

Session 16: Screening

Screening and diagnostic tests

Q: “According to Hennekens, screening can not diagnose disease. In order to diagnose disease you have to do further tests. If this is so, are mammograms classified as screening or diagnostic tests?”

A: “As described in your EP103 manual on page 16.2, the term ‘screening’ is used in two situations:
1) detecting a disease at an early stage in people who have no symptoms, or detecting a risk factor that makes them more likely than average to develop the disease; this is ‘screening’ in the more restricted, classic sense;
2) identifying individuals who suffer from a specific disease but who are not known to the health services - for example, testing immigrants to certain countries for tuberculosis; this is what is called ‘case-finding’.
From a clinician’s point of view, a mammogram is not, strictly speaking, a diagnostic test. A positive result indicates that a woman may have breast cancer, but she may not. This is because the findings on a mammogram are not specific enough to diagnose cancer. Abnormal findings can also be due to benign tumours, or to otherwise benign conditions such as fibrocystic breasts; such false positive results can occur with mammography as well as with other screening techniques for breast cancer (breast palpation, ultrasonography, etc). A positive finding on mammography is thus normally followed by fine needle aspiration or biopsy of the abnormal lesion and cytological/histological examination of the specimen, to come to a diagnosis and decide on a treatment recommendation. Telling a woman that she has breast cancer is clearly a very serious matter, and one would have to have a high degree of certainty (positive predictive value) before using the result of a test to do so. A mammogram alone would not be good enough. The following papers are available from www.bmj.com for further reading: R W Blamey, A R M Wilson, and J Patnick. ABC of breast diseases: Screening for breast cancer. BMJ 2000; 321: 689-693. J M Dixon and R E Mansel. ABC of breast diseases: Symptoms assessment and guidelines for referral. BMJ 1994; 309: 722-726.”


Parallel and serial testing

Q: “Re: Hennekens and Buring's discussion on p. 334 on parallel and serial testing. I've two questions/comments: 1. What about screening tests such as the double/triple/nuchal translucency test, antenatally, for problems such as Down's syndrome? This isn't a "parallel" test, as described by H&B, because ALL the factors are considered TOGETHER, to give you a "risk" score. 2. Aren't all screening programs, in effect, a form of serial testing? Okay, so the first one (or even two) tests, might not give you the diagnosis, but even "gold standard" tests are subject to limits of sensitivity and specificity, aren't they?”

A: “1) You are right that screening for Down syndrome as you describe it is slightly different from what Hennekens and Buring describe as parallel screening. In your example, several screening tests are used in parallel, and their results are “added up”; an individual is considered screening positive once a certain threshold is reached. In Hennekens’ deep vein thrombosis example, several tests are used in parallel, and an individual is considered screening positive if at least one screening test is positive. The principle is the same – in Hennekens’ example, the threshold is particularly low.
2) You are right that all screening tests, if positive, are followed by one or several diagnostic tests. However, Hennekens & Buring write about using several tests for screening, either in parallel or in series – that is another matter, not to be confused with the relationship between screening and diagnostic tests. Note that screening tests, if positive, result in diagnostic tests, and not in the diagnosis, while diagnostic tests, if positive, lead to the diagnosis and hopefully treatment. Also note that “gold standard” tests certainly have their limitations, which we should remain aware of, but as they are the best tests available, treatment decisions will be based on them, and not on screening tests.”
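For the contrast Hennekens and Buring draw, a small sketch may help. Assuming (simplistically) that two screening tests behave independently given true disease status, and using made-up sensitivities and specificities, the usual formulas show how a parallel rule gains sensitivity at the cost of specificity, while a serial rule does the opposite.

# Rough sketch: net sensitivity/specificity of two screening tests combined,
# assuming the tests are independent given true disease status (a simplification).
se1, sp1 = 0.80, 0.90   # test 1 (made-up values)
se2, sp2 = 0.70, 0.95   # test 2 (made-up values)

# Parallel testing: classify as positive if EITHER test is positive
parallel_se = 1 - (1 - se1) * (1 - se2)   # 0.94  - sensitivity rises
parallel_sp = sp1 * sp2                   # 0.855 - specificity falls

# Serial testing: classify as positive only if BOTH tests are positive
serial_se = se1 * se2                     # 0.56  - sensitivity falls
serial_sp = 1 - (1 - sp1) * (1 - sp2)     # 0.995 - specificity rises

print(parallel_se, parallel_sp, serial_se, serial_sp)

The Down syndrome risk score mentioned in the question can be seen as a generalisation of the same idea: each combination rule is simply a different threshold applied to the combined evidence, with the ‘any test positive’ rule being a particularly low threshold.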

Screening and case-finding

Q: “If you screen and find someone with asymptomatic disease, have you found a ‘case’, or are ‘cases’ only those with symptomatic disease?”

A: “Hennekens and Buring define screening as ‘the application of a test to people who are as yet asymptomatic for the purpose of classifying them with respect to their likelihood of having a particular disease’ (1st ed., p. 327). According to this definition, you are looking for people who do not have symptoms of the disease you are interested in, but who may have either risk factors for that disease (e.g. high blood pressure as a risk factor for stroke and heart attack) or disease at a stage too early to be recognisable by the screened individual herself (e.g. early cervical carcinoma). A screening test does not usually diagnose illness - diagnostic tests are often too expensive or invasive to apply on a large scale. People who test positive in the screening need to be sent on for further evaluation by a subsequent diagnostic test or procedure to determine whether they do in fact have the disease. If they do, you could say that your screening has identified a ‘case’ - someone who has the disease you are interested in. The term ‘case finding’, however, is used when the activity is directed at people who have the disease and are symptomatic but who are as yet unknown to the health services, e.g. individuals suffering from leprosy who have not come forward for treatment. Some authors may define ‘case finding’ as ‘screening for symptomatic cases’, but more often ‘screening’ is understood to look for asymptomatic individuals and ‘case finding’ for symptomatic ones.”

Screening and ethics

Q: “I'm not sure how to interpret the comment about the ethics of using a randomised controlled trial (p. 16.10) to evaluate a screening programme ‘introduced despite a lack of evidence’, e.g. screening programmes for cervical cancer. Should I interpret this as meaning that screening programmes for cervical cancer have been established without solid evidence of benefit, but that it would be unethical to withdraw screening (from the control group) because it is now an accepted practice?”

A: “Although the advantages of randomised controlled trials for assessing the effectiveness of a screening programme are very well recognised, there are potential problems in using a randomised design, as described in the course notes. In the case of cervical cancer, we can imagine that it would appear unethical to physicians and potential study participants to take part in the random allocation of the screening test, the Pap smear, which is very well established in medical practice, particularly as there is some evidence – albeit not from clinical trials – that it helps identify people with preclinical disease. As a result, the evaluation of screening for cervical cancer would need to be performed using other study designs.”

Q: “Would it be possible to do a cohort study where all females are offered the screening test, and then follow the females who opt for the screening test and those who do not, to see how their outcomes compare?”

A: “In situations such as these, you could decide to perform a cohort study or a case-control study, but such non-randomised designs would be prone to bias, e.g. selection bias – for instance, women who have experienced irregular vaginal bleeding, or whose mother or sister has suffered from cervical cancer, may be more or less likely to accept a screening test.”

Q: “In many countries cervical cancer screening is not offered as routine care. However, I do not think it would be ethical to do an RCT even in that situation?”

A: “The fact that cervical cancer screening is currently not offered in certain countries would not make it more acceptable to carry out an RCT there to test the effectiveness of the screening, given that the screening is universally accepted to be beneficial on the basis of existing evidence, even if that evidence does not stem from earlier RCTs. Note that in certain circumstances it is considered ethically acceptable by some to carry out an RCT in a developing country even if the same RCT would be unacceptable in a developed country. This could be the case if the RCT sets out to test the effectiveness of an intervention which is thought or known to be inferior to the standard treatment in a developed country, provided the standard treatment is economically beyond reach in the developing country setting. Examples are short-course antiretroviral regimens for the prevention of mother-to-child transmission of HIV during pregnancy, or oral misoprostol for the prevention and treatment of post-partum haemorrhage.”


Drawbacks of screening

Q: “The first disadvantage of screening listed in the feedback to activity 11 on page 16.11 does not appear to be a disadvantage to me. It is about a screening that is not justified since early detection does not affect prognosis.”

A: “On page 16.3 you will find a list of 10 criteria that should be fulfilled when deciding whether or not screening is appropriate. The second criterion says “There should be an accepted treatment for patients with recognized disease, and early treatment should be of more benefit than later treatment”. If this criterion is not fulfilled, the screening is thought to be useless for the screened individual and therefore not justified: merely extending the time during which the individual knows that he or she is likely to suffer one day from a certain disease, without being able to offer improved outcomes through early treatment, is thought to be useless at best, and possibly detrimental (see below). In general, before launching a screening campaign it is important to make sure that the screening will be of benefit to the population. Indeed, inappropriate screening can cause harm, which can be classified into three categories:
1) Harm can come from telling someone who feels well that he/she is sick. An asymptomatic person who previously felt well may now adopt the sick role and consider him/herself fragile. If there is no treatment available, this feeling might continue for the rest of his/her life, despite the fact that he/she may not be physically ‘sick’, thus causing psychological harm. In that case, it might be better not to know. This is the issue addressed by the first disadvantage of screening programmes in the feedback to activity 11, p. 16.11.
2) The screening result can be wrong. For example, the risk of obtaining a false-positive result becomes considerable when we are looking for rare diseases in asymptomatic people. An example is screening for ankylosing spondylitis. The sensitivity and specificity of the test are quite good (90% and 95% respectively), but the disorder is so rare that for every 100 individuals with a positive screening test, 85 do not have the disease. Screening for this disorder would therefore leave many individuals with positive tests that need to be followed up, while most do not have the target disorder. In that case, it might be better not to screen.
3) The treatment initiated as a result of screening and early diagnosis may do more harm than good. An example is the widespread treatment of hyperlipidaemia in the United States. In the 1960s and 70s it was popular to screen for hyperlipidaemia and to treat with clofibrate when elevated values were found. As a result, it was not unusual to find healthy, asymptomatic, middle-aged men taking this drug after having undergone a cholesterol check-up. Unfortunately, it was only after such screening plus clofibrate treatment had been going on for several years that a proper randomised trial of clofibrate was carried out in healthy middle-aged men with hypercholesterolaemia: the results showed that mortality in clofibrate-treated men was 17% higher than among men given placebo, and this excess mortality continued for 4 years after the withdrawal of the drug! This suggests that there should be sound evidence that early diagnosis and the available therapy do more good than harm before asking the public to submit to them.
Finally, screening involves costs. As a result, if early detection does not lead to an improvement in prognosis, it may be better not to spend money on screening, but on something more cost-effective.”
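To make the arithmetic behind point 2) explicit, here is a small Python sketch. The 90% sensitivity and 95% specificity are the figures quoted above; the prevalence of roughly 1 in 100 in the screened population is an assumption chosen for illustration, not a figure from the course notes.

# Sketch only: positive predictive value of a screening test for a rare disease.
sens, spec, prev = 0.90, 0.95, 0.01   # prevalence of ~1% is an assumed figure

true_pos = sens * prev                 # diseased people who test positive
false_pos = (1 - spec) * (1 - prev)    # healthy people who test positive
ppv = true_pos / (true_pos + false_pos)

print(round(ppv, 2))   # about 0.15: of every 100 screen-positives,
                       # roughly 85 do not actually have the disease

The same test applied where the disease is common gives a much higher positive predictive value, which is why the burden of false positives is a particular concern when screening for rare conditions.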