confounders and bias

CONFOUNDING AND BIAS

Dr. Binaya Sapkota, BPharm, PharmD

• It is essential to avoid bias, to control confounding & to undertake accurate replication.

• Bias, confounding & random variation are the non-causal reasons for an association between an exposure & outcome.

• Bias: Mistake of the researcher

• Confounding: Not a mistake & when obvious, it can be controlled; sometimes referred to as the 3rd major class of bias.

Confounding

• Latin: “confundere” = to mix/blend

• Mixture of an effect of exposure on outcome with the effect of a 3rd factor.


Recognition of Confounders

• Not easy to recognise confounders.

• A practical way to achieve this is to analyze data with & without controlling for the potential confounders.

• If the estimate of the association differs remarkably when controlled for the variable, it is a confounder & should be controlled for (by stratification or multivariate analysis).

Consequence of confounding

• Estimated association is not the same as true effect.

• Can cause overestimation or underestimation of the true association.

• May even change the direction of the observed effect.

Types of Confounders

• Positive or negative.

1. Positive confounders: cause over-estimation of an association (which may be an inverse association)

2. Negative confounders: cause under-estimation of an association.

Control of Confounders

A. Controlling by Design:
1. Randomization
2. Restriction
3. Matching

B. Controlling by Analysis:
1. Stratified analysis
2. Multivariable analysis

1. Randomization

• Ensures that potential confounding factors (known or unknown) are evenly distributed among the study groups.

• Also ensure that all participants remain in the group to which they were allocated & no systematic loss to F/U occurs.

• To avoid this possibility, maximise F/U & carry out an intention to treat analysis.

2. Restriction

• Restricts admission to the study to a certain category of a confounder i.e. enroll only people without confounder.

• Subjects are allowed into the study only if they fall into specific groups.

• In practice, restriction ensures comparisons to be performed only between observations that have the same value of the confounder (for e.g., only white men).

3. Matching

• Equal representation of subjects with certain confounders among study groups

• Ensures comparisons between groups that have the same distribution of the confounder (frequency matching or one-to-one matching).

• i.e. for every case with a given value of the factor, a control with the same value of that factor is selected.

• If matching was done, then the most common analysis methods would be McNemar test or conditional logistic regression.
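As a minimal sketch of the matched analysis mentioned above, the following computes the McNemar chi-square statistic (without continuity correction) & the matched odds ratio from the discordant pairs of a 1:1 matched case-control study. The pair counts & the function name `mcnemar` are hypothetical, chosen only for illustration.

```python
import math

def mcnemar(b, c):
    """McNemar test for 1:1 matched pairs.

    b = pairs in which only the case is exposed
    c = pairs in which only the control is exposed
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)              # 1 degree of freedom, no continuity correction
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of a chi-square with 1 df
    matched_or = b / c if c else float("inf")  # matched odds ratio = b / c
    return chi2, p_value, matched_or

# Hypothetical counts: 30 pairs with only the case exposed, 10 with only the control exposed
chi2, p, matched_or = mcnemar(30, 10)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, matched OR = {matched_or:.1f}")
```

Only the discordant pairs drive the result, which is why an unmatched analysis of matched data can be misleading.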

4. Stratified analysis

• Make groups homogenous

• Controlling confounding through stratified analysis:

                Smokers             Non-smokers
                MI      No MI       MI      No MI
Coffee          80      40          10      20
No coffee       20      10          40      80
Total           100     50          50      100
                OR = 1              OR = 1
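The coffee, smoking & MI counts above can be checked with a short calculation. This sketch computes the crude odds ratio (ignoring smoking), the stratum-specific odds ratios, & a Mantel-Haenszel pooled odds ratio. The Mantel-Haenszel estimator is not named in the slides; it is used here as one common way of summarising a stratified analysis.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    return (a * d) / (b * c)

# Per stratum: (coffee & MI, coffee & no MI, no coffee & MI, no coffee & no MI)
strata = {
    "smokers":     (80, 40, 20, 10),
    "non-smokers": (10, 20, 40, 80),
}

# Crude table: collapse the two strata, i.e. ignore smoking status
crude = [sum(table[i] for table in strata.values()) for i in range(4)]
crude_or = odds_ratio(*crude)

# Stratum-specific odds ratios
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}

# Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = num / den

print("crude OR:", crude_or)
print("stratum ORs:", stratum_ors)
print("Mantel-Haenszel OR:", mh_or)
```

The crude odds ratio suggests an association between coffee & MI, while both stratum-specific odds ratios equal 1, so the crude association is explained by smoking.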

5. Multivariable analysis

• Most popular

• Can control for a no. of confounding factors simultaneously, as long as there are at least 10 subjects for every variable investigated (in a logistic regression situation).

Bias

• Defined as any systematic error in a study that results in an incorrect estimate of the association between exposure & risk of disease.

• Caused by systematic variation.

• “Deviation from the truth”

• Undermines internal validity of research

• Hennekens: A systematic error in the collection or interpretation of data in an epidemiologic study.

• Schlesselman & Stolley (1982): Any systematic error in the design, conduct or analysis of a study resulting in a mistaken estimate of an exposure effect.

Effects of Bias

• Bias may mask an association or cause a spurious one; & it may cause over- or under-estimation of effect size.

• A study that suffers from bias lacks internal validity.

1. Positive bias:
• Observed measure of effect (e.g., odds ratio) > true measure of effect
• Applies to both protective & risk associations.

2. Negative bias: Observed measure of effect < true measure of effect

3. Toward the null: Observed measure of effect is closer to 1 than the true measure of effect.

4. Away from the null: Observed measure of effect is farther from 1 than the true measure of effect.
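The four terms above describe two separate things: the sign of the error & its position relative to the null value of 1. The sketch below classifies both for a ratio measure such as an odds ratio. The function name `describe_bias` & the label "crosses the null" (used when observed & true values lie on opposite sides of 1) are illustrative additions, not terms from the slides.

```python
import math

def describe_bias(observed, true):
    """Classify bias in a ratio measure of effect (e.g. an odds ratio); the null value is 1."""
    if observed > true:
        sign = "positive"
    elif observed < true:
        sign = "negative"
    else:
        sign = "none"

    # Distance from the null is compared on the log scale,
    # where OR = 2 and OR = 0.5 are equally far from 1.
    obs_dist = abs(math.log(observed))
    true_dist = abs(math.log(true))
    if (observed - 1) * (true - 1) < 0:
        direction = "crosses the null"   # illustrative label, not from the slides
    elif obs_dist < true_dist:
        direction = "toward the null"
    elif obs_dist > true_dist:
        direction = "away from the null"
    else:
        direction = "none"
    return sign, direction

# Hypothetical (observed, true) odds ratios for a risk factor and a protective factor
for observed, true in [(1.5, 2.0), (3.0, 2.0), (0.8, 0.5), (0.3, 0.5)]:
    print(observed, true, describe_bias(observed, true))
```

For a protective association (true OR < 1), a positive bias moves the estimate toward the null, which is why the two classifications are kept separate.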

Control of Bias

• Potential sources of bias should be eliminated or minimized through rigorous design & meticulous conduct of a study.

Most important biases:

Prevalence-incidence bias

• For diseases such as cancer, HIV infection, the use of prevalent instead of incident cases usually distorts the measure of effect.

• Incidence: all new cases of disease in a time period; tend to be acute.

• Prevalence: existing cases of disease at one point in time; tend to be chronic.

Detection bias

• Risk factor investigated itself may lead to increased diagnostic investigations & increase the probability that the disease is identified in that subset of persons.

• e.g. Women with benign breast diseases who undergo detailed F/U programs which would detect cancer at early stages.

Recall bias

• Caused by differences in accuracy of recalling past events by cases & controls.

• There is a tendency for diseased people (or their relatives) to recall past exposures more efficiently than healthy people.

• Particular problem in case-control studies

• e.g., women with breast cancer are more likely to remember a +ve family history than control subjects.

• e.g. mothers of leukemic children would remember even trivial exposures.

• This bias is avoided by prospective studies.

Publication bias

• Editors & authors tend to publish articles containing +ve findings as opposed to -ve result papers.

Diagnosis bias

• Knowledge of exposure may influence diagnosis.

Follow-up bias

• In cohort studies, bias due to loss to follow-up (attrition): Greatest danger

Selection bias

• Method of participant selection that distorts the exposure-outcome relationship from that present in the target population

• Knowledge of exposure status influences the identification of diseased & non-diseased study subjects.

• An important problem in case-control & retrospective cohort studies

• Unlikely to occur in a prospective cohort study.

Information (Observation/Classification/ Measurement) bias

• Occurs when information is collected differently between 2 groups, leading to an error in the conclusion of the association.

• When information is incorrect, there is misclassification.

i. Differential misclassification: occurs when level of misclassification differs between 2 groups.

ii. Non-differential misclassification: occurs when level of misclassification does not differ between 2 groups.
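A small numerical sketch may help to show how the two kinds of misclassification distort an odds ratio. It applies an assumed sensitivity & specificity of exposure classification to a hypothetical case-control table & returns the expected observed odds ratio. All counts, sensitivities & specificities are invented for illustration.

```python
def observed_or(true_counts, sens_cases, spec_cases, sens_controls, spec_controls):
    """Expected odds ratio after exposure misclassification.

    true_counts = (exposed cases, unexposed cases, exposed controls, unexposed controls)
    sens / spec = sensitivity and specificity of exposure classification in each group
    """
    a, b, c, d = true_counts
    # Expected number classified as exposed = true positives + false positives
    a_obs = a * sens_cases + b * (1 - spec_cases)
    c_obs = c * sens_controls + d * (1 - spec_controls)
    b_obs = (a + b) - a_obs
    d_obs = (c + d) - c_obs
    return (a_obs * d_obs) / (b_obs * c_obs)

truth = (60, 40, 40, 60)   # hypothetical table; true OR = 2.25

true_or = observed_or(truth, 1.0, 1.0, 1.0, 1.0)            # perfect classification
non_differential = observed_or(truth, 0.8, 0.9, 0.8, 0.9)   # same error in cases & controls
differential = observed_or(truth, 0.95, 0.9, 0.7, 0.9)      # cases recall exposure better

print(round(true_or, 2), round(non_differential, 2), round(differential, 2))
```

In this example the non-differential error pulls the odds ratio toward 1, while the differential error (as in recall bias) pushes it away from 1; with other error patterns, differential misclassification can bias the estimate in either direction.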

Bias and confounding

• Bias & confounding are not affected by sample size, but the chance effect (random variation) diminishes as sample size gets larger.

• Bias involves error in measurement of a variable.

• Confounding involves error in interpretation of what may be an accurate measurement.

• Bias is found in the design/conduct of study.

• Confounding is found in nature of the study.

Blinding or masking

• Open
• Single Blind
• Double Blind
  ◦ Double Dummy
• Triple Blind

• Blinding is used to reduce bias.

Blinding or masking

• Limits the occurrence of conscious & unconscious bias in the conduct & interpretation of a “CT” arising from the influence that the knowledge of Rx may have on recruitment & allocation of subjects, their subsequent care, attitudes of subjects to Rxs, assessment of end-points, handling of withdrawals, exclusion of data from analysis.

Open (unblinded)

• In an open-label trial the identity of Rx is known to all.

• Still commonly used for marketing purposes

• But have little scientific merit.

• Blinded trials often become unblinded if Rx has very prominent beneficial effects or adverse effects.

Single blind

• In a single blind trial, the investigator knows the Rx but the subject does not.

Double blind

• Both the investigator & subject are not aware of Rx.

• Optimal approach

• This requires that Rxs to be applied during the trial cannot be distinguished (by appearance, taste, etc.) either before or during administration, & that the blind is maintained during the whole trial.

Double dummy trials

• Used when 2 physically different Rxs are compared.

• e.g. tablet & inhaler Rxs for asthma.

Triple blind trials

• Neither the subject, the investigator nor those analysing the data know what Rx was given.

• This is very hard to prevent or adjust for in the analysis.

Double dummy technique: Historical Perspective

• Dudley Hart and Boardman, at the Westminster Hospital in London, UK, compared indomethacin and phenylbutazone in the Rx of “RA”.

• Their objective: “To provide double-blind conditions they received active Indomethacin & dummy Phenylbutazone in 1 month, & in the other active Phenylbutazone & dummy indomethacin.”

Double dummy

• A technique for retaining the blind when administering supplies in a “CT”, when 2 Rxs cannot be made identical.

• Supplies are prepared for Rx A (active & indistinguishable placebo) & for Rx B (active & indistinguishable placebo).

• Subjects then take 2 sets of Rx; either A (active) & B (placebo), or A (placebo) & B (active).
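The supply rule described above can be written as a small lookup. This is only a sketch; the function name `double_dummy_supplies` & the arm labels are hypothetical.

```python
def double_dummy_supplies(arm):
    """Supplies for one subject: every subject takes one item of each form."""
    if arm == "A":
        return {"A": "active", "B": "placebo"}
    if arm == "B":
        return {"A": "placebo", "B": "active"}
    raise ValueError("arm must be 'A' or 'B'")

# e.g. Rx A = tablet, Rx B = inhaler
for arm in ("A", "B"):
    print(arm, double_dummy_supplies(arm))
```

Because both arms receive a tablet & an inhaler, the form of the supplies does not reveal the allocation.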

Major advantages of Randomization

(1) Randomization eliminates selection bias on the part of the participants & investigators.

(2) Randomization tends to create groups that are comparable in terms of the distribution of known &, more importantly, unknown factors that may influence the outcome.

Methods of randomization

1. Simple randomization
2. Restricted randomization (or blocked randomization)
3. Stratified randomization
4. Matched-pair design

1. Simple randomization

• Most elementary method of randomization

• Equivalent of tossing a coin

• However, randomization by tossing a coin should not be used because it cannot be checked or reproduced.

• Alternative is to use a table of random no. (or a computer-generated randomization list).

Steps in determining random group assignments

• 1st step: To set up a correspondence between no. in the table & the study groups.

• Let us assume that odd no. correspond to the control group & even no. to the new intervention.

• 2nd step: To define a convenient way of reading the table of random no., for e.g., to read down the columns or across the rows.

• 3rd step: To select a starting point, for e.g., by closing your eyes & selecting a no. with a pin.

• Once the starting point is established, no. are then read from the table following the sequence defined in step 2.

• Suppose that the chosen starting point was the one circled in the table & that we have decided that no. should be read column by column down the page.

• The first 10 no. would have been 8, 9, 3, 5, 7, 5, 5, 9, 1, 0.

• 4th step: To make Rx assignments according to the system defined above.

• Random no. tables are generated in such a way that each of the digits 0 through 9 is equally likely to occur.

Figure: Extract from a table of random numbers

Table: Example illustrating use of a table of random no. to allocate 10 subjects to 2 study groups

• In the case of 3 groups, 3 of the 10 one-digit no. are assigned to each group (e.g., no. 1, 2, 3 to group A; 4, 5, 6 to group B; & 7, 8, 9 to group C).

• The remaining no. (i.e., 0 in this example) in the random tables is ignored & selection moves to the next no.
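The allocation rules described above can be sketched in a few lines, using the 10 digits quoted in the text. The function names are hypothetical; the rules (odd = control, even = new intervention; 1 to 3 = A, 4 to 6 = B, 7 to 9 = C, 0 ignored) are the ones stated in the slides.

```python
def allocate_two_groups(digits):
    """Odd digit -> control, even digit (including 0) -> new intervention."""
    return ["control" if d % 2 else "intervention" for d in digits]

def allocate_three_groups(digits):
    """1-3 -> A, 4-6 -> B, 7-9 -> C; 0 is ignored and the next digit is read."""
    groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "C", 8: "C", 9: "C"}
    return [groups[d] for d in digits if d in groups]

digits = [8, 9, 3, 5, 7, 5, 5, 9, 1, 0]   # the first 10 no. read from the table in the text

two = allocate_two_groups(digits)
three = allocate_three_groups(digits)

print(two, two.count("control"), two.count("intervention"))
print(three)
```

With these particular digits, simple randomization puts 8 of the 10 subjects in the control group, which shows how unequal group sizes can arise in small trials.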

2. Restricted randomization (Blocked randomization)

• This method guarantees that no. of participants allocated to each study group are equal after every block of so many pts has entered the trial.

• Suppose that pts are going to be allocated to Rxs A & B in such a way that after every 4th subject there are an equal no. of participants on each Rx.

• There are only 6 possible combinations (permutations) of A & B in blocks of 4:

Number  Combination
1       AABB
2       ABAB
3       ABBA
4       BBAA
5       BABA
6       BAAB

• Combination for a particular block of 4 pts is chosen at random (by using a table of random no. as described above) from the 6 possible (in the above example, digits 7, 8, 9 & 0 from the table of random no. should be ignored).

• For e.g., if the random no. from the table were 2, 3, 6, 5 (& the blocks were assigned as listed), it would mean that pts 1–4, 5–8, 9–12, 13–16 would receive Rxs ABAB, ABBA, BAAB & BABA, respectively.

• This procedure thus allocates 8 pts to group A & 8 to group B.
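The worked example above can be reproduced with a short sketch. The block numbering follows the list of 6 combinations given earlier; the function name `blocked_allocation` is hypothetical.

```python
from itertools import permutations

# The 6 combinations of A & B in blocks of 4, numbered as in the list above
BLOCKS = {1: "AABB", 2: "ABAB", 3: "ABBA", 4: "BBAA", 5: "BABA", 6: "BAAB"}

# Check that these are exactly the distinct orderings of two A's and two B's
all_orderings = {"".join(p) for p in permutations("AABB")}

def blocked_allocation(digits):
    """Turn random digits into a treatment sequence; digits 7, 8, 9 & 0 are skipped."""
    return "".join(BLOCKS[d] for d in digits if d in BLOCKS)

sequence = blocked_allocation([2, 3, 6, 5])   # random no. from the example in the text
print(sequence, sequence.count("A"), sequence.count("B"))
```

After every complete block the two groups are the same size, whatever digits are drawn.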

3. Stratified randomization

• Used when the results of the trial are likely to vary between, say, different age groups or genders.

• Strata or groups are formed & randomization occurs separately for subjects in each stratum.

• As subjects become eligible for inclusion in trial, their appropriate stratum is determined & they receive next random-no. assignment within that stratum.

• For e.g., pts may be classified according to gender & age (<50 & ≥50), yielding a total of 4 strata.

• Within each stratum, each pt. will be randomly assigned to either intervention or control group.

• This could be done by using either simple or restricted randomization.
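A minimal sketch of this procedure, assuming restricted randomization in blocks of 4 within each of the 4 strata (gender by age <50 or ≥50). The class name, the seed & the convention that A = intervention & B = control are assumptions made for illustration.

```python
import random

BLOCKS = ["AABB", "ABAB", "ABBA", "BBAA", "BABA", "BAAB"]   # A = intervention, B = control (assumed)

class StratifiedRandomizer:
    """Keeps a separate blocked randomization list for each stratum."""

    def __init__(self, seed):
        self.rng = random.Random(seed)   # a recorded seed makes the list reproducible
        self.pending = {}                # stratum -> assignments left in the current block

    @staticmethod
    def stratum(gender, age):
        return (gender, "<50" if age < 50 else ">=50")

    def assign(self, gender, age):
        key = self.stratum(gender, age)
        if not self.pending.get(key):
            # Start a new randomly chosen block of 4 for this stratum
            self.pending[key] = list(self.rng.choice(BLOCKS))
        return self.pending[key].pop(0)

randomizer = StratifiedRandomizer(seed=12345)   # hypothetical seed

# Eight hypothetical women under 50 becoming eligible one after another
arms = [randomizer.assign("F", 40) for _ in range(8)]
print(arms)
```

Each stratum draws from its own list, so the groups stay balanced within every gender & age category and not only overall.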

4. Matched-pair design

• Special case of stratified randomization in which the strata are each of size 2.

• Individuals (or communities) are matched into pairs, chosen to be as similar as possible for potential confounding variables such that in the absence of any intervention they would be expected to be at similar risk of the disease under study.

• The intervention is assigned at random to one member of each pair, with the other member acting as a control.
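The within-pair assignment can be sketched as follows. The pair labels, the seed & the function name `randomize_pairs` are hypothetical; the pairs are assumed to have been matched beforehand.

```python
import random

def randomize_pairs(pairs, seed):
    """Within each matched pair, pick one member at random for the intervention."""
    rng = random.Random(seed)   # recorded seed so the allocation can be checked
    allocation = []
    for first, second in pairs:
        if rng.random() < 0.5:
            allocation.append({"intervention": first, "control": second})
        else:
            allocation.append({"intervention": second, "control": first})
    return allocation

# Hypothetical pairs of communities already matched on potential confounders
pairs = [("village 1", "village 2"), ("village 3", "village 4"), ("village 5", "village 6")]
allocation = randomize_pairs(pairs, seed=7)

for pair in allocation:
    print(pair)
```

Using a seeded generator instead of a coin toss keeps the allocation reproducible, in line with the earlier point that randomization should be checkable.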

Levels of measurement (Scales of measurement)

• Ways in which variables/numbers are defined & categorized.

• Developed by the Harvard psychologist Stanley Smith (S.S.) Stevens in the early 1940s.

• Each scale of measurement has certain properties which in turn determine the appropriateness for use of certain statistical analyses.

• 4 scales of measurement: nominal, ordinal, interval, & ratio.

Nominal (Classificatory) scale

• Categorical data & no. that are simply used as identifiers or names represent a nominal scale of measurement.

• Also sometimes called categorical scales or dichotomous scales (when there are only 2 categories).

• Each subgroup has a characteristic/property which is common to all those classified within that subgroup.

• e.g., Gender: Male, Female

• e.g., Religions: Christian, Islam, Hindu, etc.

Ordinal or Ranking scale

• It has the characteristics of a nominal scale. PLUS subgroups arranged in ascending or descending order.

• Represents an ordered series of relationships or rank order.

• 1st, 2nd, & 3rd place represent ordinal data.

• Likert-type scales also represent ordinal data.

• Likert items: 1st introduced by Rensis Likert (1932).

Ordinal or Ranking scale Examples

• Income: Above average, average and below average

• Socioeconomic status: Upper, middle and low

• Attitudinal: Strongly favorable, favorable, uncertain, unfavorable & strongly unfavorable

• IQ scores: because the difference between an IQ of 160 & an IQ of 130 is not necessarily the same as the difference between an IQ of 100 & an IQ of 70.

Interval Scale

• It has all the characteristics of an ordinal scale (which also includes a nominal scale). PLUS It has a unit of measurement with an arbitrary starting & terminating point.

• e.g., Temperature: Celsius (0 °C); Fahrenheit (32 °F)

Ratio Scale

• It has all the properties of an interval scale. PLUS It has a fixed starting point (eg., a zero point).

• Much more sensitive than the nominal categories.

• e.g., Measurement of income, age, height & weight

• e.g., “How many hours in an average week do you devote to reading political news and commentary?”

Choice of Levels of Measurement

• Qualitative research uses descriptive statements to seek answers to research “Qs”.

• Quantitative research uses measurement scales (nominal, ordinal, interval or ratio) to seek answers to research “Qs”.

• Interval & ratio scales are more refined, objective & accurate.

• Nominal & ordinal scales are subjective & not accurate as they do not have a unit of measurement.

Choosing the Level of Measurement

• Nominal & Ordinal data: Referred to as nonparametric & can not be added.

• Interval & Ratio data: Sometimes referred to as parametric

• For e.g., the "average" amount of pain that a person reports on a Likert-type scale over the course of a day is computed by adding the reported pain levels taken over the course of the day & dividing by the no. of times the “Q” was answered.

Parametric data

• Meet certain requirements “wrt” parameters of population (for e.g., data will be normal - the distribution parallels the normal or bell curve).

• No. can be added, subtracted, multiplied, & divided.

• Analyzed using statistical techniques identified as Parametric Statistics.

• More statistical technique options for the analysis of parametric data

• Parametric statistics more powerful than nonparametric statistics.

Nonparametric data

• Lacking those same parameters

• Cannot be added, subtracted, multiplied, or divided.

• Analyzed by using Nonparametric Statistics

• e.g., Ordinal data cannot be added together: adding 1st & 2nd place in a race does not give 3rd place.
