EVAL 6970: Experimental and Quasi-Experimental Designs

Dr. Chris L. S. Coryn

Kristin A. Hobson

Fall 2013

Agenda

• Randomized experiments

Important Caveats

Caveats

• Not every phenomenon of interest or value can be studied experimentally
  – Many variables of interest cannot be manipulated or isolated in the way required for experiments
    • Most trait variables, particularly gender, race, and ethnicity
      – These types of variables can still be the subject of cause-probing studies, but they cannot be manipulated in the formal sense

Caveats

• Many phenomena of interest also cannot be manipulated or isolated for ethical reasons
  – Can’t withhold potentially effective treatments from participants
    • Example: Tuskegee syphilis study
  – Can’t assign participants to potentially harmful conditions
    • Physiologically: requiring participants to smoke or exposing them to a pathogen
    • Psychologically: Stanford prison experiment

Theory of Random Assignment

Random Assignment

• Random assignment is any procedure by which units are assigned to conditions based only on chance
  – Each unit has a known, nonzero probability of being assigned to a condition

• This method of assignment reduces the plausibility of many alternative explanations for observed effects, particularly selection
  – By definition, randomization rules out selection threats; random chance cannot introduce systematic bias into the selection process
  – However, this works best with large samples

• Random assignment attempts to distribute systematic differences (biases) equally over groups on every variable, whether observed or not
  – This is why random assignment is superior even to pretesting with statistical matching
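A minimal sketch of one chance-only assignment procedure (complete randomization), assuming units are simply a list of IDs. The function name `randomly_assign` and the seed are illustrative, not part of the course material. Units are shuffled and then dealt out to conditions in turn, so every unit has a known, nonzero probability of landing in each condition.

```python
import random

def randomly_assign(units, conditions, seed=None):
    """Assign each unit to a condition using only chance (complete randomization).

    Units are shuffled, then dealt out to conditions in turn, so group sizes
    differ by at most one. Each unit's probability of receiving a given
    condition is known in advance (exactly 1/k for k conditions when the
    number of units is a multiple of k).
    """
    rng = random.Random(seed)          # seeded so the assignment can be reproduced and audited
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {c: [] for c in conditions}
    for i, unit in enumerate(shuffled):
        assignment[conditions[i % len(conditions)]].append(unit)
    return assignment

groups = randomly_assign(range(20), ["treatment", "control"], seed=6970)
print({c: len(members) for c, members in groups.items()})  # {'treatment': 10, 'control': 10}
```

Recording the seed documents exactly how the selection process worked, which is what makes it "completely known."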

Random Assignment

• Unlike other controls for validity threats (like pretests and nonequivalent dependent variables), random assignment yields unbiased estimates of average treatment effects
  – Here, unbiased means that any between-group differences are due solely to chance, rather than to systematic sources of error
  – Regression discontinuity also yields unbiased effect estimates, but randomized experiments are more flexible, and their analysis is often more straightforward
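The meaning of "unbiased" can be made concrete with a small simulation. This sketch assumes hypothetical potential outcomes: each unit has an untreated score `y0`, and treatment adds a constant true effect of 5. The experiment is re-randomized many times. Any single difference in means is off by chance, but the estimates average out to the true effect.

```python
import random
import statistics

def simulate_estimates(n_units=200, true_effect=5.0, reps=2000, seed=1):
    """Difference-in-means estimates across many re-randomizations of one hypothetical sample."""
    rng = random.Random(seed)
    # Hypothetical potential outcomes: y0 varies across units (think of it as
    # including unmeasured traits); y1 = y0 + true_effect for every unit.
    y0 = [rng.gauss(50, 10) for _ in range(n_units)]
    y1 = [y + true_effect for y in y0]
    estimates = []
    for _ in range(reps):
        treated = set(rng.sample(range(n_units), n_units // 2))  # random assignment
        t = [y1[i] for i in range(n_units) if i in treated]
        c = [y0[i] for i in range(n_units) if i not in treated]
        estimates.append(statistics.mean(t) - statistics.mean(c))
    return estimates

est = simulate_estimates()
# The average of the estimates is close to the true effect of 5.0;
# individual estimates scatter around it by chance alone.
print(round(statistics.mean(est), 2), round(statistics.stdev(est), 2))
```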

Random Assignment versus Random Sampling

• Random assignment is not the same thing as random sampling; the two procedures serve entirely different purposes

• Random sampling
  – Places units from the population into the sample
  – Makes a sample more representative of the population
  – Strengthens external validity

• Random assignment
  – Places units from the sample into treatment conditions
  – Makes groups equivalent to each other
  – Strengthens internal validity
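The two procedures can be sketched side by side. This example assumes a hypothetical population of 10,000 unit IDs and a study sample of 100. Random sampling decides who enters the study. Random assignment then decides which condition each sampled unit receives. A study can use either procedure without the other.

```python
import random

rng = random.Random(2013)

# Hypothetical population of 10,000 unit IDs.
population = list(range(10_000))

# Random SAMPLING: chance decides which population units enter the study
# (supports external validity).
sample = rng.sample(population, 100)

# Random ASSIGNMENT: chance decides which condition each sampled unit gets
# (supports internal validity). Only units already in the sample are assigned.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:50], shuffled[50:]

print(len(sample), len(treatment), len(control))  # 100 50 50
```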

Why Randomization Works

• Reduces the plausibility of threats to validity by distributing them randomly over conditions
  – It equates groups on the expected value of all variables at pretest, regardless of whether those variables are measured
  – It allows the selection process to be completely known and completely modeled. This property is unique to randomized experiments and regression discontinuity designs

• Allows valid estimation of error variance that is orthogonal to treatment

• Ensures that alternative causes are not confounded with a unit’s treatment condition

Why Randomization Works

• Groups are equated before treatment, eliminating pretest selection differences as a plausible cause of posttest differences

• The posttest of the control group serves as a very good counterfactual for the treatment group posttest

• Threats are randomly distributed over conditions, so both control and treatment units have the same average characteristics

• The only remaining systematic difference between conditions is treatment
  – Note that random assignment equates groups in expectation

Randomization Doesn’t Fix Everything

• First and foremost!
  – Randomization works best in large samples. The smaller the sample, the more likely it is that substantial differences remain between groups

• Attrition is the largest threat to randomized experiments (as selection is to quasi-experiments)
  – Attrition is often differential; there are usually differences between those who remain in a study and those who drop out

• Randomization does nothing for maturation effects, and it cannot prevent historical events from affecting groups (likewise, pretests can still cause a testing effect, and changes in instrumentation can still occur)
  – However, random assignment does reduce the likelihood that these threats are confounded with treatment effects
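Why randomization "works best in large samples" can be checked by simulation. This sketch assumes a single hypothetical pretest covariate, standardized to mean 0 and SD 1, and measures the average absolute gap between the treatment and control means after random assignment. For a normal covariate, the expected gap shrinks roughly in proportion to 1 over the square root of the group size. Small groups can therefore differ noticeably even though assignment was random.

```python
import random

def mean_abs_imbalance(n_per_group, reps=1000, seed=7):
    """Average absolute gap between treatment and control means on a
    standardized covariate (mean 0, SD 1) after random assignment."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(reps):
        # Hypothetical sample: one pretest covariate value per unit.
        units = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
        rng.shuffle(units)                      # random assignment: first half treated
        treat, ctrl = units[:n_per_group], units[n_per_group:]
        total_gap += abs(sum(treat) / n_per_group - sum(ctrl) / n_per_group)
    return total_gap / reps

for n in (5, 20, 80, 320):
    print(n, round(mean_abs_imbalance(n), 3))   # gap shrinks as groups grow
```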

Randomization Doesn’t Fix Everything

• Randomization can also indirectly affect the required amount of statistical power, because attrition reduces the number of units that remain in a study
  – A priori power analysis indicates how many units are necessary to achieve a minimum detectable effect size (MDES)

• Oversampling can help avoid loss of power due to attrition
  – As a general guideline, include 25% to 50% more participants than would be required for minimum power, so that you can still detect the expected effect after losing participants
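A rough sketch of an a priori power calculation followed by the oversampling guideline. It assumes a two-group comparison of means, a two-sided test, and a standardized effect size d. It uses the common normal-approximation formula n = 2(z₁₋α/₂ + z_power)² / d² per group. Calculations based on the t distribution, as in dedicated power software, give a slightly larger n. The function names are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate units per group for a two-sided, two-group comparison of means
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)

def with_oversampling(n, extra=0.25):
    """Inflate a required n by a proportion (e.g., 0.25 to 0.50) to buffer against attrition."""
    return math.ceil(n * (1 + extra))

base = n_per_group(0.5)                    # d = 0.5, alpha = .05, power = .80
print(base, with_oversampling(base, 0.25), with_oversampling(base, 0.50))  # 63 79 95
```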

Randomization and Units

• A unit can be viewed as an opportunity to apply or withhold treatment

• Units can be individuals (like people or animals) or higher-order aggregates (like families, job sites, or classrooms)
  – It is often easier to obtain the required power with individual units. Higher-order or nested units sometimes require larger sample sizes, because power is based on the unit of randomization

• If the unit of randomization is the classroom, for instance, increasing power requires a larger number of classrooms
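One common way to see why classrooms, not students, drive power is the standard design-effect approximation for clustered samples: 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. This sketch assumes equal-sized classrooms and a hypothetical ICC of 0.20. Both are illustrative values, not figures from the course.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n_clusters, cluster_size, icc):
    """Number of independent individuals that a clustered sample is roughly 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Hypothetical example: classrooms of students with an assumed ICC of 0.20.
print(round(effective_n(20, 25, 0.20), 1))  # 86.2: 500 students in 20 classrooms
print(round(effective_n(20, 50, 0.20), 1))  # 92.6: twice as many students per classroom
print(round(effective_n(40, 25, 0.20), 1))  # 172.4: twice as many classrooms
```

Under these assumptions, doubling the number of students per classroom barely helps, while doubling the number of classrooms doubles the effective sample size.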

Limitations of Randomization

• Randomized experiments are often considered the gold standard of cause-probing studies

• However, randomized experiments are most useful for answering questions about local molar causation

• The valid generalization of results from randomized experiments relies on correspondence between the units sampled and the population of interest

Basic Designs

Basic Designs

Basic design

R X O
R O

Two treatments

R XA O
R XB O

Two treatments and a control

R XA O
R XB O
R O

The two-treatments design is useful if treatment A is already known to be effective; otherwise, there is no way to determine whether both treatments were equally effective or equally ineffective

• Note that none of these designs use pretests

• Why would you skip pretests?
  – There might be a concern over sensitization (i.e., a testing effect)
  – Administration might be infeasible
  – The variable of interest might be a constant, as in studies of mortality (all patients are alive at the start)

• Why use pretests?
  – Pretests allow you to study attrition!
  – Do those who drop out of one condition differ from those who drop out of another?

Although groups are assumed to be equated…the problem is…
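Pretest data make it possible to check whether dropouts differ across conditions. This sketch uses made-up records of the form (condition, pretest score, dropped out) and computes mean pretest scores for completers and dropouts within each condition. The data and function name are hypothetical. In these invented data, low scorers leave the treatment group while high scorers leave the control group, which is a pattern of differential attrition.

```python
from statistics import mean

# Hypothetical records: (condition, pretest score, dropped_out)
records = [
    ("treatment", 12, False), ("treatment", 9, True),
    ("treatment", 15, False), ("treatment", 8, True),
    ("control", 11, False), ("control", 14, True),
    ("control", 10, False), ("control", 16, True),
]

def pretest_means_by_status(records):
    """Mean pretest score for completers and dropouts within each condition."""
    groups = {}
    for condition, pretest, dropped in records:
        key = (condition, "dropped" if dropped else "completed")
        groups.setdefault(key, []).append(pretest)
    return {key: mean(scores) for key, scores in groups.items()}

for key, value in sorted(pretest_means_by_status(records).items()):
    print(key, value)
```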

Basic Designs

Pretest-posttest

R O X O
R O O

Alternative treatments

R O XA O
R O XB O

Two treatments and a control

R O XA O
R O XB O
R O O

This design allows investigators to explore attrition. It also increases power when the pretests are used as covariates in ANCOVA

These designs can be used for dismantling studies (studies of specific components or parts of a treatment)

They are also used for dose-response studies (differing doses of the same treatment)
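The power gain from using the pretest as a covariate can be illustrated by simulation. This sketch assumes a hypothetical pretest-posttest experiment with 30 units per group, a posttest that depends linearly on the pretest, and a constant treatment effect of 3. It compares the plain difference in posttest means with a one-covariate ANCOVA estimate, computed by hand using the pooled within-group slope. Both estimates center on the true effect, but the ANCOVA estimate varies much less from one replication to the next, and that lower variance is where the extra power comes from.

```python
import random
from statistics import mean, pvariance

def ancova_effect(pre_t, post_t, pre_c, post_c):
    """One-covariate ANCOVA estimate of the treatment effect: the difference in
    posttest means minus the pooled within-group slope times the difference
    in pretest means (assumes parallel slopes)."""
    def cross(xs, ys):
        mx, my = mean(xs), mean(ys)
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = (cross(pre_t, post_t) + cross(pre_c, post_c)) / (
        cross(pre_t, pre_t) + cross(pre_c, pre_c))
    return (mean(post_t) - mean(post_c)) - slope * (mean(pre_t) - mean(pre_c))

def simulate(n=30, effect=3.0, reps=500, seed=11):
    """Repeat a hypothetical pretest-posttest experiment and collect both estimates."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(reps):
        pre = [rng.gauss(50, 10) for _ in range(2 * n)]
        rng.shuffle(pre)                                 # random assignment: first n are treated
        post = [0.8 * x + rng.gauss(0, 4) for x in pre]  # assumed pretest-posttest relation
        post_t = [y + effect for y in post[:n]]          # constant treatment effect
        post_c = post[n:]
        plain.append(mean(post_t) - mean(post_c))
        adjusted.append(ancova_effect(pre[:n], post_t, pre[n:], post_c))
    return plain, adjusted

plain, adjusted = simulate()
print("difference in means: mean %.2f, variance %.2f" % (mean(plain), pvariance(plain)))
print("ANCOVA adjusted:     mean %.2f, variance %.2f" % (mean(adjusted), pvariance(adjusted)))
```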

Factorial Designs

• In a factorial design, two or more independent variables (factors) are investigated concurrently
  – Each factor must have at least 2 levels (treatment/control, low dose/high dose, etc.)
  – The number of factors and the number of levels within factors determine the number of cells in the design
    • There are 4 cells in a 2 x 2 factorial design
    • There are 8 cells in a 2 x 2 x 2 design
    • There are 12 cells in a 3 x 2 x 2 design
  – The main advantage of factorial designs is that the joint contribution of two or more independent variables can be studied simultaneously (rather than requiring two or more separate studies)
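The cell counts above follow from crossing every level of every factor. This sketch enumerates the cells with `itertools.product`, using the A1B1-style labels from these slides. The labeling helper is illustrative and assumes no more than eight factors, labeled A through H.

```python
from itertools import product
from math import prod

def factorial_cells(levels_per_factor):
    """List every cell of a factorial design, e.g. [2, 2] -> A1B1, A1B2, A2B1, A2B2.
    Factors are labeled A, B, C, ... (up to eight factors in this sketch)."""
    factor_names = "ABCDEFGH"[:len(levels_per_factor)]
    ranges = [range(1, k + 1) for k in levels_per_factor]
    return ["".join(f"{name}{level}" for name, level in zip(factor_names, combo))
            for combo in product(*ranges)]

for design in ([2, 2], [2, 2, 2], [3, 2, 2]):
    cells = factorial_cells(design)
    assert len(cells) == prod(design)   # number of cells = product of the levels
    print(" x ".join(map(str, design)), len(cells))
# 2 x 2 4
# 2 x 2 x 2 8
# 3 x 2 x 2 12
```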

Basic Factorial Design

Basic factorial design

R XA1B1 O
R XA1B2 O
R XA2B1 O
R XA2B2 O

2 x 2 factorial design

                          Factor B
                          Level 1               Level 2
Factor A   Level 1        Cell A1B1             Cell A1B2             Row mean for A1
           Level 2        Cell A2B1             Cell A2B2             Row mean for A2
                          Column mean for B1    Column mean for B2

Factor A (Level 1 and Level 2) and Factor B (Level 1 and Level 2)

Results in four cells: A1B1, A1B2, A2B1, and A2B2

Notation for Factorial Designs

2 X 3 X 4

• The number of numbers (here, 3) is the number of factors in the design

• The numbers themselves (here, 2, 3, and 4) indicate the number of levels in each factor

Main Effects and Interactions

• In factorial designs we also discuss main effects and interactions
  – In a 2 x 2 design there are two main effects (one for Factor A and one for Factor B) and one interaction (Factor A x Factor B)

• Main effects reflect the separate treatment effects of one independent variable (i.e., factor) averaged over the levels of the other independent variables

• Interactions occur when treatment effects are not constant but vary over levels of another factor
  – A factor that interacts with another is sometimes referred to as a moderator
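Main effects and the interaction in a 2 x 2 design can be computed directly from the row and column means of the table above. This sketch assumes equal cell sizes and uses hypothetical cell means. The interaction is expressed as a difference of simple effects, that is, how much the effect of B at A2 differs from the effect of B at A1. Some texts scale this contrast by 1/2.

```python
def two_by_two_effects(cells):
    """Main effects and interaction contrast from the four cell means of a 2 x 2 design.

    `cells` maps 'A1B1', 'A1B2', 'A2B1', 'A2B2' to cell means (equal cell sizes assumed).
    """
    row_a1 = (cells["A1B1"] + cells["A1B2"]) / 2   # row mean for A1 (averaged over B)
    row_a2 = (cells["A2B1"] + cells["A2B2"]) / 2   # row mean for A2
    col_b1 = (cells["A1B1"] + cells["A2B1"]) / 2   # column mean for B1 (averaged over A)
    col_b2 = (cells["A1B2"] + cells["A2B2"]) / 2   # column mean for B2
    return {
        "main_A": row_a2 - row_a1,
        "main_B": col_b2 - col_b1,
        # Interaction: does the effect of B differ between the levels of A?
        "A_x_B": (cells["A2B2"] - cells["A2B1"]) - (cells["A1B2"] - cells["A1B1"]),
    }

# Hypothetical cell means: B raises scores at A1 but has no effect at A2.
print(two_by_two_effects({"A1B1": 10, "A1B2": 16, "A2B1": 12, "A2B2": 12}))
# {'main_A': -1.0, 'main_B': 3.0, 'A_x_B': -6}
```

A nonzero interaction contrast means the treatment effect of B is not constant across the levels of A, so A moderates the effect of B.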

Example

• As an example, consider a 2 x 3 factorial design
  – Factor A is Gender
    • There are 2 levels: male and female
  – Factor B is Age
    • There are 3 levels: young, middle-aged, and old
  – The outcome variable is performance on a mathematical aptitude test
  – Is this a randomized experiment? What would random assignment of participants to conditions look like?

• If the performance of the male group differs as a function of age (for example, males perform worse as age increases) but the performance of the female group is consistent across age groups, then there is an Age x Gender interaction

Longitudinal Designs

R O . . . O X O O . . . O
R O . . . O O O . . . O

• Allows investigators to study how effects change over time

• Adds power when sample sizes are small

• However
  – Attrition is a serious problem
  – It can be unethical to withhold effective treatment for a long period of time

Similar to a time series, but with fewer pretest and posttest observations

Can be used to study different outcomes over time that are causally related, for example:

aspirations → expectations → achievement → educational success → quality of life (e.g., income, status)

Crossover Designs

R O XA O XB O
R O XB O XA O

• Allows counterbalancing and assessment of order effects

• The effects of the first treatment must dissipate before the next one begins (otherwise, the effects of later treatments are confounded with those of earlier ones)

• This is essentially a variation of the factorial design

A variant of the Latin squares design, in which every unit receives all treatments (a within-subjects design) and all possible treatment orders are represented

After the first posttest, units cross over to receive the treatment they did not previously get

If there were three treatment conditions (A, B, and C), there would be 6 possible orders (ABC, ACB, BAC, BCA, CAB, and CBA), so participants would be divided into 6 groups
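Full counterbalancing can be sketched by generating every treatment order with `itertools.permutations` and then randomly assigning units to order groups. This sketch assumes 30 hypothetical units and three treatments labeled A, B, and C. The function name is illustrative.

```python
import random
from itertools import permutations

def crossover_groups(units, treatments, seed=None):
    """Randomly assign units to every possible treatment order (full counterbalancing)."""
    orders = ["".join(p) for p in permutations(treatments)]  # 3 treatments -> 3! = 6 orders
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                                    # random assignment to orders
    groups = {order: [] for order in orders}
    for i, unit in enumerate(shuffled):
        groups[orders[i % len(orders)]].append(unit)
    return groups

groups = crossover_groups(range(30), "ABC", seed=1)
print(list(groups))                       # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print([len(g) for g in groups.values()])  # [5, 5, 5, 5, 5, 5]
```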

Factors Conducive to Randomized Designs

1. Demand for a treatment outstrips the supply
2. An innovation cannot be delivered to all units at once
3. Experimental units can be temporally isolated
4. Experimental units are spatially or geographically separated, or communication between units is otherwise low
5. Change is mandated, but the quality or effectiveness of solutions is unknown
6. A tie can be broken, or ambiguity about need can be resolved
7. Some persons (participants) express no preference among alternatives
8. Investigators can create their own organization
9. Investigators have control over experimental units
10. Lotteries are an expected part of treatment

Inhibiting Factors

1. Randomized experiments take a lot of time, in both design and execution (a time frame of several years from conceptualization to results is not unusual)
   – Policymakers and other stakeholders often need answers now

2. Randomized experiments provide very precise and valid answers about whether a treatment is effective, but at substantial cost
   – Policymakers and other stakeholders may not need such precise answers

3. Randomized experiments can answer only a fairly narrow set of questions, and the investigator must be able to actively manipulate treatment
   – Many questions of interest to policymakers and decision makers are not necessarily causal, or involve variables that cannot be manipulated

Inhibiting Factors

4. Before a randomized experiment is conducted, investigators must demonstrate (have evidence for, have a reasonable expectation of) all of the following:
   – Present conditions need improvement
   – The proposed improvement is of unclear value, or there are several changes whose relationship is unclear
   – The results of the experiment would clarify the situation
   – The results would be used to change the policy or practice relating to present conditions
   – The rights of participants will be protected throughout the process
