analyzing more general situations

ANALYZING MORE GENERAL SITUATIONS

UNIT 3

Unit Overview In the first unit we explored tests of significance,

confidence intervals, generalization, and causation mostly in terms of a single proportion.

In the second unit we compared two proportions, two averages (independent), and paired data. All of these had a binary categorical explanatory variable.

In the third unit we will expand on this to compare multiple proportions, multiple averages, and two quantitative variables. The explanatory variable will be categorical (not necessarily binary) or quantitative.

Unit Overview Throughout this unit we can still express

the null hypothesis in terms of no association between the response and the explanatory variable and the alternative hypothesis in terms of an association between the response and the explanatory variable.

We can also be more specific in our hypotheses to reflect the type of data (and hence parameters) we are working with.

Comparing More Than Two Groups Using

Proportions

Chapter 8

Chapter Overview In chapter 5, our explanatory variable had two

outcomes (like parents smoke or not) and the response variable also had two outcomes (like baby is boy or girl).

In this chapter we will allow the explanatory variable to have more than two outcomes (like both parents smoke, only mother smokes, only father smokes, or neither parent smokes).

We will still focus on the response variable having two outcomes, but it doesn’t have to. It can have many as well.

Section 8.1: Simulation-Based Approach to Compare Multiple Proportions

Example 8.1Coming to a Stop

Stopping Virginia Tech students investigated which

vehicles came to a stop at a intersection where there was a four-way stop.

While the students examined many factors for an association with coming to a complete stop, we will investigate whether intersection arrival patterns are associated with coming to a complete stop: Vehicle arrives alone Vehicle is the lead in a group of vehicles Vehicle is a follower in a group of vehicles

Stopping Null hypothesis: There is no association

between the arrival pattern of the vehicle and if it comes to a complete stop.

Alternative hypothesis: There is an association between the arrival pattern of the vehicle and if it comes to a complete stop.

Stopping Another way to write the null uses the

probability that a single vehicle will stop is the same as the probability a lead vehicle will stop, which is the same as the long-term probability that a following vehicle will stop

Or Single = Lead = Follow where is the probability a vehicle will stop.

The alternative hypothesis is that not all these probabilities are the same (at least one is different).

Stopping Percentage of vehicles

stopping: 85.8% of single vehicles 90.5% of lead vehicles 77.6% of vehicles

following in a group

Single Vehicle

Lead Vehicle

Following Vehicle

Total

Complete Stop

151 (85.8%)

38 (90.5%)

76 (77.6%)

265

Not Complete Stop

25 (14.2%)

4 (9.5%)

22 (22.4%)

51

Total 176 42 98 316

Stopping Remember that no association implies the

proportion of vehicles that stop in each category should be the same.

Our question is, if the same proportion of vehicles come to a complete stop in all the three categories, how unlikely would we get proportions at least as far apart as we did?

StoppingApplying the 3S Strategy

We need to find a statistic that will describe how far apart our proportions are from each other.

This is more complicated than when we just had two groups.

We also need to decide what types of values for that statistic (e.g., large or small, positive or negative) we would consider evidence against the null hypothesis.

Stopping To find a statistic we start by finding the

three differences in proportions.

Stopping How can we combine 3 differences

(-0.047, -0.082 and 0.129) into a single statistic?

Add them up? Average them? -0.047 + (-0.082) + 0.129 = 0. [-0.047 + (-0.082) + 0.129]/3 = 0.

What could we do so we don’t always get a sum of zero?

Stopping1. Statistic

We are going to use the mean of the absolute value of the differences (MAD)

(0.047 + 0.082 + 0.129)/3 = 0.086. What would have to be true for the average

of absolute differences to equal 0? What types of values of this statistic (e.g.,

large or small) would provide evidence in favor of the alternative hypothesis?

Stopping2. Simulate

If there is no association between arrival pattern and whether or not a vehicle stops it basically means it doesn’t matter what the arrival pattern is. Some vehicles will stop no matter what the arrival pattern and some vehicles won’t.

We can model this by shuffling either the explanatory or response variables. (The applet will shuffle the response.)

Stopping Use the Multiple Proportions Applet

Stopping The results of one shuffle of the response

Stopping Simulated values of the statistic for 1000

shuffles Bell-shaped? Centered at 0?

Stopping3. Strength of evidence

Do we use a 1-sided or 2-sided alternative hypothesis to compute a p-value? By finding the absolute values we have lost

direction in terms of which proportion is smaller than another.

When we are looking for more of a difference, the MAD statistic will be larger.

Hence to calculate our p-value we will always count the simulations that are as large or larger than the MAD statistic.

Stopping We had a p-value of 0.0854 so there is moderate

evidence against the null hypothesis Do the results generalize to intersections beyond

the one used? Probably not since intersections have different factors

that influence stopping Can we draw any cause-and-effect conclusions?

No, an observational study If we had stronger evidence of a difference in

groups, we could follow with pairwise tests to see which proportions are significantly different from each other. (We will save this until the next section.)

Recruiting Organ DonorsExploration 8.1

Theory-Based Approach to Compare Multi-Category Categorical Variables

Section 8.2

Theory based vs. Simulation basedJust as always:

Simulation based methods Always work Require the ability to simulate (computer).

Theory-based methods Avoid the need for simulation Additional ‘validity conditions’ must be met

Other Distributions The null distributions in the past few chapters

were bell-shaped and centered at 0. For these, we used normal and t-distributions to

predict what the null distribution would look like. In this chapter our null distribution was neither

bell-shaped nor centered at 0. We can, however, use a chi-squared distribution

to predict the shape of the null distribution. When doing this, we will not use the MAD

statistic, but a chi-square statistic.

Sham Acupuncture A randomized experiment was conducted

exploring the effectiveness of acupuncture in treating chronic lower back pain (Haake et al. 2007).

Acupuncture inserts needles into the skin of the patient at acupuncture points to treat a variety of ailments.

Sham Acupuncture 3 treatment groups 1. Verum acupuncture: traditional Chinese2. Sham acupuncture: needles inserted into the

skin, but not deeply and not at acupuncture points

3. Traditional, non-acupuncture, therapy of drugs, physical therapy and exercise.

1162 patients were randomly assigned to each treatment group

387 patients in groups 1 and 2, and 388 in group 3.

Sham Acupuncture

Null hypothesis - no association between type of treatment received and reduction in back pain. Alternative hypothesis - there is an association between type of treatment and reduction in back pain.

Sham Acupuncture Here are the results.

Real Acup.

Sham Acup.

Non-Acup.

Total

Pain reduced 184 (0.475)

171 (0.442)

106 (0.273)

461

Pain not reduced

203 (0.524)

216 (0.558)

282 (0.776)

701

Total 387 387 388 1162

Sham Acupuncture Remember that the MAD statistic is the

mean of the absolute value of differences in proportions for all 3 groups:

Real vs. Sham: 0.476-0.442=0.034Real vs. None:0.476-0.274= 0.202Sham vs. None: 0.442-0.274= 0.168

The statistic (MAD) is (0.034+0.202+0.168)/3=0.135

Larger MAD statistics give more evidence against the null

Sham AcupunctureThe other statistic we saw, chi-square is calculated using the formula.

is the proportion of successes in category i is the overall proportion of successes in

the dataset. is the sample size of category i

2)ˆˆ()ˆ1(ˆ

1 ppnpp ii

Sham Acupuncture

Don’t worry about the formula. Just realize it is another way to measure how far apart the proportions are from each other (or the overall proportion of successes).

Just like the MAD statistic, the larger the chi-square statistic the more evidence there is against the null.

Let’s test this in the Multiple Proportions Applet using both MAD and chi-square simulation.

Sham AcupunctureStrength of evidence:

In both cases, we got p-values of 0. Nothing as large as a MAD statistic of 0.135 or

larger ever occurred in this simulation. Likewise nothing as large as a chi-square

statistic of 38.05 or larger ever occurred in this simulation.

Hence we have very strong evidence against the null and in support of the type of acupuncture used is associated with pain reduction.

Sham Acupuncture The theory-based method will predict the chi-

square null distribution. We can use the applet to overlay a

theoretical chi-square distribution and find the theory-based p-value.

The theory based method only works well when each cell in the 2-way table has at least 10 observations.

This is easily met here. The smallest count was 106.

Results

Sham Acupuncture What if you find evidence of an

association? What happens after a chi-squared test?

No standard approach to follow Could describe the association with confidence

intervals on each of the differences in proportions.

Sham Acupuncture 95% confidence intervals (found in applet) on

the difference improvement percentages comparing: Real to sham (-0.0366, 0.1038) Real to none (0.1356,0.2689)* Sham to none (0.1022, 0.2351)*

There is evidence that the probability of pain reduction is different (lower) for no acupuncture treatment than the other two.

There is no significant difference between real and sham acupuncture however.

Let’s see all this in the applet.

Exploration 8.2: Conserving Hotel Towels Hotels are encouraging guests to practice

conservation by not having their towels washed.

(Goldstein et al., 2008) conducted a randomized experiment to investigate how different phrasings on signs placed on bathroom towel racks impacted towel reuse behavior.

Researchers were interested how messages which communicated different types of “social norms” impacted towel reuse.

Exploration 8.2: Conserving Hotel Towels Rooms at a single

hotel were randomly assigned to receive 1 of 5 messages on the sign on the towel bar

Is there an association between the message left and whether or not towels get reused?

analyzing more general situations

Documents

response variable

proportion of vehicles

unit overviewin

stoppingpercentage of

alternative hypothesis

unit overviewthroughout

complete stop25

single proportion