![Page 1: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/1.jpg)
Unit 1 Overview Significance – How strong is the evidence
of an effect? (Chapter 1) Estimation – How large is the effect?
(Chapter 2) Generalization – How broadly do the
conclusions apply? (Chapter 3) Causation – Can we say what caused the
observed difference? (Chapter 4)
![Page 2: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/2.jpg)
Chapter 1
Significance: How Strong is the Evidence?
![Page 3: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/3.jpg)
Section 1.1: Introduction to Chance Models Organ Donation Study
78.6% in the neutral group agreed
41.8% in the opt-in group agreed
The researchers found these results to be statistically significant.
This means that if the recruitment method made no difference in the proportion that would agree, results as different as we found would be unlikely to arise by random chance.
![Page 4: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/4.jpg)
Dolphin Communication Can dolphins communicate abstract ideas? In an experiment done in the 1960s, Doris was
instructed which of two buttons to push. She then had to communicate this to Buzz (who could not see Doris). If he picked the correct button, both dolphins would get a reward.
What are the observational units and variables in this study?
![Page 5: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/5.jpg)
Dolphin Communication
In one set of trials, Buzz chose the correct button 15 out of 16 times.
Based on these results, do you think Buzz knew which button to push or is he just guessing?
How might we justify an answer? How might we model this situation?
![Page 6: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/6.jpg)
Modeling Buzz and Doris Flip Coins One Proportion Applet
![Page 7: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/7.jpg)
Simulation vs. Real Study
coin flip = guess by Buzz
heads = correct guess
tails = wrong guess
chance of heads = ½ =
probability of correct button when Buzz is just guessing
one set of 16 coin flips =
one set of 16 attempts by Buzz
![Page 8: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/8.jpg)
Three S Strategy Statistic: Compute the statistic from the
observed data. Simulate: Identify a model that represents a
chance explanation. Repeatedly simulate values of the statistic that could have happened when the chance model is true and form a distribution.
Strength of evidence: Consider whether the value of the observed statistic is unlikely to occur when the chance model is true.
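The three steps above can be sketched in code for the first Buzz and Doris trials. This is only a stand-in for the coin flipping and the applet, not the applet itself; the function name `simulate_null_counts`, the seed, and the choice of 1000 repetitions are assumptions made for illustration.

```python
import random

def simulate_null_counts(n_trials, prob_success, n_reps, seed=0):
    """Return one success count per simulated set of trials under the chance model."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    return [
        sum(rng.random() < prob_success for _ in range(n_trials))
        for _ in range(n_reps)
    ]

# Statistic: Buzz chose the correct button 15 times in 16 attempts.
observed = 15

# Simulate: 1000 sets of 16 fair coin flips (heads = correct guess).
counts = simulate_null_counts(n_trials=16, prob_success=0.5, n_reps=1000)

# Strength of evidence: how often does guessing alone give 15 or more correct?
proportion_as_extreme = sum(c >= observed for c in counts) / len(counts)
print(proportion_as_extreme)
```

A proportion near zero says that 15 out of 16 would be very unusual if Buzz were just guessing.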
![Page 9: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/9.jpg)
Buzz and Doris Redo Instead of a canvas curtain, Dr. Bastian
constructed a wooden barrier between Buzz and Doris.
When tested, Buzz pushed the correct button only 16 out of 28 times.
Are these results statistically significant? Let’s go to the applet to check this out.
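As a rough check outside the applet, the chance of 16 or more correct out of 28 under pure guessing can also be computed exactly from the binomial distribution. This swaps simulation for an exact calculation, so the applet's simulated proportion will be close to, but not identical to, this value; the helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """Exact probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance model: Buzz guesses, so each attempt is correct with probability 0.5.
p_value_redo = binomial_upper_tail(n=28, k=16, p=0.5)
print(round(p_value_redo, 3))
```

A value this large means 16 out of 28 is quite typical of guessing, so these results would not count as statistically significant.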
![Page 10: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/10.jpg)
Exploration 1.1: Can Dogs Understand Human Cues? (pg. 1-12) Dogs were positioned 2.5 m from experimenter. On each side of the experimenter were two cups. The experimenter would perform some human cue
(pointing, bowing or looking) towards one of the cups. (Non-human cues were also done.)
We will look at Harley’s results.
![Page 11: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/11.jpg)
Section 1.2: Measuring Strength of Evidence
In the previous section we performed tests of significance.
In this section we will make things slightly more complicated, formalize the process, and define new terminology.
![Page 12: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/12.jpg)
We could take a look at Rock-Paper-Scissors-Lizard-Spock
Scissors cut paper
Paper covers rock
Rock crushes lizard
Lizard poisons Spock
Spock smashes scissors
Scissors decapitate lizard
Lizard eats paper
Paper disproves Spock
Spock vaporizes rock
(and as it always has) Rock crushes scissors
RPS
![Page 13: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/13.jpg)
Rock-Paper-Scissors Rock smashes scissors Paper covers rock Scissors cut paper Are these choices used
in equal proportions (1/3 each)?
One study suggests that scissors are chosen less than 1/3 of the time.
![Page 14: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/14.jpg)
Rock-Paper-Scissors Suppose we are going to test this with 12
players each playing once against a computer.
What are the observational units? What is the variable? Even though there are three outcomes, we
are focusing on whether the player chooses scissors or not. This is called a binary variable since we are focusing on 2 outcomes (not both necessarily equally likely).
![Page 15: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/15.jpg)
Terminology: Hypotheses When conducting a test of significance,
one of the first things we do is give the null and alternative hypotheses.
The null hypothesis is the chance explanation.
Typically the alternative hypothesis is what the researchers think is true.
![Page 16: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/16.jpg)
Hypotheses from Buzz and Doris Null Hypothesis: Buzz will randomly pick
a button. (He chooses the correct button 50% of the time, in the long run.)
Alternative Hypothesis: Buzz understands what Doris is communicating to him. (He chooses the correct button more than 50% of the time, in the long run.) These hypotheses represent the parameter (long run behavior) not the statistic (the observed results).
![Page 17: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/17.jpg)
Hypotheses for R-P-S in words Null Hypothesis: People playing Rock-
Paper-Scissors will equally choose between the three options. (In particular, they will choose scissors one-third of the time, in the long run.)
Alternative Hypothesis: People playing Rock-Paper-Scissors will choose scissors less than one-third of the time, in the long run.
Note the differences (and similarities) between these hypotheses and those for Buzz and Doris.
![Page 18: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/18.jpg)
Hypotheses for R-P-S using symbols
H0: π = 1/3 Ha: π < 1/3
where π is players’ true probability of throwing scissors
![Page 19: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/19.jpg)
Setting up a Chance Model Because the Buzz and Doris example had a
50% chance outcome, we could use a coin to model the outcome from one trial. What could we do in the case of Rock-Paper-Scissors?
![Page 20: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/20.jpg)
Same Three S Strategy as Before Statistic: Compute the statistic from the
observed data. [In a class of 12 students, 2 picked scissors. This sample proportion, 2/12 ≈ 0.167, can be described using the symbol p̂ (p-hat).]
Simulate: Identify a model that represents a chance explanation. Repeatedly simulate values of the statistic that could have happened when the chance model is true and form a distribution.
Strength of evidence: Consider whether the value of the observed statistic is unlikely to occur when the chance model is true.
![Page 21: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/21.jpg)
Applet We will use the One Proportion Applet for
our test. This is the same applet we used last time
except now we will change the proportion under the null hypothesis.
Let’s go to the applet and run the test. (Notice the use of symbols in the applet.)
![Page 22: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/22.jpg)
Null Distribution
![Page 23: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/23.jpg)
P-value The p-value is the proportion of the
simulated statistics in the null distribution that are at least as extreme (in the direction of the alternative hypothesis) as the value of the statistic actually observed in the research study.
We should have seen something similar to this in the applet
Proportion of samples: 938/5000 = 0.1876
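The p-value definition above can be sketched as a small simulation for the Rock-Paper-Scissors test (H0: π = 1/3, Ha: π < 1/3, 2 of 12 players choosing scissors). The function name `simulated_p_value` and the seed are assumptions, and because the result comes from random simulation it will differ slightly from the applet's 938/5000.

```python
import random

def simulated_p_value(n, null_prob, observed_count, reps=5000, seed=1):
    """Proportion of simulated counts at or below the observed count (Ha: less than)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    extreme = 0
    for _ in range(reps):
        # One simulated class: each of n players picks scissors with probability null_prob.
        count = sum(rng.random() < null_prob for _ in range(n))
        if count <= observed_count:  # at least as extreme, in the direction of Ha
            extreme += 1
    return extreme / reps

p_value_rps = simulated_p_value(n=12, null_prob=1/3, observed_count=2)
print(p_value_rps)
```

"At least as extreme" means "as small or smaller" here because the alternative hypothesis is π < 1/3; for a "greater than" alternative the comparison would flip.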
![Page 24: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/24.jpg)
What can we conclude? Do we have strong evidence that less than
1/3 of the time scissors gets thrown? How small of a p-value would you say
gives strong evidence?
Remember the smaller the p-value, the stronger the evidence against the null.
![Page 25: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/25.jpg)
Guidelines for evaluating strength of evidence from p-values
p-value > 0.10: not much evidence against the null hypothesis
0.05 < p-value < 0.10: moderate evidence against the null hypothesis
0.01 < p-value < 0.05: strong evidence against the null hypothesis
p-value < 0.01: very strong evidence against the null hypothesis
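These guidelines can be written as a small lookup. The function name `evidence_from_p_value` is hypothetical, and how to treat a p-value that lands exactly on a cutoff (0.10, 0.05, 0.01) is an assumption, since the guidelines do not say.

```python
def evidence_from_p_value(p_value):
    """Map a p-value to the guideline wording for strength of evidence against the null."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value > 0.10:
        return "not much"
    if p_value > 0.05:   # exactly 0.10 is placed here (an assumption)
        return "moderate"
    if p_value > 0.01:   # exactly 0.05 is placed here (an assumption)
        return "strong"
    return "very strong"

# The Rock-Paper-Scissors p-value from the applet was 938/5000 = 0.1876.
print(evidence_from_p_value(0.1876))
```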
![Page 26: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/26.jpg)
What can we conclude? So we do not have strong evidence that
fewer than 1/3 of the time scissors is thrown. Does this mean we can conclude that 1/3 of
the time scissors is thrown? Is it plausible that 1/3 of the time scissors is
thrown? Are other values plausible? Which ones? What could we do to have a better chance of
getting strong evidence for our alternative hypothesis?
![Page 27: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/27.jpg)
Summary
The null hypothesis (H0) is the chance explanation. (=)
The alternative hypothesis (Ha) is what you are trying to show is true. (< or >)
A null distribution is the distribution of simulated statistics that represent the chance outcome.
The p-value is the proportion of the simulated statistics in the null distribution that are at least as extreme as the value of the observed statistic.
![Page 28: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/28.jpg)
Summary The smaller the p-value, the stronger the
evidence against the null. P-values less than 0.05 provide strong
evidence against the null. π is the population parameter; p̂ is the sample proportion.
![Page 29: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/29.jpg)
Exploration 1.2 (pg 1-25) Can people tell the difference between
bottled and tap water?
![Page 30: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/30.jpg)
Alternative Measure of Strength of Evidence
Section 1.3
![Page 31: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/31.jpg)
Criminal Justice System vs. Significance Tests
Innocent until proven guilty. We assume a defendant is innocent and the prosecution has to collect evidence to try to prove the defendant is guilty.
Likewise, we assume our chance model (or null hypothesis) is true and we collect data and calculate a sample proportion. We then show how unlikely our proportion is if the chance model is true.
![Page 32: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/32.jpg)
Criminal Justice System vs. Significance Tests If the prosecution shows lots of evidence
that goes against this assumption of innocence (DNA, witnesses, motive, contradictory story, etc.) then the jury might conclude that the innocence assumption is wrong.
If, after we collect data, we find that the likelihood (p-value) of such a proportion is so small that it would rarely occur by chance if the null hypothesis is true, then we conclude our chance model is wrong.
![Page 33: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/33.jpg)
Review
In the water tasting exploration, you could have obtained a null distribution similar to the one shown here. (H0: π = 0.25, Ha: π < 0.25 and p̂ = 3/27 ≈ 0.1111)
• What does a single dot represent?
• What does the whole distribution represent?
• What is the p-value for this simulation?
• What does this p-value mean?
![Page 34: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/34.jpg)
More Review The null hypothesis is the chance
explanation. Typically the alternative hypothesis is
what the researchers think is true. The p-value is the proportion of outcomes
in the null distribution that are at least as extreme as the value of the statistic actually observed in the study.
Small p-values are evidence against the null.
![Page 35: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/35.jpg)
Strength of Evidence P-values are one measure for the strength
of evidence and they are, by far, the most frequently used.
P-values essentially are measures of how far the sample statistic is away from the parameter under the null hypothesis.
Another measure for this distance we will look at today is called the standardized statistic.
![Page 36: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/36.jpg)
Heart Transplant Operations
Example 1.3
![Page 37: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/37.jpg)
Heart Transplants The British Medical Journal (2004) reported
that heart transplants at St. George’s Hospital in London had been suspended after a spike in the mortality rate
Of the last 10 heart transplants, 80% had resulted in deaths within 30 days
This mortality rate was over five times the national average.
The researchers used 15% as a reasonable value for comparison.
![Page 38: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/38.jpg)
Heart Transplants Does a heart transplant patient at St.
George’s have a higher probability of dying than the national rate of 0.15?
Observational units The last 10 heart transplantations
Variable If the patient died or not
Parameter The actual long-run probability of a death after
a heart transplant operation at St. George’s
![Page 39: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/39.jpg)
Heart Transplants Null hypothesis: Death rate at St.
George’s is the same as the national rate (0.15).
Alternative hypothesis: Death rate at St. George’s is higher than the national rate.
H0: π = 0.15 Ha: π > 0.15
Our statistic is 8 out of 10 or 0.80
![Page 40: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/40.jpg)
Heart Transplants Simulation
Null distribution of 1000 repetitions of drawing samples of 10 “patients” where the probability of death is equal to 0.15.
What is the p-value?
![Page 41: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/41.jpg)
Heart Transplants Strength of Evidence Our simulated p-value is 0, so we have very strong
evidence against the null hypothesis. Even with this strong evidence, it would
be nice to have more data. Researchers examined the previous 361
heart transplantations at St. George’s and found that 71 died within 30 days.
Our new statistic is 71/361 ≈ 0.197
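A simulated p-value of 0 only means that none of the repetitions was as extreme as the observed result; the underlying probability is tiny but not exactly zero. A minimal sketch, using an exact binomial calculation in place of the simulation (the helper name `binomial_upper_tail` is hypothetical), shows this for both samples:

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """Exact probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Last 10 transplants: 8 deaths, null probability of death 0.15.
p_value_10 = binomial_upper_tail(n=10, k=8, p=0.15)

# Previous 361 transplants: 71 deaths, same null probability.
p_value_361 = binomial_upper_tail(n=361, k=71, p=0.15)

print(p_value_10, p_value_361)
```

The first value is far too small to show up in 1000 repetitions, which is why the simulation reports 0. The second is an exact counterpart to the simulated p-value for the larger sample, so it will not match the simulated figure digit for digit.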
![Page 42: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/42.jpg)
Heart Transplants Here is a null distribution and p-value
based on the new statistic.
![Page 43: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/43.jpg)
Heart Transplants The p-value was about 0.007. We still have very strong evidence against
the null hypothesis, but not quite as strong as in the first case.
Another way to measure strength of evidence is to standardize the observed statistic
![Page 44: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/44.jpg)
The Standardized Statistic The standardized statistic is the
number of standard deviations our sample statistic is above (or below) the mean of the null distribution.
For a single proportion, we will use the symbol z for standardized statistic.
![Page 45: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/45.jpg)
The standardized statistic Here are the standardized statistics for our
two studies.
In the first, our observed statistic was 5.70 standard deviations above the mean.
In the second, our observed statistic was 2.47 standard deviations above the mean.
Both of these are very strong, but we have stronger evidence against the null in the first.
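As a rough check on these two values, the standardized statistic can be computed as (statistic − mean of null distribution) / (standard deviation of null distribution). The sketch below assumes the theoretical mean π and standard deviation sqrt(π(1 − π)/n) in place of the simulated ones, so its results differ slightly from 5.70 and 2.47; the function name `standardized_statistic` is hypothetical.

```python
from math import sqrt

def standardized_statistic(successes, n, null_prob):
    """z = (p-hat - null mean) / null SD, using theoretical mean and SD (an assumption)."""
    p_hat = successes / n
    null_sd = sqrt(null_prob * (1 - null_prob) / n)
    return (p_hat - null_prob) / null_sd

z_recent = standardized_statistic(successes=8, n=10, null_prob=0.15)
z_previous = standardized_statistic(successes=71, n=361, null_prob=0.15)
print(round(z_recent, 2), round(z_previous, 2))
```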
![Page 46: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/46.jpg)
Guidelines for strength of evidence If a standardized statistic is below -2 or
above 2, we have strong evidence against the null.
Standardized Statistic Evidence Against Null
between -1.5 and 1.5 not much
below -1.5 or above 1.5 moderate
below -2 or above 2 strong
below -3 or above 3 very strong
![Page 47: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/47.jpg)
Which is Bob and which is Tim?
![Page 48: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/48.jpg)
Do People Use Facial Prototyping?
Exploration 1.3
![Page 49: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/49.jpg)
Impacting Strength of Evidence
Section 1.4
![Page 50: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/50.jpg)
Introduction We’ve now looked at tests of significance
and have seen how p-values and standardized statistics give information about the strength of evidence against the null hypothesis.
Today we’ll explore factors that affect strength of evidence.
![Page 51: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/51.jpg)
Bob or Tim?
p̂ = 0.82 p̂ = 0.65
When the statistic is farther away from the proportion in the null, there is stronger evidence against the null.
![Page 52: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/52.jpg)
Predicting Elections
from Faces
Example 1.4
![Page 53: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/53.jpg)
Predicting Elections Do voters make judgments about
candidates based on facial appearances? More specifically, can you predict an
election by choosing the candidate whose face is more competent-looking?
Participants were shown two candidates and asked who has the more competent-looking face.
![Page 54: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/54.jpg)
Who has the more competent looking face?
2004 Senate Candidates from Wisconsin
Winner Loser
![Page 55: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/55.jpg)
Bonus: One is named Tim and the other is Russ. Which name is the one on the left?
2004 Senate Candidates from Wisconsin
Russ Tim
![Page 56: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/56.jpg)
Predicting Elections They determined which face was the more
competent for the 32 Senate races in 2004.
What are the observational units? The 32 Senate races
What is the variable measured? If the method predicted the winner correctly
![Page 57: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/57.jpg)
Predicting Elections Null hypothesis: The probability this
method predicts the winner equals 0.5. (H0: π = 0.5)
Alternative hypothesis: The probability this method predicts the winner is greater than 0.5. (Ha: π > 0.5)
This method correctly predicted 23 of 32 races, hence p̂ = 23/32 ≈ 0.719, or 71.9%.
![Page 58: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/58.jpg)
Predicting Elections 1000 simulated sets of 32 races using the One Proportion applet.
![Page 59: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/59.jpg)
Predicting Elections With a p-value of 0.009 we have strong
evidence against the null hypothesis. When we calculate the standardized
statistic we again show strong evidence against the null.
What do the p-value and standardized statistic mean?
$$z = \frac{0.7188 - 0.501}{0.09} = 2.42$$
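The 0.501 and 0.09 in this calculation are the mean and standard deviation of the simulated null distribution. A minimal sketch of how such values could arise, assuming 1000 repetitions and an arbitrary seed (so the numbers will not match the applet's exactly):

```python
import random
from statistics import mean, stdev

rng = random.Random(2004)  # arbitrary seed (an assumption) for repeatability
n_races, reps = 32, 1000

# Each repetition: proportion of 32 races "predicted" correctly by a fair coin.
null_props = [
    sum(rng.random() < 0.5 for _ in range(n_races)) / n_races
    for _ in range(reps)
]

observed = 23 / 32
null_mean = mean(null_props)   # should land near 0.5
null_sd = stdev(null_props)    # should land near 0.09
z = (observed - null_mean) / null_sd
p_value = sum(p >= observed for p in null_props) / reps
print(round(null_mean, 3), round(null_sd, 3), round(z, 2), p_value)
```

The p-value counts how often chance alone does at least as well as the method did, while z reports how many standard deviations the observed 0.7188 sits above the center of that chance distribution.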
![Page 60: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/60.jpg)
What affects the strength of evidence?
1. The difference between the observed statistic (p̂) and the null hypothesis parameter (π).
2. Sample size.
3. Whether we do a one-sided or two-sided test.
![Page 61: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/61.jpg)
Difference between p̂ and π What if researchers had correctly predicted 26 elections
instead of 23? In the simulation, 26/32 = 0.8125 never occurs just by chance,
hence the p-value is 0.
![Page 62: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/62.jpg)
Difference between p̂ and π The farther away the observed statistic is
from the average value of the null distribution (or π), the more evidence there is against the null hypothesis.
![Page 63: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/63.jpg)
Sample Size
Suppose the sample proportion stays the same. Do you think increasing the sample size will increase, decrease, or have no impact on the strength of evidence against the null hypothesis?
![Page 64: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/64.jpg)
Sample Size
The null distribution changes as we increase the sample size from 32 Senate races to 128 races to 256 races.
As the sample size increases, the variability (standard deviation) decreases.
![Page 65: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/65.jpg)
Sample Size
What does decreasing variability mean for statistical significance (with the same sample proportion)?
32 elections: p-value = 0.009 and z = 2.42
128 elections: p-value = 0 and z = 5.07
256 elections: even stronger evidence, p-value = 0 and z = 9.52
![Page 66: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/66.jpg)
Sample Size As the sample size increases, the
variability decreases. Therefore, as the sample size increases,
the evidence against the null hypothesis increases (as long as the sample proportion stays the same and is in the direction of the alternative hypothesis).
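To see how quickly the variability shrinks, the sketch below computes a formula-based standard deviation of the null distribution for the three sample sizes. It assumes the standard formula sqrt(π(1 − π)/n) for the standard deviation of a sample proportion; the function name `null_sd` is made up for this illustration, and the values need not match the simulation-based figures on the slides.

```python
from math import sqrt

def null_sd(pi, n):
    """Formula-based SD of a sample proportion when the true probability is pi."""
    return sqrt(pi * (1 - pi) / n)

# Same null value (0.5), increasing numbers of races.
for n in (32, 128, 256):
    print(f"n = {n:3d}  SD = {null_sd(0.5, n):.4f}")
```

Quadrupling the sample size (32 to 128) halves the standard deviation, so the same distance between p̂ and π counts for twice as many standard deviations.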
![Page 67: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/67.jpg)
Two-Sided Tests
What if researchers were wrong: instead of the person with the more competent face being elected more frequently, that person was actually elected less frequently?
H0: π = 0.5
Ha: π > 0.5
With this alternative, if we got a sample proportion less than 0.5, we would get a very large p-value.
This is a one-sided test. Often a one-sided test is too narrow. In fact, most research uses two-sided tests.
![Page 68: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/68.jpg)
Two-Sided Tests In a two-sided test the alternative can be
concluded when sample proportions are in either tail of the null distribution.
Null hypothesis: The probability this method predicts the winner equals 0.50. (H0: π = 0.50)
Alternative hypothesis: The probability this method predicts the winner is not 0.50. (Ha: π ≠ 0.50)
![Page 69: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/69.jpg)
Two-Sided Tests
The change to the alternative hypothesis also affects how we compute the p-value.
Remember that the p-value is the probability (assuming the null hypothesis is true) of obtaining a proportion that is equal to or more extreme than the observed statistic.
In a two-sided test, more extreme goes in both directions.
![Page 70: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/70.jpg)
Two-Sided Tests Since our sample proportion was 0.7188 and
0.7188 is 0.2188 above 0.5, we also need to look at 0.2188 below 0.5. (This gets a bit more complicated when the distribution is not symmetric, but the applet will do all the work for you.)
Hence the p-value will include all simulated proportions 0.7188 and above as well as those 0.2812 and below.
![Page 71: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/71.jpg)
Two-Sided Tests
0.7188 or greater was obtained 9 times.
0.2812 or less was obtained 8 times.
The p-value is (9 + 8)/1000 = 17/1000 = 0.017.
Two-sided tests increase the p-value (it about doubles) and hence decrease the strength of evidence.
Two-sided tests are said to be more conservative. More evidence is needed to conclude the alternative.
Let’s check this out using our applet.
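As a complement to the applet, the two tails can also be added up exactly from the binomial distribution. The sketch below is one way to do that for the Senate data (23 correct out of 32) under π = 0.5; the helper name `tail_probability` is an assumption, and the mirror-image lower tail used here is only appropriate because the null distribution is symmetric at 0.5.

```python
from math import comb

def tail_probability(n, counts):
    """Exact probability that a Binomial(n, 0.5) count falls in `counts`."""
    return sum(comb(n, i) for i in counts) / 2 ** n

n, k = 32, 23
upper = tail_probability(n, range(k, n + 1))      # 23 or more correct (p-hat >= 0.7188)
lower = tail_probability(n, range(0, n - k + 1))  # 9 or fewer correct (p-hat <= 0.2812)
two_sided = upper + lower

print(f"one-sided: {upper:.4f}  two-sided: {two_sided:.4f}")
```

Because the two tails are mirror images, the exact two-sided p-value is exactly double the one-sided value; the simulated counts (9 and 8) differ slightly only because of random variation.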
![Page 72: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/72.jpg)
Predicting House Elections
Researchers also predicted the 279 races for the House of Representatives in 2004.
They correctly predicted the winner in 189/279 ≈ 0.677, or 67.7%, of the races.
The House’s sample percentage (67.7%) is a bit smaller than the Senate’s (71.9%), but the sample size is larger (279) than for the Senate races (32).
Do you expect the strength of evidence to be stronger, weaker, or essentially the same for the House compared to the Senate?
![Page 73: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/73.jpg)
Predicting House Elections
Distance of the observed statistic to the null hypothesis value:
The statistic in the House is 0.677 compared to 0.719 in the Senate.
Slight decrease in the strength of evidence.
Sample size:
The sample size is almost 10 times as large (279 vs. 32).
This will increase the strength of evidence.
![Page 74: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/74.jpg)
Predicting House Elections
Null distribution of 279 sample House races
Simulated statistics ≥ 0.677 didn’t occur, hence the p-value is 0.
![Page 75: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/75.jpg)
Predicting House Elections
What about the standardized statistics?
For the Senate it was 2.49. For the House it is 5.90.
The larger sample size for the House trumped its smaller proportion, so we have stronger evidence against the null using the data from the House.
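The trade-off between a smaller proportion and a larger sample can be checked with a short calculation. The sketch below assumes the formula-based standard deviation sqrt(π(1 − π)/n) rather than a simulated one, so its results are close to, but not identical to, the slide's values; `standardized_statistic` is a name chosen for this example.

```python
from math import sqrt

def standardized_statistic(successes, n, pi=0.5):
    """z = (p-hat - pi) / SD, using the formula-based SD under the null."""
    p_hat = successes / n
    sd = sqrt(pi * (1 - pi) / n)
    return (p_hat - pi) / sd

z_senate = standardized_statistic(23, 32)    # 23 of 32 Senate races
z_house = standardized_statistic(189, 279)   # 189 of 279 House races
print(f"Senate z = {z_senate:.2f}, House z = {z_house:.2f}")
```

Even though the House proportion sits closer to 0.5, its much smaller standard deviation puts it more than twice as many standard deviations from the null value.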
![Page 76: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/76.jpg)
Competitive Advantage to Uniform Colors?
Exploration 1.4
![Page 77: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/77.jpg)
In four contact sports (boxing, tae kwon do, Greco–Roman wrestling and freestyle wrestling) in the 2004 Olympics, participants were randomly assigned a red or blue uniform.
Researchers then analyzed the results.
![Page 78: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/78.jpg)
Section 1.5
Normal Approximation
(Theory-Based Test)
![Page 79: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/79.jpg)
Simulation-Based vs. Theory-Based
We will now look at the more traditional method of determining a p-value through theory-based techniques.
When we used simulation-based methods, we all got slightly different p-values. The more repetitions we do, the closer our p-values will be to each other.
In theory-based methods, we will use a theoretical distribution to model our null distribution and we will all get the same p-value.
![Page 80: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/80.jpg)
Theory-Based Techniques
Hopefully, you’ve noticed that the shape of most of our simulated null distributions was quite predictable.
We can predict this shape using normal distributions.
When we do a test of significance using theory-based methods, only how our p-values are found will change. Everything else will stay the same.
![Page 81: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/81.jpg)
The Null Distributions
Our null distributions:
Were typically bell shaped
Were centered at the proportion under the null
Had a width that depended mostly on the sample size
![Page 82: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/82.jpg)
Both of these are centered at 0.5. The one on the left represents samples of size 30. The one on the right represents samples of size 300. Both could be predicted using normal distributions.
The Normal Distribution
![Page 83: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/83.jpg)
Examples from this chapter Which ones will normal distributions fit?
![Page 84: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/84.jpg)
When can I use a theory-based test that uses the normal distribution?
The shape of the randomized null distribution is affected by the sample size and the proportion under which you are testing.
The larger the sample size the better.
The closer the null proportion is to 0.5 the better.
A simple rule of thumb to follow is: You should have at least 10 successes and 10 failures in your sample to be fairly confident that a normal distribution will fit the simulated null distribution nicely.
We will call guidelines like the one above validity conditions.
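The rule of thumb above is easy to encode as a check. This is only a sketch of the stated guideline; the function name and the `minimum` parameter are assumptions introduced here.

```python
def meets_validity_condition(successes, failures, minimum=10):
    """True when the sample has at least `minimum` successes and `minimum` failures."""
    return successes >= minimum and failures >= minimum

# Halloween treats: 148 chose candy, 135 chose toys.
print(meets_validity_condition(148, 135))
# Rock-Paper-Scissors: scissors thrown in 1 of 12 repetitions.
print(meets_validity_condition(1, 11))
```

The first sample passes and the second does not, which is why the normal approximation is trusted for the Halloween data but questioned for the 12 Rock-Paper-Scissors repetitions.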
![Page 85: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/85.jpg)
Advantages and Disadvantages of Theory-Based Tests
Advantages of theory-based tests:
No need to set up some randomization method
Fast and easy
Can be done with a wide variety of software
We all get the same p-value
Determining confidence intervals (we will do this next time) is much easier
Disadvantages of theory-based tests:
They all come with some validity conditions (like the number of successes and failures we have for a single proportion test).
![Page 86: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/86.jpg)
Example 1.5: Halloween Treats
Researchers investigated whether children show a preference for toys or candy.
Test households in five Connecticut neighborhoods offered children two plates: one with candy and one with small, inexpensive toys.
The researchers observed the selections of 283 trick-or-treaters between ages 3 and 14.
![Page 87: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/87.jpg)
Null: The proportion of trick-or-treaters who choose candy is 0.5.
Alternative: The proportion of trick-or-treaters who choose candy is not 0.5.
H0: π = 0.5, Ha: π ≠ 0.5
Notice we are focusing on candy, but could have easily done this focusing on the toy.
Halloween Treats
![Page 88: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/88.jpg)
Halloween Treats
283 children were observed:
148 (52.3%) chose candy
135 (47.7%) chose toys
Let’s first run this test using the One Proportion applet we have been using.
When doing this, notice the shape, center, and standard deviation of the null distribution.
![Page 89: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/89.jpg)
Predicting Standard Deviation
Could you have predicted the center and shape of the null distribution?
What about the standard deviation? This is a bit harder, but can easily be done with the formula $\sqrt{\pi(1-\pi)/n}$, where π is the proportion under the null and n is the sample size.
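As a worked instance for the Halloween data, assuming the standard formula for the standard deviation of a sample proportion, with π = 0.5 under the null and n = 283 trick-or-treaters:

```latex
\mathrm{SD} = \sqrt{\frac{\pi(1-\pi)}{n}}
            = \sqrt{\frac{0.5 \times 0.5}{283}}
            \approx 0.0297
```

A simulated null distribution for this study should therefore have a standard deviation close to 0.03.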
![Page 90: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/90.jpg)
Theory-Based Inference These predictions work if we have a large
enough sample size. We have 148 successes and 135 failures.
Is the sample size large enough to use the theory-based method?
Use the One Proportion applet to find the theory-based (normal approximation) p-value.
![Page 91: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/91.jpg)
If half of the population of trick-or-treaters preferred candy, then there’s a 43.9% chance that a random sample of 283 trick-or-treaters would have 148 or more, or 135 or fewer, choose the candy.
Since it’s not a small p-value, we don’t have strong (or even moderate) evidence that trick-or-treaters prefer one type of treat over the other.
Halloween Treats
![Page 92: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/92.jpg)
Standardized Statistic
Notice that the standardized statistic in the applet is 0.77 (that is, the sample proportion is 0.77 SD above the mean).
Remember that a standardized statistic of more than 2 indicates that the sample result is far enough from the hypothesized value to be unlikely if the null were true.
We had a standardized statistic that was not more than 2 (or even 1), so we don’t really have any evidence against the null.
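The theory-based numbers for the Halloween study can be reproduced outside the applet. The sketch below is an illustration under the normal approximation, using Python's standard-library `statistics.NormalDist`; the function name `one_proportion_z_test` is an assumption and is not the applet's own method.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, pi=0.5):
    """Return (z, two-sided p-value) from the normal approximation."""
    p_hat = successes / n
    sd = sqrt(pi * (1 - pi) / n)                       # SD of p-hat under the null
    z = (p_hat - pi) / sd                              # standardized statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # area in both tails
    return z, p_value

z, p_value = one_proportion_z_test(148, 283)  # 148 of 283 chose candy
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

Both results agree with the slides: a standardized statistic near 0.77 and a p-value near 0.44, neither of which gives evidence against the null.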
![Page 93: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/93.jpg)
What happens when the validity condition is not met?
Suppose we’re testing 12 repetitions of the Rock-Paper-Scissors test and 1 of the 12 repetitions had scissors thrown.
Use the One Proportion applet to run the test (we will use a less-than alternative) and also look at the normal approximation results.
H or T
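To get a feel for how far off the normal approximation can be with so few successes, the sketch below compares an exact binomial p-value with the normal-approximation p-value for 1 scissors in 12 repetitions. The null value π = 1/3 is an assumption made for this illustration (scissors thrown one third of the time); it is not stated on this slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, observed = 12, 1
pi = 1 / 3  # assumed null probability of scissors (not given on this slide)

# Exact less-than p-value: P(X <= 1) for X ~ Binomial(12, 1/3).
exact = sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(observed + 1))

# Normal-approximation p-value for the same less-than alternative.
sd = sqrt(pi * (1 - pi) / n)
z = (observed / n - pi) / sd
approx = NormalDist().cdf(z)

print(f"exact = {exact:.3f}, normal approximation = {approx:.3f}")
```

Under this assumption the approximation gives a noticeably smaller p-value than the exact calculation, which is the kind of disagreement the validity condition is meant to warn about.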
![Page 94: Unit 1 Overview Significance – How strong is the evidence of an effect? (Chapter 1) Estimation – How large is the effect? (Chapter 2) Generalization](https://reader035.vdocuments.mx/reader035/viewer/2022062320/56649c7f5503460f94936211/html5/thumbnails/94.jpg)
Exploration 1.5: Heads or Tails?