Synthesis and Review
2/20/12

• Hypothesis Tests: the big picture

• Randomization distributions

• Connecting intervals and tests

• Review of major topics

• Open Q+A

Section 4.4, 4.5, ES 2

Professor Kari Lock Morgan
Duke University


To Do

• Make one double-sided page of notes for the in-class exam

• WORK PRACTICE PROBLEMS! (old exams and solutions to review questions are under Documents on the course website)

• Read the sections corresponding to anything you are still confused about

• Practice using technology to summarize, visualize, and perform inference on data

Office Hours this Week

• Today, 3 – 4 pm (Prof Morgan)

• Today, 4 – 6 pm (Christine)

• Tuesday, 3 – 6 pm (Prof Morgan)

• Tuesday, 6 – 8 pm (Yue)

• Tuesday, 8 – 9 pm (Michael)

(My office hours this week have been moved to Monday and Tuesday to answer questions before the exam)

Hypothesis Testing

1. Define the parameter(s) of interest
2. State your hypotheses
3. Set the significance level, α (usually 0.05 if unspecified)
4. (Collect your data)
5. Plot your data
6. Calculate the observed sample statistic
7. Create a randomization distribution
8. Calculate the p-value
9. Assess the strength of evidence against H0
10. Make a formal decision based on the significance level
11. Interpret the conclusion in context

Exercise and Gender

Among college students, does one gender exercise more than the other?

μF: Mean number of exercise hours per week for females

μM: Mean number of exercise hours per week for males

H0: μF = μM

Ha: μF ≠ μM

Exercise and Gender

x̄M – x̄F = 3.00

Exercise and Gender

www.lock5stat.com/statkey

p-value = 0.218

Think: Results this extreme would happen about 22% of the time just by random chance if H0 were true, so this study does not provide adequate evidence against H0

Little evidence against H0

Do not reject H0

Conclusion: This study does not provide evidence that there is any association between gender and exercise times among college students

Randomization Distribution

• A randomization distribution is the distribution of statistics that would be observed, just by random chance, if the null hypothesis were true

1. Simulate randomizations assuming the null hypothesis is true

2. Calculate the statistic for each simulated randomization

Randomized Experiments

• In a randomized experiment the “randomness” is the random allocation of cases to treatment groups

• If the null hypothesis is true, it doesn’t make any difference which treatment group you get placed in

• Simulate randomizations assuming H0 is true by reallocating units to treatment groups while keeping the response values the same
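The reallocation idea for randomized experiments can be sketched in a few lines of Python. This is only an illustration, not how StatKey does it: the function name `reallocation_distribution`, the response values, and the choice of the difference in means as the statistic are all assumptions made for the example.

```python
import random
from statistics import mean


def reallocation_distribution(group_a, group_b, n_sims=2000, seed=0):
    """Differences in means (a - b) after repeatedly reshuffling group labels."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    stats = []
    for _ in range(n_sims):
        # Response values stay the same; only the group membership changes
        rng.shuffle(pooled)
        stats.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return stats


# Made-up response values for two treatment groups (illustration only)
treatment = [12, 15, 9, 14, 11, 16]
control = [10, 8, 13, 7, 9, 11]

observed = mean(treatment) - mean(control)
dist = reallocation_distribution(treatment, control)

# Two-sided p-value: proportion of simulated statistics at least as extreme
p_value = sum(abs(s) >= abs(observed) for s in dist) / len(dist)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.3f}")
```

Because the labels are reshuffled at random, the simulated differences center near 0, which is what H0 says we should expect.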

Observational Studies

• In observational studies, there is no random allocation to treatment groups

• In observational studies, what does “by random chance” even mean? What is random???

• How could we generate a randomization distribution for observational studies?

Bootstrapping

• When data is collected by random sampling, without random allocation between groups, we can bootstrap to see what would happen by random chance

• Bootstrapping (resampling with replacement) simulates the distribution of the sample statistic that we would observe when taking many random samples of the population

Bootstrapping

• For a randomization distribution, however, we need to know the distribution of the sample statistic when the null hypothesis is true

• How could we bootstrap assuming the null hypothesis is true?

• Add/subtract values to each unit first to make the null hypothesis true (“shift the distribution”)

Reallocating versus Resampling

Randomized Experiments

– What is random? Random assignment to treatment groups

– How do we simulate “random chance”? Reallocate (rerandomize)

Observational Studies

– What is random? Random sampling from the population

– How do we simulate “random chance”? Resample (bootstrap)

• In both cases, we need to make the null hypothesis true for a randomization distribution

Exercise by Gender

Was the exercise by gender data collected via a randomized experiment?

(a) Yes

(b) No

(c) There is no way to tell

Exercise by Gender

• The randomness is not in who is which gender (group assignment, as in randomized experiments), but in who is selected to be part of the study

• Male sample mean: 12.4 hours

• Female sample mean: 9.4 hours

• Add 3 hours to all the females, and then resample using bootstrapping

• www.lock5stat.com/statkey
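Here is a rough Python sketch of "shift, then resample" for two groups. The data values are hypothetical (they are not the class data), and the function name `shifted_bootstrap_distribution` is made up for this example; the shift is computed from the sample means, just as the 3 hours were.

```python
import random
from statistics import mean


def shifted_bootstrap_distribution(group_a, group_b, n_sims=2000, seed=0):
    """Bootstrap the difference in means (a - b) after shifting b so H0 holds."""
    rng = random.Random(seed)
    shift = mean(group_a) - mean(group_b)
    shifted_b = [x + shift for x in group_b]  # both samples now share a mean
    stats = []
    for _ in range(n_sims):
        # Resample each group separately, with replacement
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(shifted_b, k=len(shifted_b))
        stats.append(mean(resample_a) - mean(resample_b))
    return stats


# Hypothetical exercise hours per week (not the class data)
males = [14, 10, 20, 6, 12, 15, 9, 13]
females = [8, 11, 5, 12, 9, 10, 7, 14]

observed = mean(males) - mean(females)
dist = shifted_bootstrap_distribution(males, females)
p_value = sum(abs(s) >= abs(observed) for s in dist) / len(dist)
print(f"observed difference = {observed:.3f}, p-value = {p_value:.3f}")
```

Unlike reallocating, each group is resampled on its own here, so the simulated "random chance" is the chance of who ends up in the sample.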

Method of Randomization

• Reallocating and resampling usually give similar answers in terms of a p-value

• For this class, it is fine to just use reallocating for tests, even if it is not actually a randomized experiment

• The point is to understand the reason for generating a randomization distribution

Body Temperatures

• Let’s return to the body temperature data

• Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47)

• Let’s do a hypothesis test to see how much evidence this data provides against μ = 98.6

H0: μ = 98.6

Ha: μ ≠ 98.6

Body Temperatures

• How would we create a randomization distribution?

• The sample mean is 98.26. Add 0.34 to each unit so that sampling with replacement mimics sampling from a population with mean 98.6

• Take many bootstrap samples to create a randomization distribution
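A minimal sketch of this one-sample procedure in Python is below. The temperatures are hypothetical stand-ins (not the class data set), and `null_bootstrap_means` is a name invented for this example; the shift is whatever it takes to move the sample mean to the null value.

```python
import random
from statistics import mean


def null_bootstrap_means(sample, null_mean, n_sims=2000, seed=0):
    """Bootstrap sample means after shifting the data so its mean is null_mean."""
    rng = random.Random(seed)
    shift = null_mean - mean(sample)
    shifted = [x + shift for x in sample]
    return [mean(rng.choices(shifted, k=len(shifted))) for _ in range(n_sims)]


# Hypothetical body temperatures (not the class data set)
temps = [98.4, 97.8, 98.6, 98.0, 98.2, 98.7, 97.9, 98.3, 98.1, 98.5,
         98.0, 98.4, 98.2, 97.6, 98.9, 98.3, 98.1, 98.2, 98.6, 98.0]

observed = mean(temps)
dist = null_bootstrap_means(temps, null_mean=98.6)

# Two-sided: at least as far from 98.6 as the observed mean, in either direction
p_value = sum(abs(m - 98.6) >= abs(observed - 98.6) for m in dist) / len(dist)
print(f"sample mean = {observed:.2f}, p-value = {p_value:.4f}")
```

The simulated means center at 98.6 (the null value), while the spread still comes from the variability in the sample.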

Randomization Distribution

p-value = 0.002

Two Distributions

Intervals and Tests

• If a 100(1 – α)% confidence interval does not contain the value of the null hypothesis, then a two-sided hypothesis test will reject the null hypothesis using significance level α

• Intervals provide a range of plausible values for the population parameter; tests are designed to assess evidence against a null hypothesis
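As a tiny illustration of this connection, the check can be written as a function. The name `two_sided_decision_from_interval` is made up for this example, and it assumes the interval is a 100(1 – α)% interval computed from the same data the test would use; with bootstrap and randomization methods the agreement is approximate rather than exact.

```python
def two_sided_decision_from_interval(interval, null_value):
    """Decision of a two-sided test at level alpha, read off a 100(1 - alpha)% interval."""
    lower, upper = interval
    if lower <= null_value <= upper:
        return "do not reject H0"  # the null value is a plausible value
    return "reject H0"             # the null value is outside the interval


# 95% interval for mean body temperature from these slides, null value 98.6
print(two_sided_decision_from_interval((98.05, 98.47), 98.6))  # reject H0
```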

Body Temperatures

• Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47)

H0: μ = 98.6

Ha: μ ≠ 98.6

• 98.6 is not in the interval, so at α = 0.05, we would reject H0

REVIEW

The Big Picture

[Diagram: Sampling leads from the Population to the Sample; Statistical Inference leads from the Sample back to the Population; Exploratory Data Analysis is applied to the Sample]

Data Collection

Was the sample randomly selected?

– Yes: Possible to generalize to the population

– No: Should not generalize to the population

Was the explanatory variable randomly assigned?

– Yes: Possible to make conclusions about causality

– No: Cannot make conclusions about causality

Variable(s) | Visualization | Summary Statistics
Categorical | bar chart, pie chart | frequency table, relative frequency table, proportion
Quantitative | dotplot, histogram, boxplot | mean, median, max, min, standard deviation, z-score, range, IQR, five number summary
Categorical vs Categorical | side-by-side bar chart, segmented bar chart, mosaic plot | two-way table, proportions
Quantitative vs Categorical | side-by-side boxplots | statistics by group
Quantitative vs Quantitative | scatterplot | correlation

Descriptive Statistics

Think of a topic or question you would like to use data to help you answer.

– What would the cases be?

– What would the variables be? (Limit to one or two variables)

Descriptive Statistics

How would you visualize and summarize the variable or relationship between variables?

a) bar chart/pie chart, proportions, frequency table/relative frequency table

b) dotplot/histogram/boxplot, mean/median, sd/range/IQR, five number summary

c) side-by-side or segmented bar plots/mosaic plots, difference in proportions, two-way table

d) side-by-side boxplot, stats by group

e) scatterplot, correlation

Statistic vs Parameter

• A sample statistic is a number computed from sample data.

• A population parameter is a number that describes some aspect of a population

Sampling Distribution

• A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population

• The spread of the sampling distribution helps us to assess the uncertainty in the sample statistic

• In real life, we rarely get to see the sampling distribution – we usually only have one sample

Bootstrap

• A bootstrap sample is a random sample taken with replacement from the original sample, of the same size as the original sample

• A bootstrap statistic is the statistic computed on the bootstrap sample

• A bootstrap distribution is the distribution of many bootstrap statistics
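These three definitions map directly onto code. The sketch below is one possible way to write it in Python; the function name `bootstrap_distribution` and the sample values are assumptions for illustration, and the mean is just the default statistic.

```python
import random
from statistics import mean


def bootstrap_distribution(sample, statistic=mean, n_boot=1000, seed=0):
    """Return n_boot bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    stats = []
    for _ in range(n_boot):
        boot_sample = rng.choices(sample, k=n)  # with replacement, same size
        stats.append(statistic(boot_sample))    # one bootstrap statistic
    return stats                                # the bootstrap distribution


# Made-up original sample (illustration only)
sample = [3, 7, 8, 5, 12, 9, 4, 6]
boot_stats = bootstrap_distribution(sample)
print(f"sample mean = {mean(sample)}, "
      f"center of bootstrap distribution = {mean(boot_stats):.2f}")
```

Passing a different function (for example `statistics.median`) as `statistic` gives the bootstrap distribution of that statistic instead.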

[Diagram: the Original Sample yields the Sample Statistic; many Bootstrap Samples are drawn from the Original Sample, each yielding a Bootstrap Statistic; together the Bootstrap Statistics form the Bootstrap Distribution]

Confidence Interval

• A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples

• A 95% confidence interval will contain the true parameter for 95% of all samples
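One way to see what "95% of all samples" means is to simulate it. The sketch below draws many samples from a hypothetical normal population with a known mean (all of the numbers are assumptions for illustration), builds a percentile bootstrap interval from each sample, and counts how often the interval captures the true mean. With small samples and few bootstrap samples the capture rate will only be roughly 95%.

```python
import random


def fast_mean(values):
    return sum(values) / len(values)


def percentile_interval(sample, rng, n_boot=200, level=0.95):
    """Middle `level` of a bootstrap distribution of the sample mean."""
    n = len(sample)
    boot_means = sorted(fast_mean(rng.choices(sample, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    lower = boot_means[round(tail * n_boot)]
    upper = boot_means[round((1 - tail) * n_boot) - 1]
    return lower, upper


def coverage(true_mean=50.0, sd=10.0, n=25, n_samples=200, seed=1):
    """Proportion of samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # A fresh random sample from the hypothetical population
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lower, upper = percentile_interval(sample, rng)
        hits += lower <= true_mean <= upper
    return hits / n_samples


rate = coverage()
print(f"proportion of intervals capturing the true mean: {rate:.2f}")
```

In real life we only get one sample and one interval, so we never know whether ours is one of the intervals that captured the parameter.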

Standard Error

• The standard error (SE) is the standard deviation of the sample statistic

• The SE can be estimated by the standard deviation of the bootstrap distribution

• For symmetric, bell-shaped distributions, a 95% confidence interval is

statistic ± 2 × SE
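A short Python sketch of this calculation follows. The sample values and the function name `bootstrap_se_interval` are made up for illustration, and the interval is only appropriate if the bootstrap distribution looks symmetric and bell-shaped, which the code does not check.

```python
import random
from statistics import mean, stdev


def bootstrap_se_interval(sample, n_boot=2000, seed=0):
    """Return (lower, upper, se) for statistic +/- 2 * SE, using the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]
    se = stdev(boot_means)  # SE estimated by the sd of the bootstrap distribution
    stat = mean(sample)
    return stat - 2 * se, stat + 2 * se, se


# Made-up sample (illustration only)
sample = [3, 7, 8, 5, 12, 9, 4, 6]
lower, upper, se = bootstrap_se_interval(sample)
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```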

Percentile Method

• If the bootstrap distribution is approximately symmetric, a P% confidence interval can be obtained by taking the middle P% of the bootstrap distribution
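The percentile method amounts to sorting the bootstrap statistics and cutting off equal tails. Below is a rough sketch; the function name `percentile_interval` and the exact rule for picking the cutoff positions are assumptions (software packages differ slightly in how they locate percentiles).

```python
def percentile_interval(boot_stats, level=0.95):
    """Endpoints of the middle `level` proportion of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    n = len(ordered)
    tail = (1 - level) / 2             # proportion cut from each end
    lower = ordered[round(tail * n)]
    upper = ordered[round((1 - tail) * n) - 1]
    return lower, upper


# Stand-in for 1000 bootstrap statistics: the values 1, 2, ..., 1000
stand_in = list(range(1, 1001))
print(percentile_interval(stand_in))        # (26, 975)
print(percentile_interval(stand_in, 0.90))  # (51, 950)
```

With 1000 bootstrap statistics, the 95% interval drops the 25 smallest and the 25 largest values and keeps the 950 in the middle.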

Bootstrap Distribution

[Figure: the bootstrap distribution as the best guess at the sampling distribution, with the Observed Statistic marked and the middle P% of the distribution lying between the Lower Bound and the Upper Bound]

Hypothesis Testing

• How unusual would it be to get results as extreme as (or more extreme than) those observed, if the null hypothesis is true?

• If it would be very unusual, then the null hypothesis is probably not true!

• If it would not be very unusual, then there is not evidence against the null hypothesis

p-value

• The p-value is the probability of getting a statistic as extreme as (or more extreme than) that observed, just by random chance, if the null hypothesis is true

• The p-value measures evidence against the null hypothesis

Hypothesis Testing

[Figure: Distribution of Statistic Assuming Null, with the Observed Statistic marked and the p-value shown as the area beyond it]

Randomization Distribution

• A randomization distribution is the distribution of sample statistics we would observe, just by random chance, if the null hypothesis were true

• The p-value is calculated by finding the proportion of statistics in the randomization distribution that are as extreme as (or more extreme than) the observed statistic
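Once a randomization distribution is in hand, the p-value is just a proportion. The sketch below shows one way to count it in Python. The function name `randomization_p_value` and its arguments are made up for this example, and the two-sided rule used here (distance from the null value in either direction) is only one common convention; another is to double the smaller tail proportion.

```python
def randomization_p_value(randomization_stats, observed, null_value=0.0,
                          alternative="two-sided"):
    """Proportion of randomization statistics as extreme as the observed one."""
    n = len(randomization_stats)
    if alternative == "greater":
        count = sum(s >= observed for s in randomization_stats)
    elif alternative == "less":
        count = sum(s <= observed for s in randomization_stats)
    elif alternative == "two-sided":
        distance = abs(observed - null_value)
        count = sum(abs(s - null_value) >= distance
                    for s in randomization_stats)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return count / n


# Tiny stand-in for a randomization distribution centered at 0
stats = [-3, -2, -1, 0, 1, 2, 3]
print(randomization_p_value(stats, observed=2, alternative="greater"))
print(randomization_p_value(stats, observed=2, alternative="two-sided"))
```

For the one-sided call, the statistics 2 and 3 count, giving 2 out of 7; for the two-sided call, -3, -2, 2, and 3 count, giving 4 out of 7.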

Statistical Conclusions

Strength of evidence against H0:

Formal decision of hypothesis test, based on α = 0.05:

statistically significant

not statistically significant

Formal Decisions

For a given significance level, α,

p-value < α ⇒ Reject H0

p-value > α ⇒ Do not reject H0
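The decision rule is simple enough to write as a one-line function. The name `formal_decision` is made up for this sketch; the two p-values in the usage lines are the ones from the exercise and body temperature examples in these slides.

```python
def formal_decision(p_value, alpha=0.05):
    """Formal decision of a hypothesis test at significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


print(formal_decision(0.218))  # exercise and gender: do not reject H0
print(formal_decision(0.002))  # body temperature: reject H0
```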

Errors

Truth | Decision: Reject H0 | Decision: Do not reject H0
H0 true | TYPE I ERROR (if H0 true, probability = α) | If H0 true, probability = 1 – α
H0 false | If Ha true, probability = power | TYPE II ERROR (if Ha true, probability = 1 – power)

QUESTIONS???