Topics 24 - 25

Upload: kathie

Post on 13-Feb-2016


Page 1: Topics 24 - 25

Topics 24 - 25

Page 2: Topics 24 - 25

Topic 24

Goodness-of-Fit

Page 3: Topics 24 - 25

Topic 24 – Test of Significance, Chi-Square Goodness of Fit: Categorical Variable

A test of significance is used when a value has been claimed for a population parameter but we doubt or question that claim. To carry out the test, we take a sample and use the sample statistic to assess whether the claim is plausible.

Step 1: Identify and define the parameter. H0: the population proportions in all categories equal the hypothesized values. Ha: at least one population proportion differs from its hypothesized value.

Step 2: State the hypotheses for the question. We cannot run a test of significance without first establishing the hypotheses.

Step 3: Decide which test to run. For the proportions of a categorical variable, use the X² goodness-of-fit test: calculate the expected counts and enter the observed and expected counts in lists L1 and L2.

Page 4: Topics 24 - 25

Topic 24 – Test of Significance, Chi-Square Goodness of Fit: Categorical Variable

Step 4: Run the test on the calculator, using the observed and expected counts entered in lists L1 and L2.

Step 5: Write down the p-value that the calculator reports for the chi-square test.

Step 6: Compare the p-value with the significance level α (alpha).

If the p-value is smaller than α, we reject the null hypothesis: the result is statistically significant. Use the components of the chi-square statistic to identify the category that most influenced the outcome.

If the p-value is greater than α, we fail to reject the null hypothesis: the result is not statistically significant.

Last step: Write a conclusion based on Step 6 at significance level α:
• p-value > 0.10: little or no evidence against H0
• 0.05 < p-value <= 0.10: some evidence against H0
• 0.01 < p-value <= 0.05: moderate evidence against H0
• 0.001 < p-value <= 0.01: strong evidence against H0
• p-value <= 0.001: very strong evidence against H0
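The decision rule and the evidence scale above can be sketched in Python. The function names are hypothetical, and treating a p-value exactly equal to α as "fail to reject" is an assumption, since the slides only cover the smaller-than and greater-than cases.

```python
def decision(p_value, alpha):
    """Step 6: compare the p-value with the significance level alpha.

    Assumption: a p-value exactly equal to alpha is treated as 'fail to reject'.
    """
    return "reject H0" if p_value < alpha else "fail to reject H0"


def evidence_against_h0(p_value):
    """Last step: translate a p-value into the wording of the evidence scale."""
    if p_value > 0.10:
        return "little or no evidence against H0"
    if p_value > 0.05:
        return "some evidence against H0"
    if p_value > 0.01:
        return "moderate evidence against H0"
    if p_value > 0.001:
        return "strong evidence against H0"
    return "very strong evidence against H0"


# Example with an illustrative p-value of 0.03 at the 0.05 level.
print(decision(0.03, 0.05))
print(evidence_against_h0(0.03))
```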

Page 5: Topics 24 - 25

Chi-square test of goodness-of-fit

Page 6: Topics 24 - 25

Activity 24-1: Birthdays of the Week – Page 522

Are people equally likely to be born on any of the seven days of the week? Or are some days more likely to be a person’s birthday than other days? To investigate this question, days of birth were recorded for the 147 “noted writers of the present” listed in The World Almanac and Book of Facts 2000. The counts for the seven days of the week are given here:

Mon  Tues  Wed  Thu  Fri  Sat  Sun
17   26    22   23   19   15   25

a. Identify the observational units and variable here. Is the variable categorical or quantitative? If it is categorical, is it also binary?
Observational units:
Variable:
Type:

b. Construct a bar graph of these data. Comment on what it reveals about whether the seven days of the week are equally likely to be a person’s birthday.

Page 7: Topics 24 - 25

a. Observational units: writers
Variable: day of the week on which the birthday falls
Type: categorical (not binary, since there are seven categories)

b. The following bar graph displays the data: There does not appear to be a great difference in the proportion of people born on any given day of the week.
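As a rough stand-in for the bar graph, the counts can be drawn as a text chart. This is only a sketch using the observed counts from the activity; the function name `text_bar_chart` is hypothetical.

```python
# Observed birthday counts from the activity (147 writers in total).
observed = {"Mon": 17, "Tues": 26, "Wed": 22, "Thu": 23,
            "Fri": 19, "Sat": 15, "Sun": 25}


def text_bar_chart(counts):
    """Return one line per category: label, a bar of '#' marks, and the count."""
    return [f"{day:<5}{'#' * n} {n}" for day, n in counts.items()]


chart = text_bar_chart(observed)
print("\n".join(chart))
```

The bars are of broadly similar length, which matches the comment that no day stands out strongly.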

Page 8: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
c. Let πM represent the proportion of all people who were born on a Monday, or equivalently, the probability that a randomly selected person was born on a Monday. Similarly define πTu, πW, . . . , πSu. Are these parameters or statistics? Explain.

d. The null hypothesis says that the seven days of the week are equally likely to be a person’s birthday. In that case, what are the values of πM, πTu, . . . , πSu?

c. The values πM, πTu, πW, . . . , πSu describe the proportions of all people who were born on Monday, Tuesday, Wednesday, . . . , Sunday. They are parameters because they describe an entire population.

d. The values are πM = πTu = πW = . . . = πSu = 1/7 ≈ .1429.

Page 9: Topics 24 - 25

The test procedure you are about to learn is called a chi-square goodness-of-fit test. It applies to a categorical variable, which need not be binary. The null hypothesis asserts specific values for the population proportion in each category. The alternative hypothesis simply says that at least one of the population proportions is not as specified in the null hypothesis. As always, the test statistic measures how far the observed sample results deviate from what is expected if the null hypothesis is correct. With a chi-square test, you construct the test statistic by comparing the observed sample counts in each category to the expected counts under the null hypothesis.
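The comparison of observed and expected counts can be written compactly. The symbols k (number of categories), n (sample size), O_i and E_i (observed and expected counts in category i), and π_{i,0} (hypothesized proportion for category i) are notation assumed here rather than taken from the slides.

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, \pi_{i,0},
\qquad \text{degrees of freedom} = k - 1
```

With the seven days of the week, k = 7, which gives the 6 degrees of freedom used in the activities.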

Page 10: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
e. Intuitively, what value would make sense for the expected count of Monday birthdays in this study (with a sample size of 147), under the null hypothesis that one-seventh of all birthdays occur on Mondays? Explain.

e. You would expect the count of Monday birthdays to be (1/7)(147) = 21.

Page 11: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
f. Calculate the expected counts for each of the seven days. Record them in the middle row of the following table. Hint: You should not need to do seven different multiplications in this case.

h. You calculate X² = .7619 + 1.1905 + . . . + .7619 = 4.857.
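The calculation in part h can be reproduced with a short sketch. It uses the observed counts from the activity and the equal-likelihood null hypothesis; the variable names are illustrative.

```python
# Observed counts, Monday through Sunday, from the activity.
observed = [17, 26, 22, 23, 19, 15, 25]
n = sum(observed)                      # sample size, 147

# Under the null hypothesis every day has probability 1/7,
# so one multiplication gives all seven expected counts.
expected = [n / 7] * 7                 # 21 for each day

# Each category contributes (O - E)^2 / E to the test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

for o, e, c in zip(observed, expected, contributions):
    print(f"O={o:2d}  E={e:.1f}  (O-E)^2/E={c:.4f}")
print(f"X^2 = {x2:.3f}")
```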

Page 12: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
i. What kind of values (e.g., large or small) of the test statistic constitute evidence against the null hypothesis that the seven days of the week are equally likely to be a person’s birthday? Do you think the value calculated in part h provides convincing evidence? If you are not sure, what additional information do you need? Explain your reasoning.

i. Large values of the test statistic constitute evidence against the null hypothesis. Answers will vary about whether this test statistic provides convincing evidence. The additional information you need is the p-value. You need to know how likely it is that you would obtain a test statistic this large (or larger) by random chance alone if all seven days of the week are equally likely to be a person’s birthday.

Page 13: Topics 24 - 25

As always, your next step is to calculate the p-value. The p-value tells you the probability of getting sample data at least as far from the hypothesized proportions as these data are, by random chance if the null hypothesis is true.

So again, a small p-value indicates the sample data are unlikely to have occurred by chance alone if the null hypothesis is true, providing evidence in favor of the alternative.

When the test statistic is large enough to produce a small p-value, then the sample data provide strong evidence against the null hypothesis.

Page 14: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
j. Calculate the p-value for this test as accurately as possible, using Table IV. Also shade the region whose area corresponds to this p-value on the preceding sketch.

k. Write a sentence interpreting what this p-value means in the context of this study about birthdays of the week.

l. Based on this p-value, what test decision (reject or fail to reject H0) would you reach at the .10 level? How about at the .05 and .01 levels?

j. Using Table IV with 6 degrees of freedom, 4.857 < 8.56, so the p-value is greater than .2.

k. If all seven days of the week are equally likely to be a person’s birthday, the probability that you would obtain a test statistic this large (or larger) by random chance alone is greater than .2.

l. Because the p-value is greater than .2, do not reject H0 at the α = .10, .05, or .01 level.
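The table lookup in part j can be cross-checked numerically. The sketch below does not use a statistics library; it relies on the closed form for the chi-square upper-tail probability, which holds only when the degrees of freedom are even (here 6). The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X^2 >= x) for a chi-square distribution.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which is valid only for a positive, even number of degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


x2 = 102 / 21        # test statistic from part h, about 4.857
df = 7 - 1           # seven categories, so 6 degrees of freedom
p_value = chi2_sf_even_df(x2, df)

# Test decision at each significance level from part l.
decisions = {alpha: ("reject H0" if p_value < alpha else "fail to reject H0")
             for alpha in (0.10, 0.05, 0.01)}

print(f"p-value = {p_value:.3f}")
print(decisions)
```

The computed p-value is about 0.56, consistent with the table-based statement that it exceeds .2, and the decision is to fail to reject H0 at all three levels.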

Page 15: Topics 24 - 25

Activity 24-1: Birthdays of the Week (Cont)
As with all inference procedures, certain technical conditions must be satisfied for this chi-square procedure to provide accurate p-values. In addition to requiring a random sample from the population of interest, all expected counts need to be at least five. When this condition is not met, one option is to combine similar categories so that all expected counts are at least five.

m. Is the expected value condition satisfied for this birthday study? What about the random sampling condition? If not, would you be comfortable in generalizing the results to a larger population anyway? Explain.

n. Summarize your conclusion about whether these sample data provide evidence against the null hypothesis that the seven days of the week are all equally likely to be a person’s birthday.

Page 16: Topics 24 - 25

m. Yes; all the expected counts are 21, which is greater than 5. However, the random sampling condition is not met. These are the birthdays of “noted writers of the present,” which is not a random sample from the population of U.S. citizens.

n. You have no statistical evidence against the null hypothesis that the seven days of the week are all equally likely to be a person’s birthday (at least for the population of famous writers).
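The expected-count condition from part m is easy to check in code. This is a minimal sketch; the helper name `expected_counts_ok` and the second, smaller sample are hypothetical and only illustrate a case where the condition fails.

```python
def expected_counts_ok(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Birthday study: 147 writers, equal probabilities for the seven days.
birthday_expected = [147 / 7] * 7          # 21 for each day
print(expected_counts_ok(birthday_expected))

# Hypothetical smaller sample of 28 people: each expected count is 4.
small_expected = [28 / 7] * 7
print(expected_counts_ok(small_expected))
```

The check covers only the expected-count condition; whether the sample is random has to be judged from how the data were collected.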

Page 17: Topics 24 - 25

Activity 24-3: Birthdays of the Week – Page 528

Reconsider the previous activity. Now let’s use the same sample data to test a different hypothesis about birthdays. Suppose you suspect that weekend days are less likely to be birthdays, perhaps because doctors want the weekend off and so do not schedule Caesarean deliveries for weekends. Let’s test whether the data provide evidence against the hypothesis that weekend days are half as likely as other days to be someone’s birthday and that all weekdays are equally likely. The probabilities still need to add up to one, so you are now testing H0:

     M     Tu    W     Th    F     Sa    Su
p    1/6   1/6   1/6   1/6   1/6   1/12  1/12

Page 18: Topics 24 - 25

Activity 24-3: Birthdays of the Week (Cont)
a. Calculate the expected counts under this null hypothesis. Record them in the middle row of the following table. Hint: These expected counts will not be the same for every day this time. Round them to three decimal places, not to integers.

b. Record each category’s contribution to the test statistic in the bottom row of the table. Then add those contributions to calculate the value of the test statistic.

c. Use Table IV to determine the p-value of this test as accurately as possible.

d. What test decision would you make at the .01 level?

Page 19: Topics 24 - 25

a. The expected counts are 147(1/6) = 24.500 for each weekday and 147(1/12) = 12.250 for Saturday and for Sunday.

b. The test statistic is X² = 17.857.

c. Using Table IV with 6 degrees of freedom, 16.81 < 17.857 < 18.55, so .005 < p-value < .01.

d. Because the p-value is less than .01, reject H0 at the .01 level.
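Parts a through d can be checked with a short sketch. It uses the observed counts and the null probabilities from the activity; the p-value line uses the closed form for the chi-square upper tail with 6 degrees of freedom (a formula that holds only for even degrees of freedom), and the variable names are illustrative.

```python
import math
from fractions import Fraction

# Observed counts, Monday through Sunday.
observed = [17, 26, 22, 23, 19, 15, 25]

# Null hypothesis: weekdays 1/6 each, weekend days 1/12 each.
null_probs = [Fraction(1, 6)] * 5 + [Fraction(1, 12)] * 2
assert sum(null_probs) == 1            # the probabilities must add up to one

n = sum(observed)                      # 147
expected = [float(n * p) for p in null_probs]   # 24.5 (x5), 12.25 (x2)

# Test statistic: sum of (O - E)^2 / E over the seven days.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability for 6 degrees of freedom (closed form, even df only).
half = x2 / 2
p_value = math.exp(-half) * (1 + half + half ** 2 / 2)

print([round(e, 3) for e in expected])
print(f"X^2 = {x2:.3f}")
print("reject H0 at .01" if p_value < 0.01 else "fail to reject H0 at .01")
```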

Page 20: Topics 24 - 25

Activity 24-3: Birthdays of the Week (Cont)
e. Which category (day of the week) has the greatest contribution to the test statistic? (In other words, which day has the greatest (O – E)²/E value?) Is the observed count higher or lower than the expected count for that day? Explain what this result reveals.

f. Summarize the conclusion you would draw about the proposed hypothesis, also explaining the reasoning process behind your conclusion.

These last two questions reveal that when a chi-square test gives a significant result, you can identify the category that differs most substantially from the hypothesized model by finding the greatest value of (O – E)²/E.

e. Sunday contributes the most to the test statistic (13.270). Its observed count is greater than its expected count, which means you have many more births on Sunday than you would expect if only 1/12 of the births occur on Sunday.

f. You have strong statistical evidence that the null hypothesis is not true; that is, that one or more of the population proportions is not equal to the value proposed in the null hypothesis. Because the p-value is less than .01, you reject the null hypothesis.
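Finding the category with the greatest (O – E)²/E value can be sketched as follows, using the observed counts and the expected counts under the weekend-half-as-likely hypothesis. The variable names are illustrative.

```python
days = ["Mon", "Tues", "Wed", "Thu", "Fri", "Sat", "Sun"]
observed = [17, 26, 22, 23, 19, 15, 25]
expected = [24.5] * 5 + [12.25] * 2    # 147/6 for weekdays, 147/12 for weekend days

# Contribution of each day to the test statistic.
contributions = {day: (o - e) ** 2 / e
                 for day, o, e in zip(days, observed, expected)}

# The day whose count departs most from the hypothesized model.
largest_day = max(contributions, key=contributions.get)
idx = days.index(largest_day)
direction = "higher" if observed[idx] > expected[idx] else "lower"

for day, c in contributions.items():
    print(f"{day:<5}{c:.3f}")
print(f"Largest contribution: {largest_day}, observed count is {direction} than expected")
```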

Page 21: Topics 24 - 25

Exercise 24-15: Candy Colors – Page 539

Exercise 24-21: Water Taste Test – Page 540

Page 22: Topics 24 - 25

Topic 25

Inference for Two-Way Tables

Page 23: Topics 24 - 25

Topic 25 – Test of Significance, Chi-Square: Categorical Variable

A test of significance is used when a value has been claimed for a population parameter but we doubt or question that claim. To carry out the test, we take a sample and use the sample statistic to assess whether the claim is plausible.

Step 1: Identify and define the parameter. H0: the population proportions (the distribution across categories) are the same for all populations. Ha: the set of proportions for at least one population differs from the others.

Step 2: State the hypotheses for the question. We cannot run a test of significance without first establishing the hypotheses.

Step 3: Decide which test to run. For comparing proportions across populations in a two-way table, use the X² test: calculate the expected counts and enter the observed and expected counts in matrices A and B.

Page 24: Topics 24 - 25

Topic 25 – Test of Significance, Chi-Square: Categorical Variable

Step 4: Run the test on the calculator, making sure to use the calculator’s matrix function for this option.

Step 5: Write down the p-value that the calculator reports for the chi-square test.

Step 6: Compare the p-value with the significance level α (alpha).

If the p-value is smaller than α, we reject the null hypothesis: the result is statistically significant. Use the components of the chi-square statistic to identify the cell that most influenced the outcome.

If the p-value is greater than α, we fail to reject the null hypothesis: the result is not statistically significant.

Last step: Write a conclusion based on Step 6 at significance level α:
• p-value > 0.10: little or no evidence against H0
• 0.05 < p-value <= 0.10: some evidence against H0
• 0.01 < p-value <= 0.05: moderate evidence against H0
• 0.001 < p-value <= 0.01: strong evidence against H0
• p-value <= 0.001: very strong evidence against H0
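For a two-way table, the expected counts and the test statistic can be sketched as below. The slides do not spell out the formula, so this assumes the standard one: expected count = (row total × column total) / grand total, with (rows − 1)(columns − 1) degrees of freedom. The function names and the 2×2 counts are hypothetical and are not data from the activities.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over every cell of the two-way table."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical observed counts: rows are two populations, columns two categories.
observed = [[30, 20],
            [20, 30]]

expected = expected_counts(observed)
x2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(f"X^2 = {x2:.3f}, df = {df}")
```

The observed and expected tables here play the role of matrices A and B on the calculator.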

Page 25: Topics 24 - 25

Activity 25-1: Newspaper Reading – Page 544

Page 26: Topics 24 - 25

Chi-square Test

Page 27: Topics 24 - 25

Activity 25-2: Newspaper Reading – Page 548

Page 28: Topics 24 - 25

Exercise 25-7: Pursuit of Happiness – Page 557

Exercise 25-8: Pursuit of Happiness – Page 557

Exercise 25-19: Suitability for Politics – Page 561

Exercise 25-26: Water Taste Test – Page 563