Week 10 Nov 3-7
Two Mini-Lectures, QMM 510, Fall 2014
Chi-Square Tests ML 10.1
Chapter Contents
15.1 Chi-Square Test for Independence
15.2 Chi-Square Tests for Goodness-of-Fit
15.3 Uniform Goodness-of-Fit Test
15.4 Poisson Goodness-of-Fit Test
15.5 Normal Chi-Square Goodness-of-Fit Test
15.6 ECDF Tests (Optional)
Chapter 15
So many topics, so little time …
Chi-Square Test for Independence
Contingency Tables
• A contingency table is a cross-tabulation of n paired observations into categories.
• Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) headings.
• For example:
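The cross-tabulation described above can be sketched in a few lines of Python. The category labels and the paired observations are hypothetical. They are chosen only to show that each of the n pairs lands in exactly one cell.

```python
from collections import Counter

def crosstab(pairs, row_levels, col_levels):
    """Cross-tabulate (row category, column category) pairs into an r x c table of counts."""
    counts = Counter(pairs)  # each pair is one observation
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical paired observations (variable A, variable B), for illustration only
pairs = [("Low", "Yes"), ("High", "No"), ("Low", "No"), ("Low", "Yes"),
         ("High", "Yes"), ("High", "No"), ("Low", "Yes")]
table = crosstab(pairs, row_levels=["Low", "High"], col_levels=["Yes", "No"])
for label, row in zip(["Low", "High"], table):
    print(label, row)
```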
Chi-Square Test
• In a test of independence for an r × c contingency table, the hypotheses are:
  • H0: Variable A is independent of variable B
  • H1: Variable A is not independent of variable B
• Use the chi-square test for independence to test these hypotheses.
• This nonparametric test is based on frequencies.
• The n data pairs are classified into r rows and c columns, and then the observed frequency fjk is compared with the expected frequency ejk.
Chi-Square Distribution
• The critical value comes from the chi-square probability distribution with d.f. degrees of freedom:
  d.f. = (r – 1)(c – 1)
  where r = number of rows in the table
        c = number of columns in the table
• Appendix E contains critical values for right-tail areas of the chi-square distribution, or use Excel’s =CHISQ.INV.RT(α, d.f.).
• The mean of a chi-square distribution equals its d.f., and its variance is 2 × d.f.
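A quick simulation makes the mean and variance facts concrete. This sketch relies on the standard definition of a chi-square variable with d.f. degrees of freedom as the sum of d.f. squared independent standard normal values. That definition is not stated on the slide. The seed and the number of draws are arbitrary choices.

```python
import random

def simulate_chi_square(df, draws, seed=1):
    """Draw chi-square variates as sums of df squared standard normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(draws)]

df = 6
sample = simulate_chi_square(df, draws=100_000)
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)  # sample variance
print(f"mean ≈ {mean:.2f} (theory {df}), variance ≈ {var:.2f} (theory {2 * df})")
```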
Consider the shape of the chi-square distribution:
Expected Frequencies
• Assuming that H0 is true, the expected frequency of row j and column k is:
  ejk = RjCk/n
  where Rj = total for row j (j = 1, 2, …, r)
        Ck = total for column k (k = 1, 2, …, c)
        n = sample size
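The expected-frequency formula ejk = RjCk/n translates directly into code. This is a minimal sketch on a hypothetical 2 × 2 table. The expected frequencies keep the observed row and column totals.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]         # R_j
    col_totals = [sum(col) for col in zip(*observed)]   # C_k
    n = sum(row_totals)                                 # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[10, 20],
            [30, 40]]   # hypothetical 2 x 2 table
print(expected_frequencies(observed))   # [[12.0, 18.0], [28.0, 42.0]]
```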
Steps in Testing the Hypotheses
• Step 1: State the Hypotheses
  • H0: Variable A is independent of variable B
  • H1: Variable A is not independent of variable B
• Step 2: Specify the Decision Rule
  • Calculate d.f. = (r – 1)(c – 1).
  • For a given α, look up the right-tail critical value (χ²R) from Appendix E or by using Excel =CHISQ.INV.RT(α, d.f.).
  • Reject H0 if the test statistic > χ²R.
• For example, for d.f. = 6 and α = .05, χ².05 = 12.59.
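The critical value can also be computed directly instead of being looked up in Appendix E or Excel. The sketch below uses only the Python standard library. It evaluates the right-tail area with the recurrence Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1) for integer degrees of freedom, then inverts that area by bisection. The function names are illustrative. With SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same value.

```python
import math

def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from df = 1 or df = 2, where the right-tail area has a closed form.
    if df % 2 == 1:
        k, tail = 1, math.erfc(math.sqrt(x / 2))
    else:
        k, tail = 2, math.exp(-x / 2)
    # Step up two degrees of freedom at a time.
    while k < df:
        tail += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return min(tail, 1.0)

def chi2_critical_value(alpha, df, tol=1e-10):
    """Right-tail critical value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:   # widen until the tail area drops below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical_value(0.05, 6), 2))   # 12.59
```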
• Here is the rejection region.
• Step 3: Calculate the Expected Frequencies
  • ejk = RjCk/n
• For example:
• Step 4: Calculate the Test Statistic
  • The chi-square test statistic is
$$\chi^2_{calc} = \sum_{j=1}^{r}\sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$$
• Step 5: Make the Decision
  • Reject H0 if the test statistic χ²calc > χ²R or if the p-value ≤ α. In Excel, the p-value is =CHISQ.DIST.RT(χ²calc, d.f.).
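Steps 3 through 5 can be combined into one function. This is a minimal sketch on hypothetical data. It assumes every expected frequency is positive. The p-value comes from a standard-library right-tail recurrence for integer degrees of freedom. SciPy users would more likely call `scipy.stats.chi2_contingency`. That function applies Yates' continuity correction to 2 × 2 tables by default, so its result may differ slightly.

```python
import math

def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    k, tail = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (2, math.exp(-x / 2))
    while k < df:   # Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        tail += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return min(tail, 1.0)

def chi_square_independence(observed):
    """Steps 3-4 plus the p-value for an r x c table; returns (statistic, d.f., p-value)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, f in enumerate(row):
            e = row_totals[j] * col_totals[k] / n   # Step 3: e_jk = R_j C_k / n
            stat += (f - e) ** 2 / e                # Step 4: add (f_jk - e_jk)^2 / e_jk
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_right_tail(stat, df)

alpha = 0.05
stat, df, p = chi_square_independence([[10, 20], [30, 40]])   # hypothetical data
print(f"chi-square = {stat:.4f}, d.f. = {df}, p-value = {p:.4f}")
print("Reject H0" if p <= alpha else "Do not reject H0")      # Step 5
```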
Example: MegaStat
• p-value = 0.2154 is not small enough to reject the hypothesis of independence at α = .05.
• All cells have ejk ≥ 5, so Cochran’s Rule is met.
• Caution: When selecting the data, don’t highlight the row or column totals.
Test of Two Proportions
• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions.
• The hypotheses are:
  • H0: π1 = π2
  • H1: π1 ≠ π2
Figure 14.6
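The equivalence can be checked numerically. For a 2 × 2 table, the chi-square statistic computed without a continuity correction equals the square of the pooled two-proportion z statistic. Likewise, the chi-square critical value with d.f. = 1 equals the squared two-tailed z critical value (3.841 ≈ 1.96²). The sketch below treats each row as one sample, and the counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                        # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table, without continuity correction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for j in range(2):
        for k in range(2):
            e = rows[j] * cols[k] / n
            stat += (table[j][k] - e) ** 2 / e
    return stat

# Hypothetical data: row 1 = sample 1 (10 successes, 20 failures),
#                    row 2 = sample 2 (30 successes, 40 failures)
table = [[10, 20], [30, 40]]
z = two_proportion_z(10, 30, 30, 70)
print(f"z = {z:.4f}, z^2 = {z * z:.4f}, chi-square = {chi_square_2x2(table):.4f}")
```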
Small Expected Frequencies
• The chi-square test is unreliable if the expected frequencies are too small.
• Rules of thumb:
  • Cochran’s Rule requires that ejk ≥ 5 for all cells.
  • A more lenient rule allows up to 20% of the cells to have ejk < 5.
  • Most agree that a chi-square test is infeasible if ejk < 1 in any cell.
• If the expected frequencies are too small, try combining adjacent rows or columns to enlarge them.
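A small helper can check these rules of thumb before the test is run, and it can merge adjacent rows when expected frequencies are too small. The thresholds follow the slide. The table and the function names are hypothetical. Deciding which rows are sensibly "adjacent" (for example, neighboring ordinal categories) remains a judgment call.

```python
def expected_frequencies(observed):
    """e_jk = R_j * C_k / n for every cell."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def small_frequency_report(observed):
    """Summarize the rules of thumb for small expected frequencies."""
    cells = [e for row in expected_frequencies(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "cochran_met": share_below_5 == 0,         # every e_jk >= 5
        "share_below_5": share_below_5,            # lenient rule: at most 20%
        "any_below_1": any(e < 1 for e in cells),  # test considered infeasible
    }

def combine_adjacent_rows(observed, i):
    """Merge row i with row i + 1 by adding their counts."""
    merged = [a + b for a, b in zip(observed[i], observed[i + 1])]
    return observed[:i] + [merged] + observed[i + 2:]

table = [[2, 3], [1, 4], [20, 20]]   # hypothetical counts
print(small_frequency_report(table))
print(small_frequency_report(combine_adjacent_rows(table, 0)))
```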
Cross-Tabulating Raw Data
• Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.
• For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:
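Coding numeric variables into categories and then cross-tabulating them might look like the sketch below. The cut points, category labels, and data pairs are all hypothetical and are not taken from the slide's example.

```python
from bisect import bisect_right
from collections import Counter

def code(value, cut_points, labels):
    """Map a numeric value to a category label using ascending cut points.

    bisect_right places a value equal to a cut point in the higher category,
    so each category covers [lower cut, upper cut).
    """
    return labels[bisect_right(cut_points, value)]

# Hypothetical cut points and labels
death_cuts, death_labels = [10, 30], ["Low", "Medium", "High"]
doctor_cuts, doctor_labels = [50, 200], ["Few", "Some", "Many"]

# Hypothetical (infant deaths per 1,000, doctors per 100,000) pairs
data = [(5.2, 250), (42.0, 20), (18.5, 120), (8.1, 310), (35.7, 45), (12.3, 90)]

counts = Counter((code(d, death_cuts, death_labels), code(m, doctor_cuts, doctor_labels))
                 for d, m in data)
table = [[counts[(r, c)] for c in doctor_labels] for r in death_labels]
for label, row in zip(death_labels, table):
    print(f"{label:<7}", row)
```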
Why Do a Chi-Square Test on Numerical Data?
• The researcher may believe there’s a relationship between X and Y, but doesn’t want to use regression.
• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
• The researcher has numerical data for one variable but not the other.
3-Way Tables and Higher
• More than two variables can be compared using contingency tables.
• However, a higher-order table is difficult to visualize.
  • For example, you could visualize a three-way table as a cube, that is, as a stack of tiled two-way contingency tables.
  • Major computer packages permit three-way tables.
Chi-Square Tests for Goodness-of-Fit ML 10.2
Purpose of the Test
• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.
• The chi-square test is versatile and easy to understand.
Hypotheses for GOF Tests
• The hypotheses are:
  • H0: The population follows a _____ distribution
  • H1: The population does not follow a _____ distribution
• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
Test Statistic and Degrees of Freedom for GOF Tests
• Assuming n observations, the observations are grouped into c classes, and then the chi-square test statistic is found using:
$$\chi^2_{calc} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$$
  where fj = the observed frequency of observations in class j
        ej = the expected frequency in class j if the sample came from the hypothesized population
• If the proposed distribution gives a good fit to the sample, the test statistic will be near zero.
• Under H0, the test statistic approximately follows the chi-square distribution with d.f. = c – m – 1 degrees of freedom, where c is the number of classes used in the test and m is the number of parameters estimated.
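The uniform case (Section 15.3 in the contents) shows the GOF statistic and its degrees of freedom in a simple setting. No parameters are estimated, so m = 0. The die-roll counts are hypothetical. The p-value helper uses a standard-library recurrence for integer degrees of freedom. `scipy.stats.chi2.sf` would serve the same purpose.

```python
import math

def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    k, tail = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (2, math.exp(-x / 2))
    while k < df:   # Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        tail += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return min(tail, 1.0)

def chi_square_gof(observed, expected, params_estimated=0):
    """Return (statistic, d.f., p-value) for a chi-square goodness-of-fit test."""
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1   # d.f. = c - m - 1
    return stat, df, chi2_right_tail(stat, df)

# Hypothetical: 60 rolls of a die; H0: the six faces are uniformly distributed
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / 6] * 6              # e_j = n/c = 10, no parameters estimated
stat, df, p = chi_square_gof(observed, expected)
print(f"chi-square = {stat:.2f}, d.f. = {df}, p-value = {p:.4f}")
```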
Normal Chi-Square GOF Test
Is the Sample from a Normal Population?
• Many statistical tests assume a normal population, so this is the most common GOF test.
• Two parameters, the mean μ and the standard deviation σ, fully describe a normal distribution.
• Unless μ and σ are known a priori, they must be estimated from a sample in order to perform a GOF test for normality.
Method 1: Standardize the Data
• Transform sample observations x1, x2, …, xn into standardized z-values.
• Count the sample observations within each interval on the z-scale and compare them with expected normal frequencies ej.
Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran’s Rule and seems inefficient).
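Method 1 can be sketched as follows. The sketch assumes the data are standardized with the sample mean x̄ and standard deviation s, so that zi = (xi – x̄)/s. It also assumes z-interval limits at −2, −1, 0, 1, and 2. Both choices are illustrative, and the sample is simulated. With n = 100, the expected count in each end bin is only about 2.3, which is the end-bin problem the slide describes.

```python
import random
from statistics import NormalDist, mean, stdev

def standardized_counts(data, z_limits):
    """Method 1: standardize the data, then count z-values between consecutive limits."""
    xbar, s = mean(data), stdev(data)
    z = [(x - xbar) / s for x in data]
    edges = [float("-inf")] + list(z_limits) + [float("inf")]   # open-ended end bins
    observed = [sum(lo < zi <= hi for zi in z) for lo, hi in zip(edges, edges[1:])]
    std_normal = NormalDist()
    expected = [len(data) * (std_normal.cdf(hi) - std_normal.cdf(lo))
                for lo, hi in zip(edges, edges[1:])]
    return observed, expected

rng = random.Random(42)
data = [rng.gauss(100, 15) for _ in range(100)]   # simulated sample
observed, expected = standardized_counts(data, z_limits=[-2, -1, 0, 1, 2])
print("observed:", observed)
print("expected:", [round(e, 2) for e in expected])
```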
Method 2: Equal Bin Widths
• Step 1: Divide the exact data range into c groups of equal width, and count the sample observations in each bin to get observed bin frequencies fj.
• Step 2: Convert the bin limits into standardized z-values using z = (x – x̄)/s, where x̄ and s are the sample mean and standard deviation.
• Step 3: Find the normal area within each bin assuming a normal distribution.
• Step 4: Find expected frequencies ej by multiplying each normal area by the sample size n.
Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran’s Rule and seems inefficient).
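The four steps of Method 2 can be sketched as follows. One assumption goes beyond the slide: the first and last bins are treated as open-ended when normal areas are computed, so the expected frequencies sum to n. The sample is simulated purely for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def equal_width_bins(data, c):
    """Method 2 sketch: c equal-width bins over the data range, with normal expected counts."""
    n, lo, hi = len(data), min(data), max(data)
    width = (hi - lo) / c
    limits = [lo + i * width for i in range(1, c)]   # interior bin limits
    # Step 1: observed frequencies (the maximum value falls in the last bin)
    observed = [0] * c
    for x in data:
        observed[min(int((x - lo) / width), c - 1)] += 1
    # Step 2: standardize the interior limits with the sample mean and standard deviation
    xbar, s = mean(data), stdev(data)
    z = [float("-inf")] + [(b - xbar) / s for b in limits] + [float("inf")]
    # Steps 3-4: normal area in each bin times n (end bins treated as open-ended)
    std_normal = NormalDist()
    expected = [n * (std_normal.cdf(b) - std_normal.cdf(a)) for a, b in zip(z, z[1:])]
    return observed, expected

rng = random.Random(5)
sample = [rng.gauss(50, 10) for _ in range(120)]
observed, expected = equal_width_bins(sample, c=6)
print("observed:", observed)
print("expected:", [round(e, 1) for e in expected])
```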
Method 3: Equal Expected Frequencies
• Define histogram bins in such a way that an equal number of observations would be expected under the hypothesis of a normal population, i.e., so that ej = n/c.
• A normal area of 1/c is expected in each bin.
• The first and last classes must be open-ended, so to define c bins we need c – 1 cut points.
• Count the observations fj within each bin.
• Compare the fj with the expected frequencies ej = n/c.
Advantage: Makes efficient use of the sample.
Disadvantage: Cut points on the z-scale may seem strange.
• Table 15.16 lists the standard normal cut points for equal-area bins.
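The cut points in a table like Table 15.16 come from the inverse standard normal CDF evaluated at 1/c, 2/c, …, (c − 1)/c. This is a minimal sketch using Python's `statistics.NormalDist`, which is in the standard library from Python 3.8 onward.

```python
from statistics import NormalDist

def equal_area_cut_points(c):
    """Return the c - 1 standard normal cut points that split the z-axis into c equal-area bins."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(j / c) for j in range(1, c)]

for c in (4, 5, 6):
    print(c, [round(z, 4) for z in equal_area_cut_points(c)])
```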
Critical Values for Normal GOF Test
• Two parameters, μ and σ, are estimated from the sample, so m = 2 and the degrees of freedom are d.f. = c – m – 1 = c – 3.
• We need at least four bins to ensure at least one degree of freedom.
Small Expected Frequencies
• Cochran’s Rule suggests ej ≥ 5 in each bin (e.g., with 4 equal-expected-frequency bins, ej = n/4 ≥ 5 requires n ≥ 20, and so on).
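Putting Method 3 together gives the following sketch of a complete normal GOF test: standardize with x̄ and s, count observations in c equal-area bins, compare them with ej = n/c, and use d.f. = c − 3. The samples are simulated. The right-tail helper is a standard-library recurrence for integer degrees of freedom, used as an assumption in place of a table lookup. The skewed sample is included to show what a clear rejection looks like.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    k, tail = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (2, math.exp(-x / 2))
    while k < df:   # Q(x, k + 2) = Q(x, k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        tail += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return min(tail, 1.0)

def normal_gof_equal_expected(data, c):
    """Method 3 chi-square test for normality; returns (statistic, d.f., p-value)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)                  # m = 2 estimated parameters
    cuts = [NormalDist().inv_cdf(j / c) for j in range(1, c)]
    observed = [0] * c
    for x in data:
        z = (x - xbar) / s
        observed[sum(z > cut for cut in cuts)] += 1    # bin index = number of cut points below z
    e = n / c                                          # e_j = n/c in every bin
    stat = sum((f - e) ** 2 / e for f in observed)
    df = c - 2 - 1                                     # d.f. = c - m - 1 = c - 3
    return stat, df, chi2_right_tail(stat, df)

rng = random.Random(7)
normal_data = [rng.gauss(120, 15) for _ in range(200)]    # simulated, roughly normal
skewed_data = [rng.expovariate(1.0) for _ in range(200)]  # simulated, strongly skewed
for name, data in (("normal", normal_data), ("skewed", skewed_data)):
    stat, df, p = normal_gof_equal_expected(data, c=8)
    print(f"{name}: chi-square = {stat:.2f}, d.f. = {df}, p-value = {p:.4g}")
```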
Visual Tests
• The fitted normal superimposed on a histogram gives visual clues as to the likely outcome of the GOF test.
• A simple “eyeball” inspection of the histogram may suffice to rule out a normal population by revealing outliers or other non-normality issues.
ECDF Tests ML 10.3
ECDF Tests for Normality
• There are alternatives to the chi-square test for normality based on the empirical cumulative distribution function (ECDF).
• ECDF tests are done by computer. Details are omitted here.
• A small p-value casts doubt on normality of the population.
• The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequency of the n data values.
• The Anderson-Darling (A-D) test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test is widely used because of its power and its attractive visual display.
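For readers curious about what the software computes, both statistics can be sketched directly. This sketch assumes the hypothesized CDF is fully specified, here N(0, 1). When μ and σ are estimated from the sample, as in a normality test, the critical values change. The Lilliefors variant of K-S and adjusted A-D tables handle that case, which is one reason these tests are left to software. With SciPy installed, `scipy.stats.kstest` and `scipy.stats.anderson` provide these tests.

```python
import math
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Largest absolute gap between the ECDF of the data and a hypothesized CDF."""
    xs, n = sorted(data), len(data)
    # The ECDF jumps from (i-1)/n to i/n at the i-th sorted value; check both sides.
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

def ad_statistic(data, cdf):
    """Anderson-Darling A^2 for a fully specified hypothesized CDF."""
    xs, n = sorted(data), len(data)
    total = sum((2 * i - 1) * (math.log(cdf(xs[i - 1])) + math.log(1 - cdf(xs[n - i])))
                for i in range(1, n + 1))
    return -n - total / n

std_normal_cdf = NormalDist().cdf
sample = [-1.0, 0.0, 1.0]   # tiny illustrative sample
print(round(ks_statistic(sample, std_normal_cdf), 4),
      round(ad_statistic(sample, std_normal_cdf), 4))
```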
Example: Minitab’s Anderson-Darling Test for Normality
• Data: weights of 80 babies (in ounces)
• Near-linear probability plot suggests a good fit to the normal distribution.
• p-value = 0.122 is not small enough to reject the hypothesis of a normal population at α = .05.
Example: MegaStat’s Normality Tests
• Data: weights of 80 babies (in ounces)
• Near-linear probability plot suggests a good fit to the normal distribution.
• In this chi-square test, p-value = 0.2487 is not small enough to reject the hypothesis of a normal population at α = .05.
Note: MegaStat’s chi-square test is not as powerful as the A-D test, so we would prefer the A-D test if software is available. The MegaStat probability plot is good, but shows no p-value.
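The "near-linear probability plot" in both examples can be quantified by the correlation between the sorted data and the corresponding standard normal quantiles. The closer that correlation is to 1, the straighter the plot. This is a hypothetical sketch with simulated data. It does not reconstruct the baby-weight data or the way Minitab or MegaStat draw their plots, and the plotting positions (i − 0.5)/n are an assumed choice.

```python
import math
import random
from statistics import NormalDist

def normal_probability_plot(data):
    """Pair each sorted value with a normal quantile; return the pairs and their correlation."""
    xs, n = sorted(data), len(data)
    # Plotting positions (i - 0.5)/n are one common choice; software packages differ.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mq, mx = sum(qs) / n, sum(xs) / n
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    sqx = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    return list(zip(qs, xs)), sqx / math.sqrt(sqq * sxx)

rng = random.Random(3)
results = {}
for name, data in (("normal", [rng.gauss(120, 15) for _ in range(80)]),
                   ("skewed", [rng.expovariate(1.0) for _ in range(80)])):
    _, r = normal_probability_plot(data)
    results[name] = r
    print(f"{name}: probability-plot correlation = {r:.3f}")
```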