The Chi-Square Distribution
Preliminary Idea
Sum of n values of a random variable
We know that if a random variable X is normal, then the sample mean $\bar{x}$, standardized using the sample standard deviation $s$, has a Student's t-distribution with $(n-1)$ degrees of freedom.
Since $x_1 + x_2 + x_3 + \dots + x_n = n\bar{x}$, it follows that the sum of $n$ values of a normal random variable X, standardized in the same way, also follows a Student's t-distribution with $(n-1)$ degrees of freedom.
Sum of Squares of random numbers
Question:
If $x_1 + x_2 + x_3 + \dots + x_n$ follows a Student's t-distribution with $(n-1)$ degrees of freedom, what can we say about the sum of the squares of these values?
In other words, what is the distribution of $x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2$?
Answer:
Statisticians have shown that if X is normal, then the sum of squares of $n$ values of X, namely
$x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2$,
has a $\chi^2$ distribution with $(n-1)$ degrees of freedom, provided each value is measured as a deviation from the sample mean $\bar{x}$ and the sum is divided by the population variance $\sigma^2$.
The $\chi^2$ distribution
1. It is called the chi-square distribution.
2. “Chi” rhymes with “High” – and the “ch” is pronounced like “k”.
3. It is the distribution of a continuous random variable.
4. It has n – 1 degrees of freedom.
5. Its values are non-negative (i.e. ≥ 0).
6. It is always skewed to the right.
7. It becomes more symmetrical as n increases
8. It approximates a normal distribution for large values of n
Two Chi-square distributions
The sample variance $s^2$ follows a chi-square distribution
The sample variance is defined by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}.$$
It follows that
$$(n-1)s^2 = \sum (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \dots + (x_n - \bar{x})^2.$$
Since the right-hand side of this expression is a sum of squares, it follows that, when X is normal, $(n-1)s^2$ (divided by the population variance $\sigma^2$) has a $\chi^2$ distribution with $(n-1)$ degrees of freedom.
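The identity between $(n-1)s^2$ and the sum of squared deviations can be checked numerically. The sketch below uses Python's standard `statistics` module. The data values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical sample (illustrative values, not from the slides)
sample = [4.0, 7.0, 5.0, 9.0, 10.0]
n = len(sample)

x_bar = statistics.mean(sample)
s_squared = statistics.variance(sample)  # sample variance, divisor n - 1

# Sum of squared deviations from the sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)

print(f"s^2 = {s_squared:.4f}")
print(f"(n - 1) s^2 = {(n - 1) * s_squared:.4f}")
print(f"sum of squared deviations = {sum_sq_dev:.4f}")
```

The last two printed values agree, which is the relationship the slide relies on.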
Standardizing the Test Statistic
In a test of hypothesis for a population variance $\sigma^2$, the test statistic is the sample variance $s^2$. The standardized test statistic is denoted by $\chi^{2*}$ and is defined by:
$$\chi^{2*} = \frac{(n-1)s^2}{\sigma_0^2}$$
Note: The standardized values are found in the standard chi-square tables on page 7 in the Formulas and Tables handout.
Chi-square table characteristics
The chi-square tables are not symmetrical.
Therefore lower-tail values and upper-tail values must be listed separately.
In the extract of the chi-square tables shown in the next slide, lower-tail areas are shaded in yellow, upper tail areas are shaded in blue.
Chi-square table (Page 7 in Formulas & Tables)
df 0.005 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99 0.995
1 0.0000393 0.000157 0.000982 0.00393 0.0158 2.71 3.84 5.02 6.63 7.88
2 0.0001 0.020 0.0506 0.103 0.211 4.61 5.99 7.38 9.21 10.60
3 0.003 0.115 0.216 0.352 0.584 6.25 7.81 9.35 11.34 12.84
4 0.018 0.297 0.484 0.711 1.064 7.78 9.49 11.14 13.28 14.86
5 0.056 0.554 0.831 1.145 1.61 9.24 11.07 12.83 15.09 16.75
6 0.126 0.872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.228 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 0.36 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 0.53 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 0.73 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 0.95 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76
12 1.20 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
25 6.08 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 6.55 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 7.03 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.65
28 7.50 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 8.00 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 8.50 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
Chi-square table
Examples
Lower Tail: $\chi^2(.05;10) = 3.94$
Upper Tail: $\chi^2(.95;10) = 18.31$
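A table lookup of this kind can be mimicked with a small dictionary. This is only a sketch: the name `CHI_SQUARE_DF_10` and the helper function are hypothetical, and the dictionary holds just four entries copied from the df = 10 row of the table extract.

```python
# Part of the df = 10 row of the chi-square table, keyed by cumulative area
CHI_SQUARE_DF_10 = {0.025: 3.25, 0.05: 3.94, 0.95: 18.31, 0.975: 20.48}


def chi_square_table_value(area):
    """Return the tabulated chi-square value for df = 10 at a cumulative area."""
    return CHI_SQUARE_DF_10[area]


lower = chi_square_table_value(0.05)  # lower-tail example
upper = chi_square_table_value(0.95)  # upper-tail example
print(lower, upper)
```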
Two-Tail Test of Hypothesis
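$H_0: \sigma^2 = \sigma_0^2$
$H_1: \sigma^2 \neq \sigma_0^2$
TS: $s^2 \;\Rightarrow\; \chi^{2*} = \dfrac{(n-1)s^2}{\sigma_0^2}$
AL: $A_1 = \chi^2(\alpha/2;\, n-1)$
$A_2 = \chi^2(1-\alpha/2;\, n-1)$
DR: Do not reject $H_0$ if $A_1 \le \chi^{2*} \le A_2$
Reject $H_0$ if $\chi^{2*} < A_1$ or $\chi^{2*} > A_2$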
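The two-tail procedure compares the standardized statistic with a lower and an upper action limit. A minimal sketch follows. The function names are hypothetical, and the inputs (n = 11, s² = 4.0, σ0² = 2.5) are invented for illustration. The limits 3.25 and 20.48 are the table values for df = 10 at cumulative areas .025 and .975, i.e. α = .05.

```python
def chi_square_statistic(n, s_squared, sigma0_squared):
    """Standardized test statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * s_squared / sigma0_squared


def two_tail_decision(statistic, a1, a2):
    """Reject H0 when the statistic falls outside the interval [a1, a2]."""
    if statistic < a1 or statistic > a2:
        return "Reject H0"
    return "Do not reject H0"


# Hypothetical inputs; action limits read from the df = 10 row of the table
chi_sq_star = chi_square_statistic(n=11, s_squared=4.0, sigma0_squared=2.5)
decision = two_tail_decision(chi_sq_star, a1=3.25, a2=20.48)
print(chi_sq_star, decision)
```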
Lower Tail Test of Hypothesis
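$H_0: \sigma^2 \ge \sigma_0^2$
$H_1: \sigma^2 < \sigma_0^2$
TS: $s^2 \;\Rightarrow\; \chi^{2*} = \dfrac{(n-1)s^2}{\sigma_0^2}$
AL: $A = \chi^2(\alpha;\, n-1)$
DR: Do not reject $H_0$ if $\chi^{2*} \ge A$
Reject $H_0$ if $\chi^{2*} < A$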
Upper Tail Test of Hypothesis
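$H_0: \sigma^2 \le \sigma_0^2$
$H_1: \sigma^2 > \sigma_0^2$
TS: $s^2 \;\Rightarrow\; \chi^{2*} = \dfrac{(n-1)s^2}{\sigma_0^2}$
AL: $A = \chi^2(1-\alpha;\, n-1)$
DR: Do not reject $H_0$ if $\chi^{2*} \le A$
Reject $H_0$ if $\chi^{2*} > A$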
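The one-tail decision rules use a single action limit each. The sketch below writes both rules as small functions. The function names and the statistic value 20.0 are hypothetical. The limits 3.94 and 18.31 are the table values for df = 10 at cumulative areas .05 and .95, i.e. α = .05.

```python
def lower_tail_decision(statistic, action_limit):
    """Lower-tail rule: reject H0 when the statistic falls below A."""
    return "Reject H0" if statistic < action_limit else "Do not reject H0"


def upper_tail_decision(statistic, action_limit):
    """Upper-tail rule: reject H0 when the statistic exceeds A."""
    return "Reject H0" if statistic > action_limit else "Do not reject H0"


# Hypothetical standardized statistic with df = 10 and alpha = .05
chi_sq_star = 20.0
lower_result = lower_tail_decision(chi_sq_star, 3.94)
upper_result = upper_tail_decision(chi_sq_star, 18.31)
print(lower_result, "|", upper_result)
```

The same statistic can lead to different decisions because each test looks at only one tail of the distribution.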
Example
A professor claims that the standard deviation of grades in an exam is 10%, i.e. $\sigma = .10$.
A random sample of 20 students' grades had a standard deviation of 14.2%. Test the professor's claim with $\alpha = 0.05$.
Note: We cannot test for a population standard deviation directly; we must first convert it to the equivalent test for a population variance.
Thus, a test for $\sigma = .10$ is changed to a test for $\sigma^2 = (.10)^2 = .01$. In this example the test statistic is a standard deviation of 14.2%, or $s = .142$, so $s^2 = (.142)^2 = .020164$.
The test of hypothesis
$H_0: \sigma^2 = .01$
$H_1: \sigma^2 \neq .01$
TS: $s^2 = .020164 \;\Rightarrow\; \chi^{2*} = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(20-1)(.020164)}{.01} = 38.3116$
AL: $A_1 = \chi^2(.025;19) = 8.91$
$A_2 = \chi^2(.975;19) = 32.85$
DR: Do not reject $H_0$ if $8.91 \le \chi^{2*} \le 32.85$
Reject $H_0$ if $\chi^{2*} < 8.91$ or $\chi^{2*} > 32.85$
Conclusion: Reject $H_0$. The professor is wrong; the standard deviation is not equal to .10. Since the test statistic is greater than the upper action limit, we can conclude that the standard deviation of grades is greater than 10%.
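The arithmetic of this example can be reproduced in a few lines. This is a sketch using the numbers given in the example. The action limits 8.91 and 32.85 are entered by hand as quoted on the slide, and the variable names are illustrative.

```python
n = 20          # sample size
s = 0.142       # sample standard deviation
sigma0 = 0.10   # claimed population standard deviation

# Convert the test on the standard deviation to a test on the variance
s_squared = s ** 2
sigma0_squared = sigma0 ** 2

# Standardized test statistic (n - 1) * s^2 / sigma0^2
chi_sq_star = (n - 1) * s_squared / sigma0_squared

# Action limits for alpha = .05 and 19 degrees of freedom, as quoted on the slide
a1, a2 = 8.91, 32.85
reject = chi_sq_star < a1 or chi_sq_star > a2

print(f"chi-square* = {chi_sq_star:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```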
The F distribution
Comparison of Two Population variances
We want to test the hypothesis that two population variances are equal, i.e.
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
We need to rewrite the null and alternative hypotheses so that we can use a single value to represent the test statistic.
Ratio of Variances
The null and alternative hypotheses are converted to the following form.
$H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = 1$
$H_1: \dfrac{\sigma_1^2}{\sigma_2^2} \neq 1$
The Test Statistic
A natural candidate to be the test statistic for the ratio of two population variances is the ratio of the corresponding sample variances:
$$\frac{s_1^2}{s_2^2}$$
The F-distribution
Statisticians have shown that the ratio of two independent chi-square variables, each divided by its degrees of freedom, follows a new distribution known as the F-distribution.
If we have one variable $\chi_1^2$ with $n_1 - 1$ degrees of freedom, and another $\chi_2^2$ with $n_2 - 1$ degrees of freedom, then the ratio
$$\frac{\chi_1^2 / (n_1 - 1)}{\chi_2^2 / (n_2 - 1)}$$
has an F-distribution with $n_1 - 1$ degrees of freedom for the numerator and $n_2 - 1$ degrees of freedom for the denominator.
For a specified cumulative area $a$, the corresponding value of this distribution is written $F(a;\, n_1 - 1,\, n_2 - 1)$.
Extract of F-tables (1-α=.95)
The F-distribution with 1 - α = .95
Rows: denominator df. Columns: numerator df.
df 1 2 3 4 5 6 7 8 9 10
1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54
F-distribution examples
F(.95;4,9) = 3.63
F(.95;8,3) = 8.85
F(.99;15,20) = 3.09
F(.99;40,30) = 2.30
Ratio of Variances
We have already seen that for a sample of size n the sample variance has a $\chi^2$ distribution with n - 1 degrees of freedom.
It follows that the ratio of two sample variances
$$\frac{s_1^2}{s_2^2}$$
has an F-distribution with $df_{\text{numerator}} = n_1 - 1$ and $df_{\text{denominator}} = n_2 - 1$.
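Test of Hypothesis for two variances
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Rewrite the hypotheses as:
$H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = 1$
$H_1: \dfrac{\sigma_1^2}{\sigma_2^2} \neq 1$
TS: $F^* = \dfrac{s_1^2}{s_2^2}$
AL: $A_1 = F(\alpha/2;\, n_1 - 1,\, n_2 - 1)$
$A_2 = F(1 - \alpha/2;\, n_1 - 1,\, n_2 - 1)$
DR: Do not reject $H_0$ if $A_1 \le F^* \le A_2$. Reject $H_0$ if $F^* < A_1$ or $F^* > A_2$.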
One-Tail Tests
For Lower Tail Tests: $A = F(\alpha;\, n_1 - 1,\, n_2 - 1)$
For Upper Tail Tests: $A = F(1 - \alpha;\, n_1 - 1,\, n_2 - 1)$
Formula for Lower Tail F-values
Since the lower tail F-values are not given in the table we must use the formulas:
For one-tail tests:
$$F(\alpha;\, n_1 - 1,\, n_2 - 1) = \frac{1}{F(1 - \alpha;\, n_2 - 1,\, n_1 - 1)}$$
For two-tail tests:
$$F(\alpha/2;\, n_1 - 1,\, n_2 - 1) = \frac{1}{F(1 - \alpha/2;\, n_2 - 1,\, n_1 - 1)}$$
Examples of Lower tail F-values
F(.05;5,9) = 1/F(.95;9,5)
= 1/4.77
= 0.2096
F(.05;7,4) = 1/F(.95;4,7)
= 1/4.12
= 0.2427
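The reciprocal rule swaps the numerator and denominator degrees of freedom before taking the reciprocal of the upper-tail value. A small sketch follows. The dictionary `F_95` and the helper name are hypothetical, and the two entries are the upper-tail values used in the examples above.

```python
# Upper-tail values F(.95; numerator df, denominator df) from the table extract
F_95 = {(9, 5): 4.77, (4, 7): 4.12}


def lower_tail_f(numerator_df, denominator_df, upper_table):
    """F(alpha; v1, v2) = 1 / F(1 - alpha; v2, v1): swap the df, then invert."""
    return 1.0 / upper_table[(denominator_df, numerator_df)]


f_05_5_9 = lower_tail_f(5, 9, F_95)  # F(.05; 5, 9)
f_05_7_4 = lower_tail_f(7, 4, F_95)  # F(.05; 7, 4)
print(round(f_05_5_9, 4), round(f_05_7_4, 4))
```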
EXAMPLE
The production manager of a textile company wants to test the hypothesis that the mean cost of producing a polyester fabric is the same for two different production processes. Assume that production costs are normally distributed for both processes.
Random samples of production costs for several production runs using the two different production processes are as follows:
Process I: $20 $15 $20 $23 $24 $21
Process II: $27 $19 $41 $30 $16
Test the hypothesis that the two population variances are equal with a 2% level of significance.
Sample Data
             Pop 1     Pop 2
Sample size  n1 = 6    n2 = 5
Mean         20.5      26.6
Variance     9.9       97.3
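The summary figures can be recomputed from the raw production costs. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Production costs in dollars, as listed for the two processes
process_1 = [20.0, 15.0, 20.0, 23.0, 24.0, 21.0]
process_2 = [27.0, 19.0, 41.0, 30.0, 16.0]

n1, n2 = len(process_1), len(process_2)
mean_1, mean_2 = statistics.mean(process_1), statistics.mean(process_2)

# statistics.variance uses the n - 1 divisor, matching the sample variance s^2
var_1, var_2 = statistics.variance(process_1), statistics.variance(process_2)

print(f"Pop 1: n = {n1}, mean = {mean_1:.1f}, variance = {var_1:.1f}")
print(f"Pop 2: n = {n2}, mean = {mean_2:.1f}, variance = {var_2:.1f}")
```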
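Testing the Hypothesis
$H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = 1$
$H_1: \dfrac{\sigma_1^2}{\sigma_2^2} \neq 1$
TS: $F^* = \dfrac{s_1^2}{s_2^2} = \dfrac{9.9}{97.3} = .1017$
AL: $A_1 = F(\alpha/2;\, n_1 - 1,\, n_2 - 1) = F(.01;5,4) = 1/F(.99;4,5) = 1/11.39 = .088$
$A_2 = F(1 - \alpha/2;\, n_1 - 1,\, n_2 - 1) = F(.99;5,4) = 15.52$
DR: Do not reject $H_0$ if $.088 \le F^* \le 15.52$. Reject $H_0$ if $F^* < .088$ or $F^* > 15.52$.
Conclusion: Do not reject $H_0$.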
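The steps of this F-test can be retraced numerically. This is a sketch using the sample variances and table values quoted in the example. The F-table values 11.39 and 15.52 are entered by hand, and the variable names are illustrative.

```python
s1_squared, s2_squared = 9.9, 97.3  # sample variances for Process I and II

# Test statistic: ratio of the sample variances
f_star = s1_squared / s2_squared

# Table values quoted in the example (alpha = .02, so alpha / 2 = .01)
f_99_4_5 = 11.39  # F(.99; 4, 5), used for the lower limit via the reciprocal rule
f_99_5_4 = 15.52  # F(.99; 5, 4), the upper limit

a1 = 1.0 / f_99_4_5  # F(.01; 5, 4) = 1 / F(.99; 4, 5)
a2 = f_99_5_4

reject = f_star < a1 or f_star > a2
print(f"F* = {f_star:.4f}, A1 = {a1:.3f}, A2 = {a2:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```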