
Part VIII - Tests of Significance

Chapters 26, 28, and 29

Dr. Joseph Brennan

Math 148, BU

Dr. Joseph Brennan (Math 148, BU) Part VIII - Tests of Significance 1 / 90

Tests of Significance

Confidence Intervals: Intervals on the number line which are used to estimate the population parameter (µ) from the sample statistic (x̄).

Tests of Significance: Tests intending to assess the evidence provided by the data in favor of some claim about a population parameter (µ).

A significance test is a formal procedure which uses the data to choose between two competing hypotheses, the null hypothesis and the alternative hypothesis.

Hypotheses are statements about the population parameter (µ). A decision rule based on a probability computation is used to choose between the two hypotheses.


Stating Hypotheses

A hypothesis is a statement about the population parameter (µ) whose truth is in question.

Example (Air Force recruits) (from McGHEE Introductory Statistics)

Suppose the mean weight of male Air Force recruits is thought to be around 154 pounds. The following hypotheses can be drawn:

Verbal Statement                               Math Statement
The mean weight is 154 pounds                  H : µ = 154
The mean weight is less than 154 pounds        H : µ < 154
The mean weight is greater than 154 pounds     H : µ > 154
The mean weight is not equal to 154 pounds     H : µ ≠ 154

The tests we will develop require two hypotheses, the null hypothesis and the alternative hypothesis.


The Null Hypothesis

Null Hypothesis: The basic or primary statement about the parameter (µ).

We abbreviate null hypothesis as H0.

We will consider the special case of a simple null hypothesis, which has the form (H0 : µ = µ0), where µ0 is the hypothesized value for µ.

The null hypothesis is usually a sceptical statement of no difference or no effect. Generally, the null hypothesis represents some well-established position that should not be rejected unless there is considerable evidence to the contrary.


The Null Hypothesis

Note 1: The null hypothesis states that µ = µ0. For any sample from the population we do not expect that x̄ = µ0 exactly.

Recall from Part V: The probability of an exact value arising from a continuous random variable is zero.

Even though x̄ will generally be different from µ0, it does not mean that we will always reject H0. A test of significance attempts to determine if the difference is real or if it is attributable to chance error.


The Null Hypothesis

Note 2: We do not test to accept H0; it is assumed to be true. Rather, we test to see if it should be rejected.

If we reject H0, there must be some other hypothesis that we are willing to accept. This is called the ALTERNATIVE HYPOTHESIS and is abbreviated as Ha.

We establish the alternative hypothesis by (partially) negating the null hypothesis. Since H0 involves equality, the alternative hypothesis involves an inequality. Depending on the direction of inequality, there exist one-sided and two-sided alternative hypotheses:

A one-sided alternative hypothesis has the form:

i) (Ha : µ > µ0) (right-sided alternative) or
ii) (Ha : µ < µ0) (left-sided alternative).

A two-sided alternative hypothesis has the form (Ha : µ ≠ µ0).


The Null hypothesis

The direction of inequality in the alternative hypothesis (Ha) usually follows from the question in the problem. If the direction is not specified in the problem, we should use the two-sided alternative as a default.

It would be an act of CHEATING to first look at the data and then frame Ha to fit what the data shows.

The textbook calls this type of cheating data snooping.

If you do not have a specific direction firmly in mind in advance, you should use the two-sided alternative hypothesis. Some statisticians would argue that we should always use a two-sided alternative.

Note 3: The alternative hypothesis is also known as the research hypothesis. The alternative hypothesis is a statement that we want to prove.


Hypothesis Examples

The flat Earth model was common in ancient times, such as in the civilizations of the Bronze Age or Iron Age. This may be thought of as the null hypothesis, H0, at the time.

H0 : World is Flat.

Hellenistic astronomy established the spherical shape of the Earth around 300 BC. Many of the Ancient Greek philosophers assumed that the sun, moon and other objects in the universe circled around the Earth.

H0 : The Geocentric Model : Earth is the center of the Universe.

Copernicus had an alternative hypothesis, H1, that the world actually circled around the sun, making the sun the center of the universe. Eventually, people were convinced and accepted it as the null, H0.

H0 : The Heliocentric Model: Sun is the center of the universe.

Later someone proposed an alternative hypothesis that the sun itself also circled around something within the galaxy, thus creating a new null hypothesis. This is how research works: the null hypothesis is refined through testing; even if it isn’t correct, H0 is an improvement over its predecessors.

Hypothesis Examples

Example (Air Force recruits)

The mean weight of male Air Force recruits is thought to be around 154 pounds.

H0 : µ = 154, Ha : µ ≠ 154.

(a) A 1998 study reported that the average weight of newborn kids is 7 pounds. You plan to take a simple random sample of newborns to see if the average weight has increased.

H0 : µ = 7

Ha : µ > 7


Hypothesis Examples

(b) Last year the company’s technicians spent on average 3 hours a day responding to customers. Does this year’s data show a different average response time?

H0 : µ = 3

Ha : µ ≠ 3.

(c) The average square footage of one-bedroom apartments in a new development is advertised to be 460 square feet. A student group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicion.

H0 : µ = 460

Ha : µ < 460.


Hypothesis Examples

(d) Suppose you are playing a game which involves rolling a die and you have a feeling that 6 is appearing more often than it should! Let X be the variable that records the number that shows up on rolling the die.

H0 : P(X = 6) = 1/6, Ha : P(X = 6) > 1/6.

(e) Suppose you are flipping a coin, which otherwise seems fair, and you believe that heads is appearing less often than it should!

H0 : P(H) = 1/2, Ha : P(H) < 1/2.


Why Bother?

For each experiment or data set, the null hypothesis is a general default position which needs to be substantiated or ruled out. Moreover, this needs to be done on an experiment-by-experiment basis.


Decision Rule and Test Statistics

In testing hypotheses we speak of testing the null against the alternative hypothesis. We either reject or do not reject the null hypothesis based on the evidence from the data.

If the null hypothesis is rejected, we accept the alternative hypothesis. The decision to reject H0 or not should be based on an appropriate decision rule. A decision rule for a test is based upon a test statistic.

Test Statistic: A test statistic is a quantity computed from the data which measures the compatibility between the null hypothesis and the data.


Test Statistic

Very often the test statistic has the following form:

Test Statistic = (estimate − hypothesized value) / (standard deviation of the estimate)

If the hypotheses are statements concerning the population mean µ, then the test statistic has the form:

z = (x̄ − µ0) / σ_x̄ = (x̄ − µ0) / (σ/√n)

The above statistic is called the z-statistic for µ because it is the z-score for x̄ (under the null hypothesis). If the population distribution is normal or the sample size is large enough (n ≥ 30), the distribution of z is (approximately) standard normal (z ∼ N(0, 1)). From the interpretation of the z-score it follows that:

The z-statistic shows by how many standard deviations x̄ is smaller or greater than µ0 (specified by H0).


P - value

The test statistic is used to compute the P-value of the test.

P-value: In a test of hypotheses, the P-value is the probability that the test statistic would take a value as extreme or more extreme (in the same direction) than the value actually observed. This probability is computed under the assumption that H0 is true.

Small P-values correspond to extreme values of the test statistic and should lead to rejection of H0. The smaller the P-value, the stronger the evidence against H0. We decide to reject or not reject H0 by comparing the P-value with the level of significance α.

Note: Usually statistical studies report the value of the test statistic and the P-value.


Significance Level

Significance Level: The significance level, α, is a fixed constant which denotes the critical P-value which we regard to be decisive. This amounts to announcing in advance how much evidence against H0 we will require to reject H0.

The most frequently used values of α are 0.1, 0.05 or 0.01.

Rules of decision based on the P-value:

We reject H0 at the α level if P-value < α. Otherwise, we fail to reject H0.

If we reject H0, we say that the result is statistically significant at the α level, which means that the observed difference between the data and H0 is too large to be attributed to chance error.

When we reject H0, we accept Ha.


Example

Consider a test with two hypotheses

H0 : µ = 140, Ha : µ > 140,

where a sample of size 64 has a mean 143 and standard deviation of 10.

The value of x̄ is under the right tail of the distribution as 140 < 143. The P-value is

P-value = P(x̄ ≥ 143) = P(Z > (143 − 140) / (10/√64)) = P(Z > 2.4) = 0.0082.

Whatever traditional value of α we choose (0.01, 0.05 or 0.1), we will reject H0 since P-value < α. So we reject H0 and accept the alternative hypothesis as more plausible.
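The arithmetic in this example can be checked numerically. The following is a minimal sketch that uses Python's standard-library `statistics.NormalDist` in place of the normal table; the variable names are illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 140      # hypothesized mean under H0
x_bar = 143    # observed sample mean
sd = 10        # standard deviation given in the example
n = 64         # sample size

# z-statistic: how many standard errors x_bar lies above mu0
z = (x_bar - mu0) / (sd / sqrt(n))

# Right-sided alternative, so the P-value is the upper-tail area P(Z >= z)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The printed P-value can then be compared against whichever α was chosen in advance.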

Case 1: Left-sided alternative (Ha : µ < µ0)

P-value computation:

Let z be the computed value of the test statistic. The way in which we compute the P-value depends on the direction of the alternative hypothesis. There are three possible cases:

Case 1: Left-sided alternative (Ha : µ < µ0)

P-value = P(Z ≤ z)


Case 2: Right-sided alternative (Ha : µ > µ0)

Case 2: Right-sided alternative (Ha : µ > µ0).

P − value = P(Z ≥ z)


Case 3: Two-sided alternative (Ha : µ ≠ µ0)

Case 3: Two-sided alternative (Ha : µ ≠ µ0).

P-value = P(|Z| ≥ |z|) = 1 − P(−|z| < Z < |z|).


The P - Value Computation

The direction of inequality in the P-value is the same as the direction of inequality in the alternative hypothesis.

The P-value in the two-sided case is twice as large as the P-value in the one-sided case.

The explanation is the following: we have an alternative hypothesis (µ ≠ µ0). To reject the null hypothesis we should observe either extreme positive or extreme negative values of the test statistic z. Suppose that for a given sample we found z = −2.2. Since we are considering extreme values in both directions in Ha, we argue that the z-values more extreme than the observed −2.2 are:

Z ≥ 2.2 OR Z ≤ −2.2.


The P-Value Computation

The above two inequalities can be combined into one:

Z ≤ −2.2 or Z ≥ 2.2 ⇔ |Z| ≥ 2.2.

Hence, the P-value is computed as

P-value = P(|Z| ≥ 2.2) = 1 − P(−2.2 < Z < 2.2) = 2.78%.

As a consequence, it is easier to reject H0 in favor of a one-sided alternative, because the P-value in the case of a two-sided alternative is twice the P-value in the case of a one-sided alternative hypothesis.
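The three cases can be collected into one small function. This is a sketch only: the function name `z_p_value` and the string labels for the alternatives are assumptions made for illustration, and the normal curve comes from Python's standard library rather than from a table.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """P-value of a z-statistic for a 'left', 'right', or 'two-sided' alternative."""
    cdf = NormalDist().cdf
    if alternative == "left":         # Ha: mu < mu0, area to the left of z
        return cdf(z)
    if alternative == "right":        # Ha: mu > mu0, area to the right of z
        return 1 - cdf(z)
    if alternative == "two-sided":    # Ha: mu != mu0, both tails beyond |z|
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'left', 'right', or 'two-sided'")

z = -2.2
print(round(z_p_value(z, "left"), 4))
print(round(z_p_value(z, "two-sided"), 4))
```

For z = −2.2 the two-sided value is exactly twice the left-sided value, which is the doubling described above.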


The P-Value Computation

The P-value provides the strength of evidence against H0. The smaller the P-value, the stronger the evidence against H0.

If the P-value is less than 0.05, the result is often called statistically significant. This is because α = 0.05 is the most frequently used level of significance.

If the P-value is less than 0.01, the result is called highly significant. The significance level α = 0.01 is used when we want to reject H0 only for VERY convincing evidence against it.


Common Misinterpretation of a P - Value

Many people misunderstand what question a P - value answers.

If the P-value is 0.03, that means that, if H0 is true, there is a 3% chance of observing a difference from H0 at least as extreme as the one you observed in a repeat of the experiment.

It is tempting to conclude that there is a 97% chance that Ha is correct and a 3% chance that H0 is correct.

This is an incorrect interpretation!

What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments.

You have to choose. Would you rather believe in a 3% coincidence? Or that H0 is incorrect?


One Sample z - Test for µ

One Sample z-Test for µ: A test to determine the validity of a statement concerning the mean µ based upon a single sample.

STEP 1: State the hypotheses.

H0 : µ = µ0.

As a default, the alternative Ha is two-sided. A problem may specify whether Ha is left-sided or right-sided.

STEP 2: Choose the significance level α. Assume α = 0.05 unless otherwise stated.

STEP 3: Calculate the test statistic.

z = (x̄ − µ0) / (σ/√n).


One Sample z - Test for µ

STEP 4: Compute the P-value. The formula for the P-value depends on the alternative hypothesis.

Recall that the P-value is the probability that the test statistic would take a value as extreme or more extreme than the value actually observed, assuming H0 is true.

STEP 5: Make a decision: Reject H0 if P-value < α. Do not reject H0 if P-value ≥ α.

STEP 6: State the conclusion in terms of the alternative hypothesis.

If you rejected H0, say "there is enough evidence at the α level that [state your alternative hypothesis in words here]".

If you did not reject H0, say "there is not enough evidence at the α level to say that [state your alternative hypothesis in words here]".
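Steps 3 to 5 are mechanical and can be sketched as a single function. The name `one_sample_z_test` and its parameters are hypothetical choices for this illustration; Steps 1, 2, and 6 (stating hypotheses, choosing α, wording the conclusion) remain the analyst's job. The usage line reuses the numbers from the earlier example with H0 : µ = 140.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alternative="two-sided", alpha=0.05):
    """Carry out Steps 3-5 of the z-test; return (z, p_value, reject_h0)."""
    # Step 3: test statistic
    z = (x_bar - mu0) / (sigma / sqrt(n))

    # Step 4: P-value, depending on the direction of Ha
    cdf = NormalDist().cdf
    if alternative == "left":
        p_value = cdf(z)
    elif alternative == "right":
        p_value = 1 - cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'left', 'right', or 'two-sided'")

    # Step 5: decision
    return z, p_value, p_value < alpha

z, p, reject = one_sample_z_test(143, 140, 10, 64, alternative="right")
print(f"z = {z:.2f}, P-value = {p:.4f}, reject H0: {reject}")
```

Returning the P-value along with the decision follows the earlier note that studies usually report both the test statistic and the P-value.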


Assumptions Associated to the z-Test

The assumptions for the one-sample z-test are the same as the assumptions for calculating the confidence interval for µ in Chapter 21:

Assumption 1. The data results from a simple random sample from a very large population, or observations are obtained by sampling with replacement from a box (population).

Assumption 2. The population is either normal or the sample size is large enough (n ≥ 30) for the Central Limit Theorem to apply.


Example (from Moore and McCabe)

Do middle-aged male executives have different average blood pressure than the general population?

The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years is 128 and the standard deviation in this population is 15.

The medical director of a company looks at the medical records of 72 company executives in this age group and finds that the mean systolic blood pressure in this group is 126.07. Is this enough evidence that executive blood pressures differ from the national average?

Solution: We will go through the steps outlined in the algorithm for hypothesis testing.

Step 1. (H0 : µ = 128) (Ha : µ ≠ 128)

In words, H0 says that executives are not different from other men, whereas Ha says that they are different.


Example (Executive Blood Pressure)

Step 2. Choose α = 0.05.

Step 3.

z = (x̄ − µ0) / (σ/√n) = (126.07 − 128) / (15/√72) ≈ −1.09.

Step 4.

P-value = P(Z ≤ −1.09) + P(Z ≥ 1.09) = 1 − P(−1.09 < Z < 1.09) = 100% − (86.21% − 13.79%) = 27.58%.

Step 5. Since P-value = 0.2758 > α = 0.05, the null hypothesis is not rejected.

In fact, more than 27% of the time (about 1 time out of 4), an SRS of size 72 from the general male population would produce a mean blood pressure at least as far from 128 as that of the executive sample.

Step 6. There is not enough evidence at the α = 0.05 level that the blood pressure of middle-aged executives differs from that of other men.
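The "about 1 time out of 4" reading of the P-value can be illustrated by simulation. The sketch below assumes, purely for illustration, that blood pressures in the general population are normally distributed with mean 128 and standard deviation 15; the number of repetitions and the random seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)                      # fixed seed so the run is repeatable
mu0, sigma, n = 128, 15, 72         # values from the example
observed_gap = abs(126.07 - mu0)    # how far the executives' mean is from 128

# Draw many samples of size n from the population with H0 true and count
# how often the sample mean lands at least as far from 128 as observed.
reps = 5000
hits = 0
for _ in range(reps):
    sample_mean = fmean(random.gauss(mu0, sigma) for _ in range(n))
    if abs(sample_mean - mu0) >= observed_gap:
        hits += 1

simulated = hits / reps
exact = 2 * (1 - NormalDist().cdf(observed_gap / (sigma / sqrt(n))))
print(f"simulated: {simulated:.3f}, normal-curve value: {exact:.3f}")
```

The simulated fraction should land close to the two-sided P-value computed in Step 4, with some run-to-run variation if the seed is changed.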


Example (Sleeping Habits)

On average, how many hours do people sleep at night? One hundred Wal-Mart shoppers were asked this question. The sample mean was found to be x̄ = 7.5 hours. Assume σ = 1.5 hours. Is the result significantly different from 8 hours?

Solution: We can perform the one-sample z-test, but the conclusion will not be valid.

The survey was not a random sample but a convenience sample, which consists of people who are readily available and convenient (Wal-Mart shoppers).

Wal-Mart shoppers are specific in some sense and do not constitute a representative sample of all people. So, generalizations cannot be made to all people by studying just a convenience sample of 100 Wal-Mart shoppers.


Example (from Biostatistics by Triola & Triola)

A researcher is convinced that on average humans are colder than reported. A simple random sample of 106 body temperatures was taken, with a mean of 98.20°F. Assume that the population standard deviation σ is known to be 0.62°F. Use a 0.05 significance level to test the common belief that the mean body temperature of healthy adults is equal to 98.6°F.


Example (Healthy Body Temperature)

Solution: We write out the following steps :

Step 1. (H0 : µ = 98.6°F) (Ha : µ < 98.6°F).

Step 2. α = 0.05.

Step 3. The z-score is

z = (x̄ − µ0) / (σ/√n) = (98.2 − 98.6) / (0.62/√106) = −6.642.

Step 4. The P-value is P(Z ≤ −6.642), which is far smaller than 0.05.

Step 5. We reject H0.

Step 6. There is enough evidence at level α = 0.05 to conclude that µ < 98.6°F.


Example(from Biostatistics by Triola & Triola)

The health of the bear population in Yellowstone National Park is monitored by periodic measurements taken from anesthetized bears. A sample of 5 bears has a mean weight of 182.9 lb. Assuming that the standard deviation σ is known to be 121.8 lb, use a 0.1 significance level to test the claim that the population mean of all such bear weights is 200 lb.

Solution:

Step 1. H0 : µ = 200 lb, Ha : µ ≠ 200 lb.

Steps 2 and 3. α = 0.1 and the z-score is

z = (182.9 − 200) / (121.8/√5) = −0.31.

Step 4.

P-value = 1 − P(−0.31 ≤ Z ≤ 0.31) = 1 − 0.2434 = 0.7566 > 0.1.

Steps 5 and 6. We do not reject H0. There is not enough evidence at level α = 0.1 that µ ≠ 200 lb.


WARNING

In the previous example, we assumed, very conveniently, that the distribution of the bear weights is normal. Even under this assumption, the sample size of n = 5 is way too small to use the normal table; our calculations do not apply!

What do we do in such a scenario? The answer lies in the Student’s t-distribution.


Student’s t-Distribution

Student’s t-Distribution: (or simply the t-distribution) is a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

Like the normal distribution, the t-distribution is symmetric and bell-shaped. The t-distribution has heavier tails, meaning that it is more prone to producing values that fall far from its mean.


Why ”Student”?

History: A derivation of the t-distribution was first published in 1908 by William Sealy Gosset while he worked at the Guinness Brewery in Dublin.

One version of the origin of the pseudonym Student is that Gosset’s employer forbade members of its staff from publishing scientific papers, so he had to hide his identity.

Another version is that Guinness did not want their competition to know that they were using the t-test to test the quality of raw material.

The t-test and the associated theory became well-known through the work of the famous statistician R.A. Fisher, who called the distribution Student’s distribution.


t-Distributions

There is not a single t-distribution. The t-distributions are indexed by Degrees of Freedom, a term related to the sample size the t-distribution represents.

For a sample of size n, use a t-distribution with n − 1 degrees of freedom. We only need t-distributions for sample sizes less than 30.


The Family of t-Distributions

There is a whole family of t-distributions, indexed by the number of degrees of freedom.

The probability densities of all the members of the family of t-distributions are symmetric about 0 and bell-shaped, but have more probability in the tails than does the standard normal distribution. For this reason a t-distribution is called heavy-tailed.
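The heavy tails can be seen by comparing density heights far from 0. The sketch below codes the standard density formula of the t-distribution directly (the slides do not give this formula, so it is brought in here as an outside fact), and the evaluation point x = 3 and the degrees of freedom shown are arbitrary illustrative choices.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# Compare the two curves three units away from the center
for df in (2, 5, 29):
    print(f"df = {df:2d}: t density = {t_pdf(3, df):.5f}, "
          f"normal density = {normal_pdf(3):.5f}")
```

At x = 3 every t curve sits above the normal curve, and the gap narrows as the degrees of freedom grow.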

t-Distributions

We denote by t_{n−1} the t-distribution with n − 1 degrees of freedom.

Consider a random variable X that is normally distributed (or at least symmetric), from which a sample of size n ≤ 30 is taken. The distribution of the standardized sample mean is then approximately t_{n−1}. That is, for the mean µ of X and the sample standard deviation s,

(X̄ − µ) / (s/√(n−1)) ≈ t_{n−1}.

Similar to confidence intervals with z-scores, a confidence level of C% has a t-score t^{n−1}_C at the (C/2 + 50)th percentile.


Confidence Intervals Chart

To find a confidence interval for the population mean µ with confidence level C from a sample of size n with mean x̄ and sample standard deviation s:

(1) If the population standard deviation σ is known, and either the population distribution is normal or the sample size is large (n ≥ 30):

[x̄ − z_C × σ/√n, x̄ + z_C × σ/√n]

(2) If the population standard deviation σ is unknown, then

Case 1: (n < 30 and population distribution is normal)

[x̄ − t^{n−1}_C × s/√(n−1), x̄ + t^{n−1}_C × s/√(n−1)]

Case 2: (n ≥ 30 and population distribution is normal)

[x̄ − z_C × s/√n, x̄ + z_C × s/√n]

Example (Vitamin C)

The amount of vitamin C in mg/100g in a certain produce is measured in a random sample of size 10:

26 31 23 21 10 25 33 12 16 30

Compute the 95% confidence interval for µ, the mean vitamin C content.

Solution: The sample mean is x̄ = 22.7 and the sample standard deviation is s = 7.53.

Assuming that the distribution of vitamin C content in the produce is normal, we will use the t-confidence interval since σ is unknown and n = 10 < 30. The t-distribution in this case has 9 degrees of freedom.


Example (Vitamin C)

Since we are looking for a confidence level of 95%, we must find the t-score of the 95/2 + 50 = 97.5th percentile. On our t-table, that is under df = 9 and t_{0.025}:

t^{9}_{0.025} = 2.262

The 95% CI for µ is

[x̄ − t^{n−1}_C × s/√(n−1), x̄ + t^{n−1}_C × s/√(n−1)]

= [22.7 − 2.262 × 7.53/√9, 22.7 + 2.262 × 7.53/√9] = [17.02, 28.38]

We are 95% confident that the mean vitamin C content in the produce is between 17.02 and 28.38.
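The interval can be recomputed from the raw data. This sketch assumes that s is the standard deviation computed with divisor n (Python's `statistics.pstdev`), since that convention gives about 7.54 for this sample, in line with the 7.53 quoted in the solution, and pairs naturally with the √(n − 1) in the formula. The critical value 2.262 is copied from the slide rather than computed.

```python
from math import sqrt
from statistics import fmean, pstdev

data = [26, 31, 23, 21, 10, 25, 33, 12, 16, 30]   # vitamin C, mg/100g
n = len(data)

x_bar = fmean(data)
s = pstdev(data)     # SD with divisor n (assumed convention, see lead-in)
t_crit = 2.262       # t-table value for df = 9, taken from the slide

margin = t_crit * s / sqrt(n - 1)
lo, hi = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.1f}, s = {s:.2f}, CI = [{lo:.2f}, {hi:.2f}]")
```

With the divisor n − 1 version of the standard deviation (`statistics.stdev`), the matching standard error would be s/√n, which gives the same interval.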


Example (Textbook Expenditures)

A random sample of semester textbook expenditures by 81 full-time university students had a mean of $100 and standard deviation of $30. Find a 99% confidence interval for the mean expenditures for textbooks by students at this university.

Solution: We have x̄ = 100 and s = 30. The population standard deviation σ is unknown, and n = 81 > 30.

The approximate 99% confidence interval for µ is found using a z-table (as n > 30). We must find the z-score, z_C, for the 99/2 + 50 = 99.5th percentile: z_C = 2.576.

[100 − 2.576 × 30/9, 100 + 2.576 × 30/9] = [91.41, 108.59]

Interpretation: We are 99% confident that the true mean expenditure for textbooks by students in the university is between $91.41 and $108.59.
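The same interval can be reproduced without a z-table. In this sketch the 99.5th percentile of the standard normal curve is obtained from `statistics.NormalDist().inv_cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 100, 30, 81
confidence = 0.99

# z-score at the (C/2 + 50)th percentile, here the 99.5th
z_c = NormalDist().inv_cdf(0.5 + confidence / 2)

margin = z_c * s / sqrt(n)
print(f"z_C = {z_c:.3f}, CI = [{x_bar - margin:.2f}, {x_bar + margin:.2f}]")
```

Changing `confidence` to 0.95 or 0.90 shows how the interval narrows as the confidence level drops.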

One Sample t-Test for the Population Mean µ

A test of significance for the population mean µ with sample size n ≤ 30 will use t-scores rather than z-scores.

Assume the null hypothesis states (H0 : µ = µ0).

The test statistic is calculated as

t_x̄ = (x̄ − µ0) / (s/√(n−1))

The t-statistic is the basis for the t-test for µ, which has steps analogous to the z-test with only two differences:

1 The t-test uses the t-statistic in Step 3.

2 In the t-test the P-value in Step 4 is computed as the corresponding area under the t-curve with n − 1 degrees of freedom.

The t-tests are usually used in the case when σ is unknown, the distribution of X is roughly normal, and the sample is small (n < 30).


Example (Piano Lessons for Preschoolers)

Do piano lessons improve the spatial-temporal reasoning of preschool children? A study designed to test this hypothesis measured the spatial-temporal reasoning of 20 preschool children before and after 6 months of piano lessons. The changes in reasoning scores are shown below:

2 5 7 -2 2 7 4 1 0 7

-2 9 6 0 3 6 -1 3 -4 -6

Solution: Summary statistics: x̄ = 2.35, s = 3.98. The data’s histogram is shown on the next slide.



The distribution is not normal-like, but it is not extremely skewed.



Sample size n = 20 < 30 and σ is unknown, so we will use the t-test. The hypotheses are

(H0 : µ = 0) (Ha : µ > 0)

The t-statistic is

t = (x̄ − µ0) / (s/√(n−1)) = (2.35 − 0) / (3.98/√19) = 2.57,

with degrees of freedom n − 1 = 19. From the t-table:

P-value < 0.01.

The result is highly significant. We reject the null hypothesis and conclude that piano lessons improve spatial-temporal reasoning of preschoolers.
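The t-statistic and the tail area can be recomputed from the raw changes. This is a sketch under two assumptions: s is taken as the divisor-n standard deviation (`statistics.pstdev`), which gives about 3.99 here against the 3.98 quoted above, and the area under the t-curve is found by numerical integration of the standard t density (Simpson's rule) rather than read from a t-table. The helper names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import fmean, pstdev

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: one half minus the area over [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

changes = [2, 5, 7, -2, 2, 7, 4, 1, 0, 7,
           -2, 9, 6, 0, 3, 6, -1, 3, -4, -6]
n = len(changes)
x_bar = fmean(changes)
s = pstdev(changes)                       # divisor-n SD (assumed convention)
t_stat = (x_bar - 0) / (s / sqrt(n - 1))  # H0: mu = 0
p_value = t_upper_tail(t_stat, n - 1)     # right-sided alternative
print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}")
```

The computed tail area falls below 0.01, in agreement with the bound read from the t-table.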


Example (Bear Weights)

A sample of 5 bears has a mean weight of 182.9 lb. Assuming that the standard deviation σ is known to be 121.8 lb, use a 0.1 significance level to test the claim that the population mean of all such bear weights is 200 lb.

Solution: We shall use the t-test as n = 5 < 30 and assume the weight distribution is normal.

Step 1. (H0 : µ = 200 lb) (Ha : µ ≠ 200 lb)

Steps 2 and 3. α = 0.1 and the t-score is

t = (182.9 − 200) / (121.8/√4) = −0.28

Step 4. The P-value is

2 P(t_4 ≥ |−0.28|) > 2 · 0.25 = 0.5 > 0.1.

Steps 5 and 6. We do not reject H0. There is not enough evidence at level α = 0.1 that µ ≠ 200 lb.


Chart for Tests of Significance for µ

We have two types of test statistic for the null hypothesis (H0 : µ = µ0).

(1) If the sample size is large (n > 30), then

test statistic = (x̄ − µ0) / (s/√n)

and use the normal table to calculate a P-value.

(2) If the sample size is small (n ≤ 30) and the population distribution is roughly normal, then

test statistic = (x̄ − µ0) / (s/√(n−1))

and use the t-table with n − 1 degrees of freedom to calculate a P-value.


Example (from McGHEE Introductory Statistics).

Rats that are raised in a laboratory environment have a mean life span of around 24 months. A sample of 31 rats reared to adulthood in a germ-free environment had life spans with a mean of 27.3 and a standard deviation of 5.9 months. Does this type of rearing have an effect on the life span of the laboratory rat?

Solution: We are given

x̄ = 27.3, s = 5.9, n = 31

Step 1. (H0 : µ = 24) (Ha : µ > 24).

Note: We set the alternative as one-sided since we would expect the lifetime to increase in the germ-free environment.

Step 2. Choose α = 0.05.


Example (Rat Lifespan)

Step 3. Calculate the test statistic:

z = (x̄ − µ0) / (s/√n) = (27.3 − 24) / (5.9/√31) ≈ 3.1

Step 4. Calculate the P-value:

P-value = P(Z ≥ 3.10) ≈ 0.001.

Step 5. The result is highly significant, so we reject H0.

Step 6. There is enough statistical evidence at the α = 0.05 level that the average lifetime of rats living in a germ-free environment is greater than 24 months.


Two-Sided Significance Tests and Confidence Intervals

Confidence intervals for µ and two-sided significance tests for µ are related. We can decide whether we should or should not reject H0 from the computed two-sided CI for µ.

Relationship between the two-sided test of significance and the confidence interval for µ.

A level α two-sided significance test rejects a hypothesis (H0 : µ = µ0) exactly when the value µ0 falls outside a 100(1 − α)% confidence interval for µ.

The significance level α of the two-sided test is related to the confidence level C of the confidence interval through the rule

C = 1 − α.


Two-Sided Significance Tests and Confidence Intervals

H0 : µ = µ0, Ha : µ ≠ µ0, with α = 1 − C

[Figure: two number lines showing the C% confidence interval around x̄ and the position of µ0]

Case 1: µ0 is inside the C% CI. Decision: Fail to reject H0 at level α.

Case 2: µ0 is outside the C% CI. Decision: Reject H0 at level α.


Example (Chicken)

A company that manufactures chicken feed has developed a new product.

The company claims that 12 weeks after hatching, the average weight of chickens using this product will be 3.0 pounds. The owner of a large chicken farm decided to examine this new product, so he fed the new ration to all 12,000 of his newly hatched chickens.

At the end of 12 weeks he selected a simple random sample of 20 chickens and weighed them. The sample mean for the 20 chickens was 3.06 pounds and the sample standard deviation was s = 0.63 pounds.

(a) Find a 95% confidence interval for the mean µ of the 12,000 chickens.

(b) Perform a test of significance to check the company’s claim (α = 0.05).


Example (Chicken)

Solution: We have 19 degrees of freedom and:

s = 0.63, n = 20, x̄ = 3.06, µ0 = 3.

95% Confidence Interval: x̄ ± t^{19}_{0.025} × s/√(n−1)

[3.06 − 2.093 × 0.63/√19, 3.06 + 2.093 × 0.63/√19] = [2.76, 3.36]

The hypotheses: H0 : µ = 3, Ha : µ ≠ 3.

The test statistic:

t = (x̄ − µ0) / (s/√(n−1)) = (3.06 − 3) / (0.63/√19) = 0.415.

P-value = 2 P(t_19 ≥ 0.415) > 2(0.10) = 0.2


Example (Chicken)

We fail to reject H0 since P-value > α.

Conclusion: We do not have enough statistical evidence at the α = 0.05 level to claim that the mean chicken weight is different from 3 pounds. So the data are consistent with the company’s claim.

NOTE: The computed 95% confidence interval contains µ0 = 3 (pounds) and the test of significance which uses the α = 0.05 = 1 − 0.95 level does not reject H0.

TRUE: If µ0 belongs to the 1 − α confidence interval, then the α-level two-sided test fails to reject H0.

TRUE: If the 1 − α confidence interval does not contain µ0, then the α-level two-sided test rejects H0.
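The agreement between the interval and the test can be checked on the chicken data. The sketch below uses the critical value 2.093 from the slide for both purposes: comparing |t| with that value is equivalent to comparing the two-sided P-value with α = 0.05. Variable names are illustrative.

```python
from math import sqrt

x_bar, s, n, mu0 = 3.06, 0.63, 20, 3.0
t_crit = 2.093                    # t-table value for df = 19, from the slide

se = s / sqrt(n - 1)              # standard error in the slides' convention
lo, hi = x_bar - t_crit * se, x_bar + t_crit * se
t_stat = (x_bar - mu0) / se

inside_ci = lo <= mu0 <= hi       # is mu0 inside the 95% CI?
rejects = abs(t_stat) > t_crit    # does the two-sided 5% test reject H0?

print(f"CI = [{lo:.2f}, {hi:.2f}], t = {t_stat:.3f}")
print(f"mu0 inside CI: {inside_ci}, test rejects H0: {rejects}")
```

Trying other values of `mu0` shows that `inside_ci` and `rejects` always disagree, which is exactly the two TRUE statements above.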


Two Types of Error in Tests of Significance

We either reject H0, or fail to reject H0 based on the data. We hope that our decision is correct, but sometimes it will be wrong! There are two types of incorrect decisions:

TYPE I and TYPE II ERRORS

If we reject H0 when in fact H0 is true, this is a TYPE I error.

If we do not reject H0 when in fact Ha is true, this is a TYPE II error.


Probabilities of Type I and Type II Errors

Type I error can be thought of as convicting an innocent person.

Type II error can be thought of as letting a guilty person go free.

Significance and Type I error.

From the definition of the significance level α it follows that the probability of a Type I error is equal to α. This explains why we want to choose α to be small.

Power of the test and Type II error.

The power of the test is the probability that the test rejects H0 when Ha is true. High power is desirable. The probability of a Type II error is 1 minus the power of the test.
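For a right-sided z-test, power can be computed directly once a specific true mean is supposed. The sketch below is an illustration only: the function name is hypothetical and the numbers in the usage lines (µ0 = 100, true mean 103, σ = 10, n = 50) are invented, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_right_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0) for Ha: mu > mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # reject when z >= z_alpha
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # mean of z when mu = mu_true
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical numbers, not from the slides
power = power_right_sided_z(mu0=100, mu_true=103, sigma=10, n=50)
print(f"power = {power:.3f}, P(Type II error) = {1 - power:.3f}")

# When the true mean equals mu0, the rejection probability is alpha itself
print(f"P(Type I error) = {power_right_sided_z(100, 100, 10, 50):.3f}")
```

Raising n or moving the true mean further from µ0 increases the power and so lowers the chance of a Type II error.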


Practical Significance vs. Statistical Significance

Refer to Section 3 of Chapter 29 of the textbook.

Sometimes the difference is statistically significant but practically unimportant. The following example illustrates the point.

Example (Bulb)

An engineer has designed an improved light bulb. The previous design had an average lifetime of 1200 hours. Based on a sample of n = 2500 of the new bulbs, the average lifetime was found to be x̄ = 1201. Take σ = 10 (hours). Does the new bulb have a greater lifetime?

Solution The hypotheses:

H0 : µ = 1200, Ha : µ > 1200.


Example (Bulb)

Test statistic:

z = (x̄ − µ0) / (σ/√n) = (1201 − 1200) / (10/√2500) = 5.

The P-value: P-value = P(Z ≥ 5) ≈ 0.

We reject H0 and conclude that we have enough evidence that the new bulb is better. But how much better is it? Is a lifetime increase of 1 hour for a light bulb really important?

REMARK: Statistical significance is easier to show with larger sample sizes n. Even a tiny difference between the true mean µ and the hypothesized mean µ0 will be evident if we choose a large enough sample.
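The effect of sample size can be made concrete by holding the 1-hour difference fixed and varying n. In this sketch only n = 2500 comes from the example; the smaller sample sizes are hypothetical, and it is assumed for illustration that the same sample mean of 1201 is observed each time.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma = 1200, 1201, 10    # values from the bulb example

results = {}
for n in (25, 100, 400, 2500):        # sizes other than 2500 are hypothetical
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # right-sided alternative
    results[n] = (z, p_value)
    print(f"n = {n:4d}: z = {z:.1f}, P-value = {p_value:.4f}")
```

The difference in lifetime never changes, yet the result moves from clearly non-significant to overwhelmingly significant as n grows.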


Example (Types of Error)

A medical researcher is working on a new treatment for a certain type of cancer. The average survival time after diagnosis on the standard treatment is 2 years. In an early trial, she tries the new treatment on three subjects who have an average survival time after the diagnosis of 4 years.

Although the survival time has doubled, the results are not statistically significant even at the 0.10 significance level. Suppose, in fact, that the new treatment does increase the mean survival time in the population of all patients with this particular type of cancer. What type of error, if any, has been committed?


Example (Types of Error)

Solution : The hypotheses are

H0 : µ = 2, Ha : µ > 2.

That the results are not statistically significant means that we fail to reject H0. But we know that the new treatment does increase the mean survival time, which means that, in fact, Ha is true. So we failed to reject H0 when in fact Ha is true. This is a Type II error.

Comment: Having just 3 patients was not enough to establish the significance of the result.


Concluding Remarks about Significance Tests

(i) We only discussed the one-sample significance tests for µ. Many other significance tests exist. In fact, a test of significance can be constructed for any population parameter or for the difference between two parameters (see Chapter 27).

(ii) Different tests have different technical details (such as different hypotheses, test statistics and rules for P-value computation), but all significance tests use the same steps and definitions, and have a similar interpretation.

(iii) A chance model is required for a test of significance; a box model is a type of chance model.


Concluding Remarks about Significance Tests

(iv) A test of significance only determines whether a difference is real or due to chance variation. It does not rank how important the difference is, explain what causes it, or check the validity of the study used to accumulate the data.

(v) The z-test and t-test are tests which compare the mean of a sample to the mean established by an external standard.

(vi) The χ²-test, our next topic, compares observed and expected frequencies.


The Chi-Square Test

Often, we must ask the basic and necessary question:

How well does the model fit the facts?

In many cases, the answer is given by the χ²-test.

χ is a Greek letter.

It is often written as chi.

It is pronounced as ki as in kite.

The χ²-test compares observed and expected frequencies to determine whether a model is appropriate.


The Chi-Square Test: An Example

A gambler is accused of using a loaded die, but he pleads innocent. A record has been kept of the last 60 throws.

4 3 3 1 2 3 4 6 5 6

2 4 1 3 3 5 3 4 3 4

3 3 4 5 4 5 6 4 5 1

6 4 4 2 3 3 2 4 4 5

6 3 6 2 4 6 4 6 3 2

5 4 6 3 3 3 5 3 1 4

There is some disagreement about how to interpret the data and a statistician is called in. What is the verdict?



Solution: If the gambler is innocent, the numbers in the given table should be like the result of drawing at random from a box containing the numbers 1 through 6. Therefore, each of the six numbers should appear in the table approximately 10 times: the expected frequency is 10. To compare this expectation with what we have, we calculate the frequency distribution:

Value    Observed frequency    Expected frequency
1                 4                    10
2                 6                    10
3                17                    10
4                16                    10
5                 8                    10
6                 9                    10
Sum              60                    60

As we can observe, the table has too many 3’s and 4’s.



The standard error for the number of 3’s is

$$\sqrt{n \cdot p(1-p)} = \sqrt{60 \cdot (1/6) \cdot (5/6)} \approx 2.9.$$

Therefore, the observed number of 3’s is (17 − 10)/2.9 ≈ 2.4 SE’s above the expected number.

However, we shouldn’t take the table one line at a time! For example, there are also too many 4’s. But with many lines in the table, there is a high probability that at least one of them will look suspicious, even if the die is fair! We need something more substantial to assess the fairness of the die.

The value χ² is defined as

$$\chi^2 = \text{sum of } \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}}.$$



The formula is not arbitrarily derived, as we shall see later. For now,

$$\chi^2 = \frac{(4-10)^2 + (6-10)^2 + \cdots + (8-10)^2 + (9-10)^2}{10} = 14.2.$$

When the observed frequency is far from the expected frequency, the corresponding term in the sum is large; when the two are close, this term is small.

Question: What is the chance that when a fair die is rolled 60 times and χ² is computed from the observed frequencies, its value turns out to be 14.2 or more?

Note that larger values of χ² would be even stronger evidence against the model.
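One way to get a feel for this chance without any table is to simulate it. The sketch below recomputes the statistic from the observed frequencies and then rolls a simulated fair die 60 times, over and over, counting how often the statistic is at least as large. The seed and the number of repetitions are arbitrary choices made here, not part of the example, and the result is only an estimate.

```python
import random

observed = [4, 6, 17, 16, 8, 9]   # frequencies of faces 1..6 from the table
n_rolls = sum(observed)            # 60 throws
expected = n_rolls / 6             # 10 per face for a fair die

def chi_square(freqs, expected_count):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((f - expected_count) ** 2 / expected_count for f in freqs)

chi2_observed = chi_square(observed, expected)

rng = random.Random(148)   # arbitrary seed so the run is repeatable
n_sims = 10_000            # assumed number of simulated 60-roll sessions
hits = 0
for _ in range(n_sims):
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randrange(6)] += 1
    # Small tolerance guards against floating-point ties at 14.2.
    if chi_square(counts, expected) >= chi2_observed - 1e-9:
        hits += 1

estimate = hits / n_sims
print(f"observed chi-square = {chi2_observed:.1f}")
print(f"estimated chance of {chi2_observed:.1f} or more = {estimate:.3f}")
```

The estimate varies a little from seed to seed, but it should come out as a small chance, on the order of one or two percent.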


Karl Pearson

Calculating this chance is a tremendous undertaking! Back in the days of Karl Pearson (around 1900) there were no computers.

He came up with a distribution that makes it possible to compute this probability by hand! It involved a new curve, called the χ²-curve. There is one curve for each number of degrees of freedom, analogous to the t-distribution. Moreover, if the model is fully specified (no parameters have to be estimated from the data), then

degrees of freedom = number of terms in χ² − one.


The χ²-Curve

What does a χ²-curve look like?


Properties of the χ²-Distribution

1. The χ²-distribution is not symmetric, unlike the Student t or the normal distribution.

2. The values of χ² can never be negative.

3. The χ²-distribution is different for each number of degrees of freedom, which is given by

df = n − 1,

where n is the number of categories.


The χ²-Table

This is only part of the full table; we have highlighted only the relevant part that we will be making use of.


In our case, we need χ² with 5 degrees of freedom. It follows from the table that the probability of 14.2 or more is slightly more than 1%. With modern computational power we can get the answer in an instant: 1.4%.

In any event, the statistician’s work here is done! There is strong evidence that the gambler pleading innocence is actually a fraud!
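For readers without a table at hand, the area under the χ²-curve can also be computed directly. The sketch below uses the standard finite-series formulas for the upper tail (one for even and one for odd degrees of freedom); the function name chi2_upper_tail is an illustrative choice, and this is only one of several ways to do the computation.

```python
import math

def chi2_upper_tail(x, df):
    """Area under the chi-square curve with df degrees of freedom to the right of x >= 0."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))               # P(|Z| >= sqrt(x))
    density = math.exp(-half) / math.sqrt(2 * math.pi)  # normal curve height at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                # adds sqrt(x)^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2 * density * total

# Loaded-die example: chi-square = 14.2 with 5 degrees of freedom.
p_die = chi2_upper_tail(14.2, 5)
print(f"P-value = {p_die:.4f}")
```

The value should land close to the 1.4% quoted above.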


The χ²-Test

When testing a hypothesis on a trial with multiple categories (tickets), use the χ²-test. The steps of the test are outlined below:

(i) Create the chance model (box model).

(ii) Create a frequency table consisting of the observed frequency and the expected frequency for each category (ticket).

(iii) Compute the χ²-statistic.

(iv) Compute the degrees of freedom: the number of categories − 1.

(v) Obtain a P-value from the χ²-table and consider rejecting H0.


Example (Grand Juries)

A study of grand juries formed in Alameda County, California investigated whether the ages of the jurors chosen are representative of the ages of the population. The size of a grand jury varies, but a total of 66 jurors were sampled (representing 6 juries).

Age            County Percentage    Number of Jurors
21 to 40              42                  15
41 to 50              23                  14
51 to 60              16                  19
61 and over           19                  18
Total:               100                  66

Does the age composition of juries represent the county?



Solution: We use the χ²-test with a null hypothesis assuming juries represent the age composition of the county. Since α was not specified, let α = 0.05.

(i) The box model:

(ii) The frequency table for 66 jurors:

Age            Expected    Observed
21 to 40         27.7         15
41 to 50         15.2         14
51 to 60         10.6         19
61 and over      12.5         18
Total:           66           66



(iii) The χ²-statistic:

$$\chi^2 = \frac{(15-27.7)^2}{27.7} + \frac{(14-15.2)^2}{15.2} + \frac{(19-10.6)^2}{10.6} + \frac{(18-12.5)^2}{12.5} \approx 15$$

(iv) There are 4 categories so 3 degrees of freedom.

(v) From the χ²-table, the P-value is less than 0.5%.

We reject the null hypothesis, as our significance level was assumed to be 5%. The result is statistically significant and points towards a bias in favor of choosing older jurors.
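Steps (ii) through (v) for the grand jury data can be traced in a few lines. This is a sketch using the counts and percentages given above; the variable names are illustrative, and the P-value line uses a closed-form expression that holds for 3 degrees of freedom only, in place of the table.

```python
import math

county_percent = {"21 to 40": 42, "41 to 50": 23, "51 to 60": 16, "61 and over": 19}
observed = {"21 to 40": 15, "41 to 50": 14, "51 to 60": 19, "61 and over": 18}

n_jurors = sum(observed.values())  # 66

# Expected count = county share of the age group times the number of jurors.
expected = {age: pct / 100 * n_jurors for age, pct in county_percent.items()}

chi2 = sum((observed[age] - expected[age]) ** 2 / expected[age] for age in observed)
df = len(observed) - 1  # 4 categories -> 3 degrees of freedom

# Closed-form upper tail of the chi-square curve, valid for df = 3 only.
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

for age in observed:
    print(f"{age:12s} expected {expected[age]:5.1f}  observed {observed[age]:3d}")
print(f"chi-square = {chi2:.1f}, df = {df}, P-value = {p_value:.4f}")
```

The unrounded expected counts give a statistic that rounds to 15, and the P-value stays below the 0.5% bound read from the table.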


Uses of the χ²-Curve

There are several other uses of the χ²-curve:

The χ²-statistic can be used to test independence.

The χ²-statistic can be used with any number of categories!

The χ²-statistic can be used to test a claim about a standard deviation.


Independent Experiments

If experiments are performed independently, the results can be pooled by adding the separate χ²-statistics and adding the degrees of freedom.

Example: Assume experiment A is performed independently of experiment B.

Assume A has χ² = 5.8 with 5 degrees of freedom and B has χ² = 3.1 with 2 degrees of freedom.

The combined experiment A + B has χ² = 5.8 + 3.1 = 8.9 and 5 + 2 = 7 degrees of freedom.
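The pooling rule is just two additions, after which the pooled statistic is referred to the χ²-curve with the pooled degrees of freedom. The sketch below does the additions and then approximates the P-value by numerically integrating the χ²-density with the midpoint rule. The function names and the step count are assumptions made for illustration, and the integration is meant for 2 or more degrees of freedom.

```python
import math

def chi2_density(t, df):
    """Height of the chi-square curve with df degrees of freedom at t > 0."""
    k = df / 2
    return t ** (k - 1) * math.exp(-t / 2) / (2 ** k * math.gamma(k))

def chi2_upper_tail(x, df, steps=20_000):
    """Approximate P(chi-square >= x) by the midpoint rule on [0, x]; intended for df >= 2."""
    width = x / steps
    area_left = sum(chi2_density((i + 0.5) * width, df) for i in range(steps)) * width
    return 1 - area_left

# Independent experiments from the example: (chi-square statistic, degrees of freedom).
experiments = {"A": (5.8, 5), "B": (3.1, 2)}

pooled_chi2 = sum(stat for stat, _ in experiments.values())
pooled_df = sum(df for _, df in experiments.values())
pooled_p = chi2_upper_tail(pooled_chi2, pooled_df)

print(f"pooled chi-square = {pooled_chi2:.1f} with {pooled_df} degrees of freedom")
print(f"approximate P-value = {pooled_p:.2f}")
```

For these numbers the pooled P-value is large, so the pooled experiments would not cast doubt on the model.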


Independence Testing

The χ²-test is able to test for independence. This will be highlighted through examples.

The HANES study of 2,237 Americans between the ages of 25 and 34 recorded the gender and dominant hand of subjects.

                  Men     Women    Total:
Right-Handed      934     1,070    2,004
Left-Handed       113        92      205
Ambidextrous       20         8       28
Total:          1,067     1,170    2,237

Assume that subjects were chosen in a simple random sample. From this sample, is dominant hand independent of gender?



We have a null hypothesis:

H0: Dominant Hand and Gender are Independent

We have an alternative hypothesis:

HA: Dominant Hand and Gender are Dependent

We do not know the population parameters with respect to dominant hand and gender, only the information given by the sample. We have a large sample, so we will assume the population matches the sample.

                  Men      Women
Right-Handed     87.5%     91.5%
Left-Handed      10.6%      7.9%
Ambidextrous      1.9%      0.7%



Using H0, the hypothesis that hand dominance and gender are independent, we are able to construct a table for observed and expected frequencies:

                Observed Men    Observed Women    Expected Men    Expected Women
Right-Handed        934             1,070              956             1,048
Left-Handed         113                92               98               107
Ambidextrous         20                 8               13                15

How many degrees of freedom are there? When testing for independence in an m × n table, there are (m − 1) × (n − 1) degrees of freedom.

In this example we start with a 3 × 2 table; 3 rows and 2 columns. Therefore, we have (3 − 1) · (2 − 1) = 2 degrees of freedom.



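We now find the χ²-statistic:

$$\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(934-956)^2}{956} + \frac{(1{,}070-1{,}048)^2}{1{,}048} + \frac{(113-98)^2}{98} + \frac{(92-107)^2}{107} + \frac{(20-13)^2}{13} + \frac{(8-15)^2}{15} \\
&\approx 12
\end{aligned}$$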

We have yet to set a significance level, though the P-value for a χ²-statistic of 12 with 2 degrees of freedom is less than 0.5%.

The sample provides strong statistical evidence against the null hypothesis. Gender and dominant hand appear to be dependent.
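The whole independence test can be reproduced from the table of counts alone. In this sketch each expected count is computed as (row total × column total) / grand total, which is the usual way to build the expected table under H0. The variable names are illustrative, and the P-value line uses the fact that for 2 degrees of freedom the upper tail of the χ²-curve is exp(−χ²/2).

```python
import math

table = {
    "Right-Handed": {"Men": 934, "Women": 1070},
    "Left-Handed": {"Men": 113, "Women": 92},
    "Ambidextrous": {"Men": 20, "Women": 8},
}
columns = ["Men", "Women"]

row_totals = {hand: sum(counts.values()) for hand, counts in table.items()}
col_totals = {col: sum(table[hand][col] for hand in table) for col in columns}
grand_total = sum(row_totals.values())  # 2,237 subjects

chi2 = 0.0
for hand in table:
    for col in columns:
        # Expected count if hand and gender were independent.
        expected = row_totals[hand] * col_totals[col] / grand_total
        chi2 += (table[hand][col] - expected) ** 2 / expected

df = (len(table) - 1) * (len(columns) - 1)  # (3 - 1) * (2 - 1) = 2
p_value = math.exp(-chi2 / 2)               # exact upper tail for df = 2 only

print(f"chi-square = {chi2:.1f}, df = {df}, P-value = {p_value:.4f}")
```

With unrounded expected counts the statistic comes out slightly below the 12 obtained from the rounded table, and the P-value is still under 0.5%.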


Z-Test or χ²-Test?

When should the χ²-test be used, as opposed to the z-test?

The z-test says whether the data are like the result of drawing at random from a box whose average is given.

The χ²-test says whether the data are like the result of drawing at random from a box whose contents are given.

The z-test deals with averages.

The χ²-test deals with frequencies from all categories; this test is more comprehensive and deals with the balance expected from the model.


Example (from Statistics by Samuels et al.)

A cross between white and yellow summer squash gave progeny of the following colors:

COLOR             WHITE    YELLOW    GREEN
No. of progeny     155       40        10

Question: Are these data consistent with the 12 : 3 : 1 ratio predicted by a certain genetic model? (Use a χ²-test with α = 0.10.)


Example (Squash)

There are three categories involved here. According to the given data:

          Observed Frequency    Expected Frequency
White            155            (12/16) · 205 = 153.75
Yellow            40            (3/16) · 205 = 38.44
Green             10            (1/16) · 205 = 12.81

The χ²-statistic is

$$\chi^2 = \text{sum of } \frac{(\text{observed frequency} - \text{expected frequency})^2}{\text{expected frequency}} = \frac{(155-153.75)^2}{153.75} + \frac{(40-38.44)^2}{38.44} + \frac{(10-12.81)^2}{12.81} \approx 0.689.$$


Example (Squash)

Recall that we test the null hypothesis (H0 : no change in ratio) against the alternative (Ha : significant change in ratio).

We compare the area under the χ²-curve with 3 − 1 = 2 degrees of freedom to the right of 0.689 with α = 0.10.

The probability is given by

$$P(\chi^2_2 > 0.689) \approx 0.71,$$

which is bigger than 0.10.

Therefore, we cannot reject the null hypothesis based on the data at α = 0.10.
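The squash example can be checked in the same way as the earlier ones. The sketch below turns the 12 : 3 : 1 ratio into expected counts, computes the statistic, and compares the P-value with α. The variable names are illustrative, and the P-value line again relies on the special form of the χ²-curve's upper tail for 2 degrees of freedom.

```python
import math

observed = {"white": 155, "yellow": 40, "green": 10}
ratio = {"white": 12, "yellow": 3, "green": 1}   # 12 : 3 : 1 from the genetic model
alpha = 0.10

n_progeny = sum(observed.values())   # 205
ratio_parts = sum(ratio.values())    # 16

# Expected count = share of the ratio times the number of progeny.
expected = {color: ratio[color] / ratio_parts * n_progeny for color in observed}

chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1               # 3 categories -> 2 degrees of freedom
p_value = math.exp(-chi2 / 2)        # exact upper tail for df = 2 only
reject_h0 = p_value < alpha

print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p_value:.2f}")
print("reject H0" if reject_h0 else "cannot reject H0")
```

Using unrounded expected counts gives a statistic of about 0.691 instead of the 0.689 obtained from the rounded table; the conclusion is the same.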

