11.1 – Significance Tests: The Basics

Purpose: to assess the evidence provided by data about some claim covering a population.

Basic Idea: an outcome that would rarely happen if a claim were true is good evidence that the claim is not true.

A significance test is a formal procedure for comparing observed data with a hypothesis whose truth we want to assess. The hypothesis is a statement about a population parameter, like the population mean µ or population proportion p.

The statement being tested in a significance test is called the null hypothesis, Ho. The significance test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of ‘no effect’, ‘no difference’, or no change from historical values. It is the status quo.

A significance test assesses the evidence provided by data against a null hypothesis, Ho, in favor of the alternative hypothesis, Ha. The claim about the population we are trying to find evidence for is the alternative hypothesis.

Hypotheses always refer to some population, not to a particular outcome. Be sure to state Ho and Ha in terms of a population parameter. We are trying to make a statement about an unknown population value and don’t need to make hypotheses about things we already know.

The alternative hypothesis should express the hopes or suspicions we have before we see the data. It is cheating to first look at the data and then frame Ha to fit what the data show.

Example:

Because of variation in the manufacturing process, tennis balls produced by a particular machine do not have identical diameters. Let µ denote the true average diameter for tennis balls currently being produced.

Suppose that the machine was initially calibrated to achieve the design specification µ = 3 inches. The manufacturer is now concerned that the diameters no longer conform to the specification (i.e., µ ≠ 3 inches must now be considered a possibility). If sample evidence suggests that µ ≠ 3 inches, the production process will have to be halted while the machine is recalibrated. Because stopping production is costly, the manufacturer wants to be quite sure that µ ≠ 3 before undertaking recalibration.

Under these circumstances, a sensible choice of hypotheses is:

Ho: µ = 3 (the spec is being met; recalibration is unnecessary)

Ha: µ ≠ 3 (the spec isn’t being met; recalibration is necessary)

Only compelling sample evidence would then result in Ho being rejected.

Stating Ha is not always straightforward. It is not always clear whether Ha should be one-sided or two-sided.

Example 11.3 on page 162 in your book concerns a job satisfaction survey. The parameter of interest is the mean µ of the differences in scores. The authors of the study wanted to know if the two work conditions have different levels of job satisfaction. They did not specify the direction of the difference. Here the null hypothesis is Ho: µ = 0. Since the direction of the difference was not specified, the alternative could be µ < 0 or µ > 0. For simplicity it is written as Ha: µ ≠ 0.

Do Now: Exercise 11.4, page 693

Each of the following situations calls for a significance test. State the appropriate Ho and Ha in each case. Be sure to define your parameter each time.

a) Larry’s car averages 26 miles per gallon on the highway. He switches to a new brand of motor oil that is advertised to increase gas mileage. After driving 3000 highway miles with the new oil, he wants to determine if the average gas mileage has increased.

b) A May 2005 Gallup Poll report on a national survey of 1028 teenagers revealed that 72% of teens said they rarely or never argue with their friends. You wonder whether this national result will be true in your school. You conduct your own survey of a random sample of students at your school.

Solution:

a) µ = the mean gas mileage for Larry’s car on the highway

Ho: µ = 26 mpg

Ha: µ > 26 mpg

b) p = the proportion of teens at your school who rarely or never argue with their friends

Ho: p = 0.72

Ha: p ≠ 0.72

Verify 3 conditions are met before you begin your calculations: SRS, Normality, independence.

Normality check for means: population distribution is Normal, or large sample size (n≥30).

Normality check for proportions: np ≥ 10 and n(1 – p) ≥ 10.
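The Normality check for a proportion can be sketched in a few lines of Python. This is only an illustration: the function name and the school sample sizes are made up for the example, and p is taken to be the value stated in Ho, with the usual cutoff of 10 for both expected counts.

```python
def normality_ok_for_proportion(n: int, p0: float, threshold: float = 10) -> bool:
    """Return True when both expected counts, n*p0 and n*(1 - p0), reach the threshold."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    # Show the actual counts instead of only claiming they are large enough.
    print(f"n*p0 = {expected_successes:.1f}, n*(1 - p0) = {expected_failures:.1f}")
    return expected_successes >= threshold and expected_failures >= threshold


# Hypothetical school samples tested against p0 = 0.72 from the Gallup report.
large_sample_ok = normality_ok_for_proportion(150, 0.72)  # counts 108 and 42
small_sample_ok = normality_ok_for_proportion(30, 0.72)   # 30 * 0.28 = 8.4 falls short
print(large_sample_ok, small_sample_ok)
```

Printing the two counts mirrors the exam advice to show the result of the check rather than just asserting it.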

A test is based on a test statistic. A test statistic is the function of sample data on which a conclusion to reject or fail to reject Ho is based. Some principles that apply to most tests:

• The test is based on a statistic that compares the value of the parameter as stated in the null hypothesis with an estimate of the parameter from the sample data.

• Values of the estimate far from the parameter value in the direction specified by the alternative hypothesis give evidence against Ho.

• To assess how far the estimate is from the parameter, standardize the estimate. In many common situations, the test statistic has the form:

z = (estimate – hypothesized value) / (standard deviation of the estimate)
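As a rough illustration of standardizing an estimate, the sketch below computes z for the tennis ball setting. The sample mean, sample size, and population standard deviation are hypothetical values chosen for the example; the notes do not give any data.

```python
import math


def z_statistic(estimate: float, hypothesized: float, std_dev_of_estimate: float) -> float:
    """Standardize: how many standard deviations the estimate lies from the Ho value."""
    return (estimate - hypothesized) / std_dev_of_estimate


# Hypothetical data: n = 36 balls, sample mean 3.02 in, known sigma = 0.06 in.
sigma, n = 0.06, 36
std_dev_of_mean = sigma / math.sqrt(n)       # standard deviation of x-bar: 0.01
z = z_statistic(3.02, 3.0, std_dev_of_mean)  # tests Ho: mu = 3
print(round(z, 2))
```

Here the sample mean sits about two standard deviations above the hypothesized value of 3 inches.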

The P-value is a quantitative measure of just how unlikely a given finding is, assuming the null hypothesis is true. We may compare this value to a significance level in order to decide whether or not a finding is significantly different from what was expected. P-value is a measure of the rarity of a finding.

P-value is the probability (computed supposing Ho to be true) that the test statistic will take a value at least as extreme as that actually observed. It is the conditional probability of observing results at least as extreme as ours if Ho were true.
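For a z statistic, the phrase "at least as extreme" translates into a tail area under the standard Normal curve. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name and the labels for the alternative are illustrative choices, not notation from the book.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Tail probability of a standard Normal z statistic, computed assuming Ho is true."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: parameter > hypothesized value (upper tail)
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: parameter < hypothesized value (lower tail)
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: parameter differs (both tails)
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


print(round(p_value_from_z(2.0), 4))             # two-sided tail area for z = 2
print(round(p_value_from_z(2.0, "greater"), 4))  # upper tail only
```

The two-sided P-value is twice the one-sided value, which is why the direction of Ha has to be fixed before looking at the data.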

We can compare the P-value with a value we regard as decisive. The decisive value of P is called a significance level. It is written as the Greek letter alpha, α. The most commonly used significance level is α = 0.05, but it may be preferable to choose a different level based on the situation.

Small P-values indicate strong evidence against Ho. Calculating P-values requires knowledge of the sampling distribution of the test statistic when Ho is true.

There is a four-step process for testing hypotheses. Each step has several requirements, and following the steps will help you organize your answers so that you don’t lose points on the AP exam.

1. Hypotheses

a) Write the null hypothesis.

b) Write the alternative hypothesis. Is it one-sided or two-sided? Are we interested in the upper tail, the lower tail, or both?

11.2 – Carrying Out Significance Tests

2. Model

a) Decide which inference procedure is correct. There are many.

b) List the assumptions and check them. If you are checking np, show the result; don’t just say it’s greater than 10.

c) Name the test.

3. Mechanics

a) Write down the statistics and use proper notation.

b) Draw a curve depicting the model. Mark the hypothesized parameter and the observed statistic, and shade the appropriate tail.

c) Calculate the value of the test statistic. It could be z, t, or χ². Show the formula, substitute all the proper values, and give the final result. (You can do the calculations on the calculator and just show the results.)

d) Find the P-value. Often you will be able to use the TI and copy down the P-value.

4. Conclusion

a) Link the P-value to the decision. You need to be clear how the calculated P-value led to your decision.

b) State the decision about the null hypothesis. Either you reject it or fail to reject it; never accept it.

c) Interpret the decision in the proper context.
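The four steps can be sketched in code for a one-proportion z test. The example below reuses the hypotheses from the Gallup exercise (Ho: p = 0.72, Ha: p ≠ 0.72), but the school sample (96 of 150 students) is invented for illustration, and the function name and returned dictionary are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float, alpha: float = 0.05) -> dict:
    """Two-sided one-proportion z test of Ho: p = p0 against Ha: p != p0."""
    # Step 2 (Model): check the Normality condition and show the counts.
    expected_yes, expected_no = n * p0, n * (1 - p0)
    if expected_yes < 10 or expected_no < 10:
        raise ValueError(f"Normality fails: n*p0 = {expected_yes:.1f}, n*(1 - p0) = {expected_no:.1f}")

    # Step 3 (Mechanics): test statistic and P-value, computed assuming Ho is true.
    p_hat = successes / n
    std_dev_of_p_hat = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / std_dev_of_p_hat
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 4 (Conclusion): link the P-value to the decision; never "accept" Ho.
    decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
    return {"p_hat": p_hat, "z": z, "p_value": p_value, "decision": decision}


# Step 1 (Hypotheses): Ho: p = 0.72, Ha: p != 0.72, with hypothetical school data.
result = one_proportion_z_test(successes=96, n=150, p0=0.72)
print(result)
```

With these made-up numbers the P-value falls below 0.05, so the conclusion in context would be that there is evidence that the proportion at this school differs from 0.72. The SRS and independence conditions cannot be checked by code and still have to be argued from how the sample was taken.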

Confidence intervals and two-sided significance tests are closely connected, provided that the significance level for the test and the confidence level for the interval add to 100%. For example, a two-sided test at α = 0.05 pairs with a 95% confidence interval.
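The connection can be checked numerically for a z procedure about a mean with known σ: the two-sided test at level α rejects Ho: µ = µ0 when µ0 falls outside the 100(1 – α)% confidence interval. The sketch below uses hypothetical tennis ball numbers (sample mean 3.02, σ = 0.06, n = 36); the function name is an illustrative choice.

```python
import math
from statistics import NormalDist


def compare_test_and_interval(x_bar, sigma, n, mu0, alpha=0.05):
    """Return (test rejects Ho, mu0 lies outside the matching confidence interval)."""
    std_normal = NormalDist()
    std_dev_of_mean = sigma / math.sqrt(n)

    # Two-sided z test of Ho: mu = mu0 at significance level alpha.
    z = (x_bar - mu0) / std_dev_of_mean
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    rejects = p_value < alpha

    # Confidence interval with confidence level 1 - alpha.
    z_star = std_normal.inv_cdf(1 - alpha / 2)
    lower = x_bar - z_star * std_dev_of_mean
    upper = x_bar + z_star * std_dev_of_mean
    outside = mu0 < lower or mu0 > upper
    return rejects, outside


# Try several hypothesized values against the same hypothetical sample.
for mu0 in (2.99, 3.00, 3.01, 3.03, 3.05):
    print(mu0, compare_test_and_interval(3.02, 0.06, 36, mu0))
```

In every row the two answers agree. For proportions the agreement is only approximate, because the test uses p from Ho in the standard deviation while the interval uses the sample proportion.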

11.3 – Use and Abuse of Tests

Points to keep in mind when using or interpreting significance tests.

1. Choosing a Level of Significance

How small a P-value is convincing evidence against Ho? If Ho represents an assumption that people have believed for years, strong evidence will be needed (small P). What are the consequences of rejecting Ho?

2. Statistical Significance and Practical Importance

Statistical significance is not the same thing as practical importance (see ex. 11.44, pg 722). Pay attention to the actual data as well as the P-value.
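A quick calculation shows why the two ideas differ. In the sketch below, all numbers are hypothetical: a sample mean of 100.1 is tested against Ho: µ = 100 with σ = 15, once with n = 100 and once with n = 1,000,000.

```python
import math
from statistics import NormalDist


def two_sided_p_value(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided P-value for a z test of Ho: mu = mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny observed difference (0.1 units), very different sample sizes.
p_small_n = two_sided_p_value(100.1, 100, 15, 100)
p_huge_n = two_sided_p_value(100.1, 100, 15, 1_000_000)
print(p_small_n)  # large P-value: no evidence against Ho
print(p_huge_n)   # tiny P-value: "significant", yet the effect is still only 0.1
```

The observed difference is identical in both cases. Only the sample size changed, so a small P-value by itself says nothing about whether the effect is large enough to matter.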

3. Don’t Ignore Lack of Significance

There is a tendency to infer that there is no effect whenever a P-value fails to attain the usual 5% standard. In some areas of research, small effects that are detectable only with large sample sizes can be of great practical significance. When planning a study, verify that the test you plan to use has a high probability of detecting an effect of the size you hope to find.

4. Statistical Inference is not valid for all sets of data. Badly designed experiments or surveys often produce invalid results.

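5. Beware of Multiple Analyses

Statistical significance ought to mean that you have found an effect that you were looking for. If you run many tests on the same data, some results will come out significant at the 5% level by chance alone, even when every null hypothesis is true.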
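The risk can be quantified under a simplifying assumption that is not from the book: the tests are independent and every null hypothesis is true. Then the chance of at least one falsely significant result among k tests at level α is 1 – (1 – α)^k.

```python
def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests of true nulls is significant."""
    return 1 - (1 - alpha) ** k


# Assumes independent tests with every Ho true (an idealized scenario).
for k in (1, 5, 20, 100):
    print(k, round(prob_at_least_one_false_positive(k), 3))
```

With 20 such tests, the chance of at least one "significant" finding is already about 64%, so a single small P-value found by searching through many analyses is weak evidence.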

11.4 – Using Inference to Make Decisions

Type 1 error: Reject H0 when H0 is actually true.

Type 2 error: Fail to reject H0 when H0 is false.

Which error is the more serious depends upon the circumstances.

The significance level α of any fixed level test is the probability of a Type 1 error. That is, α is the probability that the test will reject the null hypothesis H0 when H0 is in fact true.
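A small simulation can make this concrete. The sketch below repeatedly draws samples from a population where H0 is true (a standard Normal population with mean 0, an arbitrary choice for the example) and counts how often a two-sided z test at α = 0.05 rejects. The sample size, number of trials, and seed are all illustrative.

```python
import math
import random
from statistics import NormalDist


def simulated_type1_rate(alpha=0.05, n=25, trials=10_000, seed=0):
    """Fraction of simulated samples, drawn with H0 true, in which the z test rejects H0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0: mu = 0 is true here; population is Normal(0, 1), sigma known.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = (sample_mean - 0.0) / (1.0 / math.sqrt(n))
        if abs(z) > z_star:
            rejections += 1  # a Type 1 error: H0 rejected although it is true
    return rejections / trials


rate = simulated_type1_rate()
print(rate)  # expected to land near alpha = 0.05
```

The simulated rejection rate should sit close to 0.05, with some run-to-run variation that depends on the seed and the number of trials.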

The probability that a fixed level significance test will reject H0 when a particular alternative value of the parameter is true is called the power of the test against that alternative.

The power of a test against any alternative is 1 minus the probability of a Type 2 error for that alternative; that is, power = 1 - β

Increasing the sample size increases the power of the test (it reduces the probability of a Type 2 error) when the significance level remains fixed. We can also increase the power of a test by using a higher significance level.
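Both effects can be seen by computing power directly for an upper-tailed z test. The sketch below borrows Larry's hypotheses (H0: µ = 26, Ha: µ > 26) but assumes a known σ = 3 mpg and a true alternative of µ = 27 mpg; neither number comes from the exercise, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def power_upper_tail_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_alt."""
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_star
    # Under mu_alt, the z statistic is centered at this shift instead of at 0.
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))
    beta = std_normal.cdf(z_star - shift)   # probability of a Type 2 error
    return 1 - beta


# Power grows with the sample size (alpha fixed at 0.05).
powers_by_n = [power_upper_tail_z(26, 27, 3, n) for n in (10, 25, 50, 100)]
print([round(p, 3) for p in powers_by_n])

# Power also grows with the significance level (n fixed at 25).
powers_by_alpha = [power_upper_tail_z(26, 27, 3, 25, a) for a in (0.01, 0.05, 0.10)]
print([round(p, 3) for p in powers_by_alpha])
```

Raising α buys power at the cost of a higher Type 1 error probability, so increasing the sample size is usually the preferred route when it is affordable.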
