Hypothesis Testing

Upload: kartikharish

Post on 20-Nov-2014


Page 1: Hypothesis Testing

Hypothesis Testing


Page 2: Hypothesis Testing

Introduction

• A statistical hypothesis is an assumption about an unknown population parameter.

• Hypothesis testing is a well-defined procedure that helps us decide objectively whether to accept or reject the hypothesis, based on the information available from the sample.

• In statistical analysis, we use the concept of probability to specify the probability level at which a researcher concludes that the observed difference between the sample statistic and the population parameter is not due to chance.

Page 3: Hypothesis Testing

Hypothesis Testing Procedure

Page 4: Hypothesis Testing

Step 1: Set the null and alternative hypotheses

Step 2: Determine the appropriate statistical test

Step 3: Set the level of significance

Step 4: Set the decision rule

Step 5: Collect the sample data

Step 6: Analyze the data

Step 7: Arrive at a statistical conclusion

Page 5: Hypothesis Testing

• The null hypothesis, denoted by H0, is the hypothesis that is tested for possible rejection under the assumption that it is true.

• H0 is usually stated as "no difference" and is considered true until the collected sample data provide sufficient evidence against it.

• The alternative hypothesis, denoted by H1 or Ha, is the logical opposite of H0.

Step 1: Set the null and alternative hypotheses

Page 6: Hypothesis Testing

1. H0: µ = µ0 versus Ha: µ ≠ µ0 (two-sided)

2. H0: µ ≥ µ0 versus Ha: µ < µ0 (one-sided)

3. H0: µ ≤ µ0 versus Ha: µ > µ0 (one-sided)

Often H0 for a one-sided test is written as H0: µ = µ0. Remember that a p-value is computed assuming H0 is true, and µ0 is the value used for that computation.
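To make the p-value remark concrete, here is a minimal Python sketch for a z statistic, assuming the test statistic is standard normal when H0 is true. The function name `p_value` and the `tail` labels are illustrative choices, not part of the slides; the only library call is `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, S.D. 1


def p_value(z: float, tail: str) -> float:
    """p-value of an observed z statistic, computed assuming H0 is true.

    tail: "two"   for Ha: mu != mu0
          "left"  for Ha: mu <  mu0
          "right" for Ha: mu >  mu0
    (the labels are illustrative, not a standard API)
    """
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'left' or 'right'")


print(round(p_value(1.96, "two"), 3))     # prints 0.05
print(round(p_value(-1.645, "left"), 3))  # prints 0.05
```

The same observed z gives different p-values depending on the form of Ha, which is why the hypotheses are fixed in Step 1, before the data are seen.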

Page 7: Hypothesis Testing

• After setting the hypotheses, the researcher has to decide on an appropriate statistical test for the analysis.

• The statistic used in the study (mean, proportion, variance, etc.) must also be considered when the researcher chooses the test, so that the test applied is the one best suited to the data.

Step 2: Determine the appropriate statistical test

Page 8: Hypothesis Testing

• The level of significance, denoted by α, is the probability of rejecting the null hypothesis when it is actually true.

• The level of significance is also known as the size of the rejection region or the size of the critical region.

• The level of significance must be determined before the sample is drawn, so that the result is free from the decision maker's bias.

• Commonly used values are 0.01, 0.05 and 0.10.

Step 3: Set the level of significance

Page 9: Hypothesis Testing

• The next step is to establish the critical region. The area under the sampling distribution of the test statistic (here, the normal curve) is divided into the acceptance region (where H0 is accepted) and the rejection region, or critical region.

• If the computed value of the test statistic falls in the acceptance region, the null hypothesis is accepted.

• Otherwise, H0 is rejected.

Step 4: Set the decision rule

Page 10: Hypothesis Testing

• In this stage, the data are collected and the appropriate sample statistics are computed.

• The first four steps should be completed before collecting the data for the study.

Step 5: Collect the sample data

Page 11: Hypothesis Testing

• In this step, the researcher computes the test statistic. This involves selecting the appropriate probability distribution for the particular test.

• For example, when the sample is small, the t-distribution is used; when the sample size is large, the Z-test is used.

• Commonly used tests include the F, t, Z and chi-square tests.

Step 6: Analyze the data

Page 12: Hypothesis Testing

• In this step, the researcher draws a conclusion. A statistical conclusion is a decision to accept or reject H0. It depends on whether the computed statistic falls in the acceptance region or the rejection region.

Step 7: Arrive at a statistical conclusion

Page 13: Hypothesis Testing

Critical Region

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.

Page 14: Hypothesis Testing

Significance Level

The significance level (denoted by α) is the probability that the test statistic will fall in the critical region when the null hypothesis is actually true. Common choices for α are 0.05, 0.01, and 0.10.

Page 15: Hypothesis Testing

Critical Value

A critical value is any value separating the critical region (where we reject H0) from the values of the test statistic that do not lead to rejection of the null hypothesis. Critical values depend on the nature of the null hypothesis, the sampling distribution that applies, and the significance level α. For example, in a right-tailed z test the critical value z = 1.645 corresponds to a significance level of α = 0.05.

Page 16: Hypothesis Testing

Two-Tailed, Right-Tailed, and Left-Tailed Tests

The tails in a distribution are the extreme regions bounded by critical values.

Page 17: Hypothesis Testing

Two-tailed Test

H0: µ = µ0

H1: µ ≠ µ0

α is divided equally between the two tails of the critical region.

"≠" means less than or greater than.

Page 18: Hypothesis Testing

Right-tailed Test

H0: µ = µ0

H1: µ > µ0 (">" points right)

Page 19: Hypothesis Testing

Left-tailed Test

H0: µ = µ0

H1: µ < µ0 ("<" points left)

Page 20: Hypothesis Testing

Conclusions in Hypothesis Testing

We always test the null hypothesis. There are two possible conclusions:

1. Reject H0

2. Fail to reject H0

Page 21: Hypothesis Testing

Accept versus Fail to Reject

Some texts use "accept the null hypothesis."

Failing to reject H0 does not prove the null hypothesis.

It only means that the sample evidence is not strong enough to warrant rejection (just as there may not be enough evidence to convict a suspect).

Page 22: Hypothesis Testing

Reject H0 if the test statistic falls within the critical region.

Fail to reject H0 if the test statistic does not fall within the critical region.

Decision Criterion
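The decision criterion can be sketched in Python for a z statistic. This is a minimal illustration assuming a standard normal test statistic; `critical_region_decision` and its `tail` labels are hypothetical names, and the critical values come from the standard library's `statistics.NormalDist.inv_cdf` rather than a printed table.

```python
from statistics import NormalDist


def critical_region_decision(z: float, alpha: float, tail: str) -> str:
    """Apply the decision criterion for a z test (illustrative helper).

    Returns "reject H0" if z falls within the critical region,
    otherwise "fail to reject H0". tail is "two", "left" or "right".
    """
    std_normal = NormalDist()
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails
        in_region = abs(z) >= z_crit
    elif tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
        in_region = z >= z_crit
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)          # negative critical value
        in_region = z <= z_crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return "reject H0" if in_region else "fail to reject H0"


print(critical_region_decision(2.5, 0.05, "two"))    # reject H0
print(critical_region_decision(1.7, 0.05, "left"))   # fail to reject H0
```

Note that the same statistic (1.7) can lie in the critical region of a right-tailed test but not of a left-tailed one, so the direction of H1 matters.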

Page 23: Hypothesis Testing

The Two Types of Errors and Their Probabilities

When the null hypothesis is true, the probability of a type 1 error, the level of significance, and the α-level are all equivalent.

When the null hypothesis is not true, a type 1 error cannot be made.

Page 24: Hypothesis Testing

Type I Error

A Type I error is the mistake of rejecting the null hypothesis when it is true.

The symbol α (alpha) is used to represent the probability of a type I error.

Page 25: Hypothesis Testing

Type II Error

A Type II error is the mistake of failing to reject the null hypothesis when it is false.

The symbol β (beta) is used to represent the probability of a type II error.

Page 26: Hypothesis Testing

Type I and Type II Errors

Page 27: Hypothesis Testing

Controlling Type I and Type II Errors

For any fixed α, an increase in the sample size n will cause a decrease in β.

For any fixed sample size n, a decrease in α will cause an increase in β. Conversely, an increase in α will cause a decrease in β.

To decrease both α and β, increase the sample size.

Page 28: Hypothesis Testing

Definition: Power of a Hypothesis Test

The power of a hypothesis test is the probability (1 - β) of rejecting a false null hypothesis. It is computed using a particular significance level and a particular value of the population parameter that is an alternative to the value assumed true in the null hypothesis.

Page 29: Hypothesis Testing

Trade-Off in Probability for Two Errors

There is an inverse relationship between the probabilities of the two types of errors: for a fixed sample size, an increase in the probability of a type 1 error leads to a decrease in the probability of a type 2 error.

Page 30: Hypothesis Testing

Type 2 Errors and Power

Three factors affect the probability of a type 2 error:

1. Sample size: a larger n reduces the probability of a type 2 error without affecting the probability of a type 1 error.

2. Level of significance: a larger α reduces the probability of a type 2 error, but only by increasing the probability of a type 1 error.

3. Actual value of the population parameter (not in the researcher's control): the farther the true value falls from the null value (in the Ha direction), the lower the probability of a type 2 error.

When the alternative hypothesis is true, the probability of making the correct decision is called the power of the test.
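To see the three factors at work, here is a minimal sketch for one specific case only: a right-tailed one-sample z test of H0: µ = µ0 with σ known, where power = 1 - Φ(zα - (µ1 - µ0)/(σ/√n)) and β = 1 - power. The function name and the numbers in the usage loop (µ0 = 100, true mean µ1 = 103, σ = 10) are hypothetical, chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu0, mu1, sigma, n, alpha):
    """Power (1 - beta) of a right-tailed one-sample z test of H0: mu = mu0
    when the true mean is mu1 and sigma is known (illustrative helper)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # right-tail critical value
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true mean in standard-error units
    return 1 - std_normal.cdf(z_alpha - shift)


# Hypothetical setting: mu0 = 100, true mean mu1 = 103, sigma = 10
for alpha in (0.01, 0.05, 0.10):
    for n in (25, 100):
        pw = power_right_tailed_z(100, 103, 10, n, alpha)
        print(f"alpha={alpha:.2f}  n={n:3d}  power={pw:.3f}  beta={1 - pw:.3f}")
```

Reading the output row by row shows factor 1 (larger n, smaller β) and factor 2 (larger α, smaller β); changing `mu1` further from 100 illustrates factor 3. When `mu1` equals `mu0`, the "power" collapses to α itself.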

Page 31: Hypothesis Testing

Z-Test (For large Samples)

• Hypothesis testing for large samples is based on the assumption that the population from which the sample is drawn is normal. As a result, the sampling distribution of the mean is also normally distributed.

• Even when the population is not normally distributed, the sampling distribution of the mean for a large sample is approximately normally distributed (central limit theorem).

Page 32: Hypothesis Testing

Formula for single population mean

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where µ = population mean, σ = population standard deviation, n = sample size and x̄ (x bar) = sample mean.
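The formula translates directly into code. This is a minimal Python sketch assuming σ is known; the function name `z_single_mean` is our own, and the numbers in the usage line are hypothetical.

```python
from math import sqrt


def z_single_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z = (x_bar - mu) / (sigma / sqrt(n)) for a single population mean."""
    standard_error = sigma / sqrt(n)  # S.D. of the sampling distribution of the mean
    return (x_bar - mu) / standard_error


# Hypothetical numbers: x_bar = 105, mu = 100, sigma = 10, n = 25
print(z_single_mean(105, 100, 10, 25))  # prints 2.5
```

The denominator σ/√n is the standard error of the mean, so z measures how many standard errors the sample mean lies from the hypothesised mean.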

Page 33: Hypothesis Testing

Example

• A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular region is Rs. 10000. Mr. X, who recently joined the firm as vice president, has expressed doubts about the accuracy of the data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a sample mean income of Rs. 11000. Assume that the population S.D. of household income is Rs. 1200. Verify Mr. X's doubts using the seven-step hypothesis testing procedure. Let α = 0.05.

Page 34: Hypothesis Testing

Solution

Step 1: Set the null and alternative hypotheses

Here the researcher is trying to verify whether there has been any change in the average household income over 10 years. H0 is set as no difference, i.e. the average household income has not changed.

H0: µ = 10000, H1: µ ≠ 10000

Step 2: Determine the statistical test

The sample size is greater than 30 and the sample mean is used as the statistic, so the Z formula is used:

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

Page 35: Hypothesis Testing

Solution

Step 3: Set the level of significance (α)

It is known as the size of the rejection region or the size of the critical region. Here it is 0.05.

Step 4: Set the decision rule

H1 shows that we have a two-tailed test (household income can be less than or more than Rs. 10000) and the level of significance is 0.05. The acceptance region covers 95% of the area, and the rejection region covers the remaining 5%, split equally between the two ends of the distribution (zα/2 = 1.96).

Page 36: Hypothesis Testing

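[Figure: standard normal curve centred at 0, with an acceptance region of area 1 - 0.05 = 0.95 (0.4750 on each side of 0) between -1.96 and +1.96, and rejection regions of area 0.025 in each tail.]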

Page 37: Hypothesis Testing

Solution

Step 5: Collect the sample data

Sample size n = 200 and sample statistic (sample mean) = 11000.

Step 6: Analyze the data

n = 200, µ = 10000, σ = 1200, x̄ = 11000

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{11000 - 10000}{1200 / \sqrt{200}} = 11.79 $$

Page 38: Hypothesis Testing

Solution

Step 7: Arrive at a statistical conclusion

The calculated value of Z is 11.79 and the tabulated value of Z is 1.96. As the calculated value is greater than the tabulated value, we reject the null hypothesis. This means Mr. X's doubts about the average household income were right.
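The arithmetic of Steps 3 to 7 can be reproduced in a few lines of Python as a check. This is a sketch using only the standard library; the critical value is computed with `statistics.NormalDist.inv_cdf` rather than read from a table, and the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-3: H0: mu = 10000, H1: mu != 10000 (two-tailed), alpha = 0.05
mu0, alpha = 10000, 0.05

# Step 4: decision rule, reject H0 if |z| exceeds z_(alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

# Step 5: sample data
n, x_bar, sigma = 200, 11000, 1200

# Step 6: test statistic
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 7: statistical conclusion
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"z = {z:.2f}, critical value = {z_crit:.2f}: {decision}")
# prints: z = 11.79, critical value = 1.96: reject H0
```

Comparing |z| rather than z with the critical value covers both tails of the two-tailed test in one condition.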

Page 39: Hypothesis Testing

Example / Case Study

• A soft drink company produces 2-liter bottles of one of its popular drinks. The quality control department is responsible for verifying that each bottle contains exactly 2 liters of soft drink. The results of a random check of 40 bottles undertaken by the quality control officer are given in the table. Use the 0.01 level of significance to test whether the bottles contain, on average, exactly 2 liters of soft drink.

Page 40: Hypothesis Testing

Bottle Sr. No. and quantity in liters:

Bottle  Liters    Bottle  Liters    Bottle  Liters    Bottle  Liters
1       1.97      11      2.00      21      2.01      31      2.01
2       1.98      12      2.01      22      2.05      32      1.99
3       1.99      13      2.02      23      2.03      33      1.97
4       2.01      14      1.99      24      2.04      34      1.96
5       2.02      15      2.00      25      2.01      35      2.02
6       2.03      16      1.97      26      1.97      36      2.03
7       2.01      17      1.98      27      1.98      37      2.04
8       1.97      18      2.03      28      1.99      38      1.98
9       1.96      19      1.98      29      1.98      39      1.99
10      2.04      20      1.99      30      2.03      40      2.01

Page 41: Hypothesis Testing

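When the population S.D. is unknown, the population S.D. (σ) is replaced by the sample S.D. (s). Then

$$ z = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

The above formula for Z is based on the assumption that the sample is drawn from an infinite population. If the population is finite, of size N, the Z formula is

$$ z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}} $$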
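As a sketch of the large-sample test with σ unknown, the code below applies the first formula (s in place of σ) to the 40 bottle measurements from the soft-drink case study at α = 0.01, treating H1: µ ≠ 2 as two-tailed (an assumption based on the word "exactly" in the question). The optional `population_size` argument applies the finite population correction from the second formula; the function name and that argument are hypothetical. `statistics.stdev` uses the n - 1 divisor for the sample S.D.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_statistic(x_bar, mu0, sd, n, population_size=None):
    """Large-sample z statistic for a single mean (illustrative helper).

    sd is sigma if known, otherwise the sample S.D. s.
    If population_size (N) is given, the standard error is multiplied by
    the finite population correction sqrt((N - n) / (N - 1)).
    """
    standard_error = sd / sqrt(n)
    if population_size is not None:
        N = population_size
        standard_error *= sqrt((N - n) / (N - 1))
    return (x_bar - mu0) / standard_error


# Quantities (liters) for bottles 1-40 in the soft-drink case study
volumes = [
    1.97, 1.98, 1.99, 2.01, 2.02, 2.03, 2.01, 1.97, 1.96, 2.04,
    2.00, 2.01, 2.02, 1.99, 2.00, 1.97, 1.98, 2.03, 1.98, 1.99,
    2.01, 2.05, 2.03, 2.04, 2.01, 1.97, 1.98, 1.99, 1.98, 2.03,
    2.01, 1.99, 1.97, 1.96, 2.02, 2.03, 2.04, 1.98, 1.99, 2.01,
]

n = len(volumes)
x_bar = mean(volumes)
s = stdev(volumes)                  # sample S.D. (n - 1 divisor) replaces sigma
z = z_statistic(x_bar, 2.0, s, n)   # H0: mu = 2 liters

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed, about 2.576
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"mean = {x_bar:.4f}, s = {s:.4f}, z = {z:.2f}: {decision}")
```

Passing, say, `population_size=400` would shrink the standard error and so enlarge |z|; the correction matters only when the sample is a sizeable fraction of a finite population.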

Page 42: Hypothesis Testing

Confidence level (1-α)%   α      One-tailed test   Two-tailed test
90%                       0.10   1.28              1.645
95%                       0.05   1.645             1.96
99%                       0.01   2.33              2.575
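The table values can be regenerated from the standard normal distribution. A minimal sketch using the standard library's `statistics.NormalDist.inv_cdf`: a one-tailed test puts all of α in one tail, while a two-tailed test puts α/2 in each. The printed values carry more decimals than the table (for example 2.326 rather than 2.33, and 2.576 rather than 2.575).

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal: mean 0, S.D. 1
critical_values = {}

for alpha in (0.10, 0.05, 0.01):
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha / 2 in each tail
    critical_values[alpha] = (one_tailed, two_tailed)
    print(f"{1 - alpha:.0%}  alpha={alpha:.2f}  "
          f"one-tailed={one_tailed:.3f}  two-tailed={two_tailed:.3f}")
```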

Page 43: Hypothesis Testing

Hypothesis Testing for a Population Proportion

In business research, information is often expressed in terms of proportions, for example the market share of a company, quality defects, or consumer preferences.

This kind of data is highly dynamic in nature, and business researchers sometimes want to test hypotheses about such proportions.

Formula

$$ z = \frac{\bar{p} - p}{\sqrt{pq / n}} $$

where n = sample size, p = population proportion, q = 1 - p, and p̄ (p bar) = sample proportion.

Page 44: Hypothesis Testing

Example

The production manager of a company that manufactures electric heaters believes that at least 10% of the heaters are defective. To test his belief, he takes a random sample of 100 heaters and finds that 12 heaters are defective. He takes the level of significance as 5% for testing the hypothesis. (Table value of Z at 5% is 1.96.)

Sol.

H0: p = 0.10

H1: p ≠ 0.10 (two-tailed test)

Example / Case Study

Page 45: Hypothesis Testing

p = population proportion = 10% = 0.10

q = 1 - p = 1 - 0.10 = 0.90

p̄ = sample proportion = 12/100 = 0.12

n = 100

$$ z = \frac{\bar{p} - p}{\sqrt{pq / n}} = \frac{0.12 - 0.10}{\sqrt{(0.10)(0.90) / 100}} = \frac{0.02}{0.03} = 0.67 $$

H0 is accepted because the calculated Z is less than the tabulated Z (0.67 < 1.96).
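A minimal Python sketch of the proportion test, applied to the heater example. The function name is our own; the 1.96 table value is taken from the example rather than computed.

```python
from math import sqrt


def z_proportion(p_bar: float, p: float, n: int) -> float:
    """z = (p_bar - p) / sqrt(p * q / n), with q = 1 - p under H0."""
    q = 1 - p
    return (p_bar - p) / sqrt(p * q / n)


# Heater example: 12 defectives in a sample of 100, H0: p = 0.10
z = z_proportion(12 / 100, 0.10, 100)
z_tab = 1.96   # two-tailed table value at 5%, as given in the example
decision = "reject H0" if abs(z) > z_tab else "accept (fail to reject) H0"
print(f"z = {z:.2f}: {decision}")
# prints: z = 0.67: accept (fail to reject) H0
```

The standard error uses the hypothesised p, not p̄, because the test statistic is computed assuming H0 is true.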

Page 46: Hypothesis Testing

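Hypothesis Testing for the Difference Between Two Means

Formula

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$

where x̄1 = mean of sample 1, x̄2 = mean of sample 2, µ1 = mean of population 1, µ2 = mean of population 2, σ1 = S.D. of population 1, σ2 = S.D. of population 2, n1 = size of sample 1 and n2 = size of sample 2.

When the population S.D.s are not given:

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

where s1 = S.D. of sample 1 and s2 = S.D. of sample 2.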

Page 47: Hypothesis Testing

The means of two single large samples of 1000 and 2000 members are 67.5 inches and 68.0 inches respectively. Can the samples be regarded as drawn from the same population of S.D. 2.5 inches? (Z value at 5% is 1.96.)

Sol.

H0: µ1 = µ2, i.e. µ1 - µ2 = 0 (the samples are from the same population)

H1: µ1 ≠ µ2, i.e. µ1 - µ2 ≠ 0

Example / Case Study

Page 48: Hypothesis Testing

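n1 = 1000, σ1 = 2.5, x̄1 = 67.5

n2 = 2000, σ2 = 2.5, x̄2 = 68.0

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{67.5 - 68.0}{\sqrt{\frac{(2.5)^2}{1000} + \frac{(2.5)^2}{2000}}} = -5.16 $$

As |Z| calculated (5.16) is greater than Z tabulated (1.96), H0 is rejected. We conclude that the samples are not drawn from the same population.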
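A minimal sketch of the two-sample z statistic, applied to the height example. The function name and the `diff0` parameter (the hypothesised µ1 - µ2, zero here) are our own; pass sample S.D.s in place of σ1 and σ2 when the population values are not given.

```python
from math import sqrt


def z_two_means(x_bar1, x_bar2, sd1, sd2, n1, n2, diff0=0.0):
    """z for the difference between two means (large samples).

    sd1, sd2: population S.D.s if known, otherwise sample S.D.s.
    diff0: hypothesised value of mu1 - mu2 (0 under H0: mu1 = mu2).
    """
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((x_bar1 - x_bar2) - diff0) / standard_error


# Height example: both samples from a population with S.D. 2.5 inches
z = z_two_means(67.5, 68.0, 2.5, 2.5, 1000, 2000)
decision = "reject H0" if abs(z) > 1.96 else "fail to reject H0"
print(f"z = {z:.2f}: {decision}")
# prints: z = -5.16: reject H0
```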

Page 49: Hypothesis Testing

Small Sample testing

When the sample size is less than 30, we use the t-test to test hypotheses about the population mean.

The t-statistic is a standardized score for measuring the difference between the sample mean and the null hypothesis value of the population mean:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} $$

This t-statistic has (approximately) a t-distribution with df = n - 1.

Page 50: Hypothesis Testing

Royal tyres has launched a new brand of tyres for tractors and claims that under normal circumstances the average life of the tyres is 40000 km. A retailer wants to test this claim and has taken a random sample of 8 tyres. The results obtained are presented in the table. Test the hypothesis at the 5% level. (t value with 7 d.f. at 5% is 2.365.)

Example / Case Study

Page 51: Hypothesis Testing

• He tests the life of the tyres under normal conditions and finds a mean life of 39750 km and a sample S.D. (s) of 2618.61 km.

Tyres Km

1 35000

2 38000

3 42000

4 41000

5 39000

6 41500

7 43000

8 38500

Page 52: Hypothesis Testing

Sol.

H0: µ = 40000

H1: µ ≠ 40000

x̄ = 39750, s = 2618.61, n = 8

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{39750 - 40000}{2618.61 / \sqrt{8}} = -0.27 $$

As |t| calculated (0.27) is less than t tabulated (2.365), we accept H0.
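The tyre example can be checked from the raw data in the table. This sketch of the one-sample t statistic uses the standard library's `statistics.stdev`, whose n - 1 divisor reproduces s = 2618.61. The standard library has no t-distribution, so the critical value 2.365 is taken from the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

tyre_km = [35000, 38000, 42000, 41000, 39000, 41500, 43000, 38500]
mu0 = 40000                 # claimed average life (H0)

n = len(tyre_km)
x_bar = mean(tyre_km)       # 39750
s = stdev(tyre_km)          # sample S.D. with n - 1 divisor, about 2618.61
t = (x_bar - mu0) / (s / sqrt(n))
df = n - 1

t_tab = 2.365               # two-tailed 5% value for 7 d.f., from the example
decision = "reject H0" if abs(t) > t_tab else "accept (fail to reject) H0"
print(f"t = {t:.2f} with {df} d.f.: {decision}")
# prints: t = -0.27 with 7 d.f.: accept (fail to reject) H0
```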

Page 53: Hypothesis Testing

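Hypothesis Testing for the Difference Between Two Population Means (Small Samples)

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \, \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

with n1 + n2 - 2 degrees of freedom.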
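The pooled-variance t statistic can be sketched as follows, assuming equal population variances as the formula does. The function name `pooled_t` and the `diff0` parameter are hypothetical, and the samples in the usage line are toy values for illustration, not data from the slides. `statistics.variance` uses the n - 1 divisor, matching s1² and s2² in the formula.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, diff0=0.0):
    """Two-sample t statistic assuming equal population variances.

    diff0 is the hypothesised mu1 - mu2. Returns (t, degrees_of_freedom),
    with degrees of freedom n1 + n2 - 2.
    """
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)   # n - 1 divisors
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = ((mean(sample1) - mean(sample2)) - diff0) / standard_error
    return t, n1 + n2 - 2


t, df = pooled_t([1, 2, 3], [4, 5, 6])   # toy inputs for illustration only
print(round(t, 3), df)                   # prints -3.674 4
```

Two degrees of freedom are lost because two sample means are estimated before the pooled variance is computed.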

Page 54: Hypothesis Testing

Example

XYZ Constructions is a leading company in the construction sector in India. It wants to construct flats in Raipur and Dehradun, the capitals of the newly formed states of Chhattisgarh and Uttarakhand, respectively. The company wants to estimate the amount that customers are willing to spend on purchasing a flat in the two states. It randomly selected 25 potential customers from Raipur and 27

Page 55: Hypothesis Testing

from Dehradun and posed the question, "How much are you willing to spend on a flat?" The data collected from the two cities are shown in the tables.

The company assumes that the purchase intention of customers is normally distributed with equal variance in the two cities. On the basis of the samples, estimate the difference in population means using a 95% confidence level.

Page 56: Hypothesis Testing

Proposed expenditure on flats by customers from Raipur (in thousand Rs.)

125 165

130 170

126 130

127 145

150 130

135 140

140 150

160 160

120 140

150 145

155 165

145

140

165

135

130

Proposed expenditure on flats by customers from Dehradun (in thousand Rs.)

185 135

165 185

160 180

170 190

180 145

190 160

170 170

150 180

155 145

160

145

150

155

160

145

140

Page 57: Hypothesis Testing

Statistical Inference about the Difference Between the Means of Two Related Populations (Matched Samples)

Page 58: Hypothesis Testing

Chi-Square Test for Categorical Data

Categorical data consist of frequency counts for the categories of one or more variables.

The chi-square test is a non-parametric test, i.e. it is used when we are not sure about the population distribution (whether it is normal or not). Statistical tests that do not require prior knowledge about the population distribution are termed non-parametric tests.

Page 59: Hypothesis Testing

Age \ Product   Mobile banking   Internet banking   Personal banking   Row total
17-27           125              175                145                445
28-35           155              180                197                532
36-44           167              210                150                527
45-57           146              156                142                444
58-70           133              156                176                465
Column total    726              877                810                2413

Preference of type of banking across different age groups

Page 60: Hypothesis Testing

Test statistic:

$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} $$

where fo = observed frequencies and fe = expected frequencies.
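Assuming the banking-preference table is meant for a chi-square test of independence between age group and type of banking (the slides do not state the hypothesis explicitly), the expected frequency of each cell is (row total × column total) / grand total, and the statistic has (r - 1)(c - 1) degrees of freedom. The sketch below uses the standard library only; the function name is hypothetical. The resulting χ² would be compared with the tabulated chi-square value for 8 d.f. (15.507 at the 5% level).

```python
# Observed counts from the banking-preference table
# rows: age groups 17-27, 28-35, 36-44, 45-57, 58-70
# columns: mobile, internet, personal banking
observed = [
    [125, 175, 145],
    [155, 180, 197],
    [167, 210, 150],
    [146, 156, 142],
    [133, 156, 176],
]


def chi_square_independence(table):
    """Return (chi2, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```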

Page 61: Hypothesis Testing

Conditions for applying chi square

• The sample should consist of at least 50 observations and should be drawn randomly from the population.

• The individual observations in a sample should be independent of each other.

• Data should not be presented in percentage or ratio form; they should be expressed in original units (counts).

• The expected frequency in each cell must be 5 or more.

Page 62: Hypothesis Testing

χ² Goodness-of-Fit Test

This test compares the observed frequencies with the theoretical (expected) frequencies to determine whether the difference between them is significant.

Q. Five coins are tossed 3200 times and the following results are obtained. Test the hypothesis that the coins are unbiased at the 5% level. (Chi-square value at 5% with 5 d.f. is 11.070.)

Page 63: Hypothesis Testing

No. of heads   0    1     2      3     4     5
Frequency      80   570   1100   900   500   50

H0: the coins are unbiased

Let p = probability of getting a head = ½.

From the binomial distribution, the expected frequencies are

$$ F(x) = N \cdot P(x), \quad \text{where } P(x) = \frac{n!}{x!\,(n - x)!} \, p^x q^{n - x} $$

Page 64: Hypothesis Testing

Here N = 3200, n = 5, p = 1/2 and q = 1 - p = 1/2.

Frequency of 0 heads:

$$ F(0) = 3200 \cdot P(0) = 3200 \cdot \frac{5!}{0!\,(5 - 0)!} \left(\tfrac{1}{2}\right)^0 \left(\tfrac{1}{2}\right)^5 = 100 $$

Similarly, F(1) = 500, F(2) = 1000, F(3) = 1000, F(4) = 500 and F(5) = 100.

Page 65: Hypothesis Testing

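No. of heads   Observed (fo)   Expected (fe)   fo - fe   (fo - fe)²   (fo - fe)²/fe
0              80              100             -20       400          4
1              570             500             70        4900         9.8
2              1100            1000            100       10000        10
3              900             1000            -100      10000        10
4              500             500             0         0            0
5              50              100             -50       2500         25
                                                                      χ² = 58.8

Since the calculated χ² (58.8) is greater than the tabulated value (11.070), H0 is rejected: the coins are biased.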
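The whole goodness-of-fit calculation, expected binomial frequencies included, fits in a short Python sketch. It uses `math.comb` (Python 3.8 or later) for n!/(x!(n - x)!); the 11.070 critical value is taken from the question, and the degrees of freedom are 6 classes - 1, since p = 1/2 is fixed by H0 rather than estimated from the data.

```python
from math import comb

observed = [80, 570, 1100, 900, 500, 50]   # frequency of 0, 1, ..., 5 heads
N, n, p = 3200, 5, 0.5                     # tosses, coins per toss, P(head) under H0
q = 1 - p

# Expected frequencies F(x) = N * C(n, x) * p**x * q**(n - x)
expected = [N * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1        # 6 classes - 1
chi2_tab = 11.070             # 5% table value for 5 d.f., from the question
decision = "reject H0" if chi2 > chi2_tab else "fail to reject H0"

print(expected)               # [100.0, 500.0, 1000.0, 1000.0, 500.0, 100.0]
print(f"chi-square = {chi2:.1f} with {df} d.f.: {decision}")
```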

Page 66: Hypothesis Testing
Page 67: Hypothesis Testing

Example

• A marketing research firm conducted a survey 4 years ago and found that the average household income of a particular region is Rs. 10050. Mr. X, who recently joined the firm as vice president, has expressed doubts about the accuracy of the data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a sample mean income of Rs. 11000. Assume that the population S.D. of household income is Rs. 1500. Verify Mr. X's doubts using the seven-step hypothesis testing procedure. Let α = 0.05.