Statistical inference: confidence intervals and hypothesis testing

Upload: julie-houston

Post on 16-Jan-2016


Page 1: Statistical inference: confidence intervals and hypothesis testing

Statistical inference: confidence intervals and hypothesis testing

Page 2: Statistical inference: confidence intervals and hypothesis testing

Objective

The objectives of this session are: inferential statistics, sampling theory, estimation and confidence intervals, and hypothesis testing.

Page 3: Statistical inference: confidence intervals and hypothesis testing

Statistical analysis

Descriptive

Calculate various types of descriptive statistics in order to summarize certain qualities of the data.

Inferential

Use information gained from the descriptive statistics of sample data to generalize to the characteristics of the whole population.

Page 4: Statistical inference: confidence intervals and hypothesis testing

Applications of inferential statistics

Two broad areas:

Estimation: create confidence intervals to estimate the true population parameter.

Hypothesis testing: test hypotheses that the population parameter lies within a specified range.

Page 5: Statistical inference: confidence intervals and hypothesis testing

Population & Sample

Notation for the population and the sample:

mean: population $\mu$, sample $\bar{X}$

standard deviation: population $\sigma$, sample $s$

Page 6: Statistical inference: confidence intervals and hypothesis testing

Sampling theory

When working with samples of data, we have to rely on sampling theory to give us the probability distribution of a particular sample statistic.

This probability distribution is known as "the sampling distribution".

Page 7: Statistical inference: confidence intervals and hypothesis testing

Sampling distributions

Assume there is a population …

Population size N = 4

Random variable, X, is the age of individuals

Values of X: 18, 20, 22, 24, measured in years (individuals A, B, C, D)

Page 8: Statistical inference: confidence intervals and hypothesis testing

Sampling distributions

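Summary measures for the population distribution:

$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 2.236$

[Figure: population distribution, P(X) plotted for A (18), B (20), C (22), D (24); vertical axis from 0 to .3]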

Page 9: Statistical inference: confidence intervals and hypothesis testing

Sampling distributions

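Summary measures of the sampling distribution (here N = 16 is the number of sample means):

$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$

$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$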
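The summary measures of the sampling distribution can be checked by brute force. The sketch below assumes the 16 sample means come from drawing samples of size n = 2 with replacement from the four ages (4 x 4 = 16 samples); the slides do not state this explicitly, and the helper names are illustrative only.

```python
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]  # ages of A, B, C, D from the slides


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def population_sd(values):
    """Standard deviation using the divisor N, as on the slides."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Assumption: samples of size n = 2 drawn with replacement (16 samples).
n = 2
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)                    # population mean
sigma = population_sd(population)        # population standard deviation
mu_xbar = mean(sample_means)             # mean of the sample means
sigma_xbar = population_sd(sample_means) # standard deviation of the sample means

print(len(sample_means), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

Under this assumption the standard deviation of the sample means equals the population standard deviation divided by the square root of n, which is the standard error.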

Page 10: Statistical inference: confidence intervals and hypothesis testing

Properties of summary measures

Sampling distribution of the sample arithmetic mean:

$\bar{X} \sim N(\mu, s^2/n)$

Standard deviation of the sampling distribution of the sample means (the standard error):

$SE = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Page 11: Statistical inference: confidence intervals and hypothesis testing

Estimation and confidence intervals

Estimation of the population parameters: point estimates, and confidence intervals (interval estimators).

Confidence intervals for: means, variances.

Large or small samples?

Page 12: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

Large samples (n >= 30): apply the Z-distribution.

[Figure: probability distribution with a central $1 - \alpha$ confidence interval and $\alpha/2$ in each tail]

Page 13: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

Large samples (n >= 30): for a normally distributed variable, 95% of the observations lie within plus or minus 1.96 standard deviations of the mean.

Page 14: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

Large samples (n >= 30): the confidence interval is given as

$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$

[Figure: probability distribution; the 95% confidence interval runs from -1.96 SE to +1.96 SE, with 2.5% in each tail]

Page 15: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

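Large samples (n >= 30): the probability statement is

$p\left(\bar{X} - 1.96 \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{s}{\sqrt{n}}\right) = 0.95$

[Figure: probability distribution; the 95% confidence interval runs from -1.96 SE to +1.96 SE, with 2.5% in each tail]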

Page 16: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

Large samples (n >= 30): thus, we can state that:

"the sample mean will lie within an interval of plus or minus 1.96 standard errors of the population mean 95% of the time"

Page 17: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

large samples (n >= 30)

Example

We have data on 60 monthly observations of the returns to the SET 100 index. The sample mean monthly return is 1.125% with a standard deviation of 2.5%. What is the 95% confidence interval for the population mean?

Page 18: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

large samples (n >= 30)

Example (cont'd)

The standard error is calculated as

$SE = \frac{2.5}{\sqrt{60}} = 0.3227$

The confidence interval would be

$1.125 - 0.6325 \le \mu \le 1.125 + 0.6325$

$0.4925 \le \mu \le 1.7575$

The probability statement would be

$p(0.4925 \le \mu \le 1.7575) = 0.95$
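The arithmetic of this example can be reproduced with Python's standard library. This is a minimal sketch; the function name `z_confidence_interval` is an illustrative choice and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample confidence interval for the population mean.

    Returns (lower, upper, standard_error).
    """
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sample_sd / sqrt(n)
    return sample_mean - z * se, sample_mean + z * se, se


# SET 100 example: 60 monthly returns, mean 1.125%, standard deviation 2.5%.
lower, upper, se = z_confidence_interval(1.125, 2.5, 60)
print(round(se, 4), round(lower, 4), round(upper, 4))
```

The bounds can differ from the slide values in the fourth decimal place, because the slides round the critical value to 1.96 and the standard error to 0.3227 before multiplying.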

Page 19: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

large samples (n >= 30)

Example (cont'd)

The probability statement would be

$p(0.4925 \le \mu \le 1.7575) = 0.95$

How does the analyst use this information?

Page 20: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

What about small samples (n < 30)? Apply the t-distribution.

[Figure: probability distribution with a central $1 - \alpha$ confidence interval and $\alpha/2$ in each tail]

Page 21: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

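What about small samples (n < 30)? Apply the t-distribution. The confidence interval becomes

$\bar{X} - t_{n-1,\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2} \frac{S}{\sqrt{n}}$

The probability statement pertaining to this confidence interval is

$p\left(\bar{X} - t_{n-1,\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$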

Page 22: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for means

Example

From 20 observations, the sample mean is calculated as 4.5%. The sample standard deviation is 5%. At the 95% level of confidence:

the confidence interval is …

the probability statement is …
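One way to work this exercise is sketched below. It assumes the critical value is read from a standard t-table (2.093 for 19 degrees of freedom with 2.5% in each tail); the function and constant names are illustrative.

```python
from math import sqrt


def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Small-sample confidence interval for the mean, given a t critical value."""
    se = sample_sd / sqrt(n)
    return sample_mean - t_crit * se, sample_mean + t_crit * se


# Assumption: table value of t for n - 1 = 19 degrees of freedom,
# 2.5% in each tail (95% confidence).
T_19_025 = 2.093

lower, upper = t_confidence_interval(4.5, 5.0, 20, T_19_025)
print(round(lower, 2), round(upper, 2))
```

The interval is wider than one built with 1.96, since the t-distribution has heavier tails for small samples.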

Page 23: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for variances

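Apply a $\chi^2$ distribution. The confidence interval is given as

$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$

The probability statement pertaining to this confidence interval is

$p\left(\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right) = 1 - \alpha$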

Page 24: Statistical inference: confidence intervals and hypothesis testing

Confidence intervals for variances

Example

From a sample of 30 monthly observations, the variance of the FTSE 100 index is 0.0225. With n - 1 = 29 degrees of freedom (leaving 2.5% in each tail):

the confidence interval is …

the probability statement is …
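A possible working of this exercise is sketched below. It assumes chi-square critical values taken from standard tables for 29 degrees of freedom (45.722 leaving 2.5% in the upper tail, 16.047 leaving 2.5% in the lower tail); the names used are illustrative.

```python
def variance_confidence_interval(sample_var, n, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: critical value leaving alpha/2 in the upper tail
    chi2_lower: critical value leaving alpha/2 in the lower tail
    """
    scaled = (n - 1) * sample_var
    # Dividing by the larger critical value gives the lower bound.
    return scaled / chi2_upper, scaled / chi2_lower


# Assumption: table values for 29 degrees of freedom at 95% confidence.
CHI2_29_UPPER = 45.722
CHI2_29_LOWER = 16.047

lower, upper = variance_confidence_interval(0.0225, 30, CHI2_29_UPPER, CHI2_29_LOWER)
print(round(lower, 5), round(upper, 5))
```

Note that the interval is not symmetric around the sample variance, because the chi-square distribution is skewed.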

Page 25: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing

Two broad approaches: the classical approach and the p-value approach.

A hypothesis is an assumption about the value of a population parameter of the probability distribution under consideration.

Page 26: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing

When testing, two hypotheses are established: the null hypothesis and the alternative hypothesis.

The exact formulation of the hypotheses depends upon what we are trying to establish.

e.g. we wish to know whether or not a population parameter, $\mu$, has a value of $\mu_0$:

$H_0: \mu = \mu_0$

$H_1: \mu \ne \mu_0$

Page 27: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing

What if we wish to know whether or not a population parameter, $\mu$, is greater than a given figure, $\mu_0$? The hypotheses would then be …

And if we wish to know whether or not a population parameter is less than a given figure, $\mu_0$, the hypotheses would then be …

Page 28: Statistical inference: confidence intervals and hypothesis testing

The standardized test statistic

In hypothesis testing we have to standardize the test statistic so that a meaningful comparison can be made with:

the standard normal distribution (z-distribution) or the t-distribution, for tests of the MEAN

the $\chi^2$ distribution, for tests of the VARIANCE

The hypothesis test may be a one-tailed test or a two-tailed test.

Page 29: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Two-tailed test of the mean

Set up the hypotheses as

$H_0: \mu = \mu_0$

$H_1: \mu \ne \mu_0$

Decide on the level of significance for the test (10%, 5%, 1% level etc.), which places 5%, 2.5%, 0.5% in each tail.

Set the value of $\mu_0$ in the null hypothesis.

Identify the appropriate critical value of z (or t) from the tables, reflecting the percentages in the tails according to the level of significance chosen.

Page 30: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Two-tailed test of the mean. Apply the following decision rule:

Accept $H_0$ if $-z \le \frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} \le +z$

Reject $H_0$ otherwise

Page 31: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Example

Consider a test of whether or not the mean of a portfolio manager's monthly returns of 2.3% is statistically significantly different from the industry average of 2.4% (from 36 observations with a standard deviation of 1.7%).
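The two-tailed decision rule can be applied to this example as sketched below. The 5% significance level is an assumption, since the slide does not state one, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized test statistic (X-bar - mu0) / sqrt(s^2 / n)."""
    return (sample_mean - mu0) / sqrt(sample_sd ** 2 / n)


alpha = 0.05  # assumed level of significance; not given on the slide

# Portfolio manager example: mean 2.3%, industry average 2.4%, s = 1.7%, n = 36.
z = z_statistic(2.3, 2.4, 1.7, 36)

# Two-tailed critical value: alpha / 2 in each tail (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 only if the statistic falls outside [-z_crit, +z_crit].
reject_h0 = abs(z) > z_crit
print(round(z, 4), round(z_crit, 2), reject_h0)
```

With these inputs the statistic lies well inside the acceptance region, so the difference from the industry average is not statistically significant at the assumed level.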

Page 32: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Example

An analyst claims that the average annual rate of return generated by a technical stock selection service is 15% and recommends that his firm use the service as an input for its research product. The analyst's supervisor is skeptical of this claim and decides to test its accuracy by randomly selecting 16 stocks covered by the service and computing the rate of return that would have been earned by following the service's recommendations with regard to them over the previous 10-year period. The results of this sample are as follows:

The average annual rate of return produced by following the service's advice on the 16 sample stocks over the past 10 years was 11%.

The standard deviation in these sample results was 9%.

Determine whether the analyst's claim should be accepted or rejected at the 5% level of significance.
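Because the sample has only 16 observations, a t-test is the natural choice here. The sketch below assumes a two-tailed test of the claim that the mean is 15%, and a critical value read from a standard t-table (2.131 for 15 degrees of freedom with 2.5% in each tail); the names are illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized test statistic (X-bar - mu0) / sqrt(s^2 / n)."""
    return (sample_mean - mu0) / sqrt(sample_sd ** 2 / n)


# Assumption: table value of t for n - 1 = 15 degrees of freedom,
# 2.5% in each tail (5% level of significance, two-tailed).
T_15_025 = 2.131

# Stock selection service: claimed mean 15%, sample mean 11%, s = 9%, n = 16.
t = t_statistic(11, 15, 9, 16)

# Reject H0 (mean = 15%) only if |t| exceeds the critical value.
reject_claim = abs(t) > T_15_025
print(round(t, 3), reject_claim)
```

Under these assumptions the statistic falls inside the acceptance region, so the sample does not give enough evidence to reject the claim at the 5% level.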

Page 33: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

One-tailed test of the mean (right-tailed test). Set up the hypotheses as

$H_0: \mu \le \mu_0$

$H_1: \mu > \mu_0$

Apply the following decision rule:

Accept $H_0$ if $\frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} \le z$

Reject $H_0$ if $\frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} > z$

Page 34: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Example

We wish to test that the mean monthly return on the FTSE 100 index for a given period is more than 1.2%. From 60 observations we calculate the mean as 1.25% and the standard deviation as 2.5%.

Page 35: Statistical inference: confidence intervals and hypothesis testing

Hypothesis test of the population mean

Example

We wish to test that the mean monthly return on the S&P 500 index is less than 1.30%. Assume also that the mean return from 75 observations is 1.18%, with a standard deviation of 2.2%.
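The two one-tailed examples (the right-tailed FTSE 100 test and the left-tailed S&P 500 test) can be sketched together. The 5% significance level is an assumption, since neither slide states one, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized test statistic (X-bar - mu0) / sqrt(s^2 / n)."""
    return (sample_mean - mu0) / sqrt(sample_sd ** 2 / n)


alpha = 0.05  # assumed level of significance; not given on the slides

# One-tailed critical value: all of alpha in a single tail (about 1.645).
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Right-tailed test (FTSE 100): H0: mu <= 1.2, H1: mu > 1.2.
z_right = z_statistic(1.25, 1.2, 2.5, 60)
reject_right = z_right > z_alpha

# Left-tailed test (S&P 500): H0: mu >= 1.30, H1: mu < 1.30.
z_left = z_statistic(1.18, 1.30, 2.2, 75)
reject_left = z_left < -z_alpha

print(round(z_right, 4), reject_right)
print(round(z_left, 4), reject_left)
```

In both cases the statistic does not reach the critical value, so under the assumed level neither null hypothesis is rejected.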

Page 36: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing of the variance

Two-tailed test. Apply the following decision rule:

Accept $H_0$ if $\chi^2_{(1-(\alpha/2))} \le \frac{(n-1)s^2}{\sigma_0^2} \le \chi^2_{(\alpha/2)}$

Reject $H_0$ otherwise

One-tailed test. Apply the following decision rule:

Accept $H_0$ if $\frac{(n-1)s^2}{\sigma_0^2} \ge \chi^2_{(1-\alpha)}$

Reject $H_0$ if $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{(1-\alpha)}$

Left- or right-tailed test?

How about the other?

Page 37: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing of the variance

Two-tailed test. The standardized test statistic for the population variance is

$\frac{(n-1)s^2}{\sigma_0^2}$

This standardized test statistic has a $\chi^2$ distribution.

Page 38: Statistical inference: confidence intervals and hypothesis testing

Hypothesis testing of the variance

Example

We wish to test whether the variance of share B is below 25. The sample variance is 23 and the number of observations is 40.
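The test statistic for this example can be computed as sketched below. The critical value is deliberately left as a parameter to be read from chi-square tables for n - 1 = 39 degrees of freedom; both function names are illustrative.

```python
def chi2_statistic(sample_var, sigma0_sq, n):
    """Standardized test statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * sample_var / sigma0_sq


def reject_left_tailed(statistic, lower_critical_value):
    """Left-tailed rule: reject H0 if the statistic is below the lower critical value."""
    return statistic < lower_critical_value


# Share B example: sample variance 23, hypothesized variance 25, n = 40.
stat = chi2_statistic(23, 25, 40)
print(round(stat, 2))
```

The resulting statistic is then compared with the lower-tail critical value for the chosen level of significance, for instance via `reject_left_tailed(stat, critical_value)`.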

Page 39: Statistical inference: confidence intervals and hypothesis testing

The p-value method of hypothesis testing

The p-value is the lowest level of significance at which the null hypothesis is rejected.

If the p-value ≥ the level of significance (α): accept the null hypothesis.

If the p-value < the level of significance (α): reject the null hypothesis.

That is: accept $H_0$ if p-value $\ge \alpha$; reject $H_0$ otherwise.

Page 40: Statistical inference: confidence intervals and hypothesis testing

Calculating the p-value

Suppose we wish to find an investment that gives at least 13.2%. Assume that the mean annualized monthly return of a given bond index is 14.4% and the sample standard deviation of those returns is 2.915%; there were 30 observations and the returns are normally distributed.

Page 41: Statistical inference: confidence intervals and hypothesis testing

Calculating the p-value

The hypotheses are:

$H_0: \mu \le 13.2$

$H_1: \mu > 13.2$

The test statistic is:

$\frac{\bar{X} - \mu_0}{\sqrt{s^2/n}} = \frac{14.4 - 13.2}{2.915/\sqrt{30}} = 2.255$

With 29 degrees of freedom:

a t-value of 2.045 leaves 2.5% in the tail

a t-value of 2.462 leaves 1% in the tail

Page 42: Statistical inference: confidence intervals and hypothesis testing

Calculating the p-value

Calculate the p-value by interpolation:

$\frac{2.255 - 2.045}{2.462 - 2.045} = \frac{0.21}{0.417} = 0.50$

P-value = 0.025 – (0.50 x (0.025 – 0.01)) = 0.0175 = 1.75%

P-value (1.75%) < α (5%), thus reject the null hypothesis.
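The interpolation can be written as a small function. This is a sketch of linear interpolation between the two tabulated t-values quoted in the slides; the function name is an illustrative choice.

```python
from math import sqrt


def interpolate_p_value(t_stat, t_low, p_low, t_high, p_high):
    """Linearly interpolate a tail probability between two tabulated t-values.

    t_low < t_stat < t_high, with tail areas p_low > p_high.
    """
    fraction = (t_stat - t_low) / (t_high - t_low)
    return p_low - fraction * (p_low - p_high)


# Bond index example: mean 14.4%, hypothesized 13.2%, s = 2.915%, n = 30.
t_stat = (14.4 - 13.2) / (2.915 / sqrt(30))

# Tabulated values for 29 degrees of freedom: 2.045 (2.5% tail), 2.462 (1% tail).
p_value = interpolate_p_value(t_stat, 2.045, 0.025, 2.462, 0.01)

alpha = 0.05
print(round(t_stat, 3), round(p_value, 4), p_value < alpha)
```

Linear interpolation only approximates the true tail area, but it is enough to show that the p-value is well below the 5% level of significance.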

Page 43: Statistical inference: confidence intervals and hypothesis testing

Conclusion

Meaning of statistical inference

Sampling theory

Application of statistical inference:

Estimation: confidence intervals

Hypothesis testing: two-tailed and one-tailed

For means: Z-distribution, t-distribution

For variance: $\chi^2$-distribution

Page 44: Statistical inference: confidence intervals and hypothesis testing

Conclusion

The appropriate reliability factor for determining confidence intervals for a population mean, under the following circumstances:

1. The data in the population are normally distributed with a known standard deviation.

Z-value

2. The data in the population are normally distributed, their standard deviation is unknown, but can be estimated from sample data.

t-value. However, a Z-value can be used as an approximation of the t-value if the sample is large.

3. The data in the population are not normally distributed, their standard deviation is known, and the sample size is large.

Z-value

4. The data in the population are not normally distributed, their standard deviation is known, and the sample size is small.

No good reliability factor exists