Chapter 4: Making Statistical Inferences from Samples




4.1 Introduction

4.2 Basic univariate inferential statistics

4.3 ANOVA test for multi-samples

4.4 Tests of significance of multivariate data

4.5 Non-parametric methods

4.6 Bayesian inferences

4.7 Sampling methods

4.8 Resampling methods


4.1 Introduction

The primary reasons for resorting to sampling rather than measuring the whole population are:
- to reduce expense,
- to make quick decisions (say, in the case of a production process),
- often it is impossible to do otherwise.

Random sampling, the most common form of sampling, involves selecting samples from the population in a random manner (the samples should be independent so as to avoid bias, which is not as simple as it sounds).

Inferences about the population, usually involving descriptive measures such as the mean value or the standard deviation, are made using estimators: mathematical expressions applied to sample data in order to deduce an estimate of the true population parameter.


Fig. 4.13 Overview of the various types of parametric hypothesis tests treated in this chapter, along with section numbers. The lower set of three sections (4.5.1-4.5.3) treats non-parametric tests.

[Figure: a tree of hypothesis tests, laid out approximately as follows. One sample: mean/proportion (4.2.2/4.2.4(a)), variance (4.2.5(a)), probability distribution (4.2.6), correlation coefficient (4.2.7). Two samples: mean/proportion (4.2.3(a), 4.2.3(b)/4.2.4(b)), variance (4.2.5(b)). Multi-sample mean: ANOVA (4.3). Multivariate mean: Hotelling T² (4.4.2). Non-parametric tests: 4.5.1, 4.5.2, 4.5.3.]

Two types of tests: parametric and non-parametric.


4.2 Basic Univariate Inferential Statistics

4.2.1(a) Sampling distribution of the mean

Consider a population from which many random samples are taken.

What can one say about the distribution of the sample estimators?

Let $\mu$ and $\bar{x}$ be the population mean and sample mean respectively, and let $\sigma$ and $s_x$ be the population and sample standard deviations.

Then, regardless of the shape of the population frequency distribution:

$$E(\bar{x}) = \mu \qquad (4.1)$$

and the standard deviation of the sample mean (SE, or standard error of the mean) is:

$$SE(\bar{x}) = \frac{\sigma}{n^{1/2}} \qquad (4.2)$$

where n is the sample size. Use the sample standard deviation $s_x$ if the population standard deviation $\sigma$ is not known.


Fig. 4.1 Illustration of the Central Limit Theorem. The sampling distribution of $\bar{x}$ contrasted with the parent population distribution for three cases with different parent distributions: as sample size increases, the sampling distribution gets closer to a normal distribution (and the standard error of the mean decreases).


If a ball bounces to the right k times on its way down (and to the left on the remaining pins), it ends up in the kth bin counting from the left. Denoting the number of rows of pins in a bean machine by n, the number of paths to the kth bin on the bottom is given by the binomial coefficient $\binom{n}{k}$. If the probability of bouncing right on a pin is p (which equals 0.5 on an unbiased machine), the probability that the ball ends up in the kth bin equals $\binom{n}{k}p^k(1-p)^{n-k}$, which is the probability mass function of a binomial distribution. According to the central limit theorem, the binomial distribution approximates the normal distribution provided that n, the number of rows of pins in the machine, is large.

Galton’s Boards (1889)

The machine consists of a vertical board with interleaved rows of pins. Balls are dropped from the top, and bounce left and right as they hit the pins. Eventually, they are collected into one-ball-wide bins at the bottom. The height of the ball columns in the bins approximates a bell curve.
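The bean machine is easy to check numerically. The following minimal Python sketch (numpy assumed available; the row count, ball count and seed are arbitrary illustration choices) draws the bin index of each ball as a Binomial(n, p) variable and compares the empirical mean and spread with the binomial values that the normal curve approximates:

```python
import numpy as np

# Each ball bounces right with probability p at each of n_rows pins,
# so its final bin index is a Binomial(n_rows, p) random variable.
rng = np.random.default_rng(0)
n_rows, p, n_balls = 12, 0.5, 100_000

bins = rng.binomial(n_rows, p, size=n_balls)
counts = np.bincount(bins, minlength=n_rows + 1)
print("bin counts:", counts)

# Normal approximation suggested by the CLT: N(n*p, n*p*(1-p))
mean, std = n_rows * p, (n_rows * p * (1 - p)) ** 0.5
print(f"empirical mean {bins.mean():.3f} vs n*p = {mean:.3f}")
print(f"empirical std  {bins.std():.3f} vs sqrt(np(1-p)) = {std:.3f}")
```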


4.2.1(b) Confidence limits for the mean

Instead of considering the behavior of many samples all taken from one population, what can one say based on only one large random sample?

This process is called inductive reasoning, or arguing backwards from a set of observations to a reasonable hypothesis.

However, the benefit of having to measure only a sample of the population comes at a price: one has to accept some uncertainty in the estimates.

Based on a sample taken from a population:

• one can deduce interval bounds of the population mean at a specified confidence level

• one can test whether the sample mean differs from the presumed population mean



The two-tailed confidence interval of the population mean:

$$\mu = \bar{x} \pm z_{\alpha/2}\,\frac{s_x}{n^{1/2}} \qquad (4.5)$$

This formula is valid for any shape of the population distribution provided, of course, that the sample is large (say, n > 30).

The half-width of the 95% CL interval, $1.96\,s_x/n^{1/2}$, is the bound of the error of estimation.

For small samples (n < 30), instead of the variable z, use the Student-t variable.

Eq. 4.5 corresponds to the long-run bounds, i.e., in the long run roughly 95% of the intervals will contain $\mu$.

Prediction of a single x value:

$$x = \bar{x} \pm t_{c/2}\, s_x \left(1 + \frac{1}{n}\right)^{1/2} \qquad (4.6)$$

where $t_{c/2}$ is the two-tailed critical value at d.f. = n - 1 at the desired CL.
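As a numerical companion, here is a minimal Python sketch (scipy assumed available) that evaluates eq. 4.5 with both the z and Student-t critical values, and the prediction interval of eq. 4.6; the summary statistics n = 36, mean 16 and standard deviation 2 are simply carried over from Example 4.2.1 below:

```python
from scipy import stats

n, xbar, s, alpha = 36, 16.0, 2.0, 0.05

# Eq. 4.5: large-sample (z-based) CI for the population mean
z = stats.norm.ppf(1 - alpha / 2)              # 1.96
ci_z = (xbar - z * s / n**0.5, xbar + z * s / n**0.5)

# Small-sample version: replace z by Student-t with d.f. = n - 1
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_t = (xbar - t * s / n**0.5, xbar + t * s / n**0.5)

# Eq. 4.6: two-tailed prediction interval for a single future x
pi = (xbar - t * s * (1 + 1 / n) ** 0.5,
      xbar + t * s * (1 + 1 / n) ** 0.5)

print(ci_z, ci_t, pi, sep="\n")
```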


Example 4.2.1: Evaluating manufacturer quoted lifetime of light bulbs from sample data

A manufacturer claims that the distribution of the lifetimes of his best model has a mean = 16 years and standard deviation = 2 years when the bulbs are lit for 12 hours every day. Suppose that a city official wants to check the claim by purchasing a sample of 36 of these bulbs and subjecting them to tests that determine their lifetimes.


(i) Assuming the manufacturer’s claim to be true, describe the sampling distribution of the mean lifetime of a sample of 36 bulbs. Even though the shape of the distribution is unknown, the Central Limit Theorem suggests that the normal distribution can be used:

$\mu_{\bar{x}} = \mu = 16$ years and $s_{\bar{x}} = 2/\sqrt{36} = 0.33$ years.


Fig. 4.2 Sampling distribution of $\bar{x}$ for a normal distribution N(16, 0.33). The shaded area represents the probability of the mean life of the bulb being < 15 years.

(ii) What is the probability that the sample purchased by the city officials has a mean lifetime of 15 years or less?

The normal distribution N (16, 0.33) is drawn and the darker shaded area to the left of x=15 provides the probability of the city official observing a mean life of 15 years or less.

Next, the standard normal statistic is computed as:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{15 - 16}{2/\sqrt{36}} = -3.0$$

This probability or p-value can be read off from Table A3 as $p(z \le -3.0) = 0.0013$. Consequently, the probability that the consumer group will observe a sample mean of 15 or less is only 0.13%.
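The table lookup is reproduced by the standard normal CDF; a minimal scipy sketch:

```python
from scipy import stats

mu, sigma, n = 16.0, 2.0, 36        # claimed population values
z = (15.0 - mu) / (sigma / n**0.5)  # -3.0
p = stats.norm.cdf(z)               # left-tail probability
print(f"z = {z:.1f}, p = {p:.4f}")  # p = 0.0013
```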


(c) If the manufacturer's claim is correct, compute the one-tailed 95% prediction interval of a single bulb from the sample of 36 bulbs.

From the t-tables (Table A4), the critical value is $t_c = 1.70$ for d.f. = 36 - 1 = 35 and CL = 95%, corresponding to the one-tailed distribution.

The 95% prediction value of x is:

$$x = \bar{x} - t_c\, s_x\left(1 + \frac{1}{36}\right)^{1/2} = 16 - (1.70)(2)\left(1 + \frac{1}{36}\right)^{1/2} = 12.6 \text{ years}$$


4.2.2 Hypothesis Tests for Single Sample Mean

During hypothesis testing, the intent is to decide which of two competing claims is true.

For example, one wishes to support the hypothesis that women live longer than men.

Samples are taken from each of the two populations, and a statistical inference test is performed to support (or refute) this claim.

Since some uncertainty is bound to be associated with such a procedure, one can only be confident of the result to a degree that can be stated as a probability. If this probability value is higher than a pre-selected threshold probability, set by the significance level of the test, then one would conclude that women do live longer than men; otherwise, one would have to accept that the test was inconclusive.


Once a sample is drawn, the following steps are performed:

• formulate the hypotheses: the null or status quo, and the alternate (which are complementary)
• select a confidence level and the corresponding significance level (say, 0.01 or 0.05)
• identify a test statistic (a random variable) that will be used to assess the evidence against the null hypothesis
• determine the critical or threshold value of the test statistic from probability tables
• compute the test statistic for the problem at hand
• reject the null hypothesis, and accept the alternate hypothesis, only if the absolute value of the computed statistic is greater than the critical statistic


Fig. 4.4 Illustration of critical cutoff values for one-tailed and two-tailed tests assuming the normal distribution. The shaded areas represent the probability values corresponding to the 95% CL or 0.05 significance level: a single tail of p = 0.05 with critical value -1.645 for the one-tailed test, and two tails of p = 0.025 each with critical values ±1.96 for the two-tailed test. The critical values shown can be determined from Table A3.

Be careful that you select the appropriate significance level when a confidence level is stipulated


Example 4.2.2. Evaluating whether a new type of light bulb has longer life

Traditional light bulbs have:

mean life = 1200 hours and standard deviation = 300 hours.

The goal is to compare this life against that of a new type of light bulb.

Use the classical test and define two hypotheses:
• The null hypothesis, which represents the status quo, i.e., that the new process is no better than the previous one: $H_0: \mu = 1200$ hours
• The research or alternative hypothesis $H_a$, the premise that $\mu > 1200$ hours

Say the sample size n = 100 and the significance or error level of the test is $\alpha = 0.05$.

Use a one-tailed test (since the new bulb manufacturing process should have a longer life, not just a life different from that of the traditional process).


The mean life $\bar{x}$ of the sample of n = 100 bulbs can be assumed to be normally distributed with mean 1200 and standard error $\sigma/\sqrt{n} = 300/\sqrt{100} = 30$.

From the standard normal table (Table A3), the one-tailed critical z-value is $z_{0.05} = 1.64$, which leads to a critical mean value:

$$\bar{x}_c = 1200 + 1.64 \times 300/(100)^{1/2} = 1249$$

Suppose testing of the 100 bulbs yields a value of $\bar{x} = 1260$. As $\bar{x} > \bar{x}_c$, one would reject the null hypothesis at the 0.05 significance (or error) level.

This is akin to jury trials, where the null hypothesis is taken to be that the accused is innocent: the burden of proof during hypothesis testing is on the alternate hypothesis.

Hence, two types of errors can be distinguished:

• Concluding that the null hypothesis is false when in fact it is true is called a Type I error, and $\alpha$ represents the probability (i.e., the pre-selected significance level) of erroneously rejecting the null hypothesis. This is also called the "false positive" or "false alarm" rate.

• The flip side, i.e., concluding that the null hypothesis is true when in fact it is false, is called a Type II error, and $\beta$ represents the probability of erroneously failing to reject the null hypothesis, also called the "false negative" rate.
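A minimal Python sketch of this one-sample z-test (scipy assumed), showing both the critical-value route used above and the equivalent p-value route:

```python
from scipy import stats

mu0, sigma, n, alpha = 1200.0, 300.0, 100, 0.05
se = sigma / n**0.5                              # standard error = 30

# Critical-value route (as on the slide)
x_crit = mu0 + stats.norm.ppf(1 - alpha) * se    # ~1249
xbar = 1260.0
print(f"critical mean = {x_crit:.0f}, reject H0: {xbar > x_crit}")

# Equivalent p-value route: one-tailed tail probability of z
p = stats.norm.sf((xbar - mu0) / se)
print(f"p-value = {p:.4f} (reject H0 since p < {alpha})")
```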


Fig. 4.3 The two kinds of error that occur in a classical test. (a) If $H_0$ is true, then the significance level $\alpha$ = probability of erring (rejecting the true hypothesis $H_0$). (b) If $H_a$ is true, then $\beta$ = probability of erring (judging that the false hypothesis $H_0$ is acceptable). The numerical values correspond to data from Example 4.2.2.

[Figure: two normal densities, N(1200, 30) and N(1260, 30), separated by the critical value into "Accept H0" and "Reject H0" regions. One shaded area represents the probability of falsely rejecting the null hypothesis (Type I error); the other represents the probability of falsely accepting the null hypothesis (Type II error).]


4.2.3 Two Independent Samples and Paired Difference Tests

(a1) Two independent sample test for evaluating the means of two independent random samples from the two populations under consideration whose variances are unknown and unequal (but reasonably close)

Test statistic:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{1/2}} \qquad (4.7)$$

For large samples, the confidence intervals of the difference in the population means can be determined as:

$$(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm z_c \cdot SE(\bar{x}_1, \bar{x}_2) \quad \text{where } SE(\bar{x}_1, \bar{x}_2) = \left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^{1/2} \qquad (4.8)$$

For smaller sample sizes, the z standardized variable is replaced with the Student-t variable. The critical values are found from the Student-t tables with degrees of freedom d.f. = $n_1 + n_2 - 2$.


Fig. 4.5 Conceptual illustration of four characteristic cases (a)-(d) that may arise during two-sample testing of medians. The box and whisker plots provide some indication as to the variability in the results of the tests.

- Case (a) clearly indicates that the samples are very different, while the opposite applies to case (d).
- However, it is more difficult to draw conclusions from cases (b) and (c), and it is in such cases that statistical tests are useful.


Example 4.2.3. Verifying savings from home energy conservation measures

Certain electric utilities fund contractors to weather strip residences to conserve energy.

Suppose an electric utility wishes to determine the cost-effectiveness of their weather-stripping program by comparing the annual electric energy use of 200 similar residences in a given community

Samples collected from both types of residences yield:

- Control sample: mean = 18,750 ; s1 = 3,200 and n1 = 100.

- Weather-stripped sample: mean = 15,150 ; s2 = 2,700 and n2 = 100.

The mean difference $(\bar{x}_1 - \bar{x}_2) = 18{,}750 - 15{,}150 = 3{,}600$, i.e., the mean saving in each weather-stripped residence is 19.2% (= 3600/18750).

However, there is an uncertainty associated with this mean value.

At the 95% CL, corresponding to a significance level $\alpha = 0.05$ for a one-tailed distribution, $z_c = 1.645$ from Table A3, and from eq. 4.8:

$$(\mu_1 - \mu_2) = (18{,}750 - 15{,}150) \pm 1.645\left(\frac{s_1^2}{100} + \frac{s_2^2}{100}\right)^{1/2}$$


The confidence interval is approximately:

$$3600 \pm 1.645\left(\frac{3200^2}{100} + \frac{2700^2}{100}\right)^{1/2} = 3600 \pm 689 = (2{,}911,\ 4{,}289)$$

These intervals represent the lower and upper values of saved energy at the 95% CL.

To conclude, one can state that the savings are positive, i.e., one can be 95% confident that there is an energy benefit in weather-stripping the homes. More specifically, the mean saving is 19.2% of the baseline value, with an uncertainty of 19.1% (= 689/3600) in the savings at the 95% CL.

Thus, the uncertainty in the savings estimate is as large as the estimate itself, which casts doubt on the efficacy of the conservation program.

This example reflects a realistic concern in that energy savings in homes from energy conservation measures are often difficult to verify accurately.
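The arithmetic of this interval can be checked with a few lines of plain Python:

```python
# Two-sample CI for the difference in means (eq. 4.8), z_c = 1.645
s1, s2, n1, n2 = 3200.0, 2700.0, 100, 100
diff = 18750.0 - 15150.0
half = 1.645 * (s1**2 / n1 + s2**2 / n2) ** 0.5
print(f"{diff:.0f} +/- {half:.0f} -> ({diff - half:.0f}, {diff + half:.0f})")
# prints: 3600 +/- 689 -> (2911, 4289)
```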


4.2.3 Two Independent Samples and Paired Difference Tests (contd.)

(a2) The "pooled variance" approach is also used when the samples are small and the variances of both populations are close. Here, instead of using the individual standard deviation values $s_1$ and $s_2$, a new quantity called the pooled variance $s_p^2$ is used:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \quad \text{with d.f.} = n_1 + n_2 - 2$$

The pooled variance is the weighted average of the two sample variances.

The pooled variance approach is said to result in tighter confidence intervals, and hence its appeal. However, several authors discourage its use.

The confidence interval of the difference in the population means is:

$$(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_c \cdot SE(\bar{x}_1, \bar{x}_2) \quad \text{where } SE(\bar{x}_1, \bar{x}_2) = \left[s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{1/2}$$


Example 4.2.4. Comparing energy use of two similar buildings based on utility bills- the wrong way

Buildings which are designed according to certain performance standards are eligible for recognition as energy-efficient buildings by federal and certification agencies. A recently completed building (B2) was awarded such an honor.

The federal inspector, however, denied the request of another owner of an identical building (B1) close by who claimed that the differences in energy use between both buildings were within statistical error.

An energy consultant was hired by the owner to prove that B1 is as energy efficient as B2. He chose to compare the monthly mean utility bills over a year between the two commercial buildings based on the data recorded over the same 12 months and listed in Table 4.1.


Table 4.1

Month | Building B1 utility cost ($) | Building B2 utility cost ($) | Difference in costs (B1-B2) | Outdoor temperature (°C)
1 | 693 | 639 | 54 | 3.5
2 | 759 | 678 | 81 | 4.7
3 | 1005 | 918 | 87 | 9.2
4 | 1074 | 999 | 75 | 10.4
5 | 1449 | 1302 | 147 | 17.3
6 | 1932 | 1827 | 105 | 26.0
7 | 2106 | 2049 | 57 | 29.2
8 | 2073 | 1971 | 102 | 28.6
9 | 1905 | 1782 | 123 | 25.5
10 | 1338 | 1281 | 57 | 15.2
11 | 981 | 933 | 48 | 8.7
12 | 873 | 825 | 48 | 6.8
Mean | 1,349 | 1,267 | 82 |
Std. deviation | 530.07 | 516.03 | 32.00 |

Null hypothesis: the mean monthly utility charges for the two buildings are equal. Since the sample sizes are less than 30, the t-statistic has to be used.

Pooled variance:

$$s_p^2 = \frac{(12-1)(530.07)^2 + (12-1)(516.03)^2}{12 + 12 - 2} = 273{,}630.6$$

and the t-statistic:

$$t = \frac{(1349 - 1267) - 0}{\left[273{,}630.6\left(\frac{1}{12} + \frac{1}{12}\right)\right]^{1/2}} = \frac{82}{213.54} = 0.38$$

The one-tailed critical value is 1.321 for CL = 90% and d.f. = 12 + 12 - 2 = 22: the null hypothesis cannot be rejected.
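The same pooled-variance test is a one-liner with scipy (the data repeat Table 4.1; equal_var=True reproduces the consultant's pooling, and the reported p-value is two-tailed):

```python
from scipy import stats

b1 = [693, 759, 1005, 1074, 1449, 1932, 2106, 2073, 1905, 1338, 981, 873]
b2 = [639, 678, 918, 999, 1302, 1827, 2049, 1971, 1782, 1281, 933, 825]

# Independent-samples t-test with pooled variance (the "wrong way" here)
t, p = stats.ttest_ind(b1, b2, equal_var=True)
print(f"t = {t:.2f}, two-tailed p = {p:.2f}")    # t ~ 0.38
```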


Fig. 4.6 Variation of the utility bills for the two buildings B1 and B2 (Example 4.2.5).

[Figure: monthly utility bills ($/month) of B1, B2, and their difference plotted against month of year (1-12); the two buildings' bills rise and fall together, peaking in the summer months.]

There is, however, a problem with the way the energy consultant performed the test. Looking at the figure would lead one not only to suspect that this conclusion is erroneous, but also to observe that the utility bills of the two buildings tend to rise and fall together because of seasonal variations in the climate. Hence the condition that the two samples be independent is violated. It is in such circumstances that a paired test is relevant.


Example 4.2.5. Comparing energy use of two similar buildings based on utility bills- the right way

Here, the test is meant to determine whether the monthly mean of the differences in utility charges between both buildings ($\bar{x}_D$) is zero or not. The null hypothesis is that it is zero, while the alternate hypothesis is that it is different from zero. Thus:

$$t = \frac{\bar{x}_D - D_0}{s_D/\sqrt{n}} = \frac{82 - 0}{32/\sqrt{12}} = 8.88 \quad \text{with d.f.} = 12 - 1 = 11$$

where the values 82 and 32 are the mean and standard deviation of the differences found from Table 4.1.

For $\alpha = 0.05$ with a one-tailed test, from Table A4 the critical value is $t_{0.05} = 1.796$. Because 8.88 greatly exceeds this critical value, one can safely reject the null hypothesis.

In fact, Bldg 1 is less energy efficient than Bldg 2 even at $\alpha = 0.0005$ (or CL = 99.95%), and the owner of B1 does not have a valid case at all!
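The paired version is equally short with scipy; ttest_rel works on the twelve monthly differences internally and reproduces the t of 8.88:

```python
from scipy import stats

b1 = [693, 759, 1005, 1074, 1449, 1932, 2106, 2073, 1905, 1338, 981, 873]
b2 = [639, 678, 918, 999, 1302, 1827, 2049, 1971, 1782, 1281, 933, 825]

t, p = stats.ttest_rel(b1, b2)   # paired-difference t-test
print(f"t = {t:.2f}, two-tailed p = {p:.2g}")    # t ~ 8.88
```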


4.2.4 Single Sample Tests for Proportions

Surveys are often performed in order to determine fractions or proportions of populations who either have preferences of some sort or have a certain type of equipment. Each observation can be interpreted as either a "success" (the customer has gas heat) or a "failure": a binomial experiment.

Let p be the population proportion one wishes to estimate from the sample proportion:

$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in sample}}{\text{total number of trials}}$$

The large-sample confidence interval of p for the two-tailed case at a significance level $\alpha$ is:

$$\hat{p} \pm z_{\alpha/2}\left[\hat{p}(1 - \hat{p})/n\right]^{1/2} \qquad (4.13)$$


Example 4.2.6. In a random sample of n = 1000 new residences in Scottsdale, AZ, it was found that 630 had swimming pools. Find the 95% confidence interval for the fraction of buildings with pools.

In this case, n = 1000, while $\hat{p} = \frac{630}{1000} = 0.63$. From Table A3, the critical value $z_{0.025} = 1.96$, and hence from eq. 4.13, the two-tailed 95% confidence interval for p is:

$$0.63 - 1.96\left[\frac{0.63(1 - 0.63)}{1000}\right]^{1/2} < p < 0.63 + 1.96\left[\frac{0.63(1 - 0.63)}{1000}\right]^{1/2}$$

or 0.600 < p < 0.660.


Example 4.2.7. The same equations can also be used to determine the sample size needed so that the error in estimating p does not exceed a certain value e. For instance, suppose one would like to determine, from the Example 4.2.6 data, the sample size which will yield an estimate of p within 0.02 or less at the 95% CL. Recasting eq. 4.13 results in a sample size:

$$n = \frac{z_{\alpha/2}^2\, \hat{p}(1 - \hat{p})}{e^2} = \frac{(1.96)^2(0.63)(1 - 0.63)}{(0.02)^2} \approx 2239$$

It must be pointed out that the above example is somewhat misleading, since one does not know the value of $\hat{p}$ beforehand. One may have a preliminary idea, in which case the sample size n would be an approximate estimate that may have to be revised once some data is collected.
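Both calculations are compactly reproduced below (scipy assumed; the proportion 0.63 is carried over from Example 4.2.6):

```python
from scipy import stats

n, x, alpha = 1000, 630, 0.05
p_hat = x / n
z = stats.norm.ppf(1 - alpha / 2)                    # 1.96

# Eq. 4.13: large-sample CI for the proportion
half = z * (p_hat * (1 - p_hat) / n) ** 0.5
print(f"{p_hat - half:.3f} < p < {p_hat + half:.3f}")

# Example 4.2.7: sample size so that the error does not exceed e
e = 0.02
n_req = z**2 * p_hat * (1 - p_hat) / e**2
print(f"required n = {n_req:.0f}")                   # ~2239
```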


4.2.5 Single (and Two) Sample Tests of Variance

The confidence intervals for a population variance $\sigma^2$ based on a sample variance $s^2$ are to be determined. To construct such confidence intervals, one uses the fact that if a random sample of size n is taken from a population that is normally distributed with variance $\sigma^2$, then the random variable

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \qquad (4.15)$$

has the chi-square distribution with $\nu = (n-1)$ degrees of freedom. The advantage of using $\chi^2$ instead of $s^2$ is similar to the advantage of standardizing a variable to a normal random variable. Such a transformation allows standard tables (such as Table A5) to be used for determining probabilities irrespective of the magnitude of $s^2$. The basis of these probability tables is again akin to finding the areas under the chi-square curves.

Such tests allow one to specify a confidence level for the population variance from a sample


Example 4.2.9. A company which makes boxes wishes to determine whether their automated production line requires major servicing or not. They will base their decision on whether the weight from one box to another is significantly different from a maximum permissible population variance value of $\sigma^2$ = 0.12 kg². A sample of 10 boxes is selected, and their variance is found to be $s^2$ = 0.24 kg². Is this difference significant at the 95% CL?

From eq. 4.15, the observed chi-square value is:

$$\chi^2 = \frac{(10-1)(0.24)}{0.12} = 18$$

Inspection of Table A5 for $\nu$ = 9 degrees of freedom reveals that for a significance level $\alpha = 0.05$ the critical chi-square value is $\chi_c^2$ = 16.92, and for $\alpha = 0.025$, $\chi_c^2$ = 19.02. Thus, the result is significant at $\alpha = 0.05$ or the 95% CL. However, the result is not significant at the 97.5% CL. Whether to service the automated production line based on these statistical tests involves performing a decision analysis.
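A minimal scipy sketch of the test; the exact p-value makes clear how marginal the decision is:

```python
from scipy import stats

n, s2, sigma2 = 10, 0.24, 0.12
chi2 = (n - 1) * s2 / sigma2                    # eq. 4.15 -> 18.0

crit_05 = stats.chi2.ppf(0.95, df=n - 1)        # 16.92
crit_025 = stats.chi2.ppf(0.975, df=n - 1)      # 19.02
p = stats.chi2.sf(chi2, df=n - 1)               # ~0.035
print(f"chi2 = {chi2:.1f}, crit(0.05) = {crit_05:.2f}, "
      f"crit(0.025) = {crit_025:.2f}, p = {p:.3f}")
```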


4.2.6 Tests for Distributions

The chi-square ($\chi^2$) statistic applies to discrete data. It is used to statistically test the hypothesis that a set of empirical or sample data does not differ significantly from that which would be expected from some specified theoretical distribution. In other words, it is a goodness-of-fit test to ascertain whether the distribution of proportions of one group differs from another or not. The chi-square statistic is computed as:

$$\chi^2 = \sum_{k}\frac{(f_{obs} - f_{exp})^2}{f_{exp}} \qquad (4.17)$$

where $f_{obs}$ is the observed frequency of each class or interval, $f_{exp}$ is the expected frequency for each class predicted by the theoretical distribution, and k is the number of classes or intervals. If $\chi^2$ = 0, the observed and theoretical frequencies agree exactly. If not, the larger the value of $\chi^2$, the greater the discrepancy. Tabulated values of $\chi^2$ are used to determine significance for different values of the degrees of freedom $\nu = k - 1$ (see Table A5). Certain restrictions apply for proper use of this test: the sample size should be greater than 30, and none of the expected frequencies should be less than 5. In other words, a long tail of the probability curve at the lower end is not appropriate. The following example serves to illustrate the process of applying the chi-square test.


Example 4.2.11. Ascertaining whether non-code compliance infringements in residences are random or not

A county official was asked to analyze the frequency of cases when home inspectors found new homes built by one specific builder to be non-code compliant, and to determine whether the violations were random or not. The following data for 380 homes were collected:

No. of code infringements: 0, 1, 2, 3, 4
Number of homes: 242, 94, 38, 4, 2

The underlying random process can be characterized by the Poisson distribution:

$$P(x) = \frac{\lambda^x \exp(-\lambda)}{x!}$$

The null hypothesis, namely that the sample is drawn from a population that is Poisson distributed, is to be tested at the 0.05 significance level.

The sample mean:

$$\lambda = \frac{0(242) + 1(94) + 2(38) + 3(4) + 4(2)}{380} = 0.5 \text{ infringements per home}$$


For a Poisson distribution with $\lambda$ = 0.5, the underlying or expected values are found for different values of x as shown in the table:

x (number of non-code compliances) | P(x)·n | Expected no.
0 | (0.6065)(380) | 230.470
1 | (0.3033)(380) | 115.254
2 | (0.0758)(380) | 28.804
3 | (0.0126)(380) | 4.788
4 | (0.0016)(380) | 0.608
5 or more | (0.0002)(380) | 0.076
Total | (1.000)(380) | 380

The last three categories have expected frequencies that are less than 5, which does not meet one of the requirements for using the test (as stated above). Hence, these will be combined into a new category called "3 or more cases", which will have an expected frequency of 4.788 + 0.608 + 0.076 = 5.472. The following statistic is calculated first:

$$\chi^2 = \frac{(242 - 230.470)^2}{230.470} + \frac{(94 - 115.254)^2}{115.254} + \frac{(38 - 28.804)^2}{28.804} + \frac{(6 - 5.472)^2}{5.472} = 7.483$$

Since there are only 4 groups, the degrees of freedom = 4 - 1 = 3, and from Table A5 the critical value at the 0.05 significance level is $\chi^2_{critical}$ = 7.815. Hence, the null hypothesis cannot be rejected at the 0.05 significance level; this is, however, marginal.
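The goodness-of-fit computation is easy to reproduce with scipy; note the pooling of the sparse classes into "3 or more", matching the slide (the slide's d.f. = k - 1 = 3 is used here, without the extra reduction sometimes applied for the estimated λ):

```python
import numpy as np
from scipy import stats

obs = np.array([242, 94, 38, 6])            # classes 0, 1, 2, "3 or more"
lam, n = 0.5, 380

p = stats.poisson.pmf([0, 1, 2], lam)
probs = np.append(p, 1 - p.sum())           # pooled tail P(X >= 3)
exp = probs * n

chi2 = ((obs - exp) ** 2 / exp).sum()       # ~7.48
crit = stats.chi2.ppf(0.95, df=len(obs) - 1)
print(f"chi2 = {chi2:.3f} vs critical = {crit:.3f}")
```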


Recall the concept of the correlation coefficient (Example 3.4.2, extension of a spring under different loads):

Load (Newtons): 2, 4, 6, 8, 10, 12
Extension (mm): 10.4, 19.6, 29.9, 42.2, 49.2, 58.5

$$\text{cov}(xy) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Load (N) | Extension (mm) | $x - \bar{x}$ | $y - \bar{y}$ | Product
2 | 10.4 | -5 | -24.57 | 122.85
4 | 19.6 | -3 | -15.37 | 46.11
6 | 29.9 | -1 | -5.07 | 5.07
8 | 42.2 | 1 | 7.23 | 7.23
10 | 49.2 | 3 | 14.23 | 42.69
12 | 58.5 | 5 | 23.53 | 117.65

Means: 7.000 (load) and 34.967 (extension); standard deviations: 3.742 and 18.298. The sum of the products is 341.600, so cov(xy) = 341.600/5 = 68.320 and the correlation coefficient = 68.320/(3.742 × 18.298) = 0.998. This indicates a very strong positive correlation between the two variables, as one should expect.


4.2.7 Tests on the Pearson Correlation Coefficient

Making inferences about the population correlation coefficient $\rho$ from knowledge of the sample correlation coefficient r. Assumption: both variables are normally distributed (a bivariate normal population). Fig. 4.8 provides a convenient way of ascertaining the 95% CL of the population correlation coefficient for different sample sizes. Say r = 0.6 for a sample of n = 10 pairs of observations; then the 95% confidence limits for the population correlation coefficient are (-0.05 < $\rho$ < 0.87), which are very wide. Notice how increasing the sample size shrinks these bounds. For n = 100, the intervals are (0.47 < $\rho$ < 0.71).

Table A7 lists the critical values of the sample correlation coefficient r for testing the null hypothesis that the population correlation coefficient is zero (i.e., $\rho = 0$) at the 0.05 and 0.01 significance levels for one- and two-tailed tests. The interpretation of these values is of some importance in many cases, especially when dealing with small data sets.

Say analysis of the 12 monthly bills of a residence revealed a linear correlation of r = 0.6 with degree-days at the location. Assume that a two-tailed test applies. The sample correlation barely suggests the presence of a correlation at a significance level $\alpha = 0.05$ (the critical value from Table A7 is $r_c$ = 0.576), while none at $\alpha = 0.01$ (for which $r_c$ = 0.708).
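The critical r values of Table A7 follow from the standard t-transform of the sample correlation coefficient, $t = r\sqrt{(n-2)/(1-r^2)}$; the sketch below (scipy assumed) recovers the two-tailed critical value of roughly 0.576 for n = 12:

```python
from scipy import stats

r, n, alpha = 0.6, 12, 0.05

# t-transform of r, compared against the two-tailed critical t
t = r * ((n - 2) / (1 - r**2)) ** 0.5        # ~2.37
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"t = {t:.2f} vs t_crit = {t_crit:.2f}")

# Back out the critical r implied by t_crit
r_crit = t_crit / (t_crit**2 + n - 2) ** 0.5
print(f"critical r = {r_crit:.3f}")          # ~0.576
```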


Fig. 4.8 Plot depicting 95% confidence bands for the population correlation $\rho$ in a bivariate normal population for various sample sizes n. The bold vertical line defines the lower and upper limits of $\rho$ when r = 0.6 from a data set of 10 pairs of observations (from Wonnacott and Wonnacott, 1985, by permission of John Wiley and Sons).


4.3 ANOVA test for multi-samples

The statistical methods known as ANOVA (analysis of variance) are a broad set of widely used and powerful techniques meant to identify and measure sources of variation within a data set. This is done by partitioning the total variation in the data into its component parts. Specifically, ANOVA uses variance information from several samples in order to make inferences about the means of the populations from which these samples were drawn (and, hence, the appellation).

Fig. 4.9 Conceptual explanation of the basis of an ANOVA test


ANOVA methods test a null hypothesis of the form:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$
$$H_a: \text{at least two of the } \mu_i\text{'s are different} \qquad (4.18)$$

Adopting the following notation:

Sample sizes: $n_1, n_2, \ldots, n_k$
Sample means: $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
Sample standard deviations: $s_1, s_2, \ldots, s_k$
Total sample size: $n = n_1 + n_2 + \cdots + n_k$
Grand average $\bar{\bar{x}}$: the weighted average of all n responses

Then one defines the between-sample variation, called the "treatment sum of squares"¹ (SSTr), as:

$$SSTr = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{\bar{x}})^2 \quad \text{with d.f.} = k - 1 \qquad (4.19)$$

and the within-sample variation or "error sum of squares" (SSE) as:

$$SSE = \sum_{i=1}^{k}(n_i - 1)s_i^2 \quad \text{with d.f.} = n - k \qquad (4.20)$$

¹ The term "treatment" was originally coined for historic reasons, where one was interested in evaluating the effect of treatments or changes in a product development process. It is now used synonymously to mean "classes" from which the samples are drawn.


Together these two sources of variation form the "total sum of squares" (SST):

$$SST = SSTr + SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{\bar{x}})^2 \quad \text{with d.f.} = n - 1 \qquad (4.21)$$

SST is simply the sample variance of the combined set of n data points scaled by (n - 1), i.e., $(n-1)s^2$, where s is the standard deviation of all n data points.

The statistic defined below as the ratio of two variances is said to follow the F-distribution:

$$F = \frac{MSTr}{MSE} \qquad (4.22)$$

where MSTr is the mean between-sample variation = SSTr/(k - 1), and MSE is the mean error sum of squares = SSE/(n - k).

Recall that the p-value is the area of the F curve for (k - 1, n - k) degrees of freedom to the right of the F value. If p-value ≤ $\alpha$ (the selected significance level), then the null hypothesis can be rejected. Note that the test is meant to be used for normal populations with equal population variances.


Example 4.3.1. Comparing mean life of five motor bearings

A motor manufacturer wishes to evaluate five different motor bearings for motor vibration (which adversely results in reduced life). Each type of bearing is installed on a different random sample of six motors. The amount of vibration (in microns) is recorded when each of the 30 motors is running.

Sample | Brand 1 | Brand 2 | Brand 3 | Brand 4 | Brand 5
1 | 13.1 | 16.3 | 13.7 | 15.7 | 13.5
2 | 15.0 | 15.7 | 13.9 | 13.7 | 13.4
3 | 14.0 | 17.2 | 12.4 | 14.4 | 13.2
4 | 14.4 | 14.9 | 13.8 | 16.0 | 12.7
5 | 14.0 | 14.4 | 14.9 | 13.9 | 13.4
6 | 11.6 | 17.2 | 13.3 | 14.7 | 12.3
Mean | 13.68 | 15.95 | 13.67 | 14.73 | 13.08
Std. dev. | 1.194 | 1.167 | 0.816 | 0.940 | 0.479

Determine whether the bearing brands have an effect on motor vibration at the $\alpha = 0.05$ significance level.


In this example, k = 5 and n = 30. The one-way ANOVA table is first generated:

Source | d.f. | Sum of squares | Mean square | F-value
Factor | 5 - 1 = 4 | SSTr = 30.855 | MSTr = 7.714 | 8.44
Error | 30 - 5 = 25 | SSE = 22.838 | MSE = 0.9135 |
Total | 30 - 1 = 29 | SST = 53.694 | |

From the F tables (Table A6) and for $\alpha = 0.05$, the critical F value for d.f. = (4, 25) is $F_c$ = 2.76, which is less than the F = 8.44 computed from the data. Hence, one is compelled to reject the null hypothesis that all five means are equal, and conclude that the type of motor bearing does have a significant effect on motor vibration. In fact, this conclusion can be reached even at the more stringent significance level of $\alpha = 0.001$. The results of the ANOVA analysis can be conveniently illustrated by generating an effects plot or means plot, which includes the 95% CL intervals.

Fig. 4.10 (a) Effect plot. (b) Means plot showing the 95% CL intervals.
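The whole ANOVA table collapses to one scipy call on the Example 4.3.1 data:

```python
from scipy import stats

brands = [
    [13.1, 15.0, 14.0, 14.4, 14.0, 11.6],   # Brand 1
    [16.3, 15.7, 17.2, 14.9, 14.4, 17.2],   # Brand 2
    [13.7, 13.9, 12.4, 13.8, 14.9, 13.3],   # Brand 3
    [15.7, 13.7, 14.4, 16.0, 13.9, 14.7],   # Brand 4
    [13.5, 13.4, 13.2, 12.7, 13.4, 12.3],   # Brand 5
]
F, p = stats.f_oneway(*brands)
print(f"F = {F:.2f}, p = {p:.2g}")           # F ~ 8.44, p << 0.05
```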


A limitation of the ANOVA method is that the null hypothesis is rejected even if only one motor bearing is different from the others. In order to pinpoint the cause of this rejection, different methods have been developed.

One could adopt a paired comparison approach. With 5 sets, 10 paired tests are needed:
- tedious;
- more importantly, sensitivity decreases, i.e., the Type I error increases.

The Tukey method is widely used (it applies only when the sample sizes are equal). The Student-t test is used, and the approach allows a clear visual representation.


Tukey's procedure is based on comparing the distance (or absolute value) between any two sample means, $|\bar{x}_i - \bar{x}_j|$, to a threshold value T that depends on the significance level $\alpha$ as well as on the mean square error (MSE) from the ANOVA test. The T value is calculated as:

$$T = q\left(\frac{MSE}{n_i}\right)^{1/2} \qquad (4.3.8)$$

where $n_i$ is the size of the sample drawn from each population, and the q values are the studentized range distribution values (Table A8 for $\alpha$ = 0.05, d.f. = (k, n-k)).

If $|\bar{x}_i - \bar{x}_j| > T$, then one concludes that $\mu_i \ne \mu_j$ at the corresponding significance level. Otherwise, one concludes that there is no difference between the two means.


Example 4.3.2.¹ Using the same data as in Example 4.3.1, conduct a multiple comparison procedure to distinguish which of the motor bearing brands are superior to the rest. Following Tukey's procedure given by eq. 4.3.8, the critical distance between sample means at $\alpha = 0.05$ is:

$$T = q\left(\frac{MSE}{n_i}\right)^{1/2} = 4.15\left(\frac{0.913}{6}\right)^{1/2} = 1.62$$

where q is found by interpolation from Table A8 based on d.f. = (k, n-k) = (5, 25).

The pairwise distances between the five sample means:

Samples | Distance | Conclusion*
1,2 | |13.68 - 15.95| = 2.27 | $\mu_i \ne \mu_j$
1,3 | |13.68 - 13.67| = 0.01 |
1,4 | |13.68 - 14.73| = 1.05 |
1,5 | |13.68 - 13.08| = 0.60 |
2,3 | |15.95 - 13.67| = 2.28 | $\mu_i \ne \mu_j$
2,4 | |15.95 - 14.73| = 1.22 |
2,5 | |15.95 - 13.08| = 2.87 | $\mu_i \ne \mu_j$
3,4 | |13.67 - 14.73| = 1.06 |
3,5 | |13.67 - 13.08| = 0.59 |
4,5 | |14.73 - 13.08| = 1.65 | $\mu_i \ne \mu_j$

* Only if distance > critical value of 1.62.

Fig. 4.11 Graphical depiction summarizing the ten pairwise comparisons following Tukey's procedure: Brand 2 is significantly different from Brands 1, 3 and 5, and so is Brand 4 from Brand 5 (Example 4.3.2). (Bars drawn to correspond to a specified confidence level based on t-tests.)

¹ From Devore and Farnum (2005), with permission from Thomson Brooks/Cole.
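The ten comparisons can be generated in one call with statsmodels (assumed available); pairwise_tukeyhsd reports, for each pair, whether the null hypothesis of equal means is rejected at the chosen α:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

vib = np.array([
    13.1, 15.0, 14.0, 14.4, 14.0, 11.6,   # Brand 1
    16.3, 15.7, 17.2, 14.9, 14.4, 17.2,   # Brand 2
    13.7, 13.9, 12.4, 13.8, 14.9, 13.3,   # Brand 3
    15.7, 13.7, 14.4, 16.0, 13.9, 14.7,   # Brand 4
    13.5, 13.4, 13.2, 12.7, 13.4, 12.3,   # Brand 5
])
groups = np.repeat([f"Brand {i}" for i in range(1, 6)], 6)

print(pairwise_tukeyhsd(vib, groups, alpha=0.05).summary())
```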


4.4 Tests of Significance of Multivariate Data (not covered)

Underlying assumptions of the distributions are important: correlated variables distort them.

Multivariate analysis (also called multifactor analysis) deals with statistical inference and model building as applied to multiple measurements made from one or several samples taken from one or several populations. These methods can be used to make inferences about sample means and variances. Rather than treating each measure separately, as done in t-tests and single-factor ANOVA, they allow the analysis of multiple measures simultaneously as a system of measurements (resulting in sounder inferences).

Fig. 4.12 Two bivariate normal distributions and associated 50% and 90% contours assuming equal standard deviations for both variables. The left-hand plots presume the two variables to be uncorrelated, while those on the right have a correlation coefficient of 0.75, which results in elliptical contours.


4.5 Non-Parametric Tests

Parametric tests have implicit built-in assumptions regarding the distributions from which the samples are taken. Comparison of populations using the t-test and F-test can yield misleading results when the random variables being measured are not normally distributed and do not have equal variances. The fewer the assumptions, the broader the potential applications of a test. One would like the significance tests used to lead to sound conclusions, and the risk of coming to wrong conclusions to be minimized. Two concepts relate to the latter aspect:

- the robustness of a test is inversely proportional to the sensitivity of the test to violations of its underlying assumptions;
- the power of a test is a measure of the extent to which the cost of experimentation is reduced.

There are instances when the random variables are not quantifiable measurements but can only be ranked in order of magnitude (as in surveys). Rather than use the actual numbers, non-parametric tests usually use relative ranks, sorting the data by rank (or magnitude) and discarding the specific numerical values. Non-parametric tests are generally less powerful than parametric ones but, on the other hand, are more robust and less sensitive to outlier points.


4.5.1 Spearman Rank Coefficient Method

Example 4.5.1. Non-parametric testing of correlation between the sizes of faculty research grants and teaching evaluations

The provost of a major university wants to determine whether a statistically significant correlation exists between the research grants and teaching evaluation ratings of its senior faculty. Data over three years have been collected, as assembled in Table 4.8, which also shows the manner in which the ranks have been generated and the quantities $d_i = u_i - v_i$ computed.

Table 4.8

Faculty | Research grants ($) | Teaching evaluation | Research rank ($u_i$) | Teaching rank ($v_i$) | Diff $d_i$ | Diff sq. $d_i^2$
1 | 1,480,000 | 7.05 | 5 | 7 | -2 | 4
2 | 890,000 | 7.87 | 1 | 8 | -7 | 49
3 | 3,360,000 | 3.90 | 10 | 2 | 8 | 64
4 | 2,210,000 | 5.41 | 8 | 5 | 3 | 9
5 | 1,820,000 | 9.02 | 7 | 9 | -2 | 4
6 | 1,370,000 | 6.07 | 4 | 6 | -2 | 4
7 | 3,180,000 | 3.20 | 9 | 1 | 8 | 64
8 | 930,000 | 5.25 | 2 | 4 | -2 | 4
9 | 1,270,000 | 9.50 | 3 | 10 | -7 | 49
10 | 1,610,000 | 4.45 | 6 | 3 | 3 | 9
TOTAL | | | | | | 260


Spearman rank correlation coefficient:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where n is the number of paired measurements, and $d_i = u_i - v_i$ is the difference between the ranks of the ith measurement for the ranked variables u and v.

Using eq. 4.34 with n = 10:

$$r_s = 1 - \frac{6(260)}{10(100 - 1)} = -0.576$$

Thus, one notes that there exists a negative correlation between the sample data. However, whether this is significant for the population correlation coefficient $\rho_s$ can be ascertained by means of a statistical test:

$$H_0: \rho_s = 0 \text{ (there is no significant correlation)}$$
$$H_a: \rho_s \ne 0 \text{ (there is significant correlation)}$$

Table A10 in Appendix A gives the absolute cutoff values for different significance levels. For n = 10, the critical value for $\alpha = 0.05$ is 0.564, which suggests that the correlation can be deemed significant at the 0.05 significance level; at the 0.025 level the critical value is 0.648 > |-0.576|, which suggests that the correlation is not significant at that level.
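scipy computes the same coefficient directly from the raw (unranked) data of Table 4.8:

```python
from scipy import stats

grants = [1_480_000, 890_000, 3_360_000, 2_210_000, 1_820_000,
          1_370_000, 3_180_000, 930_000, 1_270_000, 1_610_000]
teaching = [7.05, 7.87, 3.90, 5.41, 9.02, 6.07, 3.20, 5.25, 9.50, 4.45]

rho, p = stats.spearmanr(grants, teaching)
print(f"r_s = {rho:.3f}, p = {p:.3f}")   # r_s ~ -0.576
```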


[Table: critical values of Spearman's rank correlation coefficient for various n and significance levels.]


4.5.2 Wilcoxon Rank Tests

Rather than comparing specific parameters (such as the mean and the variance), these non-parametric tests evaluate whether the probability distributions of the sampled populations are different or not. The test is non-parametric, and no restriction is placed on the distribution other than that it be continuous and symmetric.

(a) The Wilcoxon rank sum test is meant for independent samples where the individual observations can be ranked by magnitude. The following example illustrates the approach.

Example 4.5.2. Ascertaining whether oil company researchers and academics differ in their predictions of future atmospheric carbon dioxide levels

The intent is to compare the predictions of the change in atmospheric carbon dioxide levels between researchers who are employed by oil companies and those who are in academia. The data gathered, shown in Table 4.9, are the predicted percentage increases in carbon dioxide from the current level over the next 10 years, from six oil company researchers and seven academics. Perform a statistical test at the 0.05 significance level in order to evaluate the following hypotheses:

(a) Predictions made by oil company researchers differ from those made by academics.

(b) Predictions made by oil company researchers tend to be lower than those made by academics. (Not treated in the slides.)


Table 4.9 Wilcoxon rank sum test calculation for two independent samples

 | Oil company researchers: Prediction (%) | Rank | Academics: Prediction (%) | Rank
1 | 3.5 | 4 | 4.7 | 6
2 | 5.2 | 7 | 5.8 | 9
3 | 2.5 | 2 | 3.6 | 5
4 | 5.6 | 8 | 6.2 | 11
5 | 2.0 | 1 | 6.1 | 10
6 | 3.0 | 3 | 6.3 | 12
7 | - | - | 6.5 | 13
SUM | | 25 | | 66

Ranks are assigned to the two groups of individuals combined. Since there are 13 predictions, the ranks run from 1 through 13 as shown in the table. The test statistic is based on the rank sum of each group (hence the name). If the sums are close, the implication is that there is no evidence that the probability distributions of the two groups are different, and vice versa. Let $T_A$ and $T_B$ be the rank sums of the two groups. Then:

$$T_A + T_B = \frac{n(n+1)}{2} = \frac{13(13+1)}{2} = 91 \qquad (4.36)$$

where $n = n_1 + n_2$ with $n_1$ = 6 and $n_2$ = 7. Note that $n_1$ should be selected as the group with fewer observations. A small value of $T_A$ implies a large value of $T_B$, and vice versa. Hence, the greater the difference between the two rank sums, the greater the evidence that the samples come from different populations. Since one is testing whether the predictions of both groups are different or not, the two-tailed significance test is appropriate. Table A11 provides the lower and upper cutoff values for different values of $n_1$ and $n_2$ for both the one-tailed and two-tailed tests. The lower and upper cutoff values are (28, 56) at the 0.05 significance level for the two-tailed test. The computed statistics $T_A$ = 25 and $T_B$ = 66 are outside this range, the null hypothesis is rejected, and one concludes that the predictions from the two groups are different.
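A minimal scipy sketch of the rank-sum test on the Table 4.9 data; note that scipy's ranksums uses the large-sample normal approximation rather than the exact small-sample Table A11, so its p-value is only approximate here:

```python
from scipy import stats

oil = [3.5, 5.2, 2.5, 5.6, 2.0, 3.0]
academics = [4.7, 5.8, 3.6, 6.2, 6.1, 6.3, 6.5]

stat, p = stats.ranksums(oil, academics)
print(f"z = {stat:.2f}, two-tailed p = {p:.3f}")
```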


4.6 Bayesian Inferences

The strength of Bayes' theorem lies in the fact that it provides a framework for including prior information in a two-stage experiment, whereby one can draw stronger conclusions.

It is especially advantageous for small data sets. It can be shown that its predictions converge with those of the classical method:

(i) as the data set of observations gets larger; and

(ii) if the prior distribution is modeled as a uniform distribution.

It was pointed out that advocates of the Bayesian approach view probability as a degree of belief held by a person about an uncertainty issue as compared to the objective view of long run relative frequency held by traditionalists.

We will discuss how the Bayesian approach can also be used to make statistical inferences from samples about an uncertain quantity and also used for hypothesis testing problems.


4.6.2 Inference about one uncertain quantity

Consider the case when the population mean $\mu$ is to be estimated (point and interval estimates) from the sample mean $\bar{x}$, with the population assumed to be Gaussian with a known standard deviation $\sigma$. The probability P of a two-tailed distribution at significance level $\alpha$ can be expressed as:

$$P\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{n^{1/2}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{n^{1/2}}\right) = 1 - \alpha \qquad (4.37)$$

where n is the sample size and z is the value from the standard normal tables. The traditional interpretation is that one can be $(1-\alpha)$ confident that the above interval contains the true population mean; however, the interval itself should not be interpreted as a probability interval for the parameter.

The Bayesian approach uses the same formula, but the mean and standard deviation are modified since the posterior distribution, which includes the sample data as well as the prior belief, is now used. The resulting interval is usually narrower than the traditional one and is referred to as the credible interval or the Bayesian confidence interval. Its interpretation is somewhat different from that of the traditional confidence interval: there is a $(1-\alpha)$ probability that the population mean falls within the interval. Thus, the traditional approach leads to a probability statement about the interval, while the Bayesian approach leads to one about the population parameter.


The relevant procedure to calculate the credible intervals for the case of a Gaussian population and a Gaussian prior is presented without proof below. Let the prior distribution, assumed normal, be characterized by a mean $\mu_0$ and variance $\sigma_0^2$, while the sample values are $\bar{x}$ and $s_x$. Selecting a prior distribution is equivalent to having a quasi-sample of size $n_0$ given by:

$$n_0 = \frac{s_x^2}{\sigma_0^2} \qquad (4.38)$$

The posterior mean and standard deviation $\mu^*$ and $\sigma^*$ are then given by:

$$\mu^* = \frac{n_0\,\mu_0 + n\,\bar{x}}{n_0 + n} \quad \text{and} \quad \sigma^* = \frac{s_x}{(n_0 + n)^{1/2}} \qquad (4.39)$$

Note that the expression for the posterior mean is simply the weighted average of the sample and prior means, and is likely to be less biased than the sample mean alone. Similarly, the standard deviation is divided by the square root of the total (sample plus quasi-sample) size, which results in increased precision. However, had a different prior rather than the normal distribution been assumed above, a slightly different interval would have resulted, which is another reason why traditional statisticians are uneasy about fully endorsing the Bayesian approach.


Example 4.6.1. Comparison of classical and Bayesian confidence intervals

A certain solar PV module is rated at 60 W with a standard deviation of 2 W. Since the rating varies somewhat from one shipment to the next, a sample of 12 modules has been selected from a shipment and tested, yielding a mean of 65 W and a standard deviation of 2.8 W. Assuming a Gaussian distribution, determine the 95% confidence intervals by both the traditional and Bayesian approaches.

(a) Traditional approach:

$$\bar{x} \pm 1.96\,\frac{s_x}{n^{1/2}} = 65 \pm 1.96\,\frac{2.8}{12^{1/2}} = 65 \pm 1.58$$

(b) Bayesian approach. Using eq. 4.38 to calculate the quasi-sample size inherent in the prior:

$$n_0 = \frac{2.8^2}{2^2} = 1.96 \approx 2$$

i.e., the prior is equivalent to information from an additional 2 modules tested. Next, eq. 4.39 is used to determine the posterior mean and standard deviation:

$$\mu^* = \frac{2(60) + 12(65)}{2 + 12} = 64.29 \quad \text{and} \quad \sigma^* = \frac{2.8}{(2 + 12)^{1/2}} = 0.748$$

The Bayesian 95% confidence interval is then: $\mu^* \pm 1.96\,\sigma^* = 64.29 \pm 1.96(0.748) = 64.29 \pm 1.47$. Since prior information has been used, the Bayesian interval is likely to be centered better and be more precise (with a narrower interval) than the classical interval.
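Both intervals are reproduced below (scipy assumed); note the code keeps the exact quasi-sample size n0 = 1.96, whereas the slide rounds it to 2, so the posterior mean comes out at 64.30 rather than 64.29:

```python
from scipy import stats

mu0, sigma0 = 60.0, 2.0        # prior: manufacturer rating
n, xbar, s = 12, 65.0, 2.8     # sample of tested modules
z = stats.norm.ppf(0.975)      # 1.96

# Traditional 95% CI
print(f"classical: {xbar:.2f} +/- {z * s / n**0.5:.2f}")

# Bayesian credible interval (eqs. 4.38 and 4.39)
n0 = s**2 / sigma0**2
mu_post = (n0 * mu0 + n * xbar) / (n0 + n)
sd_post = s / (n0 + n) ** 0.5
print(f"Bayesian:  {mu_post:.2f} +/- {z * sd_post:.2f}")
```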


4.6.3 Hypothesis Testing

The traditional or frequentist approach to hypothesis testing is to divide the sample space into an acceptance region and a rejection region, and to posit that the null hypothesis can be rejected only if the test statistic falls in the rejection region, i.e., only if the observed result is too unlikely to be ascribed to chance or randomness at the preselected significance level $\alpha$.

Advocates of the Bayesian approach have several objections to this line of thinking (Phillips, 1973):

• the null hypothesis is rarely of much interest. The precise specification of, say, the population mean is of limited value; rather, ascertaining a range would be more useful;

• the null hypothesis is only one of many possible values of the uncertain variable, and the undue importance placed on this single value is unjustified;

• as additional data is collected, the inherent randomness in the collection process would lead the null hypothesis to be rejected in most cases;

• erroneous inferences from a sample may result if prior knowledge is not considered.


4.6.3 Hypothesis Testing (contd.)

The Bayesian approach to hypothesis testing is not to base the conclusions on a traditional significance level like p < 0.05. Instead, it makes use of the posterior credible interval introduced in the previous section. The procedure is summarized below for the instance when one wishes to test the population mean of the collected sample against a prior mean value $\mu_0$ (Bolstad, 2004).

(a) One-sided hypothesis test. Let the posterior distribution of the mean value be given by $g(\mu \mid x_1, \ldots, x_n)$. The hypothesis test is set up as:

$$H_0: \mu \ge \mu_0 \quad \text{versus} \quad H_1: \mu < \mu_0 \qquad (4.40)$$

Let $\alpha$ be the assumed significance level (usually 0.10, 0.05 or 0.01). Then the posterior probability of the null hypothesis, for the special case when the posterior distribution is Gaussian, is:

$$P(H_0 \mid x_1, \ldots, x_n) = P\left(z \ge \frac{\mu_0 - \mu^*}{\sigma^*}\right) \qquad (4.41)$$

where z is the standard normal variable, with $\mu^*$ and $\sigma^*$ given by eq. 4.39. If this probability is less than the selected value of $\alpha$, the null hypothesis is rejected, and one concludes that $\mu < \mu_0$.


Example 4.6.2. Traditional and Bayesian approaches to determining CLs

The life of a certain type of smoke detector battery (assumed normal) is claimed to have a mean of 32 months and a standard deviation of 0.5 months. A building owner decides to test this claim at a significance level of 0.05. He tests a sample of 9 batteries and finds a mean of 31 months and a sample standard deviation of 1 month. Note that this is a one-sided hypothesis test case.

(a) The traditional approach would entail testing $H_0: \mu = 32$ versus $H_1: \mu < 32$. The Student-t value is:

$$t = \frac{31 - 32}{1/\sqrt{9}} = -3.0$$

From Table A4, the critical value for d.f. = 8 is $t_{0.05} = 1.86$. Thus, he can reject the null hypothesis, and state that the claim of the manufacturer is incorrect.


(b) The Bayesian approach, on the other hand, would require calculating the posterior probability of the null hypothesis. The prior distribution has a mean $\mu_0$ = 32 and variance $\sigma_0^2$ = 0.5².

First, use eq. 4.38 to determine:

$$n_0 = \frac{1^2}{0.5^2} = 4$$

i.e., the prior information is "equivalent" to increasing the sample size by 4. Next, use eq. 4.39 to determine the posterior mean and standard deviation:

$$\mu^* = \frac{4(32) + 9(31)}{4 + 9} = 31.3 \quad \text{and} \quad \sigma^* = \frac{1.0}{(4 + 9)^{1/2}} = 0.277$$

From here:

$$t = \frac{32.0 - 31.3}{0.277} = 2.53$$

From the Student-t table (Table A4) for d.f. = (9 + 4 - 1) = 12, this corresponds to a confidence level of more than 99%, or a probability of the null hypothesis of less than 0.01. Since this is lower than the selected significance level $\alpha = 0.05$, he can reject the null hypothesis. In this case, both approaches gave the same result, but sometimes one would reach different conclusions, especially when sample sizes are small.
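The posterior probability of the null can also be evaluated directly; the sketch below uses the Gaussian form of eq. 4.41 (the slide instead refers the statistic to a Student-t table with 12 d.f., so the numbers differ slightly, but the conclusion is the same):

```python
from scipy import stats

mu0 = 32.0                      # prior mean (manufacturer claim)
mu_post, sd_post = 31.3, 0.277  # posterior values from eq. 4.39

# P(H0: mu >= 32 | data) under a Gaussian posterior
p_h0 = stats.norm.sf((mu0 - mu_post) / sd_post)
print(f"P(H0 | data) = {p_h0:.4f}")   # ~0.006 < 0.05 -> reject H0
```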


4.7 Sampling MethodsThere are different ways by which one could draw samples; this aspect falls under the

purview of sampling design.

 

There are three general rules of sampling design:• the more representative the sample, the better the results;• all else being equal, larger samples yield better results, i.e., the results are more precise;• larger samples cannot compensate for a poor sampling design plan or a poorly executed

plan.

 

Some of the common sampling methods are described below:

 

(a) Random sampling (also called simple random sampling) is the simplest conceptually and the most widely used. It involves selecting the sample of n elements in such a way that all possible samples of n elements have the same chance of being selected. Two important strategies of random sampling are (a minimal sketch of both follows):
• sampling with replacement, in which the object selected is put back into the population pool and has the possibility of being selected again in subsequent picks, and
• sampling without replacement, where the object picked is not put back into the population pool prior to picking the next item.
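As an illustration, the following Python sketch draws a sample of size 5 from a hypothetical population of 100 labeled items using both strategies (the population and seed are made up for the example):

```python
import random

population = list(range(1, 101))   # hypothetical population of 100 labeled items
random.seed(42)                    # arbitrary seed, for a reproducible illustration

# Sampling with replacement: an item can appear more than once
with_replacement = random.choices(population, k=5)

# Sampling without replacement: all 5 items are distinct
without_replacement = random.sample(population, k=5)

print(with_replacement, without_replacement)
```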


(b) Non-random sampling: often occurs unintentionally or unwittingly. It introduces bias or skewness, and leads to misleading confidence limits. In some cases, however, the experimenter intentionally selects the samples in a non-random manner and analyzes the data accordingly.

Types of non-random sampling:
• stratified sampling involves partitioning the population into disjoint subsets or strata based on some criterion; this improves the efficiency of the sampling process in some instances;
• cluster sampling, in which strata/clusters are first generated, then random sampling is done to identify a subset of clusters, and finally all the elements in the picked clusters are analyzed;
• sequential sampling is a quality control procedure where a decision on the acceptability of a batch of products is made from tests done on a sample of the batch. Tests are done on a preliminary sample and, depending on the results, either the batch is accepted or further sampling tests are performed. This procedure usually requires fewer samples to be tested to meet a pre-stipulated accuracy;
• composite sampling, where elements from different samples are combined together;
• multistage or nested sampling, which involves selecting a sample in stages: a larger sample is first selected, and then subsequently smaller ones (e.g., IAQ testing in buildings);
• convenience sampling, also called opportunity sampling, is a method of choosing samples arbitrarily following the manner in which they are acquired. Though impossible to treat rigorously, it is commonly encountered in many practical situations.


Stratified Sampling
Example 4.7.2 Stratified sampling for variance reduction (better estimate of the mean)
A home improvement center wishes to estimate the mean annual expenditure of its local residents in the hardware section and the drapery section.
- Men visit the store more frequently and spend approximately $50 annually; expenditures of as much as $100 or as little as $25 per year are found occasionally.
- Annual expenditures by women can vary from nothing to over $500 (much greater variance).
- Assume that 80% of the customers are men and that the sample size is 15.
If simple random sampling were employed, one would expect the sample to consist of approximately 12 men (80% of 15) and 3 women.
Stratified sampling: 5 men and 10 women are selected instead (more women have been preferentially selected because their expenditures are more variable).


Stratified Sampling Example contd.
Suppose the annual expenditures of the members of the sample turned out to be:
- Men: 45, 50, 55, 40, 90
- Women: 80, 50, 120, 80, 200, 180, 90, 500, 320, 75
The appropriate weights must be applied to the original sample data if one wishes to deduce the overall mean. Thus, if $M_i$ and $W_i$ are used to designate the i-th sample of men and women, respectively:

$\bar{X} = \dfrac{1}{15}\left[\dfrac{0.80}{0.33}\sum_{i=1}^{5} M_i + \dfrac{0.20}{0.67}\sum_{i=1}^{10} W_i\right] = \dfrac{1}{15}\left[\dfrac{0.80}{0.33}(280) + \dfrac{0.20}{0.67}(1695)\right] \approx \$79$

where 0.80 and 0.20 are the weights in the original population, and 0.33 (= 5/15) and 0.67 (= 10/15) the sample weights, respectively.
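A short Python check of this weighted-mean calculation (a sketch under the example's assumptions; variable names are illustrative):

```python
men = [45, 50, 55, 40, 90]
women = [80, 50, 120, 80, 200, 180, 90, 500, 320, 75]

n = len(men) + len(women)                  # total sample size = 15
pop_w_men, pop_w_women = 0.80, 0.20        # weights in the original population
smp_w_men = len(men) / n                   # 0.33, sample weight of men
smp_w_women = len(women) / n               # 0.67, sample weight of women

# Re-weight each stratum by (population weight / sample weight)
x_bar = (pop_w_men / smp_w_men * sum(men)
         + pop_w_women / smp_w_women * sum(women)) / n
print(round(x_bar))                        # ~79 dollars
```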


4.7.2 Desirable Properties of Estimators

Fig. 4.14 Concept of biased and unbiased estimators: the probability distribution of an unbiased estimator is centered on the actual value.
Fig. 4.15 Concept of efficiency of estimators: an efficient estimator has a more peaked distribution about the actual value, which allows stronger inferential statements to be made.


Fig. 4.16 Concept of mean square error, which combines both the bias and the efficiency of estimators: the minimum mean square error estimator may be slightly biased but has a tighter distribution about the actual value than the unbiased one.
Fig. 4.17 A consistent estimator is one whose distribution becomes gradually more peaked about the true value as the sample size n is increased (illustrated with normal distributions for n = 5, 10, 50 and 200).


4.7.3 Determining Sample Size during Random Surveys

Assume the underlying probability distribution to be normal. Let RE be the relative error (also called the margin of error or bound on the error of estimation) of the population mean at a confidence level $(1-\alpha)$, which for a two-tailed distribution is defined as:

$RE = \dfrac{z_{1-\alpha/2} \cdot s_{\bar{x}}}{\bar{x}}$    (4.45)

A measure of variability in the population is the coefficient of variation:

$CV = \dfrac{s_x}{\bar{x}} = \dfrac{\text{std. dev.}}{\text{true mean}}$

where $s_x$ is the sample standard deviation. The maximum allowable standard error of the mean at a stipulated CL is then:

$s_{\bar{x},\max} = \dfrac{RE \cdot \bar{x}}{z_{1-\alpha/2}} = \dfrac{RE \cdot s_x}{z_{1-\alpha/2} \cdot CV}$    (4.46)

First, a simplifying assumption is made by replacing $(N-1)$ by $N$ in Eq. 4.3, the expression for the standard error of the mean for small samples from a finite population of size $N$. Then:

$s_{\bar{x}}^2 = \dfrac{s_x^2}{n}\left(\dfrac{N-n}{N}\right) = \dfrac{s_x^2}{n} - \dfrac{s_x^2}{N}$    (4.47)

Finally, using the definitions of RE and CV stated above, the required sample size is:

$n = \dfrac{1}{\left(\dfrac{RE}{z_{1-\alpha/2} \cdot CV}\right)^2 + \dfrac{1}{N}}$    (4.48)


Example 4.7.1. Determination of the random sample size needed to verify peak reduction in residences at preset confidence levels
An electric utility has provided financial incentives to a large number of its customers to replace their existing air-conditioners with high-efficiency ones. This rebate program was initiated in an effort to reduce the aggregate electric peak during hot summer afternoons, which is dangerously close to the peak generation capacity of the utility. The utility analyst would like to determine the sample size necessary to assess whether the program has reduced the peak as projected, such that the relative error RE ≤ 10% at 90% CL. The following information is given:
- Total number of customers: N = 20,000
- Estimate of the mean peak saving: 2 kW (from engineering calculations)
- Estimate of the standard deviation: $s_x$ = 1 kW (from engineering calculations)
This is a two-tailed distribution problem at 90% CL, which corresponds to a one-tailed significance level of $\alpha/2$ = (100 - 90)/2/100 = 0.05. Then, from Table A4, $z_{0.05} = 1.65$. Inserting the values RE = 0.1 and $CV = s_x/\bar{x} = 1/2 = 0.5$ in Eq. 4.48, the required sample size is:

$n = \dfrac{1}{\left(\dfrac{0.1}{(1.65)(0.5)}\right)^2 + \dfrac{1}{20{,}000}} \approx 67.8$, i.e., a sample of about 68 customers.
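A short computational check of Eq. 4.48 as reconstructed above (a sketch; the equation form is inferred from the derivation in Section 4.7.3):

```python
from math import ceil

z = 1.65          # standard normal value for alpha/2 = 0.05 (90% CL, two-tailed)
RE = 0.10         # target relative error of the mean
CV = 1.0 / 2.0    # coefficient of variation = s_x / mean = 1 kW / 2 kW
N = 20_000        # population size (number of participating customers)

# Eq. 4.48: required sample size with the finite-population correction term 1/N
n = 1.0 / ((RE / (z * CV)) ** 2 + 1.0 / N)
print(ceil(n))    # ~68 customers
```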


Fig. 4.18 Size of random sample needed to achieve different relative errors of the population mean (0 to 20%) for two different values of population variability (CV of 25% and 50%), for a population size of 20,000 at the 90% confidence level. (Example 4.7.1)


4.8 Resampling Methods

The rationale behind resampling methods is to draw one sample, treat this original sample as a surrogate for the population, and generate numerous sub-samples by simply resampling the sample itself.

Thus, resampling refers to the use of given data, or a data-generating mechanism, to produce new samples from which the required estimator can be deduced numerically. It is obvious that the sample must be unbiased and reflective of the population (which it will be if the sample is drawn randomly); otherwise the precision of the method is severely compromised.

Efron and Tibshirani (1982) have argued that, given the available power of computing, one should move away from the constraints of traditional parametric theory, with its over-reliance on a small set of standard models for which theoretical solutions are available, and substitute computational power for theoretical analysis. This parallels the manner in which numerical methods have in large part replaced closed-form solution techniques in almost all fields of engineering mathematics.

• Resampling is much more intuitive and provides a way of simulating the physical process without having to deal with the statistical constraints of the analytic methods.
• A big virtue of resampling methods is that they extend classical statistical evaluation to cases which cannot be dealt with mathematically.
• The downside to the use of these methods is that they require large computing resources (on the order of 1,000 or more resamples), which is no longer an issue today.


The creation of multiple sub-samples from the original sample can be done in several ways. The three most common resampling methods are listed below (minimal sketches of the jackknife and the bootstrap follow the list):

• The permutation method (or randomization method) is one where all possible subsets of r items (the sub-sample size) out of the total n items (the sample size) are generated and used to deduce the population estimator and its confidence levels or percentiles.

• The jackknife method creates subsamples without replacement. There are several numerical schemes for implementing the jackknife. A widespread implementation is to simply create n subsamples of (n - 1) data points each, wherein a single different observation is omitted from each subsample.

• The bootstrap method is similar but differs in that no groups are formed; the different sets of data sequences are generated by simply sampling with replacement from the observational data set. Individual estimators deduced from such samples permit estimates and confidence intervals to be determined. The method would appear to be circular, i.e., how can one acquire more insight by resampling the same sample? The simple explanation is that "the population is to the sample as the sample is to the bootstrap sample".
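The following Python sketch illustrates the jackknife and bootstrap schemes on an arbitrary sample (an illustration of the two resampling ideas, not code from the text):

```python
import random
import statistics

random.seed(1)
sample = [62, 50, 53, 57, 41, 53, 55, 61, 59, 64]   # any observed sample

# Jackknife: n subsamples, each omitting a single different observation
jack_means = [statistics.mean(sample[:i] + sample[i+1:])
              for i in range(len(sample))]

# Bootstrap: B subsamples of size n, drawn with replacement from the sample
B = 1000
boot_means = [statistics.mean(random.choices(sample, k=len(sample)))
              for _ in range(B)]

print(statistics.mean(jack_means), statistics.mean(boot_means))
```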


Example 4.8.1 Using the bootstrap method for deducing the 95% CL of the mean
The data correspond to the breakdown voltage (in kV) of an insulating liquid, which is indicative of its dielectric strength. Determine the 95% CL.

62 50 53 57 41 53 55 61 59 64 50 53 64 62 50 68 54 55 57 50 55 50 56 55 46 55 53 54 52 47 47 55 57 48 63 57 57 55 53 59 53 52 50 55 60 50 56 58

Bootstrap with 1000 samples.

First, use the large-sample confidence interval formula to estimate the 95% CL of the mean. The summary quantities are: sample size n = 48, $\sum x_i = 2626$ and $\sum x_i^2 = 144{,}950$, from which $\bar{x} = 54.7$ and standard deviation s = 5.23. The 95% CL interval is then:

$54.7 \pm 1.96 \cdot \dfrac{5.23}{\sqrt{48}} = 54.7 \pm 1.5 = (53.2, 56.2)$

Bootstrap: the 95% confidence limits, corresponding to the two-tailed 0.05 significance level, are (53.2, 56.1), which is very close to the classical parametric range.
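A minimal Python sketch of the percentile bootstrap for this data set (the seed and the simple percentile convention are our choices):

```python
import random
import statistics

data = [62, 50, 53, 57, 41, 53, 55, 61, 59, 64, 50, 53, 64, 62, 50, 68,
        54, 55, 57, 50, 55, 50, 56, 55, 46, 55, 53, 54, 52, 47, 47, 55,
        57, 48, 63, 57, 57, 55, 53, 59, 53, 52, 50, 55, 60, 50, 56, 58]

random.seed(0)
B = 1000
boot_means = sorted(statistics.mean(random.choices(data, k=len(data)))
                    for _ in range(B))

# Percentile bootstrap: 2.5th and 97.5th percentiles of the bootstrap means
lo, hi = boot_means[int(0.025 * B)], boot_means[int(0.975 * B)]
print(f"95% CL: ({lo:.1f}, {hi:.1f})")   # close to the parametric (53.2, 56.2)
```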


Example 4.8.2. Using the bootstrap method with a nonparametric test to ascertain the correlation of two variables
One wishes to determine whether there exists a correlation between athletic ability and intelligence level. A sample of 10 high school athletes was obtained involving their athletic and I.Q. scores. The data are listed in descending order of athletic score in the first two columns; the corresponding ranks are in the last two columns.

Athletic score  I.Q. score  Athletic rank  I.Q. rank
      97           114            1            3
      94           120            2            1
      93           107            3            7
      90           113            4            4
      87           118            5            2
      86           101            6            8
      86           109            7            6
      85           110            8            5
      81           100            9            9
      76            99           10           10


- The athletic scores and the I.Q. scores are rank-ordered from 1 to 10, as shown in the last two columns of the table.
- The table is split into two groups: the five "high" and the five "low" athletes. An even split of the group is advocated since it uses the available information better and usually leads to better "efficiency".
- The sum of the observed I.Q. ranks of the five top athletes is (3 + 1 + 7 + 4 + 2) = 17.

The resampling scheme involves numerous trials in which a subset of 5 numbers is drawn randomly from the set {1, ..., 10} and the five numbers are added up for each trial. If the sum across trials is consistently higher than 17, this indicates that the best athletes would not have earned the observed I.Q. ranks purely by chance. The probability can be directly estimated from the proportion of trials whose sum is 17 or lower; a minimal sketch of this trial follows Fig. 4.20 below.

Fig. 4.20 Histogram based on 100 trials of the sum of 5 random ranks from the sample of 10. Note that in only 2% of the trials was the sum equal to 17 or lower. Hence, with 98% CL, there is a correlation between athletic ability and I.Q.
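A minimal Python sketch of this randomization trial (the 100-trial count follows Fig. 4.20; the seed is arbitrary):

```python
import random

random.seed(7)
observed_sum = 3 + 1 + 7 + 4 + 2           # I.Q. ranks of the five top athletes = 17

trials = 100
count_low = 0
for _ in range(trials):
    draw = random.sample(range(1, 11), 5)  # 5 ranks drawn without replacement
    if sum(draw) <= observed_sum:
        count_low += 1

p_value = count_low / trials               # proportion of trials with sum <= 17
print(p_value)                             # ~0.02 in expectation: significant at ~98% CL
```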