probability & statistical inference lecture 6

PROBABILITY & STATISTICAL INFERENCE LECTURE 6 MSc in Computing (Data Analytics)

Upload: reuben

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Probability & Statistical Inference Lecture 6

PROBABILITY & STATISTICAL INFERENCE LECTURE 6
MSc in Computing (Data Analytics)

Page 2: Probability & Statistical Inference Lecture 6

Lecture Outline
- Quick Recap
- Testing the difference between two sample means
- Practical Hypothesis Testing
- Analysis of Variance

Page 3: Probability & Statistical Inference Lecture 6

General Steps in Hypothesis Testing
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Determine an appropriate test statistic.
6. State the rejection region for the statistic.
7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value.
8. Decide whether or not H0 should be rejected and report that in the problem context.

Page 4: Probability & Statistical Inference Lecture 6

Types of questions that can be answered with two-sample hypothesis tests
- A manufacturing plant wants to compare the defective rate of items coming off two different process lines.
- Are the test results of patients who received a drug better than the test results of those who received a placebo?
- Is there a significant (or only random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B?

Page 5: Probability & Statistical Inference Lecture 6

Difference in Means of Two Normal Distributions, Variances Known

Page 6: Probability & Statistical Inference Lecture 6

Test Assumptions

Page 7: Probability & Statistical Inference Lecture 6

Example

Page 8: Probability & Statistical Inference Lecture 6

Example

Page 9: Probability & Statistical Inference Lecture 6

Example

The P-value is the exact significance level of a statistical test; that is, the probability of obtaining a value of the test statistic that is at least as extreme as the one observed, assuming the null hypothesis is true.
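The worked numbers for this example did not survive in the transcript. As a rough sketch of how the test statistic and two-sided P-value are obtained when both variances are known, the snippet below uses the standard two-sample z statistic. The summary statistics and the function name `two_sample_z_test` are illustrative assumptions, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Return (z, two-sided P-value) for H0: mu1 - mu2 = delta0, variances known."""
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (mean1 - mean2 - delta0) / standard_error
    # Probability of a statistic at least as extreme as |z| under H0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics, chosen only for illustration.
z, p = two_sample_z_test(mean1=121.0, mean2=112.0,
                         sigma1=8.0, sigma2=8.0, n1=10, n2=10)
print(f"z = {z:.3f}, P-value = {p:.4f}")
```

For a one-sided alternative, the factor of 2 would be dropped and only the relevant tail used.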

Page 10: Probability & Statistical Inference Lecture 6

Confidence Interval on a Difference in Means, Variances Known

Page 11: Probability & Statistical Inference Lecture 6

Example

Page 12: Probability & Statistical Inference Lecture 6

Example

Page 13: Probability & Statistical Inference Lecture 6

Difference in Means of Two Normal Distributions, Variances Unknown

We wish to test:

The pooled estimator of σ²:
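The slide's formulas are missing from the transcript. A minimal sketch, assuming the usual equal-variance setup: the pooled estimator is S_p² = ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2), and the test statistic is the pooled t statistic with n1 + n2 - 2 degrees of freedom. The data and the function name `pooled_t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2, delta0=0.0):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t0 = (mean(sample1) - mean(sample2) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, t0, df


# Hypothetical measurements, chosen only for illustration.
sp_sq, t0, df = pooled_t_statistic([10.0, 12.0, 14.0], [2.0, 4.0, 6.0])
print(f"pooled variance = {sp_sq}, t0 = {t0:.3f}, df = {df}")
```

The P-value would then come from a t distribution with `df` degrees of freedom, which the Python standard library does not provide; a statistics package is needed for that step.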

Page 14: Probability & Statistical Inference Lecture 6

Difference in Means of Two Normal Distributions, Variances unknown

Page 15: Probability & Statistical Inference Lecture 6

Example

Page 16: Probability & Statistical Inference Lecture 6

Example

Page 17: Probability & Statistical Inference Lecture 6

Example

Page 18: Probability & Statistical Inference Lecture 6

Confidence Interval on the Difference in Means, Variance Unknown

Page 19: Probability & Statistical Inference Lecture 6

Example

Page 20: Probability & Statistical Inference Lecture 6

Example

Page 21: Probability & Statistical Inference Lecture 6

Example

Page 22: Probability & Statistical Inference Lecture 6

Practical Hypothesis Testing
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Calculate the P-value using a software package of choice.
6. Decide whether or not H0 should be rejected and report that in the problem context.

Reject H0 when the P-value is less than α. (Golden rule: reject H0 for small P-values.)

Page 23: Probability & Statistical Inference Lecture 6

Some Research
- Look up the correct formula for calculating the hypothesis test between two proportions.
- What are the assumptions for the test?
- Find an example of the research.

Page 24: Probability & Statistical Inference Lecture 6

Analysis of Variance

Page 25: Probability & Statistical Inference Lecture 6

Introduction
- In the previous section we were concerned with the analysis of data where we compared the means of two samples.
- Frequently data contain more than two samples; for example, they may compare several treatments.
- In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called 'Analysis of Variance', or ANOVA for short.

Page 26: Probability & Statistical Inference Lecture 6

Total Sum of Squares

Data set: 14, 12, 10, 6, 4, 2
Group A: 6, 4, 2
Group B: 14, 12, 10
Overall Mean: 8

Total Sum of Squares:
$SS_T = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$

Page 27: Probability & Statistical Inference Lecture 6

Between Group Variation

Sum of Squares of the Model:
$SS_m = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2$
$= 3 \cdot (8-4)^2 + 3 \cdot (8-12)^2$
$= 96$

Page 28: Probability & Statistical Inference Lecture 6

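Within Group Variation

Sum of Squares of the Error:
$SS_e = \sum_{i=1}^{k} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$
$= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2$
$= 16$

Here k is the number of groups, n is the number of observations in each group, and $\bar{x}_i$ is the mean of group i.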
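The three sums of squares for the Group A / Group B data can be checked numerically. The following is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from statistics import mean

# Data from the lecture's two-group example.
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total variation: every observation against the overall mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)
# Between-group variation: each group mean against the overall mean.
ssm = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within-group variation: every observation against its own group mean.
sse = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(f"SST = {sst}, SSM = {ssm}, SSE = {sse}")
```

The values satisfy SST = SSM + SSE (112 = 96 + 16), which is the partition that the analysis of variance relies on.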

Page 29: Probability & Statistical Inference Lecture 6

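Structure of the Data

Group   Observations                          Total              Mean
1       $x_{11}\ x_{12}\ \dots\ x_{1n}$       $x_{1\cdot}$       $\bar{x}_{1\cdot}$
2       $x_{21}\ x_{22}\ \dots\ x_{2n}$       $x_{2\cdot}$       $\bar{x}_{2\cdot}$
...     ...                                   ...                ...
a       $x_{a1}\ x_{a2}\ \dots\ x_{an}$       $x_{a\cdot}$       $\bar{x}_{a\cdot}$
Total                                         $x_{\cdot\cdot}$   $\bar{x}_{\cdot\cdot}$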

Page 30: Probability & Statistical Inference Lecture 6

ANOVA Table

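Source   Degrees of Freedom   Sum of Squares   Mean Square     F-Stat
Model    a - 1                SSM              SSM / (a - 1)   MSM / MSE
Error    n - a                SSE              SSE / (n - a)
Total    n - 1                SST              SST / (n - 1)

where
$SS_M = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$
$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$

Where: n is the total sample size, $n_i$ is the number of observations in group i, and a is the number of groups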

Page 31: Probability & Statistical Inference Lecture 6

ANOVA Table – Original Example

Source   Degrees of Freedom   Sum of Squares   Mean Square   F-Stat
Model    2 - 1 = 1            96               96            24
Error    6 - 2 = 4            16               4
Total    6 - 1 = 5            112

Where: n is the sample size and a is the number of groups
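The entries of this table follow mechanically from the sums of squares and the two counts. A short sketch is given below; the function name `anova_table` and the dictionary layout are illustrative choices, not something from the lecture.

```python
def anova_table(ss_model, ss_error, n_total, n_groups):
    """Build the one-way ANOVA table entries from the sums of squares."""
    df_model = n_groups - 1
    df_error = n_total - n_groups
    ms_model = ss_model / df_model  # mean square of the model
    ms_error = ss_error / df_error  # mean square of the error
    return {
        "Model": {"df": df_model, "SS": ss_model, "MS": ms_model,
                  "F": ms_model / ms_error},
        "Error": {"df": df_error, "SS": ss_error, "MS": ms_error},
        "Total": {"df": n_total - 1, "SS": ss_model + ss_error},
    }


# Sums of squares from the lecture's two-group example (n = 6, a = 2).
table = anova_table(ss_model=96, ss_error=16, n_total=6, n_groups=2)
for source, row in table.items():
    print(source, row)
```

The F statistic would then be compared with an F distribution on (a - 1, n - a) degrees of freedom to obtain a P-value, which requires a statistics package.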

Page 32: Probability & Statistical Inference Lecture 6

Model Assumptions
- Independence of observations within and between samples
- Normality of the sampling distribution
- Equal variance (this is also called the homoscedasticity assumption)

Page 33: Probability & Statistical Inference Lecture 6

The ANOVA Equation

We can describe the observations in the above table using the following equation:

$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \dots, a, \quad j = 1, 2, \dots, n$

Where: a is the number of groups and n is the number of observations in each group

Page 34: Probability & Statistical Inference Lecture 6

ANOVA Hypotheses

We wish to test the hypotheses:

The analysis of variance partitions the total variability into two parts.
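The formulas on this slide did not survive in the transcript. Assuming the effects model $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ with a groups, the hypotheses and the partition are usually written as follows (the subscripts M and E for model and error are a notational assumption):

```latex
\begin{align*}
H_0 &: \tau_1 = \tau_2 = \cdots = \tau_a = 0 \\
H_1 &: \tau_i \neq 0 \ \text{for at least one } i \\
SS_T &= SS_M + SS_E
\end{align*}
```

In words, H0 says that no group shifts the mean away from the overall mean, and the total variability splits into a between-group part and a within-group part.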

Page 35: Probability & Statistical Inference Lecture 6

Example

Page 36: Probability & Statistical Inference Lecture 6

Graphical Display of Data

Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment

Page 37: Probability & Statistical Inference Lecture 6

Example

We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:

The ANOVA table is below:

Page 38: Probability & Statistical Inference Lecture 6

Example

The p-value is less than 0.05, so H0 can be rejected, and we can conclude that at least one of the hardwood concentrations affects the mean tensile strength of the paper.

Page 39: Probability & Statistical Inference Lecture 6

Test Model Assumptions
- Use Bartlett's test to check the homoscedasticity assumption.
- Bartlett's test (Snedecor and Cochran, 1983) is used to test whether k samples have equal variances.
- Bartlett's test is sensitive to departures from normality. That is, if your samples come from non-normal distributions, then Bartlett's test may simply be testing for non-normality. The Levene test is an alternative to the Bartlett test that is less sensitive to departures from normality.

Page 40: Probability & Statistical Inference Lecture 6

Bartlett Test for Equal Variance

The hypotheses for the Bartlett test are as follows:

$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
$H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$

Under H0 the Bartlett test statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom.

Interpret the p-value as in any other hypothesis test.
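The slide does not give the test statistic itself. A minimal sketch of the commonly quoted form of Bartlett's statistic is shown below, using only the standard library; the function name `bartlett_statistic` and the second data set are assumptions for illustration. The P-value needs a chi-squared distribution with k - 1 degrees of freedom, which a statistics package would supply.

```python
from math import log
from statistics import variance


def bartlett_statistic(samples):
    """Return (Bartlett statistic, degrees of freedom) for k samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variances s_i^2
    n_total = sum(sizes)
    # Pooled variance: weighted average of the group variances.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Lecture data: both groups have sample variance 4, so the statistic is zero.
stat_equal, dof = bartlett_statistic([[6, 4, 2], [14, 12, 10]])
# Hypothetical data with clearly different spreads.
stat_unequal, _ = bartlett_statistic([[1, 2, 3], [0, 5, 10]])
print(f"equal spreads: {stat_equal:.3f}, unequal spreads: {stat_unequal:.3f}, df = {dof}")
```

A larger statistic indicates stronger evidence against equal variances.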

Page 41: Probability & Statistical Inference Lecture 6

If the Assumption of Equal Variance is not met
- If the assumption of equal variance is not met, use Welch's ANOVA.
- Assignment for next week: investigate the difference between the standard ANOVA and Welch's ANOVA.

Page 42: Probability & Statistical Inference Lecture 6

Demo

Page 43: Probability & Statistical Inference Lecture 6

Confidence Interval about the mean

For 20% hardwood, the resulting confidence interval on the mean is

Page 44: Probability & Statistical Inference Lecture 6

Confidence Interval on the difference of two treatments

For the hardwood concentration example,

Page 45: Probability & Statistical Inference Lecture 6

An Unbalanced Experiment

Page 46: Probability & Statistical Inference Lecture 6

Multiple Comparisons Following the ANOVA

The least significant difference (LSD) is

If the sample sizes are different in each treatment:
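Both formulas are missing from the transcript. The standard Fisher LSD expressions, which are presumably what the slide showed, are given here as an assumption, with a treatments, n observations per treatment in the balanced case, N observations in total, and $MS_E$ the error mean square from the ANOVA table:

```latex
\begin{align*}
\text{LSD} &= t_{\alpha/2,\; a(n-1)} \sqrt{\frac{2\, MS_E}{n}}
  && \text{(equal sample sizes)} \\
\text{LSD} &= t_{\alpha/2,\; N-a} \sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
  && \text{(unequal sample sizes)}
\end{align*}
```

Two treatment means are declared significantly different when the absolute difference between their sample means exceeds the LSD.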

Page 47: Probability & Statistical Inference Lecture 6

Example: Multi-comparison Test

Page 48: Probability & Statistical Inference Lecture 6

Example: Multi-comparison Test

Page 49: Probability & Statistical Inference Lecture 6

Demo

Page 50: Probability & Statistical Inference Lecture 6

Exercises