
8/12/2019 Probability and Statistical Inference 6

1/54

PROBABILITY & STATISTICAL INFERENCE LECTURE 6

MSc in Computing (Data Analytics)


2/54


3/54

General Steps in Hypothesis Testing

1. From the problem context, identify the parameter of interest.

2. State the null hypothesis, H0.

3. Specify an appropriate alternative hypothesis, H1.

4. Choose a significance level, α.

5. Determine an appropriate test statistic.

6. State the rejection region for the statistic.

7. Compute any necessary sample quantities, substitute these into the equation for the test statistic, and compute that value.

8. Decide whether or not H0 should be rejected and report that in the problem context.


4/54

Types of questions that can be answered with two-sample hypothesis tests

A manufacturing plant wants to compare the defect rate of items coming off two different process lines.

Are the test results of patients who received a drug better than the test results of those who received a placebo?

Is there a significant (rather than only random) difference in the average cycle time to deliver a pizza from Pizza Company A vs. Pizza Company B?


5/54

Difference in Means of Two Normal Distributions, Variances Known


6/54

Test Assumptions


7/54

Example


8/54

Example


9/54

Example

The P-value is the exact significance level of a statistical test; that is, the probability of obtaining a value of the test statistic that is at least as extreme as the one observed, when the null hypothesis is true.
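As an illustration of how such a P-value can be computed, here is a minimal Python sketch of the two-sample z-test for a difference in means with known variances. The function name `two_sample_z_test`, its arguments, and the sample values are assumptions made for this sketch and do not come from the lecture's examples.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z_test(x1, x2, sigma1, sigma2, delta0=0.0):
    """Two-sided z-test of H0: mu1 - mu2 = delta0 with known sigmas."""
    n1, n2 = len(x1), len(x2)
    # Standard error of the difference in sample means (variances known).
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z0 = (mean(x1) - mean(x2) - delta0) / se
    # Two-sided P-value: probability of a statistic at least as extreme as z0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical data, for illustration only.
z0, p_value = two_sample_z_test([1, 3], [-1, 1], sigma1=1.0, sigma2=1.0)
alpha = 0.05
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The decision step compares the P-value with the chosen significance level α, as in the general steps at the start of the lecture.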


10/54

Confidence Interval on a Difference in Means, Variances Known


11/54

Example


12/54

Example


13/54

Difference in Means of Two Normal Distributions, Variances Unknown

We wish to test:

The pooled estimator of σ²:
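For reference, the standard textbook forms for the equal-variance case are sketched below. The notation (Δ0 for the hypothesised difference, S_p for the pooled standard deviation, n1 and n2 for the sample sizes) is assumed here rather than taken from the slides.

```latex
% Hypotheses (two-sided case)
H_0: \mu_1 - \mu_2 = \Delta_0
\qquad
H_1: \mu_1 - \mu_2 \neq \Delta_0

% Pooled estimator of the common variance \sigma^2
S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}

% Test statistic; under H_0 it follows a t distribution
% with n_1 + n_2 - 2 degrees of freedom
T_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}
           {S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
```

The pooled estimator is a weighted average of the two sample variances, with weights given by their degrees of freedom.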


14/54

Difference in Means of Two Normal Distributions, Variances Unknown


15/54

Example


16/54

Example


17/54

Example


18/54

Confidence Interval on the Difference in Means, Variances Unknown


19/54

Example


20/54

Example


21/54

Example


22/54

Practical Hypothesis Testing

1. From the problem context, identify the parameter of interest.

2. State the null hypothesis, H0.

3. Specify an appropriate alternative hypothesis, H1.

4. Choose a significance level, α.

5. Calculate the P-value using a software package of choice.

6. Decide whether or not H0 should be rejected and report that in the problem context. Reject H0 when the P-value is less than α.

(Golden rule: Reject H0 for small P-values.)


23/54

Some Research

Look up the correct formula for the hypothesis test for the difference between two proportions.

What are the assumptions for the test?

Find an example of the research


24/54

Answer to research

Large-sample test on the difference in population proportions


25/54

Example

Example of large-sample test on the difference in population proportions
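A minimal Python sketch of the usual large-sample z-test for two proportions follows. It assumes independent samples that are large enough for the normal approximation, and it uses the pooled proportion under H0: p1 = p2. The function name `two_proportion_z_test` and the counts are hypothetical and are not the lecture's example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided large-sample z-test of H0: p1 = p2.

    x1, x2 are the numbers of "successes"; n1, n2 are the sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hypothetical counts, for illustration only.
z0, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
```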


26/54


27/54


28/54

Analysis of Variance


29/54

Introduction

In the previous section we were concerned with the analysis of data where we compared the means of two samples.

Frequently data contain more than two samples; for example, they may compare several treatments.

In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called Analysis of Variance, or ANOVA for short.


30/54

Total Sum of Squares

Data set: 14, 12, 10, 6, 4, 2

Group A: 6, 4, 2

Group B: 14, 12, 10

Overall mean: 8

Total Sum of Squares:

SST = (14-8)² + (12-8)² + (10-8)² + (6-8)² + (4-8)² + (2-8)² = 112


31/54

Between Group Variation

Sum of Squares of the Model:

SSM = n_a (x̄ - x̄_a)² + n_b (x̄ - x̄_b)²

= 3*(8-4)² + 3*(8-12)²

= 96


32/54

Within Group Variation

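Sum of Squares of the Error:

$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$

= (14-12)² + (12-12)² + (10-12)² + (6-4)² + (4-4)² + (2-4)²

= 16

Here a is the number of groups, n is the number of observations in each group, and x̄_i is the mean of group i.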


33/54

Structure of the Data

Group | Observations | Total | Mean
1 | x11 x12 ... x1n | x1. | x̄1
2 | x21 x22 ... x2n | x2. | x̄2
... | ... | ... | ...
a | xa1 xa2 ... xan | xa. | x̄a
Total | | x.. | x̄


34/54

ANOVA Table

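Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat
Model | a - 1 | SSM | MSM = SSM / (a - 1) | MSM / MSE
Error | n - a | SSE | MSE = SSE / (n - a) |
Total | n - 1 | SST | SST / (n - 1) |

$SSM = \sum_{i=1}^{a} n_i (\bar{x}_i - \bar{x})^2$

$SSE = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$

$SST = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$

Where: n is the total sample size, a is the number of groups, and n_i is the number of observations in group i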


35/54

ANOVA Table: Original Example

Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat
Model | 2 - 1 = 1 | 96 | 96 | 24
Error | 6 - 2 = 4 | 16 | 4 |
Total | 6 - 1 = 5 | 112 | |

Where: n is the sample size and a is the number of groups
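The table entries for the original example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_way_anova` and the dictionary keys are chosen for this illustration. It stops at the F statistic, since the P-value would normally come from a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the sums of squares, mean squares and F statistic."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    n, a = len(all_obs), len(groups)  # total sample size, number of groups

    sst = sum((x - grand_mean) ** 2 for x in all_obs)               # total
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # between groups
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)         # within groups

    msm = ssm / (a - 1)
    mse = sse / (n - a)
    return {"SST": sst, "SSM": ssm, "SSE": sse,
            "MSM": msm, "MSE": mse, "F": msm / mse}


# Group A and Group B from the Total Sum of Squares slide.
result = one_way_anova([[6, 4, 2], [14, 12, 10]])
print(result)
```

Note that SST = SSM + SSE (112 = 96 + 16), which is the partition of variability that ANOVA relies on.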


36/54

Model Assumptions

Independence of observations within and between samples

Normality of the sampling distribution

Equal variance (also called the homoscedasticity assumption)


37/54

The ANOVA Equation

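We can describe the observations in the above table using the following equation:

$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$

Where: a is the number of groups and n is the number of observations in each group; $\mu$ is the overall mean, $\tau_i$ is the effect of group i, and $\varepsilon_{ij}$ is a random error term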


38/54

ANOVA Hypotheses

We wish to test the hypotheses:

The analysis of variance partitions the total variability into two parts.
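As a sketch of what this looks like in symbols, the standard one-way ANOVA hypotheses and the sum-of-squares partition are given below. The treatment-effect notation τ_i is an assumption here, following the usual textbook form of the single-factor model.

```latex
% Hypotheses: no group effect vs. at least one non-zero effect
H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0
\qquad
H_1: \tau_i \neq 0 \ \text{for at least one } i

% Partition of the total variability
SS_T = SS_M + SS_E

% For the two-group example earlier in the lecture
112 = 96 + 16
```

The first part (SSM) measures variation between the group means, and the second (SSE) measures variation within the groups.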


39/54

Example


40/54

Graphical Display of Data

Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment.


41/54

Example

We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:

The ANOVA table is below:


42/54

Example

The p-value is less than 0.05, therefore H0 can be rejected and we can conclude that at least one of the hardwood concentrations affects the mean tensile strength of the paper.


43/54

Test Model Assumptions

Use Bartlett's test to check the homoscedasticity assumption.

Bartlett's test (Snedecor and Cochran, 1983) is used to test whether k samples have equal variances.

Bartlett's test is sensitive to departures from normality. That is, if your samples come from non-normal distributions, then Bartlett's test may simply be testing for non-normality. The Levene test is an alternative to the Bartlett test that is less sensitive to departures from normality.


44/54

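Bartlett Test for Equal Variance

The hypotheses for the Bartlett test are as follows:

$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$

$H_1: \sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$

The Bartlett test statistic follows (approximately) a chi-squared distribution with k - 1 degrees of freedom.

Interpret the p-value as in any other hypothesis test.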
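To make the test statistic concrete, here is a minimal standard-library sketch of the commonly published form of Bartlett's statistic. The function name `bartlett_statistic` is an assumption for this illustration. It returns the statistic and its degrees of freedom only; the p-value would come from the chi-squared distribution in a statistics package.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Return (T, degrees of freedom) for Bartlett's equal-variance test."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances s_i^2

    # Pooled variance: weighted average of the group variances.
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * log(pooled) - sum(
        (n - 1) * log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Group A and Group B from the two-group example: both have sample variance 4.
t_stat, df = bartlett_statistic([[6, 4, 2], [14, 12, 10]])
print(t_stat, df)
```

Because the two example groups have identical sample variances, the statistic is zero and there is no evidence against equal variances.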



45/54

If the Assumption of Equal Variance Is Not Met

If the assumption of equal variance is not met, use Welch's ANOVA.

Assignment for next week:

Investigate the difference between the standard ANOVA and Welch's ANOVA.


46/54

Demo


47/54

Confidence Interval about the mean

For 20% hardwood, the resulting confidence interval on the mean is


48/54

Confidence Interval on the Difference of Two Treatments

For the hardwood concentration example,


49/54

Multiple Comparisons Following the ANOVA


50/54

Multiple Comparisons Following the ANOVA

The least significant difference (LSD) is

If the sample sizes are different in each treatment:
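For reference, the usual textbook form of Fisher's LSD is sketched below. The symbols are assumptions for this sketch: r is the number of observations per treatment in the balanced case, n is the total sample size, a is the number of groups, and n_i, n_j are the sizes of the two treatments being compared.

```latex
% Equal sample sizes (r observations per treatment)
LSD = t_{\alpha/2,\; n - a} \sqrt{\frac{2\, MS_E}{r}}

% Unequal sample sizes
LSD = t_{\alpha/2,\; n - a} \sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}

% Decision rule for a pair of treatment means
|\bar{x}_i - \bar{x}_j| > LSD \;\Rightarrow\; \text{means } i \text{ and } j \text{ differ significantly}
```

With equal group sizes the second form reduces to the first, since 1/r + 1/r = 2/r.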


51/54

Example: Multi-comparison Test


52/54

Example: Multi-comparison Test


53/54

Demo


54/54

Exercises
