inference we want to know how often students in a medium-size college go to the mall in a given...


Upload: jasper-mccoy

Post on 13-Jan-2016


TRANSCRIPT

Page 1: Inference We want to know how often students in a medium-size college go to the mall in a given year. We interview an SRS of n = 10. If we interviewed

Inference

We want to know how often students in a medium-size college go to the mall in a given year.

We interview an SRS of n = 10. If we interviewed lots of SRSs, the “average sample frequency of visits” would be centered around the true “average population frequency of visits.”


[Figure: 16 panels, one per SRS of n = 10; horizontal axes span 45 to 55 or 48 to 52, centered on 50.]

Sample means of the 16 SRSs (n = 10): 49.6896 49.7618 50.0742 49.8520 50.0590 50.3243 49.2806 49.6056 50.4129 49.3963 49.3617 49.7741 49.9237 50.2201 49.2904 50.1797


Inference

Suppose that instead we interviewed an SRS of n = 400.

Our estimates will be more reliable because estimates from other SRSs would be similar … that is, our estimates would be less variable.


[Figure: 16 panels, one per SRS of n = 400; horizontal axes span 45 to 55, centered on 50.]

Sample means of the 16 SRSs (n = 400): 49.9496 50.0165 50.0941 50.0573 50.1674 50.1402 50.0506 50.0838 49.9865 50.0195 49.9752 49.9439 49.9738 49.9966 50.0396 49.9819


Because we didn’t have money for 16 separate samples, we actually only collected data from the first sample, whose sample mean is $\bar{x} = 49.9496$.

Is the true number actually 50? Is the difference between 50 and 49.9496 purely a fluke? Does this result exclude 50 as a possibility?


The Central Limit Theorem says that if the entire population has a mean $\mu$ and a standard deviation $\sigma$, then in repeated samples of size n the sample mean $\bar{x}$ approximately follows a Normal distribution:

$$\bar{x} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$


The first sample had a mean $\bar{x} = 49.9496$ and a standard deviation $\sigma = 1.0264$.

            n      Std. dev. of $\bar{x}$: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Sample A    10     0.324576 = 1.0264/sqrt(10)
Sample B    400    0.0513 = 1.0264/sqrt(400)
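The two standard deviations of the sample mean can be checked with a few lines of Python. This is only a sketch: the function name `standard_error` is an illustrative choice, not something from the slides.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0264  # standard deviation reported for the first sample

se_a = standard_error(sigma, 10)    # Sample A, n = 10
se_b = standard_error(sigma, 400)   # Sample B, n = 400

print(f"Sample A: {se_a:.6f}")
print(f"Sample B: {se_b:.4f}")
```

Multiplying n by 40 divides the standard deviation of the sample mean by sqrt(40), roughly 6.3.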


We know that, for a Normal distribution, about 95% of all observations fall within ± two standard deviations of the mean.

Likewise, about 95% of all sample means fall within ± two standard deviations of $\bar{x}$ (that is, within $2\,\sigma/\sqrt{n}$) of the true population mean.

So, for 1,900 out of 2,000 samples, the interval $\bar{x} \pm 2\,\sigma/\sqrt{n}$ will contain the true population mean.
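The 1,900-out-of-2,000 claim can be checked by simulation. The sketch below is illustrative only: it assumes a Normal population with mean 50 and standard deviation 1.0264 (values borrowed from the running example, not given in the slides as the true population), and the function name `coverage` is made up for this example.

```python
import math
import random

def coverage(mu: float, sigma: float, n: int, n_samples: int, seed: int = 0) -> float:
    """Share of simulated samples whose interval x_bar ± 2*sigma/sqrt(n) contains mu."""
    rng = random.Random(seed)
    half_width = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        # Draw one SRS of size n and compute its sample mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(x_bar - mu) <= half_width:
            hits += 1
    return hits / n_samples

# Assumed population: Normal(50, 1.0264); 2,000 samples of n = 400 each.
share = coverage(mu=50.0, sigma=1.0264, n=400, n_samples=2000)
print(f"{share:.3f} of the intervals contain the true mean")
```

The share should land close to 0.95, and it varies a little with the seed.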


[Figure: the interval extends 2 × (0.0513) on either side of $\bar{x}$.]


Now there are two possibilities. Either

1. the true population mean is contained in the interval

$$\bar{x} \pm 2\,\frac{\sigma}{\sqrt{n}} = \left(49.9496 - 2\,\frac{1.0264}{\sqrt{400}},\ 49.9496 + 2\,\frac{1.0264}{\sqrt{400}}\right) = (49.8470,\ 50.0522)$$

2. or this is one of those 5% of samples whose interval does not contain the true value.
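The interval endpoints can be reproduced directly from the sample mean, the standard deviation and n. This is a minimal sketch with illustrative variable names.

```python
import math

x_bar, sigma, n = 49.9496, 1.0264, 400

half_width = 2 * sigma / math.sqrt(n)   # two standard deviations of x_bar
lower, upper = x_bar - half_width, x_bar + half_width

print(f"({lower:.4f}, {upper:.4f})")
print("contains 50:", lower <= 50 <= upper)
```

Because 50 lies inside the interval, this sample does not exclude 50 as a possible population mean.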


The confidence level C is typically set at 95%, but it’s sometimes chosen to be 90% or 99%.

STATA Exercise 1


z* = 1.96 if C = 95%; −z* = −1.96 if C = 95%


So don’t use 2 when constructing a 95% CI: use 1.96.
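The critical value z* for any confidence level can be looked up numerically instead of in a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_star` is an illustrative choice.

```python
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Critical value z* leaving (1 - confidence)/2 in each tail of the standard Normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```

For C = 95% this gives 1.96 rather than the rounded value 2.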


If the margin of error is too large…

Reduce $\sigma$
– $\sigma$ is determined by the population: a population with a lot of variability will increase the chance that a sample contains observations very far from the true mean.
– This is easier to say than to do.


[Figure: 16 panels, one per sample; horizontal axes span 40 to 60, centered on 50.]

16 samples. The $\sigma$ of the population increases from 1 to 4, increasing the spread of the sample and the likelihood of getting $\mu$ wrong.


If the margin of error is too large…

Increase the sample size (larger n)


If the margin of error is too large…

Be less confident of your estimate …

Use a lower confidence level (make C smaller, hence a smaller z*)
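The three levers (smaller $\sigma$, larger n, lower C) can be compared numerically. The sketch below assumes the usual z-interval margin of error, $z^* \sigma/\sqrt{n}$; the function name and the alternative values $\sigma$ = 0.5 and n = 1600 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Margin of error z* * sigma / sqrt(n) for a z-interval with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

base = margin_of_error(sigma=1.0264, n=400, confidence=0.95)
smaller_sigma = margin_of_error(sigma=0.5, n=400, confidence=0.95)   # hypothetical sigma
larger_n = margin_of_error(sigma=1.0264, n=1600, confidence=0.95)    # hypothetical n
lower_c = margin_of_error(sigma=1.0264, n=400, confidence=0.90)

for label, m in [("baseline", base), ("smaller sigma", smaller_sigma),
                 ("larger n", larger_n), ("lower C", lower_c)]:
    print(f"{label:>13}: ±{m:.4f}")
```

Quadrupling n only halves the margin, which is why very precise estimates are expensive.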


If the margin of error is too large…

“We’re 99% sure that the President will receive 51.5% of the votes, with a ±5% margin of error.”

“We’re 95% sure that the President will receive 51.5% of the votes, with a ±3% margin of error.”

“We’re 90% sure that the President will receive 51.5% of the votes, with a ±1% margin of error.”


Cautions

1. Is it an SRS?
2. Is the data unbiased (or do we know the bias)?
3. Are there no outliers that influence the sample mean?
4. Is n large? If not, is the underlying population Normally distributed?
5. Do you know the true $\sigma$?

Theorems of mathematical statistics are true; statistical methods are effective only when used with skill.


Cautions

FALSE: “The probability that the true mean falls within $\bar{x} \pm 2\,\sigma/\sqrt{n}$ is 95%”

– This is false because either the interval contains the true population mean (which is not a random variable), with Pr=1, or it doesn’t, with Pr=0.

TRUE: “The probability that the interval $\bar{x} \pm 2\,\sigma/\sqrt{n}$ is one of the ones that contain the true mean is 95%”


Tests of Significance


Making claims about the population parameters

In our sample, we observed a mean of 49.9496 visits to the mall per year.
– Assuming that the true population mean is 50, how likely is it that we observe a sample mean as small as 49.9496, or even smaller?
– If the true population mean were 45, how likely is it that we observe a sample mean as large as 49.9496, or even larger?


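$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$z = \frac{49.9496 - 50}{1.0264/\sqrt{400}} \approx -0.98$$

$$z = \frac{49.9496 - 45}{1.0264/\sqrt{400}} \approx 96.45$$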



[Figure: Normal curve with the two z-scores marked on the $\bar{x}$ axis.]

If the true population mean were 45, how likely is it that we observe a sample mean as large as 49.9496? With z ≈ 96.45, Pr ≈ 0%.

If the true population mean were 50, how likely is it that we observe a sample mean at least as small as 49.9496? With z ≈ −0.98, Pr ≈ 16%.
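Both tail probabilities follow from the z-score formula and the standard Normal curve. The sketch below recomputes them from $\bar{x}$ = 49.9496, $\sigma$ = 1.0264 and n = 400; the helper `z_score` and the variable names are illustrative, not from the slides.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 49.9496, 1.0264, 400
se = sigma / math.sqrt(n)   # standard deviation of the sample mean

def z_score(mu0: float) -> float:
    """Standardized distance of x_bar from a hypothesized population mean mu0."""
    return (x_bar - mu0) / se

std_normal = NormalDist()

z_50 = z_score(50)
p_50 = std_normal.cdf(z_50)     # Pr(sample mean this small or smaller | mu = 50)

z_45 = z_score(45)
p_45 = std_normal.cdf(-z_45)    # Pr(sample mean this large or larger | mu = 45), by symmetry

print(f"mu = 50: z = {z_50:.4f}, lower-tail probability = {p_50:.4f}")
print(f"mu = 45: z = {z_45:.4f}, upper-tail probability = {p_45:.4g}")
```

Using `cdf(-z)` for the upper tail avoids computing `1 - cdf(z)`, which loses precision when z is large.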


We found that if the true mean is 45, the probability of observing a sample mean as large as 49.9496 is essentially zero (z ≈ 96.45). Either
– we’ve observed a very rare event (our sample is really unusual), or
– the true mean is not 45. There’s another number that makes the observed sample more likely.


A sample outcome that would be extreme if a hypothesis were true is evidence that the hypothesis is not true.


H0: $\mu = 45$

Ha: $\mu \neq 45$

This is a two-sided alternative hypothesis.

These are hypotheses about the population.


Test Statistics

A test statistic measures compatibility between the null hypothesis and the data.

The z-score can be used as a test statistic because we can compare it against 1.96, the z-score that delimits the central 0.95 area under the Normal curve.
– 1.96 is called the appropriate “critical value”.


Test Statistics

The Student’s t distribution is used when n is small.

It approaches the standard Normal (z) distribution as n gets large.


Test Statistics

We know that about 95% of all values are within 2 standard deviations of the mean.

More precisely, 95% of all values are between the z-score of −1.96 and the z-score of 1.96.

So if we get a sample outcome whose z-score is greater than 1.96 (in absolute value), we know that it is unlikely to have come from a population for which the null hypothesis is true.


Suppose
– n = 110
– $\sigma = 26.4$
– $\bar{x} = 8.1$
– H0: $\mu = 0$
– Ha: $\mu \neq 0$

$$z = \frac{8.1 - 0}{26.4/\sqrt{110}} = 3.22$$

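The same two-sided test can be run numerically. This is a sketch under the slide's stated values (n = 110, standard deviation 26.4, sample mean 8.1, hypothesized mean 0); the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sigma, x_bar, mu0 = 110, 26.4, 8.1, 0.0

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_two_sided = 2 * NormalDist().cdf(-abs(z))   # area in both tails beyond |z|

print(f"z = {z:.2f}, two-sided p-value = {p_two_sided:.4f}")
print("reject H0 at the 5% level:", abs(z) > 1.96)
```

Comparing |z| with 1.96 and comparing the two-sided p-value with 0.05 always lead to the same decision.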


Exercise

A company makes cellphones using components from two countries: Ecuador and Canadaguay. Here are data on days of cellphone durability.

Your retail shop buys 100 cellphones because the manufacturer claims they were made in Ecuador. On average, they stop working after 279 days of use.

Is this difference (279 days versus 300 days) significant? Is it a fluke or does it mean something?

# days till broken
                $\mu$    $\sigma$
Ecuador         300      100
Canadaguay      100      50


Exercise

The null hypothesis is that the phone typically lasts 300 days.

Alternatively, it’s a lower quality phone.

The z-score can tell us how far this observation is from the mean.

Look up in table A the probability of observing a z-score as small as this or smaller.

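$$H_0: \mu = 300 \qquad H_a: \mu < 300$$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{279 - 300}{100/\sqrt{100}} = -2.1$$

$$P\text{-value} = 0.0179 = 1.79\%$$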


Exercise

Suppose the parameters were, instead

# days till broken
                $\mu$    $\sigma$
Ecuador         300      200

Now, is this difference (279 days versus 300 days) significant? Is it a fluke or does it mean something?

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{279 - 300}{200/\sqrt{100}} = -1.05$$

$$P\text{-value} = 0.1469 = 14.69\%$$


Exercise

Suppose average durability of the 100 cellphones was, instead, 90 days.

# days till broken
                $\mu$    $\sigma$
Ecuador         300      100

Now, is this difference (90 days versus 300 days) significant? Is it a fluke or does it mean something?

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{90 - 300}{100/\sqrt{100}} = -21$$

$$P\text{-value} = 0.0000 \approx 0\%$$
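The three variants of the cellphone exercise differ only in the sample mean and the standard deviation, so one helper covers them all. This is a sketch, not the course's own code: the function name is hypothetical, and the standard deviation used for the 90-day variant (100) is the one that appears in that slide's z formula.

```python
import math
from statistics import NormalDist

def lower_tail_test(x_bar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (z, P-value) for H0: mu = mu0 against Ha: mu < mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return z, NormalDist().cdf(z)

# (sample mean, sigma) as used in the z formulas of the three variants; n = 100, mu0 = 300.
scenarios = [(279, 100), (279, 200), (90, 100)]
results = [lower_tail_test(x_bar, 300, sigma, 100) for x_bar, sigma in scenarios]

for (x_bar, sigma), (z, p) in zip(scenarios, results):
    print(f"x_bar = {x_bar}, sigma = {sigma}: z = {z:.2f}, P-value = {p:.4f}")
```

The same 21-day shortfall is significant at the 5% level when the standard deviation is 100 but not when it is 200.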


We found that if the true mean is 45, the probability of observing a sample mean as large as 49.9496 is essentially zero (z ≈ 96.45). Notice that here

H0: $\mu = 45$
Ha: $\mu > 45$

This is a one-sided alternative hypothesis.


Look this up in Table D, 20-1 degrees of freedom.

We have to use the Student’s t because n is small.


Tests for Population Mean

1. State the hypotheses

2. Calculate the test statistic

3. Find the P-value

4. State your conclusion in the context of your specific setting


C = 1 − $\alpha$ for two-sided tests


$\sigma = 0.0068$, $\bar{x} = 0.8404$, H0: $\mu = 0.86$, n = 3

$$t = \frac{0.8404 - 0.86}{0.0068/\sqrt{3}} = -4.99$$

Look in Table D for the critical value t* at a two-tailed 1% significance level (look in the 0.005 column) for df = 3 − 1.

Is it smaller than 4.99, the absolute value of our t = −4.99?


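$\sigma = 0.0068$, $\bar{x} = 0.8404$, H0: $\mu = 0.86$, n = 3

The 99% CI is

$$\left(\bar{x} - t^*_{n-1}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + t^*_{n-1}\,\frac{\sigma}{\sqrt{n}}\right) = (0.8014,\ 0.8794)$$

Look up the t* for df = 3 − 1, upper tail probability 0.005.

cii 3 0.8404 0.0068, level(99)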
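The interval and the t statistic can be reproduced without a table. The sketch below relies on the fact that Student's t with exactly 2 degrees of freedom has a closed-form quantile, so the helper `t_quantile_df2` (a name invented for this example) is valid only for df = 2, which is the case here because n = 3. It uses Python rather than Stata so that it runs with the standard library alone.

```python
import math

def t_quantile_df2(p: float) -> float:
    """Quantile of Student's t with 2 degrees of freedom (closed form; df = 2 only)."""
    return (2 * p - 1) / math.sqrt(2 * p * (1 - p))

n, x_bar, s, mu0 = 3, 0.8404, 0.0068, 0.86   # s is the standard deviation given on the slide

t_star = t_quantile_df2(1 - 0.005)           # upper-tail probability 0.005
half_width = t_star * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

t = (x_bar - mu0) / (s / math.sqrt(n))

print(f"t* = {t_star:.3f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
print(f"t = {t:.2f}; significant at the 1% level: {abs(t) > t_star}")
```

With only two degrees of freedom the critical value is very large, so even t = −4.99 is not significant at the 1% level, and the 99% interval still contains 0.86.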


P-values versus a fixed $\alpha$

If the z-score is −4.99, the corresponding p-value is 0.0000006.

The p-value is the smallest level of $\alpha$ at which the data are significant.

Remember that C = 1 − $\alpha$ for two-sided tests, and that bigger confidence means a wider CI. “The smallest level of $\alpha$” then means the largest C and widest CI that will still exclude the hypothesized value.
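The link between a p-value and a fixed significance level can be made concrete for z = −4.99. This is a sketch with illustrative names, and the list of $\alpha$ values is chosen arbitrarily for the demonstration.

```python
from statistics import NormalDist

z = -4.99
p_value = 2 * NormalDist().cdf(-abs(z))   # two-sided p-value

print(f"p-value = {p_value:.2g}")

# The data are significant at every alpha at or above the p-value, and at none below it.
for alpha in (0.05, 0.01, 0.000001, 0.0000001):
    print(f"alpha = {alpha:g}: significant = {p_value <= alpha}")
```

Reporting the p-value itself lets each reader apply their own significance level.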


[Figure: curve under $H_0$ with $\bar{x}$ and the p-value marked.]


If the P-value is larger than the chosen significance level $\alpha$, we say that the statistic is not significant.

If the P-value is smaller than the chosen significance level $\alpha$, we say that the statistic is significant.


Using Significance Tests


. tabstat guess grade diff if position<8

   stats |     guess     grade      diff
---------+------------------------------
    mean |        76  98.41428  22.41428
----------------------------------------

. tabstat guess grade diff if position>=8

   stats |     guess     grade      diff
---------+------------------------------
    mean |    64.375  82.49375  18.11875
----------------------------------------

Is it true that, on average, people who finish earlier tend to do better?

(Notice causality is not determined).


Significance Tests

H0 is our hypothesis: how plausible is it, given the data, our statistic, and its sampling variation?
– If a priori H0 seems true, very small p-values will be needed to convince people that H0 is wrong.

A small p-value means that your estimated statistic is so far from H0 that it’s unlikely that your statistic is derived from a population where H0 is true.


Significance Tests

H0 is our hypothesis: what are the consequences of rejecting H0?
– If rejecting H0 led to huge changes in our behavior, with large costs, we’ll need to be very convinced.


Significance Tests

Decide on a significance level, $\alpha$.
– Remember $\alpha$ = 1 − C, where C is the confidence level.

Check if the P-value is below your pre-decided significance level.

[Figure: two curves under $H_0$, each with $\bar{x}$ and the p-value marked.]


Significance Tests

Check for the practical significance (the actual size of the number) of a statistic that is statistically significant.

Do exploratory data analysis.
– Check for outliers.
– Check for the Normality of the data.

Report confidence intervals.

Excel and icosahedron exercise 1