SECTION 2.2 STATISTICAL INFERENCE FROM SAMPLE TO POPULATION

Upload: shavonne-hood

Post on 23-Dec-2015


Page 1:

SECTION 2.2

STATISTICAL INFERENCE FROM SAMPLE TO POPULATION

Page 2:

BIG IDEA OF THE DAY

Chapter 1: sampling from a process

Observing a small number of attempts from an infinite number of possible attempts.

Buzz and Doris do the experiment forever.

Chapter 2: sampling from a finite population

A limited (finite) number of individuals.

Sample a portion from the finite population.

How does this affect the chance model?

Page 3:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Have you ever pretended to be talking on a cell phone to avoid interacting with people around you?

Pew Research Center surveyed a random sample of 1,858 American cell phone users and found that 13% admitted to faking a cell phone call in the past 30 days.

iPhone app: Fake-A-Call™ by Excelltech Inc.

Page 4:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

What are the:

Observational units?

Variable of interest?

Population?

Sample?

Parameter of interest?

Statistic for this study?

Page 5:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Does this survey convince you that more than 1 in 10 cell phone users in the U.S. have engaged in such fake cell phone use in the past 30 days?

If not, what could be another explanation for the survey results?

Page 6:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

The sample result (13%) is greater than 1 in 10 (or 10%).

It’s possible that 10% of the population of cell phone users have faked a call and the researchers just happened to have a higher percentage by the luck of the draw.

How plausible (believable) is this explanation for the higher sample percentage?

Page 7:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Research Question: Do more than 1 in 10 cell phone users in the U.S. admit to engaging in fake cell phone use in the past 30 days?

What is the Null Hypothesis?

10% of all American cell phone users admit to faking a call in the past 30 days.

What is the Alternative Hypothesis?

More than 10% of all American cell phone users admit to faking a call in the past 30 days.

Page 8:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Notice that the null and alternative hypotheses are statements about the unknown parameter.

We don’t know what the proportion for all cell phone users is. (We don’t know the parameter.)

We only have information from 1,858 cell phone users. (We know the statistic.)

Do the 1,858 respondents give us meaningful information about this proportion for the entire population of all cell phone users (the parameter)?

Page 9:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

As before, assume the null hypothesis is true.

If the observed statistic is unlikely to have happened by chance alone, we have evidence against the null hypothesis in favor of the alternative hypothesis.

The p-value assesses the probability we would get a sample proportion at least as large as 0.13 if in fact the proportion of all cell phone users faking calls is only 0.10.
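As a rough cross-check on what this p-value measures, the tail probability can be computed directly from a binomial model instead of by simulation. This is only a sketch: the slides report a percentage, not a count, so the count of 242 "yes" answers (the smallest count giving at least 13% of 1,858) is an assumption, and the function names are hypothetical.

```python
import math


def binomial_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p); lgamma avoids overflow for large n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.exp(binomial_log_pmf(k, n, p)) for k in range(k_min, n + 1))


n = 1858                 # sample size from the Pew survey
null_proportion = 0.10   # value of the parameter under the null hypothesis

# Assumption: smallest count that gives a sample proportion of at least 0.13.
observed_count = math.ceil(0.13 * n)

p_value = upper_tail(observed_count, n, null_proportion)
print(f"Assumed observed count: {observed_count}")
print(f"P(count >= {observed_count} | proportion = {null_proportion}) = {p_value:.2e}")
```

A very small value here says that a sample proportion of 0.13 or more would be rare if the population proportion were really 0.10.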

Page 10:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

If the null hypothesis is true, what is the probability that the first person selected will admit to faking a call?

This probability is equal to the parameter (the proportion of all cell phone users who admit to faking a call), which under the null hypothesis is 0.10.

Page 11:

WHAT IS THE IMPACT OF SAMPLING FROM A FINITE POPULATION RATHER THAN A PROCESS?

If the first person admits to faking a call, what is the probability that the second person selected also admits to faking a call?

Sampling “without replacement”: once someone is selected, they aren’t replaced before the next person is selected.

If there were 255 million cell phone users when the first person is selected, then there are 254,999,999 cell phone users when the second person is selected.

Under the null hypothesis, 25,500,000 of those 255 million admit to faking a call, so 25,499,999 of them remain once the first person is selected. The probability that the second person selected also admits to faking a call: 25,499,999/254,999,999 ≈ 0.099999996.
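The arithmetic on this slide can be checked with exact fractions. The sketch below uses the slide's figure of 255 million cell phone users and the null value of 10%; the variable names are illustrative only.

```python
from fractions import Fraction

population_size = 255_000_000       # figure used on the slide
null_proportion = Fraction(1, 10)   # 10% under the null hypothesis

# Number of people in the population who would admit to faking a call.
fakers = int(population_size * null_proportion)

# Probability that the first person selected admits to faking a call.
p_first = Fraction(fakers, population_size)

# Probability for the second person, given the first admitted to faking a call
# and is not put back (sampling without replacement).
p_second_given_first = Fraction(fakers - 1, population_size - 1)

print(f"First person:  {float(p_first):.9f}")
print(f"Second person: {float(p_second_given_first):.9f}")
print(f"Difference:    {float(p_first - p_second_given_first):.2e}")
```

The difference is a few billionths, which is why treating every draw as having the same 10% chance loses essentially nothing here.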

Page 12:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

When the population size is large (more than 10 or 20 times the size of the sample), we can still consider random sampling from a finite population to be equivalent to random sampling from a process.

Is it safe to use a model of random sampling for sampling from a finite population in this study?

Model the probability of success as the same for each observational unit in our sample.

Under the null hypothesis, every person selected has a 10% chance of admitting to faking cell phone calls in the past 30 days.

Page 13:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

We can conduct the same type of simulation we used in Chapter 1 on processes for sampling from a population.

Let’s go to an applet and try this.

Remember we are testing to see if the population proportion of cell phone fakers is more than 10%, and the results of our poll showed that 13% of 1,858 respondents admitted faking.
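For readers without the applet at hand, the same kind of simulation can be sketched in a few lines. This is not the applet's code; it is a minimal stand-in that assumes each of the 1,858 respondents independently has a 10% chance of admitting to a fake call, and the function name, seed, and number of repetitions are arbitrary choices.

```python
import random


def simulate_null_proportions(n, null_proportion, reps, seed=0):
    """Simulate `reps` samples of size n, each person a 'success' with the null probability."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(reps):
        successes = sum(rng.random() < null_proportion for _ in range(n))
        proportions.append(successes / n)
    return proportions


n = 1858                     # sample size in the poll
null_proportion = 0.10       # null hypothesis value
observed_proportion = 0.13   # statistic reported by the poll
reps = 1000                  # number of simulated samples (arbitrary)

simulated = simulate_null_proportions(n, null_proportion, reps)

# Simulated p-value: share of simulated samples at least as extreme as the observed one.
as_extreme = sum(p >= observed_proportion for p in simulated)
p_value = as_extreme / reps

print(f"Largest simulated proportion: {max(simulated):.4f}")
print(f"Simulated p-value: {p_value}")
```

If almost no simulated sample reaches 0.13, chance alone is a poor explanation for the poll result.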

Page 14:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

We have convincing evidence that the sample proportion of 0.13 didn’t just “happen by chance.”

Thus we have very strong evidence that the population proportion of cell phone users who will admit faking a call is larger than 0.10.

Page 15:

SUMMARY

Null Hypothesis: 10% of all cell phone users admit to faking a call in the past 30 days

Alternative Hypothesis: More than 10% of all cell phone users admit to faking a call in the past 30 days

We simulate and find a tiny p-value (≈ 0).

Thus we have very strong evidence that the population proportion of cell phone users who will admit faking a call is larger than 0.10.

Page 16:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

Random Sampling Error

It is still possible that the researchers were unlucky.

However, the probability of this is so low that a more believable explanation is that the population proportion does indeed exceed 0.10.

Random sampling also allows us to estimate how much “random sampling error” we expect.

Page 17:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

The sampling error is roughly $1/\sqrt{n}$, where n is the number of observational units (sample size) in the sample.

Therefore, the sampling error in our poll is about $1/\sqrt{1858} \approx 0.023$, so we can expect our sample percentage to be within 2.3 percentage points of the population proportion.

The 10% we were testing is outside of this range (13% ± 2.3%, or 10.7% to 15.3%).

Page 18:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

The sampling error is roughly $1/\sqrt{n}$, where n is the number of observational units in the sample.

What if we only sample 500 people?

The sampling error in this smaller poll is about $1/\sqrt{500} \approx 0.045$, so we can expect our sample percentage to be within 4.5 percentage points of the population proportion.

The 10% we were testing is NOT outside of this range (13% ± 4.5%, or 8.5% to 17.5%, if the sample percentage were again 13%).
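The two sampling-error calculations can be reproduced side by side. The sketch below assumes the rule of thumb is one over the square root of n, which matches the 2.3% and 4.5% figures on the slides, and it assumes the smaller poll would also report 13%. The function names are hypothetical.

```python
import math


def sampling_error(n):
    """Rule-of-thumb random sampling error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def plausible_range(sample_proportion, n):
    """Sample proportion plus or minus the rule-of-thumb sampling error."""
    margin = sampling_error(n)
    return sample_proportion - margin, sample_proportion + margin


sample_proportion = 0.13   # assumed to be the same in both polls
null_value = 0.10          # the value being tested

for n in (1858, 500):
    low, high = plausible_range(sample_proportion, n)
    inside = low <= null_value <= high
    print(f"n = {n}: error = {sampling_error(n):.3f}, "
          f"range = ({low:.3f}, {high:.3f}), 10% inside range: {inside}")
```

With 1,858 respondents the range excludes 10%; with 500 it does not, even though the sample percentage is unchanged.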

Page 19:

EXAMPLE 2.2: FAKE CELL PHONE CALLS

We reduce variability (or random sampling error) with larger sample sizes.

Simple random sampling gives a lot of predictability in sample proportions.

This predictability depends strongly on the sample size, not on the population size as long as the population size is large. 

There is also the possibility of nonsampling error. We will talk about this in the next section.
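One way to see why sample size matters far more than population size is the standard finite population correction, which is not on the slides and is brought in here only as an illustration. The sketch compares the standard deviation of a sample proportion for several population sizes, using the null value of 0.10; the function name and the population sizes other than 255 million are assumptions.

```python
import math


def sd_of_sample_proportion(p, n, population_size=None):
    """SD of a sample proportion; applies the finite population correction if a size is given."""
    sd = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction for sampling without replacement.
        sd *= math.sqrt((population_size - n) / (population_size - 1))
    return sd


p = 0.10  # population proportion under the null hypothesis

# Same sample size, very different population sizes.
for population_size in (20 * 1858, 1_000_000, 255_000_000):
    sd = sd_of_sample_proportion(p, 1858, population_size)
    print(f"n = 1858, N = {population_size:>11,}: SD = {sd:.5f}")

# Same (effectively infinite) population, different sample sizes.
for n in (500, 1858):
    print(f"n = {n:>4}, process model:      SD = {sd_of_sample_proportion(p, n):.5f}")
```

Changing the population from about 37,000 to 255 million barely moves the standard deviation, while dropping the sample from 1,858 to 500 nearly doubles it.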

Page 20:

Let’s work on Exploration 2.2: Gettysburg Address Revisited

Page 21:

SECTION 2.3: NONSAMPLING ERRORS

Simple random sampling is an unbiased way to take a sample, but we did see that there still could be random sampling error.

We can calculate random sampling error.

However, other things can go wrong when you use a sample to infer something about a population. These other things are lumped together in what we call nonsampling errors.

Page 22:

EXAMPLE 2.3: THE BRADLEY EFFECT

In 1982, Tom Bradley (D) ran against George Deukmejian (R) for governor of California. (Deukmejian is pronounced Duke-may-jon.)

Polls shortly before the election, as well as exit polls, showed Bradley with a significant lead over Deukmejian. However, Deukmejian narrowly defeated Bradley.

Page 23:

THE BRADLEY EFFECT

After the election, research suggested that a smaller percentage of white voters had voted for Bradley than polls had predicted and a very large proportion of voters who, in the polls, claimed to be undecided, had voted for Deukmejian.

Some people may answer polling questions in the way they think the interviewer wants them to answer—the politically correct way.

Some argue that this is what happened in this election: a number of white voters told an interviewer they would vote for Bradley but, in the anonymity of the voting booth, voted for Deukmejian.

Page 24:

OBAMA VS CLINTON: NEW HAMPSHIRE 2008

The same sort of thing happened in the New Hampshire Democratic Presidential Primary in 2008.

Polls showed Barack Obama with a significant lead over Hillary Clinton. (41% to 28% with a sample size of 778.)

Clinton won that election with 39% of the vote compared to Obama’s 36%.
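To see that random sampling error alone does not account for this miss, the rule-of-thumb sampling error from Section 2.2 (assumed here to be one over the square root of n) can be compared with the gaps between the poll and the result. This is an illustrative calculation, not part of the original slides.

```python
import math

n = 778                      # poll sample size from the slide
margin = 1 / math.sqrt(n)    # rule-of-thumb random sampling error

poll = {"Obama": 0.41, "Clinton": 0.28}     # poll percentages from the slide
result = {"Obama": 0.36, "Clinton": 0.39}   # primary results from the slide

# Gap between the actual vote share and the polled share for each candidate.
gaps = {name: abs(result[name] - poll[name]) for name in poll}

print(f"Sampling error for n = {n}: about {margin:.3f}")
for name, gap in gaps.items():
    print(f"{name}: poll {poll[name]:.2f}, result {result[name]:.2f}, "
          f"gap {gap:.2f}, larger than sampling error: {gap > margin}")
```

Both gaps exceed the roughly 3.6-point sampling error, which points toward the nonsampling explanations discussed next.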

Page 25:

SOME MORE DETAILS

The poll used random digit dialing to get respondents for the poll.

Only 9% of those whose phone numbers were chosen actually responded to the question. Others either didn’t answer their phone or refused to answer the question.

Page 26:

OUR MODEL MAKES THE FOLLOWING ASSUMPTIONS

1. Random digit dialing is a reasonable way to get a sample of likely voters.

2. The 9% of individuals reached by phone who agree to participate are like the 91% who didn’t.

3. Voters who said they plan to vote in the upcoming Democratic primary will vote in the upcoming primary.

4. Respondents’ answers about who they will vote for match who they actually vote for in the primary.

Page 27:

Assumption #1. Random digit dialing is a reasonable way to get a sample of likely voters.

Random digit dialing is roughly equivalent to a simple random sample of all New Hampshire residents who have a landline or cell phone, except for slightly over-representing individuals who have more than one phone.

Random digit dialing is a common survey technique in cases where a sampling frame (list of all members of the population) is unavailable.

Page 28:

Assumption #2. The 9% of individuals reached by phone who agree to participate are like the 91% who didn’t.

The assumption is that the respondents are like the non-respondents.

Although the response rate was very low, it is in line with many polls and other surveys conducted by phone. So, though it is possible for non-respondents to be the cause of the bias observed, many other political surveys conducted around the same time had similar response rates, but no bias.

Of course, there is no guarantee that the 9% are representative.

Page 29:

Assumption #3. Voters who said they plan to vote in the upcoming Democratic primary will vote in the upcoming primary.

It is typical to ask voters whether they plan to vote in the upcoming election/primary. But there is no guarantee that they actually will.

Page 30:

Assumption #4. Respondents’ answers about who they will vote for match who they actually vote for in the primary.

There is no guarantee that people won’t do something different in the voting booth than they say they will do when on the phone.

They could just change their mind or they could not be honest with the polling interviewer.

Page 31:

The American Association for Public Opinion Research conducted an independent investigation and concluded the following were among the most likely explanations for the discrepancies:

People changed their opinion about who they were voting for at the last minute. (Assumption #4)

People in favor of Hillary Clinton were more likely to be non-respondents. (Assumption #2)

Social desirability based on the race of the interviewer. (Assumption #4) Black telephone interviewers were more likely to generate respondents who were in favor of Obama than were white interviewers. This is an example of the Bradley effect.

Clinton was listed before Obama on every ballot. (Assumption #4)

Page 32:

If assumption 4 is not valid, as seems plausible here, then even a pure random sample would still have exhibited this discrepancy in the results.

Simple random samples should produce a representative sample, but do nothing to control for the action of the individuals in the sample.

Having respondents change their minds or misrepresent their answers are examples of nonsampling errors, reasons why the statistic may not be close to the parameter that are separate from sampling errors.

Page 33:

DO YOU AGREE WITH THIS STATEMENT?

“Quality of life lies in knowledge, in culture. Values are what constitute true quality of life, the supreme quality of life, even above food, shelter and clothing.” 

Thomas Jefferson

Page 34:

HOW ABOUT THIS ONE?

“Quality of life lies in knowledge, in culture. Values are what constitute true quality of life, the supreme quality of life, even above food, shelter and clothing.” 

Fidel Castro

Page 35:

EXPLORATION 2.3

Let’s work on Exploration 2.3.

I won’t be giving you a survey like it says in the exploration. We will just discuss them.