Unit 5: Inference for categorical variables Lecture 1: Inference for proportions Statistics 101 Thomas Leininger June 12, 2013

  • Unit 5: Inference for categorical variables

    Lecture 1: Inference for proportions

    Statistics 101

    Thomas Leininger

    June 12, 2013

  • Many research questions involve proportions

    Who will win the election?

    Statistics 101 (Thomas Leininger) U5 - L1: Inf. for proportions June 12, 2013 2 / 23

    http://elections.huffingtonpost.com/2012/results

  • Single population proportion

    Question

    Two scientists want to know if a certain drug is effective against high blood pressure. The first scientist wants to give the drug to 1000 people with high blood pressure and see how many of them experience lower blood pressure levels. The second scientist wants to give the drug to 500 people with high blood pressure, and not give the drug to another 500 people with high blood pressure, and see how many in both groups experience lower blood pressure levels. Which is the better way to test this drug?

    (a) All 1000 get the drug

    (b) 500 get the drug, 500 don’t

  • Single population proportion

    Results from the GSS

    The GSS asks the same question. Below is the distribution of responses from the 2010 survey:

    Response                          Count
    All 1000 get the drug                99
    500 get the drug, 500 don’t         571
    Total                               670

  • Single population proportion

    Parameter and point estimate

    We would like to estimate the proportion of all Americans who have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”. What are the parameter of interest and the point estimate?

    Parameter of interest: Proportion of all Americans who have a good intuition about experimental design.

    Point estimate: Proportion of sampled Americans who have a good intuition about experimental design.

  • Single population proportion

    Inference on a proportion

    What percent of all Americans have a good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”?

    We can answer this research question using a confidence interval, which we know is always of the form

    point estimate ± ME

    And we also know that ME = critical value × standard error of the point estimate.

    $SE_{\hat{p}} = ?$

    Standard error of a sample proportion

    $SE_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

  • Single population proportion Identifying when a sample proportion is nearly normal

    Sample proportions are also nearly normally distributed

    Central limit theorem for proportions

    Sample proportions will be nearly normally distributed with mean equal to the population proportion, p, and standard error equal to $\sqrt{\frac{p(1 - p)}{n}}$.

    $\hat{p} \sim N\left(\text{mean} = p,\ SE = \sqrt{\frac{p(1 - p)}{n}}\right)$

    But of course this is true only under certain conditions...

    any guesses?

    Note: If p is unknown (most cases), we use p̂ when doing a CI and p0 when doing a HT.
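The central limit theorem statement above can be checked empirically. The following is a minimal sketch, not part of the original slides: it assumes p = 0.85 and n = 670 (values borrowed from the GSS example), and the function name `simulate_sample_proportions`, the number of repetitions, and the seed are illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size `n`; return the proportion of successes in each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Assumed values for illustration only (taken from the GSS example).
p, n = 0.85, 670
props = simulate_sample_proportions(p, n, reps=2000)

sim_mean = statistics.mean(props)        # should be close to p
sim_sd = statistics.stdev(props)         # should be close to the theoretical SE
theory_se = math.sqrt(p * (1 - p) / n)   # SE formula from the slide

print(f"mean of sample proportions: {sim_mean:.4f} (theory: {p})")
print(f"sd of sample proportions:   {sim_sd:.4f} (theory: {theory_se:.4f})")
```

If the conditions hold, the simulated mean and standard deviation should land close to p and to the SE formula, respectively.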

  • Single population proportion Confidence intervals for a proportion

    Back to experimental design...

    The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Estimate (using a 95% confidence interval) the proportion of all Americans who have a good intuition about experimental design.

    Given: n = 670, p̂ = 0.85. First check conditions.

    1. Independence:

    2. Success-failure:
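One way to work through the two conditions for the GSS data is sketched below. The counts come from the slide; the 10% rule of thumb for independence and the rough population figure are assumptions added for illustration, and the sketch also takes for granted that the GSS sample is random.

```python
n = 670
successes = 571            # answered "500 get the drug, 500 don't"
failures = n - successes   # answered "All 1000 get the drug"

# Success-failure condition for a CI: at least 10 observed successes and failures.
success_failure_ok = successes >= 10 and failures >= 10

# Independence (assumed rule of thumb): a random sample that is
# smaller than 10% of the population.
population_size = 300_000_000  # rough, assumed order of magnitude for the U.S.
independence_ok = n < 0.10 * population_size

print(f"successes = {successes}, failures = {failures}")
print(f"success-failure condition met: {success_failure_ok}")
print(f"sample is under 10% of the population: {independence_ok}")
```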

  • Single population proportion Confidence intervals for a proportion

    Question

    We are given that n = 670, p̂ = 0.85. We also just learned that the standard error of the sample proportion is $SE = \sqrt{\frac{p(1 - p)}{n}}$. Which of the below is the correct calculation of the 95% confidence interval?

    (a) $0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}}$

    (b) $0.85 \pm 1.65 \times \sqrt{\frac{0.85 \times 0.15}{670}}$

    (c) $0.85 \pm 1.96 \times \frac{0.85 \times 0.15}{\sqrt{670}}$

    (d) $571 \pm 1.96 \times \sqrt{\frac{571 \times 99}{670}}$
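As a rough sketch of the arithmetic behind a 95% confidence interval for a proportion, the snippet below plugs in the values given in the question. It uses p̂ in the standard error (since p is unknown) and the critical value 1.96; the variable names are illustrative.

```python
import math

n = 670
p_hat = 0.85
z_star = 1.96  # critical value for a 95% confidence level

# Standard error uses p_hat because the true p is unknown.
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * se

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"SE = {se:.4f}, ME = {margin_of_error:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Note that the interval is built around the proportion, not the raw count, and that the whole fraction p̂(1 − p̂)/n sits under the square root.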

  • Single population proportion Choosing a sample size when estimating a proportion

    Choosing a sample size

    How many people should you sample in order to cut the margin of error of a 95% confidence interval down to 1%?

    $ME = z^\star \times SE$
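Solving $ME = z^\star \times \sqrt{p(1 - p)/n}$ for n gives $n \ge (z^\star)^2\, p(1 - p) / ME^2$. The sketch below assumes p̂ = 0.85 from the GSS sample as the planning value, which the slide does not state explicitly; `required_n` is a hypothetical helper, and exact fractions are used only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction


def required_n(p, me, z):
    """Smallest whole n with z * sqrt(p(1-p)/n) <= me (hypothetical helper)."""
    return math.ceil(z**2 * p * (1 - p) / me**2)


# Assumed planning value: p_hat = 0.85 from the earlier GSS sample.
n_required = required_n(p=Fraction("0.85"), me=Fraction("0.01"), z=Fraction("1.96"))

print(f"required sample size: {n_required}")
```

The result is rounded up because a sample size must be a whole number and rounding down would leave the margin of error slightly above the target.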

  • Single population proportion Choosing a sample size when estimating a proportion

    What if there isn’t a previous study?

    ... use p̂ = 0.5

    why?

    if you don’t know any better, 50-50 is a good guess

    p̂ = 0.5 gives the most conservative estimate – highest possible sample size
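To see why 0.5 is the conservative choice, one can compare the sample size needed for several planning values of p̂. This is an illustrative sketch assuming a 1% margin of error at 95% confidence (z* = 1.96); `required_n` is a hypothetical helper, not something from the slides.

```python
import math
from fractions import Fraction


def required_n(p, me=Fraction("0.01"), z=Fraction("1.96")):
    """Smallest whole n with z * sqrt(p(1-p)/n) <= me (hypothetical helper)."""
    return math.ceil(z**2 * p * (1 - p) / me**2)


# Candidate planning values 0.1, 0.2, ..., 0.9 (exact fractions avoid rounding issues).
candidates = [Fraction(k, 10) for k in range(1, 10)]
sizes = {float(p): required_n(p) for p in candidates}

for p, n in sizes.items():
    print(f"p_hat = {p:.1f} -> n = {n}")

worst_p = max(sizes, key=sizes.get)  # planning value that demands the largest sample
print(f"largest required n occurs at p_hat = {worst_p}")
```

Because p(1 − p) peaks at p = 0.5, any other planning value requires a smaller sample, so a sample sized for 0.5 is large enough whatever the true proportion turns out to be.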

  • Single population proportion Hypothesis testing for a proportion

    CI vs. HT for proportions

    Success-failure condition:

    CI: At least 10 observed successes and failures (use p̂)

    HT: At least 10 expected successes and failures (use p0)

    Standard error:

    CI: calculate using observed sample proportion: $SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

    HT: calculate using the null value: $SE = \sqrt{\frac{p_0(1 - p_0)}{n}}$

  • Single population proportion Hypothesis testing for a proportion

    The GSS found that 571 out of 670 (85%) of Americans answered the question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?

    H0: p = 0.80

    HA: p > 0.80

    SE =

    Z =

    p-value =

    [Figure: distribution of sample proportions, with 0.8 and 0.85 marked on the horizontal axis]

    Conclusion:
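The blanks above can be filled in with a short calculation. The sketch below is one possible way to do it, assuming the normal approximation applies; it uses the null value p0 = 0.80 in the standard error and the unrounded sample proportion 571/670, so the Z score comes out slightly larger than it would with the rounded 0.85.

```python
import math
from statistics import NormalDist

n = 670
p_hat = 571 / n   # observed sample proportion, about 0.85
p0 = 0.80         # null value

# For a hypothesis test, the standard error is computed with the null value.
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# One-sided test (HA: p > 0.80): upper-tail area under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(f"SE = {se:.4f}")
print(f"Z = {z:.2f}")
print(f"p-value = {p_value:.5f}")
```

A p-value this far below 0.05 would lead to rejecting H0 in favor of HA at the usual significance level.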

  • Single population proportion Hypothesis testing for a proportion

    Question

    11% of 1,001 Americans responding to a 2006 Gallup survey stated that they have objections to celebrating Halloween on religious grounds. At 95% confidence level, the margin of error for this survey is ±3%. A news piece on this study’s findings states: “More than 10% of all Americans have objections on religious grounds to celebrating Halloween.” At 95% confidence level, is this news piece’s statement justified?

    (a) Yes

    (b) No

    (c) Cannot tell
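One way to reason about this question is to build the interval from the reported point estimate and margin of error and check where the claimed 10% falls. This is a sketch of that reasoning; the variable names are illustrative.

```python
p_hat = 0.11             # reported sample proportion
margin_of_error = 0.03   # reported at the 95% confidence level
claimed = 0.10           # the news piece says "more than 10%"

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

# The claim is supported only if every plausible value in the interval exceeds 10%.
claim_supported = lower > claimed

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"claim 'more than 10%' supported: {claim_supported}")
```

Since the interval includes values at and below 10%, the data do not rule out a true proportion of 10% or less.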

  • Small sample inference for a proportion Carnival Game

    Suppose we want to set up a carnival game at the NC state fair this year. Can we estimate the proportion of times people can throw a ball and hit a target?

    https://commons.wikimedia.org/wiki/File:Archery_Target_80cm.svg

  • Small sample inference for a proportion Carnival Game

    Let’s build a CI

    Conditions:

    1. Independence: We can assume that each throw is independent of another.

    2. Sample size: Are the numbers of successes and failures both larger than 10?

    So what do we do?

    http://lock5stat.com/statkey/

  • Small sample inference for a proportion Paul the octopus

    Famous predictors

    Before this guy...

    There was this guy...

  • Small sample inference for a proportion Paul the octopus

    Paul the Octopus - psychic?

    Paul the Octopus predicted 8 World Cup games, and predicted them all correctly

    Does this provide convincing evidence that Paul actually has psychic powers?

    How unusual would this be if he was just randomly guessing (with a 50% chance of guessing correctly)?

    Hypotheses:

    H0:

    HA:

  • Small sample inference for a proportion Paul the octopus

    Conditions

    1. Independence: We can assume that each guess is independent of another.

    2. Sample size: The numbers of expected successes and failures are both smaller than 10.

    8 × 0.5 = 4

    So what do we do?

    Since the sample size isn’t large enough to use CLT-based methods, we can use a simulation method instead.

    How could we simulate this hypothesis test?

  • Small sample inference for a proportion Paul the octopus

    Application exercise: Simulation testing for one proportion

    Which of the following methods is the best way to calculate the p-value of the hypothesis test evaluating if Paul the Octopus’ predictions are unusually higher than random guessing?

    (a) Flip a coin 8 times, record the proportion of times where all 8 tosses were heads. Repeat this many times, and calculate the proportion of simulations where all 8 tosses were heads.

    (b) Roll a die 8 times, record the proportion of times where all 8 rolls were 6s. Repeat this many times, and calculate the proportion of simulations where all 8 rolls were 6s.

    (c) Flip a coin 10,000 times, record the proportion of heads. Repeat this many times, and calculate the proportion of simulations where more than 50% of tosses are heads.

    (d) Flip a coin 10,000 times, calculate the proportion of heads.

  • Small sample inference for a proportion Paul the octopus

    Simulate

    Question

    Flip a coin 8 times. Did you get all heads?

    (a) Yes

    (b) No
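The in-class coin flipping can also be mimicked in software. The sketch below is one possible simulation, assuming a fair coin stands in for a random guess (50% chance of a correct prediction); the function name, number of repetitions, and seed are illustrative choices rather than anything from the slides.

```python
import random


def simulate_all_correct(n_games=8, p_guess=0.5, reps=100_000, seed=1):
    """Estimate the chance of guessing all `n_games` correctly by pure luck."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One simulated run: count correct guesses out of n_games "coin flips".
        correct = sum(rng.random() < p_guess for _ in range(n_games))
        if correct == n_games:
            hits += 1
    return hits / reps


estimated_p_value = simulate_all_correct()
exact_p_value = 0.5 ** 8  # probability of 8 heads in 8 fair flips

print(f"simulated p-value: {estimated_p_value:.4f}")
print(f"exact p-value:     {exact_p_value:.4f}")
```

Each simulated run corresponds to one student flipping a coin 8 times. The simulated proportion should settle near the exact value as the number of repetitions grows, though any single simulation will differ slightly from it.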

  • Small sample inference for a proportion Paul the octopus

    [Figure: Randomization distribution of simulated sample proportions under p = 0.5 (horizontal axis from 0.0 to 1.0, vertical axis: Frequency); observed = 1]

    p-value = 0.0037

  • Small sample inference for a proportion Paul the octopus

    Conclusions

    Question

    Which of the following is false?

    (a) If in fact Paul was randomly guessing, the probability that he would get the result of all 8 games correct is 0.0037.

    (b) Reject H0, the data provide convincing evidence that Paul did better than randomly guessing.

    (c) We may have made a Type I error.

    (d) The probability that Paul is psychic is 0.0037.
