Transcript
  • One-Sample t-Test

    Scenario:

    We are testing whether the mean $\mu$ of a population is equal to a hypothesized value $\mu_0$.

    We don't know the population variance $\sigma^2$, so it must be estimated with the sample variance $s^2$.

    Assume the data are normally distributed or, if $n \ge 30$, we can use the CLT.

    Hypothesis test:

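    $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$

    $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$

    $H_0: \mu = \mu_0$ vs. $H_1: \mu < \mu_0$

    A logical test statistic:

    $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$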

    But what is its distribution?

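  • t-Distribution

    When the population standard deviation $\sigma$ is unknown, use the sample standard deviation $s$ as an estimate:

    $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

    Substituting $s$ for $\sigma$:

    $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \rightarrow \frac{\bar{x} - \mu}{s / \sqrt{n}}$

    If either:

    o The sample size is small ($n < 30$) but the underlying population distribution is normal

    or

    o The sample size is large ($n \ge 30$)

    Then:

    $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a $t$-distribution with $n - 1$ degrees of freedom ($df$)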
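    The substitution of $s$ for $\sigma$ can be sketched in a few lines of Python. This is only an illustration: the function names, the data values, and the hypothesized mean of 8 are made up for the example and do not come from the slides.

```python
import math

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def t_statistic(xs, mu0):
    """(x_bar - mu0) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(xs)
    mean = sum(xs) / n
    return (mean - mu0) / (sample_sd(xs) / math.sqrt(n))

# Hypothetical data and hypothesized mean, for illustration only.
data = [8, 23, 12, 4, 3]
t = t_statistic(data, mu0=8)
print(f"s = {sample_sd(data):.3f}, t = {t:.3f}, df = {len(data) - 1}")
```

    The resulting statistic would be compared against a $t$-distribution with $n - 1$ degrees of freedom, provided one of the two conditions above holds.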

  • t-Distribution

    One logical question might be: why does this statistic, which looks a lot like the Z-statistic, not have a normal distribution?

    The simplest answer is that the population standard deviation $\sigma$ is being estimated by the sample standard deviation $s$.

    The sample standard deviation also has a sampling distribution, like the sample mean does, though its distribution looks much different from the sampling distribution of the mean.

    It is a skewed distribution.

    It is related to what is called the chi-squared distribution.

    The added variability due to having to estimate the variance gives the test statistic a distribution with fatter tails.

    Essentially, there is more uncertainty in the test, which is captured in the $t$-distribution.

    $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a $t$-distribution with $n - 1$ degrees of freedom ($df$)
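    The fatter tails can be seen in a small simulation. The sketch below is an assumption-laden illustration, not part of the slides: it draws samples from a standard normal population (so $\mu = 0$), computes the statistic with $s$ in place of $\sigma$, and counts how often it falls beyond the normal cutoff of 1.96. The function names, sample sizes, repetition count, and seed are arbitrary choices.

```python
import math
import random

def t_stat(sample, mu):
    """(x_bar - mu) / (s / sqrt(n)) using the sample standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu) / (s / math.sqrt(n))

def tail_fraction(n, reps=5000, cutoff=1.96, seed=0):
    """Fraction of simulated statistics with |t| beyond the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_stat(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps

small = tail_fraction(n=5)    # df = 4: heavy tails
large = tail_fraction(n=60)   # df = 59: close to the normal
print(f"P(|t| > 1.96): n=5 -> {small:.3f}, n=60 -> {large:.3f}")
```

    If the statistic were standard normal, about 5% of values would fall beyond 1.96. With $n = 5$ the simulated fraction should be noticeably larger, and with $n = 60$ it should be close to 5%.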

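  • t-Distribution

    Student's $t$-distribution: William Gosset and Guinness

    Similar to the standard normal distribution:

    o $t$ values range from $-\infty$ to $\infty$

    o Mean $\mu = 0$ and symmetric about $t = 0$

    o Instead of $\sigma$, spread/shape is defined by the degrees of freedom ($df$). This is its only parameter.

    o As $df$ increases, the distribution approaches the standard normal distribution

    $f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$,

    where $\Gamma$ is the gamma function, and $\nu$ is the degrees of freedom.

    $\mu = 0$, for $\nu > 1$

    $\sigma^2 = \frac{\nu}{\nu - 2}$, for $\nu > 2$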
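    The density formula can be evaluated directly with the standard library's gamma function. The following is a minimal sketch with assumed function names (`t_pdf`, `normal_pdf`); it compares the $t$ density with the standard normal density in the tail and at the center.

```python
import math

def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Tail height at t = 3 shrinks toward the normal value as nu grows.
for nu in (1, 5, 30, 100):
    print(f"nu={nu:>3}: f(3) = {t_pdf(3, nu):.5f}, variance = "
          f"{nu / (nu - 2) if nu > 2 else float('nan'):.3f}")
print(f"normal: f(3) = {normal_pdf(3):.5f}")
```

    Note that `math.gamma` overflows for very large arguments, so this direct form is only suitable for moderate degrees of freedom.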

  • t-Distribution

    What are Degrees of Freedom ($df$)?

    Number of data values in the sample that are free to vary when estimating parameters

    Suppose we know that the sample mean of 5 values is equal to 10. In other words, $\bar{x} = 10$ is an estimate of the parameter $\mu$ based on a sample of $n = 5$ values. However, we don't actually know the individual values of the sample. If we were to guess the 5 values, note that we would be free to guess any value for the first 4. However, once we've guessed 4 numbers, the last number must be chosen such that the average comes out to 10.

    For example: ____ ____ ____ ____ ____

    $\bar{x} = \frac{8 + 23 + 12 + 4 + x_5}{5} = 10 \Rightarrow x_5 = 3$

    $df = n - 1 = 5 - 1 = 4$

    8     23    12    4     3

    Free  Free  Free  Free  Fixed    (4 degrees of freedom)
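    The same constraint can be written as a one-line computation. In this sketch the helper name `last_value` is hypothetical; it simply solves the mean equation for the one value that is not free.

```python
def last_value(free_values, target_mean):
    """Value forced on the final observation once the others are chosen."""
    n = len(free_values) + 1
    return n * target_mean - sum(free_values)

x5 = last_value([8, 23, 12, 4], target_mean=10)
print(x5)  # 5 * 10 - 47 = 3
```

    Any four guesses can be passed in, but the fifth value is always determined by them, which is why only $n - 1 = 4$ values are free.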

  • One-Sample t-Test (Example)

    Suppose that non-psychology UCM students' mean IQ is 110 and that their IQs are normally distributed. We randomly sample 9 students from UCM's psych department and give them an IQ test. The average IQ of the sample is 117 with variance 121. Assume psych student IQs are also normally distributed. Is the psych students' mean IQ significantly larger than the non-psych students' mean IQ? Use $\alpha = .05$.

    What are the null and alternative hypotheses?

    $H_0: \mu = 110$ vs. $H_1: \mu > 110$

    Because the population sd is not known, a t-statistic is used:

    $t = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}} = \frac{117 - 110}{\sqrt{121 / 9}} = \frac{7}{11 / 3} \approx 1.91$

    Critical value method:

    $df = n - 1 = 9 - 1 = 8$

    One-tailed test with $\alpha = .05$

    $t_{crit} = 1.860$. The area beyond the critical value (1.860) is the critical region where we would reject $H_0$.

    Since $t > t_{crit}$, we reject $H_0$.

    Thus, we can conclude that the mean IQ of UCM psych students is larger than that of UCM non-psych students.
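    The arithmetic of this example can be checked with a short script. It is a sketch only: the variable names are chosen for illustration, and the critical value 1.860 is taken from the example itself rather than computed.

```python
import math

x_bar = 117    # sample mean IQ
mu0 = 110      # hypothesized mean under H0
s2 = 121       # sample variance
n = 9          # sample size

t = (x_bar - mu0) / math.sqrt(s2 / n)   # 7 / (11 / 3)
df = n - 1
t_crit = 1.860   # one-tailed critical value for alpha = .05, df = 8 (from the example)

reject = t > t_crit
print(f"t = {t:.3f}, df = {df}, reject H0: {reject}")
```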

  • Confidence Interval for $\mu$ ($\sigma$ unknown)

    Suppose we are interested in the average high score of the millions of people all over the world who play a very popular computer game. Unfortunately, the server does not keep a record of high scores, so we cannot simply determine the average score of the entire population (true mean $\mu$). We do not know the population standard deviation $\sigma$ either. However, all individuals do know their own high score, and we also happen to know that the population high scores are not normally distributed. We take a random sample of 121 players and calculate their mean high score to be 5000 and standard deviation to be 1000.

    What is the 95% confidence interval for $\mu$?

    o Population distribution not normal

    o $s = 1000$ ($\sigma$ unknown)

    o $n = 121$ ($n \ge 30$; Central Limit Theorem holds)

    o $\bar{x} = 5000$

  • Confidence Interval for $\mu$ ($\sigma$ unknown)

    As with all confidence intervals, we need to know what the point estimate, appropriate multiplier, and standard error are.

    The point estimate is the estimate of the population parameter. Here, $\mu$ is estimated by $\bar{x}$.

    The standard error is the standard deviation of the distribution of the point estimate. Here we are estimating the standard error: we showed that the standard error of $\bar{x}$ is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. When we don't know $\sigma$, this is estimated with $\frac{s}{\sqrt{n}}$.

    Finally, the appropriate multiplier is a value determined by the distribution of the point estimate and the desired level of confidence. For the $t$-distribution, a value of $t_{\alpha/2, df}$ is the appropriate multiplier for a $100(1-\alpha)\%$ confidence interval, where $df = n - 1$.

    $100(1-\alpha)\%$ CI for $\mu$ ($\sigma$ unknown): $\bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}}$

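  • Confidence Interval for $\mu$ ($\sigma$ unknown)

    [Figure: $t$-distribution with area $\alpha/2$ in each tail, beyond $-t_{\alpha/2, n-1}$ and $t_{\alpha/2, n-1}$]

    $C\%$ Confidence Level: $\frac{C}{100} = 1 - \alpha$, so $\alpha = 1 - \frac{C}{100}$

    $1 - \alpha = P\left(-t_{\alpha/2, n-1} < \frac{\bar{x} - \mu}{s / \sqrt{n}} < t_{\alpha/2, n-1}\right)$

    $-t_{\alpha/2, n-1} < \frac{\bar{x} - \mu}{s / \sqrt{n}} < t_{\alpha/2, n-1}$

    $\bar{x} - t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$

    $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a $t$-distribution with $df = n - 1$

    $C\%$ Confidence Interval: $L < \mu < U$

    $L, U = \bar{x} \mp t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$

    $t_{\alpha/2, n-1}$: critical value

    $\frac{s}{\sqrt{n}}$: standard error

    $t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$: margin of error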
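    The three ingredients (critical value, standard error, margin of error) combine as in the following sketch. The function name `t_interval` and the input numbers are hypothetical, and the critical value is supplied by the caller from a $t$-table rather than computed.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Return (L, U) = x_bar -/+ t_crit * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin_of_error = t_crit * standard_error
    return x_bar - margin_of_error, x_bar + margin_of_error

# Hypothetical inputs: n = 25 gives df = 24, and t_{0.025, 24} = 2.064
# is the usual table value for a 95% interval.
lower, upper = t_interval(x_bar=50.0, s=10.0, n=25, t_crit=2.064)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

    Passing the critical value in explicitly keeps the sketch free of third-party dependencies. A statistics library could supply it instead.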

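  • Confidence Interval for $\mu$ ($t$ vs. $z$)

    95% Confidence Interval using $t$ ($df = 5$):

    $L, U = \bar{x} \mp t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$, with $t_{0.025, 5} = 2.57$

    [Figure: $t$-distribution ($df = 5$) with central area $1 - \alpha = 0.95$ between $-t_{0.025, 5} = -2.57$ and $t_{0.025, 5} = 2.57$, and area $\alpha/2 = 0.025$ in each tail]

    95% Confidence Interval using $z$:

    $L, U = \bar{x} \mp z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$, with $z_{0.025} = 1.96$

    [Figure: standard normal distribution with central area $1 - \alpha = 0.95$ between $-z_{0.025} = -1.96$ and $z_{0.025} = 1.96$, and area $\alpha/2 = 0.025$ in each tail]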

  • Confidence Interval for $\mu$ ($\sigma$ unknown)

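    What is the 95% confidence interval for $\mu$?

    o Population distribution not normal

    o $s = 1000$ ($\sigma$ unknown)

    o $n = 121$ ($n \ge 30$)

    o $\bar{x} = 5000$

    $\alpha = 0.05$ (95% confidence)

    $df = n - 1 = 121 - 1 = 120$

    95% CI ($t$): $L, U = \bar{x} \mp t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} = 5000 \mp 1.980 \cdot \frac{1000}{\sqrt{121}} = [4820, 5180]$

    95% CI ($z$): $L, U = \bar{x} \mp z_{\alpha/2} \frac{s}{\sqrt{n}} = 5000 \mp 1.960 \cdot \frac{1000}{\sqrt{121}} = [4821.82, 5178.18]$

    $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a $t$-distribution but is well approximated by the standard normal

    $t_{\alpha/2, n-1} = t_{0.025, 120} = 1.980$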
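    The two intervals can be reproduced side by side. The script below is a sketch with illustrative variable names; both multipliers (1.980 and 1.960) are the values quoted in the example, not computed.

```python
import math

x_bar, s, n = 5000, 1000, 121
standard_error = s / math.sqrt(n)   # 1000 / 11

t_crit = 1.980   # t_{0.025, 120}, as quoted in the example
z_crit = 1.960   # z_{0.025}, as quoted in the example

t_ci = (x_bar - t_crit * standard_error, x_bar + t_crit * standard_error)
z_ci = (x_bar - z_crit * standard_error, x_bar + z_crit * standard_error)

print(f"t interval: [{t_ci[0]:.2f}, {t_ci[1]:.2f}]")
print(f"z interval: [{z_ci[0]:.2f}, {z_ci[1]:.2f}]")
print(f"extra half-width from using t: {(t_crit - z_crit) * standard_error:.2f}")
```

    With 120 degrees of freedom the $t$ interval is only slightly wider than the $z$ interval, which is what "well approximated by the standard normal" means in practice.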

  • Confidence Interval for $\mu$ ($\sigma$ unknown)

    Suppose we are interested in the average high score of the millions of people all over the world who play a very popular computer game. Unfortunately, the server does not keep a record of high scores, so we cannot simply determine the average score of the entire population (true mean $\mu$). We do not know the population standard deviation $\sigma$ either. However, all individuals do know their own high score, and we also happen to know that the population high scores are normally distributed. We take a random sample of 25 players and calculate their mean high score to be 5000 and standard deviation to be 1000.

    What is the 99% confidence interval for $\mu$?

    o Population distribution normal

    o $s = 1000$ ($\sigma$ unknown)

    o $n = 25$ ($n < 30$)

    o $\bar{x} = 5000$

    $\alpha = 0.01$ (99% confidence)

    $df = n - 1 = 25 - 1 = 24$

    99% CI ($t$): $L, U = \bar{x} \mp t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} = 5000 \mp 2.797 \cdot \frac{1000}{\sqrt{25}} = [4440.6, 5559.4]$

    $\frac{\bar{x} - \mu}{s / \sqrt{n}}$ has a $t$-distribution

    $t_{\alpha/2, n-1} = t_{0.005, 24} = 2.797$

  • Confidence Interval for $\mu$ ($\sigma$ unknown)

    Founded in 1998, Telephia provides a wide variety of information on cellular phone use. In 2006, Telephia reported that United Kingdom (UK) subscribers with 3G phones spent an average of 8.3 hours per month listening to full-track music on their cell phones. Suppose we hypothesize that US subscribers are different from UK subscribers in their phone usage. Say we draw a random sample of size 8 from the US population of 3G subscribers. Further suppose (unrealistically) that the distribution of time usage follows a normal distribution. Suppose we are interested in constructing a 95% confidence interval for the mean usage for US subscribers and using that to test our hypothesis. What would the 95% confidence interval about the population mean time of US subscribers look like? With $\alpha = .05$, can we conclude that US subscribers have a different mean time usage than UK subscribers?

    Sample: 5, 6, 0, 4, 11, 9, 2, 3

    What are the null and alternative hypotheses?

    $H_0: \mu = 8.3$ vs. $H_1: \mu \ne 8.3$

    What is $\bar{x}$, and what is $s$?

    $\bar{x} = 5$

    $s = 3.625$

    What is the confidence interval?

    $\alpha = 0.05$ (95% confidence)

    $df = n - 1 = 8 - 1 = 7$

    95% CI: $5 \pm 2.365 \cdot \frac{3.625}{\sqrt{8}} = 5 \pm 3.031 = (1.969, 8.031)$

    What is the conclusion?

    The CI does not contain 8.3, so we reject $H_0$.

    Substantively, this means that we conclude that US 3G subscribers' mean time usage is statistically significantly different from 8.3 hours per month (UK subscribers' mean time).

    $t_{\alpha/2, n-1} = t_{0.025, 7} = 2.365$
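    Starting from the raw sample, the whole calculation can be sketched with Python's standard `statistics` module. Variable names are illustrative, and the critical value 2.365 is the table value quoted in the example.

```python
import math
import statistics

sample = [5, 6, 0, 4, 11, 9, 2, 3]   # hours per month, from the example
mu0 = 8.3                            # UK mean under H0

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)         # sample sd with the n - 1 divisor

t_crit = 2.365                       # t_{0.025, 7}, as quoted in the example
margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Two-sided test via the interval: reject H0 if mu0 lies outside it.
reject = not (lower <= mu0 <= upper)
print(f"x_bar = {x_bar}, s = {s:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}), reject H0: {reject}")
```

    Checking whether the interval contains 8.3 is equivalent to the two-sided $t$-test at $\alpha = .05$, which is why the interval can be used to test the hypothesis.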

