handout seven

Upload: doomy-jones

Post on 21-Feb-2018


  • 7/24/2019 Handout Seven

    1/55

    Introduction to Probability and Statistics

    Handout #7

    Instructor: Lingzhou Xue

    TA: Daniel Eck

    The pdf file for this class is available on the class web page.


    Chapter 8: Fundamental Sampling Distributions and Data Descriptions


    Sample Mean $\bar{X}$ and Sample Variance $S^2$.

    Histogram and Box Plot.

    Central Limit Theorem (CLT).

    $\chi^2$, t, and F Distributions.


    Example 1: Sample Distribution

    The sample distribution is the distribution resulting from the collection of actual data. A major characteristic of a sample is that it contains a finite number of scores; the number of scores is represented by the letter n. For example, suppose that the following data were collected:

    15 14 15 18 15 20 15 16 17 14 17 13 11 14 18 12 17 12 21 8

    14 17 14 12 13 15 15 16 17 14 16 13 14 15 18 16 16 17 14 15

    16 15 17 12 14 14 13 13 13 14

    These numbers constitute a sample distribution.
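    As a rough illustration, the scores above can be tabulated and summarized with Python's standard library. The variable names are illustrative choices, not part of the handout.

```python
from collections import Counter
import statistics

# Scores copied from Example 1 (n = 50).
data = [
    15, 14, 15, 18, 15, 20, 15, 16, 17, 14, 17, 13, 11, 14, 18, 12, 17, 12, 21, 8,
    14, 17, 14, 12, 13, 15, 15, 16, 17, 14, 16, 13, 14, 15, 18, 16, 16, 17, 14, 15,
    16, 15, 17, 12, 14, 14, 13, 13, 13, 14,
]

n = len(data)
frequencies = Counter(data)              # score -> number of occurrences
sample_mean = statistics.mean(data)
sample_median = statistics.median(data)
sample_mode = statistics.mode(data)

# Text-mode frequency table, one row per distinct score.
for score in sorted(frequencies):
    print(f"{score:>3} {'*' * frequencies[score]}")

print("n =", n, "mean =", sample_mean, "median =", sample_median, "mode =", sample_mode)
```

    The row of stars for each score plays the role of a histogram bar.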

    [Figure: histogram of the sample distribution; horizontal axis x from 8 to 20, vertical axis density from 0.00 to 0.20.]

    In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples of statistics are the mean, median, mode, standard deviation, range, and correlation coefficient, among others.

    If a different sample were taken, different scores would result. However, there would also be some consistency: the statistics would not be exactly the same, but they would be similar. To achieve order in this chaos, statisticians have developed probability models.

    [Figure: four histograms, each plotting density against x.]

    Random Sampling


    Population

    A population consists of the totality of the observations with which we are concerned. It is the entire group we are interested in, which we wish to describe or draw conclusions about.

    Sample

    A sample is a subset of a population.

    [Figure: 2008 Presidential Race, from CNN.]

    Example 2

    Suppose you want to find out the percentage of students at UMN who enjoy reading Time. If we randomly select 20% of the students, this selection is the sample in this experiment, and the population is all of the students who attend UMN.


    A simple random sample of size $n$ consists of $n$ individuals from the population chosen in such a way that every set of $n$ individuals has an equal chance to be the sample actually selected.

    Random Sample

    Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each having the same probability distribution $f(x)$. Define $X_1, X_2, \ldots, X_n$ to be a random sample of size $n$ from the population $f(x)$ and write its joint probability distribution as

    $$g(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$
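    A minimal sketch of drawing a simple random sample from a finite population. The population of 500 ID numbers and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers 1..500 (an assumption for illustration).
population = list(range(1, 501))

rng = random.Random(2024)            # fixed seed so the run is repeatable
n = 20
# sample() draws n distinct individuals without replacement, so every
# subset of size n has the same chance of being selected.
sample = rng.sample(population, n)

print(sorted(sample))
```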


    Some Important Statistics


    Statistic

    A statistic is a function of random variables that does not depend upon any unknown parameter.

    Sample Mean & Sample Variance

    If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample mean is defined by the statistic

    $$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

    and the sample variance is defined by the statistic

    $$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$


    Example 3

    A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.

    Solution:

    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{4} x_i = 16 \text{ cents},$$

    $$s^2 = \frac{1}{4-1} \sum_{i=1}^{4} (x_i - \bar{x})^2 = \frac{34}{3}.$$
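    The two definitions can be checked on the Example 3 data with a short sketch. The function names are illustrative, and exact fractions are used so that the result 34/3 is not blurred by rounding.

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean: (1/n) * sum of x_i."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance with the n - 1 divisor: sum((x_i - mean)^2) / (n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

increases = [12, 15, 17, 20]   # price increases in cents, from Example 3
x_bar = sample_mean(increases)
s2 = sample_variance(increases)
print(x_bar, s2)  # 16 34/3
```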


    Theorem

    If $S^2$ is the variance of a random sample of size $n$, we may write

    $$S^2 = \frac{1}{n(n-1)} \left[ n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right].$$

    Proof:
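    One standard way to fill in the argument, sketched here using only the definitions of $\bar{X}$ and $S^2$ given earlier in the handout: expand the square, then use $\sum X_i = n\bar{X}$.

```latex
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
     \qquad \text{since } \sum_{i=1}^{n} X_i = n\bar{X} \\
  &= \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 , \\[6pt]
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \frac{1}{n(n-1)} \left[ n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right].
\end{aligned}
```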


    Example 4

    Find the sample mean and variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.

    Solution:

    $$\sum_{i=1}^{6} x_i^2 = 171, \quad \sum_{i=1}^{6} x_i = 31, \quad n = 6.$$

    Hence,

    $$s^2 = \frac{1}{5 \cdot 6} \left[ 6 \cdot 171 - 31^2 \right] = \frac{13}{6}.$$

    Thus the sample standard deviation is $s = \sqrt{13/6} = 1.47$.
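    A short check of this calculation, assuming nothing beyond the shortcut formula from the theorem and the definition of the sample variance. The function names are illustrative.

```python
from fractions import Fraction
import math

def variance_shortcut(xs):
    """S^2 = [n * sum(x^2) - (sum x)^2] / (n * (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))

def variance_definition(xs):
    """S^2 = sum((x - mean)^2) / (n - 1), computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

trout = [3, 4, 5, 6, 6, 7]          # data from Example 4
s2 = variance_shortcut(trout)
s = math.sqrt(s2)
print(Fraction(sum(trout), len(trout)), s2, round(s, 2))  # 31/6 13/6 1.47
```

    Both routes give the same variance; the shortcut only needs the two running sums, which is convenient for hand calculation.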


    Example 5

    The numbers of incorrect answers on a true-false competency

    test for a random sample of 15 students were recorded as follows:

    2, 1, 3, 0, 1, 3, 6, 0, 3, 4. Find

    the sample mean;

    the sample variance.


    Mode

    The mode of a list of numbers is the number (or numbers) that occurs most frequently. A trick to remember this one is that mode starts with the same first two letters as most. Most frequently - Mode. You'll never forget that one!

    Median

    The median is the middle value in your list. When the number of entries in the list is odd, the median is the middle entry in the list after sorting the list into increasing order. When the number of entries is even, the median is equal to the sum of the two middle numbers (after sorting the list into increasing order) divided by two. Thus, remember to line up your values: the middle number is the median! Be sure to remember the odd and even rule.
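    The odd and even rule can be written out as a small sketch. The function name is illustrative; the trout data from Example 4 serve as the even-length case.

```python
import statistics

def median(values):
    """Median by the odd/even rule: sort, then take the middle entry or the mean of the two middle entries."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the two middle entries

trout = [3, 4, 5, 6, 6, 7]          # data from Example 4
print(median(trout))                # 5.5
print(median([3, 4, 5, 6, 7]))      # 5
print(statistics.multimode(trout))  # [6]
```

    `statistics.multimode` returns a list because a data set can have more than one mode.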


    Data Displays and Graphical Methods


    Box Plot

    A box plot (also known as a box-and-whisker diagram or plot, or candlestick chart) is a convenient way of graphically depicting the five-number summary, which consists of the 25th percentile (lower quartile or first quartile, $Q_1$), the median, the 75th percentile (upper quartile or third quartile, $Q_3$), and the smallest and largest values; in addition, the box plot indicates which observations, if any, are considered unusual, or outliers.

    Outlier

    Outliers are observations that are considered to be unusually far from the bulk of the data. Technically, one may view an outlier as being an observation that represents a rare event. If the distance from the box exceeds 1.5 times the interquartile range, $Q_3 - Q_1$ (in either direction), the observation may be labeled an outlier.

    [Figure: box plot.]

    Example 6

    The following numbers are the numbers of marbles that fifteen different boys own (arranged from least to greatest).

    18 27 34 52 54 59 61 68 78 82 85 87 91 93 100.

    Find the median.

    Find the lower quartile.

    Find the upper quartile.

    Find the interquartile range.
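    One possible way to work these four parts, together with the 1.5 × IQR outlier rule, using the standard library. Quartile conventions differ between textbooks and software; the "exclusive" method used here is an assumption that, for these 15 values, matches taking the median of each half of the data.

```python
import statistics

marbles = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

# method="exclusive" (the default) places the quartiles at positions
# (n + 1) / 4, (n + 1) / 2 and 3(n + 1) / 4 of the sorted data.
q1, med, q3 = statistics.quantiles(marbles, n=4, method="exclusive")
iqr = q3 - q1

# 1.5 * IQR rule: anything beyond these fences may be labeled an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in marbles if x < lower_fence or x > upper_fence]

print(q1, med, q3, iqr, outliers)  # 52.0 68.0 87.0 35.0 []
```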

    [Figure: box-and-whisker plot for Example 6.]

    Sampling Distribution of Means


    Sampling Distribution

    The probability distribution of a statistic is called a sampling

    distribution.


    If we are sampling from a population with unknown distribution, either finite or infinite, the sampling distribution of $\bar{X}$ will be approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided that the sample size is large ($n > 30$).

    Central Limit Theorem

    If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of

    $$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

    as $n \to \infty$, is the standard normal distribution $N(0, 1)$.


    Example 7

    Let $\bar{X}$ be the sample mean of a random sample of size 100 drawn from an exponential distribution with density

    $$f(x) = \frac{1}{4} e^{-x/4}, \quad x > 0.$$

    [Figure: exponential p.d.f. with mean 4.]

    Decide which of the graphs labeled (a)-(d) would most closely resemble the sampling distribution of the sample mean $\bar{X}$. Briefly explain your reasoning.
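    A small simulation can suggest what to look for. By the Central Limit Theorem the sample mean should be roughly normal, centered at the population mean 4 with standard deviation $4/\sqrt{100} = 0.4$. The seed and the number of replicates below are arbitrary choices, not part of the example.

```python
import random
import statistics

rng = random.Random(7)      # fixed seed for repeatability
beta = 4.0                  # mean of the exponential population in Example 7
n = 100                     # sample size
replicates = 5000           # number of simulated samples (an arbitrary choice)

# Each replicate: draw n exponential values and record their mean.
sample_means = [
    statistics.fmean(rng.expovariate(1 / beta) for _ in range(n))
    for _ in range(replicates)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
print(round(center, 3), round(spread, 3))  # should land close to 4 and 0.4
```

    The simulated means cluster tightly and symmetrically around 4 even though the population itself is strongly skewed.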


    Example 8

    An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

    Solution:
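    One way to carry out the calculation, sketched with `statistics.NormalDist`: because the population is normal, $\bar{X}$ is normal with mean 800 and standard deviation $40/\sqrt{16} = 10$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16

# Sampling distribution of the mean: normal with mean mu and sd sigma / sqrt(n).
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

z = (775 - mu) / (sigma / sqrt(n))   # standardized value of 775
p = x_bar_dist.cdf(775)              # P(X_bar < 775)
print(z, round(p, 4))  # -2.5 0.0062
```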


    Solution:


    Sampling Distribution: Difference Between Two Averages

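    If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by

    $$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

    Hence

    $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{\bar{X}_1 - \bar{X}_2}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

    is approximately a standard normal variable.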


    Example 10

    The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?

    Solution:
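    A possible calculation, assuming the normal approximation for the difference of two sample means stated earlier in the handout. The helper name `diff_of_means_dist` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_dist(mu1, sigma1, n1, mu2, sigma2, n2):
    """Approximate normal distribution of X1_bar - X2_bar for independent samples."""
    mean = mu1 - mu2
    sd = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return NormalDist(mean, sd)

# Manufacturer A: mean 6.5, sd 0.9, n = 36; manufacturer B: mean 6.0, sd 0.8, n = 49.
d = diff_of_means_dist(6.5, 0.9, 36, 6.0, 0.8, 49)
z = (1.0 - d.mean) / d.stdev   # standardized value of a 1-year difference
p = 1 - d.cdf(1.0)             # P(X_A_bar - X_B_bar >= 1)
print(round(z, 2), round(p, 4))
```

    The standardized value comes out near 2.65, so the probability is small, about 0.004.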


    Example 11

    The mean score for freshmen on an aptitude test at a certain college is 540, with a standard deviation of 50. What is the probability that two groups of students selected at random, consisting of 64 and 100 students, respectively, will differ in their mean scores by

    1. more than 10 points?

    2. an amount between 5 and 10 points?


    Solution:
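    A sketch of one way to set this up, under the assumption that "differ by" means the absolute difference of the two sample means, so that both tails of the normal approximation are counted.

```python
from math import sqrt
from statistics import NormalDist

sigma, n1, n2 = 50, 64, 100

# Both groups come from the same population, so the difference of the two
# sample means has mean 0 and variance sigma^2/n1 + sigma^2/n2.
diff = NormalDist(0, sqrt(sigma ** 2 / n1 + sigma ** 2 / n2))

# Two-sided probabilities for the absolute difference.
p_more_than_10 = 2 * (1 - diff.cdf(10))
p_between_5_and_10 = 2 * (diff.cdf(10) - diff.cdf(5))
print(round(p_more_than_10, 3), round(p_between_5_and_10, 3))
```

    Answers read from a printed normal table may differ in the third decimal because z is rounded there.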


    Sampling Distribution of $S^2$

    If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then the statistic

    $$\chi^2 = \frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

    has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.


    Example 12

    Find the probability that a random sample of 21 observations, from a normal population with variance $\sigma^2 = 5$, will have a variance $s^2$

    1. greater than 2.065;

    2. between 2.065 and 3.6445.

    Solution:
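    One way to check the table lookup without a statistics package: convert each variance to $(n-1)s^2/\sigma^2$ and evaluate the chi-squared upper tail. The sketch below relies on a closed form that holds only for an even number of degrees of freedom (here 20), where the tail area equals a Poisson cumulative probability. The function names are illustrative.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x), valid for even df only.

    For df = 2k the upper tail equals P(Poisson(x / 2) <= k - 1).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0               # j = 0 term of the Poisson sum
    for j in range(1, df // 2):
        term *= half / j                 # (x/2)^j / j!
        total += term
    return exp(-half) * total

n, sigma2 = 21, 5
df = n - 1

def to_chi2(s2):
    """Convert a sample variance into the statistic (n - 1) s^2 / sigma^2."""
    return df * s2 / sigma2

p_greater = chi2_sf_even_df(to_chi2(2.065), df)
p_between = p_greater - chi2_sf_even_df(to_chi2(3.6445), df)
print(round(p_greater, 3), round(p_between, 3))
```

    The two variances map to chi-squared values of 8.26 and 14.578, which correspond to upper-tail areas of about 0.99 and 0.80, so the answers are about 0.99 and 0.19.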


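    t-Distribution

    Let $Z$ be a standard normal random variable and $V$ a chi-squared random variable with $\nu$ degrees of freedom. If $Z$ and $V$ are independent, then the distribution of the random variable $T$, where

    $$T = \frac{Z}{\sqrt{V/\nu}},$$

    is given by the density function

    $$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$

    Note for $\chi^2_\alpha(v)$

    In the textbook, we use $\alpha = P(\chi^2 > \chi^2_\alpha(v))$. That is, $\chi^2_\alpha$ represents the $\chi^2$-value above which we find an area equal to $\alpha$.

    Note for $t_\alpha(v)$

    In the textbook, we use $\alpha = P(T > t_\alpha(v))$. That is, $t_\alpha$ represents the $t$-value above which we find an area equal to $\alpha$.

    Note for $f_\alpha(v_1, v_2)$

    In the textbook, we have $\alpha = P(F > f_\alpha(v_1, v_2))$. That is, $f_\alpha$ represents the $f$-value above which we find an area equal to $\alpha$.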
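    The upper-tail convention can be illustrated by simulating $T = Z/\sqrt{V/\nu}$ directly from its definition. In this sketch the degrees of freedom (7), the seed, and the number of draws are arbitrary choices, and 2.998 is assumed to be the commonly tabulated value of $t_{0.01}$ for 7 degrees of freedom.

```python
import random
from math import sqrt

rng = random.Random(11)     # fixed seed for repeatability
nu = 7                      # degrees of freedom (illustrative choice)
t_value = 2.998             # assumed table value of t_0.01 for nu = 7
draws = 200_000

exceed = 0
for _ in range(draws):
    z = rng.gauss(0.0, 1.0)                 # standard normal Z
    v = rng.gammavariate(nu / 2, 2.0)       # chi-squared with nu d.f. = Gamma(shape nu/2, scale 2)
    t = z / sqrt(v / nu)                    # T = Z / sqrt(V / nu)
    if t > t_value:
        exceed += 1

alpha_hat = exceed / draws                  # estimate of P(T > t_value)
print(alpha_hat)
```

    The estimate should come out close to 0.01, the area above the tabulated value.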


    Example 13b

    Consider $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ for a random sample of size $n = 8$.

    Calculate P(T < 2.517) and P(2.998< T


    F-Distribution

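    Let $U$ and $V$ be two independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \dfrac{U/\nu_1}{V/\nu_2}$ is given by the density

    $$f(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{(\nu_1/2)-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & x > 0; \\ 0, & x \le 0. \end{cases}$$

    This is known as the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.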

    [Figure: the F-distribution curves, f(x) against x for x from 0 to 5, with (v1, v2) = (100, 100), (6, 10), and (10, 30).]

    Theorem

    If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then

    $$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

    has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.


    What Is the F-Distribution Used for?

    The F-distribution is used in two-sample situations to draw inferences about the population variances. However, the F-distribution is also applied to many other types of problems in which sample variances are involved. In fact, the F-distribution is called the variance ratio distribution.


    Example 15

    If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of size $n_1 = 31$ and $n_2 = 25$, taken from normal populations with variances $\sigma_1^2 = 20$ and $\sigma_2^2 = 10$, respectively, find:

    1. $P(S_2^2 \,\ldots)$

    2. $P\left(S_1^2/S_2^2 > 3.88\right)$.

    Solution:


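    1. $P(S_2^2 \,\ldots\, 36.4152) = 1 - 0.05 = 0.95$

    2.

    $$\begin{aligned}
    P\left(\frac{S_1^2}{S_2^2} > 3.88\right)
      &= P\left(\frac{S_1^2}{S_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2} > 3.88 \cdot \frac{\sigma_2^2}{\sigma_1^2}\right) \\
      &= P\left(\frac{S_1^2 \sigma_2^2}{S_2^2 \sigma_1^2} > 3.88 \cdot \frac{1}{2}\right) \\
      &= P\left(F(30, 24) > 1.94\right) = 0.05.
    \end{aligned}$$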
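    The final step of part 2 can be checked by simulation. The sketch below builds F-variates from their definition as a ratio of independent chi-squared variables, each divided by its degrees of freedom. The seed and number of draws are arbitrary choices.

```python
import random

rng = random.Random(15)      # fixed seed for repeatability
nu1, nu2 = 30, 24            # n1 - 1 and n2 - 1 from Example 15
cutoff = 3.88 * 10 / 20      # 3.88 * sigma2^2 / sigma1^2, i.e. 1.94
draws = 200_000

def chi2(nu):
    """One chi-squared draw with nu degrees of freedom (Gamma with shape nu/2, scale 2)."""
    return rng.gammavariate(nu / 2, 2.0)

# Count how often F = (U / nu1) / (V / nu2) exceeds the cutoff.
exceed = sum(
    1 for _ in range(draws)
    if (chi2(nu1) / nu1) / (chi2(nu2) / nu2) > cutoff
)
p_hat = exceed / draws       # estimate of P(F(30, 24) > 1.94)
print(round(cutoff, 2), p_hat)
```

    The estimate should be close to the table value 0.05.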