7/24/2019 Handout Seven
Introduction to Probability and Statistics
Handout #7
Instructor: Lingzhou Xue
TA: Daniel Eck
The pdf file for this class is available on the class web page.
Chapter 8
Fundamental Sampling Distributions and
Data Descriptions
Sample Mean $\bar{X}$ and Sample Variance $S^2$.
Histogram and Box Plot.
Central Limit Theorem (CLT).
$\chi^2$, $t$, and $F$ Distributions.
Example 1: Sample Distribution
The sample distribution is the distribution resulting from the
collection of actual data. A major characteristic of a sample is
that it contains a finite (countable) number of scores; the number
of scores is represented by the letter n. For example, suppose that the following data were collected:
15 14 15 18 15 20 15 16 17 14 17 13 11 14 18 12 17 12 21 8
14 17 14 12 13 15 15 16 17 14 16 13 14 15 18 16 16 17 14 15
16 15 17 12 14 14 13 13 13 14
These numbers constitute a sample distribution.
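As a quick check on this sample, the sketch below computes n and a few summary numbers using Python's standard `statistics` module. The variable name `scores` is an illustrative choice, not part of the handout.

```python
import statistics

# The 50 scores listed in Example 1.
scores = [
    15, 14, 15, 18, 15, 20, 15, 16, 17, 14, 17, 13, 11, 14, 18, 12, 17, 12, 21, 8,
    14, 17, 14, 12, 13, 15, 15, 16, 17, 14, 16, 13, 14, 15, 18, 16, 16, 17, 14, 15,
    16, 15, 17, 12, 14, 14, 13, 13, 13, 14,
]

n = len(scores)                     # sample size
mean = statistics.mean(scores)      # sample mean
median = statistics.median(scores)  # middle value of the sorted scores
stdev = statistics.stdev(scores)    # sample standard deviation (divisor n - 1)

print(n, mean, median, round(stdev, 3))
```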
[Figure: histogram of the sample; horizontal axis x from 8 to 20, vertical axis density from 0.00 to 0.20.]
Sample Distribution.
In addition to the frequency distribution, the sample distribution
can be described with numbers, called statistics. Examples
of statistics are the mean, median, mode, standard deviation,
range, and correlation coefficient, among others.
If a different sample were taken, different scores would result.
However, there would also be some consistency: while the
statistics would not be exactly the same, they would be similar.
To achieve order in this chaos, statisticians have developed
probability models.
[Figure: four histogram panels, each plotting density against x.]
Random Sampling
Population
A population consists of the totality of the observations with
which we are concerned.
It is the entire group we are interested in, which we wish to
describe or draw conclusions about.
Sample
A sample is a subset of a population.
2008 Presidential Race from CNN.
Example 2
Suppose you want to find out the percentage of students at UMN who enjoy reading Time. If we randomly select 20% of the students,
this selection is the sample in this experiment.
The population is all of the students who
attend UMN.
A simple random sample of size $n$ consists of $n$ individuals from the population chosen in such a way that every set of $n$ individuals
has an equal chance to be the sample actually selected.
Random Sample
Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each
having the same probability distribution $f(x)$. Define $X_1, X_2, \ldots, X_n$
to be a random sample of size $n$ from the population $f(x)$ and
write its joint probability distribution as
$$g(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$
Some Important Statistics
Statistic
A statistic is a function of random variables that does not depend
upon any unknown parameter.
Sample Mean & Sample Variance
If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the
sample mean is defined by the statistic
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
and the sample variance is defined by the statistic
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Example 3
A comparison of coffee prices at 4 randomly selected grocery
stores in San Diego showed increases from the previous month
of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance
of this random sample of price increases.
Solution:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{4} x_i = 16 \text{ cents}.$$
$$s^2 = \frac{1}{4-1} \sum_{i=1}^{4} (x_i - \bar{x})^2 = \frac{34}{3}.$$
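The arithmetic in Example 3 can be reproduced with a few lines of Python. This is only a sketch of the two definitions; the function names `sample_mean` and `sample_variance` are illustrative, and `Fraction` is used so the result prints as an exact ratio.

```python
from fractions import Fraction

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return Fraction(sum(xs), len(xs))

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

increases = [12, 15, 17, 20]  # price increases in cents
print(sample_mean(increases))      # 16
print(sample_variance(increases))  # 34/3
```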
Theorem
If $S^2$ is the variance of a random sample of size $n$, we may write
$$S^2 = \frac{1}{n(n-1)} \left[ n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right].$$
Proof:
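One standard way to fill in the proof is to expand the square in the definition of $S^2$ and use $\sum X_i = n\bar{X}$. The steps below are a sketch of that argument, not necessarily the version given in class:

```latex
\begin{align*}
S^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2
     = \frac{1}{n-1}\left[\sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + n\bar{X}^2\right] \\
    &= \frac{1}{n-1}\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right]
     = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n}X_i^2 - \left(\sum_{i=1}^{n}X_i\right)^2\right].
\end{align*}
```

The last step uses $n\bar{X}^2 = \left(\sum_{i=1}^{n} X_i\right)^2 / n$.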
Example 4
Find the sample mean and variance of the data 3, 4, 5, 6, 6, and
7, representing the number of trout caught by a random sample
of 6 fishermen on June 19, 1996, at Lake Muskoka.
Solution:
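$$\sum_{i=1}^{6} x_i^2 = 171, \quad \sum_{i=1}^{6} x_i = 31, \quad n = 6.$$
Hence,
$$s^2 = \frac{1}{(6)(5)} \left[ (6)(171) - (31)^2 \right] = \frac{13}{6}.$$
Thus the sample standard deviation is $s = \sqrt{13/6} = 1.47$.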
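The same numbers can be checked with the computing formula from the theorem. The sketch below is one possible implementation; `variance_shortcut` is an illustrative name.

```python
import math
from fractions import Fraction

def variance_shortcut(xs):
    """Sample variance via [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))

trout = [3, 4, 5, 6, 6, 7]
s2 = variance_shortcut(trout)
print(Fraction(sum(trout), len(trout)))  # sample mean: 31/6
print(s2)                                # 13/6
print(round(math.sqrt(s2), 2))           # 1.47
```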
Example 5
The numbers of incorrect answers on a true-false competency
test for a random sample of 15 students were recorded as follows:
2, 1, 3, 0, 1, 3, 6, 0, 3, 4. Find
the sample mean;
the sample variance.
Mode
The mode of a list of numbers is the value (or values) that occurs most frequently. A trick to remember this one is to
remember that mode starts with the same first two letters that
most does. Most frequently - Mode. You'll never forget that
one!
Median
The median is the middle value in your list. When the number of entries in
the list is odd, the median is the middle entry in the list after
sorting the list into increasing order. When the number of entries in the
list is even, the median is equal to the sum of the two middle
numbers (after sorting the list into increasing order) divided by
two. Thus, remember to line up your values; the middle number
is the median! Be sure to remember the odd and even rule.
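The odd and even rule for the median, and the mode, can be sketched in Python as follows. The list `odd_list` is made up for illustration, `even_list` reuses the trout counts from Example 4, and `statistics.multimode` (Python 3.8+) returns every most-frequent value.

```python
import statistics

def median(values):
    """Median by the odd/even rule described above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle entries

odd_list = [7, 3, 5, 6, 4]       # hypothetical data
even_list = [3, 4, 5, 6, 6, 7]   # trout counts from Example 4

print(median(odd_list))                 # 5
print(median(even_list))                # 5.5
print(statistics.multimode(even_list))  # [6]
```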
Data Displays and Graphical Methods
Box Plot
A box plot (also known as a box-and-whisker diagram or plot, or candlestick chart) is a convenient way of graphically depicting the
five-number summary, which consists of the 25th percentile (lower
quartile or first quartile, $Q_1$), the median, the 75th percentile (upper
quartile or third quartile, $Q_3$), and the adjacent values; in addition,
the box plot indicates which observations, if any, are considered
unusual, or outliers.
Outlier
Outliers are observations that are considered to be unusually far
from the bulk of the data. Technically, one may view an outlier as being an observation that represents a rare event. If the
distance from the box exceeds 1.5 times the interquartile range,
$Q_3 - Q_1$ (in either direction), the observation may be labeled an outlier.
Box Plot
Example 6
The following numbers are the numbers of marbles that fifteen
different boys own (arranged from least to greatest).
18 27 34 52 54 59 61 68 78 82 85 87 91 93 100.
Find the median.
Find the lower quartile.
Find the upper quartile.
Find the interquartile range.
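A possible way to check the answers to Example 6 is shown below. Quartile conventions differ between textbooks and software; this sketch assumes the convention that places the p-th quartile at position p(n + 1) in the sorted list, which is what `method="exclusive"` does in Python's `statistics.quantiles`. The fence variables apply the 1.5 times interquartile range rule for outliers.

```python
import statistics

marbles = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

# method="exclusive" places the p-th quartile at position p * (n + 1) in the sorted list.
q1, med, q3 = statistics.quantiles(marbles, n=4, method="exclusive")
iqr = q3 - q1

# 1.5 * IQR rule for flagging outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in marbles if x < lower_fence or x > upper_fence]

print(med, q1, q3, iqr)  # 68.0 52.0 87.0 35.0
print(outliers)          # []
```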
Box-and-Whisker Plot for Example 6.
Sampling Distribution of Means
Sampling Distribution
The probability distribution of a statistic is called a sampling
distribution.
If we are sampling from a population with unknown distribution,
either finite or infinite, the sampling distribution of $\bar{X}$ will
be approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided
that the sample size is large ($n > 30$).
Central Limit Theorem
If $\bar{X}$ is the mean of a random sample of size $n$ taken from a
population with mean $\mu$ and finite variance $\sigma^2$, then the limiting
form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
as $n \to \infty$, is the standard normal distribution $N(0, 1)$.
Example 7
Let $\bar{X}$ be the sample mean of a random sample of size 100 drawn
from an exponential distribution with its graph given by
$$f(x) = \frac{1}{4} e^{-x/4}, \quad x > 0.$$
Exponential p.d.f. with mean 4.
Decide which of the graphs labeled (a)-(d) would most closely
resemble the sampling distribution of the sample mean $\bar{X}$.
Briefly explain your reasoning.
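A small simulation can suggest what the sampling distribution in Example 7 should look like. The sketch below assumes the population is exponential with mean 4 (so its standard deviation is also 4); the seed and the number of replications `REPS` are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only for reproducibility

MEAN = 4.0   # mean of the exponential population in Example 7
N = 100      # sample size
REPS = 2000  # number of simulated samples (an arbitrary choice)

# random.expovariate takes the rate, i.e. 1 / mean.
sample_means = [
    statistics.fmean(random.expovariate(1 / MEAN) for _ in range(N))
    for _ in range(REPS)
]

center = statistics.fmean(sample_means)  # should be close to 4
spread = statistics.stdev(sample_means)  # should be close to 4 / sqrt(100) = 0.4
print(round(center, 2), round(spread, 2))
```

By the Central Limit Theorem, a histogram of `sample_means` should look roughly bell-shaped and centered near 4, even though the population itself is strongly skewed.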
Example 8
An electrical firm manufactures light bulbs that have a length of
life that is approximately normally distributed, with mean equal
to 800 hours and a standard deviation of 40 hours. Find the
probability that a random sample of 16 bulbs will have an average
life of less than 775 hours.
Solution:
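One way to work Example 8 numerically is to note that the sample mean of 16 bulbs is normal with mean 800 and standard deviation $40/\sqrt{16} = 10$. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of the mean
z = (775 - mu) / (sigma / sqrt(n))           # standardized value
p = xbar_dist.cdf(775)                       # P(sample mean < 775)

print(z)            # -2.5
print(round(p, 4))  # 0.0062
```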
Sampling Distribution: Difference Between Two Averages
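If independent samples of size $n_1$ and $n_2$ are drawn at random
from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$, and variances
$\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling
distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{\bar{X}_1 - \bar{X}_2}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.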
Example 10
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those
of manufacturer B have a mean lifetime of 6.0 years and a standard
deviation of 0.8 year. What is the probability that a random
sample of 36 tubes from manufacturer A will have a mean lifetime
that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
Solution:
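A sketch of the calculation for Example 10, using the normal approximation for the difference of two sample means stated earlier. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_diff = 6.5 - 6.0                      # mu_A - mu_B
sd_diff = sqrt(0.9**2 / 36 + 0.8**2 / 49)  # sqrt(sigma_A^2/n_A + sigma_B^2/n_B)
z = (1.0 - mean_diff) / sd_diff            # standardize the threshold of 1 year
p = 1 - NormalDist().cdf(z)                # P(mean of A - mean of B >= 1)

print(round(sd_diff, 3), round(z, 2), round(p, 3))  # 0.189 2.65 0.004
```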
Example 11
The mean score for freshmen on an aptitude test at a certain
college is 540, with a standard deviation of 50. What is the
probability that two groups of students selected at random, consisting
of 64 and 100 students, respectively, will differ in their mean scores by
1. more than 10 points?
2. an amount between 5 and 10 points?
Solution:
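One possible numerical treatment of Example 11 is sketched below. It assumes that "differ" means the absolute difference of the two group means, so both tails are counted, and that both groups come from the same population, so the difference of means is centered at 0.

```python
from math import sqrt
from statistics import NormalDist

sigma, n1, n2 = 50, 64, 100
sd_diff = sigma * sqrt(1 / n1 + 1 / n2)  # standard deviation of the difference of means
diff = NormalDist(0, sd_diff)            # same population, so the mean difference is 0

# Two-sided probabilities for the absolute difference
p_more_than_10 = 2 * (1 - diff.cdf(10))
p_between_5_and_10 = 2 * (diff.cdf(10) - diff.cdf(5))

print(round(sd_diff, 2))
print(round(p_more_than_10, 4), round(p_between_5_and_10, 4))
```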
Sampling Distribution of $S^2$
Sampling Distribution of $S^2$
If $S^2$ is the variance of a random sample of size $n$ taken from a
normal population having the variance $\sigma^2$, then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.
Example 12
Find the probability that a random sample of 21 observations, from a normal population with variance $\sigma^2 = 5$, will have a
variance $s^2$
1. greater than 2.065;
2. between 2.065 and 3.6445.
Solution:
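Example 12 is normally solved with a chi-squared table. As a rough cross-check, the sketch below uses a closed form for the upper-tail probability that holds only when the degrees of freedom are even (here 20). The helper names `chi2_sf_even_df` and `to_chi2` are illustrative and not part of any library.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x), valid for even df only.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term           # add (x/2)^j / j!
        term *= half / (j + 1)  # advance to the next term
    return exp(-half) * total

n, sigma2 = 21, 5
df = n - 1

def to_chi2(s2):
    """Convert a sample variance to the statistic (n - 1) s^2 / sigma^2."""
    return df * s2 / sigma2

p_greater = chi2_sf_even_df(to_chi2(2.065), df)                # P(S^2 > 2.065)
p_between = p_greater - chi2_sf_even_df(to_chi2(3.6445), df)   # P(2.065 < S^2 < 3.6445)

print(round(p_greater, 2), round(p_between, 2))  # 0.99 0.19
```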
t-Distribution
t-Distribution
Let $Z$ be a standard normal random variable and $V$ a chi-squared
random variable with $\nu$ degrees of freedom. If $Z$ and $V$
are independent, then the distribution of the random variable
$T$, where
$$T = \frac{Z}{\sqrt{V/\nu}},$$
is given by the density function
$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$
[...]
... $\alpha = P(\chi^2 > \chi^2_\alpha(v))$. That is, $\chi^2_\alpha$
represents the $\chi^2$-value above which we find an area equal to $\alpha$.
Note for $t_\alpha(v)$
In the textbook, we use $\alpha = P(T > t_\alpha(v))$. That is, $t_\alpha$ represents
the $t$-value above which we find an area equal to $\alpha$.
Note for $f_\alpha(v_1, v_2)$
In the textbook, we have $\alpha = P(F > f_\alpha(v_1, v_2))$. That is, $f_\alpha$
represents the $f$-value above which we find an area equal to $\alpha$.
Example 13b
Consider $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ for a random sample of size $n = 8$.
Calculate $P(T < 2.517)$ and P(2.998 < T [...]
F-Distribution
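Let $U$ and $V$ be two independent random variables having chi-squared
distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
Then the distribution of the random variable $F = \frac{U/\nu_1}{V/\nu_2}$ is
given by the density
$$f(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \, \dfrac{x^{(\nu_1/2)-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & x > 0; \\ 0, & x \le 0. \end{cases}$$
This is known as the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.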
[Figure: F-distribution density curves, f(x) against x for x from 0 to 5, for (v1=100, v2=100), (v1=6, v2=10), and (v1=10, v2=30).]
The F-Distribution curves.
Theorem
If $S_1^2$ and $S_2^2$ are the variances of independent random samples
of size $n_1$ and $n_2$ taken from normal populations with variances
$\sigma_1^2$ and $\sigma_2^2$, respectively, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
What Is the F-Distribution Used for?
The F-distribution is used in two-sample situations to draw inferences
about the population variances. However, the F-distribution
is applied to many other types of problems in which
the sample variances are involved. In fact, the F-distribution is
called the variance ratio distribution.
Example 15
If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of size $n_1 = 31$ and $n_2 = 25$, taken from normal populations with variances $\sigma_1^2 = 20$ and
$\sigma_2^2 = 10$, respectively, find:
1. P($S_2^2$ [...])
2. $P(S_1^2/S_2^2 > 3.88)$.
Solution:
1.
P($S_2^2$ [...] 36.4152) $= 1 - 0.05 = 0.95$
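2.
$$P\left(\frac{S_1^2}{S_2^2} > 3.88\right) = P\left(\frac{S_1^2}{S_2^2}\,\frac{\sigma_2^2}{\sigma_1^2} > 3.88\,\frac{\sigma_2^2}{\sigma_1^2}\right) = P\left(\frac{S_1^2}{S_2^2}\,\frac{\sigma_2^2}{\sigma_1^2} > 3.88 \cdot \frac{1}{2}\right) = P\left(F(30, 24) > 1.94\right) = 0.05$$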
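The table-based answer to part 2 can be sanity-checked by simulation. The sketch below draws many pairs of normal samples with the stated sizes and variances and counts how often the variance ratio exceeds 3.88. The population means are set to 0 because they do not affect the sample variances; the seed and `REPS` are arbitrary choices.

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

n1, var1 = 31, 20.0
n2, var2 = 25, 10.0
REPS = 5000  # number of simulated sample pairs (an arbitrary choice)

hits = 0
for _ in range(REPS):
    s1 = sample_variance([random.gauss(0, var1 ** 0.5) for _ in range(n1)])
    s2 = sample_variance([random.gauss(0, var2 ** 0.5) for _ in range(n2)])
    if s1 / s2 > 3.88:
        hits += 1

estimate = hits / REPS
print(estimate)  # should land near 0.05
```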