STA 291 Fall 2009
Lecture 6
Dustin Lueker
Measures of Variation
Statistics that describe variability
◦ Two distributions may have the same mean and/or median but different variability
◦ The mean and median describe only a typical value, not the spread of the data
◦ Range
◦ Variance
◦ Standard Deviation
◦ Interquartile Range
All of these can be computed for the sample or population
Range
Difference between the largest and smallest observation
◦ Very much affected by outliers
◦ A misrecorded observation may lead to an outlier and affect the range
The range does not always reveal different variation about the mean
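The last point can be illustrated with a small sketch. The two samples below are made up for illustration; they share the same smallest and largest observation, so the range cannot tell them apart, while the standard deviation (one of the measures listed for this lecture) can.

```python
import statistics

# Two made-up samples with the same smallest and largest observation.
clustered = [0, 49, 50, 50, 51, 100]   # most values sit close to the mean of 50
spread_out = [0, 0, 0, 100, 100, 100]  # every value sits far from the mean of 50


def sample_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


for name, values in [("clustered", clustered), ("spread_out", spread_out)]:
    print(name, "range =", sample_range(values),
          "standard deviation =", round(statistics.stdev(values), 2))
```

Both samples have a range of 100, but the second one varies much more about the mean.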
Example
Sample 1
◦ Smallest Observation: 112
◦ Largest Observation: 797
◦ Range =
Sample 2
◦ Smallest Observation: 15033
◦ Largest Observation: 16125
◦ Range =
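For reference, the two blanks can be worked out directly from the definition of the range (largest minus smallest observation):

```latex
\begin{align*}
\text{Range}_1 &= 797 - 112 = 685 \\
\text{Range}_2 &= 16125 - 15033 = 1092
\end{align*}
```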
Deviations
The deviation of the ith observation $x_i$ from the sample mean $\bar{x}$ is the difference between them, $(x_i - \bar{x})$
◦ Sum of all deviations is zero
◦ Therefore, we use either the sum of the absolute deviations or the sum of the squared deviations as a measure of variation
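A minimal sketch of why the raw deviations cannot measure variation on their own. The three observations are made up for illustration and are not from the lecture.

```python
# Made-up sample, chosen so the mean is a whole number.
observations = [2, 4, 9]
mean = sum(observations) / len(observations)  # 5.0

# Deviation of each observation from the sample mean.
deviations = [x - mean for x in observations]  # [-3.0, -1.0, 4.0]

# Positive and negative deviations cancel, so their sum carries no information.
sum_of_deviations = sum(deviations)

# Absolute values or squares remove the signs and give usable totals.
sum_abs_deviations = sum(abs(d) for d in deviations)
sum_sq_deviations = sum(d ** 2 for d in deviations)

print(sum_of_deviations, sum_abs_deviations, sum_sq_deviations)
```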
Sample Variance
The variance of n observations is the sum of the squared deviations, divided by n-1
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
Example
Observation   Mean   Deviation   Squared Deviation
1
3
4
7
10
Sum of the Squared Deviations =
n-1 =
Sum of the Squared Deviations / (n-1) =
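One way to fill in the table is sketched below. It follows the sample variance definition step by step for the five observations on the slide; the variable names are illustrative, and the last line cross-checks against Python's `statistics.variance`, which also divides by n-1.

```python
import statistics

observations = [1, 3, 4, 7, 10]
n = len(observations)
mean = sum(observations) / n

print(f"{'Observation':>11} {'Mean':>6} {'Deviation':>10} {'Squared Deviation':>18}")
sum_sq = 0.0
for x in observations:
    deviation = x - mean          # observation minus sample mean
    squared = deviation ** 2
    sum_sq += squared
    print(f"{x:>11} {mean:>6.1f} {deviation:>10.1f} {squared:>18.1f}")

variance = sum_sq / (n - 1)       # sample variance divides by n-1
print("Sum of the Squared Deviations =", sum_sq)
print("n-1 =", n - 1)
print("Sum of the Squared Deviations / (n-1) =", variance)

# Cross-check with the standard library's sample variance.
library_variance = statistics.variance(observations)
```

The mean is 5, the squared deviations are 16, 4, 1, 4 and 25, their sum is 50, and the sample variance is 50 / 4 = 12.5.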
Interpreting Variance
Approximately the average of the squared deviations
◦ “average squared distance from the mean”
Unit
◦ Square of the unit for the original data
Difficult to interpret
◦ Solution: take the square root of the variance, so that the unit is the same as for the original data. This is the Standard Deviation
Properties of Standard Deviation
s ≥ 0
◦ s = 0 only when all observations are the same
If data is collected for the whole population instead of a sample, then n-1 is replaced by n
s is sensitive to outliers
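These properties can be checked with a short sketch. The first sample reuses the observations 1, 3, 4, 7, 10 from the variance example; the constant sample and the sample with an outlier are made up for illustration.

```python
import math
import statistics

# Standard deviation is the square root of the sample variance.
observations = [1, 3, 4, 7, 10]
s = math.sqrt(statistics.variance(observations))  # sqrt(12.5), about 3.54

# s = 0 when all observations are the same (made-up constant sample).
s_constant = statistics.stdev([4, 4, 4, 4])

# Replacing 10 by a made-up outlier of 100 inflates s considerably.
s_outlier = statistics.stdev([1, 3, 4, 7, 100])

print(round(s, 2), s_constant, round(s_outlier, 2))
```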
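Variance and Standard Deviation
Sample
◦ Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
◦ Standard Deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
Population
◦ Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
◦ Standard Deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$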
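The only difference between the sample and population versions is the divisor. The sketch below implements both by hand (the function names are illustrative) and applies them to the observations 1, 3, 4, 7, 10, treating the same numbers first as a sample and then, purely for comparison, as a whole population.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n-1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n


data = [1, 3, 4, 7, 10]
s_squared = sample_variance(data)          # 50 / 4
sigma_squared = population_variance(data)  # 50 / 5
s = math.sqrt(s_squared)
sigma = math.sqrt(sigma_squared)

print(s_squared, sigma_squared, round(s, 3), round(sigma, 3))
```

Python's standard library makes the same distinction: `statistics.variance` and `statistics.stdev` divide by n-1, while `statistics.pvariance` and `statistics.pstdev` divide by N.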
Population Parameters and Sample Statistics
Population mean and population standard deviation are denoted by the Greek letters μ (mu) and σ (sigma)
◦ They are unknown constants that we would like to estimate
Sample mean and sample standard deviation are denoted by $\bar{x}$ and s
◦ They are random variables, because their values vary according to the random sample that has been selected
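A small simulation can make the distinction concrete. Everything below is hypothetical: the population is generated artificially so that μ and σ can be computed, which is exactly what is not possible in practice. The seed, population size and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(291)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values; mu and sigma are fixed constants.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Each random sample of 25 observations yields its own x-bar and s.
sample_stats = []
for _ in range(3):
    sample = rng.sample(population, 25)
    sample_stats.append((statistics.fmean(sample), statistics.stdev(sample)))

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}")
for x_bar, s in sample_stats:
    print(f"sample:     x-bar = {x_bar:.2f}, s = {s:.2f}")
```

The population line never changes, while the three sample lines differ from one another and from the population values.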
Empirical Rule
If the data is approximately symmetric and bell-shaped then
◦ About 68% of the observations are within one standard deviation of the mean
◦ About 95% of the observations are within two standard deviations of the mean
◦ About 99.7% of the observations are within three standard deviations of the mean
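The three percentages come from the normal distribution. As a sketch, assuming the data follow an exactly normal curve (real bell-shaped data will only match approximately), they can be reproduced with `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.2f}%")
```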
Example
SAT scores are scaled so that they have an approximate bell-shaped distribution with a mean of 500 and standard deviation of 100
◦ About 68% of the scores are between
◦ About 95% of the scores are between
◦ If you have a score above 700, you are in the top   %
  What percentile would this be?
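One way to work through the blanks is to apply the empirical rule directly. The sketch below uses only the mean and standard deviation given on the slide; it relies on the symmetry of the bell shape to split the 5% outside two standard deviations evenly between the two tails.

```python
mean, sd = 500, 100

# About 68% of scores lie within one standard deviation of the mean.
within_one = (mean - sd, mean + sd)

# About 95% of scores lie within two standard deviations of the mean.
within_two = (mean - 2 * sd, mean + 2 * sd)

# The remaining 5% is split evenly between the two tails by symmetry,
# so a score above 700 is in roughly the top 2.5%.
top_share = (100 - 95) / 2
percentile = 100 - top_share

print(within_one, within_two, top_share, percentile)
```

Under the empirical rule, about 68% of scores fall between 400 and 600, about 95% between 300 and 700, and a score above 700 is in about the top 2.5%, which is roughly the 97.5th percentile.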
Example
According to the National Association of Home Builders, the U.S. nationwide median selling price of homes sold in 1995 was $118,000
◦ Would you expect the mean to be larger, smaller, or equal to $118,000?
◦ Which of the following is the most plausible value for the standard deviation?
  (a) -15,000, (b) 1,000, (c) 45,000, (d) 1,000,000
Coefficient of Variation
Standardized measure of variation
◦ Idea: a standard deviation of 10 may indicate great variability or small variability, depending on the magnitude of the observations in the data set
CV = ratio of the standard deviation to the mean
◦ Population and sample version
Example
Which sample has higher relative variability? (a higher coefficient of variation)
◦ Sample A
  mean = 62
  standard deviation = 12
  CV =
◦ Sample B
  mean = 31
  standard deviation = 7
  CV =
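The comparison can be sketched as follows, using the definition CV = standard deviation / mean and the summary values given for the two samples. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Ratio of the standard deviation to the mean."""
    return sd / mean


cv_a = coefficient_of_variation(sd=12, mean=62)
cv_b = coefficient_of_variation(sd=7, mean=31)

print(f"CV of Sample A = {cv_a:.3f}")
print(f"CV of Sample B = {cv_b:.3f}")
print("Higher relative variability:", "Sample A" if cv_a > cv_b else "Sample B")
```

Sample A has the larger standard deviation, but Sample B has the higher coefficient of variation (about 0.226 against about 0.194), so Sample B is more variable relative to its mean.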