sta 291 fall 2009

16
STA 291 Fall 2009 Lecture 6 Dustin Lueker

Upload: zody

Post on 04-Jan-2016

21 views

Category:

Documents


0 download

DESCRIPTION

STA 291 Fall 2009. Lecture 6 Dustin Lueker. Measures of Variation. Statistics that describe variability Two distributions may have the same mean and/or median but different variability Mean and Median only describe a typical value, but not the spread of the data Range Variance - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: STA 291 Fall 2009

STA 291Fall 2009

Lecture 6Dustin Lueker

Page 2: STA 291 Fall 2009

Measures of Variation Statistics that describe variability

◦ Two distributions may have the same mean and/or median but different variability Mean and Median only describe a typical value,

but not the spread of the data

◦ Range◦ Variance◦ Standard Deviation◦ Interquartile Range

All of these can be computed for the sample or population

2STA 291 Fall 2009 Lecture 6

Page 3: STA 291 Fall 2009

Range Difference between the largest and smallest

observation◦ Very much affected by outliers

A misrecorded observation may lead to an outlier, and affect the range

The range does not always reveal different variation about the mean

3STA 291 Fall 2009 Lecture 6

Page 4: STA 291 Fall 2009

Example Sample 1

◦ Smallest Observation: 112◦ Largest Observation: 797◦ Range =

Sample 2◦ Smallest Observation: 15033◦ Largest Observation: 16125◦ Range =

4STA 291 Fall 2009 Lecture 6

Page 5: STA 291 Fall 2009

Deviations The deviation of the ith observation xi from

the sample mean is the difference between them, ◦ Sum of all deviations is zero◦ Therefore, we use either the sum of the absolute

deviations or the sum of the squared deviations as a measure of variation

5

x)( xxi

STA 291 Fall 2009 Lecture 6

Page 6: STA 291 Fall 2009

Variance of n observations is the sum of the squared deviations, divided by n-1

Sample Variance

6

22 ( )

1

ix xs

n

STA 291 Fall 2009 Lecture 6

Page 7: STA 291 Fall 2009

Example

7

Observation Mean Deviation SquaredDeviation

1

3

4

7

10

Sum of the Squared Deviations

n-1

Sum of the Squared Deviations / (n-1)

STA 291 Fall 2009 Lecture 6

Page 8: STA 291 Fall 2009

Interpreting Variance About the average of the squared

deviations◦ “average squared distance from the mean”

Unit◦ Square of the unit for the original data

Difficult to interpret◦ Solution

Take the square root of the variance, and the unit is the same as for the original data Standard Deviation

8STA 291 Fall 2009 Lecture 6

Page 9: STA 291 Fall 2009

Properties of Standard Deviation s ≥ 0

◦ s = 0 only when all observations are the same If data is collected for the whole population

instead of a sample, then n-1 is replaced by n

s is sensitive to outliers

9STA 291 Fall 2009 Lecture 6

Page 10: STA 291 Fall 2009

Variance and Standard Deviation Sample

◦ Variance

◦ Standard Deviation

Population◦ Variance

◦ Standard Deviation

10

22 ( )

1

ix xs

n

2( )

1

ix xs

n

22 ( )ix

N

2( )ix

N

STA 291 Fall 2009 Lecture 6

Page 11: STA 291 Fall 2009

Population Parameters and Sample Statistics Population mean and population standard

deviation are denoted by the Greek letters μ (mu) and σ (sigma)◦ They are unknown constants that we would like to

estimate Sample mean and sample standard deviation are

denoted by and s◦ They are random variables, because their values vary

according to the random sample that has been selected

11

x

STA 291 Fall 2009 Lecture 6

Page 12: STA 291 Fall 2009

Empirical Rule If the data is approximately symmetric and

bell-shaped then◦ About 68% of the observations are within one

standard deviation from the mean◦ About 95% of the observations are within two

standard deviations from the mean◦ About 99.7% of the observations are within

three standard deviations from the mean

12STA 291 Fall 2009 Lecture 6

Page 13: STA 291 Fall 2009

Example SAT scores are scaled so that they have an

approximate bell-shaped distribution with a mean of 500 and standard deviation of 100◦ About 68% of the scores are between

◦ About 95% of the scores are between

◦ If you have a score above 700, you are in the top % What percentile would this be?

13STA 291 Fall 2009 Lecture 6

Page 14: STA 291 Fall 2009

Example According to the National Association of Home

Builders, the U.S. nationwide median selling price of homes sold in 1995 was $118,000◦ Would you expect the mean to be larger, smaller, or

equal to $118,000?

◦ Which of the following is the most plausible value for the standard deviation?

(a) –15,000, (b) 1,000, (c) 45,000, (d) 1,000,000

14STA 291 Fall 2009 Lecture 6

Page 15: STA 291 Fall 2009

Coefficient of Variation Standardized measure of variation

◦ Idea A standard deviation of 10 may indicate great

variability or small variability, depending on the magnitude of the observations in the data set

CV = Ratio of standard deviation divided by mean◦ Population and sample version

15STA 291 Fall 2009 Lecture 6

Page 16: STA 291 Fall 2009

Example Which sample has higher relative

variability? (a higher coefficient of variation)◦ Sample A

mean = 62 standard deviation = 12 CV =

◦ Sample B mean = 31 standard deviation = 7 CV =

STA 291 Fall 2009 Lecture 6 16