Statistics 270 - Lecture 3
• Last class: types of quantitative variables, histograms, measures of center, percentiles, and measures of spread… we will finish these today
• After today, we will have completed Chapter 1
• Assignment #1: Chapter 1, questions: 6, 20b, 26, 36b-d, 48, 60
• Some suggested problems:
• Chapter 1: 1, 5, 13 or 14 (DO histogram), 19, 26, 29, 33
[Figure: histogram "Exam Grades from a Stats Class"; x-axis: Exam 1 Grades (60 to 100), y-axis: Count]
[Figure: histogram "Exam Grades for Undergraduates"; x-axis: Exam 1 Grades (60 to 100), y-axis: Count]
[Figure: histogram "Exam Grades for Graduate Students"; x-axis: Exam 1 Grades (60 to 100), y-axis: Count]
Measures of Spread (cont.)
• The 5-number summary is often reported:
• Min, Q1, Q2 (Median), Q3, and Max
• Summarizes both center and spread
• What proportion of data lie between Q1 and Q3?
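As a minimal sketch of the 5-number summary, the Python below uses the standard library's `statistics.quantiles`. The exam scores are hypothetical values invented for illustration, and the "inclusive" quartile method is an assumption; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # method="inclusive" is one of several quartile conventions;
    # textbooks and software may differ slightly in Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))

# Hypothetical exam scores, invented for illustration only.
scores = [62, 68, 71, 75, 78, 80, 84, 88, 95]
print(five_number_summary(scores))  # (62, 71.0, 78.0, 84.0, 95)
```

The middle three numbers describe the center and the box of a box-plot; the outer two describe the full range.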
Box-Plot
• Displays 5-number summary graphically
• Box drawn spanning quartiles
• Line drawn in box for median
• Lines (whiskers) extend from the box to the min. and max. values
• Some programs draw whiskers only as far as 1.5*IQR above and below the quartiles, where IQR = Q3 - Q1, and plot points beyond that individually
[Figure: box-plot for Undergrads; y-axis: Exam Scores (60 to 85)]
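The 1.5*IQR whisker rule can be sketched as follows. This assumes one common convention (whiskers stop at the most extreme observations inside the fences, and anything outside is listed separately); the scores are hypothetical and the function name is made up for this example.

```python
import statistics

def boxplot_whiskers(data, k=1.5):
    """Return (lower whisker, upper whisker, outliers) using the k*IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the most extreme points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return (min(inside), max(inside), outliers)

# Hypothetical scores with one unusually low value.
scores = [40, 68, 71, 75, 78, 80, 84, 88, 95]
print(boxplot_whiskers(scores))  # (68, 95, [40])
```

Here Q1 = 71 and Q3 = 84, so IQR = 13 and the fences sit at 51.5 and 103.5; the score of 40 falls below the lower fence.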
• Can compare distributions using side-by-side box-plots
• What can you see from the plot?
[Figure: side-by-side box-plots for Undergrads and Grads; y-axis: Exam Scores (60 to 90)]
Other Common Measure of Spread: Sample Variance
• Sample variance of n observations:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Can be viewed as roughly the average squared deviation of observations from the sample mean
• Units are in squared units of data
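To make the sample variance concrete, here is a small sketch that computes it directly from the definition (squared deviations from the sample mean, summed, divided by n - 1) and compares it with the standard library. The five data values are hypothetical.

```python
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical data: mean is 6, squared deviations are 16, 4, 0, 4, 16.
data = [2, 4, 6, 8, 10]
print(sample_variance(data))  # 10.0
# The library function uses the same n - 1 divisor.
print(sample_variance(data) == statistics.variance(data))  # True
```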
Sample Standard Deviation
• Sample standard deviation of n observations:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• Can be viewed as roughly the average deviation of observations from the sample mean
• Has same units as data
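The point about units can be checked numerically. In this sketch (hypothetical durations, illustrative function name), converting the data from minutes to seconds multiplies the standard deviation by 60 but the variance by 60 squared.

```python
import math

def sample_sd(xs):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

# Hypothetical durations, first in minutes and then converted to seconds.
minutes = [2.0, 3.0, 4.0, 5.0]
seconds = [60 * m for m in minutes]

sd_min = sample_sd(minutes)
sd_sec = sample_sd(seconds)

# The SD scales like the data; the variance scales like the data squared.
print(math.isclose(sd_sec / sd_min, 60))             # True
print(math.isclose(sd_sec**2 / sd_min**2, 3600))     # True
```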
Exercise
• Compute the sample standard deviation and variance for the Muzzle Velocity Example
• Variance and standard deviation are most useful when the measure of center is ______
• As observations become more spread out, does s increase or decrease?
• Both measures are sensitive to outliers
• The 5-number summary is better than the mean and standard deviation for describing (i) skewed distributions; (ii) distributions with outliers
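A quick way to see the sensitivity to outliers is to change one value in a small data set and recompute everything. The numbers below are hypothetical, and the quartiles use Python's "inclusive" convention, which is an assumption rather than the textbook's definition.

```python
import statistics

def summarize(xs):
    """Mean and SD alongside the median and IQR for the same data."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": med,
        "iqr": q3 - q1,
    }

# Hypothetical data; the second list replaces the largest value with an outlier.
clean = summarize([10, 12, 13, 14, 16])
with_outlier = summarize([10, 12, 13, 14, 60])

for name in ("mean", "sd", "median", "iqr"):
    print(name, clean[name], with_outlier[name])
```

The mean and standard deviation move a lot when the outlier is introduced, while the median and IQR do not change at all for this particular data set.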
Population and Samples
• Important to distinguish between the population and a sample from the population
• A sample consisting of the entire population is called a ______
• What is the difference between the population mean and the sample mean?
• The population variance (or std. deviation) and that of the sample?
• Population median and sample median?
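The distinction can be illustrated with a toy example. The five-value "population" below is hypothetical, and the random seed is arbitrary; the sketch only assumes the standard library, where `pvariance` divides by N and `variance` divides by n - 1.

```python
import random
import statistics

# Hypothetical population: pretend these five values are everything of interest.
population = [2, 4, 6, 8, 10]

# Draw a sample of size 3 without replacement (seed chosen arbitrarily).
rng = random.Random(0)
sample = rng.sample(population, 3)

# Population quantities are fixed numbers.
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)   # divides by N
pop_median = statistics.median(population)

# Sample quantities depend on which observations were drawn.
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)            # divides by n - 1
sample_median = statistics.median(sample)

print("population:", mu, sigma2, pop_median)
print("sample:    ", x_bar, s2, sample_median)

# A sample containing every member of the population reproduces the population mean.
full_sample = rng.sample(population, len(population))
print(statistics.mean(full_sample) == mu)   # True
```

Rerunning with a different seed changes the sample line but never the population line.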
Empirical Rule for Bell-Shaped Distributions
• Approximately 68% of the data lie in the interval $\bar{x} \pm s$
• Approximately 95% of the data lie in the interval $\bar{x} \pm 2s$
• Approximately 99.7% of the data lie in the interval $\bar{x} \pm 3s$
• Can use these to help determine range of typical values or to identify potential outliers
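The percentages in the empirical rule match what an exactly normal distribution would give. The sketch below assumes the normal curve as the idealized "bell shape" and uses `statistics.NormalDist` from the standard library; real data will only agree approximately.

```python
from statistics import NormalDist

def normal_coverage(k):
    """Probability that a normal variable falls within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 3))
# 1 0.683
# 2 0.954
# 3 0.997
```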
Example…Putting this all together
• A geyser is a hot spring that becomes unstable and erupts hot gases into the air. Perhaps the most famous of these is Wyoming's Old Faithful Geyser.
• Visitors to Yellowstone National Park most often visit Old Faithful to see it erupt. Consequently, it is of great interest to be able to predict the time until the next eruption.
Example…Putting this all together
• Consider a sample of 222 interval times between eruptions (Weisberg, 1985). The first few lines of the available data are:

Day of Study   Length of Eruption (Minutes)   Interval Between Eruptions (Minutes)
1              4.4                            78
1              3.9                            74
1              4.0                            68
1              4.0                            76
.              .                              .

• Goal: Help predict the interval between eruptions
• Consider a variety of plots that may shed some light upon the nature of the intervals between eruptions
Example…Putting this all together
• Goal: Help predict the interval between eruptions
• Consider a histogram to shed some light upon the nature of the intervals between eruptions
[Figure: "Histogram of Old Faithful Eruption Intervals"; x-axis: Eruption Intervals (Minutes), 40 to 90; y-axis: Frequency, 0 to 40]
Example…Putting this all together
Summary Statistics:

Minimum              42
1st Quartile         60
Median               75
Mean                 71.01
3rd Quartile         81
Maximum              95
Standard Deviation   12.80

[Figure: "Boxplot of Old Faithful Eruption Intervals"; y-axis: Eruption Intervals (Minutes), 40 to 90]
Example…Putting this all together
• What does the box-plot show?
• Is a box-plot useful at showing the main features of these data?
• What does the empirical rule tell us about 95% of the data? Is this useful?
• We will come back to this in a minute…
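As a rough worked example of the empirical-rule question, the sketch below plugs the reported summary statistics (mean 71.01, standard deviation 12.80, minimum 42, maximum 95) into the two-standard-deviation interval. The function name is made up for this example, and the empirical rule itself assumes a bell-shaped distribution, which may not hold for these data.

```python
def empirical_interval(mean, sd, k=2):
    """Interval mean +/- k*sd; about 95% of bell-shaped data for k = 2."""
    return (mean - k * sd, mean + k * sd)

# Summary statistics for the 222 Old Faithful intervals, as reported above.
mean, sd = 71.01, 12.80
observed_min, observed_max = 42, 95

low, high = empirical_interval(mean, sd, k=2)
print(round(low, 2), round(high, 2))  # 45.41 96.61

# Share of the observed range (min to max) that the interval covers.
covered = (min(high, observed_max) - max(low, observed_min)) / (observed_max - observed_min)
print(round(covered, 2))
```

The interval runs from about 45 to about 97 minutes, which spans most of the observed range of 42 to 95, so on its own it does little to narrow down a prediction.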
Scatter-Plots
• Help assess whether there is a relationship between 2 continuous variables
• Data are paired
• (x1, y1), (x2, y2), ... (xn, yn)
• Plot X versus Y
• If there is no natural pairing…probably not a good idea!
• What sort of relationships might we see?
[Figure: "Scatterplot of Eruption Interval vs Duration"; x-axis: Duration (Minutes), 2 to 5; y-axis: Eruption Intervals (Minutes), 40 to 90]
Example…Putting this all together
• What does this plot reveal?
[Figure: "Scatterplot of Eruption Interval vs Duration"; x-axis: Duration (Minutes), 2 to 5; y-axis: Eruption Intervals (Minutes), 40 to 90]
Example…Putting this all together
Summary Statistics:

Statistic                Eruption Less Than 3 Minutes   Eruption Greater Than or Equal to 3 Minutes
Minimum                  42                             53
1st Quartile             60                             74
Median                   51                             78
Mean                     54.46                          78.16
3rd Quartile             58                             82.5
Maximum                  78                             95
Standard Deviation       6.30                           6.89
Number of Observations   67                             155

[Figure: histogram; x-axis: Eruption Intervals (Minutes), 40 to 80; y-axis: Frequency, 0 to 40]
[Figure: histogram; x-axis: Eruption Intervals (Minutes), 50 to 90; y-axis: Frequency, 0 to 40]
Example…Putting this all together
• Suppose an eruption of 2.5 minutes had just taken place. What would you estimate the length of the next interval to be?
• Suppose an eruption of 3.5 minutes had just taken place. What would you estimate the length of the next interval to be?
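One simple way to approach these two questions is to split at 3 minutes and use the mean interval of the matching group from the summary statistics (54.46 minutes for eruptions under 3 minutes, 78.16 minutes otherwise). This is only a sketch: the function name is hypothetical, and using the group mean rather than, say, the group median is an assumption.

```python
# Group means taken from the summary statistics for the two groups.
GROUP_MEANS = {"short": 54.46, "long": 78.16}

def predict_interval(duration_minutes, cutoff=3.0):
    """Estimate the next interval from the mean of the matching duration group."""
    if duration_minutes < cutoff:
        return GROUP_MEANS["short"]
    return GROUP_MEANS["long"]

print(predict_interval(2.5))  # 54.46
print(predict_interval(3.5))  # 78.16
```

Because each group has a standard deviation of roughly 6 to 7 minutes, compared with 12.80 for all intervals combined, an estimate based on the group is considerably tighter than one based on the overall mean.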