Statistics 270 - Lecture 3

Upload: juliet-wilson

Post on 13-Dec-2015


Page 1

Statistics 270 - Lecture 3

Page 2

• Last class: types of quantitative variables, histograms, measures of center, percentiles and measures of spread… well, we shall finish these today

• We will then have completed Chapter 1

• Assignment #1: Chapter 1, questions: 6, 20b, 26, 36b-d, 48, 60

• Some suggested problems:

• Chapter 1: 1, 5, 13 or 14 (DO histogram), 19, 26, 29, 33

Page 3

[Figure: histogram "Exam Grades from a Stats Class"; x-axis: Exam 1 Grades (60 to 100); y-axis: Count]

Page 4

[Figure: histogram "Exam Grades for Undergraduates"; x-axis: Exam 1 Grades (60 to 100); y-axis: Count]

Page 5

[Figure: histogram "Exam Grades for Graduate Students"; x-axis: Exam 1 Grades (60 to 100); y-axis: Count]

Page 6

Measures of Spread (cont.)

• 5-number summary often reported:

• Min, Q1, Q2 (Median), Q3, and Max

• Summarizes both center and spread

• What proportion of data lie between Q1 and Q3?
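As a rough illustration of the 5-number summary, here is a minimal Python sketch. The exam scores are made up for the example, and the quartile convention (`method="inclusive"` in the standard `statistics` module) is an assumption; textbooks and software differ slightly in how they compute Q1 and Q3. Whatever the convention, roughly half of the observations fall between Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    values = sorted(data)
    # Quartile conventions differ between textbooks and programs;
    # "inclusive" is one common choice, used here as an assumption.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]

# Hypothetical exam scores, made up for illustration.
scores = [62, 68, 71, 74, 75, 79, 80, 83, 85, 88, 94]
print(five_number_summary(scores))
```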

Page 7

Box-Plot

• Displays 5-number summary graphically

• Box drawn spanning quartiles

• Line drawn in box for median

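• Lines (whiskers) extend from the box to the minimum and maximum values

• Some programs draw the whiskers only as far as the most extreme observations within 1.5*IQR of the quartiles, and plot points beyond that individually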
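The 1.5*IQR whisker convention can be sketched as follows. This is an illustrative version only: the data are hypothetical, the function name `boxplot_whiskers` is invented for the example, and individual plotting programs may compute the quartiles a little differently.

```python
import statistics

def boxplot_whiskers(data, k=1.5):
    """Return (lower whisker, upper whisker, outliers) under the k*IQR rule."""
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Whiskers stop at the most extreme observations inside the fences.
    return inside[0], inside[-1], outliers

# Hypothetical exam scores with one unusually low value.
scores = [35, 68, 71, 74, 75, 79, 80, 83, 85, 88, 94]
print(boxplot_whiskers(scores))
```

With these made-up scores the lower whisker stops at 68 and the value 35 is flagged as a potential outlier, rather than the whisker being stretched down to the minimum.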

[Figure: box-plot of exam scores for Undergrads; y-axis: Exam Scores (60 to 85)]

Page 8

• Can compare distributions using side-by-side box-plots

• What can you see from the plot?

[Figure: side-by-side box-plots of exam scores for Undergrads and Grads; y-axis: Exam Scores (60 to 90)]

Page 9

Other Common Measure of Spread: Sample Variance

• Sample variance of n observations:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• Can be viewed as roughly the average squared deviation of observations from the sample mean

• Units are in squared units of data

Page 10

Sample Standard Deviation

• Sample standard deviation of n observations:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

• Can be viewed as roughly the average deviation of observations from the sample mean

• Has same units as data
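To make the two formulas concrete, the sketch below computes the sample variance and sample standard deviation directly from their definitions, dividing by n - 1. The data values are hypothetical; Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(xs))

# Hypothetical observations, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data))
print(statistics.variance(data), statistics.stdev(data))  # cross-check
```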

Page 11

Exercise

• Compute the sample standard deviation and variance for the Muzzle Velocity Example

Page 12

• Variance and standard deviation are most useful when the measure of center is the mean

• As observations become more spread out, does s increase or decrease?

• Both measures are sensitive to outliers

• 5-number summary is better than the mean and standard deviation for describing (i) skewed distributions; (ii) distributions with outliers
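The sensitivity to outliers can be seen in a small experiment. In this sketch (hypothetical data, with the "inclusive" quartile convention as an assumption) the largest value is replaced by an extreme one: the mean and standard deviation move a lot, while the median and IQR do not change at all.

```python
import statistics

def summarize(xs):
    """Mean and SD alongside the median and IQR for comparison."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": median,
        "iqr": q3 - q1,
    }

# Hypothetical data; the second list swaps the largest value for an outlier.
clean = [70, 72, 74, 75, 76, 78, 80]
with_outlier = clean[:-1] + [180]

print(summarize(clean))
print(summarize(with_outlier))
```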

Page 13

Population and Samples

• Important to distinguish between the population and a sample from the population

• A sample consisting of the entire population is called a census

• What is the difference between the population mean and the sample mean?

• The sample variance (or std. deviation) and that of the population?

• Population median and sample median?

Page 14

Empirical Rule for Bell-Shaped Distributions

• Approximately

• 68% of the data lie in the interval $\bar{x} \pm s$

• 95% of the data lie in the interval $\bar{x} \pm 2s$

• 99.7% of the data lie in the interval $\bar{x} \pm 3s$

• Can use these to help determine range of typical values or to identify potential outliers
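One way to get a feel for the empirical rule is to check it on simulated bell-shaped data. The sketch below draws values from a normal distribution (the mean of 75, SD of 10, sample size and seed are arbitrary choices for the illustration) and reports the fraction of observations within 1, 2 and 3 sample standard deviations of the sample mean.

```python
import random
import statistics

def empirical_rule_coverage(data):
    """Fraction of observations within 1, 2 and 3 sample SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return {
        k: sum(abs(x - mean) <= k * sd for x in data) / len(data)
        for k in (1, 2, 3)
    }

# Simulated bell-shaped data; the seed is fixed only for repeatability.
rng = random.Random(270)
bell_shaped = [rng.gauss(75, 10) for _ in range(10_000)]
coverage = empirical_rule_coverage(bell_shaped)
print(coverage)
```

For data that are skewed or have more than one peak, the same function can return fractions quite different from 68%, 95% and 99.7%, which is why the rule is stated for bell-shaped distributions only.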

Page 15

Example…Putting this all together

• A geyser is a hot spring that periodically becomes unstable and erupts hot water and steam into the air. Perhaps the most famous of these is Wyoming's Old Faithful Geyser.

• Visitors to Yellowstone National Park most often visit Old Faithful to see it erupt. Consequently, it is of great interest to be able to predict the length of the interval until the next eruption.

Page 16

Example…Putting this all together

• Consider a sample of 222 interval times between eruptions (Weisberg, 1985). The first few lines of the available data are:

Day of Study    Length of Eruption (Minutes)    Interval Between Eruptions (Minutes)
1               4.4                             78
1               3.9                             74
1               4.0                             68
1               4.0                             76
.               .                               .
.               .                               .
.               .                               .

• Goal: Help predict the interval between eruptions

• Consider a variety of plots that may shed some light upon the nature of the intervals between eruptions

Page 17

Example…Putting this all together

• Goal: Help predict the interval between eruptions

• Consider a histogram to shed some light upon the nature of the intervals between eruptions

[Figure: "Histogram of Old Faithful Eruption Intervals"; x-axis: Eruption Intervals (Minutes) (40 to 90); y-axis: Frequency (0 to 40)]

Page 18

Example…Putting this all together

Summary Statistics:

Minimum               42
1st Quartile          60
Median                75
Mean                  71.01
3rd Quartile          81
Maximum               95
Standard Deviation    12.80

[Figure: "Boxplot of Old Faithful Eruption Intervals"; y-axis: Eruption Intervals (Minutes) (40 to 90)]

Page 19

Example…Putting this all together

• What does the box-plot show?

• Is a box-plot useful for showing the main features of these data?

• What does the empirical rule tell us about 95% of the data? Is this useful?

• We will come back to this in a minute…
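For reference, the arithmetic behind the empirical-rule question is sketched below, using the mean and standard deviation from the summary statistics reported for the 222 intervals. Note that the resulting interval covers nearly the whole observed range (42 to 95 minutes), and that the rule is only intended for bell-shaped data.

```python
# Summary statistics reported for the Old Faithful eruption intervals.
mean, sd = 71.01, 12.80
minimum, maximum = 42, 95

# Empirical rule: about 95% of bell-shaped data lie within 2 SDs of the mean.
low = mean - 2 * sd
high = mean + 2 * sd
print(f"mean +/- 2s: ({low:.2f}, {high:.2f}) minutes")
print(f"observed range: ({minimum}, {maximum}) minutes")
```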

Page 20

Scatter-Plots

• Help assess whether there is a relationship between 2 continuous variables

• Data are paired

• (x1, y1), (x2, y2), ... (xn, yn)

• Plot X versus Y

• If there is no natural pairing…probably not a good idea!

• What sort of relationships might we see?

[Figure: "Scatterplot of Eruption Interval vs Duration"; x-axis: Duration (Minutes) (2 to 5); y-axis: Eruption Intervals (Minutes) (40 to 90)]

Page 21

Example…Putting this all together

• What does this plot reveal?

[Figure: "Scatterplot of Eruption Interval vs Duration"; x-axis: Duration (Minutes) (2 to 5); y-axis: Eruption Intervals (Minutes) (40 to 90)]

Page 22

Example…Putting this all together

Summary Statistics:

                          Eruption Less Than 3 Minutes    Eruption Greater Than or Equal to 3 Minutes
Minimum                   42                              53
1st Quartile              60                              74
Median                    51                              78
Mean                      54.46                           78.16
3rd Quartile              58                              82.5
Maximum                   78                              95
Standard Deviation        6.30                            6.89
Number of Observations    67                              155

[Figure: two histograms of eruption intervals; x-axis: Eruption Intervals (Minutes) (40 to 80 in one panel, 50 to 90 in the other); y-axis: Frequency (0 to 40)]

Page 23

Example…Putting this all together

• Suppose an eruption of 2.5 minutes had just taken place. What would you estimate the length of the next interval to be?

• Suppose an eruption of 3.5 minutes had just taken place. What would you estimate the length of the next interval to be?
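One simple way to approach these two questions is to predict the mean interval of whichever group the last eruption belongs to, using the group means from the summary statistics above. The sketch below is only one possible rule: the function name `predict_interval` is invented for the example, and the group medians (or a fitted line through the scatterplot) would be equally reasonable choices.

```python
# Group means taken from the summary statistics for the two groups.
MEAN_INTERVAL_SHORT = 54.46  # eruptions lasting less than 3 minutes
MEAN_INTERVAL_LONG = 78.16   # eruptions lasting 3 minutes or more

def predict_interval(duration_minutes, cutoff=3.0):
    """Predict the interval (minutes) to the next eruption from the last duration."""
    if duration_minutes < cutoff:
        return MEAN_INTERVAL_SHORT
    return MEAN_INTERVAL_LONG

print(predict_interval(2.5))  # short eruption
print(predict_interval(3.5))  # long eruption
```

Compared with quoting the overall mean of 71.01 minutes for every eruption, conditioning on the duration gives predictions that sit near the center of each group, where the standard deviations (6.30 and 6.89) are about half the overall 12.80.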