3.3 measures of spread


Upload: oswald

Post on 24-Feb-2016


TRANSCRIPT

Page 1: 3.3 Measures of Spread

3.3 Measures of Spread

Chapter 3 - Tools for Analyzing Data
Learning goal: calculate and interpret measures of spread
Due now: p. 159 #4, 5, 6, 8, 10-13
MSIP / Home Learning: p. 168 #2b, 3b, 4, 6, 7, 10

What is more important: potential or consistency?

Page 2: 3.3 Measures of Spread

What is spread?

Measures of central tendency do not always tell you everything!

The histograms have identical mean and median, but the spread is different

Spread is how closely the values cluster around the middle value

[Figure: two histograms (count vs. data value) with the same centre but different spread]

Page 3: 3.3 Measures of Spread

Why worry about spread?
Less spread means you have greater confidence that values will fall within a particular range
Important for making predictions

Page 4: 3.3 Measures of Spread

Measures of Spread
We will also study 3 measures of spread:
1) Range
2) Interquartile Range (IQR)
3) Standard Deviation (Std. Dev.)
All 3 measure how spread out the data is
Smaller value = less spread (more consistent)
Larger value = more spread (less consistent)

Page 5: 3.3 Measures of Spread

Measures of Spread
1) Range = (max) – (min)
Indicates the size of the interval that contains 100% of the data
2) Interquartile Range: IQR = Q3 – Q1, where
Q1 is the lower half median
Q3 is the upper half median
Indicates the size of the interval that contains the middle 50% of the data

Page 6: 3.3 Measures of Spread

Quartiles Example
26 28 34 36 38 38 40 41 41 44 45 46 51 54 55
Q2 = 41 (median)
Q1 = 36 (lower half median)
Q3 = 46 (upper half median)
IQR = Q3 – Q1 = 46 – 36 = 10
The middle 50% of the data is within 10 units
If a quartile occurs between 2 values, it is calculated as the average of the two values
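The quartile calculation can be checked with a short Python sketch. It assumes the median-of-halves method shown on the slide, where the overall median is left out of both halves when the number of values is odd. The function name `quartiles` is only illustrative, and spreadsheets or statistics packages may use a different quartile convention and give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the overall median is
    excluded from both the lower and the upper half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)


data = [26, 28, 34, 36, 38, 38, 40, 41, 41, 44, 45, 46, 51, 54, 55]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
data_range = max(data) - min(data)
print(q1, q2, q3, iqr, data_range)  # 36 41 46 10 29
```

`statistics.median` already averages the two middle values when a half has an even number of entries, which covers the case where a quartile falls between 2 values.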

Page 7: 3.3 Measures of Spread

A More Useful Measure of Spread
Range is very basic
Does not take clusters or outliers into account
Interquartile Range is somewhat useful
Takes clusters into account
Visual in Box-and-Whisker Plot
Standard deviation is very useful
Roughly the average distance from the mean for all data points

Page 8: 3.3 Measures of Spread

Deviation
12 24 36 48 60 72 84
The mean of these numbers is 48
Deviation = (data) – (mean)
The deviation for 24 is 24 – 48 = –24
The deviation for 84 is 84 – 48 = 36

Page 9: 3.3 Measures of Spread

Calculating Standard Deviation
1. Find the mean (average)
2. Find the deviation for each data point: data point – mean
3. Square the deviations: (data point – mean)²
4. Average the squares of the deviations (this is called the variance)
5. Take the square root of the variance

Page 10: 3.3 Measures of Spread

Example of Standard Deviation
26 28 34 36
mean = (26 + 28 + 34 + 36) ÷ 4 = 31
σ² = [(26 – 31)² + (28 – 31)² + (34 – 31)² + (36 – 31)²] ÷ 4
σ² = (25 + 9 + 9 + 25) ÷ 4
σ² = 17
σ = √17 ≈ 4.1
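The five steps for the population standard deviation can be followed one at a time in Python. This is a minimal sketch for checking hand calculations. The function name `population_std_dev` is made up for illustration.

```python
from math import sqrt


def population_std_dev(data):
    """Return (mean, variance, standard deviation), dividing by n."""
    mean = sum(data) / len(data)                # step 1: mean
    deviations = [x - mean for x in data]       # step 2: data point - mean
    squares = [d ** 2 for d in deviations]      # step 3: square each deviation
    variance = sum(squares) / len(data)         # step 4: average the squares
    return mean, variance, sqrt(variance)       # step 5: square root


mean, variance, sigma = population_std_dev([26, 28, 34, 36])
print(mean, variance, round(sigma, 1))  # 31.0 17.0 4.1
```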

Page 11: 3.3 Measures of Spread

Standard Deviation
σ² (lower case sigma squared) is used to represent variance
σ is used to represent standard deviation
σ is commonly used to measure the spread of data, with larger values of σ indicating greater spread
We are using a population standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Page 12: 3.3 Measures of Spread

Standard Deviation with Grouped Data
Hours of TV: 2 3 4 5
Frequency: 2 6 6 2
grouped mean = (2×2 + 3×6 + 4×6 + 5×2) / 16 = 3.5
deviations:
2: 2 – 3.5 = –1.5
3: 3 – 3.5 = –0.5
4: 4 – 3.5 = 0.5
5: 5 – 3.5 = 1.5
σ² = [2(–1.5)² + 6(–0.5)² + 6(0.5)² + 2(1.5)²] / 16
σ² = 0.75
σ = √0.75 ≈ 0.9

$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$$
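The grouped-data calculation weights each value by its frequency. A small Python sketch of the same arithmetic, using the Hours of TV table (the function name `grouped_std_dev` is only illustrative):

```python
from math import sqrt


def grouped_std_dev(values, frequencies):
    """Population mean, variance and std. dev. for a frequency table."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    variance = sum(
        f * (x - mean) ** 2 for x, f in zip(values, frequencies)
    ) / total
    return mean, variance, sqrt(variance)


hours = [2, 3, 4, 5]
frequency = [2, 6, 6, 2]
mean, variance, sigma = grouped_std_dev(hours, frequency)
print(mean, variance, round(sigma, 1))  # 3.5 0.75 0.9
```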

Page 13: 3.3 Measures of Spread

Measures of Spread - Recap
Measures of spread are numbers indicating how spread out data is
A smaller value for any measure of spread means the data is more consistent
1) Range = Max – Min
2) Interquartile Range: IQR = Q3 – Q1, where
Q1 = first half median
Q3 = second half median
3) Standard Deviation
i. Find the mean (average)
ii. Find all deviations: (data) – (mean)
iii./iv. Square them all and average them (this is the variance, σ²)
v. Take the square root to get the std. dev., σ

Page 14: 3.3 Measures of Spread

MSIP / Home Learning
Read through the examples on pages 164-167
Complete p. 168 #2b, 3b, 4, 6, 7, 10
You are responsible for knowing how to do simple examples by hand (~6 pieces of data)
We will use technology (Fathom/Excel) to calculate larger examples
Have a look at your calculator and see if you have this feature (Σσn and Σσn-1)

Page 15: 3.3 Measures of Spread

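Different Types of Std. Dev.

Population (σn):

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

Sample (σn-1):

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$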
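The two types of standard deviation differ only in the divisor: n for a population and n – 1 for a sample. Python's standard `statistics` module provides both as `pstdev` and `stdev`. The sketch below reuses the four data points from the earlier hand example to show the difference.

```python
from statistics import pstdev, stdev

data = [26, 28, 34, 36]

population_sd = pstdev(data)  # divides the sum of squares (68) by n = 4
sample_sd = stdev(data)       # divides the sum of squares (68) by n - 1 = 3

print(round(population_sd, 2), round(sample_sd, 2))  # 4.12 4.76
```

The sample version is always a little larger, and the gap shrinks as the number of data points grows.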

Page 16: 3.3 Measures of Spread

3.4 Normal Distribution

Chapter 3 – Tools for Analyzing Data
Learning goal: Determine the % of data within intervals of a Normal Distribution
Due now: p. 168 #2b, 3b, 4, 6, 7, 10
MSIP / Home Learning: p. 176 #1, 3b, 6, 8-10

“Think of how stupid the average person is, and realize half of them are stupider than that.” -George Carlin

Page 17: 3.3 Measures of Spread

Histograms

Histograms can be skewed...

Right-skewed Left-skewed

Page 18: 3.3 Measures of Spread

Histograms

... or symmetrical

[Figure: symmetrical histogram ("Collection 1"), count vs. data value]

Page 19: 3.3 Measures of Spread

Normal?
A normal distribution has a histogram that is symmetrical and bell-shaped
Used quite a bit in statistical analysis
Also called a Gaussian Distribution
Symmetrical with equal mean, median and mode that fall on the line of symmetry of the curve

Page 20: 3.3 Measures of Spread

A Real Example
the heights of 600 randomly chosen Canadian students from the “Census at School” data set
the data approximates a normal distribution

[Figure: "600 Student Heights" histogram of density vs. height (cm), 100 to 240 cm, with a normal density curve based on the mean and standard deviation of the data]

Page 21: 3.3 Measures of Spread

The 68-95-99.7% Rule
Area under the curve is 1 (i.e. it represents 100% of the population surveyed)
Approx 68% of the data falls within 1 standard deviation of the mean
Approx 95% of the data falls within 2 standard deviations of the mean
Approx 99.7% of the data falls within 3 standard deviations of the mean
http://davidmlane.com/hyperstat/A25329.html
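The three percentages in the rule are rounded values of the area under a normal curve. For a normal distribution, the fraction of data within k standard deviations of the mean equals erf(k / √2). The sketch below evaluates this standard identity with `math.erf`. The helper name `fraction_within` is only illustrative.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal distribution within k std. devs. of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, f"{100 * fraction_within(k):.2f}%")
```

The printed values are close to 68%, 95% and 99.7%, which is why the rule is stated with "approx".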

Page 22: 3.3 Measures of Spread

Distribution of Data

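$X \sim N(\bar{x}, \sigma^2)$

[Figure: normal curve with the axis marked at x̄ – 3σ, x̄ – 2σ, x̄ – 1σ, x̄, x̄ + 1σ, x̄ + 2σ, x̄ + 3σ]
Between x̄ and 1σ on either side: 34% each (68% in total)
Between 1σ and 2σ on either side: 13.5% each (95% in total within 2σ)
Between 2σ and 3σ on either side: 2.35% each (99.7% in total within 3σ)
Beyond 3σ on either side: 0.15% each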

Page 23: 3.3 Measures of Spread

Normal Distribution Notation

$X \sim N(\bar{x}, \sigma^2)$

The notation above is used to describe the Normal distribution, where x̄ is the mean and σ² is the variance (square of the standard deviation)

e.g. X ~ N(70, 8²) describes a Normal distribution with mean 70 and standard deviation 8 (our class at midterm?)

Page 24: 3.3 Measures of Spread

An example
Suppose the time before burnout for an LED averages 120 months with a standard deviation of 10 months and is approximately Normally distributed. What is the length of time a user might expect an LED to last with:
a) 68% confidence?
b) 95% confidence?
So X ~ N(120, 10²)

Page 25: 3.3 Measures of Spread

An example cont’d
68% of the data will be within 1 standard deviation of the mean
This means that 68% of the bulbs will last between 120 – 10 = 110 months and 120 + 10 = 130 months
So 68% of the bulbs will last 110 - 130 months
95% of the data will be within 2 standard deviations of the mean
This means that 95% of the bulbs will last between 120 – 2×10 = 100 months and 120 + 2×10 = 140 months
So 95% of the bulbs will last 100 - 140 months

Page 26: 3.3 Measures of Spread

Example continued…
Suppose you wanted to know how long 99.7% of the bulbs will last?
This is the area covering 3 standard deviations on either side of the mean
This means that 99.7% of the bulbs will last between 120 – 3×10 = 90 months and 120 + 3×10 = 150 months
So 99.7% of the bulbs will last 90 - 150 months
This assumes that all the bulbs are produced to the same standard
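The three LED intervals all come from the same calculation: mean ± k × standard deviation for k = 1, 2, 3. A minimal Python sketch (the function name `empirical_intervals` is only illustrative):

```python
def empirical_intervals(mean, std_dev):
    """Intervals covering about 68%, 95% and 99.7% of a normal distribution."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in coverage.items()
    }


intervals = empirical_intervals(120, 10)
for label, (low, high) in intervals.items():
    print(f"{label} of bulbs last {low} to {high} months")
```

Changing the two arguments applies the same rule to any approximately normal data, such as X ~ N(70, 8²).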

Page 27: 3.3 Measures of Spread

Example continued…

[Figure: normal curve for the LED lifetimes with the axis marked at 90, 100, 120, 140 and 150 months, showing bands of 34%, 13.5% and 2.35% on each side of the mean and the 95% and 99.7% ranges]

µ represents the population mean
What % of bulbs last between:
µ – σ and µ + 2σ? 34 + 34 + 13.5 = 81.5%
µ – 2σ and µ? 13.5 + 34 = 47.5%

Page 28: 3.3 Measures of Spread

Percentage of data between two values
The area under any normal curve is 1
The percent of data that lies between two values in a normal distribution is equivalent to the area under the normal curve between these values
See examples 2 and 3 on page 175
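The area between two values can also be computed directly instead of adding up the bands of the 68-95-99.7% rule. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) with the LED figures, mean 120 and standard deviation 10. The helper name `percent_between` is only illustrative.

```python
from statistics import NormalDist

# LED lifetimes from the earlier example: X ~ N(120, 10²)
led = NormalDist(mu=120, sigma=10)


def percent_between(dist, low, high):
    """Percent of the distribution lying between low and high."""
    return 100 * (dist.cdf(high) - dist.cdf(low))


print(round(percent_between(led, 110, 140), 1))  # 81.9 (µ - σ to µ + 2σ)
print(round(percent_between(led, 100, 120), 1))  # 47.7 (µ - 2σ to µ)
```

These results are slightly higher than the 81.5% and 47.5% found by adding bands, because the band percentages are rounded.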

Page 29: 3.3 Measures of Spread

Why is the Normal distribution so important?
Many psychological and educational variables are distributed approximately normally: height, reading ability, memory, IQ, etc.
Normal distributions are statistically easy to work with
All kinds of statistical tests are based on it
Lane (2003)

Page 30: 3.3 Measures of Spread

MSIP / Home Learning

Complete p. 176 #1, 3b, 6, 8-10
http://onlinestatbook.com/

Page 31: 3.3 Measures of Spread

References

Lane, D. (2003). What's so important about the normal distribution? Retrieved October 5, 2004 from http://davidmlane.com/hyperstat/normal_distribution.html

Wikipedia (2004). Online Encyclopedia. Retrieved September 1, 2004 from http://en.wikipedia.org/wiki/Main_Page