3.1 measure of centerjga001/chapter 3.pdf · 3.1 - 1 3.1 measure of center calculate the mean for a...

3.1 - 1

3.1 Measure of Center

Calculate the mean for a given data set

Find the median, and describe why the median is sometimes preferable to the mean

Find the mode of a data set

Describe how skewness affects these measures of center

3.1 - 2

Measure of Center

Measure of Center

the value at the center or middle of a data set

The three common measures of center are the mean, the median, and the mode.

3.1 - 3

Mean

the measure of center obtained by adding the data values and then dividing the total by the number of values

What most people call an average also called the arithmetic mean.

3.1 - 4

Notation

Greek letter sigma used to denote the sum of a set of values.

x is the variable usually used to represent the data values.

n represents the number of data values in a sample.

3.1 - 5

Example of summation

• If there are n data values that are denoted as:

Then:

nxxx ,,, 21

nxxxx 21

3.1 - 6

Example of summation

• data

Then:

367

6462625348322521

x

21,25,32,48,53,62,62,64

3.1 - 7

Sample Mean

x = n

x

is pronounced „x-bar‟ and denotes the mean of a set of sample values

x

3.1 - 8

Example of Sample Mean

• data

Then:

21,25,32,48,53,62,62,64

9.45875.458

367

8

6462625348322521

x

3.1 - 9

Notation

µ Greek letter mu used to denote the

population mean

N represents the number of data values in a population.

3.1 - 10

Population Mean

N µ =

x

Note: here x represents the data values in the

population

3.1 - 11

Advantages

Is relatively reliable: means of samples

drawn from the same population don‟t vary

as much as other measures of center

Takes every data value into account

Mean

3.1 - 12

Mean

Disadvantage

Is sensitive to every data value, one

extreme value can affect it dramatically;

is not a resistant measure of center

Example:

21,25,32,48,53,62,62,64 → 9.45x

21,25,32,48,53,62,62,300 → 4.75x

3.1 - 13

Median

Median

the measure of center which is the middle

value when the original data values are

arranged in order of increasing (or

decreasing) magnitude

3.1 - 14

Finding the Median

1. If the number of data values is odd, the median is the number located in the exact middle of the list. Its position in the list is:

First sort the values (arrange them in order), the follow one of these

thn

2

1

3.1 - 15

Finding the Median

2. If the number of data values is even, the median is found by computing the mean of the two middle numbers which are those that lie on either side of the data value in the position:

thn

2

1

3.1 - 16

Example of Median

• 6 data values:

5.40 1.10 0.42 0.73 0.48 1.10

• Sorted data:

0.42 0.48 0.73 1.10 1.10 5.40

(even number of values – no exact middle)

915.02

1.173.0median

3.1 - 17

Example of Median

• 7 data values:

5.40 1.10 0.42 0.73 0.48 1.10 0.66

73.0median

• Sorted data:

0.42 0.48 0.66 0.73 1.10 1.10 5.40

3.1 - 18

Median

Median is not affected by an

extreme value - is a resistant

measure of the center

Example:

21,25,32,48,53,62,62,64

21,25,32,48,53,62,62,300

Median is 50.5 for both data sets.

3.1 - 19

Median

From Example 3.3, page 91

3.1 - 20

Mode

the value that occurs with the greatest

frequency

Data set can have one, more than one, or no mode

3.1 - 21

Mode

Mode is the only measure of central

tendency that can be used with nominal data

Bimodal two data values occur with the same greatest frequency

Multimodal more than two data values occur with the same greatest frequency

No Mode no data value is repeated

3.1 - 22

a) 5.40 1.10 0.42 0.73 0.48 1.10

b) 27 27 27 55 55 55 88 88 99

c) 1 2 3 6 7 8 9 10

Mode - Examples

Mode is 1.10

Bimodal - 27 & 55

No Mode

3.1 - 23

• These data values represent weight gain or loss in kg for a random sample of18 college freshman (negative data values indicate weight loss)

11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2

• Do these values support the legend that college students gain 15 pounds (6.8 kg) during their freshman year? Explain

3.1 - 24

• Sample Mean

kg 9.118

34

n

x

• Median

-5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11

kg 5.12/)21(median

• Mode: -2

3.1 - 25

• All of the measures of center are below 6.8 kg (15 pounds)

• Based on measures of center, these data values do not support the idea that college students gain 15 pounds (6.8 kg) during their freshman year

CONCLUSION

3.1 - 26

• First, enter the list of data values

•Then select 2nd STAT (LIST) and arrow right to MATH option 3:mean( or 4: median(

•and input the desired list

Mean/Median with Graphing Calculator

3.1 - 27

Example of Computing the Mean Using Calculator

Sorted amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979

Note: this data is related to Three Mile Island nuclear power plant

Accident in 1979.

x = n

x = 149.2

3.1 - 28

Example of Computing the Mean Using Calculator

Median is 150

3.1 - 29

Symmetric distribution of data is symmetric if the

left half of its histogram is roughly a mirror image of its right half

Skewed

distribution of data is skewed if it is not symmetric and extends more to one side than the other

Skewed and Symmetric

3.1 - 30

Skewed to the left

(also called negatively skewed) have a longer left tail, mean and median are to the left of the mode

Skewed to the right

(also called positively skewed) have a longer right tail, mean and median are to the right of the mode

Skewed Left or Right

3.1 - 31

Distribution Skewed Left

• Mean is smaller than the median.

3.1 - 32

Symmetric Distribution

• Mean, median, mode approximately equal.

3.1 - 33

Distribution Skewed Right

• Mean is larger than the median.

3.1 - 34

Example data set

5 5 5 5 5 10 10 10 10 10 10 15 15 15 15 15

20 20 20 20 25 25 25 30 30 30 35 35 40 45

• Mean:

• Median:

Distribution is skewed right.

18.730

560

n

xx

152

1515

3.1 - 35

3.2 Measures of Variability

The range

What is a deviation?

The standard deviation and the variance.

3.1 - 36

Why is it important to understand variation?

• A measure of the center by itself can be misleading

• Example:

Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table).

3.1 - 37

Example of variation

Data Set A Data Set B

50,000 10,000

60,000 20,000

70,000 70,000

80,000 120,000

90,000 130,000

MEAN 70,000 70,000

MEDIAN 70,000 70,000

Data set B has more variation about the mean

3.1 - 38

Histograms: example of variation

Data set B has more variation about the mean (Target).

3.1 - 39

How do we quantify variation?

3.1 - 40

Definition

The range of a set of data values is the difference between the maximum data value and the minimum data value.

Range = (maximum value) – (minimum value)

3.1 - 41

Range = 30 - 6 = 24

Example of range.

Data:

27 28 25 6 27 30 26

3.1 - 42

Range (cont.)

This shows that the range is very sensitive to extreme values; therefore not as useful as other measures of variation.

Ignoring the outlier of 6 in the previous data set gives data 27 28 25 27 30 26

Range = 30 - 25 = 5

3.1 - 43

Deviation

The deviation for a given data value is the distance between the data value and the mean, except that the deviation can be negative while a distance is always positive.

3.1 - 44

Deviation

A deviation for a given data value is the difference between the data value and the mean of the data set. If x is the data value, 1. For a sample, the deviation of x is

2. For a population, the deviation of x is

xx

x

3.1 - 45

Deviation

The deviation can be positive, negative, or zero. 1. If the data value is larger than the mean, the

deviation will be positive.

2. If the data value is smaller than the mean, the deviation will be negative.

3. If the data value equals the mean, the deviation will be zero.

3.1 - 46

Example:

• data

Mean

8,5,12,8,9,15,21,16,3

78.109

3162115981258

n

xx

3.1 - 47

Data Value Deviation

8

5

12

8

9

15

21

16

3

78.278.108

78.578.105

78.278.108

78.178.109

78.778.103

22.578.1016

22.1078.1021

22.478.1015

22.178.1012

3.1 - 48

Population Variance

N

x2

2

The population variance is the mean of the squared deviations in the population

3.1 - 49

Population Standard Deviation

N

x2

The population standard deviation is the square root of the population variance.

3.1 - 50

Sample Variance

1

2

2

n

xxs

The sample variance is

3.1 - 51

Sample Variance

Note that the sample variance is only approximately the mean of the squared deviations in the sample because we use n-1 instead of n.

3.1 - 52

Sample Variance

A statistic is an unbiased estimator of a parameter if its mean value equals the parameter it is trying to estimate.

Using n-1 instead of n makes the sample variance an unbiased estimator of the population variance.

3.1 - 53

Sample Standard Deviation

1

2

n

xxs

The sample standard deviation is the square root of the sample variance.

3.1 - 54

Steps to calculate the sample standard deviation

1. Calculate the sample mean

2. Find the squared deviations from the sample mean for each sample data value:

3. Add the squared deviations

4. Divide the sum in step 3 by n-1

5. Take the square root of the quotient in step 4

2)( xx

x

3.1 - 55

Example: Standard Deviation

Given the data set:

8, 5, 12, 8, 9, 15, 21, 16, 3

Find the standard deviation

3.1 - 56


• Find the mean

78.109

3162115981258

n

xx

3.1 - 57

Data Value Squared Deviations

From the Mean

8

5

12

8

9

15

21

16

3

73.7)78.108( 2

41.33)78.105( 2

73.7)78.108( 2

17.3)78.109( 2

53.60)78.103( 2

25.27)78.1016( 2

45.104)78.1021( 2

81.17)78.1015( 2

49.1)78.1012( 2

3.1 - 58


• Add the squared deviations (last column in the table above)

53.6025.2745.10481.1717.373.749.141.3373.7

57.263

3.1 - 59

• Divide the sum by 9-1=8:

• Take the square root:

95.328/57.263

74.595.32

7.5s


3.1 - 60

Sample Standard Deviation (Computational Formula)

1

/22

n

nxxs

3.1 - 61


Data:

2 1 1 1 1 1 1 4 1 2 2 1 2 3

3 2 3 1 3 1 3 1 3 2 2

Determine the standard deviation

using the previous formula

3.1 - 62


• We need to find each the following:

n

)( 22 xx

x

3.1 - 63

Data Table (25 data values)

TOTALS: 47 109

3.1 - 64


• Thus:

25n

109 )( 22 xx

47 x

3.1 - 65


• And:

24

25/47109

1

/ 222

n

nxxs

9.086.0

3.1 - 66

Standard Deviation - Important Properties

The standard deviation is a measure of variation of all values from the mean.

The value of the standard deviation s is never negative and usually not zero.

The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).

Unlike variance, the units of the standard deviation s are the same as the units of the original data values.

3.1 - 67

Example: page 116

3.1 - 68

Colony A

7361134 range

Colony B

9167158 range

ANSWER

3.1 - 69

28(b) Which colony has the greater variability according to the range?

Example: page 116

ANSWER: colony B

3.1 - 70

Use the previous example and calculate the standard deviation for each colony with a calculator.

3.1 - 71

Data is stored in Lists. Locate and press

the STAT button on the

calculator. Choose EDIT. The calculator

will display the first three of six lists

(columns) for entering data. Simply type

your data and press ENTER. Use your

arrow keys to move between lists.

3.1 - 72

• Enter STAT then arrow right to CALC to get

• then press ENTER

3.1 - 73

Calculator Example

When 1-Var Stats appears on the home screen, enter the name of the list containing the data. You can do this by entering List (= 2nd STAT) and choosing which list has the desired data.

3.1 - 74

• 1-Var Stats

• NOTE: Previous example data will give different values than these.

3.1 - 75

Colony A

standard deviation = 21.9

Colony B

standard deviation = 26.4

ANSWER

3.1 - 76

• Compare histograms (SPSS)

3.1 - 77

3.3 Working with grouped data

• Calculate the weighted mean for a given data set

•Estimate the mean from grouped data

•Estimate the variance and standard deviation from grouped data

3.1 - 78

Weighted Mean

When data values are assigned different weights, we can compute a weighted mean.

Data values:

Corresponding weights:

nxxxx ,..., , , 321

nwwww ,..., , , 321

3.1 - 79

Computing the weighted mean

• Multiply each data value by its corresponding weight:

•Sum these products.

•Divide the result by the sum of the weights.

iixw

3.1 - 80

Weighted Mean

n

nn

i

iiw

www

xwxwxw

w

xwx

21

2211

3.1 - 81

Example: Weighted Mean

Suppose homework/quiz average is weighted 10%, 2 exams are weighted 60%, and final exam is weighted 30%.

If a student makes homework/quiz average 87, exam scores of 80 and 92, and final exam score 85, compute the weighted average.

3.1 - 82


ANSWER:

8.85

00.1

)85(30.0)92(30.0)80(30.0)87(10.0

3.1 - 83


3.1 - 84

Employment Hourly Mean Wage

($)

(weight) x (data value)

12,380 60.32 746,761.60

18,580 60.25 1,119,445.00

9,540 59.39 566580.60

35,550 57.98 2,061,189.00

10,130 55.95 566773.50

Weights Data Values

180,86

70.749,060,5

i

iiw

w

xwx

= $58.72

3.1 - 85

Estimating the mean from grouped data

• Given a frequency distribution, how do we compute the mean?

HEIGHT

(inches)

FREQUENCY

59-60 3

61-62 3

63-64 4

65-66 7

67-68 6

69-70 1

71-72 1

Heights of 25 Women:

3.1 - 86


• Assume all sample values are at the class midpoints

3.1 - 87

Assume all sample values are at the class

midpoints. HEIGHT

(inches)

FREQUENCY

59-60 3

61-62 3

63-64 4

65-66 7

67-68 6

69-70 1

71-72 1

Class midpoints:

59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5


3.1 - 88


• Multiply each class midpoint by its corresponding frequency

•Add the result

•Divide by the sum of the frequencies (total number of data values)

3.1 - 89

Class midpoints:

59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5


Frequencies:

3, 3, 4, 7, 6, 1, 1

25

)5.71(1)5.69(1)5.67(6)5.65(7)5.63(4)5.61(3)5.59(3

Estimated mean:

inches 9.6425/5.1621

3.1 - 90

Estimating the mean from a frequency distribution: Class midpoint = Frequency =

•Note: is “mu-hat” where the hat denotes the fact the mu is not exact, but approximate.

n

nn

i

ii

fff

fmfmfm

f

fm

21

2211

im if

3.1 - 91

Estimating the variance from a frequency distribution:

i

ii

f

fm

)ˆ(ˆ

22

Estimating the standard deviation from a frequency distribution:

i

ii

f

fm

)ˆ(ˆ

2

3.1 - 92

HEIGHT

(inches)

Frequency Class

Midpoints

59-60 3 59.5 86.2

61-62 3 61.5 33.9

63-64 4 63.5 7.4

65-66 7 65.5 2.9

67-68 6 67.5 41.8

69-70 1 69.5 21.5

71-72 1 71.5 44.0

inches 9.64ˆ

ii fm 2)ˆ(

inches 1.325

7.237 )ˆ(ˆ

2

i

ii

f

fm

3.1 - 93

3.4 Measures of Position

Find percentiles for small and large data sets

Calculate z-scores and explain why we use them

Use z-scores to detect outliers.

3.1 - 94

Let p be any integer between 0 and

100. The pth percentile of a data set is a value for which p percent of the values in the data set are less than or equal to this value.

Percentile

3.1 - 95

Sort the data from small to large. If you are finding the pth percentile of a

sample of size n, calculate:

which is p percent of n

Steps to find the pth percentile for small data sets

np

i

100

3.1 - 96

If i is an integer, the pth percentile is the

mean of the data values in positions i and i+1. If i is not an integer, round up and use

the value in this position as the pth percentile.

Steps to find the pth percentile for small data sets (cont)

3.1 - 97

Example of Finding Percentile

• Find the 25th and 75th percentiles of these 12 data values

36 37 38 39 44 44 47 50 53 57 65 69

3.1 - 98


25% of 12

75% of 12

312100

25

i

912100

75

i

3.1 - 99


• The data can be grouped as follows:

36 37 38 39 44 44 47 50 53 57 65 69

25% of the data is below 38.5 (the mean of 38 and 39).

The 25th percentile is 38.5

3rd position

3.1 -

100


• The data can be grouped as follows:

36 37 38 39 44 44 47 50 53 57 65 69

75% of the data is below 55 (the mean of 53 and 57).

The 75th percentile is 55

9th position

3.1 -

101


• Find the 25th and 75th percentiles of these 7 data values:

36 38 39 44 47 50 65

3.1 -

102


25% of 7

75% of 7

75.17100

25

i

25.57100

75

i

3.1 -

103


36 38 39 44 47 50 65

2nd position


1.75 round up to position 2

3.1 -

104


36 38 39 44 47 50 65

6th position


5.25 round up to position 6

3.1 -

105


Page 132

3.1 -

106


2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

Note: 15 data values

Data:

3.1 -

107


16(a) To find position, 5% of 15

16(b) To find position, 95% of 15

1 toup rounded 75.015100

5

i

15 toup rounded 25.1415100

95

i

3.1 -

108


2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

16(a) 5th percentile is 2.0 million

16(b) 95th percentile is 14.7 million

Data:

3.1 -

109

z Score (or standardized value)

the number of standard deviations that a given value x is above or below the mean

Z score

3.1 -

110

Sample Population

Z score Formulas

s

xxz

xz

3.1 -

111

Interpreting Z Scores

1. A z-score has no units.

2. Whenever a value is greater than the mean, its z score is positive

3. Whenever a value is less than the mean, its z score is negative

3.1 -

112

Example of Finding Z Score

Page 132

3.1 -

113

Example of Finding Z score

2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

Using calculator we get that

Data:

4.3 and 1.5 sx

3.1 -

114


2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

18(a) z-score for fish oil (data value is 4.2)

26.0

4.3

1.52.4

z

3.1 -

115


2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

19(a) z-score for Ginseng (data value is 8.8)

11.1

4.3

1.58.8

z

3.1 -

116

Outliers

An outlier is an extreme data value.

We will define a data value as “extreme” if it is at least three standard deviations from the mean.

3.1 -

117

Outliers and z Scores

Data values are not unusual (exteme) if

Data values are unusual or outliers if

22 z

3or 3 zz

3.1 -

118

Interpreting Z Scores

page 131: bell shaped distribution

3.1 -

119

3.5 Chebyshev‟s Rule and the Empirical Rule

Calculate percentages using Chebyshev‟s Rule

Find percentages and data values using the Empirical Rule

3.1 -

120

Chebyshev‟s Rule

The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1–1/K2, where K is any positive number greater than 1.

For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.

For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

3.1 -

121

Example of Chebyshev‟s Rule

Page 139, problem 11(a)

A data distribution has a mean of 500 and a standard deviation of 100. Suppose we do not know whether the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.

3.1 -

122


Page 139, problem 11(a) ANSWER: data values obey First compute k using given

skx 003

700300 x

2 700100500 003 kk

100 and 500 sx

700 skx

3.1 -

123


Page 139, problem 11(a) ANSWER: Since k =2, And Chebyshev’s rule says that at least 75% of the data falls between 300 and 700.

4

3

4

11

2

11

11

22

k

3.1 -

124

The Empirical Rule

For data sets having a distribution that is approximately bell shaped, the following properties apply:

About 68% of all values fall within 1 standard deviation of the mean.

About 95% of all values fall within 2 standard deviations of the mean.

About 99.7% of all values fall within 3 standard deviations of the mean.

3.1 -

125

For data sets having a distribution that is approximately bell shaped, the following properties apply:

About 68% of all data values obey

About 95% of all data values obey

About 99.7% of all data values obey

11 z

22 z

33 z

The Empirical Rule

3.1 -

126

The Empirical Rule

3.1 -

127

The Empirical Rule

3.1 -

128

The Empirical Rule

3.1 -

129

The Empirical Rule

Explain these percentages.

page 136: bell shaped distribution

3.1 -

130

The Empirical Rule

For the green regions: 1. 50% of the data lies to the left of z=0. 2. 34% (half of 68%) of the data lies between z=-1 and

z=0. Therefore, 16% (=50%-34%) of the data is to the left of z=-1.

3. 47.5% (half of 95%) of the data lies between z=-2 and z=0. Therefore, 2.5% (=50%-47.5%) of the data is to the left of z=-2.

4. Subtracting areas gives that 13.5%=16%-2.5% of the data lies between z=-2 and z=-1.

5. Using symmetry, 13.5% of the data also lies between

z=1 and z=2.

3.1 -

131

Example of Empirical Rule

Page 139, problem 12(a) A data distribution has a mean of 500 and a standard deviation of 100. Assume that the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.

3.1 -

132

Page 139, problem 12(a) ANSWER: data values obey As in problem 11, we are given so that the data in the interval are within 2 standard deviations of the mean

700300 x

100 and 500 sx


3.1 -

133

Page 139, problem 12(a) ANSWER: Here we are also given that the distribution is bell-shaped. Using the empirical rule, approximately 95% of the data lies between 300 and 700.


3.1 -

134

Note the difference in problems 11 and 12: For problem 11 we are not told that the distributioin is bell-shaped and we can only say that “at least” 75% of data is between 300 and 700 (using Chebyshev’s Rule). We cannot use the Empirical Rule in problem 11.

Empirical Rule vs. Chebyshev‟s Rule

3.1 -

135

Example

Page 141, problem 24(a) In San Francisco the mean and standard deviation of the wind speed in January is (Note these are population parameters) Assume that the distribution of the wind speed is bell-shaped. (a) Estimate the proportion of times that the wind speed is between 1.2 mph and 13.2 mph.

mph 7.2 andmph 2.7

3.1 -

136

Let the variable x represent wind speed so that Note that the mean 7.2 is the midpoint of this interval. Calculate how many standard deviations from the mean this interval represents. Use the formula:

2.132.1 x

Example

deviation standard

mean valuenumerical k

3.1 -

137

Example

83.07.2

7.2 1.2

k 83.0

7.2

7.2 13.2

k

so that the data in the interval are within 0.83 standard deviations of the mean.

ANSWER

The empirical rule implies that less than 68% of the time the windspeed is between 1.2 mph and 13.2 mph.

Note: convice yourself of this by sketching areas below the bell-shaped distribution.

3.1 -

138

Example

Page 141, problem 24(b)

(b) Estimate the proportion of times that the wind speed is less than 1.2 mph.

3.1 -

139

Example

Page 141, problem 24(b)

1. Since 0.0 mph is 1 standard deviation below the mean of 7.2 mph, the empirical rule implies that 34% (half of 68%) of the time, the windspeed is between 0.0 mph and 7.2 mph.

2. Subtracting 34% from 50% gives that 16% of the time the windspeed is less than 0.0 mph.

3. Since 1.2 mph is greater than 0.0 mph but less than 7.2 mph, we can say that at least 16% of the time but no more than 50% of the time the windspeed is less than 0.0 mph

Note: convice yourself of this by sketching areas below the bell-shaped

distribution.

3.1 -

140

3.6 Robust Measures

• Find quartiles and the interquartile range

•Calculate the five number summary of a data set

•Construct a boxplot for a given data set

•Apply robust detection of outliers

3.1 -

141

Quartiles

Q1 (First Quartile) is the 25th percentile

Q2 (Second Quartile) is the 50th percentile or the median

Q3 (Third Quartile) is the 75th percentile

Are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.

3.1 -

142

Example of Quartiles

1 Q

• Given the 24 data values (sorted):

36 37 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 69 75

Find , ,

2 Q3 Q

3.1 -

143

For first quartile (25th percentile), position:

therefore the first quartile is the mean of the data values in positions 6 and 7

624100

25

i


0.422

4341

2 76

1

xx

Q

3.1 -

144

For second quartile, (50th percentile), position:

therefore the second quartile is the mean of the data values in positions 12 and 13

1224100

50

i

5.532

5453

2 1312

2

xx

Q


3.1 -

145

For third quartile, (75th percentile), position:

therefore the third quartile is the mean of the data values in positions 18 and 19

1824100

75

i

0.602

6159

2 1918

3

xx

Q


3.1 -

146

Interquartile Range

The Interquartile Range (IQR) is the difference between the third quartile and the first quartile which measures the spread of the middle 50% of the data:

It is considered a “robust” measure of variability because it is not affected by outliers in the data (bottom 25% and top 25% of data are ignored).

13IQR QQ

3.1 -

147

0.60 and 0.42 31 QQ

Example of IQR

• Given the 24 data values (sorted):

36 37 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 69 75

we found that

0.180.420.60IQR 13 QQ

3.1 -

148

0.60 and 0.42 31 QQ

Example of IQR

• Introduce outliers into previous data set:

2 5 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 100 200

we still have:

0.180.420.60IQR 13 QQ

3.1 -

149

For a set of data, the 5-number

summary consists of the

minimum value; the first quartile

Q1; the median (or second

quartile Q2); the third quartile,

Q3; and the maximum value.

5-Number Summary

3.1 -

150

Example of Five Number Summary

Given Data (sorted):

128 130 133 137 138 142 142 144 147 149

151 151 151 155 156 161 163 163 166

3.1 -

151

• Minimum data value is 128

• First quartile location

• round up to get

138 51 xQ


75.419100

25

i

3.1 -

152

• Second quartile location

• round up to get

149 102 xQ


5.919100

50

i

3.1 -

153

• Third quartile location

• round up to get

156 153 xQ


25.1419100

75

i

3.1 -

154

• Max data value is 166

• Five Number Summary

min max

128 138 149 156 166


3 Q2 Q1 Q

3.1 -

155

Using a five number summary,

a data value is an outlier if

1. It is located 1.5(IQR) or more

below the first quartile

2. It is located 1.5(IQR) or more

above the third quartile

Robust Detection of Outliers

3.1 -

156

• Given the data set: 2 5 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 100 200

has a five number summary:

2 42.0 53.5 60.0 200

0.180.420.60IQR 13 QQ


3.1 -

157

Calculate: 1.5(IQR)=27

1.5(IQR) below the first quartile:

42 - 27=15

1.5(IQR) above the third quartile:

60 + 27=87

Therefore, 2,5,100,200 are outliers


3.1 -

158

A boxplot (or box-and-whisker-

diagram) is a graph of a data set

that consists of a line extending

from the minimum value to the

maximum value, and a box with

lines drawn at the first quartile,

Q1; the median; and the third

quartile, Q3.

Boxplot

3.1 -

159

Example of Boxplot

3.1 -

160

Sorted amounts of Strontium-90 (in millibecquerels) in a random sample of baby teeth obtained from Philadelphia residents born after 1979 Note: this data is related to Three Mile Island nuclear power plant Accident in 1979.

128 130 133 137 138 142 142 144 147 149

151 151 151 155 156 161 163 163 166 172

Example of Boxplot

3.1 -

161

• Five Number Summary

128 140 150 158.5 172

• Boxplot?

Next slide: page 148 constructing a boxplot by hand

or calculator or SPSS

Example of Boxplot

3.1 -

162

Constructing a Boxplot

Page 148 (see example 3.41)

3.1 -

163

Example of Boxplot

• Boxplot

3.1 -

164

• Enter the data in a list:

Calculator Five Number Summary

3.1 -

165

• Go to STAT - CALC and choose 1-Var Stats


3.1 -

166

• On the HOME screen, when 1-Var Stats appears, type the list containing the data.


3.1 -

167

• Arrow down to the five number summary (last five items in the list)


3.1 -

168

• CLEAR out the graphs under y = (or turn them off).

• Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries)

Calculator Boxplot

3.1 -

169

• Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the second box-and-whisker icon is highlighted, and that the list you

will be using is indicated next to Xlist.

Calculator Boxplot

3.1 -

170

• To see the box-and-whisker plot, press ZOOM and #9 ZoomStat. Press the TRACE key to see on-screen data about the box-and-whisker plot.

Calculator Boxplot

3.1 -

171

Boxplot - Symmetric Distribution

Normal Distribution: Heights from a Random Sample of Women

1223 QQQQ NOTE:

value)data(min value)data(max 23 QQ

3.1 -

172

Boxplot - Skewed Distribution

Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches

3.1 measure of centerjga001/chapter 3.pdf · 3.1 - 1 3.1 measure of center calculate the mean for a...

Documents