Descriptive Statistics-III (Measures of Central Tendency) QSCI 381 – Lecture 5 (Larson and Farber, Sects 2.3 and 2.5)

Upload: rafe-taylor

Post on 26-Dec-2015


Page 1: Descriptive Statistics-III (Measures of Central Tendency)

QSCI 381 – Lecture 5 (Larson and Farber, Sects 2.3 and 2.5)

Page 2: Introduction

A measure of central tendency is a value that represents the typical, or central, entry in a data set.

There are three commonly used measures of central tendency: the mean, the median, and the mode.

Page 3: The Mean-I

The sample mean:

$$\bar{x} = \frac{1}{n} \sum_{i} x_i$$

where $x_i$ is the value of $x$ for the $i$th data entry, $\bar{x}$ is the sample mean, and $n$ is the number of data points in the sample.

The population mean:

$$\mu = \frac{1}{N} \sum_{i} x_i$$

where $x_i$ is the value of $x$ for the $i$th data entry, $\mu$ is the population mean, and $N$ is the number of data points in the population.

Page 4: The Mean-II

Consider the data set consisting of a sample of the diameters of 6 trees in a stand: 29 cm, 31 cm, 43 cm, 31 cm, 12 cm, 33 cm.

Calculate the mean using $\bar{x} = \frac{1}{n} \sum_{i} x_i$:

$$\sum_{i} x_i = 29 + 31 + 43 + 31 + 12 + 33 = 179$$

$$\bar{x} = \tfrac{1}{6} \times 179 = 179 / 6 \approx 29.83 \text{ cm}$$
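The arithmetic in the tree-diameter example can be checked with a short script. This is a minimal sketch; the names `diameters_cm` and `sample_mean` are illustrative choices, not part of the lecture.

```python
# Tree diameters (cm) from the six-tree sample in the lecture.
diameters_cm = [29, 31, 43, 31, 12, 33]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the entries divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(diameters_cm)
mean_cm = sample_mean(diameters_cm)

print(total)              # 179
print(round(mean_cm, 2))  # 29.83
```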

Page 5: The Mean-III

Why we like the mean:
Unique.
Based on every data point in the data set.
Well suited to statistical treatment.

Why we dislike the mean:
Can be sensitive to “outlying” observations.
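To see the sensitivity to outlying observations, the sketch below reuses the six tree diameters and then replaces the largest value (43) with 430. The value 430 is a made-up entry used only for illustration (for example, a recording error); it is not from the lecture data.

```python
from statistics import mean, median

# Original sample of tree diameters (cm).
original = [29, 31, 43, 31, 12, 33]

# Hypothetical contaminated sample: 43 replaced by 430 (illustrative assumption).
with_outlier = [29, 31, 430, 31, 12, 33]

mean_original = mean(original)
mean_outlier = mean(with_outlier)
median_original = median(original)
median_outlier = median(with_outlier)

print(round(mean_original, 2), round(mean_outlier, 2))  # 29.83 94.33
print(median_original, median_outlier)                  # 31.0 31.0
```

A single extreme entry roughly triples the mean here, while the median is unchanged.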

Page 6: The Median

Sort the data. If the number of entries is odd, the median is the middle value; if it is even, the median is the average of the two central values.

Six values: 12, 29, 31, 33, 43, 51. Median = (31 + 33) / 2 = 32.

Five values: 12, 29, 31, 43, 51. Median = 31.
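The sort-then-pick-the-centre rule can be written out directly. This is a small sketch with an illustrative function name (`median_of`); Python's standard library also provides `statistics.median`, which gives the same results for these two data sets.

```python
def median_of(values):
    """Return the median: the middle value, or the mean of the two central values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle entry exists.
        return ordered[mid]
    # Even count: average the two central entries.
    return (ordered[mid - 1] + ordered[mid]) / 2


six_values = [12, 29, 31, 33, 43, 51]
five_values = [12, 29, 31, 43, 51]

print(median_of(six_values))   # 32.0
print(median_of(five_values))  # 31
```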

Page 7: The Mode

Find the frequency of each data entry and identify the data entry with the greatest frequency.

Unlike the median and mean, the mode is not always uniquely defined. If a data set has two modes, it is referred to as being bimodal.
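Because the mode need not be unique, a helper that returns every most-frequent entry is a convenient way to illustrate it. The sketch below uses the tree diameters from the earlier slide; the second data set is a made-up bimodal example, and the function name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return all entries that share the greatest frequency, in sorted order."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


tree_modes = modes([29, 31, 43, 31, 12, 33])
bimodal_modes = modes([1, 1, 2, 2, 3])  # hypothetical data with two modes

print(tree_modes)     # [31]
print(bimodal_modes)  # [1, 2]
```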

Page 8: Which Measure is Best?

There is no clear answer to this question.

The mean can be influenced by outliers while the mode may not be particularly “typical”.

Statistical inference based on the median and the mode is somewhat difficult.

[Figure: histogram of Frequency against Total Number of Offspring, with the mode, median, mean, and a possible outlier marked.]

Page 9: Computing the Mean of a Group of Data Points

Suppose the data are in the form of frequencies, i.e., for each $i$ we have $x_i$ and $f_i$, where $f_i$ is the number of data entries for which $x$ equals $x_i$. Then:

$$\bar{x} = \frac{\sum_{i} x_i f_i}{\sum_{i} f_i}$$

In Excel use: “sumproduct(a1:a10,b1:b10)/sum(b1:b10)”, where the $x_i$'s are stored in column A and the $f_i$'s are stored in column B.
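The same frequency-weighted calculation can be sketched in Python. The values and frequencies below are the six tree diameters from the earlier slide rewritten as (value, frequency) pairs, so the result should match the plain mean of 29.83; the function name `grouped_mean` is an illustrative choice.

```python
def grouped_mean(x, f):
    """Return sum(x_i * f_i) / sum(f_i) for values x with frequencies f."""
    if len(x) != len(f):
        raise ValueError("x and f must have the same length")
    total_frequency = sum(f)
    if total_frequency == 0:
        raise ValueError("the frequencies must not sum to zero")
    return sum(xi * fi for xi, fi in zip(x, f)) / total_frequency


# Distinct diameters (cm) and how often each occurs in the sample.
x = [12, 29, 31, 33, 43]
f = [1, 1, 2, 1, 1]

weighted = grouped_mean(x, f)
print(round(weighted, 2))  # 29.83
```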

Page 10: Shapes of Distributions-I

A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are mirror images.

[Figure: histogram of a symmetric distribution, with the mean, median, and mode marked together.]

Page 11: Shapes of Distributions-II

A frequency distribution is uniform (or rectangular) when the number of entries in each class is equal (a uniform distribution is symmetric).

[Figure: histogram of a uniform distribution, with the mean, median, and mode marked together.]

Page 12: Shapes of Distributions-III

A frequency distribution is skewed right (or positively skewed) if its tail extends to the right (mode < median < mean).

[Figure: histogram of a right-skewed distribution, with the mode, median, mean, and tail marked.]

Page 13: Shapes of Distributions-IV

A frequency distribution is skewed left (or negatively skewed) if its tail extends to the left (mode > median > mean).

[Figure: histogram of a left-skewed distribution.]
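The ordering of the three measures under skew can be illustrated with two small samples. Both data sets below are made up for illustration (the second is the first reflected about 8), and the ordering shown holds for these samples; it is a sketch, not a proof for every skewed distribution.

```python
from statistics import mean, median, mode

# Hypothetical sample with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the sample above, giving a long left tail.
left_skewed = [16 - v for v in right_skewed]

right = (mode(right_skewed), median(right_skewed), mean(right_skewed))
left = (mode(left_skewed), median(left_skewed), mean(left_skewed))

print(right)  # (2, 3.0, 4.6)  -> mode < median < mean
print(left)   # (14, 13.0, 11.4)  -> mode > median > mean
```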

Page 14: Fractiles

The range is the difference between the maximum and minimum data entries.

The quartiles, Q1, Q2, and Q3, divide an (ordered) data set into four equal parts.

The percentiles, P1, P2, …, P99, divide an (ordered) data set into 100 equal parts.

Collectively, quartiles, percentiles (and deciles) are referred to as fractiles.

Page 15: More on Quartiles

The quartiles divide a data set at the 25th percentile, the 50th percentile, and the 75th percentile.

The 50th percentile is the median.

The difference between the 75th and 25th percentiles is referred to as the interquartile range.
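A short sketch of the range, the quartiles, and the spread between the 75th and 25th percentiles, using the tree diameters from the earlier slide. Several quartile conventions exist; the one assumed here takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data (excluding the middle entry when the count is odd), so other software may report slightly different values. The function name `quartiles` is an illustrative choice.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data entries")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the middle entry belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


diameters_cm = [29, 31, 43, 31, 12, 33]

data_range = max(diameters_cm) - min(diameters_cm)
q1, q2, q3 = quartiles(diameters_cm)
spread_q3_q1 = q3 - q1

print(data_range)    # 31
print(q1, q2, q3)    # 29 31.0 33
print(spread_q3_q1)  # 4
```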

Page 16: More on Percentiles

[Figure: percentile plotted against Length (m), with the 80th percentile marked at 15.2 m.]

Interpretation: 80% of the bowheads caught are smaller than 15.2 m.
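The interpretation can be checked numerically by counting how many entries fall below a threshold. The lengths below are invented for illustration and are not the bowhead data behind the figure; the function name `percent_below` is also an illustrative choice.

```python
def percent_below(values, threshold):
    """Return the percentage of entries strictly smaller than the threshold."""
    if not values:
        raise ValueError("the data set must not be empty")
    below = sum(1 for v in values if v < threshold)
    return 100 * below / len(values)


# Hypothetical lengths in metres (NOT the lecture's bowhead data).
lengths_m = [8.1, 9.4, 10.2, 11.0, 12.3, 13.1, 14.0, 14.8, 15.6, 16.9]

share = percent_below(lengths_m, 15.2)
print(share)  # 80.0
```

With these made-up values, 8 of the 10 lengths are below 15.2 m, so 15.2 m sits at the 80th percentile of this sample.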

Page 17: Box and Whisker Plots-I

The information on the range and the quartiles can be represented using a box and whisker plot.

Page 18: Box and Whisker Plots-II

Find the five-number summary of the data (minimum, Q1, Q2, Q3, maximum).
Construct a horizontal line that spans the data.
Plot the five numbers above the horizontal scale.
Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.

[Figure: box and whisker plot above a Length (m) scale with ticks at 5, 10, and 15; the minimum, Q1, Q2 (median), Q3, maximum, and a whisker are labeled. A second scale is labeled Length (cm) with ticks at 50, 100, and 150.]
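The first step, finding the five-number summary, can be sketched as follows for the tree diameters from the earlier slide. The quartile rule (medians of the lower and upper halves) is an assumed convention, and the function name `five_number_summary` is illustrative; plotting libraries may use a different quartile rule and so draw a slightly different box.

```python
from statistics import median


def five_number_summary(values):
    """Return the minimum, Q1, Q2, Q3, and maximum (median-of-halves quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data entries")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "minimum": ordered[0],
        "Q1": median(lower),
        "Q2": median(ordered),
        "Q3": median(upper),
        "maximum": ordered[-1],
    }


summary = five_number_summary([29, 31, 43, 31, 12, 33])
for name, value in summary.items():
    print(name, value)
```

For this sample the box would span 29 to 33 with the vertical line at 31, and the extremes of the data are 12 and 43.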

Page 19: Review of Symbols in this Lecture

$N$: total number of elements in the population.

$n$: total number of elements in the sample.

$x_i$: $i$th element in the data set.

$\mu$: population mean (the Greek letter mu).

$\bar{x}$: sample mean (xbar).