381 Descriptive Statistics-III
(Measures of Central Tendency)
QSCI 381 – Lecture 5 (Larson and Farber, Sects. 2.3 and 2.5)
Introduction
A measure of central tendency is a value that represents the typical, or central, entry in a data set.
There are three commonly used measures of central tendency: the mean, the median, and the mode.
The Mean-I
The sample mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $x_i$ is the value of $x$ for the $i$th data entry, $\bar{x}$ is the sample mean, and $n$ is the number of data points in the sample.
The population mean:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where $x_i$ is the value of $x$ for the $i$th data entry, $\mu$ is the population mean, and $N$ is the number of data points in the population.
The Mean-II
Consider the data set consisting of a sample of the diameters of 6 trees in a stand: 29 cm, 31 cm, 43 cm, 31 cm, 12 cm, 33 cm.
Calculate the mean:
$$\sum_{i=1}^{6} x_i = 29 + 31 + 43 + 31 + 12 + 33 = 179$$
$$\bar{x} = \frac{1}{6}\sum_{i=1}^{6} x_i = \frac{179}{6} \approx 29.83\ \text{cm}$$
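As a quick check of the arithmetic, here is a minimal Python sketch of the sample-mean formula applied to the tree diameters. The function name `sample_mean` is illustrative and not from the lecture; Python's standard `statistics.mean` would give the same value.

```python
def sample_mean(values):
    """Sample mean: x-bar = (1/n) * (sum of the x_i)."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n

# Tree diameters (cm) from the example above.
diameters = [29, 31, 43, 31, 12, 33]
print(sum(diameters))                     # 179
print(round(sample_mean(diameters), 2))   # 29.83
```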
The Mean-III
Why we like the mean:
- It is unique.
- It is based on every data point in the data set.
- It is well suited to statistical treatment.
Why we dislike the mean:
- It can be sensitive to "outlying" observations.
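The Median
Sort the data and take the central value: with an odd number of values this is the single middle value; with an even number it is the average of the two middle values.
Six values: 12, 29, 31, 33, 43, 51; median = (31 + 33)/2 = 32
Five values: 12, 29, 31, 43, 51; median = 31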
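The sorting rule above can be sketched in a few lines of Python. This is an illustrative hand-rolled version (the name `median` is our own); the standard library's `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # average of the two middle values

print(median([12, 29, 31, 33, 43, 51]))  # 32.0
print(median([12, 29, 31, 43, 51]))      # 31
```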
The Mode
Find the frequency of each data entry and identify the data entry with the greatest frequency.
Unlike the median and mean, the mode is not always uniquely defined. If a data set has two modes, it is referred to as being bimodal.
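A minimal Python sketch of "find the frequency of each entry, keep the most frequent", returning every tied entry so that bimodal data show up as two values. The helper name `modes` is hypothetical. One assumption to note: when every entry occurs equally often this sketch returns all of them, whereas some texts say such a data set has no mode. Python 3.8+ also offers `statistics.multimode`, which behaves similarly but keeps first-seen order instead of sorting.

```python
from collections import Counter

def modes(values):
    """Return every entry that has the greatest frequency (two entries => bimodal)."""
    counts = Counter(values)          # frequency of each data entry
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([29, 31, 43, 31, 12, 33]))   # [31]
print(modes([1, 1, 2, 3, 3]))            # [1, 3], i.e. bimodal
```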
Which Measure is Best?
There is no clear answer to this question.
The mean can be influenced by outliers, while the mode may not be particularly "typical".
Statistical inference based on the median and the mode is more difficult than inference based on the mean.
[Figure: frequency histogram of total number of offspring, with the mode, median, and mean marked and a possible outlier ("Outlier?") flagged.]
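To make the outlier point concrete, this sketch compares the three measures on a small hypothetical set of offspring counts (invented for illustration, not the data behind the figure), with and without one extreme value. It uses only the standard `statistics` functions `mean`, `median` and `mode`.

```python
from statistics import mean, median, mode

# Hypothetical offspring counts; the final value (40) plays the role of an outlier.
offspring = [1, 2, 2, 2, 3, 3, 4, 5, 40]
without_outlier = offspring[:-1]

for label, data in [("with outlier", offspring), ("without outlier", without_outlier)]:
    print(f"{label:>16}: mean={mean(data):.2f}, median={median(data)}, mode={mode(data)}")
```

Dropping the single outlier moves the mean from about 6.89 to 2.75, while the median only shifts from 3 to 2.5 and the mode stays at 2.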
Computing the Mean of a Group of Data Points
Suppose the data are in the form of frequencies, i.e., for each $i$ we have $x_i$ and $f_i$, where $f_i$ is the number of data entries for which $x$ equals $x_i$. Then:
$$\bar{x} = \frac{\sum_i x_i f_i}{\sum_i f_i}$$
In Excel use: =SUMPRODUCT(A1:A10,B1:B10)/SUM(B1:B10), where the $x_i$'s are stored in column A and the $f_i$'s are stored in column B.
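The same frequency-weighted mean in Python, as a minimal sketch mirroring the SUMPRODUCT/SUM formula. The function name `grouped_mean` and the small frequency table are hypothetical. The last lines check that expanding the table back into raw entries gives the same answer.

```python
def grouped_mean(x, f):
    """Mean of frequency data: sum(x_i * f_i) / sum(f_i)."""
    if len(x) != len(f):
        raise ValueError("x and f must have the same length")
    total = sum(f)
    if total == 0:
        raise ValueError("frequencies sum to zero")
    return sum(xi * fi for xi, fi in zip(x, f)) / total

# Hypothetical frequency table: value x_i and the number of entries f_i with that value.
x = [0, 1, 2, 3, 4]
f = [3, 5, 8, 4, 2]
print(round(grouped_mean(x, f), 4))  # 41/22, about 1.8636

# Expanding the table into its 22 raw entries and averaging gives the same result.
raw = [xi for xi, fi in zip(x, f) for _ in range(fi)]
print(sum(raw) / len(raw))
```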
Shapes of Distributions-I
A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are mirror images.
[Figure: symmetric frequency histogram with class midpoints from -37.5 to 37.5; the mean, median, and mode are marked at the same central point.]
Shapes of Distributions-II
A frequency distribution is uniform (or rectangular) when the number of entries in each class is equal (a uniform distribution is symmetric).
[Figure: uniform frequency histogram with class midpoints from -37.5 to 37.5, with the mean, median, and mode marked.]
Shapes of Distributions-III
A frequency distribution is skewed right (or positively skewed) if its tail extends to the right (mode < median < mean).
[Figure: right-skewed frequency histogram with class midpoints from 1 to 38.5, labelling the mode, median, mean, and tail.]
Shapes of Distributions-IV
A frequency distribution is skewed left (or negatively skewed) if its tail extends to the left (mode > median > mean).
[Figure: left-skewed frequency histogram with class midpoints from 1 to 38.5.]
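The orderings of mode, median and mean described for the shapes above can be checked on small hypothetical data sets, invented here only to illustrate each shape. The helper `shape_summary` is our own and assumes each data set has a single most frequent value. These orderings are the usual pattern for skewed data rather than a guarantee for every data set.

```python
from collections import Counter
from statistics import mean, median

def shape_summary(data):
    """Return (mode, median, mean); assumes a single most frequent value."""
    mode_value, _ = Counter(data).most_common(1)[0]
    return mode_value, median(data), mean(data)

# Hypothetical data sets, chosen only to illustrate each shape.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 6, 9]   # tail extends to the right
left_skewed = [1, 4, 6, 7, 7, 8, 8, 8, 9]    # tail extends to the left

for name, data in [("symmetric", symmetric), ("skewed right", right_skewed),
                   ("skewed left", left_skewed)]:
    mo, md, mn = shape_summary(data)
    print(f"{name:>12}: mode={mo}, median={md}, mean={mn:.2f}")
```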
Fractiles
The range is the difference between the maximum and minimum data entries.
The quartiles, Q1, Q2, and Q3, divide an ordered data set into four equal parts.
The percentiles, P1, P2, ..., P99, divide an ordered data set into 100 equal parts.
Collectively, quartiles, percentiles (and deciles) are referred to as fractiles.
More on Quartiles
The quartiles divide a data set at the 25th percentile, the 50th percentile, and the 75th percentile.
The 50th percentile is the median. The difference between the 75th and 25th percentiles is referred to as the interquartile range (IQR).
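A minimal Python sketch of the range, quartiles and IQR for the tree-diameter data. It assumes one common hand-calculation convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves, excluding the middle value when n is odd). Software such as Python's `statistics.quantiles` or Excel's QUARTILE.INC/QUARTILE.EXC interpolates differently and can return slightly different values. The function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 as medians of the lower half, whole set, and upper half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data entries")
    lower = ordered[: n // 2]          # excludes the middle value when n is odd
    upper = ordered[(n + 1) // 2 :]
    return median(lower), median(ordered), median(upper)

diameters = [29, 31, 43, 31, 12, 33]   # tree diameters (cm)
q1, q2, q3 = quartiles(diameters)
data_range = max(diameters) - min(diameters)
iqr = q3 - q1
print(q1, q2, q3)        # 29 31.0 33
print(data_range, iqr)   # 31 4
```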
More on Percentiles
[Figure: cumulative percentile curve of length (m) for bowheads caught, percentile vs. length from 0 to 20 m, with the 80th percentile marked at 15.2 m.]
Interpretation: 80% of the bowheads caught are smaller than 15.2 m.
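The interpretation above ("80% are smaller than 15.2 m") corresponds to computing the percentage of entries below a given value. Here is a minimal sketch, assuming the common textbook definition percentile of x = 100 × (number of entries less than x) / (total number of entries). The lengths are hypothetical, not the bowhead data behind the figure, and `percentile_of` is an illustrative name.

```python
def percentile_of(value, data):
    """Percent of data entries strictly less than `value`."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for d in data if d < value)
    return 100 * below / len(data)

# Hypothetical whale lengths in metres (NOT the data behind the figure).
lengths = [8.1, 9.4, 10.2, 11.0, 12.3, 13.1, 13.8, 14.6, 15.2, 16.0]
print(percentile_of(15.2, lengths))   # 80.0: 8 of these 10 lengths are below 15.2 m
```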
Box and Whisker Plots-I
The information on the range and the quartiles can be represented using a box and whisker plot.
Box and Whisker Plots-II
1. Find the five-number summary of the data (minimum, Q1, Q2, Q3, maximum).
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and maximum.
[Figure: example box and whisker plot of length, labelling the minimum, Q1, Q2 (median), Q3, maximum, and a whisker.]
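The construction steps above can be sketched as a one-line text plot in Python. This is an illustrative sketch, not the lecture's method. It assumes the median-of-halves quartile convention, a scale [lo, hi] that contains all the data, and hypothetical helper names (`five_number_summary`, `text_boxplot`). `|` marks the minimum, median and maximum, `[` and `]` the box edges at Q1 and Q3, and `-` the whiskers.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, Q2, Q3, maximum), with quartiles as medians of the lower/upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data entries")
    lower, upper = ordered[: n // 2], ordered[(n + 1) // 2 :]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

def text_boxplot(summary, lo, hi, width=41):
    """Render a horizontal box and whisker plot as one line of text on the scale [lo, hi]."""
    if hi <= lo or width < 2:
        raise ValueError("need hi > lo and width >= 2")
    if not (lo <= summary[0] and summary[-1] <= hi):
        raise ValueError("the scale [lo, hi] must span the data")

    def col(v):
        # Map a data value to a character column on the horizontal scale.
        return round((v - lo) / (hi - lo) * (width - 1))

    mn, q1, q2, q3, mx = (col(v) for v in summary)
    chars = [" "] * width
    for c in range(mn, mx + 1):
        chars[c] = "-"              # whiskers run from minimum to maximum
    for c in range(q1, q3 + 1):
        chars[c] = "="              # box interior from Q1 to Q3
    chars[mn] = chars[mx] = "|"     # whisker ends at minimum and maximum
    chars[q1], chars[q3] = "[", "]" # box edges at Q1 and Q3
    chars[q2] = "|"                 # median line inside the box
    return "".join(chars)

diameters = [29, 31, 43, 31, 12, 33]     # tree diameters (cm)
summary = five_number_summary(diameters)
print(summary)                            # (12, 29, 31.0, 33, 43)
print(text_boxplot(summary, lo=12, hi=43, width=32))
```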
Review of Symbols in this Lecture
- $N$: total number of elements in the population.
- $n$: total number of elements in the sample.
- $x_i$: the $i$th element in the data set.
- $\mu$: population mean (the Greek letter mu).
- $\bar{x}$: sample mean (x-bar).