numerical descriptive measures · 2/13/2017 · dr. ahmed jaradat, agu 2. different sets of data...
Numerical Descriptive Measures
Measures of Central Tendency (Location)
The Arithmetic Mean (Average)
The Median
The Mode
Measures of Dispersion or Variability
The Range
The variance
The Standard Deviation
The Coefficient of Variation
Measures of Relative Standing
Quartiles and Percentiles
STA 231: Biostatistics
Dr. Ahmed Jaradat, AGU, 2/12/2017
2.4 Descriptive Statistics:
Measures of Central Tendency
Definitions:
Population: the totality of all subjects that have certain common characteristics.
Parameter: a characteristic or numerical value obtained from the population.
Sample: a subset or part of the population.
Statistic: a characteristic or numerical value obtained from the sample.
Notation:
Variable of interest is denoted by capital letters, say X. Specific values
of the random variable will be denoted by lowercase letters x.
x1, x2, …, xN Population of size N
x1, x2, …, xn Sample of size n
Different sets of data have the same
distribution shape
The Arithmetic Mean (Average)
This is the most popular and useful measure of central location.
Mean = Sum of Measurements / Number of Measurements
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \dots + x_N}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
Example 1: Sample mean
The six measurements 4.7, 3.6, 3.8, 3.2, 4.7, 4.0 represent random blood glucose levels (mmol/litre) from a group of first-year medical students. The sample mean is given by
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{4.7 + 3.6 + 3.8 + 3.2 + 4.7 + 4.0}{6} = 4.0$
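The sample mean of the six glucose measurements in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the variable names `glucose` and `x_bar` are illustrative choices, not part of the lecture.

```python
from statistics import mean

# Blood glucose levels (mmol/litre) from Example 1
glucose = [4.7, 3.6, 3.8, 3.2, 4.7, 4.0]

# Mean = sum of measurements / number of measurements
x_bar = sum(glucose) / len(glucose)

print(round(x_bar, 2))          # 4.0
print(round(mean(glucose), 2))  # 4.0, same result from the standard library
```

Rounding is used only to hide floating-point noise in the sum of decimal values.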
Example 2: Population Mean
Consider the final grades of STA 231: Biostatistics:
(41, 41, 45, 45, 47, …, 100, 100). The population mean is:
$\mu = \frac{\sum_{i=1}^{181} x_i}{181} = \frac{x_1 + x_2 + \dots + x_{181}}{181} = \frac{41 + 41 + 45 + 45 + 47 + \dots + 100 + 100}{181} = 76.6$
Example 3: Mean in repeated measurements
When many of the measurements have the same value, the measurements can be
summarized in a frequency table. Suppose the Biostatistics scores of a sample of
30 students in a class were recorded as follows:
Score               55  66  70  83  88  90
Number of Students   4   8   7   4   5   2
$\bar{x} = \frac{\sum_{i=1}^{30} x_i}{30} = \frac{x_1 + x_2 + \dots + x_{30}}{30} = \frac{55(4) + 66(8) + \dots + 90(2)}{30} = 73$
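The frequency-table calculation in Example 3 weights each score by the number of students who obtained it. A minimal sketch of that computation follows; the names `scores`, `counts`, `total`, and `x_bar` are illustrative.

```python
# Frequency table from Example 3
scores = [55, 66, 70, 83, 88, 90]
counts = [4, 8, 7, 4, 5, 2]

n = sum(counts)                                      # number of students
total = sum(s * c for s, c in zip(scores, counts))   # sum of all 30 scores

x_bar = total / n
print(n, total, x_bar)  # 30 2190 73.0
```

Multiplying each score by its frequency gives the same total as adding up all 30 individual scores.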
Properties of the Mean
1. All values are used.
2. Uniqueness. For a given data set there is one and only one
arithmetic mean.
3. Simplicity. The arithmetic mean is easily understood and easy to
compute and interpret.
4. Mean is sensitive to the extreme values.
5. The sum of the deviations from the mean is 0.
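Properties 4 and 5 can be illustrated numerically with the glucose data from Example 1. This is a sketch only; the extreme value 20.0 is hypothetical and is added purely to show the effect on the mean.

```python
glucose = [4.7, 3.6, 3.8, 3.2, 4.7, 4.0]
x_bar = sum(glucose) / len(glucose)

# Property 5: deviations from the mean sum to zero (up to floating-point rounding)
deviations = [x - x_bar for x in glucose]
print(abs(sum(deviations)) < 1e-9)  # True

# Property 4: one extreme value pulls the mean noticeably.
# 20.0 is a hypothetical measurement, not part of the source data.
with_extreme = glucose + [20.0]
mean_with_extreme = sum(with_extreme) / len(with_extreme)
print(round(x_bar, 2), round(mean_with_extreme, 2))  # 4.0 6.29
```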
The Median
The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude:
The [(n+1)/2]th largest observation if n is odd
The average of the (n/2)th and [(n/2)+1]th largest observations if n is even
Example:
Seven patients' survival times were recorded (in months):
28, 60, 26, 32, 30, 26, 29. Find the median survival time.
First, sort the times; then locate the value in the middle.
n = 7: odd number of observations
26, 26, 28, 29, 30, 32, 60
Median = 29
Example Cont'd
Suppose one more patient's survival time, 31 months, is added to the group
recorded before: 28, 60, 26, 32, 30, 26, 29, 31.
Find the median survival time.
First, sort the times; then locate the two values in the middle.
n = 8: even number of observations
26, 26, 28, 29, 30, 31, 32, 60
Median = (29 + 30)/2 = 29.5
Properties of the median
1. Uniqueness. For a given data set there is one and only one median.
2. Simplicity. The median is easy to compute.
3. It is not affected by extreme values.
4. It may not be an actual observation in the data set.
5. It can be applied to ordinal-level data.
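The odd and even rules for the median translate directly into code. The function below is a minimal sketch of those two rules (the name `median_of` is illustrative), applied to the two survival-time examples.

```python
def median_of(values):
    """Median by the odd/even rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the [(n+1)/2]th observation (index mid when counting from 0)
        return ordered[mid]
    # n even: average of the (n/2)th and [(n/2)+1]th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([28, 60, 26, 32, 30, 26, 29]))      # 29
print(median_of([28, 60, 26, 32, 30, 26, 29, 31]))  # 29.5
```

For the even case the result, 29.5, is not one of the recorded survival times, which matches property 4 of the median.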
The Mode
The mode is the most frequently occurring value among all the measurements in a sample.
Example:
The manager of a men's store observes the waist size (in inches) of
trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.
The mode of this data set is 34 in.
This information seems valuable (for example, for the design of a new
display in the store), much more so than "the median is 33.5 in."
For grouped data, the mode is taken as the midpoint of the modal class.
The modal class: 74.5-79.5
Mode = (74.5 + 79.5)/2 = 77
[Figure: histogram of Number of Students by Grade]
A set of data may have one mode (or modal class), or two or more
modes.
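Because a data set can have more than one mode, a helper that returns all of the most frequent values is a natural fit. The sketch below uses `collections.Counter`; the function name `modes` and the second, bimodal data set are hypothetical illustrations.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []  # no data, so no mode
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Waist sizes (inches) from the trouser example
waist = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]
print(modes(waist))            # [34]

# Hypothetical bimodal data, not from the source
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```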
[Figure: three histograms of Number of Students by Grade, with panels titled Unimodal, Bimodal, and Multimodal]
Properties of the Mode:
1. Very quick and easy to determine.
2. Is an actual value of the data.
3. Sometimes, it is not unique.
4. It can be used when the data are qualitative as well as quantitative.
5. Not affected by extreme scores.
6. Can change dramatically from sample to sample.
7. Modes in particular are probably best applied to nominal data.
8. It may not exist.
Relationship Between Mean, Median and Mode
Symmetric: Mean = Median = Mode. Skewed to the left: Mean < Median < Mode. Skewed to the right: Mean > Median > Mode.
If a distribution is symmetrical, the mean, median and mode coincide. If
a distribution is non symmetrical, and skewed to the left or to the
right, the three measures differ.
Properties of Central Tendency Measures
◦ The mean is sensitive to the tails; the median and mode are not.
◦ The mode can be affected by small changes in the data; the median and
mean are not.
◦ The mode and median can be found from a chart.
◦ The three measures are the same in a symmetric (Normal)
distribution.
What measurement to use?
◦ For skewed distributions, we use the median.
◦ For statistical analysis or inference, we use the mean.
Type of Variable              | Best Measure of Central Tendency
Nominal                       | Mode
Ordinal                       | Median
Interval/Ratio (not skewed)   | Mean
Interval/Ratio (skewed)       | Median
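The recommendation to prefer the median for skewed data can be seen with the seven survival times from the median example, where one value (60 months) lies far from the rest. This is a small sketch using Python's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Survival times (months) from the median example; 60 is an extreme value
survival = [28, 60, 26, 32, 30, 26, 29]
print(mean(survival), median(survival))  # 33 29

# Dropping the extreme value moves the mean much more than the median
without_extreme = [x for x in survival if x != 60]
print(mean(without_extreme), median(without_extreme))  # 28.5 28.5
```

The mean shifts by 4.5 months when the extreme value is removed, while the median shifts by only 0.5.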
2.5 Descriptive Statistics:
Measures of Dispersion
A measure of dispersion conveys information regarding the amount of variability present in a set of data.
Four data sets with the same mean but different dispersion:
50, 50, 50, 50 (Mean = 50)
45, 55, 55, 45 (Mean = 50)
40, 40, 60, 60 (Mean = 50)
90, 90, 10, 10 (Mean = 50)
Note:
1. If all the values are the same, there is no dispersion.
2. If the values are not all the same, there is dispersion.
3. If the values are close to each other, the amount of dispersion is small.
4. If the values are widely scattered, the dispersion is greater.
Measures of dispersion:
Absolute:
An absolute measure expresses the dispersion in the original unit of the data.
Variability in two or more distributions can be compared provided they are
given in the same unit and have the same mean.
Relative:
A relative measure of dispersion is free from the unit of measurement of the data.
It is the ratio of a measure of absolute dispersion to the average
from which the absolute deviations are measured. It is called the
coefficient of variation.
The Range: The range of a set of measurements is the difference
between the largest and smallest measurements.
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the
dispersion of the values between the two end points.
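The range is simple enough to compute in one line. The sketch below applies it to the four mean-50 data sets that appear earlier in this section (their grouping into sets of four values is an assumption read off the slide), and then to a hypothetical pair that shows the shortcoming described above.

```python
def value_range(values):
    """Range = largest measurement minus smallest measurement."""
    return max(values) - min(values)

# Data sets with mean 50 (grouping assumed from the slide)
data_sets = [
    [50, 50, 50, 50],
    [45, 55, 55, 45],
    [40, 40, 60, 60],
    [90, 90, 10, 10],
]
ranges = [value_range(d) for d in data_sets]
print(ranges)  # [0, 10, 20, 80]

# Hypothetical pair, not from the source: identical range,
# but very different spread between the two end points
clustered = [10, 50, 50, 90]
spread_out = [10, 10, 90, 90]
print(value_range(clustered), value_range(spread_out))  # 80 80
```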
The Variance
This measure of dispersion reflects the values of all the measurements.
The variance of a population of N measurements x1, x2, …, xN
having a mean μ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The variance of a sample of n measurements x1, x2, …, xn
having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Example
Population A: 8, 9, 10, 11, 12 Population B: 4, 7, 10, 13, 16
Population A
Xi     Xi-10
8      -2
9      -1
10      0
11      1
12      2
Sum 50  0
Population B
Xi     Xi-10
4      -6
7      -3
10      0
13      3
16      6
Sum 50  0
The mean of both populations is 10, but measurements in B are much
more dispersed than those in A. Therefore, another measure is
needed.
The sum of deviations is zero in both cases.
The sum of squared deviations is used in calculating the variance.
Let us calculate the variance of the two populations:
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = \frac{10}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = \frac{90}{5} = 18$
Population A
Xi     Xi-10   (Xi-10)^2
8      -2      4
9      -1      1
10      0      0
11      1      1
12      2      4
Sum 50  0      10
Population B
Xi     Xi-10   (Xi-10)^2
4      -6      36
7      -3      9
10      0      0
13      3      9
16      6      36
Sum 50  0      90
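The variance calculations for Populations A and B can be reproduced from the definition: average the squared deviations from the mean. The sketch below also calls `statistics.pvariance` and `statistics.variance` from the Python standard library for comparison; the function name `population_variance` is illustrative.

```python
from statistics import pvariance, variance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

print(population_variance(pop_a), population_variance(pop_b))  # 2.0 18.0

# Standard-library equivalents: pvariance divides by N, variance by n - 1
print(pvariance(pop_a) == 2, pvariance(pop_b) == 18)  # True True
print(variance(pop_a), variance(pop_b))               # 2.5 22.5
```

The last line shows what the result would be if the same five values were treated as a sample rather than a population: the divisor n - 1 = 4 gives a larger variance.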
The standard deviation of a set of measurements is the
square root of the variance of the measurements.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
The Coefficient of Variation
◦ The coefficient of variation of a set of measurements is
the standard deviation divided by the mean value.
This coefficient provides a proportionate measure of variation.
Population coefficient of variation: $C.V. = \frac{\sigma}{\mu}(100\%)$
Sample coefficient of variation: $C.V. = \frac{s}{\bar{x}}(100\%)$
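As a rough illustration of these two formulas, the sketch below computes the population standard deviation and the population coefficient of variation for Populations A and B from the variance example. The function names are illustrative, and treating both data sets as complete populations follows that example.

```python
import math

def population_sd(values):
    """Square root of the population variance."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def population_cv(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    mu = sum(values) / len(values)
    return population_sd(values) / mu * 100

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

print(round(population_sd(pop_a), 2), round(population_sd(pop_b), 2))  # 1.41 4.24
print(round(population_cv(pop_a), 2), round(population_cv(pop_b), 2))  # 14.14 42.43
```

Both populations have mean 10, so here the coefficient of variation simply rescales the standard deviation; it becomes more informative when the means differ.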
Interpreting Standard Deviation
The standard deviation can be used to
◦ compare the variability of several distributions
◦ make a statement about the general shape of a
distribution.
The coefficient of variation can be used to compare
variation between sets of data.
Example 2.5.3 Page 46
Suppose two samples of human males yield the following data:
                     Sample 1     Sample 2
Age                  25 years     11 years
Mean weight          145 pounds   80 pounds
Standard deviation   10 pounds    10 pounds
C.V.(25 years) = (10/145)*100% = 6.9%
C.V.(11 years) = (10/80)*100% = 12.5%
The variation in the 11-year-old sample is greater than the variation
in the 25-year-old sample, even though both samples have the same standard deviation.
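The comparison in this example needs only the summary statistics, so it can be checked with a short helper. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) * 100%."""
    return sd / mean * 100

cv_25 = coefficient_of_variation(10, 145)  # 25-year-old sample
cv_11 = coefficient_of_variation(10, 80)   # 11-year-old sample

print(round(cv_25, 1), round(cv_11, 1))  # 6.9 12.5
print(cv_11 > cv_25)                     # True: more relative variation at age 11
```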