Why statisticians were created
Measure of dispersion
FETP India
Competency to be gained from this lecture
Calculate a measure of variation that is adapted to the sample studied
Key issues
• Range
• Inter-quartile range
• Standard deviation
Measures of spread, dispersion or variability
• The measure of central tendency provides important information about the distribution
• However, it does not provide information concerning the relative position of other data points in the sample
• Measures of spread, dispersion or variability are needed to address this
Range
Why one needs to measure variability
Marks obtained
Student | Biology | Physics | Chemistry
1 | 200 | 199 | 100
2 | 200 | 200 | 200
3 | 200 | 201 | 300
Mean | 200 | 200 | 200
Variation | Nil | Slight | Substantial
Range | 0 | 2 | 200
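The marks table can be checked with a few lines of Python. This is only a sketch: the names `marks` and `value_range` are illustrative and do not come from the lecture.

```python
from statistics import mean

# Marks of the three students, copied from the table above.
marks = {
    "Biology": [200, 200, 200],
    "Physics": [199, 200, 201],
    "Chemistry": [100, 200, 300],
}

def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Same mean in every subject, but very different spread.
for subject, scores in marks.items():
    print(subject, "mean:", mean(scores), "range:", value_range(scores))
```

The three subjects share a mean of 200, so the mean alone cannot distinguish them; the range does.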
Every concept comes from a failure of the previous concept
• Mean is distorted by outliers
• Median takes care of the outliers
The range: A simple measure of dispersion
• Take the difference between the lowest value and the highest value
• Limitations:
  - The range says nothing about the values between the extreme values
  - The range is not stable: as the sample size increases, the range can change dramatically
  - The range cannot be used for further statistical analysis
Example of a range
• Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm
• Lowest (minimum) value: 70 cm
• Highest (maximum) value: 140 cm
• Range: 140 – 70 = 70 cm
Three different distributions with the same range (35 Kgs)
[Figure: three dot plots on a common axis running from 30 to 70, labelled Even, Uneven and Clumped]
The range increases with the sample size
Set | Values | Lowest | Highest | Range
Initial set (5 values) | 30 40 53 58 65 | 30 | 65 | 35
New set (3 more values) | 30 40 53 58 65 48 51 64 | 30 | 65 | 35
New set (3 more values) | 30 40 53 58 65 48 51 70 | 30 | 70 | 40
New set (3 more values) | 30 40 53 58 65 28 51 70 | 28 | 70 | 42
Two ranges based on different sample sizes are not comparable
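The effect of adding observations can be reproduced with the values from the table. A minimal sketch, where `initial`, `extras` and `value_range` are illustrative names:

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Initial set of 5 values, then three alternative sets of 3 extra values.
initial = [30, 40, 53, 58, 65]
extras = [[48, 51, 64], [48, 51, 70], [28, 51, 70]]

initial_range = value_range(initial)
extended_ranges = [value_range(initial + extra) for extra in extras]

print("5 values:", initial_range)
print("8 values:", extended_ranges)
```

Adding observations can leave the range unchanged or widen it, but can never narrow it, which is why ranges from samples of different sizes should not be compared directly.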
Percentiles and quartiles
• Percentiles
  - Those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts
  - The median is the 50th percentile
• Quartiles
  - The values which divide a series of observations, arranged in ascending order, into 4 equal parts
  - The median is the 2nd quartile
Inter-quartile range
First 25% | Q1 | 2nd 25% | Q2 (Median) | 3rd 25% | Q3 | 4th 25%
Sorting the data in increasing order
• Median
  - Middle value (if n is odd)
  - Average of the two middle values (if n is even)
  - A measure of the “centre” of the data
• Quartiles divide the set of ordered values into 4 equal parts
The inter-quartile range
• The central portion of the distribution
• Calculated as the difference between the third quartile and the first quartile
• Includes about one-half of the observations
• Leaves out one quarter of the observations at each end
• Limitations:
  - Only takes into account two values
  - Not a mathematical concept upon which theories can be developed
The inter-quartile range: Example
• Values: 29, 31, 24, 29, 30, 25
• Arranged in ascending order: 24, 25, 29, 29, 30, 31
• Q1: value at position (n+1)/4 = 7/4 = 1.75, so Q1 = 24 + 0.75 × (25 – 24) = 24.75
• Q3: value at position 3(n+1)/4 = 21/4 = 5.25, so Q3 = 30 + 0.25 × (31 – 30) = 30.25
• Inter-quartile range = Q3 – Q1 = 30.25 – 24.75 = 5.5
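The (n+1) position rule used in this example can be sketched in Python. The helper `quantile_n_plus_1` is a hypothetical name written for this illustration; other software may use a different quartile convention and give slightly different values.

```python
def quantile_n_plus_1(values, fraction):
    """Value at 1-based position (n + 1) * fraction in the sorted data,
    interpolating linearly between the two neighbouring observations."""
    ordered = sorted(values)
    n = len(ordered)
    # Keep the position inside the data (matters only for very small samples).
    position = min(max((n + 1) * fraction, 1), n)
    lower = int(position)        # rank of the observation just below
    weight = position - lower    # fractional part of the position
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])

values = [29, 31, 24, 29, 30, 25]
q1 = quantile_n_plus_1(values, 0.25)
q3 = quantile_n_plus_1(values, 0.75)
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Python's standard library offers `statistics.quantiles(values, n=4)`, whose default "exclusive" method follows the same (n+1) convention.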
Graphic representation of the inter-quartile range
The mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value and the mean
• Calculate the average difference between each value and the mean
• Limitations:
  - Negative and positive deviations cancel each other out, so the average is 0 even when there is substantial variation
Standard deviation
The mean deviation from the mean: Example
• Data: 10 20 30 40 50 60 70
• Mean = 280/7 = 40
• Deviations from the mean (10-40, 20-40, …): -30 -20 -10 0 10 20 30
• Sum = 0
Absolute mean deviation from the mean
• Calculate the mean of all values
• Calculate the difference between each value and the mean and take the absolute value
• Calculate the average of these absolute differences
• Limitations:
  - Absolute values are awkward to handle mathematically
Absolute mean deviation from the mean: Example
• Data: 10 20 30 40 50 60 70
• Mean = 280/7 = 40
• Deviations from the mean (10-40, 20-40, …): -30 -20 -10 0 10 20 30
• Absolute values: 30 20 10 0 10 20 30
• Absolute mean deviation from the mean = 120/7 = 17.1
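Both versions of the mean deviation can be computed side by side for this data set. A short sketch with illustrative variable names:

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean_value = sum(data) / len(data)            # 280 / 7
deviations = [x - mean_value for x in data]   # signed distances from the mean

# Signed deviations cancel out, so their average carries no information.
mean_deviation = sum(deviations) / len(data)

# Taking absolute values first removes the cancellation.
absolute_mean_deviation = sum(abs(d) for d in deviations) / len(data)

print("Mean:", mean_value)
print("Mean deviation:", mean_deviation)
print("Absolute mean deviation:", round(absolute_mean_deviation, 1))
```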
Calculating the variance (1/2)
1. Calculate the mean as a measure of central location (MEAN)
2. Calculate the difference between each observation and the mean (DEVIATION)
3. Square the differences (SQUARED DEVIATION)
   • Negative and positive deviations will not cancel each other out
   • Values further from the mean have a bigger impact
Calculating the variance (2/2)
4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)
5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE
• Why divide by n – 1?
  - Adjustment for the fact that the mean is just an estimate of the true population mean
  - Tends to make the variance larger
The standard deviation
• Take the square root of the variance
• Limitations:
  - Sensitive to outliers

$$SD = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$$
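The computational form of the standard deviation, which uses the sum of the squared observations, agrees with the step-by-step definition (sum of squared deviations divided by n – 1). A brief sketch of why, writing \bar{x} for the sample mean, a symbol assumed here rather than taken from the slides:

```latex
\sum (x_i - \bar{x})^2
  = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2
  = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\quad\Longrightarrow\quad
SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
   = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}
```

The second form needs only the sum of the observations and the sum of their squares, which is convenient for hand calculation.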
Example
Patient | No. of X-rays | Deviation from mean | Absolute deviation | Squared deviation | Square of observation
A | 10 | 10 – 9 = 1 | 1 | 1² = 1 | 10² = 100
B | 8 | 8 – 9 = -1 | 1 | (-1)² = 1 | 8² = 64
C | 6 | 6 – 9 = -3 | 3 | (-3)² = 9 | 6² = 36
D | 12 | 12 – 9 = 3 | 3 | 3² = 9 | 12² = 144
E | 9 | 9 – 9 = 0 | 0 | 0² = 0 | 9² = 81
Total | 45 | 0 | 8 | 20 | 425
• Mean = 45/5 = 9 x-rays
• Mean deviation = 8/5 = 1.6 x-rays
• Variance = 20/(5 – 1) = 20/4 = 5 x-rays²
• Standard deviation = √5 = 2.2 x-rays
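The worked example can be verified in Python, computing the variance both from the squared deviations and from the squares of the observations. This is a sketch with illustrative variable names; the data are the five X-ray counts from the table.

```python
from math import sqrt

# Number of X-rays for patients A to E.
xrays = {"A": 10, "B": 8, "C": 6, "D": 12, "E": 9}
values = list(xrays.values())
n = len(values)

# Steps 1 to 5: mean, deviations, squares, sum, divide by n - 1.
mean_value = sum(values) / n
sum_squared_deviations = sum((x - mean_value) ** 2 for x in values)
variance = sum_squared_deviations / (n - 1)
standard_deviation = sqrt(variance)

# Same variance from the totals of the observations and their squares.
sum_x = sum(values)
sum_x_squared = sum(x * x for x in values)
variance_from_squares = (n * sum_x_squared - sum_x ** 2) / (n * (n - 1))

print("Mean:", mean_value)
print("Variance:", variance, "or", variance_from_squares)
print("Standard deviation:", round(standard_deviation, 1))
```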
Properties of the standard deviation
• Unaffected if same constant is added to (or subtracted from) every observation
• If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the same constant
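These two properties can be checked numerically. The sketch below reuses the five X-ray counts from the example; the constants 100 and 3 are arbitrary choices made for illustration.

```python
from statistics import stdev

values = [10, 8, 6, 12, 9]
sd = stdev(values)  # sample standard deviation (divides by n - 1)

# Adding a constant shifts every value equally: the spread is unchanged.
sd_shifted = stdev([x + 100 for x in values])

# Multiplying by a constant stretches the spread by the same factor.
sd_scaled = stdev([x * 3 for x in values])

print("Original:", sd)
print("After adding 100:", sd_shifted)
print("After multiplying by 3:", sd_scaled)
```

For a negative constant, the standard deviation is multiplied by the absolute value of the constant, since a standard deviation cannot be negative.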
Need of a measure of variation that is independent from the measurement unit
• The standard deviation is expressed in the same unit as the mean
  - e.g., 3 cm for height, 1.4 kg for weight
• Sometimes, it is useful to express variability as a percentage of the mean
  - e.g., in the case of laboratory tests, the experimental variation is ± 5% of the mean
The coefficient of variation
• Calculate the standard deviation
• Divide by the mean
  - The standard deviation becomes “unit free”
• Coefficient of variation (%) = [SD / Mean] × 100 (pure number)
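A small sketch can illustrate what "unit free" means. It reuses the ten heights from the range example and expresses them in centimetres and in metres; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# The ten heights from the range example, in cm and converted to m.
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The standard deviation changes with the unit; the coefficient does not.
print("SD in cm:", stdev(heights_cm), "SD in m:", stdev(heights_m))
print("CV from cm:", cv_cm, "CV from m:", cv_m)
```

Changing the unit rescales the standard deviation and the mean by the same factor, so their ratio stays the same.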
Uses of the coefficient of variation
• Compare the variability in two variables studied which are measured in different units
  - Height (cm) and weight (kg)
• Compare the variability in two groups with widely different mean values
  - Incomes of persons in different socio-economic groups
A summary of measures of dispersion
Measure | Advantages | Disadvantages
Range | Obvious; easy to calculate | Uses only 2 observations; increases with the sample size; can be distorted by outliers
Inter-quartile range | Not affected by extreme values | Uses only 2 observations; not amenable for further statistical treatment
Standard deviation | Uses every value; suitable for further analysis | Highly influenced by extreme values
Choosing a measure of central tendency and a measure of dispersion
Type of distribution | Measure of central tendency | Measure of dispersion
Normal | Mean | Standard deviation
Skewed | Median | Inter-quartile range
Exponential or logarithmic | Geometric mean | Consult with the statistician
Key messages
• Report the range but be aware of its limitations
• Report the inter-quartile range when you use the median
• Report the standard deviation when you use a mean