©2003 Thomson/South-Western 1
Chapter 3 – Data Summary Using Descriptive Measures
Slides prepared by Jeff Heyl, Lincoln University
©2003 South-Western/Thomson Learning™
Introduction to Business Statistics, 6e
Kvanli, Pavur, Keeling
Types of Descriptive Measures
Measures of central tendency
Measures of variation
Measures of position
Measures of shape
Measures of Central Tendency
The Mean
The Median
The Midrange
The Mode
The Mean
The mean is simply the average of the data.
Each value in the sample is represented by x.
Thus, to get the mean, add all the values in the sample and divide by the number of values in the sample (n).
A Sample Mean
$\bar{x} = \frac{\sum x}{n}$
The Population Mean
Each value in the population is represented by x.
Thus, to get the population mean ($\mu$), add all the values in the population and divide by the number of values in the population (N).
$\mu = \frac{\sum x}{N}$
The Accident Data Set
$\bar{x} = \frac{6 + 9 + 7 + 23 + 5}{5} = 10.0$
If we remove the last value from the data set, then
$\bar{x} = \frac{6 + 9 + 7 + 23}{4} = 11.25$
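The arithmetic for the accident data can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `mean` is an illustrative choice.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

accidents = [6, 9, 7, 23, 5]      # accident data set, in the order shown on the slide

print(mean(accidents))            # 10.0
print(mean(accidents[:-1]))       # 11.25, after removing the last value
```

Dropping a single value moves the mean from 10.0 to 11.25, which shows how strongly each observation influences it in a small sample.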
The Median
The median (Md) of a set of data is the value in the center of the data values when they are arranged from lowest to highest.
Accident Data
Ordered array: 5, 6, 7, 9, 23
The value that has an equal number of items to its right and left is the median.
If n is an odd number, Md is the center data value of the ordered data set:
Md = $\left(\frac{n + 1}{2}\right)$st ordered value
Md = 7
Even Numbered Data
Ordered array: 3, 8, 12, 14
The value that has an equal number of items to its right and left is the median.
If n is an even number, Md is the average of the two center values of the ordered data set:
Md = (8 + 12)/2 = 10
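The odd and even cases can be combined in one small function. The sketch below is illustrative only; the name `median` is an assumption, and the two data sets are the ones used on the slides.

```python
def median(values):
    """Center value of the ordered data; average of the two center values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th ordered value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two center values

print(median([6, 9, 7, 23, 5]))   # 7
print(median([3, 8, 12, 14]))     # 10.0
```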
The Midrange
The midrange (Mr) provides an easy-to-grasp measure of central tendency.
$Mr = \frac{L + H}{2}$
where L is the lowest and H is the highest data value.
Accident Data
Ordered array: 5, 6, 7, 9, 23
$Mr = \frac{5 + 23}{2} = 14$
Note that the midrange is severely affected by outliers.
Compare Mr = 14 to $\bar{x} = 10$ and Md = 7.
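A short Python sketch makes the outlier sensitivity concrete. The function name `midrange` is illustrative, and the second call (with the value 23 removed) is an added comparison, not a calculation from the slides.

```python
def midrange(values):
    """Average of the lowest (L) and highest (H) data values."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2

accidents = [5, 6, 7, 9, 23]

print(midrange(accidents))        # 14.0
print(midrange(accidents[:-1]))   # 7.0, once the outlier 23 is dropped
```

Removing the single extreme value halves the midrange, because only L and H enter the formula.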
The Mode
The mode (Mo) of a data set is the value that occurs more than once and the most often.
The mode is not always a measure of central tendency; this value need not occur in the center of the data.
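The definition above (a value that occurs more than once and most often) can be sketched as follows. The function name `modes` and the first data set are hypothetical; returning a list, so that ties and the no-mode case are both representable, is a design choice of this sketch rather than something the slides specify.

```python
from collections import Counter

def modes(values):
    """Values that occur most often, provided they occur more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:                 # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 8, 8, 12, 14]))   # [8]  (hypothetical data)
print(modes([5, 6, 7, 9, 23]))    # []   (accident data: no value repeats)
```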
Bellaire College Example
Figures 3.2, 3.3, and 3.4
Level of Measurement and Measure of Central Tendency
Table 3.1: Summary of levels of measurement and appropriate measure of central tendency. A “Y” indicates this measure can be used with the corresponding level of measurement.
                               Level of Measurement
Measure of Central Tendency    Nominal   Ordinal   Interval   Ratio
Mean                           -         -         Y          Y
Median                         -         Y         Y          Y
Midrange                       -         -         Y          Y
Mode                           Y         Y         Y          Y
Measures of Variation
Homogeneity refers to the degree of similarity within a set of data.
The more homogeneous a set of data is, the better the mean will represent a typical value.
Variation is the tendency of data values to scatter about the mean, $\bar{x}$.
Common Measures of Variation
Range
Variance
Standard Deviation
Coefficient of Variation
The Range
For the accident data:
Range = H - L = 23 - 5 = 18
A rather crude measure, but easy to calculate, and it contains valuable information in some situations.
The Variance and Standard Deviation
Both measures describe the variation of the values about the mean.
Data Value (x)    (x - x̄)    (x - x̄)²
5                 -5          25
6                 -4          16
7                 -3          9
9                 -1          1
23                13          169
$\sum (x - \bar{x}) = 0$, $\sum (x - \bar{x})^2 = 220$
Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Using the accident data:
$s^2 = \frac{220}{5 - 1} = \frac{220}{4} = 55.0$
Sample Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Using the accident data:
$s = \sqrt{55.0} = 7.416$
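The sample variance and standard deviation for the accident data can be reproduced with the sketch below. The function names are illustrative; Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and could be used instead.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a sample variance")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

accidents = [5, 6, 7, 9, 23]

print(sample_variance(accidents))        # 55.0
print(round(sample_std(accidents), 3))   # 7.416
```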
Population Variance and Standard Deviation
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
The Coefficient of Variation
The coefficient of variation (CV) is used to compare the variation of two or more data sets where the values of the data differ greatly.
$CV = \frac{s}{\bar{x}} \cdot 100$
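As an illustration, the sketch below computes the CV for the accident data used earlier (5, 6, 7, 9, 23). The function name and the rescaled comparison data set are assumptions made for this example; they do not come from the slides.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    x_bar = statistics.mean(values)
    if x_bar == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / x_bar * 100

accidents = [5, 6, 7, 9, 23]
rescaled = [100 * x for x in accidents]   # hypothetical: same data in units 100 times smaller

print(round(coefficient_of_variation(accidents), 2))   # 74.16
print(round(coefficient_of_variation(rescaled), 2))    # 74.16
```

The two standard deviations differ by a factor of 100, yet the CV is unchanged, which is why it suits comparisons between data sets measured on very different scales.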
Machined Parts Example
Figure 3.6
Measures of Position
Percentile (Quartile)
  Most common measure of position
  Quartiles are percentiles with the data divided into quarters
Z-Score
  The relative position of a data value expressed in terms of the number of standard deviations above or below the mean
Percentile Example
The 35th percentile ($P_{35}$) is that value such that at most 35% of the data values are less than $P_{35}$ and at most 65% of the data values are greater than $P_{35}$.
Aptitude Test Scores
Table 3.2: Ordered array of aptitude test scores for 50 applicants ($\bar{x} = 60.36$, $s = 18.61$). Read down each column, then left to right.
22   44   56   68   78
25   44   57   68   78
28   46   59   69   80
31   48   60   71   82
34   49   61   72   83
35   51   63   72   85
39   53   63   74   88
39   53   63   75   90
40   55   65   75   92
42   55   66   76   96
Percentile: Texon Industries Data
Number of data values, n = 50
Percentile, P = 35
$n \cdot \frac{P}{100} = 50 \cdot 0.35 = 17.5$
17.5 represents the position of the 35th percentile.
Percentile Location Rules
Rule 1: If $n \cdot P/100$ is not a counting number, round it up, and the Pth percentile will be the value in this position of the ordered data.
Rule 2: If $n \cdot P/100$ is a counting number, the Pth percentile is the average of the number in this location (of the ordered data) and the number in the next largest location.
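The two location rules can be sketched in Python as follows. This is a minimal illustration, not code from the text: the function name `percentile` is hypothetical, P is assumed to be a whole number strictly between 0 and 100, and `scores` holds the 50 ordered aptitude test scores of Table 3.2.

```python
scores = [22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
          44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
          56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
          68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
          78, 78, 80, 82, 83, 85, 88, 90, 92, 96]

def percentile(ordered, p):
    """Pth percentile of an ascending list, using the two location rules."""
    if not ordered:
        raise ValueError("the data set is empty")
    if not 0 < p < 100:
        raise ValueError("p must be a whole number strictly between 0 and 100")
    # n * P / 100, kept in integer arithmetic to avoid rounding surprises
    position, remainder = divmod(len(ordered) * p, 100)
    if remainder:
        # Rule 1: not a counting number, so round up to position + 1 (1-based),
        # which is index `position` in a 0-based list
        return ordered[position]
    # Rule 2: counting number, so average this location and the next one
    return (ordered[position - 1] + ordered[position]) / 2

print(percentile(scores, 35))   # 53   (position 17.5 rounds up to the 18th value)
print(percentile(scores, 50))   # 62.0 (average of the 25th and 26th values)
```

Other software often interpolates between positions instead, so results from a statistics package may differ slightly from these rules.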
Aptitude Scores Example (Example 3.5)
Ms. Jensen received a score of 83 on the aptitude test. What is her percentile value?
83 is the 45th value out of 50 in the ordered array. A guess of the percentile would be:
$P = \frac{45}{50} \cdot 100 = 90$
Examining the surrounding values clarifies the true percentile:
P     n · P/100           Pth percentile
88    50 · 0.88 = 44      (82 + 83)/2 = 82.5
89    50 · 0.89 = 44.5    45th value = 83
90    50 · 0.90 = 45      (83 + 85)/2 = 84
So a score of 83 is the 89th percentile.
Quartiles
Quartiles are merely particular percentiles that divide the data into quarters, namely:
$Q_1$ = 1st quartile = 25th percentile ($P_{25}$)
$Q_2$ = 2nd quartile = 50th percentile = median ($P_{50}$)
$Q_3$ = 3rd quartile = 75th percentile ($P_{75}$)
Quartile Example
Using the applicant data, the first quartile is:
$n \cdot \frac{P}{100} = (50)(0.25) = 12.5$
Rounded up, $Q_1$ = 13th ordered value = 46
Similarly, the third quartile is:
$n \cdot \frac{P}{100} = (50)(0.75) = 37.5$, rounded up to 38, and $Q_3$ = 38th ordered value = 75
Interquartile Range
The interquartile range (IQR) is essentially the range of the middle 50% of the data set.
IQR = $Q_3 - Q_1$
Using the applicant data, the IQR is:
IQR = 75 - 46 = 29
Z-Scores
The z-score determines the relative position of any particular data value x and is based on the mean and standard deviation of the data set.
The z-score expresses the number of standard deviations the value x is from the mean.
A negative z-score implies that x is to the left of the mean, and a positive z-score implies that x is to the right of the mean.
Z Score Equation
$z = \frac{x - \bar{x}}{s}$
For a score of 83 from the aptitude data set,
$z = \frac{83 - 60.36}{18.61} = 1.22$
For a score of 35 from the aptitude data set,
$z = \frac{35 - 60.36}{18.61} = -1.36$
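The two z-scores can be reproduced with a short sketch. It uses the summary values reported with Table 3.2 ($\bar{x} = 60.36$, $s = 18.61$); the function name `z_score` is illustrative.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / std

x_bar, s = 60.36, 18.61   # aptitude test data, as reported with Table 3.2

print(round(z_score(83, x_bar, s), 2))   # 1.22
print(round(z_score(35, x_bar, s), 2))   # -1.36
```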
Standardizing Sample Data
The process of subtracting the mean and dividing by the standard deviation is referred to as standardizing the sample data.
The corresponding z-score is the standardized score.
Measures of Shape
Skewness
  Skewness measures the tendency of a distribution to stretch out in a particular direction
Kurtosis
  Kurtosis measures the peakedness of the distribution
Skewness
In a symmetrical distribution the mean, median, and mode would all be the same value and Sk = 0.
A positive Sk number implies a shape which is skewed right, and the mode < median < mean.
In a data set with a negative Sk value, the mean < median < mode.
Skewness Calculation
Pearsonian coefficient of skewness
$Sk = \frac{3(\bar{x} - Md)}{s}$
Values of Sk will always fall between -3 and 3.
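As a worked check, the sketch below applies the Pearsonian coefficient to the aptitude test data, using the summary values given elsewhere in these slides ($\bar{x} = 60.36$, Md = 62, $s = 18.61$). The function name is an illustrative choice.

```python
def pearson_skewness(mean, median, std):
    """Pearsonian coefficient of skewness: 3 * (mean - median) / std."""
    if std <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3 * (mean - median) / std

print(round(pearson_skewness(60.36, 62, 18.61), 2))   # -0.26
```

The slightly negative value reflects a mean that sits just below the median, that is, a mild left skew.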
Histogram of Symmetric Data
Figure 3.7: histogram (vertical axis: Frequency) with $\bar{x}$ = Md = Mo
Histogram with Right (Positive) Skew
Figure 3.8: histogram (vertical axis: Relative Frequency) marking the mode (Mo), median (Md), and mean ($\bar{x}$); Sk > 0
Histogram with Left (Negative) Skew
Figure 3.9: histogram (vertical axis: Relative Frequency) marking the mean ($\bar{x}$), median (Md), and mode (Mo); Sk < 0
Kurtosis
Kurtosis is a measure of the peakedness of a distribution.
Large values occur when there is a high frequency of data near the mean and in the tails.
The calculation is cumbersome and the measure is used infrequently.
Chebyshev’s Inequality
1. At least 75% of the data values are between $\bar{x} - 2s$ and $\bar{x} + 2s$, or at least 75% of the data values have a z-score value between -2 and 2
2. At least 89% of the data values are between $\bar{x} - 3s$ and $\bar{x} + 3s$, or at least 89% of the data values have a z-score value between -3 and 3
3. In general, at least $(1 - 1/k^2) \times 100\%$ of the data values lie between $\bar{x} - ks$ and $\bar{x} + ks$ for any k > 1
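The general bound can be evaluated directly. The sketch below is illustrative (the function name `chebyshev_bound` is an assumption); it shows where the 75% and 89% figures come from.

```python
def chebyshev_bound(k):
    """Minimum percentage of data values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return (1 - 1 / k ** 2) * 100

print(chebyshev_bound(2))             # 75.0
print(round(chebyshev_bound(3), 1))   # 88.9, quoted as 89% on the slide
print(round(chebyshev_bound(1.5), 1)) # 55.6
```

Because the bound holds for any data set regardless of shape, it is usually much lower than the percentage actually observed.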
Empirical Rule
Under the assumption of a bell shaped population:
1. Approximately 68% of the data values lie between $\bar{x} - s$ and $\bar{x} + s$ (have z-scores between -1 and 1)
2. Approximately 95% of the data values lie between $\bar{x} - 2s$ and $\bar{x} + 2s$ (have z-scores between -2 and 2)
3. Approximately 99.7% of the data values lie between $\bar{x} - 3s$ and $\bar{x} + 3s$ (have z-scores between -3 and 3)
A Bell-Shaped (Normal) Population
Figure 3.10
Chebyshev’s Versus Empirical
Table 3.3 (aptitude test data: Md = 62, Sk = -0.26)
Between                 Actual Percentage      Chebyshev’s Inequality Percentage   Empirical Rule Percentage
x̄ - s and x̄ + s        66% (33 out of 50)     n/a                                 ≈ 68%
x̄ - 2s and x̄ + 2s      98% (49 out of 50)     ≥ 75%                               ≈ 95%
x̄ - 3s and x̄ + 3s      100% (50 out of 50)    ≥ 89%                               ≈ 100%
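The "actual" column can be recomputed from the 50 aptitude test scores of Table 3.2. The sketch below is an illustration under that assumption; the function name `count_within` is hypothetical, and the mean and standard deviation are computed from the data rather than taken from the rounded values on the slide.

```python
import statistics

scores = [22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
          44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
          56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
          68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
          78, 78, 80, 82, 83, 85, 88, 90, 92, 96]

def count_within(values, k):
    """Number of values lying within k sample standard deviations of the mean."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if x_bar - k * s <= x <= x_bar + k * s)

for k in (1, 2, 3):
    inside = count_within(scores, k)
    print(k, inside, 100 * inside / len(scores))
# 1 33 66.0
# 2 49 98.0
# 3 50 100.0
```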
Allied Manufacturing Example
Is the Empirical Rule applicable to this data?
Probably yes. The histogram is approximately bell shaped.
$\bar{x} - 2s = 10.275$ and $\bar{x} + 2s = 10.3284$
96 of the 100 data values fall between these limits, closely approximating the 95% called for by the Empirical Rule.
Grouped Data
Table 3.4
Class Number    Class (Age in years)    Frequency
1               20 and under 30         5
2               30 and under 40         14
3               40 and under 50         9
4               50 and under 60         6
5               60 and under 70         2
Total                                   36
Grouped Data
When raw data are not available, estimate $\bar{x}$ by assuming data values are equal to the midpoint of their class:
5 values at (20 + 30)/2 = 25
14 values at (30 + 40)/2 = 35
9 values at (40 + 50)/2 = 45
6 values at (50 + 60)/2 = 55
2 values at (60 + 70)/2 = 65
$\bar{x} = \frac{(5)(25) + (14)(35) + (9)(45) + (6)(55) + (2)(65)}{36} = \frac{1480}{36} = 41.1$
Grouped Data
When raw data are not available, estimate $s^2$ by assuming data values are equal to the midpoint of their class and using the normal method:
$s^2 = \frac{\sum (\text{each data value})^2 - \left(\sum \text{each data value}\right)^2 / n}{n - 1}$
$s^2 = \frac{65{,}100 - (1480)^2/36}{35} = 121.59$
$s = \sqrt{121.59} = 11.03$
Grouped Data
Table 3.5: Summary of calculations
Class Number    Class              f     m     f · m    f · m²
1               20 and under 30    5     25    125      3,125
2               30 and under 40    14    35    490      17,150
3               40 and under 50    9     45    405      18,225
4               50 and under 60    6     55    330      18,150
5               60 and under 70    2     65    130      8,450
Total                              36          1,480    65,100
$\sum f \cdot m = 1{,}480$, $\sum f \cdot m^2 = 65{,}100$
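The grouped-data estimates can be reproduced from the class limits and frequencies alone. The sketch below is illustrative: the function name `grouped_stats` and the (lower, upper, frequency) tuple layout are assumptions, and every value in a class is treated as equal to the class midpoint, as the slides describe.

```python
import math

def grouped_stats(classes):
    """Estimate mean, sample variance, and standard deviation from grouped data.

    `classes` is a list of (lower limit, upper limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    if n < 2:
        raise ValueError("at least two observations are needed")
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]
    sum_fm = sum(f * m for m, f in midpoints)         # sum of f * m
    sum_fm2 = sum(f * m ** 2 for m, f in midpoints)   # sum of f * m^2
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)

ages = [(20, 30, 5), (30, 40, 14), (40, 50, 9), (50, 60, 6), (60, 70, 2)]
mean, variance, std = grouped_stats(ages)

print(round(mean, 1))       # 41.1
print(round(variance, 2))   # 121.59
print(round(std, 2))        # 11.03
```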
Grouped Data
Figure 3.11
Box Plots
Box plots are graphical representations of data sets that illustrate the lowest data value (L), the first quartile ($Q_1$), the median ($Q_2$, Md), the third quartile ($Q_3$), the interquartile range (IQR), and the highest data value (H).
Box Plots
Given the aptitude test data:
L = 22
$Q_1$ = 46
$Q_2$ = Md = 62
$Q_3$ = 75
IQR = 75 - 46 = 29
H = 96
Figure 3.12: box plot on a horizontal axis from 20 to 100, marking L = 22, $Q_1$ = 46, Md = 62, $Q_3$ = 75, and H = 96
Box Plots
Figures 3.13, 3.14, 3.15, 3.16a, and 3.16b
Figure 3.17: Box Plots for Aptitude Scores (vertical axis: Aptitude Score, 20 to 100; horizontal axis: Sample 1 and Sample 2)