scales of measurement module1
Quantitative Methods in Management
Module 1: Scales of Measurement, Data Measures
N J Jaissy
Objective
Introduction to Statistics
Learn about Nominal, Ordinal, Interval & Ratio scales
Review of central tendency measures & measures of dispersion
Mean, Median, Mode
Dispersion
Skewness, Kurtosis
Reference Text books:
Levin & Rubin: Statistics for Management
Srivastava, Shenoy & Sharma: Quantitative Techniques for Management Decisions
Anderson & Sweeney: Business Statistics
Other online resources:
http://www.discover6sigma.org
Khan Academy on YouTube
Data & Management
What do we measure?
Why do we measure?
How do we make sense of data?
Why numbers?
“You can manage what you can measure;
You can measure what you can define;
You can define what you understand.”
Statistics
Descriptive statistics (to be covered in the initial modules)
How to:
Collect data
Classify data
Present data
Inferential statistics (to be covered in the later modules)
How to interpret data:
Make estimates
Test hypotheses
Draw conclusions
Make decisions
Types of data
QUANTITATIVE
E.g.: Age, weight, height, price, number of customers, market share, revenue ($), etc.
QUALITATIVE
No order. E.g.: Marital status, gender, color of clothes
Order. E.g.: Customer satisfaction levels, rank in class
Scales & Data
Data scales & data measures depend on the kind of data being evaluated
What information do you want from this data?
The answers to these questions determine the ‘scale of measurement’
Scales of measurement
QUANTITATIVE: Ratio / Interval Scale
E.g.: Age, weight, height, price, number of customers, market share, etc.
QUALITATIVE
No order: Nominal Scale. E.g.: Marital status, gender
Order: Ordinal Scale. E.g.: Customer satisfaction levels, rank in class
Quantitative scales
RATIO
An absolute zero makes sense (e.g. zero weight)
Ratios between measurements make sense
E.g.: a person at 100 kg is twice as heavy as at 50 kg
INTERVAL
Focus is on the ‘interval’ between 2 measurements, which is equal & measurable
No absolute zero (the zero point is man-made, artificial)
E.g.: consecutive years are about 365 days apart, but the year 2000 is not double the year 1000
Question Set 1
What scale is being used? Why?
1. Background of students in class (BBA, Engg. etc)
2. Height of students in class
3. Temperature in various Bangalore districts
4. Test scores in an exam & grading (A+, A, B+, B- etc)
Answers:
1. Qualitative: Nominal
2. Quantitative: Ratio
3. Quantitative: Interval
4. Qualitative: Ordinal
Classifying Data
1. Data Array
Classify the data according to ascending or descending order
2. Frequency Distributions
Classify the data according to groups of similar values
Measures of Central Tendency & Dispersion
These are key ‘summary’ measures to make sense of data
Measures of Central Tendency: Look at where the data is ‘centered around’, or the middle point of the data
Measures of Dispersion: Look at the ‘dispersion’, ‘spread’ or scatter of the data
Measures of Central Tendency
Mean: The average of all the data points
Median: The data element at the ‘mid point’ of the ordered data
Mode: The most frequently occurring data element
Which measure does one use?
It depends on what you want to find out!
Which measure of central tendency is to be used?
Mode requires only a frequency count, so it can be applied to any set of data at the nominal, ordinal or interval level.
Median requires an ordering of items from the highest to the lowest or vice versa. Hence it can be obtained from ordinal or interval level data, and not from nominal data like party affiliation, caste or religion.
Mean is restricted to interval (or ratio) data such as income, age, wage rate & test score.
Which measure of central tendency is to be used?
Research objective:
Mode is useful to find out the most common category. E.g. test score, caste, age
Mean is useful for further mathematical manipulations.
Median is useful to find out mid values.
Question Set 2:
1. A factory manager takes the weight of 10 samples of ball bearings being manufactured. What is the most commonly occurring weight? What is the average weight and what is the median weight?
Weight (grams): 12.5 12 10 10 11 12.5 11.5 10 11 10.5
Mean = 11.1, Median = 11, Mode = 10
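The three answers for the ball-bearing question can be checked with Python's standard `statistics` module. This is only a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
import statistics

# Weights (grams) of the 10 ball-bearing samples from the question
weights = [12.5, 12, 10, 10, 11, 12.5, 11.5, 10, 11, 10.5]

mean_weight = statistics.mean(weights)      # sum of observations / number of observations
median_weight = statistics.median(weights)  # n is even, so the two middle values are averaged
mode_weight = statistics.mode(weights)      # most frequently occurring value

print(mean_weight, median_weight, mode_weight)
```

Note that the mode (10) sits below both the median and the mean here, which is why the choice of measure depends on the question being asked.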
1. Measures of central tendency: MEAN
Grouped data is data that is classified into groups of similar data values
Mean = sum of all observations / total number of observations
Weighted Mean = sum of (observation * frequency) / total number of observations (i.e. the sum of the frequencies)
Question Set 2:
2. Child Care Community Nursery is eligible for a grant as long as the average age of its children stays below 9. If these data represent the ages of all children in the child care, do they qualify for the grant?
Ages: 8 5 9 10 9 12 7 12 13 7 8
Average age = 9.09 yrs, so NO
Question Set 2
3. The following are the number of days workers in a factory went on leave in June. What is the average leave per worker in June?
# days leave    # of workers
1               2
2               5
3               7
4               2
5               1
6               0
Average leave per worker = 2.7 days, or approx. 3 days
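The leave question is a weighted mean: each value (days of leave) is weighted by its frequency (number of workers). A small sketch of that arithmetic, with illustrative variable names:

```python
# Frequency table from the leave question: days of leave -> number of workers
leave_days = {1: 2, 2: 5, 3: 7, 4: 2, 5: 1, 6: 0}

total_workers = sum(leave_days.values())                      # sum of the frequencies
total_days = sum(days * n for days, n in leave_days.items())  # sum of (observation * frequency)
average_leave = total_days / total_workers

print(total_workers, total_days, round(average_leave, 1))
```

Dividing by the sum of the frequencies (17 workers), not by the number of rows in the table (6), is the step that is easiest to get wrong.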
Question Set 2
3. The frequency distribution represents the time in seconds needed to serve a sample of customers at Sukh Sagar. What is the average time to serve customers?
Time (seconds)    Frequency
20-29             6
30-39             16
40-49             21
50-59             29
60-69             25
70-79             22
80-89             11
90-99             7
100-109           4
110-119           0
120-129           2
Average = 61 sec
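For grouped data the individual observations are not known, so the mean has to be estimated. The sketch below assumes the usual convention of treating every observation as if it sat at its class midpoint; the slides do not state the convention, but it reproduces the 61-second answer.

```python
# (lower limit, upper limit, frequency) for each class of serving times, in seconds
classes = [
    (20, 29, 6), (30, 39, 16), (40, 49, 21), (50, 59, 29),
    (60, 69, 25), (70, 79, 22), (80, 89, 11), (90, 99, 7),
    (100, 109, 4), (110, 119, 0), (120, 129, 2),
]

n = sum(freq for _, _, freq in classes)

# Assumption: each observation is represented by the midpoint of its class
weighted_sum = sum(((lo + hi) / 2) * freq for lo, hi, freq in classes)
grouped_mean = weighted_sum / n

print(n, round(grouped_mean))
```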
Question Set 2
4. Micromax ran six local newspaper ads last month. The number of times subscribers saw the ads is shown below. What is the average number of times a subscriber saw a Micromax ad?
No. of times subscribers saw ad    Frequency
0                                  897
1                                  1082
2                                  1325
3                                  814
4                                  307
5                                  253
6                                  198
Answer = 2 times
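Geometric Mean
This is used when we want to find the average growth rate over time when the growth rate changes from period to period. E.g.: over ‘n’ years, compute the average growth rate if the growth rates per year are X1, X2, X3 … Xn. The growth factors (1 + rate) are multiplied together and the n-th root of the product is taken.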
Question Set 2
5. Rs 100 is deposited in a bank at varying interest rates over 5 years. What is the average growth rate over 5 years?
Principal: Rs 100
Year    Interest rate (%)
Yr 1    7
Yr 2    8
Yr 3    10
Yr 4    12
Yr 5    18
Average growth rate = 10.93%
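The 10.93% figure can be reproduced as a geometric mean of yearly growth factors. The sketch below assumes yearly compounding (each year's interest is earned on the balance at the end of the previous year); variable names are illustrative.

```python
import math

# Yearly interest rates in percent, from the deposit question
rates_percent = [7, 8, 10, 12, 18]

# Convert each rate to a growth factor (7% -> 1.07)
growth_factors = [1 + r / 100 for r in rates_percent]

# Geometric mean = n-th root of the product of the growth factors
gm_factor = math.prod(growth_factors) ** (1 / len(growth_factors))
average_growth_percent = (gm_factor - 1) * 100

# Value of the Rs 100 principal after the 5 years
final_value = 100 * math.prod(growth_factors)

print(round(average_growth_percent, 2), round(final_value, 2))
```

The arithmetic mean of the five rates is 11%, slightly higher than the geometric mean; compounding at a flat 11% would overstate the final balance.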
Measures of Central Tendency: MEDIAN
This is the data element in the center of the array.
Data is arranged in order, ascending or descending
The middle-most element is then chosen
Median = the ((n + 1)/2)th item in the data array
If there is an even number of data elements, then the average of the middle two data elements is the median
Median for grouped data
Md = L + ((N/2 - nb) / nw) * i
Where:
L = Lower exact limit of the interval containing Md.
nb = number of scores below L.
nw = number of scores within the interval containing Md.
i = the width of the interval (for ungrouped data i = 1).
N = the number of scores.
Question Set 3:
1. Nilgiris compares the prices charged for a packet of Parle G biscuits in all of its stores. What is the median price? What is the mean or average price?
Price (Rs): 10.8 9.8 10.9 12.4 13.3 11.4 15.5 10.8 12.2 10.5
Mean = Rs 11.76
Median = Rs 11.15
Question Set 3
2. The following data shows the weights of fish caught in the Cauvery last month. What is the median weight of the fish? What is the average (mean) weight?
Weight (Kg)    Frequency
0 - 24.9       5
25 - 49.9      13
50 - 74.9      16
75 - 99.9      8
100 - 124.9    6
Median = 59.375 / 59.325
The second value (59.325) is the exact answer, using the exact lower limit of the median class (49.95 instead of 50). The difference is small because the gap between classes is tiny.
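Both median values for the fish data follow from the interpolation formula Md = L + ((N/2 - nb) / nw) * i, using the symbols defined for the grouped median. The sketch below is one way to code it; the function name and the `lower_limit_offset` parameter are hypothetical helpers introduced here, not part of the slides.

```python
def grouped_median(classes, width, lower_limit_offset=0.0):
    """Interpolated median for grouped data.

    classes: list of (stated lower limit, frequency), in ascending order.
    width: class width i.
    lower_limit_offset: amount subtracted from the stated lower limit to
        obtain the exact lower limit L (hypothetical parameter, e.g. 0.05).
    """
    total = sum(freq for _, freq in classes)  # N
    half = total / 2
    below = 0                                 # nb: scores below the current class
    for lower, freq in classes:
        if freq > 0 and below + freq >= half:
            exact_lower = lower - lower_limit_offset
            return exact_lower + (half - below) / freq * width
        below += freq
    raise ValueError("no non-empty classes given")

# Fish weights: (stated lower limit in kg, frequency)
fish = [(0, 5), (25, 13), (50, 16), (75, 8), (100, 6)]

print(grouped_median(fish, width=25))                                    # stated lower limit 50
print(round(grouped_median(fish, width=25, lower_limit_offset=0.05), 3)) # exact lower limit 49.95
```

Here N = 48, so the median falls in the 50 - 74.9 class (18 fish lie below it), and the two calls differ only by the 0.05 shift in the lower limit.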
Question Set 3
3. The values of gold per gram sold in the black market over the last 1 year are shown. What is the median class of gold sold? What is the number of the item that represents the median? What is the estimated median value of gold?
Class ($)      Frequency
100 - 149.5    12
150 - 199.5    14
200 - 249.5    27
250 - 299.5    58
300 - 349.5    72
350 - 399.5    63
400 - 449.5    36
450 - 499.5    18
Median = 327.083 / 326.83
The second value (326.83) is the exact answer, using the exact lower limit of the median class (299.75 instead of 300). The difference is small because the gap between classes is tiny.
Measures of central tendency: Mode
Distribution of applications for admission to M.B.A. by discipline
Discipline     No. of students
Science        55
Arts           60
Commerce       101
Engineering    45
Medical        5
Total          266
Findings: The commerce category is predominant.
Measures of Central Tendency: MODE
When data is grouped in classes, we assume the mode is in the class with the highest frequency.
An approximation would be to use the mid point of the ‘modal class interval’
Question Set 4
1. The table shows the avg. monthly balances of 600 customers. What is the mode of the checking account balances? What is the median? What is the most commonly occurring account balance?
Class in $       Frequency
0 - 49.99        78
50 - 99.99       123
100 - 149.99     187
150 - 199.99     82
200 - 249.99     51
250 - 299.99     47
300 - 349.99     13
350 - 399.99     9
400 - 449.99     6
450 - 499.99     4
Mode = $119
Median = $126.47
Question Set 4
2. The ages of residents of Twin Lakes Retirement Village have this frequency distribution. What is the modal class and modal value of the distribution?
Class      Frequency
47-51.9    4
52-56.9    9
57-61.9    13
62-66.9    42
67-71.9    39
72-76.9    20
77-81.9    9
Mode = 66.5
Modal class = (62 - 66.9)
Measures of Dispersion
Range
Interquartile Range
Standard deviation
Coefficient of variation
Dispersion
Gives an indication of how the data is spread, or its variability.
If the data set is highly dispersed, the reliability of the ‘mean’ as indicative of the data reduces
         Data 1    Data 2    Data 3    Data 4    Mean / Average
Set 1    1         4         3         12        5
Set 2    0         6         7         7         5
Set 3    5         5         5         5         5
Note: All the above data sets have a common mean but have widely differing values, and therefore different implications
1. Measures of dispersion: Range
Range = highest data value - lowest data value
Percentile: if we divide the ordered data into 100 parts, with each part holding 1/100th of the data, each dividing point is a percentile
50th percentile: 50% of the data points fall below this point.
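Interquartile Range
Quartile: we divide the ordered data into 4 parts such that 1/4th of the data falls in each part
Note: The widths of the quartiles don’t need to be the same!
Interquartile Range = Q3 - Q1 (third quartile minus first quartile), i.e. the spread of the middle half of the data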
Interquartile Ranges; Grouped data
Interquartile range: Example
An insurance company compiles data on 40 of its customers showing the distance driven. What is the interquartile range?
Number of Km driven
3600  4200  4700  4900  5300  5700  6700  7300  7700  8100
8300  8400  8700  8700  8900  9300  9500  9500  9700  10000
10300 10500 10700 10800 11000 11300 11300 11800 12100 12700
12900 13100 13500 13800 14600 14900 16300 17200 18500 20300
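The transcript does not preserve the answer or the quartile rule used for this example, and different rules give slightly different quartiles. The sketch below shows two options under stated assumptions: a simple rule that takes the 10th and 30th of the 40 ordered values as Q1 and Q3, and the interpolating rule used by default in Python's `statistics.quantiles`.

```python
import statistics

km_driven = [
    3600, 4200, 4700, 4900, 5300, 5700, 6700, 7300, 7700, 8100,
    8300, 8400, 8700, 8700, 8900, 9300, 9500, 9500, 9700, 10000,
    10300, 10500, 10700, 10800, 11000, 11300, 11300, 11800, 12100, 12700,
    12900, 13100, 13500, 13800, 14600, 14900, 16300, 17200, 18500, 20300,
]

# Assumed simple rule: Q1 is the last item of the first quarter (10th value),
# Q3 is the last item of the third quarter (30th value)
ordered = sorted(km_driven)
q1_simple = ordered[len(ordered) // 4 - 1]
q3_simple = ordered[3 * len(ordered) // 4 - 1]
iqr_simple = q3_simple - q1_simple

# statistics.quantiles interpolates between neighbouring items
# (default method="exclusive")
q1, _, q3 = statistics.quantiles(km_driven, n=4)
iqr_interpolated = q3 - q1

print(iqr_simple, iqr_interpolated)
```

The two results differ by only 100 km on a spread of several thousand, so the choice of rule matters little here; it matters more for small data sets.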
Measures of Dispersion: Variance & Standard Deviation
How far does each data point vary from the mean or the average? = delta
Variance = the average of the squares of these deltas
Standard deviation = square root of variance
Population & Sample
Population variance = sum of (x - mean)^2 / N
Sample Formulae
Sample variance = sum of (x - mean)^2 / (n - 1)
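Coefficient of Variation
Coefficient of variation = standard deviation / mean (often expressed as a percentage)
Used when you wish to compare the variability of two populations whose means are very different. The population with the higher coefficient has the higher relative variability.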
Question Set 6
1. A purity test was run on a population of 15 vials of a compound produced in a day. Results are shown below. What is the average impurity level & standard deviation?
% impurity: 0.04 0.14 0.17 0.19 0.22 0.06 0.14 0.17 0.21 0.24 0.12 0.15 0.18 0.21 0.25
Hint: Are we looking at a sample or a population?
Mean = 0.166%, SD = 0.058%
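The hint matters because the population and sample formulas divide by different numbers. A short check with Python's `statistics` module, using the 15 impurity readings as listed; with these values the mean works out to about 0.166 and the population standard deviation to about 0.058 percentage points.

```python
import statistics

impurity_percent = [0.04, 0.14, 0.17, 0.19, 0.22, 0.06, 0.14, 0.17,
                    0.21, 0.24, 0.12, 0.15, 0.18, 0.21, 0.25]

mean_impurity = statistics.mean(impurity_percent)

# All 15 vials produced that day are measured, so this is a population:
# pstdev divides the sum of squared deviations by N
population_sd = statistics.pstdev(impurity_percent)

# For comparison only: stdev uses the sample formula and divides by n - 1
sample_sd = statistics.stdev(impurity_percent)

print(round(mean_impurity, 3), round(population_sd, 3), round(sample_sd, 3))
```

The sample formula always gives the slightly larger value, and the gap shrinks as the number of observations grows.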
Question Set 5
3. I’m evaluating National Insurance’s and Reliance’s payouts over the last year. I do this by evaluating a set of accident claims from each company’s database. If the payout is linked to the variability of the accident claims, which company will pay out more?
National    Reliance
863         490
903         540
957         560
1041        570
1138        590
1204        600
1354        610
1624        620
1698        660
1745        670
1802        690
1883        630
National will pay out more. National Insurance’s coefficient of variation = 28%, which is higher than Reliance’s = 10%.
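The two coefficients of variation can be reproduced as follows. The sketch assumes the claims are a sample from each company's database, so the sample standard deviation (n - 1 divisor) is used; the helper name is illustrative.

```python
import statistics

national = [863, 903, 957, 1041, 1138, 1204, 1354, 1624, 1698, 1745, 1802, 1883]
reliance = [490, 540, 560, 570, 590, 600, 610, 620, 660, 670, 690, 630]

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_national = coefficient_of_variation(national)
cv_reliance = coefficient_of_variation(reliance)

print(round(cv_national), round(cv_reliance))
```

Because the coefficient is a ratio, it stays comparable even though National's average claim is more than twice Reliance's.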
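Grouped data: Population
Variance = sum of f * (x - mean)^2 / N, where x is the class midpoint and f the class frequency
Grouped Data: Samples
Variance = sum of f * (x - mean)^2 / (n - 1)
Note that in the formulae for samples, N is replaced by (n - 1)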
Question Set 5
2. Maruti Suzuki surveyed a set of families in Kengeri about the number of cars they would like to own next year. The results are tabulated below (average of husband’s & wife’s answers). What is the variance & standard deviation?
No. of cars    Frequency
0.0            2
0.5            14
1.0            23
1.5            7
2.0            4
2.5            2
Hint: Are we looking at a sample or a population?
Standard deviation = 0.55 car
Question Set 5
4. An airline published data on the busiest airports in the US. Assume this is population data. What is the mean? What is the variance & standard deviation of this data?
No. of passengers arriving & departing    No. of airports
20 - under 30                             8
30 - under 40                             7
40 - under 50                             1
50 - under 60                             0
60 - under 70                             3
70 - under 80                             1
Mean = 38, Variance = 251, SD = 15.8
The average no. of passengers per airport is 38, and the SD of the no. of passengers around this mean is 15.8
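For grouped population data, the mean and variance are computed from class midpoints weighted by frequency. The sketch below applies that to the airport table, dividing by N because the question says to treat the data as a population; variable names are illustrative.

```python
import math

# (lower limit, upper limit, number of airports)
classes = [(20, 30, 8), (30, 40, 7), (40, 50, 1), (50, 60, 0), (60, 70, 3), (70, 80, 1)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Grouped mean: frequency-weighted average of the class midpoints
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

# Population variance: frequency-weighted squared deviations divided by N
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
sd = math.sqrt(variance)

print(mean, variance, round(sd, 1))
```

For a sample, the only change would be dividing the weighted sum of squared deviations by n - 1 instead of N.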
Question Set 5
5. A survey was done of the music listeners to a radio station in a state in South India & in North India. Data is tabulated below. Which state has a wider range of ages of people listening to its music?
Age            South India (Frequency)    North India (Frequency)
15 under 20    9                          2
20 under 25    16                         5
25 under 30    27                         9
30 under 35    44                         25
35 under 40    42                         55
40 under 45    23                         60
45 under 50    7                          12
50 under 55    2                          2
The South Indian state has a wider range of ages. Its coefficient of variation = 23%, compared to 16% for North India
Question Set 5
1. There are 2 training programs to be conducted within a specific time. Based on data collected last month:
Training A takes an average of 32.11 hours with a variance of 68.09
Training B takes an average of 19.75 hours with a variance of 71.14
Which training program will have less variability performance-wise?
Training program A, which has the lower coefficient of variation (A = 0.26 versus B = 0.43)
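When only the mean and variance are given, the coefficient of variation is the square root of the variance divided by the mean. A brief sketch for the two training programs, with an illustrative dictionary layout:

```python
import math

# (mean hours, variance) for each programme, as given in the question
programmes = {"A": (32.11, 68.09), "B": (19.75, 71.14)}

# Coefficient of variation = standard deviation / mean
cv = {name: math.sqrt(var) / mean for name, (mean, var) in programmes.items()}
least_variable = min(cv, key=cv.get)

print({name: round(value, 2) for name, value in cv.items()}, least_variable)
```

The two variances are almost equal, so it is the difference in means that separates the programs: the same spread is a much larger share of B's shorter average duration.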
Skewness: Measure of skewness
•Mean is very sensitive to extreme values and will tend towards the ‘tail end’
•Median & Mode are less sensitive
Coefficient of Skewness
•‘Skewness’ refers to the lack of symmetry of the data.
•A greater magnitude of the coefficient of skewness means the data is more skewed, i.e. less symmetrical
•Implications: highly skewed data may not give accurate results, as the formulae used till now assume no skew!
Mean > Median: positive skew
Mean < Median: negative skew
Mean = Median: no skew
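The transcript does not preserve the formula shown for the coefficient of skewness. One widely used version that is consistent with the mean-versus-median rule is Pearson's second coefficient, Sk = 3 * (mean - median) / standard deviation. The sketch below assumes that formula and the sample standard deviation, and applies it to the Parle G prices from Question Set 3.

```python
import statistics

def pearson_skewness(values):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / sd.

    Assumption: this is a stand-in for the formula lost from the slide;
    the sample standard deviation (n - 1 divisor) is used.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

# Parle G prices (Rs): mean 11.76 is above median 11.15
prices = [10.8, 9.8, 10.9, 12.4, 13.3, 11.4, 15.5, 10.8, 12.2, 10.5]
sk = pearson_skewness(prices)

print(round(sk, 2))
```

The coefficient is positive because the single high price of Rs 15.5 pulls the mean above the median; a perfectly symmetric data set such as 1, 2, 3, 4, 5 gives zero.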
Kurtosis
Refers to the ‘peakedness’ of a frequency distribution curve
Also indicates whether the data distributions are ‘heavy tailed’ or ‘light tailed’