scales of measurement module1

Quantitative Methods in Management
Module 1: Scales of Measurement, Data Measures

N J Jaissy

Objective

• Introduction to Statistics
• Learn about Nominal, Ordinal, Interval & Ratio scales
• Review of central tendency measures & measures of dispersion
  • Mean, Median, Mode
  • Dispersion
  • Skewness, Kurtosis

[email protected] 2

Reference textbooks:
• Levin & Rubin: Statistics for Management
• Srivastava, Shenoy & Sharma: Quantitative Techniques for Management Decisions
• Anderson & Sweeney: Business Statistics

Other online resources:
• http://www.discover6sigma.org
• Khan Academy on YouTube


Data & Management

• What do we measure?
• Why do we measure?
• How do we make sense of data?
• Why numbers?


“You can manage what you can measure;
you can measure what you can define;
you can define what you understand.”

Statistics

Descriptive statistics
How to:
• Collect data
• Classify data
• Present data

Inferential statistics
How to interpret data:
• Make estimates
• Test hypotheses
• Draw conclusions
• Make decisions


Descriptive statistics is covered in the initial modules; inferential statistics is covered in the later modules.


Types of data


QUANTITATIVE
E.g.: Age, Weight, Height, Price, Number of customers, Market share, Revenue ($), etc.

QUALITATIVE
• No order. E.g.: Marital status, Gender, Color of clothes
• Order. E.g.: Customer satisfaction levels, Rank in class

Scales & Data

• Data scales & data measures depend on the kind of data being evaluated
• What information do you want from this data?

The answers to these questions determine the ‘scale of measurement’.


Scales of measurement


QUANTITATIVE: Ratio / Interval Scale
E.g.: Age, Weight, Height, Price, Number of customers, Market share, etc.

QUALITATIVE
• No order: Nominal Scale. E.g.: Marital status, Gender
• Order: Ordinal Scale. E.g.: Customer satisfaction levels, Rank in class

Quantitative scales

RATIO
• An absolute zero makes sense (e.g. zero weight)
• Ratios between measurements make sense
• E.g.: a person at 100 kg is twice as heavy as at 50 kg

INTERVAL
• Focus is on the ‘interval’ between 2 measurements, which is equal & measurable
• No absolute zero (the zero point is man-made, artificial)
• E.g.: consecutive years are the same interval apart (about 365 days), but the year 2000 is not double the year 1000


Question Set 1

What scale is being used? Why?
1. Background of students in class (BBA, Engg., etc.)
2. Height of students in class
3. Temperature in various Bangalore districts
4. Test scores in an exam & grading (A+, A, B+, B-, etc.)

Answers:
1. Qualitative: Nominal
2. Quantitative: Ratio
3. Quantitative: Interval
4. Qualitative: Ordinal

Classifying Data

1. Data Array
Arrange the data in ascending or descending order

2. Frequency Distributions
Classify the data into groups of similar values


Measures of Central Tendency & Dispersion

These are key ‘summary’ measures to make sense of data.

• Measures of Central Tendency: look at where the data is ‘centered around’, or the middle point of the data
• Measures of Dispersion: look at the ‘dispersion’, ‘spread’ or scatter of the data


Measures of Central Tendency

• Mean: the average of all the data points
• Median: the data element at the midpoint of the ordered data
• Mode: the most frequently occurring data element

Which measure does one use?


It depends on what you want to find out!

Which measure of central tendency is to be used?

• Mode requires only a frequency count, so it can be applied to any set of data at the nominal, ordinal or interval level.
• Median requires an ordering of items from the highest to the lowest or vice versa. Hence it can be obtained from ordinal or interval data, but not from nominal data like party affiliation, caste or religion.
• Mean is restricted to interval (or ratio) data such as income, age, wage rate & test score.


Which measure of central tendency is to be used?

Research objective:
• Mode is useful to find the most common category. E.g. test score, caste, age
• Mean is useful for further mathematical manipulations.
• Median is useful to find mid values.
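The rules above can be written down as a small lookup, shown here as a rough sketch. The nominal, ordinal and interval rows follow the slide; the ratio row and the names `VALID_MEASURES` and `can_use` are assumptions added for illustration.

```python
# Which measures of central tendency are meaningful at each scale.
# The "ratio" row is an assumption: ratio data supports everything
# that interval data does.
VALID_MEASURES = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},  # assumption, not on the slide
}


def can_use(measure: str, scale: str) -> bool:
    """Return True if `measure` is meaningful for data on `scale`."""
    return measure in VALID_MEASURES[scale]


print(can_use("mean", "nominal"))    # False
print(can_use("median", "ordinal"))  # True
```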


Question Set 2:

1. A factory manager takes the weight of 10 samples of ball bearings being manufactured. What is the most commonly occurring weight? What is the average weight and what is the median weight?

Weight (grams): 12.5, 12, 10, 10, 11, 12.5, 11.5, 10, 11, 10.5

Answer: Mean = 11.1, Median = 11, Mode = 10
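As a quick check of this answer, here is a minimal sketch using Python's standard `statistics` module (the choice of Python is ours, not part of the course material):

```python
from statistics import mean, median, multimode

# Ball-bearing weights in grams, from Question 1
weights = [12.5, 12, 10, 10, 11, 12.5, 11.5, 10, 11, 10.5]

print(mean(weights))       # 11.1
print(median(weights))     # 11.0 (average of the two middle values)
print(multimode(weights))  # [10] (10 g occurs three times)
```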

1. Measures of central tendency: MEAN

Grouped data is data that is classified into groups of similar data values.

Mean = sum of all observations / total number of observations

Weighted Mean = sum of (observation × frequency) / total number of observations


Question Set 2:

2. Child Care Community Nursery is eligible for a grant as long as the average age of its children stays below 9. If these data represent the ages of all children in the child care, do they qualify for the grant?

Ages: 8, 5, 9, 10, 9, 12, 7, 12, 13, 7, 8

Answer: Average age = 9.09 yrs, so NO

Question Set 2

3. The following are the number of days workers in a factory went on leave in June. What is the average leave per worker in June?

# days leave    # of workers
1               2
2               5
3               7
4               2
5               1
6               0

Answer: Average leave per worker = 2.7 days, or approx. 3 days
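The weighted mean for this table can be sketched as follows. The dictionary layout and the helper name `weighted_mean` are illustrative choices, not from the slides.

```python
# Days of leave -> number of workers, from the table above
leave_days = {1: 2, 2: 5, 3: 7, 4: 2, 5: 1, 6: 0}


def weighted_mean(freq_table):
    """Sum of (observation x frequency) divided by the total frequency."""
    total = sum(freq_table.values())
    return sum(x * f for x, f in freq_table.items()) / total


average_leave = weighted_mean(leave_days)  # 46 leave days / 17 workers
print(round(average_leave, 1))             # 2.7
```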

Question Set 2

3. The frequency distribution represents the time in seconds needed to serve a sample of customers at Sukh Sagar. What is the average time to serve customers?

Time (seconds)    Frequency
20-29             6
30-39             16
40-49             21
50-59             29
60-69             25
70-79             22
80-89             11
90-99             7
100-109           4
110-119           0
120-129           2

Answer: Average = 61 sec
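For grouped data like this, each class is usually represented by its midpoint before taking the weighted mean. The sketch below assumes the midpoint is (lower + upper) / 2, e.g. 24.5 for the 20-29 class; that convention reproduces the 61-second answer.

```python
# (lower limit, upper limit, frequency) for each class in the table above
service_times = [
    (20, 29, 6), (30, 39, 16), (40, 49, 21), (50, 59, 29),
    (60, 69, 25), (70, 79, 22), (80, 89, 11), (90, 99, 7),
    (100, 109, 4), (110, 119, 0), (120, 129, 2),
]

n = sum(f for _, _, f in service_times)
# Assumption: every observation in a class sits at the class midpoint
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in service_times) / n

print(n)                       # 143
print(round(grouped_mean, 1))  # 61.0
```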

Question Set 2


4. Micromax ran six local newspaper ads last month. The number of times subscribers saw the ads is shown below. What is the average number of times a subscriber saw a Micromax ad?

No. of times subscribers saw ad    Frequency
0                                  897
1                                  1082
2                                  1325
3                                  814
4                                  307
5                                  253
6                                  198

Answer = 2 times

Geometric Mean

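This is used when we want to find the average growth rate over time when the growth rate changes from period to period. E.g.: over ‘n’ years, compute the average growth rate if the growth factors per year are X1, X2, X3 … Xn. The geometric mean is the nth root of the product of the n growth factors, where each growth factor is 1 + that year's growth rate.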


Question Set 2

5. Rs 100 is deposited in a bank at varying interest rates over 5 years. What is the average growth rate over 5 years?

Principal: Rs 100

Year    Interest rate (%)
Yr 1    7
Yr 2    8
Yr 3    10
Yr 4    12
Yr 5    18

Answer: 10.93%
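The 10.93% figure can be reproduced with a short sketch: convert each rate to a growth factor (1 + rate), take the fifth root of their product, and subtract 1. Variable names are illustrative.

```python
import math

rates_percent = [7, 8, 10, 12, 18]             # yearly interest rates
factors = [1 + r / 100 for r in rates_percent]  # growth factors 1.07, 1.08, ...

# Geometric mean of the growth factors
avg_factor = math.prod(factors) ** (1 / len(factors))
avg_growth_percent = (avg_factor - 1) * 100

print(round(avg_growth_percent, 2))  # 10.93
```

Note that the arithmetic mean of the rates would be 11%, slightly overstating the compounded growth.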

Measures of Central Tendency: MEDIAN

• This is the data element in the center of the array.
• Data is arranged in order, ascending or descending.
• The middlemost element is then chosen.
• Median = the ((n + 1)/2)th item in the data array
• If there is an even number of data elements, the average of the middle two data elements is the median.


Median for grouped data

Md = L + ((N/2 − nb) / nw) × i

Where:

L = lower exact limit of the interval containing Md.

nb = number of scores below L.

nw = number of scores within the interval containing Md.

i = the width of the interval (for ungrouped data i = 1).

N = the number of scores.

Question Set 3:

1. Nilgiris compares the prices charged for a packet of Parle G biscuits in all of its stores. What is the median price? What is the mean or average price?

Price (Rs): 10.8, 9.8, 10.9, 12.4, 13.3, 11.4, 15.5, 10.8, 12.2, 10.5

Answer: Mean = Rs 11.76, Median = Rs 11.15

Question Set 3

2. The following data shows the weights of fish caught in the Cauvery last month. What is the median weight of the fish? What is the average (mean) weight?

Weight (Kg)    Frequency
0 - 24.9       5
25 - 49.9      13
50 - 74.9      16
75 - 99.9      8
100 - 124.9    6

Answer: Median = 59.375 (using the stated lower limit, 50) or 59.325 (using the exact lower limit, 49.95). The second value is the exact answer; the difference is small because the gap between classes is tiny.
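The grouped-median calculation can be sketched as below. It assumes the interpolation Md = L + ((N/2 − nb) / nw) × i, with L the lower limit of the median class, nb the count below it, nw the count inside it and i the class width; this form reproduces the 59.375 answer. The function name and data layout are illustrative.

```python
def grouped_median(classes, width):
    """Interpolated median; `classes` is [(lower_limit, frequency), ...] in ascending order."""
    n = sum(f for _, f in classes)
    below = 0  # nb: number of scores below the current class
    for lower, freq in classes:
        if freq > 0 and below + freq >= n / 2:
            # The median falls in this class: interpolate inside it
            return lower + (n / 2 - below) / freq * width
        below += freq
    raise ValueError("empty frequency table")


fish = [(0, 5), (25, 13), (50, 16), (75, 8), (100, 6)]
print(grouped_median(fish, width=25))  # 59.375
```

Passing the exact lower limits instead (for example 49.95 rather than 50) shifts the result by the same small amount.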

Question Set 3

3. The values of gold per gram sold in the black market over the last 1 year are shown. What is the median class of gold sold? What is the number of the item that represents the median? What is the estimated median value of gold?

Class ($)      Frequency
100 - 149.5    12
150 - 199.5    14
200 - 249.5    27
250 - 299.5    58
300 - 349.5    72
350 - 399.5    63
400 - 449.5    36
450 - 499.5    18

Answer: Median = 327.083 (using the stated lower limit, 300) or 326.83 (using the exact lower limit, 299.75). The second value is the exact answer; the difference is small because the gap between classes is tiny.

Measures of central tendency: Mode

Distribution of applications for admission to M.B.A. by discipline:

Discipline     No. of students
Science        55
Arts           60
Commerce       101
Engineering    45
Medical        5
Total          266

Findings: The commerce category is predominant.

Measures of Central Tendency: MODE

When data is grouped in classes, we assume the mode is in the class with the highest frequency.

An approximation would be to use the midpoint of the ‘modal class interval’.

Question Set 4

1. The table shows the avg. monthly balances of 600 customers. What is the mode of the checking account balances, i.e. the most commonly occurring account balance? What is the median?

Class in $       Frequency
0 - 49.99        78
50 - 99.99       123
100 - 149.99     187
150 - 199.99     82
200 - 249.99     51
250 - 299.99     47
300 - 349.99     13
350 - 399.99     9
400 - 449.99     6
450 - 499.99     4

Answer: Mode = $119, Median = $126.47

Question Set 4

2. The ages of residents of Twin Lakes Retirement Village have this frequency distribution. What is the modal class and modal value of the distribution?

Class        Frequency
47 - 51.9    4
52 - 56.9    9
57 - 61.9    13
62 - 66.9    42
67 - 71.9    39
72 - 76.9    20
77 - 81.9    9

Answer: Modal class = (62 – 66.9), Mode = 66.5
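The modal values in Question Set 4 are consistent with the common interpolation Mo = L + (d1 / (d1 + d2)) × w, where L is the lower limit of the modal class, d1 and d2 are the differences between the modal frequency and the frequencies of the classes just below and just above it, and w is the class width. That formula is an assumption here, since it is not spelled out in the text above; the sketch simply shows it reproducing the answers.

```python
def grouped_mode(lowers, freqs, width):
    """Interpolated mode for equal-width classes (assumed formula, see lead-in)."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - prev_f  # excess over the class below
    d2 = freqs[k] - next_f  # excess over the class above
    return lowers[k] + d1 / (d1 + d2) * width


# Twin Lakes ages: modal class 62-66.9
ages_mode = grouped_mode([47, 52, 57, 62, 67, 72, 77],
                         [4, 9, 13, 42, 39, 20, 9], width=5)
print(round(ages_mode, 1))  # 66.5
```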

Measures of Dispersion

• Range
• Interquartile Range
• Standard deviation
• Coefficient of variation

Dispersion
• Gives an indication of how the data is spread, or its variability.
• If the data set is highly dispersed, the reliability of the ‘mean’ as indicative of the data reduces.


         Data 1   Data 2   Data 3   Data 4   Mean / Average
Set 1    1        4        3        12       5
Set 2    0        6        7        7        5
Set 3    5        5        5        5        5

Note: All the above data sets have a common mean but widely differing values, and therefore different implications.

1. Measures of dispersion: Range

• Range = highest data value – lowest data value
• Percentile: if we divide the ordered data into 100 parts, with each part holding 1/100th of the data, the boundaries between the parts are the percentiles.
• 50th percentile: the point below which 50% of the data points fall.


Interquartile Range

Quartile – we divide the data into 4 parts such that 1/4th of the data falls in each part.
Note: The widths of the quartiles don’t need to be the same!


Interquartile Range


Interquartile Ranges; Grouped data


Interquartile range: Example

An insurance company compiles data on 40 of its customers showing the kilometres driven. What is the interquartile range?

Number of Km driven:
 3600   4200   4700   4900   5300   5700   6700   7300
 7700   8100   8300   8400   8700   8700   8900   9300
 9500   9500   9700  10000  10300  10500  10700  10800
11000  11300  11300  11800  12100  12700  12900  13100
13500  13800  14600  14900  16300  17200  18500  20300
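One way to work this example is sketched below with `statistics.quantiles`. Quartile conventions differ between textbooks and tools; this sketch uses Python's "exclusive" method, which is an assumption and not necessarily the rule intended by the course, so a hand calculation may differ slightly.

```python
from statistics import quantiles

km_driven = [
    3600, 4200, 4700, 4900, 5300, 5700, 6700, 7300, 7700, 8100,
    8300, 8400, 8700, 8700, 8900, 9300, 9500, 9500, 9700, 10000,
    10300, 10500, 10700, 10800, 11000, 11300, 11300, 11800, 12100, 12700,
    12900, 13100, 13500, 13800, 14600, 14900, 16300, 17200, 18500, 20300,
]

# Three cut points split the ordered data into four quarters
q1, q2, q3 = quantiles(km_driven, n=4, method="exclusive")
iqr = q3 - q1  # spread of the middle 50% of customers

print(q1, q3, iqr)  # 8150.0 12850.0 4700.0
```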


Measures of Dispersion: Variance & Standard Deviation

• How far does each data point vary from the mean or the average? This deviation = delta
• Variance = the average of the squared deltas (the sum of the squared deltas divided by the number of data points)
• Standard deviation = square root of variance


Population & Sample


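Population Formulae

Variance: σ² = Σ(x − μ)² / N

Standard Deviation (square root of variance): σ = √( Σ(x − μ)² / N )

Sample Formulae

Variance: s² = Σ(x − x̄)² / (n − 1)

Standard Deviation (square root of variance): s = √( Σ(x − x̄)² / (n − 1) )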
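To make the population/sample distinction concrete, here is a small sketch using the three data sets with a common mean of 5 from the dispersion table. In Python's `statistics` module, `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by n − 1 (sample).

```python
from statistics import pstdev, pvariance, stdev, variance

set_1 = [1, 4, 3, 12]
set_2 = [0, 6, 7, 7]
set_3 = [5, 5, 5, 5]

for name, data in [("Set 1", set_1), ("Set 2", set_2), ("Set 3", set_3)]:
    # Same mean (5) in every set, very different spread
    print(name, pvariance(data), round(pstdev(data), 2))

# Treating Set 1 as a sample divides by n - 1 = 3 instead of N = 4
print(round(variance(set_1), 2), round(stdev(set_1), 2))
```

The sample figures are always a little larger than the population ones for the same data, and the gap shrinks as n grows.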


Coefficient of Variation


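Coefficient of variation = standard deviation / mean, often expressed as a percentage. It is used when you wish to compare the variability of two populations whose means are very different. The population with the higher coefficient has the higher relative variability.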

Question Set 6

1. A purity test was run on a population of 15 vials of a compound produced in a day. Results are shown below. What is the average impurity level & Standard Deviation?

% impurity: 0.04, 0.14, 0.17, 0.19, 0.22, 0.06, 0.14, 0.17, 0.21, 0.24, 0.12, 0.15, 0.18, 0.21, 0.25

Hint: Are we looking at a sample or a population?

Answer: Mean = 0.166%, SD = 0.058%

Question Set 5

3. I’m evaluating National Insurance’s and Reliance’s payouts over the last year. I do this by evaluating a set of accident claims from each company’s database. If the payout is linked to the variability of the accident claims, which company will pay out more?

National    Reliance
863         490
903         540
957         560
1041        570
1138        590
1204        600
1354        610
1624        620
1698        660
1745        670
1802        690
1883        630

Answer: National will pay out more. National Insurance’s coefficient of variation = 28, which is higher than Reliance’s = 10.
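The comparison can be sketched as follows, taking the coefficient of variation as (standard deviation / mean) × 100. The claims are treated as samples from each company's database, so the sample standard deviation (`stdev`) is used; that reading reproduces the 28 and 10 in the answer.

```python
from statistics import mean, stdev

national = [863, 903, 957, 1041, 1138, 1204, 1354, 1624, 1698, 1745, 1802, 1883]
reliance = [490, 540, 560, 570, 590, 600, 610, 620, 660, 670, 690, 630]


def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(sample) / mean(sample) * 100


cv_national = coefficient_of_variation(national)
cv_reliance = coefficient_of_variation(reliance)

print(round(cv_national), round(cv_reliance))  # 28 10
```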

Grouped data: Population


Grouped Data: Samples


Note that in formulae for Samples, N is replaced by (n-1)

Question Set 5

2. Maruti Suzuki surveyed a set of families in Kengeri on the number of cars they would like to own next year. The results are tabulated below (average of the husband’s & wife’s responses). What is the variance & Standard Deviation?

No. of cars    Frequency
0.0            2
0.5            14
1.0            23
1.5            7
2.0            4
2.5            2

Hint: Are we looking at a sample or a population?

Answer: Standard deviation = 0.55 car

Question Set 5

4. An airline published data on the busiest airports in the US. Assume this is population data. What is the mean? What is the variance & Standard Deviation of this data?

No. of passengers arriving & departing    No. of airports
20 - under 30                             8
30 - under 40                             7
40 - under 50                             1
50 - under 60                             0
60 - under 70                             3
70 - under 80                             1

Answer: Mean = 38, Variance = 251, SD = 15.8

The average no. of passengers per airport is 38, and the SD of the no. of passengers from this mean is 15.8.
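A sketch of the grouped population calculation for this table: each class is represented by its midpoint, and the squared deviations are weighted by frequency and divided by N. Variable names are illustrative.

```python
import math

# (lower limit, upper limit, number of airports)
airports = [(20, 30, 8), (30, 40, 7), (40, 50, 1),
            (50, 60, 0), (60, 70, 3), (70, 80, 1)]

n = sum(f for _, _, f in airports)
midpoints = [((lo + hi) / 2, f) for lo, hi, f in airports]

grouped_mean = sum(m * f for m, f in midpoints) / n
# Population variance: divide by N, not n - 1
grouped_variance = sum(f * (m - grouped_mean) ** 2 for m, f in midpoints) / n
grouped_sd = math.sqrt(grouped_variance)

print(grouped_mean, grouped_variance, round(grouped_sd, 1))  # 38.0 251.0 15.8
```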

Question Set 5

5. A survey was done of the music listeners to a radio station in a state in South India & a state in North India. Data is tabulated below. Which state has a wider range of ages of people listening to its music?

Age            South India (Frequency)    North India (Frequency)
15 under 20    9                          2
20 under 25    16                         5
25 under 30    27                         9
30 under 35    44                         25
35 under 40    42                         55
40 under 45    23                         60
45 under 50    7                          12
50 under 55    2                          2

Answer: The South Indian state has a wider range of ages. Its coefficient of variation = 23%, compared to 16% for North India.

Question Set 5

1. There are 2 training programs to be conducted within a specific time. Based on data collected last month:
• Training A takes an average of 32.11 hours with a variance of 68.09
• Training B takes an average of 19.75 hours with a variance of 71.14
Which training program will have less variability performance-wise?

Answer: Training program A, which has the lower coefficient of variation (A = 0.26 versus B = 0.43).

Skewness: Measure of skewness


• Mean is very sensitive to extreme values and will tend towards the ‘tail end’
• Median & Mode are less sensitive

Coefficient of Skewness


• ‘Skewness’ refers to the asymmetry (lack of symmetry) of the data.

• A greater magnitude of the coefficient of skewness means the data is more skewed, or less symmetrical.

• Implications: Highly skewed data may not give accurate results, as the frequency formulae used till now assume no skew!

Mean > Median: positive skew
Mean < Median: negative skew
Mean = Median: no skew
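The mean-versus-median rule above can be turned into a number. One common choice is Pearson's second coefficient of skewness, 3 × (mean − median) / standard deviation. Whether this is the coefficient originally shown here is an assumption; the sketch, with made-up data, only illustrates that its sign follows the rule.

```python
from statistics import mean, median, pstdev


def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    return 3 * (mean(data) - median(data)) / pstdev(data)


right_tailed = [1, 2, 3, 4, 10]  # hypothetical data: mean 4 > median 3
symmetric = [1, 2, 3, 4, 5]      # hypothetical data: mean 3 = median 3
left_tailed = [1, 7, 8, 9, 10]   # hypothetical data: mean 7 < median 8

print(pearson_skewness(right_tailed) > 0)  # True: positive skew
print(pearson_skewness(symmetric) == 0)    # True: no skew
print(pearson_skewness(left_tailed) < 0)   # True: negative skew
```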

Kurtosis

• Refers to the ‘peakedness’ of a frequency distribution curve
• Also indicates whether the data distributions are ‘heavy tailed’ or ‘light tailed’

