brm unit 7


Upload: raja-sekhar

Post on 05-Sep-2015


DESCRIPTION

Business Research Methods

TRANSCRIPT

  • DATA ANALYSIS By G. Raja Sekhar

  • INTRODUCTION

    First the sample is drawn and the data are collected, for example through interviews. The collected data then go for processing. After the data have been processed, they must be analysed.

    Analysis refers to the computation of certain indices or measures, along with a search for patterns of relationship that exist among the data groups.

  • Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. It is useful in research when communicating the results of experiments.

    In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called inferential statistics.

    If the inference holds true, then the descriptive statistics of the new data increase the soundness of that hypothesis.

  • PROCESSING STAGE

    The processing stage includes the editing, coding, classification and tabulation of the collected data so that they are ready for analysis.

    The collected data must be arranged. Some of the received data are useful and others are not, so in this step the data must be edited, coded, classified and tabulated.

  • EDITING

    The purpose of editing is the careful scrutiny of all collected data to ensure that they are complete, error-free and readable.

  • CODING

    The purpose of coding is to assign codes (numbers) to each category of answers.

  • CLASSIFICATION

    The purpose of classification is to divide the received data into groups on the basis of their common characteristics.

  • TABULATION

    The purpose of tabulation is to summarize the data and display them in appropriate tables so that further analysis is facilitated.

  • ANALYSIS OF DATA

    Whenever a mass of data is collected, statistics comes into play: it provides the procedures that support both the processing and the analysis of the data.

  • STATISTICS IN RESEARCH

    The role of statistics in research is to function as a tool in designing research, analysing its data and drawing conclusions therefrom.

    To achieve the objective of the research, we have to go a step further and develop certain indices or measures to summarise the collected and classified data.

    Only after this can we adopt the process of generalisation from small groups (i.e., samples) to the population. In fact, there are two major areas of statistics, viz., descriptive statistics and inferential statistics.

  • DESCRIPTIVE STATISTICS AND INFERENTIAL STATISTICS

    Descriptive statistics concern the development of certain indices from the raw data, whereas inferential statistics are concerned with the process of generalisation.

    Inferential statistics are also known as sampling statistics and are mainly concerned with two major types of problems:

    The estimation of population parameters, and

    The testing of statistical hypotheses.

  • DESCRIPTIVE STATISTICS

    Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way.

    Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made.

    Descriptive statistics are very important because if we simply presented our raw data it would be hard to visualize what the data were showing, especially if there was a lot of it.

  • Descriptive statistics allow us to describe data properly through statistics and graphs.

    Typically, there are two general types of statistic that are used to describe data:

    Measures of central tendency

    Measures of spread

  • MEASURES OF CENTRAL TENDENCY

    Measures of central tendency (or statistical averages) tell us the point about which items have a tendency to cluster. Such a measure is considered the most representative figure for the entire mass of data.

    The mean, median and mode are all valid measures of central tendency, but under different conditions some measures of central tendency become more appropriate to use than others.

  • ARITHMETIC MEAN

    The arithmetic mean is defined as the sum of the items divided by the number of items in a series.

    The arithmetic mean is the most widely used and practical method for the measurement of central tendency. It is further divided into

    Simple arithmetic mean

    Weighted arithmetic mean

  • SIMPLE ARITHMETIC MEAN

    The simple arithmetic mean is the total of all the items divided by the number of items.

    Individual series, direct method: in an individual series, the following formula is used

    $\bar{X} = \frac{\sum X}{n}$

    where $\bar{X}$ = arithmetic mean, $\sum X$ = total of the items in the series, and $n$ = number of items.

    Example: find the mean of X: 10, 15, 12, 9, 6, 8

    $\bar{X} = \frac{10 + 15 + 12 + 9 + 6 + 8}{6} = \frac{60}{6}$

    $= 10$
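    The direct method can be sketched in a few lines of Python. This is only an illustration; the function name `arithmetic_mean` is an assumption made for the example and does not come from the slides.

```python
def arithmetic_mean(items):
    """Direct method: sum of the items divided by the number of items."""
    if not items:
        raise ValueError("the mean of an empty series is undefined")
    return sum(items) / len(items)


# Series from the example above.
x = [10, 15, 12, 9, 6, 8]
print(arithmetic_mean(x))  # 10.0
```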

  • Indirect method: the indirect method is used when the number of items is very large; to simplify the data, deviations are taken from an assumed mean. For a frequency distribution, the mean is calculated from the following formula

    $\bar{X} = \frac{\sum fX}{N}$

    where $\bar{X}$ = arithmetic mean, $f$ = frequencies, $X$ = variable or mid points of the class intervals, and $N$ = total number of frequencies in the series.

  • EXAMPLE

    X                  20   30   40   50   60   70
    No. of Students     8   12   20   10    6    4

     X     f     fX
    20     8    160
    30    12    360
    40    20    800
    50    10    500
    60     6    360
    70     4    280

    N = 60, $\sum fX$ = 2460

    $\bar{X} = \frac{\sum fX}{N} = \frac{2460}{60} = 41$
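    The same calculation for a frequency distribution might look as follows in Python. The name `frequency_mean` is a hypothetical helper chosen for this sketch.

```python
def frequency_mean(values, frequencies):
    """Mean of a frequency distribution: sum(f * X) / N, where N = sum(f)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / n


# Data from the example above.
x_values = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]
print(frequency_mean(x_values, students))  # 41.0
```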

  • INCLUSIVE SERIES

    The difference between the upper limit of one interval and the lower limit of the next interval is noted; half of that difference is then deducted from the lower limit of every interval and added to the upper limit of every interval.

    Example:

    Class interval   4-6   7-9   10-12   13-15   16-18   19-21   22-24
    Frequency         1     3      7      15      11       3       2

  • Class interval   Frequency (f)   Mid point (m)    fm
     3.5-6.5              1               5            5
     6.5-9.5              3               8           24
     9.5-12.5             7              11           77
    12.5-15.5            15              14          210
    15.5-18.5            11              17          187
    18.5-21.5             3              20           60
    21.5-24.5             2              23           46

    Total                42                          609

    $\bar{X} = \frac{\sum fm}{N} = \frac{609}{42} = 14.5$
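    The correction for an inclusive series can be sketched in Python as below. Both function names are assumptions for illustration, and the sketch assumes that the gap between consecutive classes is the same throughout the series.

```python
def to_class_boundaries(intervals):
    """Turn inclusive limits such as (4, 6), (7, 9) into continuous boundaries.

    Assumes the gap between consecutive classes is the same everywhere.
    """
    gap = intervals[1][0] - intervals[0][1]  # next lower limit - this upper limit
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in intervals]


def grouped_mean(boundaries, frequencies):
    """Mean of a grouped series using class mid points: sum(f * m) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
    return sum(f * m for f, m in zip(frequencies, midpoints)) / sum(frequencies)


intervals = [(4, 6), (7, 9), (10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]
frequencies = [1, 3, 7, 15, 11, 3, 2]
boundaries = to_class_boundaries(intervals)
print(boundaries[0])                           # (3.5, 6.5)
print(grouped_mean(boundaries, frequencies))   # 14.5
```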

  • Open-end intervals: open-end intervals are those in which the lower limit of the first class and the upper limit of the last class are not known. In such a case, we cannot find the arithmetic mean unless we make an assumption about the unknown limits. The assumption would naturally depend upon the class interval.

    Unequal intervals: if the class intervals are not equal, first regroup the data into equal class intervals, then solve the problem.

    Example:

    X   0-2   2-5   5-6   6-8   8-10   10-20   20-21   21-25
    F    1     2     6     2     3       6       1       4

    Regrouping into classes of width 5 (the frequency 6 of the class 10-20 is split equally between 10-15 and 15-20):

  •   X       f      m       fm
     0-5      3     2.5      7.5
     5-10    11     7.5     82.5
    10-15     3    12.5     37.5
    15-20     3    17.5     52.5
    20-25     5    22.5    112.5

    Total    25            292.5

    $\bar{X} = \frac{\sum fm}{N} = \frac{292.5}{25} = 11.7$
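    The regrouping step can be sketched in Python as well. This is one possible approach, not necessarily the one intended in the slides: it assumes the items are spread evenly inside each original class, so a class that straddles two new classes is split in proportion to the overlap. The names `regroup` and `grouped_mean` are hypothetical.

```python
from fractions import Fraction


def regroup(classes, width):
    """Redistribute frequencies into equal classes of the given width.

    Assumption: items are spread evenly inside each original class.
    """
    start = classes[0][0][0]
    stop = classes[-1][0][1]
    regrouped = []
    lower = start
    while lower < stop:
        upper = lower + width
        freq = Fraction(0)
        for (a, b), f in classes:
            overlap = max(0, min(b, upper) - max(a, lower))
            freq += Fraction(f * overlap, b - a)
        regrouped.append(((lower, upper), freq))
        lower = upper
    return regrouped


def grouped_mean(classes):
    """Mean from class mid points: sum(f * m) / sum(f)."""
    total_f = sum(f for _, f in classes)
    total_fm = sum(f * Fraction(lo + hi, 2) for (lo, hi), f in classes)
    return total_fm / total_f


original = [((0, 2), 1), ((2, 5), 2), ((5, 6), 6), ((6, 8), 2),
            ((8, 10), 3), ((10, 20), 6), ((20, 21), 1), ((21, 25), 4)]
equal_classes = regroup(original, 5)
mean = grouped_mean(equal_classes)
print([int(f) for _, f in equal_classes])  # [3, 11, 3, 3, 5]
print(float(mean))                         # 11.7
```

    Exact fractions are used so that the split frequencies and the final mean are not disturbed by floating-point rounding.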

  • WEIGHTED ARITHMETIC MEAN

    The weighted arithmetic mean (WAM) is the arithmetic mean calculated after assigning weights to the different items in a series according to their relative importance. It can be calculated as follows.

    First of all, find the product of each item with its respective weight, that is, WX.

    Take the total of WX as $\sum WX$.

    Divide the value of $\sum WX$ by $\sum W$ to get the weighted arithmetic mean.

    $\bar{X}_w = \frac{\sum WX}{\sum W}$

    where $\bar{X}_w$ = weighted arithmetic mean, $W$ = weights assigned to the different items, $\sum WX$ = sum of the products of the items with their respective weights, and $\sum W$ = sum of the weights.

  • EXAMPLE

    A train runs 25 km at a speed of 30 kmph and another 50 km at a speed of 40 kmph. Due to repairs of the track it travels at a speed of 10 kmph for 6 minutes, and finally covers the remaining distance of 24 km at a speed of 60 kmph. What is the average speed in kmph?

    Solution: Time taken in covering 25 km at a speed of 30 kmph = 50 minutes, and so on. Therefore, taking the times as weights:

    Speed in kmph (X)   Time taken in minutes (W)     WX
           30                      50                1500
           40                      75                3000
           10                       6                  60
           60                      24                1440

         Total                    155                6000

    $\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{6000}{155}$

    $\approx 38.71$ kmph
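    The train problem can be checked with a short Python sketch. The function name `weighted_mean` is an assumption for illustration; the time for the last leg (24 km at 60 kmph) is taken as 24 minutes.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of the weights must be non-zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


speeds_kmph = [30, 40, 10, 60]   # X
minutes = [50, 75, 6, 24]        # W: time spent at each speed
average_speed = weighted_mean(speeds_kmph, minutes)
print(round(average_speed, 2))   # 38.71
```

    As a cross-check, the train covers 25 + 50 + 1 + 24 = 100 km in 155 minutes, which gives the same average speed.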

  • MERITS OF ARITHMETIC MEAN

    It is simple to understand and easy to calculate.

    It is rigid in nature.

    It includes all the items in its calculation.

    It has further applicability for mathematical treatment.

    It is universal in nature.

  • DEMERITS OF ARITHMETIC MEAN

    It cannot be located graphically.

    It is not suitable for open-ended classes.

    It is most useful when the distribution is approximately normal.

    It is affected by extreme values.

  • MEDIAN

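    When the observations are arranged in ascending or descending order of magnitude, the middle value is known as the median of these observations.

    Let $x_1, x_2, \ldots, x_n$ be n observations arranged in ascending order of magnitude. The median is defined as the middle-most term, that is, the value of x at the position $\frac{n+1}{2}$.

    We can write

    Me = value of the $\left(\frac{n+1}{2}\right)$th item (when n is odd)

    Me = $\frac{1}{2}\left[\text{value of the } \left(\frac{n}{2}\right)\text{th item} + \text{value of the } \left(\frac{n}{2}+1\right)\text{th item}\right]$ (when n is even)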

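  • STEPS TO CALCULATE MEDIAN

    First of all, arrange the data in order, whether ascending or descending.

    Put the value of N into the formula to find the value of Me.

    Example: Find the median of the following data: 391, 384, 591, 407, 672, 522, 777, 753, 1490

    Solution: X = 384, 391, 407, 522, 591, 672, 753, 777, 1490

    Median = value of the $\left(\frac{9+1}{2}\right)$th item

    = value of the 5th item = 591

    Example: Find the median of the following data: 391, 384, 591, 407, 672, 522, 777, 753, 1490, 222

    Solution: X = 222, 384, 391, 407, 522, 591, 672, 753, 777, 1490

    Median = value of the $\left(\frac{10+1}{2}\right)$th item

    = value of the 5.5th item, that is, the average of the 5th and 6th items = $\frac{522 + 591}{2}$ = 556.5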
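    Both cases (odd and even n) can be sketched in one small Python function. The name `median_of_series` is hypothetical; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median_of_series(items):
    """Median of an individual series: sort, then take the middle item(s)."""
    ordered = sorted(items)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n + 1) / 2 th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle items


odd_series = [391, 384, 591, 407, 672, 522, 777, 753, 1490]
even_series = odd_series + [222]
print(median_of_series(odd_series))   # 591
print(median_of_series(even_series))  # 556.5
```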

  • IN CASE OF DISCRETE SERIES

    In a discrete series, the following formula is used

    Median = Me = value of the $\left(\frac{N+1}{2}\right)$th item

    Steps to calculate

    Arrange the data in order, whether ascending or descending.

    Then compute the cumulative frequencies.

    Put the value of N into the formula.

    In the cumulative frequency table, look for the cumulative frequency just greater than the value found in step 3; the corresponding value of the variable is the median.

  • EXAMPLE

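    Find the median from the following data

    Income           100   150   80   200   250   180
    No. of Persons    12    13    8    10     3    15

      X     F    Cf
     80     8     8
    100    12    20
    150    13    33
    180    15    48
    200    10    58
    250     3    61

    Median = value of the $\left(\frac{N+1}{2}\right)$th item

    = value of the $\left(\frac{61+1}{2}\right)$th item

    = value of the 31st item

    The cumulative frequency just greater than 31 is 33, and the corresponding value of the variable is 150. Therefore, median = 150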
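    The cumulative-frequency procedure can be sketched in Python as follows. The function name `discrete_median` is an assumption for illustration, and the data are the values that appear in the worked table.

```python
def discrete_median(values, frequencies):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))  # step 1: arrange in ascending order
    n = sum(frequencies)
    position = (n + 1) / 2                    # step 3: the (N + 1) / 2 th item
    cumulative = 0
    for value, freq in pairs:                 # step 2: running cumulative frequency
        cumulative += freq
        if cumulative >= position:            # step 4: first Cf that reaches the position
            return value
    raise ValueError("the series is empty")


incomes = [80, 100, 150, 180, 200, 250]
persons = [8, 12, 13, 15, 10, 3]
print(discrete_median(incomes, persons))  # 150
```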

  • MERITS OF MEDIAN

    It is very simple to understand.

    Its calculation is very easy and simple.

    It is not affected by extreme items.

    It can be located graphically very easily.

    It is a suitable average for open-ended class intervals.

    It deals with quality more than quantity, so it suits characteristics that can be ranked but not measured.

  • DEMERITS OF MEDIAN

    Compared with other averages, it needs extra labour to arrange the data in ascending or descending order.

    It does not involve all the observations in its calculation, which affects how well it represents the series.

    It cannot be located exactly in a series with an even number of items.

    It is difficult to calculate when the series has a very small or a very large number of items.

    Unlike other averages, it has no further mathematical applicability.

  • MODE

    The mode is defined as the size of the variable which occurs most frequently, or the point of maximum frequency, or the point of greatest density. It is also an important measure of central tendency.

    According to Kenney and Keeping, "The value of the variable which occurs most frequently in a distribution is called the mode."

    For a grouped frequency distribution the mode (Z) is calculated as

    $Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2} \times I$

    where $L_1$ = lower limit of the modal class, $F_1$ = frequency of the modal class (the highest frequency), $F_0$ = frequency of the class preceding the modal class, $F_2$ = frequency of the class succeeding the modal class, and $I$ = class interval (the difference between the upper and lower limits of the modal class).

  • EXAMPLE

    X: 19, 21, 20, 19, 19, 19, 25, 3, 1, 9, 2, 8, 5, 8

    Solution: 19 is the value which occurs most frequently (four times). Therefore Z = 19

    Having two modes is called "bimodal". Having more than two modes is called "multimodal".
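    Counting occurrences is easy to sketch in Python. The helper `modes` is a hypothetical name; it returns every value that shares the highest frequency, so a bimodal or multimodal series yields more than one value.

```python
from collections import Counter


def modes(items):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(items)
    if not counts:
        raise ValueError("the mode of an empty series is undefined")
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


x = [19, 21, 20, 19, 19, 19, 25, 3, 1, 9, 2, 8, 5, 8]
print(modes(x))                # [19]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal series
```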

  • GROUPING MODE

    When all values appear the same number of times, the idea of a mode is not useful. But you could group them to see if one group has more values than the others.

    Example: {4, 7, 11, 16, 20, 22, 25, 26, 33}

    Each value occurs once, so let us try to group them. We can try groups of 10:

    0-9: 2 values (4 and 7)

    10-19: 2 values (11 and 16)

    20-29: 4 values (20, 22, 25 and 26)

    30-39: 1 value (33)

    In groups of 10, the "20s" appear most often, so we could choose 25 as the mode.
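    The grouping idea can be sketched in Python as below. The function name `modal_group` is an assumption, the sketch assumes non-negative integer values, and when two groups tie it simply reports the one encountered first.

```python
from collections import Counter


def modal_group(items, width):
    """Group values into bins of the given width and return the fullest bin.

    Returns (lowest value of the bin, highest value of the bin, count).
    """
    counts = Counter((x // width) * width for x in items)
    start, count = max(counts.items(), key=lambda pair: pair[1])
    return start, start + width - 1, count


values = [4, 7, 11, 16, 20, 22, 25, 26, 33]
print(modal_group(values, 10))  # (20, 29, 4)
```

    The mid point of the fullest group, roughly 25 here, can then be taken as the mode.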

  • EXAMPLE

    Calculate the mode for the following distribution

    Gross profit as % of sales   0-7   7-14   14-21   21-28   28-35   35-42   42-49
    No. of companies              19    25     36      72      51      43      28

    Solution: Here, the largest frequency is 72. It lies in the class 21-28, so the modal class is 21-28 and the lower limit of the modal class is 21. Thus

  •   X       F
     0-7     19
     7-14    25
    14-21    36   (F0)
    21-28    72   (F1)
    28-35    51   (F2)
    35-42    43
    42-49    28

    $Z = 21 + \frac{72 - 36}{2(72) - 36 - 51} \times 7 = 21 + \frac{36}{57} \times 7 \approx 25.42$
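    With the symbols defined for the mode (L1, F1, F0, F2 and I), the usual interpolation formula Z = L1 + (F1 - F0) / (2 F1 - F0 - F2) × I can be sketched in Python. The function name `grouped_mode` is hypothetical, and the sketch assumes a single modal class whose frequency is strictly higher than that of at least one neighbour.

```python
def grouped_mode(classes, frequencies):
    """Mode of a grouped series: Z = L1 + (F1 - F0) / (2*F1 - F0 - F2) * I."""
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class index
    lower, upper = classes[k]
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                     # preceding class
    f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0  # succeeding class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


classes = [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35), (35, 42), (42, 49)]
companies = [19, 25, 36, 72, 51, 43, 28]
print(round(grouped_mode(classes, companies), 2))  # 25.42
```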

  • MERITS OF MODE

    It is very simple to understand and easy to calculate because it is a positional average.

    It is based on quality rather than quantity.

    It is least affected by extreme values.

    Where there is a large concentration of items around a value, that value is a good representation of the items.

    It is possible to show the modal value graphically.

  • DEMERITS OF MODE

    It is not a suitable measure of central tendency where the number of items is very small.

    It has no further mathematical applicability.

    Given the modal values of two or more series, it is not possible to calculate the modal value of the combined series.

    The sum of the items cannot be found by multiplying the modal value by the number of items.

  • SUMMARY OF WHEN TO USE THE MEAN, MEDIAN AND MODE

    Type of variable               Best measure of central tendency
    Nominal                        Mode
    Ordinal                        Median
    Interval/Ratio (not skewed)    Mean
    Interval/Ratio (skewed)        Median

  • MEASURES OF SPREAD

    A measure of spread, sometimes also called a measure of dispersion, is used to describe the variability in a sample or population. It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.

    Measures of spread are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation.

  • The measures of dispersion may be expressed as

    Absolute Measure of Dispersion

    Relative Measure of Dispersion

    Absolute Measure of Dispersion: it is obtained when the deviations of the actual values from a measure of central tendency are taken. These measures are expressed in the same statistical units in which the original values are stated, that is, kilograms, meters, rupees, years, months, times, etc. But absolute measures of dispersion cannot be used for comparison of variations between two series.

  • Relative Measures of Dispersion: a measure of relative dispersion is defined as the ratio of a measure of absolute dispersion to an appropriate average. Being ratios, these measures are pure numbers that do not depend on the units of the original values, so they can easily be used for comparison of variation between two series. Such a measure is also called the coefficient of dispersion.

  • RANGE

    The simplest possible measure of dispersion is the range, which is the difference between the greatest and least values of the variable.

    The range may be computed by these methods

    Simple range

    Interquartile range

    Percentile range and

    Decile range

  • SIMPLE RANGE

    It is the difference between the value of the smallest item and the value of the largest item included in a distribution.

    Example: in the series 8, 9, 14, 10, 12, 7; range = 14 - 7 = 7

    Coefficient of dispersion: the relative measure of the range is called the coefficient of dispersion and is obtained by dividing the range by the sum of the extreme values.

    Coefficient of dispersion = $\frac{R_1 - R_2}{R_1 + R_2}$

    where $R_1$ is the maximum and $R_2$ is the minimum value of the variate.
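    The simple range and its relative measure, the range divided by the sum of the extreme values, can be sketched in Python. Both function names are assumptions made for this illustration.

```python
def simple_range(items):
    """Absolute measure: largest item minus smallest item."""
    return max(items) - min(items)


def coefficient_of_range(items):
    """Relative measure: (max - min) / (max + min)."""
    largest, smallest = max(items), min(items)
    return (largest - smallest) / (largest + smallest)


series = [8, 9, 14, 10, 12, 7]
print(simple_range(series))                    # 7
print(round(coefficient_of_range(series), 3))  # 0.333
```

    For this series the coefficient is 7 / 21, a pure number with no units, which is what makes it usable for comparing series measured on different scales.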

  • INTERQUARTILE RANGE

    The interquartile range is another range used as a measure of spread. The difference between the upper and lower quartiles (Q3 - Q1), which is called the interquartile range, also indicates the dispersion of a data set. The interquartile range spans 50% of a data set, and eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed.

    Interquartile range = difference between upper quartile (Q3) and lower quartile (Q1)

  • EXAMPLE

    A year ago, Angela began working at a computer store. Her supervisor asked her to keep a record of the number of sales she made each month.

    The following data set is a list of her sales for the last 12 months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.

    Find:

    The range

    The upper and lower quartiles

    The interquartile range

  • Range = difference between the highest and lowest values

    = 57 - 1 = 56

    Arranged in ascending order, the data are 1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57.

    Lower quartile = value of the middle of the first half of the data

    Q1 = the median of 1, 11, 15, 19, 20, 24 = (third + fourth observations) / 2

    = (15 + 19) / 2 = 17

    Upper quartile = value of the middle of the second half of the data

    Q3 = the median of 28, 34, 37, 47, 50, 57 = (third + fourth observations) / 2

    = (37 + 47) / 2 = 42

    Interquartile range = Q3 - Q1 = 42 - 17 = 25
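    The quartile calculation above (median of the lower half and median of the upper half) can be sketched in Python. The helper names are hypothetical. Other quartile conventions exist, and library routines that use them may return slightly different values for the same data.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(items):
    """Q1 and Q3 as the medians of the lower and upper halves of the data.

    For an odd number of items the middle item is left out of both halves.
    """
    ordered = sorted(items)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two items are needed")
    half = n // 2
    return median_of_sorted(ordered[:half]), median_of_sorted(ordered[n - half:])


sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
q1, q3 = quartiles(sales)
print(max(sales) - min(sales))  # 56
print(q1, q3, q3 - q1)          # 17.0 42.0 25.0
```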

  • MERITS OF RANGE

    Range is a very easy and simple measure to understand and calculate. Therefore, even a layman can understand it without any difficulty.

    It is rigidly defined to some extent.

  • The disadvantage of using the range is that it does not measure the spread of the majority of values in a data set; it only measures the spread between the highest and lowest values. As a result, other measures are required in order to give a better picture of the data spread. The range is an informative tool used as a supplement to other measures such as the standard deviation or semi-interquartile range, but it should rarely be used as the only measure of spread.