AGEC 405 Lecture III

Analysing Data

Descriptive Statistics

Descriptive statistics

Descriptive statistics provides an objective way of describing and summarising data

Data description

Two key measures of data description

• Location – to show where the centre of the data is, giving some kind of typical or average value

• Dispersion (spread) – to show how spread out the data is around this centre, giving an idea of the range of values.

Measures of location

• Three basic measures of location are used:
– Arithmetic mean: the average value
– Median: the middle value
– Mode: the most frequent value

• Three data structures:
– Untabulated (raw data)
– Tabulated (ungrouped)
– Tabulated (grouped)

For use with Curwin & Slater, Quantitative Methods for Business Decisions, 6th Edition, ISBN: 9781844805747

Mean - Untabulated (raw data)

The mean for untabulated data is obtained by dividing the sum of all values by the number of values in the data set. Thus,

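Mean for population data:

$$\mu = \frac{\sum x}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of all values, $N$ is the population size, and $n$ is the sample size.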

Example 1

The following are the ages of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

Solution 1

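$$\mu = \frac{\sum x}{N} = \frac{53 + 32 + 61 + 27 + 39 + 44 + 49 + 57}{8} = \frac{362}{8} = 45.25 \text{ years}$$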

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
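The calculation in Solution 1 can be checked with a few lines of Python. This is a minimal sketch; the variable names (`ages`, `mu`) are illustrative and not part of the lecture.

```python
from statistics import mean

# Ages of all eight employees from Example 1 (a population, so N = 8).
ages = [53, 32, 61, 27, 39, 44, 49, 57]

total = sum(ages)   # sum of all values
count = len(ages)   # number of values
mu = total / count  # mean = sum of values / number of values

# The standard-library helper should agree with the manual calculation.
assert mu == mean(ages)
print(f"Sum = {total}, N = {count}, mean age = {mu} years")
```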

Mean - tabulated (ungrouped data)

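Sample mean of data:

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{n}$$

where x is the value of the observation, f is the frequency of the observation, and n = Σf is the total number of observations.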

Example

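The number of working days lost by employees in the last quarter is tabulated below. Calculate the average number of working days lost.

Number of days (x)   Number of employees (f)   fx
0                    410                       0
1                    430                       430
2                    290                       580
3                    180                       540
4                    110                       440
5                    20                        100
Total                1440                      2090

$$\bar{x} = \frac{\sum fx}{n} = \frac{2090}{1440} \approx 1.451 \text{ days lost}$$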

• Mean can be affected by outliers

Outliers

Definition: Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.

Example 3

Table 2 lists the 2000 populations (in thousands) of the five Pacific states.

Table 2

State        Population (thousands)
Washington   5894
Oregon       3421
Alaska       627
Hawaii       1212
California   33,872   (an outlier)

Notice that the population of California is very large compared to the populations of the other four states. Hence, it is an outlier. Show how the inclusion of this outlier affects the value of the mean.

Solution 3

If we do not include the population of California (the outlier), the mean population of the remaining four states (Washington, Oregon, Alaska, and Hawaii) is

$$\text{Mean} = \frac{5894 + 3421 + 627 + 1212}{4} = \frac{11{,}154}{4} = 2788.5 \text{ thousand}$$

Now, to see the impact of the outlier on the value of the mean, we include the population of California and find the mean population of all five Pacific states. This mean is

$$\text{Mean} = \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = \frac{45{,}026}{5} = 9005.2 \text{ thousand}$$

Thus, including California more than triples the mean, from 2788.5 thousand to 9005.2 thousand.
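The effect of the outlier in Example 3 can be reproduced directly. The sketch below uses the Table 2 populations; the helper name `arithmetic_mean` is an illustrative choice, not something defined in the lecture.

```python
# 2000 populations (in thousands) of the five Pacific states, from Table 2.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # the outlier
}

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    values = list(values)
    return sum(values) / len(values)

# Mean of the four states excluding the outlier.
mean_without = arithmetic_mean(
    pop for state, pop in populations.items() if state != "California"
)
# Mean of all five states, outlier included.
mean_with = arithmetic_mean(populations.values())

print(f"Without California: {mean_without} thousand")
print(f"With California:    {mean_with} thousand")
print(f"Ratio: {mean_with / mean_without:.2f}")
```

A single extreme value moves the mean a long way, which is why the mean is a poor summary for data sets of this kind.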

Mean - tabulated (grouped data)

Mean for population data:

$$\mu = \frac{\sum fx}{N}$$

Mean for sample data:

$$\bar{x} = \frac{\sum fx}{n}$$

where x is the midpoint and f is the frequency of a class.

Calculate the mean of the grouped data below

Weight (oz)   Class midpoint (x)   Frequency (f)   fx
19.2-19.4     19.3                 1               19.3
19.5-19.7     19.6                 2               39.2
19.8-20.0     19.9                 8               159.2
20.1-20.3     20.2                 4               80.8
20.4-20.6     20.5                 3               61.5
20.7-20.9     20.8                 2               41.6
Total                              Σf = n = 20     Σfx = 401.6

• n = 20
• Σfx = 401.6

$$\bar{x} = \frac{\sum fx}{n} = \frac{401.6}{20} = 20.08 \text{ oz}$$
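The grouped-data mean above can be sketched in Python as follows. The midpoints and frequencies come from the weight table; the variable names are illustrative. Results are rounded because the decimal midpoints are not stored exactly in binary floating point.

```python
# (class midpoint x, frequency f) pairs from the weight table.
classes = [
    (19.3, 1),
    (19.6, 2),
    (19.9, 8),
    (20.2, 4),
    (20.5, 3),
    (20.8, 2),
]

n = sum(f for _, f in classes)           # total frequency, sum of f
sum_fx = sum(x * f for x, f in classes)  # sum of midpoint * frequency
grouped_mean = sum_fx / n                # mean = sum(fx) / n

print(f"n = {n}, sum(fx) = {sum_fx:.1f}, mean = {grouped_mean:.2f} oz")
```

Because each observation is represented by its class midpoint, the result is an approximation of the mean of the underlying raw weights.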

Median

Definition: The median is the value of the middle term in a data set that has been ranked in increasing order.

Median cont.

The calculation of the median consists of the following two steps:

1. Rank the data set in increasing order

2. Find the middle term in a data set with n values. The value of this term is the median.

Median cont.

Value of Median for Ungrouped Data

$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th term in a ranked data set}$$

Example 6

The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:

10 5 19 8 3

Find the median.

Solution 6

First, we rank the given data in increasing order as follows:

3 5 8 10 19

There are five observations in the data set. Consequently, n = 5 and

$$\text{Position of the middle term} = \frac{n+1}{2} = \frac{5+1}{2} = 3$$

Solution 6

Therefore, the median is the value of the third term in the ranked data.

3 5 8 10 19

The median weight loss for this sample of five members of this health club is 8 pounds.

Median

Example 7

Table 8 lists the total revenue (in millions of dollars) for the 12 top-grossing North American concert tours of all time.

Find the median revenue for these data.

Table 8

Tour                        Artist
Steel Wheels, 1989          The Rolling Stones
Magic Summer, 1990          New Kids on the Block
Voodoo Lounge, 1994         The Rolling Stones
The Division Bell, 1994     Pink Floyd
Hell Freezes Over, 1994     The Eagles
Bridges to Babylon, 1997    The Rolling Stones
Popmart, 1997               U2
Twenty-Four Seven, 2000     Tina Turner
No Strings Attached, 2000   'N-Sync
Elevation, 2001             U2
Popodyssey, 2001            'N-Sync
Black and Blue, 2001        The Backstreet Boys

The revenue figures for these tours are listed in ranked order in Solution 7.

Solution 7

First we rank the given data in increasing order, as follows:

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

There are 12 values in this data set. Hence, n = 12 and

$$\text{Position of the middle term} = \frac{n+1}{2} = \frac{12+1}{2} = 6.5$$

Solution 7

Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data.

74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2

$$\text{Median} = \frac{82.1 + 86.8}{2} = \frac{168.9}{2} = 84.45 = \$84.45 \text{ million}$$

Thus the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
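The two-step procedure (rank the data, then take the term at position (n + 1)/2) can be sketched as a small Python function. It is applied here to the data of Examples 6 and 7; the function name `median_by_position` is illustrative.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on ranked data."""
    ranked = sorted(values)  # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; convert to a 0-based index.
        return ranked[(n + 1) // 2 - 1]
    # Even n: the position falls halfway between two terms, so average them.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

# Example 6: weight lost (pounds), n = 5.
weight_loss = [10, 5, 19, 8, 3]
# Example 7: tour revenues (millions of dollars), n = 12.
revenues = [74.1, 76.4, 79.4, 79.9, 80.2, 82.1,
            86.8, 89.3, 98.0, 103.5, 109.7, 121.2]

median_weight = median_by_position(weight_loss)
median_revenue = median_by_position(revenues)
print(f"Median weight loss: {median_weight} pounds")
print(f"Median revenue: ${median_revenue:.2f} million")
```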

Median - tabulated (ungrouped data)

Steps:

• Order the observations

• Calculate the cumulative frequency

• The cumulative frequency is the number of items with a given value or less

Example

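The number of working days lost by employees in the last quarter is tabulated below. Calculate the median number of working days lost.

x   f     Cumulative frequency
0   410   410
1   430   840 = 410 + 430
2   290   1130 = 840 + 290
3   180   1310 = 1130 + 180
4   110   1420 = 1310 + 110
5   20    1440 = 1420 + 20

The position of the median is (n + 1)/2 = (1440 + 1)/2 = 720.5, i.e. between the 720th and 721st observations. Both fall at x = 1 (cumulative frequency 840), so the median is one day lost.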
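The cumulative-frequency steps can be sketched in Python. The frequencies are the increments in the cumulative sums of this example. The days-lost values 0 to 5 are an assumption: they are consistent with the median of one day and with the mean of 1.451 days (Σfx = 2090), but they are not listed explicitly in the extracted text. The helper name `value_at` is illustrative.

```python
from itertools import accumulate
from math import ceil, floor

days = [0, 1, 2, 3, 4, 5]             # x: assumed values of days lost
freq = [410, 430, 290, 180, 110, 20]  # f: number of employees

cum_freq = list(accumulate(freq))     # running total of f
n = cum_freq[-1]                      # total number of observations

def value_at(position):
    """Return the x value of the observation at a 1-based ranked position."""
    for x, cf in zip(days, cum_freq):
        if position <= cf:
            return x
    raise ValueError("position exceeds the number of observations")

position = (n + 1) / 2                # 720.5 for n = 1440
# Average the observations on either side of a fractional position.
median_days = (value_at(floor(position)) + value_at(ceil(position))) / 2

print(f"Cumulative frequencies: {cum_freq}")
print(f"Median position: {position}, median = {median_days} day(s)")
```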

Advantages of using median

The advantage of using the median as a measure of central tendency is that it is not influenced by outliers. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.

Median for grouped data

• The median for grouped data is given by:

$$\text{median} = L + \left(\frac{\frac{n}{2} - F}{f_m}\right) c$$

• L ≡ lower limit of the median class

• n ≡ number of observations

• F ≡ sum of the frequencies up to, but excluding, the median class

• fm ≡ frequency of the median class

• c ≡ width of the class

Calculate the median of the grouped data below

Weight (oz) Frequency (f)

19.2-19.4 1

19.5-19.7 2

19.8-20.0 8

20.1-20.3 4

20.4-20.6 3

20.7-20.9 2

Total Σf = n = 20

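Median

• L ≡ 19.8, n ≡ 20, F ≡ 3, fm ≡ 8, c ≡ 0.3

$$\text{median} = L + \left(\frac{\frac{n}{2} - F}{f_m}\right) c = 19.8 + \left(\frac{\frac{20}{2} - 3}{8}\right)(0.3) = 19.8 + \frac{7}{8}(0.3) = 19.8 + 0.2625 \approx 20.06 \text{ oz}$$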
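The grouped-median calculation can be sketched in Python. The code locates the median class as the first class whose cumulative frequency reaches n/2, then applies the formula with the lecture's values (L taken as the stated lower class limit, c = 0.3). Variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]  # lower limit of each class
freqs = [1, 2, 8, 4, 3, 2]                           # class frequencies
c = 0.3                                              # class width

n = sum(freqs)
cum_freq = list(accumulate(freqs))

# Median class: first class whose cumulative frequency reaches n / 2.
m = next(i for i, cf in enumerate(cum_freq) if cf >= n / 2)

L = lower_limits[m]                   # lower limit of the median class
F = cum_freq[m - 1] if m > 0 else 0   # frequency total before the median class
fm = freqs[m]                         # frequency of the median class

grouped_median = L + ((n / 2 - F) / fm) * c
print(f"L = {L}, F = {F}, fm = {fm}, median = {grouped_median:.2f} oz")
```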

Mode

Definition: The mode is the value that occurs with the highest frequency in a data set.

Example 8

The following data give the speeds (in miles per hour) of eight cars that were stopped for speeding violations.

77 69 74 81 71 68 74 73

Find the mode.

Solution 8

In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore,

Mode = 74 miles per hour

Mode cont.

• A data set may have no mode, one mode, or many modes, whereas it will have only one mean and only one median.
– A data set with only one mode is called unimodal.
– A data set with two modes is called bimodal.
– A data set with more than two modes is called multimodal.
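The classification above can be sketched in Python by counting how often each value occurs. The function names and the second data set are hypothetical, and treating a data set in which every value occurs once as having no mode is the convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no value repeats, so no mode
    return [value for value, count in counts.items() if count == highest]

def classify(values):
    """Label a data set by its number of modes."""
    k = len(modes(values))
    if k == 0:
        return "no mode"
    return {1: "unimodal", 2: "bimodal"}.get(k, "multimodal")

# Speeds (miles per hour) of the eight cars from Example 8.
speeds = [77, 69, 74, 81, 71, 68, 74, 73]
# Hypothetical data with two equally frequent values.
two_peaks = [1, 1, 2, 2, 3]

print(modes(speeds), classify(speeds))
print(modes(two_peaks), classify(two_peaks))
```

Because the function only counts occurrences, it would work equally well on categorical labels such as class standings.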

Different patterns for the mode

Mode - tabulated (ungrouped data)

The number of working days lost by employees in the last quarter is tabulated below. Find the modal number of working days lost.

Number of days (x)   Number of employees (f)
0                    410
1                    430
2                    290
3                    180
4                    110
5                    20

The mode corresponds to the value with the highest frequency (f = 430), which is one day lost.

Advantage of using the mode

One advantage of the mode is that it can be calculated for both quantitative and qualitative kinds of data, whereas the mean and median can be calculated for only quantitative data.

Example 12

The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior. Find the mode.

Solution 12

Because senior occurs more frequently than the other categories, it is the mode for this data set.

We cannot calculate the mean and median for this data set.

Mode for tabulated grouped data

• For grouped data, the mode is given by:

$$\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$$

• L ≡ lower limit of the modal class

• d1 ≡ frequency of the modal class minus the frequency of the previous class

• d2 ≡ frequency of the modal class minus the frequency of the following class

• c ≡ width of the class

Calculate the mode of the grouped data below

Weight (oz) Frequency (f)

19.2-19.4 1

19.5-19.7 2

19.8-20.0 8

20.1-20.3 4

20.4-20.6 3

20.7-20.9 2

Total Σf = n = 20

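• L ≡ 19.8, d1 ≡ 6, d2 ≡ 4, c ≡ 0.3

$$\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c = 19.8 + \left(\frac{6}{6 + 4}\right)(0.3) = 19.8 + \frac{1.8}{10} = 19.8 + 0.18 = 19.98 \text{ oz}$$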
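The grouped-mode calculation can be sketched in Python. The modal class is taken as the class with the highest frequency, and a neighbouring frequency of 0 is assumed when the modal class is the first or last class (a case that does not arise in this example). Variable names are illustrative.

```python
lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]  # lower limit of each class
freqs = [1, 2, 8, 4, 3, 2]                           # class frequencies
c = 0.3                                              # class width

# Modal class: the class with the highest frequency.
i = max(range(len(freqs)), key=lambda k: freqs[k])

# Neighbouring frequencies; assumed to be 0 beyond either end of the table.
prev_f = freqs[i - 1] if i > 0 else 0
next_f = freqs[i + 1] if i < len(freqs) - 1 else 0

L = lower_limits[i]      # lower limit of the modal class
d1 = freqs[i] - prev_f   # modal frequency minus previous class frequency
d2 = freqs[i] - next_f   # modal frequency minus following class frequency

grouped_mode = L + (d1 / (d1 + d2)) * c
print(f"L = {L}, d1 = {d1}, d2 = {d2}, mode = {grouped_mode:.2f} oz")
```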

Relationships among the Mean, Median, and Mode

The relationship among the three measures depends on the shape (skewness) of the frequency distribution.

1. For a symmetrical histogram or frequency curve (Figure 1), the values of the mean, median, and mode are identical, and they lie at the center of the distribution.

Figure 1 Zero Skewed (Symmetrical)

Positively Skewed

2. A histogram or frequency curve is positively skewed if the right tail is longer (Figure 2). In this case,

mean > median > mode

Notice that the mode always occurs at the peak point. The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail. Outliers in the right tail pull the mean to the right.

Figure 2 Positively skewed

Negatively Skewed

3. A histogram or frequency curve is negatively skewed if the left tail is longer (Figure 3). In this case,

mode > median > mean

Here the outliers in the left tail pull the mean to the left.

Figure 3 Negatively skewed
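The ordering of the mean and median gives a quick, rough check on the direction of skew. The sketch below is a rule of thumb under that assumption, not a formal skewness measure, and it can mislead for small or irregular data sets. The function name and the second and third data sets are hypothetical.

```python
from statistics import mean, median

def skew_direction(values, tol=1e-9):
    """Guess the skew direction by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tol:
        return "zero"      # mean and median coincide: roughly symmetrical
    # A mean above the median suggests a longer right tail, and vice versa.
    return "positive" if m > med else "negative"

# Pacific state populations (thousands); California sits far out to the right.
pacific = [5894, 3421, 627, 1212, 33872]
# Hypothetical symmetrical data.
symmetric = [1, 2, 3, 4, 5]
# Hypothetical data with one small value far out to the left.
left_tail = [1, 8, 9, 10]

for name, data in [("pacific", pacific), ("symmetric", symmetric), ("left_tail", left_tail)]:
    print(name, skew_direction(data))
```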
