Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Upload: milton-carroll

Post on 03-Jan-2016


Page 1: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Descriptive Statistics

Prepared By

Masood Amjad Khan

GCU, Lahore

Page 2: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Index

No.  Subject                               Slide No.
1.   Index                                 2
2.   Index                                 3
3.   Statistics (Definitions)              4
4.   Descriptive Statistics                5
5.   Inferential Statistics                11
6.   Examples of 4 and 5                   14
7.   Data, Level of measurements           15
8.   Variable                              8
9.   Discrete variable                     10
10.  Continuous variable                   9
11.  Frequency Distribution                6
12.  Constructing Freq. Distn.             22, 23
13.  Example of 12                         24, 25
14.  Displaying the Data                   7
15.  Bar Chart, Pie Chart                  16
16.  Stem Leaf Plot                        32-34
17.  Graph                                 17
18.  Histogram                             26, 27
19.  Frequency Polygon                     28, 29
20.  Cumulative Freq. Polygon              30, 31
21.  Summary Measures                      18
22.  Goals                                 19
23.  Arithmetic Mean                       37, 40
24.  Characteristic of Mean                20
25.  Examples of 23                        38-39
26.  Weighted Mean                         41
27.  Example weighted Mean                 42
28.  Geometric Mean                        43
29.  Example: Geometric Mean               44
30.  Median                                45
31.  Example of Median                     46
32.  Properties of Median                  47
33.  Mode                                  48
34.  Examples of Mode                      49-50
35.  Positions of mean, median and mode    51
36.  Dispersion                            52
37.  Range and Mean Deviation              53
39.  Example of Mean Deviation             54-55
40.  Variance                              56

Page 3: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Index (continued)

No.  Subject                               Slide No.
41.  Examples of variance                  57-59
42.  Moments                               60
43.  Examples of Moments                   61-62
44.  Skewness                              63
45.  Types of Skewness                     64
46.  Coefficient of Skewness               65
47.  Example of skewness                   66-67
48.  Empirical Rule                        68-69
49.  Exercise                              70

Page 4: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

STATISTICS

Numerical Facts (Common Usage)

1. No. of children born in a hospital in some specified time.
2. No. of students enrolled in GCU in 2007.
3. No. of road accidents on the motorway.
4. Amount spent on Research and Development in GCU during 2006-2007.
5. No. of shutdowns of the computer network on a particular day.

Field or Discipline of Study

Definition: The science of collection, presentation, analysis and interpretation of data to make decisions and forecasts.

Descriptive Statistics
Inferential Statistics

Probability provides the transition between Descriptive and Inferential Statistics.

Examples of Descriptive and Inferential Statistics

Page 5: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Descriptive Statistics

Consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.

Data

A data set is a collection of observations on one or more variables.

Types of Data

Page 6: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Organizing the Data: Tables

Frequency Table

A grouping of qualitative data into mutually exclusive classes showing the number of observations in each class.

Example: Preference of four types of beverage by 100 customers.

Beverage     Number
Cola-Plus    40
Coca-Cola    25
Pepsi        20
7-UP         15

Frequency Distribution

A grouping of quantitative data into mutually exclusive classes showing the number of observations in each class.

Example: Selling price of 80 vehicles.

Vehicle Selling Price    Number of Vehicles
15000 to 24000           48
24000 to 33000           30
33000 to 42000           2

Construction of Frequency Distribution

Page 7: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Displaying the Data

Diagrams/Charts Graph

Bar Chart Pie Chart

Histogram Frequency Polygon

Stem and Leaf Plot

1

Page 8: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Variable

A characteristic under study that assumes different values for different elements (e.g. height of persons, no. of students in GCU).

Qualitative or Categorical Variable

A variable that cannot assume a numerical value but can be classified into two or more non-numeric categories is called a qualitative or categorical variable.

Examples: educational achievements, marital status, brand of PC.

Quantitative Variable

A variable that can be measured numerically is called a quantitative variable. It is either a discrete variable or a continuous variable.

Page 9: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

A variable whose observations can assume any value within a specific range.

Examples: amount of income tax paid; weight of a student; yearly rainfall in Murree; time elapsed between successive network breakdowns.

1

Continuous variable

Back

Page 10: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

A variable that can assume only certain values, with gaps between the values.

Examples: children in a family; strokes on a golf hole; TV sets owned; cars arriving at GCU in an hour; students in each section of a statistics course.

1

Discrete variable

Back

Page 11: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Inferential Statistics

Consists of methods that use sample results to help make decisions or predictions about a population.

Page 12: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Sample

1. A portion of the population selected for study.
2. A subset of data selected from a population.

Selecting a Sample

Inference from a sample:
- Estimation: Point Estimation, Interval Estimation
- Testing of Hypothesis

Go to Inferential Statistics

Page 13: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Population

1. Consists of all individuals, items or objects whose characteristics are being studied.
2. Collection of data that describe some phenomenon of interest.

Examples

Finite Population: length of fish in a particular lake; no. of students of the Statistics course in BCS; no. of traffic violations on some specific holiday.

Infinite Population: depth of a lake from any conceived position; length of life of a certain brand of light bulb; stars in the sky.

Page 14: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Examples of Descriptive and Inferential Statistics

Descriptive

1. At least 5% of all fires reported last year in Lahore were deliberately set.
2. Next to colonial homes, more residents in a specified locality prefer a contemporary design.

Inferential

1. As a result of a recent poll, most Pakistanis are in favor of an independent and powerful parliament.
2. As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double in the next year.

Page 15: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Types of Data: Level of Measurement

Data can be classified according to level of measurement. The level of measurement dictates the calculations that can be done to summarize and present the data. It also determines the statistical tests that should be performed.

Nominal: Data may only be classified. Examples: jersey numbers of football players; make of car.

Ordinal: Data are ranked; there is no meaningful difference between values. Examples: your rank in class; team standings.

Interval: Meaningful difference between values. Examples: temperature; dress size.

Ratio: Meaningful 0 point and ratio between values. Examples: no. of patients seen; no. of sales calls made; distance students travel to class.

Page 16: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Bar Chart

A graph in which the classes are reported on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

Pie Chart

A chart that shows the proportion or percent that each class represents of the total number of frequencies.

Example: Covers for Cell Phones

[Figure: bar chart of No. of Covers (class frequency, axis 0 to 600) against Cover Color (variable of interest), with bars such as Bright white, Magnetic lime and Fusion red.]

[Figure: pie chart of the same data.]

Cover Color   f      Percent   Angle (degrees)
White         130    10%       36
Black         104    8%        29
Lime          325    25%       90
Orange        455    35%       126
Red           286    22%       79
Total         1300             360

Angle = (f/n) × 360, where n = 1300.
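The pie-chart angles in the cell-phone cover example can be checked with a few lines of Python. This is only a sketch; the dictionary and function names are illustrative and not part of the slides.

```python
# Frequencies of cell-phone cover colors, as listed in the slide's table.
covers = {"White": 130, "Black": 104, "Lime": 325, "Orange": 455, "Red": 286}

n = sum(covers.values())  # total number of covers


def pie_angle(f, n):
    """Angle in degrees of the pie slice for class frequency f out of n."""
    return f / n * 360


angles = {color: round(pie_angle(f, n)) for color, f in covers.items()}
percents = {color: round(100 * f / n) for color, f in covers.items()}

for color in covers:
    print(f"{color:7s} f={covers[color]:4d} angle={angles[color]:3d} percent={percents[color]}")
```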

Diagrams/Charts

Back

Page 17: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Graphs

Histogram
Frequency Polygon
Cumulative Frequency Polygon

Page 18: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Describing the Data: Summary Measures

Goals

Measures of Location: Arithmetic Mean, Weighted Arithmetic Mean, Geometric Mean, Median, Mode

Measures of Dispersion: Range, Mean Deviation, Variance, Standard Deviation

Moments: Moments about Origin, Moments about Mean

Skewness

Page 19: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean. Explain the characteristics, uses, advantages, and disadvantages of each measure of location. Identify the position of the mean, median, and mode for both symmetric and skewed distributions.

Goals

Compute and interpret the range, mean deviation, variance, and standard deviation. Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion.

Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations. 1

Summary Measures

Page 20: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Characteristics of the Mean

The arithmetic mean is the most widely used measure of location. It requires the interval scale.

Its major characteristics are:
- All values are used.
- It is unique.
- The sum of the deviations from the mean is 0.
- It is calculated by summing the values and dividing by the number of values.

In more detail:
- Every set of interval-level and ratio-level data has a mean.
- All the values are included in computing the mean.
- A set of data has a unique mean.
- The mean is affected by unusually large or small data values.
- The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

Page 21: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Use of Tables of Random Numbers

Random numbers are randomly produced digits from 0 to 9. A table of random numbers contains rows and columns of these randomly produced digits. In using the table:
- choose the starting point at random;
- read off the digits in groups containing one, two, three, or more digits, in any predetermined direction (rows or columns).

Example

Choose a sample of size 7 from a group of 80 objects. Label the objects 01, 02, 03, …, 80 in any order. Arbitrarily enter the Table on any line and read out the pair of digits in any two consecutive columns. Ignore numbers which recur and those greater than 80.
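The table-reading procedure in this example can be imitated in code. The sketch below replaces the printed table with Python's pseudo-random generator (an assumption, since the slide uses a physical table); the function name and the seed are illustrative only.

```python
import random


def sample_labels(sample_size, population_size, rng):
    """Pick distinct labels 01..population_size by reading two-digit random numbers."""
    chosen = []
    while len(chosen) < sample_size:
        # Two consecutive random digits form a number from 00 to 99.
        number = 10 * rng.randrange(10) + rng.randrange(10)
        if number < 1 or number > population_size:
            continue  # no object carries this label
        if number in chosen:
            continue  # ignore numbers which recur
        chosen.append(number)
    return chosen


rng = random.Random(2016)  # arbitrary seed, fixed only so the run is repeatable
sample = sample_labels(7, 80, rng)
print(sample)
```

The function reads two digits at a time, so it only suits populations of at most 99 labelled objects.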

1

Selecting a Sample

Go to Sample

Page 22: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Construction of Frequency Distribution

Step 1: How many groups (classes)?

Use just enough classes to reveal the shape of the distribution. Let k be the desired no. of classes. k should be such that 2^k > n.

If n = 80 and we choose k = 6, then 2^6 = 64, which is < 80, so k = 6 is not desirable. If we take k = 7, then 2^7 = 128, which is > 80, so the no. of classes should be 7.

Step 2: Determine the class interval (width).

The class interval should be the same for all classes. The formula to determine class width:

$$i \ge \frac{H - L}{k}$$

where i is the class width, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
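Steps 1 and 2 can be sketched in Python as follows. The function names are illustrative; the values H = 35925 and L = 15546 are the highest and lowest selling prices quoted in the deck's vehicle example.

```python
def number_of_classes(n):
    """Smallest k with 2**k > n (the rule used in Step 1)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(highest, lowest, k):
    """Minimum class width (H - L) / k from Step 2, before rounding up."""
    return (highest - lowest) / k


n = 80
k = number_of_classes(n)
i = class_width(35925, 15546, k)
print(k, round(i))
```

In practice the computed width is then rounded up to a convenient value, such as a multiple of 10 or 100.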

Page 23: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Construction of Frequency Distribution (continued)

Step 3: Set the individual class limits.
- Class limits should be very clear.
- Class limits should not overlap.
- Sometimes the class width is rounded, which may increase the range H - L.
- Make the lower limit of the first class a multiple of the class width.

Step 4: Make a tally of the observations falling in each class.

Step 5: Count the number of items in each class (class frequency).

Page 24: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Construction of Frequency Distribution (Example)

Raw Data (Ungrouped Data)

20445 19251 26613 22817 25449 24571 22374 21740
23613 17266 27443 35925 27896 26285 22845 20962
18890 29237 15546 32277 19331 21722 21442 20356
20633 20642 35851 23657 18263 15794 25799 24052
17968 32492 27453 24533 26661 25783 23765 20203
21981 19766 20818 28670 19688 20155 17357 20004
20895 17399 28337 23169 28034 25277 25251 19873
19889 20642 29076 26651 24609 24324 24285 20047
15935 24296 21639 21558 19587 30872 28683 18021
17891 22442 30655 24220 23591 20454 23372 23197

Page 25: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Construction of Frequency Distribution (Example Continued)

Following Step 1, with n = 80, k should be 7.

Following Step 2, the class width should be at least (35925 - 15546)/7 ≈ 2911. The width is usually rounded up to a multiple of 10 or 100, so it is taken as i = 3000.

Following Step 3, with i = 3000 and k = 7, the classes cover a range of 7 × 3000 = 21000, whereas the actual range is H - L = 35925 - 15546 = 20379. The lower limit of the first class should be a multiple of the class width, so the lower limit of the first class is taken as 15000.

Following Step 4 and Step 5:

Selling Price         Frequency
15000 up to 18000     8
18000 up to 21000     23
21000 up to 24000     17
24000 up to 27000     18
27000 up to 30000     8
30000 up to 33000     4
33000 up to 36000     2
Total                 80

Page 26: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Histogram

A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

Example 1, k = 6

Group        H     f     cf
1.6 - 2.2    2.1   2     2
2.2 - 2.8    2.7   4     6
2.8 - 3.4    3.3   13    19
3.4 - 4.0    3.9   13    32
4.0 - 4.6    4.5   6     38
4.6 - 5.2    5.1   2     40

[Figure: Histogram (Example 1); horizontal axis "Groups" marked 1.60, 2.20, 2.80, 3.40, 4.00, 4.60, 5.20; vertical axis from 0 to 35.]

Page 27: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Histogram (continued)

Example 1, k = 7

Group        H     f     cf
1.5 - 2.0    2     2     2
2.0 - 2.5    2.5   2     4
2.5 - 3.0    3     5     9
3.0 - 3.5    3.5   15    24
3.5 - 4.0    4     8     32
4.0 - 4.5    4.5   6     38
4.5 - 5.0    5     2     40

[Figure: Histogram (Example 1); horizontal axis "Groups" marked 1.5 to 5.0 in steps of 0.5; vertical axis "Percent" from 0 to 40.]

Page 28: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Frequency Polygon

A graph in which the points formed by the intersections of the class midpoints and the class frequencies are connected by line segments.

Mid point = (Li + Hi)/2

Example 1, k = 6

Group        Mid pt   f     cf
1.6 - 2.2    1.9      2     2
2.2 - 2.8    2.5      4     6
2.8 - 3.4    3.1      13    19
3.4 - 4.0    3.7      13    32
4.0 - 4.6    4.3      6     38
4.6 - 5.2    4.9      2     40

[Figure: Frequency Polygon (Example 1); points plotted at the midpoints 1.90, 2.50, 3.10, 3.70, 4.30, 4.90; vertical axis "Percent" from 0 to 35.]

Page 29: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Frequency Polygon (continued)

Example 1, k = 7

Group        Mid pt   f     cf
1.5 - 2.0    1.75     2     2
2.0 - 2.5    2.25     1     3
2.5 - 3.0    2.75     4     7
3.0 - 3.5    3.25     15    22
3.5 - 4.0    3.75     10    32
4.0 - 4.5    4.25     5     37
4.5 - 5.0    4.75     3     40

[Figure: Frequency Polygon (Example 1); points plotted at the midpoints 1.75, 2.25, 2.75, 3.25, 3.75, 4.25, 4.75; vertical axis "Percent" from 0 to 40.]

Page 30: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Cumulative Frequency Polygon

A graph in which the points formed by the intersections of the class midpoints and the class cumulative frequencies are connected by line segments.

A cumulative frequency polygon portrays the number or percent of observations below a given value.

Example 1, k = 6

Group        Mid pt   f     cf
1.6 - 2.2    1.9      2     2
2.2 - 2.8    2.5      4     6
2.8 - 3.4    3.1      13    19
3.4 - 4.0    3.7      13    32
4.0 - 4.6    4.3      6     38
4.6 - 5.2    4.9      2     40

[Figure: Ogive (Example 1); horizontal axis marked 2.20, 2.80, 3.40, 4.00, 4.60, 5.20; vertical axis "Cumulative Percent" from 0 to 100.]

Page 31: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Cumulative Frequency Polygon (continued)

Example 1, k = 7

Group        Mid pt   f     cf
1.5 - 2.0    1.75     2     2
2.0 - 2.5    2.25     1     3
2.5 - 3.0    2.75     4     7
3.0 - 3.5    3.25     15    22
3.5 - 4.0    3.75     10    32
4.0 - 4.5    4.25     5     37
4.5 - 5.0    4.75     3     40

[Figure: Ogive (Example 1); horizontal axis marked 2.00 to 5.00 in steps of 0.50; vertical axis "Cumulative Percent" from 0 to 100.]

Page 32: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Stem and Leaf Plot

What is a Stem and Leaf Plot?

- A Stem and Leaf Plot is a type of graph that is similar to a histogram but shows more information.
- It summarizes the shape of a set of data.
- It provides extra detail regarding individual values.
- The data is arranged by place value.
- Stem and Leaf Plots are good organizers for large amounts of information.
- The digits in the largest place are referred to as the stem.
- The digits in the smallest place are referred to as the leaf.
- The leaves are always displayed to the right of the stem.

What Are They Used For?

Series of scores for sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of data for which Stem and Leaf Plots could be used.

Constructing a Stem and Leaf Plot

Page 33: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Constructing a Stem and Leaf Plot

Make a Stem and Leaf Plot with the following temperatures for June.

77 80 82 68 65 59 61 57 50 62
61 70 69 64 67 70 62 65 65 73
76 87 80 82 83 79 79 71 80 77

- Begin with the lowest temperature. The lowest temperature of the month was 50.
- Enter the 5 in the tens column and a 0 in the ones.
- The next lowest is 57. Enter a 7 in the ones.
- Next is 59; enter a 9 in the ones.
- Find all of the temperatures that were in the 60's, 70's and 80's.
- Enter the rest of the temperatures sequentially until your Stem and Leaf Plot contains all of the data.

Temperature

Stem (Tens)   Leaf (Ones)
5             0 7 9
6             1 1 2 2 4 5 5 5 7 8 9
7             0 0 1 3 6 7 7 9 9
8             0 0 0 2 2 3 7
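The construction steps for the June temperatures can be sketched in Python. The function name is illustrative, and the sketch assumes two-digit whole-number data, so that the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict

# June temperatures from the slide.
temperatures = [77, 80, 82, 68, 65, 59, 61, 57, 50, 62,
                61, 70, 69, 64, 67, 70, 62, 65, 65, 73,
                76, 87, 80, 82, 83, 79, 79, 71, 80, 77]


def stem_and_leaf(values):
    """Group integers by tens digit (stem), with sorted ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(temperatures)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```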

Page 34: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Stem and Leaf Example

Make a Stem and Leaf Plot for the following data.

1.7 1.2 2.1 2.5 1.2
1.9 2.0 0.2 6.3 5.3
2.0 5.9 1.1 3.9 1.7
1.4 3.5 2.8 0.4 2.7
1.8 2.6 1.3 2.4 1.8
4.3 1.5 2.3 2.1 0.4
2.5 2.3 3.4 0.9 4.6
0.3 3.1 1.8 3.5 3.2
2.1 3.7 2.6 2.9 1.6
1.3 2.8 3.9 0.7 2.4

Freq   Stem   Leaf
6      0      2 3 4 4 7 9
14     1      1 2 2 3 3 4 5 6 7 7 8 8 8 9
17     2      0 0 1 1 1 3 3 4 4 5 5 6 6 7 8 8 9
8      3      1 2 4 5 5 7 9 9
2      4      3 6
2      5      3 9
1      6      3
50

Page 35: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Stem and Leaf Plot Example

Following are the car battery life data. Make a Stem and Leaf Plot.

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.1 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

f     S    L
2     1    6 9
5     2    2 5 6 6 9
25    3    0 0 1 1 1 1 1 2 2 2 3 3 3 4 4 5 5 6 7 7 7 8 8 9 9
8     4    1 1 2 3 4 5 7 7
40

Page 36: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Stem and Leaf Plot Example (each stem split in two)

Frequency   Stem   Leaf
2           1      6 9
1           2      2
4           2      5 6 6 9
15          3      0 0 1 1 1 1 1 2 2 2 3 3 3 4 4
10          3      5 5 6 7 7 7 8 8 9 9
5           4      1 1 2 3 4
3           4      5 7 7
40

Page 37: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Measures of Location: Arithmetic Mean

Ungrouped Data

Population: for N observations $X_1, X_2, \ldots, X_N$ in the population,

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Sample: for n observations $X_1, X_2, \ldots, X_n$ in the sample,

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Grouped Data

Population: let $X_i$ and $f_i$ be the midpoint and frequency, respectively, of the ith group in the population. The mean is defined as

$$\mu = \frac{\sum_{i=1}^{N} f_i X_i}{\sum_{i=1}^{N} f_i}$$

Sample: let $X_i$ and $f_i$ be the midpoint and frequency, respectively, of the ith group in the sample. The mean is defined as

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$

Point of Equilibrium

Page 38: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Numerical Examples of Arithmetic Mean: Ungrouped Data

Example of Sample Mean

Following is a random sample of 12 clients showing the number of minutes used by each client on a particular cell phone last month. What is the mean number of minutes used?

90  110  89   113
91  94   100  112
77  92   119  83

$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 91 + 77 + \cdots + 83}{12} = \frac{1170}{12} = 97.5$$

Example of Population Mean

There are 12 automobile manufacturing companies in the U.S.A. Listed below is the no. of patents granted by the US Government to each company. Is this information a sample or a population?

Company            Number of Patents Granted
General Motors     511
Nissan             385
DaimlerChrysler    275
Toyota             257
Honda              249
Ford               234
Mazda              210
Chrysler           97
Porsche            50
Mitsubishi         36
Volvo              23
BMW                13

$$\mu = \frac{\sum X}{N} = \frac{511 + 385 + \cdots + 13}{12} = \frac{2340}{12} = 195$$
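Both ungrouped means can be reproduced with Python's standard `statistics` module. This is a minimal check of the arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Minutes used by the 12 sampled clients (sample data).
minutes = [90, 110, 89, 113, 91, 94, 100, 112, 77, 92, 119, 83]

# Patents granted to the 12 listed companies (treated as a population).
patents = [511, 385, 275, 257, 249, 234, 210, 97, 50, 36, 23, 13]

sample_mean = mean(minutes)
population_mean = mean(patents)

print(sum(minutes), sample_mean)
print(sum(patents), population_mean)
```

The formula is the same in both cases; only the interpretation (sample statistic versus population parameter) differs.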

Page 39: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Numerical Examples of Arithmetic Mean: Grouped Data

Following is the frequency distribution of selling prices of vehicles at Whitner Autoplex last month. Find the arithmetic mean.

Selling Price ($ thousands)   Frequency f   Midpoint X   fX
15 - 18                       8             16.5         132.0
18 - 21                       23            19.5         448.5
21 - 24                       17            22.5         382.5
24 - 27                       18            25.5         459.0
27 - 30                       8             28.5         228.0
30 - 33                       4             31.5         126.0
33 - 36                       2             34.5         69.0
Total                         80                         1845.0

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1845}{80} = 23.1$$

So the mean vehicle selling price is $23100.
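The grouped-data mean for the vehicle prices can be sketched as follows. The tuple layout (lower limit, upper limit, frequency) is an illustrative choice, not something prescribed by the slides.

```python
# (lower limit, upper limit, frequency) in $ thousands, from the slide's table.
classes = [(15, 18, 8), (18, 21, 23), (21, 24, 17), (24, 27, 18),
           (27, 30, 8), (30, 33, 4), (33, 36, 2)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by sum of f."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


mean_price = grouped_mean(classes)
print(mean_price)  # in $ thousands
```

The exact quotient is 1845/80 = 23.0625, which the slide rounds to 23.1.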

Page 40: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

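Point of Equilibrium

[Figure: weights $f_1, \ldots, f_6$ placed at positions $X_1, \ldots, X_6$ on a beam that balances at $\bar{X}$, with $X_1, X_2, X_3$ to the left of $\bar{X}$ and $X_4, X_5, X_6$ to the right.]

An object is balanced at $\bar{X}$ when

$$(\bar{X} - X_1) f_1 + (\bar{X} - X_2) f_2 + (\bar{X} - X_3) f_3 = (X_4 - \bar{X}) f_4 + (X_5 - \bar{X}) f_5 + (X_6 - \bar{X}) f_6$$

$$-(f_1 X_1 + f_2 X_2 + f_3 X_3) + (f_1 + f_2 + f_3)\bar{X} = -(f_4 + f_5 + f_6)\bar{X} + (f_4 X_4 + f_5 X_5 + f_6 X_6)$$

$$f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4 + f_5 X_5 + f_6 X_6 = (f_1 + f_2 + f_3 + f_4 + f_5 + f_6)\bar{X}$$

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4 + f_5 X_5 + f_6 X_6}{f_1 + f_2 + f_3 + f_4 + f_5 + f_6} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i}$$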

Page 41: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Summary Measures: Weighted Mean

A special case of the arithmetic mean. It applies when the values of the variable are associated with a certain quality, e.g. the price of medium, large, and big soft drinks.

Soft Drink   Price    Weights
Medium       $0.90    3
Large        $1.25    4
Big          $1.50    3

The weighted mean of a set of numbers $X_1, X_2, \ldots, X_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$

Page 42: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

EXAMPLE: Weighted Mean

The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
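The Carter Construction question can be worked with the weighted-mean formula, taking the numbers of employees as weights. The sketch below is one way to do the arithmetic; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w * X divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


hourly_rates = [16.50, 19.00, 25.00]  # dollars per hour
employees = [14, 10, 2]               # weights: employees paid at each rate

mean_rate = weighted_mean(hourly_rates, employees)
print(round(mean_rate, 2))
```

The numerator is 14(16.50) + 10(19.00) + 2(25.00) = 471, so the mean hourly rate is 471/26, or about $18.12.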

Page 43: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Summary Measures: Geometric Mean

The geometric mean of a set of n positive numbers is defined as the nth root of the product of the n values. The formula for the geometric mean is written:

$$GM = \sqrt[n]{X_1 X_2 \cdots X_n}$$

The geometric mean used as the average percent increase over time is calculated as:

$$GM = \sqrt[n]{\frac{\text{Value at the end of period}}{\text{Value at the start of period}}} - 1$$

- Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
- It has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
- The geometric mean will always be less than or equal to the arithmetic mean.

Page 44: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example of Geometric Mean

The return on investment by a certain company for four successive years was 30%, 20%, -40%, and 200%. Find the geometric mean rate of return on investment.

Solution: The 1.3 represents the 30 percent return on investment, i.e. the original investment of 1.0 plus the return of 0.3. So

$$GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$$

which shows that the average return is 29.4 percent.

If you earned $30000 in 1997 and $50000 in 2007, what is your annual rate of increase over the period?

$$GM = \sqrt[n]{\frac{\text{Value at the end of period}}{\text{Value at the start of period}}} - 1 = \sqrt[10]{\frac{50000}{30000}} - 1 = 0.0524$$

The annual rate of increase is 5.24 percent.
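Both geometric-mean calculations can be checked in Python. The sketch assumes Python 3.8 or later for `math.prod`; the function names are illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def average_growth_rate(start, end, periods):
    """Average rate of increase per period between a start and an end value."""
    return (end / start) ** (1 / periods) - 1


# Returns of 30%, 20%, -40% and 200% expressed as growth factors.
gm = geometric_mean([1.3, 1.2, 0.6, 3.0])

# $30000 in 1997 growing to $50000 in 2007, i.e. over n = 10 years.
rate = average_growth_rate(30000, 50000, 10)

print(round(gm, 3), round(rate, 4))
```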

Page 45: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Median

The median is the midpoint of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest.

If the number of observations n is odd, the median is the ((n+1)/2)th observation. If n is even, the median is the average of the (n/2)th and (n/2 + 1)th observations.

Example: Determine the median for each set of data.

(1) 41 15 39 54 31 15 33
(2) 15 16 27 28 41 42

Arrange each set of data in order:

(1) 15 15 31 33 39 41 54
(2) 15 16 27 28 41 42

1) n = 7, so the median is the 4th observation, that is 33.
2) n = 6, so the median is the average of the 3rd and 4th observations, that is (27 + 28)/2 = 27.5.

Median for Grouped Data

The median is obtained by using the formula:

$$\tilde{X} = L_m + \frac{I_m}{f_m}\left(\frac{n}{2} - cf_{m-1}\right)$$

where m is the group containing the (n/2)th observation; $L_m$, $I_m$, and $f_m$ are the lowest value, class width, and frequency, respectively, of the mth group, and $cf_{m-1}$ is the cumulative frequency of the group before it.

Page 46: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example (Median)

Find the median for the following data.

Example 1

L       H       f     cf
1.60  < 2.20    2     2
2.20  < 2.80    4     6
2.80  < 3.40    13    19
3.40  < 4.00    13    32
4.00  < 4.60    6     38
4.60  < 5.20    2     40

n/2 = 20, so the median group is 3.40 - 4.00.

Lm = 3.40, Im = 0.6, fm = 13, cfm-1 = 19

$$\tilde{X} = 3.40 + \frac{0.6}{13}(20 - 19) = 3.45 \approx 3.5$$
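The median rules for ungrouped and grouped data can be sketched in Python. `statistics.median` handles the ungrouped sets; `grouped_median` is an illustrative helper that applies the grouped formula with linear interpolation inside the median class.

```python
from statistics import median

# Ungrouped examples: odd and even numbers of observations.
set_1 = [41, 15, 39, 54, 31, 15, 33]
set_2 = [15, 16, 27, 28, 41, 42]
median_1 = median(set_1)
median_2 = median(set_2)


def grouped_median(classes):
    """Median of grouped data; classes are (lower, upper, frequency) in order."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("no classes given")


example_1 = [(1.6, 2.2, 2), (2.2, 2.8, 4), (2.8, 3.4, 13),
             (3.4, 4.0, 13), (4.0, 4.6, 6), (4.6, 5.2, 2)]
median_grouped = grouped_median(example_1)

print(median_1, median_2, round(median_grouped, 2))
```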

Page 47: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Properties of the Median

- There is a unique median for each data set.
- It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
- It can be computed for ratio-level, interval-level, and ordinal-level data.
- It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

Page 48: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Mode

The mode is the value of the observation that appears most frequently.

[Figure: bar chart of No. of Seniors (axis 0 to 900) by Region.]

Region            No. of Seniors
New England       524
Middle Atlantic   818
E.N. Central      815
W.N. Central      367
S. Atlantic       679
E.S. Central      196
W.S. Central      436
Mountain          346
Pacific           783

Page 49: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Mode(Example)

Next1

Back

Page 50: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Mode: Grouped Data

Calculating the mode for grouped data:

$$\text{Mode} = L_m + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times I_m$$

Calculate the mode of the following distribution.

Group        f
1.6 - 2.2    2
2.2 - 2.8    4
2.8 - 3.4    14
3.4 - 4.0    12
4.0 - 4.6    6
4.6 - 5.2    2

Solution:

The modal group is 2.8 - 3.4, so Lm = 2.8, fm = 14, fm-1 = 4, fm+1 = 12 and Im = 0.6.

$$\text{Mode} = 2.8 + \frac{14 - 4}{(14 - 4) + (14 - 12)} \times 0.6 = 3.3$$
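The grouped-mode formula can be sketched in Python as below. The helper name is illustrative, and the upper limit of the last class is assumed to be 5.2 so that every class has width 0.6.

```python
def grouped_mode(classes):
    """Mode of grouped data; classes are (lower, upper, frequency) in order."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))  # index of the modal class
    lower, upper, f_m = classes[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = f_m - f_prev  # excess over the class before
    d2 = f_m - f_next  # excess over the class after
    return lower + d1 / (d1 + d2) * (upper - lower)


distribution = [(1.6, 2.2, 2), (2.2, 2.8, 4), (2.8, 3.4, 14),
                (3.4, 4.0, 12), (4.0, 4.6, 6), (4.6, 5.2, 2)]

mode = grouped_mode(distribution)
print(round(mode, 2))
```

The sketch assumes a single modal class; with tied maximum frequencies it simply uses the first one.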

Page 51: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

The Relative Positions of the Mean, Median and the Mode

Go to Summary Measures 1

Page 52: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Dispersion

Why Study Dispersion?

- A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data.
- For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.
- A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

Studying dispersion through display.

Page 53: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Range and Mean Deviation

Range

Range = Largest value - Smallest value

Mean Deviation

$$M.D. = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

Example

The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the range and the mean deviation for the number of cappuccinos sold.

Range = Largest value - Smallest value = 80 - 20 = 60

Page 54: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Mean Deviation: Example

The number of cappuccinos sold at the Starbucks location in the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean deviation for the number of cappuccinos sold.

Solution

The mean is $\bar{X} = 250/5 = 50$.

Number of Cappuccinos Sold Daily (X)   X - X̄            Absolute Deviation |X - X̄|
20                                     20 - 50 = -30    30
40                                     40 - 50 = -10    10
50                                     50 - 50 = 0      0
60                                     60 - 50 = 10     10
80                                     80 - 50 = 30     30
Total                                                   80

$$M.D. = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{80}{5} = 16$$
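The range and mean deviation for the cappuccino data can be verified with a short Python sketch; the function name is illustrative.

```python
from statistics import mean

sold = [20, 40, 50, 60, 80]  # cappuccinos sold on the 5 sampled days


def mean_deviation(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)


data_range = max(sold) - min(sold)
md = mean_deviation(sold)
print(data_range, md)
```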

Page 55: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Mean Deviation (Grouped Data)

Mean deviation for grouped data:

$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{\sum_{i=1}^{k} f_i}$$

For the vehicle selling prices,

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1845}{80} = 23.1$$

Selling Price ($ thousands)   Frequency f   X      X - X̄    f|X - X̄|
15 - 18                       8             16.5   -6.6     52.8
18 - 21                       23            19.5   -3.6     82.8
21 - 24                       17            22.5   -0.6     10.2
24 - 27                       18            25.5   2.4      43.2
27 - 30                       8             28.5   5.4      43.2
30 - 33                       4             31.5   8.4      33.6
33 - 36                       2             34.5   11.4     22.8
Total                         80                            288.6

$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{\sum_{i=1}^{k} f_i} = \frac{288.6}{80} = 3.61$$

Page 56: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Variance and Standard Deviation

Population variance and standard deviation

Let $X_1, X_2, \ldots, X_N$ be N observations in the population. The variance is defined as:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

The standard deviation is defined as:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample variance and standard deviation

Let $X_1, X_2, \ldots, X_n$ be n observations in the sample. The variance is defined as:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

The standard deviation is defined as:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Page 57: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example: Variance and Standard Deviation

The number of traffic citations issued during the last five months in Beaufort County, South Carolina, is 38, 26, 13, 41, and 22. What is the population variance?

The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What is the sample variance?

Hourly Wage $ (X)   X - X̄   (X - X̄)²
12                  -5      25
20                  3       9
16                  -1      1
18                  1       1
19                  2       4
Total 85            0       40

$$\bar{X} = \frac{85}{5} = 17.0$$

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{40}{5 - 1} = 10.0$$
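Both questions in this example can be worked with Python's standard `statistics` module: `pvariance` divides by N (population) and `variance` divides by n - 1 (sample). The variable names are illustrative.

```python
from statistics import pvariance, variance, stdev

citations = [38, 26, 13, 41, 22]  # population: all five months
wages = [12, 20, 16, 18, 19]      # sample of hourly wages in dollars

population_variance = pvariance(citations)  # sum of squared deviations / N
sample_variance = variance(wages)           # sum of squared deviations / (n - 1)
sample_sd = stdev(wages)                    # square root of the sample variance

print(population_variance, sample_variance, round(sample_sd, 2))
```

For the citations the mean is 28 and the squared deviations sum to 534, so the population variance is 534/5 = 106.8.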

Page 58: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example: Grouped Data

The sample standard deviation is defined as:

$$s = \sqrt{\frac{\sum f (X - \bar{X})^2}{\left(\sum f\right) - 1}}$$

Example:

For the following frequency distribution of prices of vehicles, compute the standard deviation of the prices.

Page 59: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example (continued)

An alternate method of computing the variance is:

$$s^2 = \frac{1}{n - 1}\left(\sum fX^2 - \frac{\left(\sum fX\right)^2}{n}\right)$$

Example

Group        Mid pt (X)   f     fX       fX²
1.5 - 2.0    1.75         2     3.5      6.125
2.0 - 2.5    2.25         2     4.5      10.13
2.5 - 3.0    2.75         5     13.75    37.81
3.0 - 3.5    3.25         15    48.75    158.4
3.5 - 4.0    3.75         8     30       112.5
4.0 - 4.5    4.25         6     25.5     108.4
4.5 - 5.0    4.75         2     9.5      45.13
Total                     40    135.5    478.5

$$s^2 = \frac{1}{40 - 1}\left(478.5 - \frac{(135.5)^2}{40}\right) = 0.5$$
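The shortcut formula can be checked against the definitional form of the grouped sample variance. The sketch below uses the midpoints and frequencies from the table; the variable names are illustrative.

```python
# (midpoint X, frequency f) for each group in the table.
groups = [(1.75, 2), (2.25, 2), (2.75, 5), (3.25, 15),
          (3.75, 8), (4.25, 6), (4.75, 2)]

n = sum(f for _, f in groups)
sum_fx = sum(f * x for x, f in groups)
sum_fx2 = sum(f * x * x for x, f in groups)

# Shortcut (alternate) formula.
s2_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Definitional formula, using the unrounded mean.
mean_x = sum_fx / n
s2_definition = sum(f * (x - mean_x) ** 2 for x, f in groups) / (n - 1)

print(n, sum_fx, sum_fx2, round(s2_shortcut, 4))
```

The two forms agree algebraically; the shortcut avoids computing each deviation from the mean.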

Page 60: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Moments

Moments about Origin

The rth moment about origin 'a' is defined as:

$$m'_r = \frac{\sum (X - a)^r}{n}$$

Moments about Mean

The rth moment about the mean is defined as:

$$m_r = \frac{\sum (X - \bar{X})^r}{n}$$

The first moment about the mean is zero.

Moments of Grouped Data

The rth moment about origin 'a' is defined as:

$$m'_r = \frac{\sum f (X - a)^r}{\sum f}$$

The rth moment about the mean is defined as:

$$m_r = \frac{\sum f (X - \bar{X})^r}{\sum f}$$

The first moment about the mean is zero.

Page 61: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example of Moments

Moments about Mean.

Group       Mid pt (X)   f    fX      $f(X - \bar{X})^2$   $f(X - \bar{X})^3$   $f(X - \bar{X})^4$
1.5 - 2.0   1.75         2    3.5     5.445                -8.98425             14.824013
2.0 - 2.5   2.25         2    4.5     2.645                -3.04175             3.4980125
2.5 - 3.0   2.75         5    13.75   2.1125               -1.373125            0.8925313
3.0 - 3.5   3.25         15   48.75   0.3375               -0.050625            0.0075937
3.5 - 4.0   3.75         8    30      0.98                 0.343                0.12005
4.0 - 4.5   4.25         6    25.5    4.335                3.68475              3.1320375
4.5 - 5.0   4.75         2    9.5     3.645                4.92075              6.6430125
Total                    40   135.5   19.5                 -4.50125             29.11725

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{135.5}{40} \approx 3.4$$

$$m_r = \frac{\sum f (X - \bar{X})^r}{\sum f}$$

$$m_2 = \frac{19.5}{40} \approx 0.5$$

$$m_3 = \frac{-4.50125}{40} = -0.1125$$

$$m_4 = \frac{29.11725}{40} = 0.7279$$

Page 62: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example of Moments (Continued)

Example

Class     f    X     fX      $X - \bar{X}$   $f(X - \bar{X})$   $f(X - \bar{X})^2$   $f(X - \bar{X})^3$   $f(X - \bar{X})^4$
0.0-0.8   5    0.4   2       -1.97           -9.84              19.37                -38.11               75.00
0.8-1.6   9    1.2   10.8    -1.17           -10.51             12.28                -14.34               16.75
1.6-2.4   15   2     30      -0.37           -5.52              2.03                 -0.75                0.28
2.4-3.2   10   2.8   28      0.43            4.32               1.87                 0.81                 0.35
3.2-4.0   6    3.6   21.6    1.23            7.39               9.11                 11.22                13.82
4.0-4.8   2    4.4   8.8     2.03            4.06               8.26                 16.78                34.10
4.8-5.6   1    5.2   5.2     2.83            2.83               8.02                 22.71                64.32
5.6-6.4   2    6     12      3.63            7.26               26.38                95.82                348.03
Total     50         118.4                   0                  87.31                94.14                552.65

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{118.4}{50} = 2.37$$

$$m_2 = \frac{\sum f (X - \bar{X})^2}{\sum f} = \frac{87.31}{50} = 1.75$$

$$m_3 = \frac{\sum f (X - \bar{X})^3}{\sum f} = \frac{94.14}{50} = 1.88$$

$$m_4 = \frac{\sum f (X - \bar{X})^4}{\sum f} = \frac{552.65}{50} = 11.05$$
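The grouped-data moments about the mean can be computed directly from the class midpoints and frequencies. The sketch below uses the eight-class table above; `central_moment` is a hypothetical helper name, not something defined in the slides. The code keeps the unrounded mean (2.368), so its results agree with the tabulated ones only up to rounding.

```python
midpoints = [0.4, 1.2, 2.0, 2.8, 3.6, 4.4, 5.2, 6.0]
freqs = [5, 9, 15, 10, 6, 2, 1, 2]


def grouped_mean(xs, fs):
    """Mean of grouped data: sum f X / sum f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def central_moment(xs, fs, r):
    """r-th moment about the mean: sum f (X - mean)^r / sum f."""
    mean = grouped_mean(xs, fs)
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / sum(fs)


m1 = central_moment(midpoints, freqs, 1)
m2 = central_moment(midpoints, freqs, 2)
m3 = central_moment(midpoints, freqs, 3)
m4 = central_moment(midpoints, freqs, 4)

print("mean:", round(grouped_mean(midpoints, freqs), 3))
print("m1:", round(m1, 6))  # zero up to floating-point error
print("m2:", round(m2, 2), "m3:", round(m3, 2), "m4:", round(m4, 2))
```

The first moment comes out as zero (up to floating-point error), as stated on the Moments slide, and the second to fourth moments round to 1.75, 1.88 and 11.05.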

Page 63: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Skewness

The mean, median, and mode are measures of central location for a set of observations; the range and the standard deviation are measures of dispersion.

Another characteristic of a set of data is its shape.

Four shapes are commonly observed: symmetric, positively skewed, negatively skewed, and bimodal.

The coefficient of skewness can range from -3 up to 3. A value near -3, such as -2.57, indicates considerable negative skewness.

A value such as 1.63 indicates moderate positive skewness.

A value of 0, which will occur when the mean and median are equal, indicates that the distribution is symmetrical and that there is no skewness present.

Page 64: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Types of Skewness


Page 65: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Coefficient of Skewness

The Pearson coefficient of skewness is defined as:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

Example

Following are the earnings per share for a sample of 15 software companies for the year 2005. The earnings per share are arranged from smallest to largest.

Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate. What is your conclusion regarding the shape of the distribution?

Solution

$$\bar{X} = \frac{\sum X}{n} = \frac{\$74.26}{15} = \$4.95$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^2 + \dots + (\$16.40 - \$4.95)^2}{15 - 1}} = \$5.22$$

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017$$

The shape is moderately positively skewed.

Page 66: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

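Example of Skewness (Continued)

Example

Class     f    cf   fX      $f(X - \bar{X})^2$
0.0-0.8   5    5    2       20.65
0.8-1.6   8    13   9.6     12.14
1.6-2.4   14   27   28      2.61
2.4-3.2   11   38   30.8    1.49
3.2-4.0   7    45   25.2    9.55
4.0-4.8   2    47   8.8     7.75
4.8-5.6   1    48   5.2     7.66
5.6-6.4   2    50   12      25.46
Total     50        121.6   87.31

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{121.6}{50} = 2.43$$

$$s = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{87.31}{49}} = 1.3348$$

The median lies in the class 1.6-2.4, the first class whose cumulative frequency reaches n/2 = 25. With $L_m$ the lower limit of the median class, $I$ the class width, $f_m$ the frequency of the median class, and $cf_{m-1}$ the cumulative frequency of the class before it:

$$\text{Median} = L_m + \frac{I}{f_m}\left(\frac{n}{2} - cf_{m-1}\right) = 1.6 + \frac{0.8}{14}\left(\frac{50}{2} - 13\right) = 2.29$$

$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(2.43 - 2.29)}{1.3348} = 0.3147$$

The shape is slightly positively skewed.

The skewness can also be measured with moments as:

$$b = \frac{m_3^2}{m_2^3}$$

With m2 = 1.75 and m3 = 1.62, b = 0.492.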
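The whole skewness calculation for this frequency table can be reproduced in Python. This is a sketch under stated assumptions: the helper names `grouped_median` and `central_moment` are illustrative, the class width is taken as 0.8 for every class, and no intermediate value is rounded. Because of that last point the Pearson coefficient comes out near 0.33 rather than the 0.3147 obtained from the rounded mean (2.43) and median (2.29).

```python
import math

lower_limits = [0.0, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6]
width = 0.8
freqs = [5, 8, 14, 11, 7, 2, 1, 2]

midpoints = [low + width / 2 for low in lower_limits]
n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n


def central_moment(xs, fs, r):
    """r-th moment about the mean for grouped data."""
    total = sum(fs)
    centre = sum(f * x for x, f in zip(xs, fs)) / total
    return sum(f * (x - centre) ** r for x, f in zip(xs, fs)) / total


def grouped_median(lows, class_width, fs):
    """Interpolated median: L + (I / f_m) * (n / 2 - cf before the class)."""
    total = sum(fs)
    cumulative = 0
    for low, f in zip(lows, fs):
        if cumulative + f >= total / 2:
            return low + (class_width / f) * (total / 2 - cumulative)
        cumulative += f
    raise ValueError("frequency table is empty")


# Sample standard deviation (divisor n - 1), as in the worked example.
s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))
median = grouped_median(lower_limits, width, freqs)
sk = 3 * (mean - median) / s  # Pearson coefficient of skewness

m2 = central_moment(midpoints, freqs, 2)
m3 = central_moment(midpoints, freqs, 3)
b = m3 ** 2 / m2 ** 3  # moment-based measure of skewness

print("mean:", round(mean, 3), "median:", round(median, 3), "s:", round(s, 4))
print("Pearson sk:", round(sk, 3))
print("m2:", round(m2, 2), "m3:", round(m3, 2), "b:", round(b, 3))
```

Both measures are positive, which agrees with the conclusion that the distribution is slightly positively skewed.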

Page 67: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example: Skewness

Histogram of the data (horizontal axis: Data, from 0.00 to 6.40 in steps of 0.80; vertical axis: Percent, from 0 to 30), with the positions of the mode, median, and mean marked.

Page 68: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Empirical Rule

For a symmetrical, bell-shaped frequency distribution:

Approximately 68% of the observations will lie within plus and minus one standard deviation of the mean (mean ± s.d).

About 95% of the observations will lie within plus and minus two standard deviations of the mean (mean ± 2 s.d).

Practically all (99.7%) will lie within plus and minus three standard deviations of the mean (mean ± 3 s.d).

Let the mean of a symmetric distribution be 100 and the standard deviation be 10; then the empirical rule is as follows:

68% of the observations lie between 90 and 110.

95% of the observations lie between 80 and 120.

99.7% of the observations lie between 70 and 130.

Page 69: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Example: Empirical Rule

Consider the following distribution:

1.6   2.5   3     3.4   3.8
1.8   2.6   3.2   3.5   4.1
2     2.6   3.2   3.6   4.1
2.3   2.6   3.2   3.6   4.2
2.3   2.8   3.3   3.6   4.3
2.3   2.8   3.3   3.7   4.3
2.4   2.9   3.4   3.7   4.5
2.5   3     3.4   3.8   4.6

Check the empirical rule.

From the ungrouped data: Mean = 3.2, s.d = 0.75

Mean ± sd = (2.45 – 3.95) (67.5%)
Mean ± 2sd = (1.7 – 4.7) (97.5%)
Mean ± 3sd = (0.95 – 5.45) (100%)

From the grouped data:

Group       f    X      fX     fX^2
1.5 - 2.0   2    1.75   3.5    6.13
2.0 - 2.5   5    2.25   11.3   25.3
2.5 - 3.0   8    2.75   22     60.5
3.0 - 3.5   10   3.25   32.5   106
3.5 - 4.0   8    3.75   30     113
4.0 - 4.5   5    4.25   21.3   90.3
4.5 - 5.0   2    4.75   9.5    45.1
Total       40          130    446

Mean = 3.25, s.d = 0.77

Mean ± sd = (2.48 – 4.02) (67.5%)
Mean ± 2sd = (1.71 – 4.79) (97.5%)
Mean ± 3sd = (0.94 – 5.56) (100%)
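The empirical-rule check on the 40 ungrouped values can be sketched in Python as below. `share_within` is a hypothetical helper, and the standard deviation is assumed to be the sample version (divisor n - 1); the slides do not say which divisor was used. With the values as transcribed here, 26 of the 40 observations (65%) fall within one standard deviation of the mean, slightly below the 67.5% quoted above, while the two- and three-standard-deviation shares come out at 97.5% and 100%.

```python
import statistics

data = [
    1.6, 2.5, 3.0, 3.4, 3.8,
    1.8, 2.6, 3.2, 3.5, 4.1,
    2.0, 2.6, 3.2, 3.6, 4.1,
    2.3, 2.6, 3.2, 3.6, 4.2,
    2.3, 2.8, 3.3, 3.6, 4.3,
    2.3, 2.8, 3.3, 3.7, 4.3,
    2.4, 2.9, 3.4, 3.7, 4.5,
    2.5, 3.0, 3.4, 3.8, 4.6,
]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)


def share_within(values, centre, spread, k):
    """Fraction of values inside [centre - k * spread, centre + k * spread]."""
    low, high = centre - k * spread, centre + k * spread
    return sum(low <= v <= high for v in values) / len(values)


shares = {k: share_within(data, mean, sd, k) for k in (1, 2, 3)}

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
for k, share in shares.items():
    print(f"mean ± {k} sd: {share:.1%} of observations")
```

All three shares are close to the 68%, 95% and 99.7% expected for a symmetrical, bell-shaped distribution.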

Page 70: Descriptive Statistics Prepared By Masood Amjad Khan GCU, Lahore

Exercise

For the following data of examination marks, find the mean, median, mode, mean deviation, and variance. Also find the skewness.

Marks      No. of students
30 – 39    8
40 – 49    87
50 – 59    190
60 – 69    304
70 – 79    211
80 – 89    85
90 – 99    20

The following is the distribution of wages per thousand employees in a certain factory. Calculate the modal and median wages. Why is there a difference between the two?

Daily Wages   No. of Employees
22            3
24            13
26            43
28            102
30            175
32            220
34            204
36            139
38            69
40            25
42            6
44            1