Statistics for Business and Economics, Topic 3: Describing Data: Numerical. Figures won't lie, but liars will figure

Upload: jodie-mclaughlin

Post on 11-Jan-2016


Page 1

Statistics for Business and Economics

Topic 3

Describing Data: Numerical

Figures won't lie, but liars will figure

Page 2

Why are descriptive statistics important to managers?

Managers also need to become acquainted with numerical descriptive measures that provide very brief and easy-to-understand summaries of a data collection.

These measures fall into two broad categories: measures of central tendency and measures of variability.

Page 3

Topic Goals

After completing this topic, you should be able to compute and interpret:

- Measures of central tendency, variation, and shape
- (Arithmetic) mean, median, mode, geometric mean
- Quartiles
- Five-number summary and box-and-whisker plots
- Range, interquartile range, variance and standard deviation, coefficient of variation
- Symmetric and skewed distributions
- Flat and peaked distributions

Page 4

Describing Data Numerically

Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Page 5

Measures of Central Tendency

Overview of central tendency:

- Mean: the arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median: the midpoint of ranked values
- Mode: the most frequently observed value

Page 6

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency.

For a population of N values:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \text{ (simple)} \quad \text{or} \quad \mu = \frac{\sum_{i} x_i n_i}{N} \text{ (weighted)}$$

where the $x_i$ are the population values and N is the population size.

For a sample of size n:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \text{ (simple)} \quad \text{or} \quad \bar{x} = \frac{\sum_{i} x_i n_i}{n} \text{ (weighted)}$$

where the $x_i$ are the observed values and n is the sample size.

In the weighted form, $n_i$ is the absolute frequency of the value $x_i$, and the sum runs over the distinct values.
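The simple and weighted forms of the mean give the same result when the weights are absolute frequencies. The following is a minimal sketch of that equivalence; the frequency table is hypothetical and the function names are illustrative, not taken from the slides.

```python
def simple_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(frequencies):
    """Mean from a {value: absolute frequency} table."""
    total = sum(frequencies.values())
    return sum(x * n_i for x, n_i in frequencies.items()) / total


# Hypothetical frequency table: value -> absolute frequency n_i
frequencies = {10: 2, 20: 3, 30: 5}

# Expand the table back into the raw list of observations
expanded = [x for x, n_i in frequencies.items() for _ in range(n_i)]

print(simple_mean(expanded))       # 23.0
print(weighted_mean(frequencies))  # 23.0
```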

Page 7

Arithmetic mean

The arithmetic mean of a collection of numerical values is the sum of these values divided by the number of values. The symbol for the population mean is the Greek letter μ (mu), and the symbol for a sample mean is $\bar{x}$ (x-bar).

A population parameter is any measurable characteristic of a population.

A sample statistic is any measurable characteristic of a sample.

Page 8

Characteristics of the arithmetic mean

1. Every data set measured on an interval or ratio level has a mean.
2. The mean has valuable mathematical properties that make it convenient to use in further computations.
3. The mean is sensitive to extreme values.
4. The sum of the deviations of the numbers in a data set from the mean is zero.
5. The sum of the squared deviations of the numbers in a data set from the mean is a minimum value.
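Properties 4 and 5 can be checked numerically. This is a small sketch using the values 1, 2, 3, 4, 10, which also appear in the arithmetic-mean example of these slides; the alternative centers are arbitrary choices made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def sum_squared_deviations(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)


data = [1, 2, 3, 4, 10]
m = mean(data)  # 4.0

# Property 4: deviations from the mean sum to zero
deviations_total = sum(x - m for x in data)

# Property 5: the sum of squared deviations is smallest at the mean
ssd_at_mean = sum_squared_deviations(data, m)
ssd_elsewhere = [sum_squared_deviations(data, c) for c in (3.0, 3.5, 4.5, 5.0)]

print(deviations_total)  # 0.0
print(ssd_at_mean)       # 50.0
print(ssd_elsewhere)     # every entry is larger than 50.0
```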

Page 9

Geometric Mean

The geometric mean is the most common measure of central tendency for rates (growth rates, interest rates, etc.)

For N values:

$$\mu_{geo} = \left(\prod_{i=1}^{N} x_i\right)^{1/N} = \sqrt[N]{x_1 x_2 \cdots x_N}$$
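As an illustration of the formula, the sketch below averages growth factors. The three factors are hypothetical (growth of +10%, -5% and +20%), and the function name is an assumption made for this example.

```python
import math


def geometric_mean(values):
    """N-th root of the product of N positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical growth factors (1 + growth rate) for three periods
growth_factors = [1.10, 0.95, 1.20]

g = geometric_mean(growth_factors)
average_rate = g - 1  # average growth rate per period

# Applying g in every period reproduces the total growth over all periods
print(math.isclose(g ** 3, math.prod(growth_factors)))  # True
print(g < sum(growth_factors) / 3)                      # True: below the arithmetic mean
```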

Page 10

Arithmetic Mean (continued)

- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)

Example, with both data sets plotted on a 0 to 10 axis:

Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3

Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4

Page 11

Median

The numerical value in the middle when the data set is arranged in order (50% above, 50% below)

Not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis, each with Median = 3]

Page 12

Finding the Median

The location of the median:

$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$

- If the number of values is odd, the median is the middle number
- If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
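The position rule can be sketched as follows. The function name is illustrative, and the inputs are small examples rather than data from a specific slide.

```python
def median_by_position(values):
    """Median located through the (n + 1) / 2 position in the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2  # 1-based position, not the median itself
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value
        return ranked[int(position) - 1]
    # Even n: the position falls halfway between the two middle values
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2


print(median_by_position([1, 2, 3, 4, 10]))  # 3
print(median_by_position([4, 1, 3, 2]))      # 2.5
```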

Page 13

Mode

- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes

[Figure: a dot plot on a 0 to 14 axis with Mode = 9, and a dot plot on a 0 to 6 axis with no mode]

Page 14

Review Example: Summary Statistics

House Prices: €2,000,000, €500,000, €300,000, €100,000, €100,000 (Sum: €3,000,000)

Mean: €3,000,000 / 5 = €600,000

Median: middle value of ranked data = €300,000

Mode: most frequent value = €100,000
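The three summary values for the house prices can be reproduced with Python's standard `statistics` module. This is only a sketch of the arithmetic, not part of the original slides.

```python
import statistics

# House prices in euros, as listed in the review example
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
mode_price = statistics.mode(house_prices)

print(mean_price)    # 600000
print(median_price)  # 300000
print(mode_price)    # 100000
```

The single €2,000,000 house pulls the mean well above the median, which is the effect of extreme values described in these slides.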

Page 15

Which measure of location is the "best"?

- Mean is generally used, unless extreme values (outliers) exist
- Then median is often used, since the median is not sensitive to extreme values
- Example: median home prices may be reported for a region, because they are less sensitive to outliers

Page 16

Measures of Variability

Measures of variation give information on the spread or variability of the data values.

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

[Figure: two distributions with the same center but different variation]

Page 17

Range

- Simplest measure of variation
- Difference between the largest and the smallest observations:

$$\text{Range} = x_{\text{largest}} - x_{\text{smallest}}$$

Example (data plotted on a 0 to 14 axis): Range = 14 - 1 = 13

Page 18

Disadvantages of the Range

Ignores the way in which data are distributed

[Figure: two differently distributed dot plots on a 7 to 12 axis, both with Range = 12 - 7 = 5]

Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
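The outlier sensitivity above can be reproduced with a short sketch. The two lists are the data sets from this slide; the helper name is illustrative.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Shared part of both data sets: eleven 1s, eight 2s, four 3s and one 4
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]

typical = base + [5]
with_outlier = base + [120]

print(value_range(typical))       # 4
print(value_range(with_outlier))  # 119
```

Changing a single observation out of 25 moves the range from 4 to 119.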

Page 19

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% of the values in each, separated by Q1, Q2 and Q3).

- The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
- Q2 is the same as the median (50% are smaller, 50% are larger)
- Only 25% of the observations are greater than the third quartile, Q3

Page 20

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1) (the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Page 21

Quartiles

Example: Find the first quartile

Sample Ranked Data: 11 12 13 16 16 17 18 21 22

(n = 9)

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,

so use the value halfway between the 2nd and 3rd values,

so Q1 = 12.5
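The position rule from the quartile formulas can be sketched as below. Interpolating linearly between neighbouring ranked values is an assumption that generalizes the "halfway" step of the example; the function name is illustrative.

```python
def quartile(ranked, q):
    """Value at position q * (n + 1) in already ranked data.

    q is 0.25, 0.50 or 0.75. Fractional positions are resolved by
    linear interpolation between the two neighbouring values.
    """
    n = len(ranked)
    position = q * (n + 1)  # 1-based position in the ranked data
    lower = int(position)   # whole part of the position
    fraction = position - lower
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(quartile(sample, 0.25))  # 12.5
print(quartile(sample, 0.50))  # 16.0
print(quartile(sample, 0.75))  # 19.5
```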

Page 22

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Page 23

Interquartile Range: Five-Number Summary and Box Plot

Example box plot, with 25% of the data in each of the four segments:

X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70

Interquartile range = 57 - 30 = 27
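The raw data behind this box plot are not given, so the sketch below builds a five-number summary from the nine ranked values of the quartile example in these slides instead. It relies on `statistics.quantiles` with `method="exclusive"`, which for this data reproduces the (n+1) position rule.

```python
import statistics

# Ranked sample from the quartile example
ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# Three cut points that split the data into four equal-sized groups
q1, q2, q3 = statistics.quantiles(ranked, n=4, method="exclusive")

five_number_summary = (min(ranked), q1, q2, q3, max(ranked))
iqr = q3 - q1  # range of the middle 50% of the data

print(five_number_summary)  # (11, 12.5, 16.0, 19.5, 22)
print(iqr)                  # 7.0
```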

Page 24

Population Variance

Average of squared deviations of values from the mean

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \text{ (simple)} \quad \text{or} \quad \sigma^2 = \frac{\sum_{i} (x_i - \mu)^2 n_i}{N} \text{ (weighted)}$$

Where μ = population mean

N = population size

$x_i$ = ith value of the variable x

$n_i$ = absolute frequency

Page 25

Sample Variance

Average (approximately) of squared deviations of values from the mean

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \text{ (simple)} \quad \text{or} \quad s^2 = \frac{\sum_{i} (x_i - \bar{x})^2 n_i}{n-1} \text{ (weighted)}$$

Where $\bar{x}$ = arithmetic mean

n = sample size

$x_i$ = ith value of the variable x

$n_i$ = absolute frequency
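A rough sketch of the simple and weighted variance formulas follows. It reuses the 25 values from the range example in these slides (eleven 1s, eight 2s, four 3s, one 4 and one 5) as a frequency table; the function names are illustrative.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def weighted_sample_variance(frequencies):
    """Sample variance from a {value: absolute frequency} table."""
    n = sum(frequencies.values())
    m = sum(x * n_i for x, n_i in frequencies.items()) / n
    return sum((x - m) ** 2 * n_i for x, n_i in frequencies.items()) / (n - 1)


frequencies = {1: 11, 2: 8, 3: 4, 4: 1, 5: 1}
expanded = [x for x, n_i in frequencies.items() for _ in range(n_i)]

s2_simple = sample_variance(expanded)
s2_weighted = weighted_sample_variance(frequencies)
sigma2 = population_variance(expanded)

print(abs(s2_simple - s2_weighted) < 1e-12)  # True: both forms agree
print(sigma2 < s2_simple)                    # True: dividing by N gives the smaller value
```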

Page 26

Population Standard Deviation

- The square root of population variance
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \text{ (simple)} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum_{i} (x_i - \mu)^2 n_i}{N}} \text{ (weighted)}$$

Page 27

Sample Standard Deviation

- The square root of the sample variance
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data

Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \text{ (simple)} \quad \text{or} \quad s = \sqrt{\frac{\sum_{i} (x_i - \bar{x})^2 n_i}{n-1}} \text{ (weighted)}$$

Page 28

Calculation Example: Sample Standard Deviation

Sample Data ($x_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{x}$ = 16

$$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

A measure of the "average" scatter around the mean
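The calculation can be checked step by step. This is a minimal sketch that applies the sample standard deviation formula directly to the eight sample values.

```python
import math

sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
x_bar = sum(sample) / n

# Squared deviations from the mean: 36, 16, 4, 1, 1, 4, 4, 64
sum_squared_deviations = sum((x - x_bar) ** 2 for x in sample)

s = math.sqrt(sum_squared_deviations / (n - 1))

print(x_bar)                   # 16.0
print(sum_squared_deviations)  # 130.0
print(round(s, 4))             # 4.3095
```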

Page 29

Measuring variation

[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Page 30

Comparing Standard Deviations

Three data sets plotted on the same 11 to 21 axis:

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.926

Data C: Mean = 15.5, s = 4.570

Page 31

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation

Values far from the mean are given extra weight (because deviations from the mean are squared)

Page 32

Coefficient of Variation

- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Can be used to compare two or more sets of data measured in different units

$$CV = \frac{s}{\bar{x}} \cdot 100\%$$

Page 33

Comparing Coefficient of Variation

Stock A: Average price last year = $50, Standard deviation = $5

$$CV_A = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B: Average price last year = $100, Standard deviation = $5

$$CV_B = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
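The stock comparison reduces to one line of arithmetic per stock. The helper below is a sketch; its name and signature are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```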

Page 34

Skewness

Shows asymmetry and refers to the shape of a distribution

Can take on positive, negative or zero values

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3 n} \text{ (simple)} \quad \text{or} \quad \frac{\sum_{i} (x_i - \bar{x})^3 n_i}{s^3 n} \text{ (weighted)}$$
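A minimal sketch of the simple skewness formula is shown below. It assumes that s is the sample standard deviation (n - 1 denominator) defined earlier in these slides; the data sets are small illustrations.

```python
import statistics


def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    x_bar = sum(values) / n
    s = statistics.stdev(values)  # sample standard deviation (assumed)
    return sum((x - x_bar) ** 3 for x in values) / (n * s ** 3)


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 10]          # one large value on the right
left_tail = [-x for x in right_tail]   # mirror image, tail on the left

print(skewness(symmetric))       # 0.0
print(skewness(right_tail) > 0)  # True: positively skewed
print(skewness(left_tail) < 0)   # True: negatively skewed
```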

Page 35

Distribution Shape

The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center; the coefficient of skewness then equals zero.

When a symmetric distribution is unimodal, the mean, median, and mode are all equal to one another and are located at the center of the distribution.

[Figure: frequency histogram over the values 1 to 9, titled "Symmetric Distribution"]

Page 36

Distribution Shape (continued)

The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.

A positively skewed distribution (skewed to the right) has a tail that extends to the right in the direction of positive values.

A negatively skewed distribution (skewed to the left) has a tail that extends to the left in the direction of negative values.

[Figure: two frequency histograms over the values 1 to 9, titled "Positively Skewed Distribution" and "Negatively Skewed Distribution"]

Page 37

Shape of a Distribution

- Describes how data are distributed
- Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed: Mode < Median < Mean

Page 38

Kurtosis

- Refers to the shape of a distribution
- Can take positive (peaked distribution), negative (flat distribution) or zero values (a symmetrical, bell-shaped, normal distribution)

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4 n} - 3 \text{ (simple)} \quad \text{or} \quad \frac{\sum_{i} (x_i - \bar{x})^4 n_i}{s^4 n} - 3 \text{ (weighted)}$$
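The simple kurtosis formula can be sketched in the same way as skewness. The sketch assumes that s is the sample standard deviation (n - 1 denominator); both data sets are hypothetical and chosen only to show the sign of the result.

```python
import statistics


def kurtosis(values):
    """Sum of fourth-power deviations divided by n * s**4, minus 3."""
    n = len(values)
    x_bar = sum(values) / n
    s = statistics.stdev(values)  # sample standard deviation (assumed)
    return sum((x - x_bar) ** 4 for x in values) / (n * s ** 4) - 3


# Hypothetical data: evenly spread values give a flat shape
flat = [1, 2, 3, 4, 5]

# Hypothetical data: most values at the center plus two far-out values
peaked = [0] * 18 + [-10, 10]

print(kurtosis(flat) < 0)    # True: flatter than a normal distribution
print(kurtosis(peaked) > 0)  # True: more peaked, with heavier tails
```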

Page 39

Kurtosis (continued)

Page 40

The Empirical Rule

If the data distribution is bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample.

Page 41

The Empirical Rule

- $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
- $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
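The three percentages can be checked against an exact normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; taking "bell-shaped" to mean exactly normal is an assumption, so real data will only match approximately.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within mu +/- k * sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The shares do not depend on the particular values of μ and σ, which is why the rule can be stated in terms of standard deviations alone.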

Page 42

Using Microsoft Excel

Simple Descriptive Statistics can be obtained from Microsoft® Excel

Use menu choice:

Data Tab / Data analysis / Descriptive statistics

Enter details in dialog box

Page 43

Using Microsoft Excel (continued)

- Enter dialog box details
- Check box for summary statistics
- Click OK

Page 44

Excel output

Microsoft Excel simple descriptive statistics output, using the house price data:

House Prices: $2,000,000, 500,000, 300,000, 100,000, 100,000

Page 45

Next topic

Theory of probability

Thank you!

Have a nice day!