Download - Measures of Variability - National Tsing Hua Universitymx.nthu.edu.tw/~chaoenyu/stat5.pdf · 2019. 11. 7. · or variability of the data values. Ch. 2-1 2.2. Range ... Measures the

Measures of Variability

Copyright © 2013 Pearson Education

Measures of variation give information on the spread or variability of the data values.

Measures of variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

[Figure: two distributions with the same center but different variation]

2.2

Range

Simplest measure of variation

Difference between the largest and the smallest observations:

Range = X_largest – X_smallest

Example (observations plotted on a number line from 0 to 14, with the smallest at 1 and the largest at 14):

Range = 14 - 1 = 13

Disadvantages of the Range

Ignores the way in which data are distributed

[Figure: two dot plots on an axis from 7 to 12 with differently distributed data; in both cases Range = 12 - 7 = 5]

Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
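The outlier sensitivity can be checked directly. The following is a minimal Python sketch using the two data sets listed on this slide; the function name `data_range` is an illustrative choice, not something from the slides.

```python
def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    return max(values) - min(values)

# The two data sets from the slide are identical except for the last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

A single extreme observation moves the range from 4 to 119 even though the other 24 values are unchanged.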

Interquartile Range

Can eliminate some outlier problems by using the interquartile range (IQR)

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Interquartile Range

The interquartile range (IQR) measures the spread in the middle 50% of the data

Defined as the difference between the observation at the third quartile and the observation at the first quartile

IQR = Q3 - Q1
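As a rough illustration of why the IQR resists outliers, the sketch below computes it for the two data sets from the range slide. It assumes Python's `statistics.quantiles` with its default "exclusive" method; textbooks and software packages use several slightly different quartile conventions, so the exact quartile values may differ from a hand calculation. The helper name `iqr` is hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Return Q3 - Q1 using the default ('exclusive') quartile method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

# Same two data sets as on the range slide: only the last value differs.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]

print(iqr(base + [5]))    # 1.5
print(iqr(base + [120]))  # 1.5
```

Under this convention the outlier 120 leaves the IQR unchanged, while the range jumps from 4 to 119.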

Box-and-Whisker Plot


A box-and-whisker plot is a graph that describes the shape of a distribution

Created from the five-number summary: the minimum value, Q1, the median, Q3, and the maximum

The inner box shows the range from Q1 to Q3, with a line drawn at the median

Two “whiskers” extend from the box. One whisker is the line from Q1 to the minimum, the other is the line from Q3 to the maximum value

[Figure: Box Plot]


Population Variance

Average of squared deviations of values from the mean (Karl Pearson 1893)

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where μ = population mean

N = population size

x_i = ith value of the variable x

Sample Variance

Average (approximately) of squared deviations of values from the mean

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Defect: not in the same units as the original data values.

Population Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Population standard deviation:


$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Sample Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Sample standard deviation:


$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

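Calculation Example: Sample Standard Deviation

Sample Data (x_i): 10 12 14 15 17 18 18 24

n = 8    Mean = x̄ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$$

A measure of the “average” scatter around the mean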
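The calculation above can be reproduced step by step. This is a small Python sketch, not part of the original slides; it computes the sum of squared deviations by hand and then cross-checks against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)                                 # sample mean, 16
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations, 130
s = math.sqrt(sum_sq_dev / (len(data) - 1))        # divide by n - 1, then take the root

print(round(s, 4))             # 4.3095
print(round(stdev(data), 4))   # 4.3095, same result from the library routine
```

Dividing by n instead of n - 1 (the population formula, available as `statistics.pstdev`) would give a slightly smaller value.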

Measuring variation


[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Comparing Standard Deviations


Mean = 15.5 for each data set

Data A: s = 3.338 (compare to the two cases below)

Data B: s = 0.926 (values are concentrated near the mean)

Data C: s = 4.570 (values are dispersed far from the mean)

[Figure: dot plots of Data A, Data B, and Data C, each on an axis from 11 to 21]

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation

Values far from the mean are given extra weight (because deviations from the mean are squared)


Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Can be used to compare two or more sets of data measured in different units

Population coefficient of variation:

$$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$$

Sample coefficient of variation:

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$

Comparing Coefficient of Variation

Stock A:

Average price last year = $50

Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B:

Average price last year = $100

Standard deviation = $5

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
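The stock comparison takes only a few lines of code. The sketch below is illustrative; the function name `coefficient_of_variation` is an assumption, not from the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100

print(round(cv_a, 2), round(cv_b, 2))  # 10.0 5.0
```

Because the dollar units cancel in the ratio, the same function can compare series measured in different units.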

Chebychev’s Theorem

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval

[μ ± kσ]

is at least

$$100\left[1 - \frac{1}{k^2}\right]\%$$

Chebychev’s Theorem (continued)

Regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

At least (1 - 1/1.5²) = 55.6% within k = 1.5 (μ ± 1.5σ)

At least (1 - 1/2²) = 75% within k = 2 (μ ± 2σ)

At least (1 - 1/3²) = 89% within k = 3 (μ ± 3σ)
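The bound is easy to tabulate for any k. The following Python sketch reproduces the three examples; the function name `chebyshev_lower_bound` is a hypothetical choice made for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
```

This prints 55.6%, 75.0%, and 88.9%; the slide rounds the last value to 89%.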

The Empirical Rule

If the data distribution is bell-shaped, then the interval:

μ ± 1σ contains about 68% of the values in the population or the sample

[Figure: bell curve with the central 68% lying between μ - 1σ and μ + 1σ]

The Empirical Rule (continued)

μ ± 2σ contains about 95% of the values in the population or the sample

μ ± 3σ contains almost all (about 99.7%) of the values in the population or the sample

[Figure: bell curves with the central 95% between μ - 2σ and μ + 2σ, and the central 99.7% between μ - 3σ and μ + 3σ]
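The 68%, 95%, and 99.7% figures match what an exactly normal distribution gives. The sketch below checks this with `statistics.NormalDist` from the Python standard library. Treating "bell-shaped" as exactly normal is an assumption made here for illustration; real data will only follow the rule approximately.

```python
from statistics import NormalDist

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.1%}")
```

For comparison, Chebychev's bound guarantees only 75% within 2σ and about 89% within 3σ, since it makes no assumption about the shape of the distribution.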

z-Score

A z-score shows the position of a value relative to the mean of the distribution.

It indicates the number of standard deviations a value is from the mean.

A z-score greater than zero indicates that the value is greater than the mean

A z-score less than zero indicates that the value is less than the mean

A z-score of zero indicates that the value is equal to the mean.

z-Score (continued)

If the data set is the entire population of data and the population mean, μ, and the population standard deviation, σ, are known, then for each value, x_i, the z-score associated with x_i is

$$z = \frac{x_i - \mu}{\sigma}$$

z-Score (continued)

If intelligence is measured for a population using an IQ score, where the mean IQ score is 100 and the standard deviation is 15, what is the z-score for an IQ of 121?

$$z = \frac{x_i - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4$$

A score of 121 is 1.4 standard deviations above the mean.
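The IQ example translates into a one-line function. This is a minimal sketch; the name `z_score` is chosen here for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(121, 100, 15))  # 1.4
```

A value below the mean gives a negative result; for instance, an IQ of 85 would have a z-score of -1.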

Weighted Mean and Measures of Grouped Data

The weighted mean of a set of data is

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{n} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{n}$$

Where w_i is the weight of the ith observation and $n = \sum w_i$

Use when data is already grouped into n classes, with w_i values in the ith class
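A short sketch of the weighted mean follows. The grades and credit hours are hypothetical numbers invented for this example, and `weighted_mean` is an illustrative name. The denominator is the sum of the weights, as in the definition above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) divided by sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical data: course grades weighted by credit hours.
grades = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]

print(weighted_mean(grades, credit_hours))  # 3.25
```

With equal weights the result reduces to the ordinary arithmetic mean.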

2.3

Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies f_1, f_2, . . ., f_K, and the midpoints of the classes are m_1, m_2, . . ., m_K

For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where } n = \sum_{i=1}^{K} f_i$$

Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies f_1, f_2, . . ., f_K, and the midpoints of the classes are m_1, m_2, . . ., m_K

For a sample of n observations, the variance is

$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$
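Both grouped-data approximations can be sketched in a few lines of Python. The class midpoints and frequencies below are hypothetical values made up for illustration, and the function name `grouped_mean_and_variance` is not from the slides.

```python
def grouped_mean_and_variance(midpoints, frequencies):
    """Approximate the sample mean and variance from class midpoints and frequencies."""
    n = sum(frequencies)  # n is the total number of observations
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance

# Hypothetical grouped data: three classes with midpoints 5, 15, and 25.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

mean, variance = grouped_mean_and_variance(midpoints, frequencies)
print(mean, round(variance, 2))  # 16.0 54.44
```

These are approximations because every observation in a class is treated as if it sat at the class midpoint.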

Measures of Relationships Between Variables

Two measures of the relationship between variables are

Covariance: a measure of the direction of a linear relationship between two variables

Correlation Coefficient: a measure of both the direction and the strength of a linear relationship between two variables


2.4

Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$$Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The sample covariance:

$$Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Only concerned with the strength of the relationship

No causal effect is implied

Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (this alone does not imply that they are independent)


Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$$\rho = \frac{Cov(x, y)}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r = \frac{Cov(x, y)}{s_X s_Y}$$
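The sample covariance and sample correlation coefficient can be computed together, since the correlation is the covariance divided by the two sample standard deviations. The sketch below uses hypothetical test scores invented for illustration (they are not the data behind any figure in these slides), and the function names are illustrative.

```python
import math

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is s_x squared
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical scores on two tests for five students.
test1 = [70, 75, 80, 85, 90]
test2 = [72, 74, 83, 84, 92]

print(sample_covariance(test1, test2))             # 62.5
print(round(sample_correlation(test1, test2), 3))  # 0.973
```

The covariance of 62.5 depends on the units of the scores, while the correlation of about 0.973 is unit free.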

Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker the linear relationship


Scatter Plots of Data with Various Correlation Coefficients


[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on the second test

[Figure: Scatter Plot of Test Scores, Test #2 Score against Test #1 Score, both axes from 70 to 100]

Covariance

Let X and Y be discrete random variables with means μ_X and μ_Y

The expected value of (X - μ_X)(Y - μ_Y) is called the covariance between X and Y

For discrete random variables

$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{x} \sum_{y} (x - \mu_x)(y - \mu_y) P(x, y)$$

An equivalent expression is

$$Cov(X, Y) = E(XY) - \mu_x \mu_y = \sum_{x} \sum_{y} x y P(x, y) - \mu_x \mu_y$$

Correlation

The correlation between X and Y is:

$$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

-1 ≤ ρ ≤ 1

ρ = 0: no linear relationship between X and Y

ρ > 0: positive linear relationship between X and Y; when X is high (low) then Y is likely to be high (low)

ρ = +1: perfect positive linear dependency

ρ < 0: negative linear relationship between X and Y; when X is high (low) then Y is likely to be low (high)

ρ = -1: perfect negative linear dependency

Covariance and Independence

The covariance measures the strength of the linear relationship between two variables

If two random variables are statistically independent, the covariance between them is 0

The converse is not necessarily true

Example: Investment Returns

Return per $1,000 for two types of investments

                                      Investment
P(x_i, y_i)   Economic condition      Passive Fund X   Aggressive Fund Y
.2            Recession               - $25            - $200
.5            Stable Economy          + 50             + 60
.3            Expanding Economy       + 100            + 350

E(x) = μx = (-25)(.2) + (50)(.5) + (100)(.3) = 50

E(y) = μy = (-200)(.2) + (60)(.5) + (350)(.3) = 95

Computing the Standard Deviation for Investment Returns


$$\sigma_X = \sqrt{(-25 - 50)^2 (0.2) + (50 - 50)^2 (0.5) + (100 - 50)^2 (0.3)} = 43.30$$

$$\sigma_y = \sqrt{(-200 - 95)^2 (0.2) + (60 - 95)^2 (0.5) + (350 - 95)^2 (0.3)} = 193.71$$

Covariance for Investment Returns


$$Cov(X, Y) = (-25 - 50)(-200 - 95)(.2) + (50 - 50)(60 - 95)(.5) + (100 - 50)(350 - 95)(.3) = 8250$$
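The expected values, standard deviations, and covariance for the two funds can all be recomputed from the probability table. The following Python sketch does so; the variable names and the tuple layout are illustrative choices, while the probabilities and returns are taken from the table.

```python
import math

# (probability, return on Fund X, return on Fund Y) for each economic condition
scenarios = [
    (0.2, -25, -200),   # recession
    (0.5, 50, 60),      # stable economy
    (0.3, 100, 350),    # expanding economy
]

# Expected values: probability-weighted sums of the returns
mu_x = sum(p * x for p, x, _ in scenarios)
mu_y = sum(p * y for p, _, y in scenarios)

# Standard deviations: roots of the probability-weighted squared deviations
sd_x = math.sqrt(sum(p * (x - mu_x) ** 2 for p, x, _ in scenarios))
sd_y = math.sqrt(sum(p * (y - mu_y) ** 2 for p, _, y in scenarios))

# Covariance: probability-weighted products of the deviations
cov_xy = sum(p * (x - mu_x) * (y - mu_y) for p, x, y in scenarios)

print(round(mu_x, 2), round(mu_y, 2))   # 50.0 95.0
print(round(sd_x, 2), round(sd_y, 2))   # 43.3 193.71
print(round(cov_xy, 2))                 # 8250.0
```

Dividing `cov_xy` by `sd_x * sd_y` would give the correlation between the two funds.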

Portfolio Example

Investment X: μx = 50 σx = 43.30

Investment Y: μy = 95 σy = 193.71

σxy = 8250

Suppose 40% of the portfolio (P) is in Investment X and 60% is in Investment Y:

$$E(P) = .4(50) + (.6)(95) = 77$$

$$\sigma_P = \sqrt{(.4)^2 (43.30)^2 + (.6)^2 (193.71)^2 + 2(.4)(.6)(8250)} = 133.30$$

The portfolio return and portfolio variability are between the values for investments X and Y considered individually
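The portfolio figures can be checked with a short calculation. This Python sketch is illustrative only; it uses the unrounded variances implied by the probability table (1875 for X and 37525 for Y, so σy is about 193.71), which gives a portfolio standard deviation of about 133.30.

```python
import math

w_x, w_y = 0.4, 0.6            # portfolio weights for X and Y
mu_x, mu_y = 50, 95            # expected returns
var_x, var_y = 1875, 37525     # variances: sum of P * (return - mean)^2 from the table
cov_xy = 8250                  # covariance between X and Y

# Expected portfolio return: weighted sum of the expected returns
e_p = w_x * mu_x + w_y * mu_y

# Portfolio variance: weighted variances plus twice the weighted covariance
var_p = w_x**2 * var_x + w_y**2 * var_y + 2 * w_x * w_y * cov_xy
sd_p = math.sqrt(var_p)

print(round(e_p, 2), round(sd_p, 2))  # 77.0 133.3
```

Both results lie between the corresponding values for X and Y held individually.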

Interpreting the Results for Investment Returns

The aggressive fund has a higher expected return, but much more risk

μy = 95 > μx = 50

but

σy = 193.71 > σx = 43.30

The covariance of 8250 indicates that the two investments are positively related and will vary in the same direction
