1 Central tendency and dispersion of a distribution

Upload: clifton-lester

Post on 26-Dec-2015


Page 1

Central tendency and dispersion of a distribution

Page 2

Review Topics
• Measures of Central Tendency: Mean, Median, Mode
• Quartile
• Measures of Variation: The Range, Variance and Standard Deviation, Coefficient of Variation
• Shape: Symmetric, Skewed

Page 3

Important Summary Measures

One-sample summary measures:
• Central Tendency: Mean, Median, Mode
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Page 4

Measures of Central Tendency

Central Tendency: Mean, Median, Mode

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Data: You can access practice sample data on HMO premiums here.

Page 5

Measures of Central Location (Tendency)

Usually, we focus our attention on two aspects of measures of central location:
– Measure of the central data point (the average).
– Measure of dispersion of the data about the average.

With one data point, clearly the central location is at the point itself.

With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

If the third data point appears exactly in the middle of the current range, the central location should not change (because it is currently residing in the middle).

But if the third data point appears on the left-hand side of the midrange, it should "pull" the central location to the left.
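The argument above can be checked numerically. The following is a minimal Python sketch that uses the arithmetic mean as the measure of central location; the data points (2, 6, 4 and 1) are hypothetical values chosen only for illustration.

```python
from statistics import mean

# Hypothetical data points, chosen only to illustrate the argument.
one_point = [2.0]
two_points = [2.0, 6.0]
third_in_middle = two_points + [4.0]   # third point exactly at the midrange
third_on_left = two_points + [1.0]     # third point left of the midrange

center_one = mean(one_point)            # the point itself
center_two = mean(two_points)           # halfway between the two points
center_middle = mean(third_in_middle)   # unchanged by the middle point
center_left = mean(third_on_left)       # pulled to the left

print(center_one, center_two, center_middle, center_left)
```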

Page 6

Arithmetic mean

– This is the most popular and useful measure of central location.

Mean = (Sum of the measurements) / (Number of measurements)

Sample mean (n is the sample size):

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population mean (N is the population size):

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Page 7

• Example 4.1

The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by

$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$$

• Example 4.2

Suppose the telephone bills of Example 2.1 represent the population of measurements. The population mean is

$$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \dots + x_{200}}{200} = \frac{42.19 + 15.30 + \dots + 53.21}{200} = 43.59$$
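Example 4.1 can be reproduced in a few lines. This is a minimal Python sketch; the variable names are illustrative, and the standard-library `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

sample = [7, 3, 9, -2, 4, 6]  # the six measurements of Example 4.1

# Sample mean: sum of the measurements divided by their number.
x_bar = sum(sample) / len(sample)

# The standard library gives the same result.
assert x_bar == mean(sample)
print(x_bar)  # 4.5
```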

Page 8

The median

– The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.

Example 4.4

Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.

Odd number of observations: First, sort the salaries. Then, locate the value in the middle.
26, 26, 28, 29, 30, 32, 60
The median is 29.

Suppose one employee's salary of $31,000 was added to the group recorded before. Find the median salary.

Even number of observations: First, sort the salaries. Then, locate the values in the middle. There are two middle values!
26, 26, 28, 29, 30, 31, 32, 60
The median is the average of the two middle values, (29 + 30)/2 = 29.5.
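The sort-then-locate procedure can be sketched in Python as follows. The helper name `median_by_sorting` is an assumption made for this illustration; `statistics.median` from the standard library serves as a cross-check.

```python
from statistics import median

salaries = [28, 60, 26, 32, 30, 26, 29]   # Example 4.4, in $1000s

def median_by_sorting(values):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median_by_sorting(salaries)            # 7 observations
even_median = median_by_sorting(salaries + [31])    # 8 observations

# Cross-check against the standard library.
assert odd_median == median(salaries)
assert even_median == median(salaries + [31])

print(odd_median, even_median)  # 29 29.5
```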

Page 9

The mode

– The mode of a set of measurements is the value that occurs most frequently.

– A set of data may have one mode (or modal class), or two or more modes.

The modal class: For large data sets, the modal class is much more relevant than a single-value mode.
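A short Python sketch of finding one mode or several. It assumes Python 3.8 or later for `statistics.multimode`; the `two_peaks` data set is hypothetical.

```python
from statistics import multimode

salaries = [28, 60, 26, 32, 30, 26, 29]   # Example 4.4 salaries, in $1000s
two_peaks = [1, 2, 2, 3, 4, 4, 5]         # hypothetical data with two modes

# multimode returns every value that attains the highest frequency.
salary_modes = multimode(salaries)
bimodal_modes = multimode(two_peaks)

print(salary_modes, bimodal_modes)  # [26] [2, 4]
```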

Page 10

• Example 4.6
A professor of statistics wants to report the results of a midterm exam, taken by 100 students. The data appear in file XM04-06. Find the mean, median, and mode, and describe the information they provide.

Excel Results

Marks
Mean                 73.98
Standard Error       2.1502163
Median               81
Mode                 84
Standard Deviation   21.502163
Sample Variance      462.34303
Kurtosis             0.3936606
Skewness             -1.073098
Range                89
Minimum              11
Maximum              100
Sum                  7398
Count                100

The mean provides information about the overall performance level of the class. The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%. The mode must be used when data is qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then, the mode becomes a logical measure to compute.

Page 11

Relationship among Mean, Median, and Mode

If a distribution is symmetrical, the mean, median and mode coincide.

If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.

A positively skewed distribution ("skewed to the right"): typically Mode < Median < Mean.

Page 12

A negatively skewed distribution ("skewed to the left"): typically Mean < Median < Mode.
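The effect of skewness on the three measures can be seen in the salaries of Example 4.4, where one large value stretches the right tail. A minimal Python sketch (assuming Python 3.8 or later, where `statistics.mode` returns the first mode encountered):

```python
from statistics import mean, median, mode

salaries = [28, 60, 26, 32, 30, 26, 29]  # Example 4.4, in $1000s

m_mode = mode(salaries)      # most frequent value: 26
m_median = median(salaries)  # middle value: 29
m_mean = mean(salaries)      # 231 / 7 = 33

# The single large salary (60) pulls the mean above the median and the mode,
# the pattern expected for data skewed to the right.
print(m_mode, m_median, m_mean)
```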

Page 13

Measures of Variation

Variation:
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation:

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
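Because the coefficient of variation divides the standard deviation by the mean, it does not depend on the unit of measurement. A minimal Python sketch, with a hypothetical helper `coefficient_of_variation` and hypothetical height data:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements of the same quantity in two units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# Both values agree: changing the unit rescales S and the mean alike.
print(round(cv_cm, 2), round(cv_m, 2))
```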

Page 14

Measures of variability (Looking beyond the average)

Measures of central location fail to tell the whole story about the distribution.

A question of interest still remains unanswered:

How typical is the average value of all the measurements in the data set?

or

How spread out are the measurements about the average value?

Page 15

Observe two hypothetical data sets:

Low variability data set: The average value provides a good representation of the values in the data set.

High variability data set: The same average value does not provide as good a representation of the values in the data set as before.

Page 16

The range

– The range of a set of measurements is the difference between the largest and smallest measurements.

– Its major advantage is the ease with which it can be computed.

– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.

[Figure: the range, spanning the smallest measurement to the largest measurement.]

But how do all the measurements spread out? The range cannot assist in answering this question.

Page 17

The variance

– This measure of dispersion reflects the values of all the measurements.

– The variance of a population of N measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

– The variance of a sample of n measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Page 18

Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16

The mean of both populations is 10, but measurements in B are much more dispersed than those in A.

Thus, a measure of dispersion is needed that agrees with this observation.

Let us start by calculating the sum of deviations.

A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0

B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0

The sum of deviations is zero in both cases; therefore, another measure is needed.

Page 19

The sum of squared deviations is used in calculating the variance. See the example next.

Page 20

Let us calculate the variance of the two populations:

$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$

$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$

Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead?

After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!
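One way to approach the question: the sum of squared deviations also grows with the number of measurements, while the average does not. The Python sketch below reproduces the two variances and then lists every value of Population A twice; the doubled population and the helper `sum_squared_deviations` are hypothetical, introduced only for this illustration.

```python
from statistics import pvariance

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

def sum_squared_deviations(values):
    """Sum of squared deviations about the mean of the values."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

var_a = pvariance(population_a)   # 10 / 5 = 2
var_b = pvariance(population_b)   # 90 / 5 = 18

# Hypothetical larger population: every value of A listed twice.
# The spread is unchanged, yet the sum of squared deviations doubles.
doubled_a = population_a * 2
ss_a = sum_squared_deviations(population_a)
ss_doubled = sum_squared_deviations(doubled_a)
var_doubled = pvariance(doubled_a)

print(var_a, var_b, ss_a, ss_doubled, var_doubled)
```

Dividing by N keeps the measure comparable across data sets of different sizes.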

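Page 21

– Example 4.8
Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7

– Solution

$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$$

A shortcut formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

$$s^2 = \frac{1}{5}\left[\left(3.4^2 + 2.5^2 + \dots + 3.7^2\right) - \frac{(17.7)^2}{6}\right] = 1.075 \text{ (years)}^2$$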
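The definition and the shortcut formula give the same sample variance, which can be verified on the data of Example 4.8. A minimal Python sketch; variable names are illustrative, and `statistics.variance` is used only as a cross-check.

```python
from statistics import variance

sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]  # Example 4.8, in years
n = len(sample)
x_bar = sum(sample) / n

# Definition: sum of squared deviations divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Shortcut: uses only the sum of the values and the sum of their squares.
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# Cross-check against the standard library (allowing for rounding error).
assert abs(s2_definition - variance(sample)) < 1e-9

print(round(x_bar, 2), round(s2_definition, 3), round(s2_shortcut, 3))
```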

Page 22

Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

For the sample: use n - 1 in the denominator.

Data $X_i$: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

Page 23

Interpreting Standard Deviation

The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a distribution.

The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
– $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
– $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
– $(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements
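The empirical rule can be checked on simulated data. The sketch below is an illustration under stated assumptions: the sample is 10,000 hypothetical draws from a normal distribution with mean 50 and standard deviation 10, and the seed is fixed only to make the run repeatable.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical mound-shaped sample: 10,000 draws from a normal distribution.
sample = [random.gauss(50, 10) for _ in range(10_000)]

x_bar = mean(sample)
s = stdev(sample)

def share_within(k):
    """Fraction of the sample inside (x_bar - k*s, x_bar + k*s)."""
    inside = sum(1 for x in sample if x_bar - k * s < x < x_bar + k * s)
    return inside / len(sample)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} s of the mean: {share:.1%}")
```

The printed shares should land close to 68%, 95% and nearly 100%, varying slightly with the random draw.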

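Page 24

Comparing Standard Deviations

Data $X_i$: 10 12 14 15 17 18 18 24

N = 8, Mean = 16

Sample:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

Population:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$$

The value of the standard deviation is larger when the data are treated as a sample.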
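The comparison can be reproduced with the Python standard library, where `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by N. A minimal sketch on the same eight data values:

```python
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

s = stdev(data)        # sample standard deviation: divides by n - 1
sigma = pstdev(data)   # population standard deviation: divides by N

# Dividing the same sum of squared deviations by the smaller n - 1
# always gives the larger value.
print(round(s, 4), round(sigma, 4))
```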

Page 25

Comparing Standard Deviations

[Figure: three data sets plotted on a common axis running from 11 to 21.]

Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Page 26

Measures of Association

Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.

– Covariance: is there any pattern to the way two variables move together?
– Correlation coefficient: how strong is the linear relationship between two variables?

Page 27

The covariance

$$\text{Population covariance} = COV(X, Y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size.

$$\text{Sample covariance} = cov(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

$\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y). n is the sample size.
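A minimal Python sketch of both covariance formulas. The function name `covariance`, its `population` flag and the paired data are assumptions made for this illustration, not part of the original slides.

```python
def covariance(xs, ys, population=False):
    """Covariance of paired data: divide by N for a population, n - 1 for a sample."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n if population else n - 1)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [5, 4, 5, 4, 2]    # tends to fall as x rises

cov_up = covariance(x, y_up)                          # sample covariance, positive
cov_down = covariance(x, y_down)                      # sample covariance, negative
cov_up_pop = covariance(x, y_up, population=True)     # same data treated as a population

print(cov_up, cov_down, cov_up_pop)
```

The sign of the result shows whether the two variables tend to move together or in opposite directions.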

Page 28

The coefficient of correlation

– This coefficient answers the question: How strong is the association between X and Y?

$$\text{Population coefficient of correlation: } \rho = \frac{COV(X, Y)}{\sigma_x \sigma_y}$$

$$\text{Sample coefficient of correlation: } r = \frac{cov(X, Y)}{s_x s_y}$$

Page 29

ρ or r = +1 (COV(X,Y) > 0): Strong positive linear relationship

ρ or r = 0 (COV(X,Y) = 0): No linear relationship

ρ or r = -1 (COV(X,Y) < 0): Strong negative linear relationship
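The three cases can be illustrated with the sample coefficient of correlation. This is a minimal Python sketch; the function name `correlation` and the three data sets are hypothetical, constructed so that r comes out at +1, -1 and 0 (up to floating-point rounding).

```python
from math import sqrt

def correlation(xs, ys):
    """Sample coefficient of correlation r = cov(X, Y) / (s_x * s_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical data sets illustrating the three cases.
x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [3, 5, 7, 9, 11])    # y = 2x + 1, perfect positive line
r_negative = correlation(x, [10, 8, 6, 4, 2])    # y = 12 - 2x, perfect negative line
r_none = correlation(x, [2, 1, 3, 1, 2])         # no linear trend

print(round(r_positive, 6), round(r_negative, 6), round(r_none, 6))
```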