Numerical Descriptive Measures · 2/13/2017 · Dr. Ahmed Jaradat, AGU


Numerical Descriptive Measures

Measures of Central Tendency (Location)

The Arithmetic Mean (Average)

The Median

The Mode

Measures of Dispersion or Variability

The Range

The Variance

The Standard Deviation

The Coefficient of Variation

Measures of Relative Standing

Quartiles and Percentiles

STA 231: Biostatistics

Dr. Ahmed Jaradat, AGU, 2/12/2017


2.4 Descriptive Statistics: Measures of Central Tendency

Definitions:

Population: the totality of all subjects that share certain common characteristics.

Parameter: a characteristic or numerical value obtained from the population.

Sample: a subset or part of the population.

Statistic: a characteristic or numerical value obtained from the sample.

Notation:

The variable of interest is denoted by a capital letter, say X. Specific values of the random variable are denoted by lowercase letters, x.

x1, x2, …, xN: a population of size N

x1, x2, …, xn: a sample of size n


Different sets of data can have the same distribution shape.


The Arithmetic Mean (Average)

This is the most popular and useful measure of central location.

$$\text{Mean} = \frac{\text{Sum of measurements}}{\text{Number of measurements}}$$

Population mean (population of size N):

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample mean (sample of size n):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 1: Sample mean

The following six measurements are random blood glucose levels (mmol/litre) from a group of first-year medical students: 4.7, 3.6, 3.8, 3.2, 4.7, 4.0. Their sample mean is

$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{4.7 + 3.6 + 3.8 + 3.2 + 4.7 + 4.0}{6} = \frac{24.0}{6} = 4.0 \text{ mmol/litre}$$
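As a quick check of Example 1, the sample-mean formula can be sketched in a few lines of Python. The helper name `mean` is our own choice for illustration, not something defined in the course material.

```python
# Sample mean of the blood glucose readings from Example 1 (mmol/litre).
glucose = [4.7, 3.6, 3.8, 3.2, 4.7, 4.0]


def mean(values):
    """Arithmetic mean: sum of the measurements divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


x_bar = mean(glucose)
print(f"sample mean = {x_bar:.2f} mmol/litre")
```

The same function gives the population mean when it is applied to all N values of a population; only the symbol (μ instead of x̄) changes.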


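Example 2: Population Mean

Consider the final grades of all 181 students in STA 231: Biostatistics (41, 41, 45, 45, 47, …, 100, 100). The population mean is

$$\mu = \frac{\sum_{i=1}^{181} x_i}{181} = \frac{x_1 + x_2 + \cdots + x_{181}}{181} = \frac{41 + 41 + 45 + 45 + 47 + \cdots + 100 + 100}{181} = 76.6$$

Example 3: Mean in repeated measurements

When many of the measurements have the same value, they can be summarized in a frequency table. Suppose the Biostatistics scores of a sample of 30 students in a class were recorded as follows:

Score                 55   66   70   83   88   90
Number of Students     4    8    7    4    5    2

Each score is multiplied by the number of students who obtained it, and the total is divided by the sample size n = 30:

$$\bar{x} = \frac{\sum_{i=1}^{30} x_i}{30} = \frac{x_1 + x_2 + \cdots + x_{30}}{30} = \frac{55(4) + 66(8) + \cdots + 90(2)}{30} = \frac{2190}{30} = 73$$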
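The frequency-table calculation in Example 3 can be sketched as below, assuming the scores and counts are stored as two parallel lists (a layout chosen here for illustration). The block also expands the table back into 30 individual scores to confirm that both routes give the same mean.

```python
# Mean of repeated measurements summarized in a frequency table (Example 3).
scores = [55, 66, 70, 83, 88, 90]
counts = [4, 8, 7, 4, 5, 2]  # number of students with each score


def frequency_mean(values, frequencies):
    """Mean = sum(value * frequency) / sum(frequency)."""
    n = sum(frequencies)
    total = sum(v * f for v, f in zip(values, frequencies))
    return total / n


n = sum(counts)
x_bar = frequency_mean(scores, counts)

# Cross-check: list every student's score individually and average those.
expanded = [s for s, c in zip(scores, counts) for _ in range(c)]
x_bar_expanded = sum(expanded) / len(expanded)

print(n, x_bar)  # 30 73.0
```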


Properties of the Mean

1. All values are used in its calculation.

2. Uniqueness. For a given data set there is one and only one arithmetic mean.

3. Simplicity. The arithmetic mean is easy to understand, compute and interpret.

4. The mean is sensitive to extreme values.

5. The sum of the deviations from the mean is 0, that is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
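Property 5 can be checked numerically on the Example 1 glucose readings. This sketch uses Python's `fractions.Fraction` so that the readings are exact decimals; with ordinary floats the sum would only be close to zero because of rounding.

```python
from fractions import Fraction

# Example 1 data (mmol/litre), stored as exact fractions so rounding cannot hide the result.
glucose = [Fraction(v) for v in ["4.7", "3.6", "3.8", "3.2", "4.7", "4.0"]]

x_bar = sum(glucose) / len(glucose)
deviations = [x - x_bar for x in glucose]

print(x_bar, sum(deviations))  # 4 0
```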


The Median

The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude:

◦ the [(n+1)/2]th ordered observation if n is odd;

◦ the average of the (n/2)th and [(n/2)+1]th ordered observations if n is even.

Example:

Seven patients' survival times were recorded (in months): 28, 60, 26, 32, 30, 26, 29. Find the median survival time.

First sort the times, then locate the value in the middle.

n = 7: odd number of observations, so the median is the [(7+1)/2] = 4th ordered observation.

26, 26, 28, 29, 30, 32, 60

Median = 29


Example (cont'd)

Suppose the survival time of one more patient, 31 months, is added to the group recorded before: 28, 60, 26, 32, 30, 26, 29, 31. Find the median survival time.

First sort the times, then locate the two values in the middle.

n = 8: even number of observations, so the median is the average of the 4th and 5th ordered observations.

26, 26, 28, 29, 30, 31, 32, 60

Median = (29 + 30)/2 = 29.5

Properties of the median

1. Uniqueness. For a given data set there is one and only one median.

2. Simplicity. The median is easy to compute.

3. It is not affected by extreme values.

4. It may not be an actual observation in the data set (in the example above, 29.5 is not one of the recorded times).

5. It can be used with ordinal data.
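The odd/even rule for the median translates directly into code. The sketch below is one straightforward implementation (the function name `median` is ours); note that the 1-based positions in the rule become 0-based list indices in Python.

```python
def median(values):
    """Median via the (n+1)/2 rule (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)  # step 1: arrange the measurements in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # The [(n+1)/2]th observation (1-based) sits at index n // 2 (0-based).
        return ordered[mid]
    # Average of the (n/2)th and [(n/2)+1]th observations: indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


survival_7 = [28, 60, 26, 32, 30, 26, 29]  # months
survival_8 = survival_7 + [31]             # one more patient added

print(median(survival_7))  # 29
print(median(survival_8))  # 29.5
```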


The Mode

The mode is the most frequently occurring value among all the measurements in a sample.

Example:

The manager of a men’s store records the waist sizes (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.

The mode of this data set is 34 in. (it occurs three times).

This information is more useful (for example, when designing a new display in the store) than “the median is 33.5 in.”

For grouped data shown as a histogram, the modal class is the class with the highest frequency. In the histogram of grades, the modal class is 74.5-79.5, and the mode is taken as its midpoint:

Mode = (74.5 + 79.5)/2 = 77


[Figure: histogram of grades; x-axis: Grade, y-axis: Number of Students]
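Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter` and returns every most-frequent value, since a data set can have more than one mode. The convention that "no value repeats" means "no mode", and the helper names `modes` and `class_midpoint`, are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        # Every value occurs exactly once: by the convention used here, there is no mode.
        return []
    return [v for v, c in counts.items() if c == top]


def class_midpoint(lower, upper):
    """Mode of grouped data, taken as the midpoint of the modal class."""
    return (lower + upper) / 2


waist = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]  # inches
print(modes(waist))                # [34]
print(class_midpoint(74.5, 79.5))  # 77.0
```

Returning a list rather than a single value keeps the bimodal and multimodal cases visible instead of silently picking one mode.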


A set of data may have one mode (or modal class), or two or more modes.


[Figure: three histograms of Number of Students versus Grade, labelled Unimodal, Multimodal and Bimodal]


Properties of the Mode:

1. It is very quick and easy to determine.

2. For ungrouped data, it is an actual value of the data.

3. Sometimes it is not unique.

4. It can be used when the data are qualitative as well as quantitative.

5. It is not affected by extreme scores.

6. It can change dramatically from sample to sample.

7. It is best suited to nominal data.

8. It may not exist (for example, when every value occurs only once).


Relationship Between Mean, Median and Mode

[Figure: three distribution curves labelled Mean = Median = Mode, Mean < Median < Mode, and Mean > Median > Mode]

If a distribution is symmetric (and unimodal), the mean, median and mode coincide. If a distribution is not symmetric, that is, skewed to the left or to the right, the three measures differ: in a left-skewed distribution typically Mean < Median < Mode, and in a right-skewed distribution typically Mean > Median > Mode.
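The seven survival times from the median example happen to show the right-skew ordering: one long survival time (60 months) pulls the mean above the median, which is above the mode. This sketch uses Python's standard `statistics` module; it illustrates the typical pattern on one small sample and does not prove it holds for every skewed data set.

```python
import statistics

survival = [28, 60, 26, 32, 30, 26, 29]  # months, from the median example

mean = statistics.mean(survival)      # 231 / 7
median = statistics.median(survival)  # 4th ordered value
mode = statistics.mode(survival)      # 26 occurs twice

print(mean, median, mode)
print("Mean > Median > Mode:", mean > median > mode)
```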


Properties of Central Tendency Measures

◦ The mean is sensitive to the tails (extreme values); the median and the mode are not.

◦ The mode can be altered by small changes in the data; the median and the mean are much less affected.

◦ The mode and the median can be read from a chart.

◦ The three measures coincide in a symmetric unimodal distribution such as the Normal distribution.

Which measure to use?

◦ For skewed distributions, use the median.

◦ For statistical analysis or inference, use the mean.

Type of Variable                 Best Measure of Central Tendency
Nominal                          Mode
Ordinal                          Median
Interval/Ratio (not skewed)      Mean
Interval/Ratio (skewed)          Median


2.5 Descriptive Statistics: Measures of Dispersion

Four data sets, each with Mean = 50 but with different spread:

50, 50, 50, 50
45, 55, 55, 45
90, 90, 10, 10
40, 40, 60, 60

A measure of dispersion conveys information regarding the amount of variability present in a set of data.

Note:

1. If all the values are the same, there is no dispersion.

2. If the values differ from one another, there is dispersion.

3. If the values are close to each other, the amount of dispersion is small.

4. If the values are widely scattered, the dispersion is large.


Measures of dispersion:

Absolute:

Measures dispersion in the original unit of the data. Variability in two or more distributions can be compared with an absolute measure provided they are given in the same unit and have the same mean.

Relative:

A measure of dispersion that is free of the unit of measurement of the data. It is the ratio of a measure of absolute dispersion to the average from which the deviations are measured. It is called the coefficient of variation.

The Range: The range of a set of measurements is the difference between the largest and smallest measurements, $R = x_{\max} - x_{\min}$.

Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
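Applying the range to the four illustrative data sets from the start of Section 2.5 (read here as four groups of four values each) shows how one number separates data sets that share the same mean. The function name `value_range` is our own, chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range = largest measurement minus smallest measurement."""
    return max(values) - min(values)


# The four illustrative data sets; each has mean 50.
data_sets = [
    [50, 50, 50, 50],
    [45, 55, 55, 45],
    [90, 90, 10, 10],
    [40, 40, 60, 60],
]

for values in data_sets:
    mean = sum(values) / len(values)
    print(values, "mean =", mean, "range =", value_range(values))
```

The range uses only the two end points, so it cannot tell, for example, whether the values between them are clustered or evenly spread.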


The Variance

This measure of dispersion reflects the values of all the measurements.

The variance of a population of N measurements x1, x2, …, xN having a mean μ is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The variance of a sample of n measurements x1, x2, …, xn having a mean $\bar{x}$ is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
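The two definitions differ only in the divisor (N for a population, n − 1 for a sample). The sketch below implements both directly from the formulas and, as an assumption of convenience, applies them to the Example 1 glucose sample; the sample version is cross-checked against `statistics.variance` from Python's standard library.

```python
import statistics


def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two measurements")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


glucose = [4.7, 3.6, 3.8, 3.2, 4.7, 4.0]  # Example 1 sample (mmol/litre)

s2 = sample_variance(glucose)
print(f"s^2 = {s2:.3f}")                                           # divisor n - 1 = 5
print(f"divisor N instead: {population_variance(glucose):.3f}")  # divisor 6

# The hand-written formula agrees with the standard library's sample variance.
assert abs(s2 - statistics.variance(glucose)) < 1e-9
```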


Example

Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16

Population A

Xi      Xi - 10
8       -2
9       -1
10       0
11       1
12       2
Sum 50   0

Population B

Xi      Xi - 10
4       -6
7       -3
10       0
13       3
16       6
Sum 50   0

The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Therefore, a measure of dispersion is needed in addition to the mean.

The sum of the deviations from the mean is zero in both cases, so it cannot measure dispersion. Instead, the sum of squared deviations is used in calculating the variance.


Let us calculate the variance of the two populations.

Population A

Xi      Xi - 10    (Xi - 10)²
8       -2          4
9       -1          1
10       0          0
11       1          1
12       2          4
Sum 50   0         10

Population B

Xi      Xi - 10    (Xi - 10)²
4       -6         36
7       -3          9
10       0          0
13       3          9
16       6         36
Sum 50   0         90

$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = \frac{10}{5} = 2$$

$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = \frac{90}{5} = 18$$
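The hand calculation for populations A and B can be reproduced with Python's `statistics` module: `fmean` gives μ, and `pvariance` divides by N, matching the population formula. The sum of squared deviations is computed explicitly so that each step of the tables is visible.

```python
import statistics

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

results = {}
for name, data in (("A", population_a), ("B", population_b)):
    mu = statistics.fmean(data)                        # population mean
    sum_sq_dev = sum((x - mu) ** 2 for x in data)      # sum of (Xi - mu)^2
    variance = statistics.pvariance(data)              # divisor N
    results[name] = (mu, sum_sq_dev, variance)
    print(name, "mean:", mu, "sum of squared deviations:", sum_sq_dev, "variance:", variance)
```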


The Standard Deviation

The standard deviation of a set of measurements is the square root of the variance of the measurements:

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$

The Coefficient of Variation

◦ The coefficient of variation of a set of measurements is the standard deviation divided by the mean value. This coefficient provides a proportionate measure of variation.

Population coefficient of variation: $\text{C.V.} = \frac{\sigma}{\mu}(100\%)$

Sample coefficient of variation: $\text{C.V.} = \frac{s}{\bar{x}}(100\%)$
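As an illustration (using the Example 1 glucose sample, which the slides do not use for this purpose), the sample standard deviation is the square root of the sample variance, and the coefficient of variation expresses it as a percentage of the mean. `statistics.stdev` computes the same sample standard deviation directly.

```python
import math
import statistics

glucose = [4.7, 3.6, 3.8, 3.2, 4.7, 4.0]  # Example 1 sample (mmol/litre)

x_bar = statistics.mean(glucose)
s = math.sqrt(statistics.variance(glucose))  # s = sqrt(s^2), divisor n - 1
cv = s / x_bar * 100                         # sample coefficient of variation, in percent

# statistics.stdev is the sample standard deviation computed directly.
assert math.isclose(s, statistics.stdev(glucose))

print(f"mean = {x_bar:.2f}, s = {s:.3f} mmol/litre, C.V. = {cv:.1f}%")
```

Because s and x̄ share the same unit, the C.V. is unit-free, which is what makes it usable across data sets measured in different units.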


Interpreting Standard Deviation

The standard deviation can be used to

◦ compare the variability of several distributions measured in the same unit;

◦ make a statement about the general shape of a distribution.

The coefficient of variation can be used to compare variation between data sets, including sets measured in different units or with very different means.


Example 2.5.3 Page 46

Suppose two samples of human males yield the following data:

                      Sample 1      Sample 2
Age                   25 years      11 years
Mean weight           145 pounds    80 pounds
Standard deviation    10 pounds     10 pounds

C.V. (25 years) = (10/145) × 100% = 6.9%

C.V. (11 years) = (10/80) × 100% = 12.5%

Although both samples have the same standard deviation (10 pounds), the variation relative to the mean is greater in the 11-year sample than in the 25-year sample.
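The two coefficients of variation in Example 2.5.3 can be reproduced with a one-line helper (the name `coefficient_of_variation` is ours). The ratio is only meaningful when the mean is positive, as it is for body weights.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) * 100%."""
    if mean <= 0:
        raise ValueError("the C.V. is only meaningful for a positive mean")
    return sd / mean * 100


cv_25 = coefficient_of_variation(10, 145)  # 25-year-old sample (pounds)
cv_11 = coefficient_of_variation(10, 80)   # 11-year-old sample (pounds)

print(f"25 years: {cv_25:.1f}%  11 years: {cv_11:.1f}%")  # 25 years: 6.9%  11 years: 12.5%
```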
