description of measurement data

Description of measurement data. Prof. Yi-xiong Lei (1021305)


Post on 19-Jan-2016


Page 1: Description of measurement data

Description of measurement data

Prof. Yi-xiong Lei (1021305)

Page 2: Description of measurement data

1. Distribution of frequency

A frequency table or graph is the common way to summarize data and describe its frequency distribution. The types of distribution:

Normal distribution

Skewed distribution

Positively skewed distribution

Negatively skewed distribution

Page 3: Description of measurement data

If we are faced with a large amount of data, we want to describe its most important features concisely. Usually we describe the data from two aspects.

Measures of Location

Central tendency (central position)

Measures of Spread

Tendency of dispersion (variation)

Page 4: Description of measurement data

* Normal distribution

Table 2-1. Frequency Distribution of Red Blood Cells ($\times 10^{12}$/L) among 130 Normal Male Adults in Some District

RBCs         Mark               Frequency
3.70~        ||                 2
3.90~        ||||               4
4.10~        正 ||||            9
4.30~        正正正 |           16
4.50~        正正正正 ||        22
4.70~        正正正正正         25
4.90~        正正正正 |         21
5.10~        正正正 ||          17
5.30~        正 ||||            9
5.50~        ||||               4
5.70~5.90    |                  1
Total                           130

Page 5: Description of measurement data

* Positively skewed distribution

Table 2-2. Frequency Distribution of Hair Hg Value (μg/g) among 238 Normal Adults

Value of hair Hg   Frequency   Accumulative frequency   AF (%)
(1)                (2)         (3)                      (4) = (3)/238
0.3~               20          20                       8.4
0.7~               66          86                       36.1
1.1~               60          146                      61.3
1.5~               48          194                      81.5
1.9~               18          212                      89.1
2.3~               16          228                      95.8
2.7~               6           234                      98.3
3.1~               1           235                      98.7
3.5~               0           235                      98.7
3.9~               3           238                      100.0
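As a minimal sketch of how columns (3) and (4) of Table 2-2 are derived from column (2), the accumulative frequency is a running total and the percentage divides it by the total count. The variable names are illustrative only.

```python
from itertools import accumulate

# Lower group limits and frequencies copied from Table 2-2.
lower_limits = [0.3, 0.7, 1.1, 1.5, 1.9, 2.3, 2.7, 3.1, 3.5, 3.9]
freqs = [20, 66, 60, 48, 18, 16, 6, 1, 0, 3]

n = sum(freqs)                        # total number of adults
cum_freqs = list(accumulate(freqs))   # column (3): running total of column (2)
cum_pcts = [round(100 * c / n, 1) for c in cum_freqs]  # column (4) = (3)/n

for low, f, c, p in zip(lower_limits, freqs, cum_freqs, cum_pcts):
    print(f"{low:.1f}~  {f:3d}  {c:3d}  {p:5.1f}")
```

The long right tail (few observations above 2.7 μg/g) is what makes this distribution positively skewed.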

Page 6: Description of measurement data

* Negatively skewed distribution

Table 2-3. Frequency Distribution of Patients who die of Malignant Tumors in some year and district

Age (yr.)   No. of Death   Accumulative frequency   AF (%)
0~          5              5                        0.42
10~         12             17                       1.41
20~         15             32                       2.66
30~         76             108                      8.98
40~         189            297                      24.69
50~         234            531                      44.14
60~         386            917                      76.23
70~         286            1203                     100.00

Page 7: Description of measurement data

Figure 2-1. Frequency Distribution of Serum Cholesterol (mg/dl)

among 200 Normal Adults in Some District

Page 8: Description of measurement data

2. Average

What measure is used to describe the central tendency? It is the average, which includes the mean, the geometric mean and the median.

Mean, symbolized by $\mu$ (population) or $\bar{x}$ (sample)

Geometric mean, symbolized by G

Median, symbolized by M

Page 9: Description of measurement data

1). The mean is suitable for data that follow a normal distribution, or at least a symmetric distribution.

Formula (1):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

Formula (2):

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f x}{n}$$

Formula (1) is for original data (direct method).

Formula (2) is for a frequency table (weighted method).

Page 10: Description of measurement data

Table 2-4. Frequency Distribution of Red Blood Cells ($\times 10^{12}$/L) among 140 Normal Male Adults in Some District

RBCs         Middle value (X)   Frequency (f)   fX
3.80~        3.90               2               7.8
4.00~        4.10               6               24.6
4.20~        4.30               11              47.3
4.40~        4.50               25              112.5
4.60~        4.70               32              150.4
4.80~        4.90               27              132.3
5.00~        5.10               17              86.7
5.20~        5.30               13              68.9
5.40~        5.50               4               22.0
5.60~        5.70               2               11.4
5.80~6.00    5.90               1               5.9
Total        -                  140 (∑f)        669.8 (∑fX)

$$\bar{X} = \frac{\sum f X}{\sum f} = \frac{669.8}{140} = 4.78 \ (\times 10^{12}/\text{L})$$
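The weighted method can be sketched in a few lines, using the middle values and frequencies from Table 2-4. This is only an illustration of formula (2); the variable names are not from the slides.

```python
# Group middle values (X) and frequencies (f) copied from Table 2-4.
midpoints = [3.90, 4.10, 4.30, 4.50, 4.70, 4.90, 5.10, 5.30, 5.50, 5.70, 5.90]
freqs = [2, 6, 11, 25, 32, 27, 17, 13, 4, 2, 1]

n = sum(freqs)                                          # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of f*X
mean = sum_fx / n                                       # weighted mean

print(n, round(sum_fx, 1), round(mean, 2))
```

Each middle value stands in for every observation in its group, so the result approximates the mean of the original 140 measurements.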

Page 11: Description of measurement data

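2). The geometric mean is suitable for data with a positively skewed distribution or a log-normal distribution.

Formula (1):

$$G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

$$G = \lg^{-1}\left[\frac{\lg x_1 + \lg x_2 + \cdots + \lg x_n}{n}\right] = \lg^{-1}\left[\frac{\sum \lg x}{n}\right]$$

Formula (2):

$$G = \lg^{-1}\left[\frac{f_1 \lg x_1 + f_2 \lg x_2 + \cdots + f_k \lg x_k}{\sum f}\right] = \lg^{-1}\left[\frac{\sum f \lg x}{\sum f}\right]$$

Formula (1) is for original data (direct method).

Formula (2) is for a frequency table (weighted method).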

Page 12: Description of measurement data

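There are 5 serum antibody samples with concentrations of 1:10, 1:100, 1:1000, 1:10000 and 1:100000. What is the average concentration?

Wrong (arithmetic mean of the reciprocals):

$$\bar{X} = \frac{\sum x}{n} = \frac{10 + 100 + \cdots + 100000}{5} = 22222$$

Right (geometric mean):

$$G = \lg^{-1}\left[\frac{\lg x_1 + \lg x_2 + \cdots + \lg x_n}{n}\right] = \lg^{-1}\left[\frac{\sum \lg x}{n}\right] = \lg^{-1}\left[\frac{1 + 2 + 3 + 4 + 5}{5}\right] = \lg^{-1} 3 = 1000$$

so the average concentration is 1:1000.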
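The contrast between the two averages can be reproduced with a short sketch for the five titers listed (1:10 through 1:100000). It works on the reciprocals, as the slide does; the variable names are illustrative.

```python
import math

# Reciprocals of the five antibody titers 1:10 ... 1:100000.
reciprocals = [10, 100, 1000, 10000, 100000]
n = len(reciprocals)

arithmetic_mean = sum(reciprocals) / n                   # dominated by the largest value
mean_log = sum(math.log10(x) for x in reciprocals) / n   # mean of lg x
geometric_mean = 10 ** mean_log                          # lg^-1 of the mean log

print(arithmetic_mean, round(geometric_mean))
```

Because the titers grow by a constant factor rather than a constant difference, averaging on the log scale gives the central value 1:1000, while the arithmetic mean is pulled toward the single largest titer.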

Page 13: Description of measurement data

Table 2-5. The specific serum antibody concentrations of 52 susceptible children one month after immunization with measles vaccine

Ab concent.   Children (f)   Reciprocal (x)   lg x    f lg x
1:40          3              40               1.602   4.81
1:80          22             80               1.903   41.87
1:160         17             160              2.204   37.47
1:320         9              320              2.505   22.55
1:640         0              640              2.806   0.00
1:1280        1              1280             3.107   3.11
Total         52 (∑f)        -                -       109.79

$$G = \lg^{-1}\left[\frac{\sum f \lg x}{\sum f}\right] = \lg^{-1}\left[\frac{109.79}{52}\right] = 129.2$$

The average antibody concentration is 1:129.
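A minimal sketch of the weighted geometric mean for Table 2-5 follows. It uses unrounded logarithms, so the result can differ in the first decimal from the slide's 129.2, which is based on logs rounded to three places; it still rounds to 1:129.

```python
import math

# Reciprocal titers (x) and numbers of children (f) copied from Table 2-5.
reciprocals = [40, 80, 160, 320, 640, 1280]
freqs = [3, 22, 17, 9, 0, 1]

n = sum(freqs)                                                        # sum of f
sum_f_lgx = sum(f * math.log10(x) for f, x in zip(freqs, reciprocals))  # sum of f*lg x
g = 10 ** (sum_f_lgx / n)                                             # lg^-1 of the weighted mean log

print(n, round(sum_f_lgx, 2), round(g))
```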

Page 14: Description of measurement data

3). The median is suitable for all kinds of data, but it is less useful than the mean for further analysis.

The following formula is for original data (direct method), with the observations sorted in ascending order:

$$M = X_{\frac{n+1}{2}} \quad (n \text{ is odd})$$

$$M = \frac{1}{2}\left(X_{\frac{n}{2}} + X_{\frac{n}{2}+1}\right) \quad (n \text{ is even})$$

Page 15: Description of measurement data

For example :

There are 9 cases whose latent periods are 2, 3, 3, 3, 4, 5, 6, 9 and 16 days. Calculate their average latent period.

$$M = X_{(n+1)/2} = X_{(9+1)/2} = X_5 = 4 \text{ (days)}$$
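The direct method for the median can be sketched as a small function covering both the odd and the even case. The function name is illustrative; Python's standard `statistics.median` gives the same result.

```python
def median(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # X_{(n+1)/2} in 1-based indexing
    return (xs[mid - 1] + xs[mid]) / 2    # (X_{n/2} + X_{n/2+1}) / 2

latent_days = [2, 3, 3, 3, 4, 5, 6, 9, 16]
print(median(latent_days))
```

The extreme value of 16 days does not affect the result, which is why the median suits skewed data such as latent periods.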

Page 16: Description of measurement data

4). Median and percentile for data from a frequency table

With a frequency table we do not know the exact value of the median, so we use the following formula for the median or any percentile (frequency table method, or percentile method):

$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$$

Page 17: Description of measurement data

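$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$$

x : the percentile;

L : the lower limit of the group in which the percentile is located;

i : the width of the group interval;

$f_x$ : the frequency in that group;

n : the total number of cases;

$\sum f_L$ : the accumulative frequency of the groups below L.

If x = 50, then $P_{50} = M$, and the formula becomes:

$$M = L + \frac{i}{f}\left(\frac{n}{2} - \sum f_L\right)$$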

Page 18: Description of measurement data

Table 2-6. The calculations of median and percentile of latent period of food poisoning among 164 cases

Latent period (hours)   Cases (f)   Accumulative frequency (Σf)   Accumulative frequency (%)
0~                      25          25                            15.2
12~                     58          83                            50.6
24~                     40          123                           75.0
36~                     23          146                           89.0
48~                     12          158                           96.3
60~                     5           163                           99.4
72~84                   1           164                           100.0

Page 19: Description of measurement data

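Median calculation: in Table 2-6, the accumulative frequency of 50% falls within the group "12~", so L = 12, i = 12, f = 58, $\sum f_L$ = 25, n = 164.

$$M = L + \frac{i}{f}\left(\frac{n}{2} - \sum f_L\right) = 12 + \frac{12}{58}\left(\frac{164}{2} - 25\right) = 23.8 \text{ (hrs)}$$

Percentile ($P_x$) calculation: for $P_{95}$, x = 95 and the accumulative frequency of 95% falls within the group "48~", so L = 48, i = 12, $f_x$ = 12, $\sum f_L$ = 146, n = 164.

$$P_{95} = 48 + \frac{12}{12}\left(164 \times 95\% - 146\right) = 57.8 \text{ (hrs)}$$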
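The percentile method can be sketched as one function that locates the group containing the target rank and interpolates within it. The function name and signature are illustrative, and the sketch assumes groups of equal width.

```python
def grouped_percentile(lower_limits, freqs, width, x):
    """P_x = L + (i / f_x) * (n * x% - sum_f_L), for equal-width groups."""
    n = sum(freqs)
    target = n * x / 100          # rank of the requested percentile
    cum_below = 0                 # accumulative frequency below the current group
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cum_below + f >= target:
            return low + width / f * (target - cum_below)
        cum_below += f
    raise ValueError("x must be between 0 and 100")

# Latent period of food poisoning (hours), copied from Table 2-6.
lower_limits = [0, 12, 24, 36, 48, 60, 72]
freqs = [25, 58, 40, 23, 12, 5, 1]

median_hours = grouped_percentile(lower_limits, freqs, 12, 50)
p95_hours = grouped_percentile(lower_limits, freqs, 12, 95)
print(round(median_hours, 1), round(p95_hours, 1))
```

The same call with x = 25 and x = 75 gives the quartiles of the table.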

Page 20: Description of measurement data

Measures of Spread

Tendency of dispersion (variation)


Page 21: Description of measurement data

3. Measures of Spread

Several features describe the distribution of data. Two common features we might be interested in are:

> What is the typical (average) value of a variable (what is its location)?

> How much variability is there in the data (how much does it spread out)?

Page 22: Description of measurement data

The common measures of variation are the following:

Range, symbolized by R

Interval of quartile, symbolized by Q

Variance, symbolized by $\sigma^2$, $S^2$

Standard deviation, symbolized by $\sigma$, S

Coefficient of variation, symbolized by CV

Page 23: Description of measurement data

1) The range is suitable for all kinds of data, but it is a poor measure of variability because it is based on only the two extreme observations.

$$R = X_{max} - X_{min}$$

Page 24: Description of measurement data

2) The interval of quartile (Q) is the extent of variation from the 25th percentile ($P_{25}$) to the 75th percentile ($P_{75}$).

The quartile interval is suitable for all kinds of data, especially data with a skewed distribution, where it works better than the range. Use the following formula to calculate the quartiles:

$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - \sum f_L\right)$$

For the example above (Table 2-6):

$$Q = Q_U - Q_L = P_{75} - P_{25} = 36.0 - 15.3 = 20.7 \text{ (hrs)}$$

Page 25: Description of measurement data

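3) Variance ($\sigma^2$, $S^2$) and standard deviation (SD or S)

They are the most important measures of variability and are suitable for normally distributed data.

For a population:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

For a sample:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}, \qquad S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$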

Page 26: Description of measurement data

The formulas for the standard deviation

Direct method:

$$S = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$$

Weighted method:

$$S = \sqrt{\frac{\sum f X^2 - (\sum f X)^2 / n}{n - 1}}$$

Page 27: Description of measurement data

For example, the diastolic blood pressures of 5 persons are: 162, 145, 178, 142, 186 (mmHg)

$\sum X = 813$

$\sum X^2 = 133713$

$$S = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}} = \sqrt{\frac{133713 - (813)^2 / 5}{5 - 1}} = 19.49 \text{ mmHg}$$
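The direct method can be checked with a short sketch. For the five listed values the sum of squares works out to 133713, which is the figure consistent with the result of 19.49 mmHg. The cross-check uses `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics

pressures = [162, 145, 178, 142, 186]   # mmHg, as listed in the example
n = len(pressures)

sum_x = sum(pressures)                   # sum of X
sum_x2 = sum(x * x for x in pressures)   # sum of X squared
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))   # direct-method formula

print(sum_x, sum_x2, round(s, 2))
print(round(statistics.stdev(pressures), 2))   # same definition, library version
```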

Page 28: Description of measurement data

Table 2-7. Frequency Distribution of Red Blood Cells ($\times 10^{12}$/L) among 140 Normal Male Adults in Some District

RBCs         Middle value (X)   Frequency (f)   fX      fX²
3.80~        3.90               2               7.8     30.42
4.00~        4.10               6               24.6    100.86
4.20~        4.30               11              47.3    203.39
4.40~        4.50               25              112.5   506.25
4.60~        4.70               32              150.4   706.88
4.80~        4.90               27              132.3   648.27
5.00~        5.10               17              86.7    442.17
5.20~        5.30               13              68.9    365.17
5.40~        5.50               4               22.0    121.00
5.60~        5.70               2               11.4    64.98
5.80~6.00    5.90               1               5.9     34.81
Total (∑)    -                  140             669.8   3224.20

$$S = \sqrt{\frac{\sum f X^2 - (\sum f X)^2 / n}{n - 1}} = \sqrt{\frac{3224.20 - (669.8)^2 / 140}{140 - 1}} = 0.38$$
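The weighted method for the standard deviation can be sketched from the middle values and frequencies of Table 2-7. Note that the last group contributes 1 × 5.90² = 34.81 to the fX² column, which is what the total of 3224.20 requires. Variable names are illustrative.

```python
import math

# Group middle values (X) and frequencies (f) copied from Table 2-7.
midpoints = [3.90, 4.10, 4.30, 4.50, 4.70, 4.90, 5.10, 5.30, 5.50, 5.70, 5.90]
freqs = [2, 6, 11, 25, 32, 27, 17, 13, 4, 2, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f*X
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f*X^2
s = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))         # weighted-method formula

print(round(sum_fx, 1), round(sum_fx2, 2), round(s, 2))
```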

Page 29: Description of measurement data

4) The coefficient of variation (CV) is the standard deviation divided by the mean, which makes variability comparable between different data sets.

When comparing the variability of two or more groups whose units of measurement differ, or whose means differ markedly, calculate their CV:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Page 30: Description of measurement data

For example: the heights (cm) and weights (kg) of 110 healthy male students aged 20 were measured at random in a city in 2004. Compare the variability of the heights (cm) and the weights (kg).

For heights: $\bar{x}$ = 172.73 (cm), S = 4.09 (cm)

For weights: $\bar{x}$ = 55.04 (kg), S = 4.10 (kg)

For heights, CV = 4.09 / 172.73 × 100% = 2.37%

For weights, CV = 4.10 / 55.04 × 100% = 7.45%

This indicates that height is the more stable index: it varies less, relative to its mean, than weight.
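The comparison can be reproduced with a short sketch. The function name is illustrative; the inputs are the means and standard deviations given in the example.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return sd / mean * 100

cv_height = coefficient_of_variation(4.09, 172.73)   # heights in cm
cv_weight = coefficient_of_variation(4.10, 55.04)    # weights in kg

print(round(cv_height, 2), round(cv_weight, 2))
```

The two standard deviations are almost equal in number (4.09 vs 4.10) but carry different units, so only the unit-free CV supports a comparison.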

Page 31: Description of measurement data

5) Applications of standard deviation

(A) Showing the variability (spread) of observations;

(B) Describing the features of normal distribution of data when combined with mean;

(C) Estimating the medical reference range when combined with mean;

(D) Calculating the standard error of mean when combined with sample size (n).

Page 32: Description of measurement data
Page 33: Description of measurement data

Normal Distribution & Its Application


Page 34: Description of measurement data

4. Normal Distribution and Application

(1) What the normal distribution means

[Figure 2-1. Frequency distribution and its curve of heights among 120 healthy boys at the age of 12. x-axis: heights (cm), 125 to 161; y-axis: frequency (f); with the normal distribution curve overlaid.]

Page 35: Description of measurement data

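The normal distribution is defined by the density function f(X):

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}, \quad -\infty < X < +\infty$$

[Figure: curve of f(X) from $-\infty$ to $+\infty$, with the cumulative area F(X).]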

Page 36: Description of measurement data

(2) The attributes of the normal distribution

A. The curve is bell-shaped and symmetric.

B. The peak is located at the center (mean, median).

C. There are two parameters, $\mu$ and $\sigma$, and the distribution is denoted $N(\mu, \sigma^2)$.

D. There is a rule for estimating areas of the distribution; the total area under the curve is 1, or 100%.

Page 37: Description of measurement data

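(3) The area rule of the normal distribution

The area within $\mu \pm 1\sigma$ is 68.27%

The area within $\mu \pm 1.96\sigma$ is 95.00%

The area within $\mu \pm 2.58\sigma$ is 99.00%

If n > 100, $\mu$ may be replaced by $\bar{x}$ and $\sigma$ by s.

[Figure: normal distribution curve marked at -2.58, -1.96, -1, +1, +1.96 and +2.58 standard deviations, with tail areas of 2.5% and 0.5%.]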

Page 38: Description of measurement data

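(4) Standard normal distribution

If $\mu = 0$ and $\sigma = 1$, the normal distribution (Nd) becomes the standard normal distribution (Snd). If $u = (X - \mu)/\sigma$, then u follows the Snd, N(0, 1).

So the standard normal distribution is also called the u-distribution.

[Figure: standard normal curve $\varphi(u)$ from $-\infty$ to $+\infty$, centered at 0, with the cumulative area $\Phi(u)$.]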

Page 39: Description of measurement data

(5) The area rule of the Snd

For -1 < u < +1, the area is 68.27%

For -1.96 < u < +1.96, the area is 95.0%

For -2.58 < u < +2.58, the area is 99.0%

[Figure: standard normal distribution curve marked at -2.58, -1.96, -1, 0, +1, +1.96 and +2.58.]
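The area rule can be verified numerically with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is only a check of the three quoted areas; the helper name is illustrative.

```python
from statistics import NormalDist

snd = NormalDist(mu=0, sigma=1)   # standard normal distribution N(0, 1)

def central_area(u):
    """Area under the standard normal curve between -u and +u."""
    return snd.cdf(u) - snd.cdf(-u)

for u in (1, 1.96, 2.58):
    print(u, round(100 * central_area(u), 2))
```

The values 1.96 and 2.58 are themselves rounded, so the computed areas agree with 95% and 99% to about one decimal place rather than exactly.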

Page 40: Description of measurement data

(6) Application of the normal distribution

(A) Estimating a frequency distribution

The weights of 130 newborns: $\bar{X}$ = 3200 g, s = 350 g

Estimate the proportion of low-weight newborns.

(The standard of low weight: X = 2500 g)

$$u = \frac{x - \mu}{\sigma} = \frac{2500 - 3200}{350} = -2$$

See table 2-11 in the book (p39):

$\Phi(-2)$ = 0.0228 = 2.28% (the proportion of low weight)

130 × 2.28% = 2.96 ≈ 3 (the number of low-weight newborns)
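Instead of a printed table, the same estimate can be sketched with `statistics.NormalDist`, treating the sample mean and standard deviation as the parameters of the distribution (an approximation the example itself makes, since n > 100).

```python
from statistics import NormalDist

# Newborn weights approximated as N(3200, 350^2), in grams.
weights = NormalDist(mu=3200, sigma=350)

p_low = weights.cdf(2500)   # area below 2500 g, i.e. Phi(-2)
expected_low = 130 * p_low  # expected number among 130 newborns

print(round(p_low, 4), round(expected_low, 2))
```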

Page 41: Description of measurement data

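(B) The estimation of a reference range

A reference range, also called the "normal range", is the range of values found in most normal individuals.

("Most" means 80%, 90%, 95% or 99%.)

[Figure: overlapping distributions of normal individuals and patients, with the upper limit (95%) marked; the overlap on either side of the limit gives the false negatives and false positives.]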

Page 42: Description of measurement data

For example

Red blood cells (RBC): 3.5~5.0 ($\times 10^{12}$/L)

White blood cells (WBC): 4~10 ($\times 10^{9}$/L)

Cholesterol in blood: 3.1~ 5.7 mmol /L

Lead in urine: < 0.08 mg /L

Two methods to estimate a reference range:

(A) Method of normal distribution

(B) Method of percentiles

Page 43: Description of measurement data

If the frequency distribution is close to the normal distribution, we may estimate the reference range by the method of normal distribution or by percentiles.

$\bar{x} \pm u_{\alpha/2}\, s$ (two-sided $1 - \alpha$ range)

$\bar{x} + u_{\alpha}\, s$ or $\bar{x} - u_{\alpha}\, s$ (one-sided $1 - \alpha$ range)

Table 2-8. Common u-values

Reference range   Two-sided   One-sided
80%               1.282       0.842
90%               1.645       1.282
95%               1.960       1.645
99%               2.576       2.326

Page 44: Description of measurement data

For example: the heights (cm) of 110 healthy male students aged 20 were measured at random in a city in 2004. For heights, $\bar{x}$ = 172.73 (cm), S = 4.09 (cm). Calculate the reference range of the height.

Calculating the two-sided 95% reference range:

$$\bar{x} \pm u_{\alpha/2}\, s = \bar{x} \pm 1.96\, s = 172.73 \pm 1.96 \times 4.09$$

The 95% reference range of the students' heights is 164.71 ~ 180.75 (cm).
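A minimal sketch of the normal-distribution method follows. Rather than looking up the u-value in a table, it derives it with `NormalDist().inv_cdf` from the Python standard library; the function name and its `coverage` parameter are illustrative.

```python
from statistics import NormalDist

def reference_range(mean, sd, coverage=0.95):
    """Two-sided reference range: mean +/- u * sd, with u from N(0, 1)."""
    alpha = 1 - coverage
    u = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% coverage
    return mean - u * sd, mean + u * sd

low, high = reference_range(172.73, 4.09)
print(round(low, 2), round(high, 2))
```

For a one-sided range the quantile would be taken at `1 - alpha` instead of `1 - alpha / 2`, which gives about 1.645 for 95% coverage.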

Page 45: Description of measurement data

If the frequency distribution is skewed, we may estimate the reference range by percentiles.

95% two-sided reference range: $P_{2.5}$ ~ $P_{97.5}$

95% one-sided range with an upper limit: < $P_{95}$

95% one-sided range with a lower limit: > $P_{5}$

(C) The control of data quality

$\bar{x} \pm 3s$

Page 46: Description of measurement data