the standard normal curve & its application in biomedical sciences
The Standard Normal Curve
and its applications
By : Dr. Abhishek Tiwari
Based on the normal distribution
A probability distribution of a continuous variable
The most important probability distribution in statistical inference
"Normal" refers to the statistical properties of a set of data
Most biomedical variables follow it
It is not a law
In truth, many of these characteristics follow it only approximately
No variable is precisely normally distributed
Introduction
Can be used to model the distribution of a variable of interest
Allows us to make useful probability statements
Examples: human stature and human intelligence
A probability distribution is a powerful tool for summarizing and describing a set of data
Lets us draw conclusions about a population based on a sample
Describes the relationship between the values of a random variable and the probability of their occurrence
Expressed as a graph or a formula
Introduction
Abraham de Moivre (French) discovered the normal distribution in 1733.
Quetelet (Belgian) noticed it in the heights of army personnel.
It is also called the Gaussian distribution, after Carl Friedrich Gauss (German).
Marquis de Laplace (French) proved the central limit theorem in 1810.
Central limit theorem: for a large sample size, the sampling distribution of the mean approximately follows a normal distribution.
If the sample studied is large enough, a normal distribution can be assumed for all practical purposes.
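The central limit theorem statement above can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 50, the 2000 repetitions and the random seed are all assumptions chosen for the example, not values from the slides.

```python
import random
import statistics

random.seed(0)  # assumed seed, only to make the run repeatable

SAMPLE_SIZE = 50     # assumed "large" sample size
REPETITIONS = 2000   # assumed number of repeated samples


def sample_mean(n):
    """Mean of n draws from a skewed (exponential) population with mean 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Theory: the sample means centre on the population mean (1.0) with
# standard deviation sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")

# Share of sample means within 1 SD of their centre; this should be
# close to 0.68 if the sampling distribution is approximately normal.
within_1sd = sum(abs(m - mean_of_means) <= sd_of_means for m in means) / REPETITIONS
print(f"share within 1 SD:    {within_1sd:.3f}")
```

Even though the individual observations are strongly skewed, the sample means should cluster symmetrically around 1, which is the practical content of the theorem.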
The Normal Curve
[Figure: the normal distribution, plotting f(X) against X, with the mean µ and standard deviation σ marked]
Changing μ shifts the distribution left or right.
Changing σ increases or decreases the spread.
The normal curve is not a single curve but a family of curves, each of which is determined by its mean and standard deviation.
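To make the "family of curves" idea concrete, here is a minimal sketch that evaluates the height of the curve for different µ and σ. It assumes the standard normal density formula, f(X) = exp(-(X - µ)² / (2σ²)) / (σ√(2π)), which the slides show only as a figure; the function name `normal_pdf` is hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and SD sigma at the point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Changing mu moves the peak left or right; the peak height stays the same.
peak_at_0 = normal_pdf(0, mu=0, sigma=1)
peak_at_5 = normal_pdf(5, mu=5, sigma=1)

# Changing sigma changes the spread: doubling sigma halves the peak height.
peak_wide = normal_pdf(0, mu=0, sigma=2)

print(round(peak_at_0, 4), round(peak_at_5, 4), round(peak_wide, 4))
```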
Mean (µ)
Mean: a measure of central tendency
The center or middle of the data set, around which the observations lie
Assumption: the frequency in each class is uniformly distributed and can be represented by the class midpoint
The mean for grouped data is given by
x̄ = Σ fi xi / n
where
n = number of observations
fi = frequency of the ith class interval
xi = midpoint of the ith class interval
Standard Deviation (σ)
Standard deviation: a measure of dispersion
The average deviation of the observations around the mean
Describes the compactness or variation of the data
SD = root mean square deviation = √variance
For ungrouped data: SD = √( Σ (xi – x̄)² / n )
For grouped data: SD = √( Σ fi (xi – x̄)² / n )
where
n = number of observations
x̄ = mean of the frequency distribution
fi = frequency of the ith class interval
xi = midpoint of the ith class interval (or the ith observation for ungrouped data)
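The grouped-data mean and standard deviation described above can be worked through on a small example. This is a sketch under stated assumptions: the midpoints and frequencies are hypothetical, and the divisor is n (root mean square deviation), as the slides describe, rather than n - 1.

```python
import math

# Hypothetical grouped data: class midpoints xi and class frequencies fi.
midpoints = [5, 15, 25]
frequencies = [2, 6, 2]

n = sum(frequencies)  # number of observations

# Mean for grouped data: sum(fi * xi) / n
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Variance: sum(fi * (xi - mean)^2) / n; the SD is its square root.
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
sd = math.sqrt(variance)

print(mean, variance, round(sd, 3))
```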
Properties Of Normal Curve
Perfectly symmetrical about its mean µ
Has a so-called "bell-shaped" form
Unimodal and unskewed
The mean of the distribution is the midpoint of the curve, and mean = median = mode
Two points of inflection (at µ ± σ)
The tails are asymptotic to the x-axis
As the number of observations n → ∞ and the width of the class interval → 0, the frequency polygon approaches a smooth curve
Properties Of Normal Curve
Areas under the curve are determined by distances from the mean, measured in standard deviations
Total area between the curve and the x-axis = 1 square unit (it represents total probability)
Any normal curve can be transformed to a standard curve for comparison
The proportion of the area under the curve over a range of z-scores is the relative frequency of scores in that range
The standard curve has mean = 0 and SD = 1, and is called the unit normal distribution
Properties of the normal curve
General relationships:
±1 SD = about 68.26% of the area
±2 SD = about 95.44% of the area
±3 SD = about 99.73% of the area
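The percentages quoted above are rounded values of the kind read from a printed Z table. As a cross-check, the sketch below computes the same areas directly from the error function, using the identity that the area within ±k SD of the mean equals erf(k / √2). The helper name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k) * 100:.2f}%")
```

Computed this way the areas come out as 68.27%, 95.45% and 99.73%; small differences in the last digit from table-based figures are rounding effects.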
68-95-99.7 Rule
Consider the distribution of weights in a group of 120 runners:
mean = 127.8
SD = 15.5
About 68% of the data lie within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD.
[Figure: histogram of runner weights (x-axis from 80 to 160, labelled "POUNDS" and captioned "Weight (kg)" in the source; y-axis: percent, 0 to 25), with 112.3, 127.8 and 143.3 marked]
68% of 120 = 0.68 × 120 ≈ 82 runners
In fact, 79 runners fall within ±1 SD (15.5 kg) of the mean.
[Figure: the same histogram, with 96.8, 127.8 and 158.8 marked]
95% of 120 = 0.95 × 120 = 114 runners
In fact, 115 runners fall within ±2 SD of the mean.
[Figure: the same histogram, with 81.3, 127.8 and 174.3 marked]
99.7% of 120 = 0.997 × 120 ≈ 119.6 runners
In fact, all 120 runners fall within ±3 SD of the mean.
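The arithmetic in the runner example can be reproduced in a few lines. This sketch uses only the figures quoted in the example (mean 127.8, SD 15.5, 120 runners, and the observed counts 79, 115 and 120); the variable names are illustrative.

```python
MEAN = 127.8   # mean weight, from the example
SD = 15.5      # standard deviation, from the example
N = 120        # number of runners, from the example

rule = {1: 0.68, 2: 0.95, 3: 0.997}   # 68-95-99.7 rule
observed = {1: 79, 2: 115, 3: 120}    # counts reported in the example

rows = []
for k, share in rule.items():
    low, high = MEAN - k * SD, MEAN + k * SD   # interval of mean ± k SD
    expected = share * N                       # runners predicted by the rule
    rows.append((k, round(low, 1), round(high, 1), round(expected, 1), observed[k]))
    print(f"±{k} SD: {low:.1f} to {high:.1f}  expected {expected:.1f}  observed {observed[k]}")
```

The expected and observed counts differ by only a few runners at each level, which is what "approximately normal" looks like in a sample of this size.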
Standard scores are expressed in standard deviation units
They are used to compare variables measured on different scales
There are many kinds of standard scores; the most common is the z-score
A z-score tells how far the original score lies above or below the mean of a normal curve
All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation
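Subtracting the mean and dividing by the standard deviation always produces scores with mean 0 and SD 1, whatever the original scale. A minimal sketch, assuming a hypothetical list of raw scores and the population SD (divisor n):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)  # population SD (divides by n)

# Standardize: subtract the mean, then divide by the SD.
z_scores = [(x - mean) / sd for x in scores]

print(z_scores)
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

Standardizing changes only the units; the shape of the distribution and the rank order of the scores stay the same.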
The Standard Normal Distribution (Z)
Z scores
What is a z-score?
A z-score is a raw score expressed in standard deviation units.
Here is the formula for a z-score:
z = (X – X̄) / S
where X is the raw score, X̄ is the mean and S is the standard deviation.
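Comparing X and Z units
[Figure: the same normal curve drawn on two scales. X scale (µ = 100, σ = 50): 100 and 200 marked. Z scale (µ = 0, σ = 1): 0 and 2.0 marked. X = 200 corresponds to Z = 2.0.]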
What we need is a standardized normal curve which can be used for any normally distributed variable. Such a curve is called the Standard Normal Curve.
Application of Normal Curve Model
Using z scores to compare two raw scores from different distributions
Can determine relative frequency and probability
Can determine percentile rank
Can determine the proportion of scores between the mean and a particular score
Can determine the number of people within a particular range of scores by multiplying the proportion by N
Using z scores to compare two raw scores from different distributions
You score 80/100 on a statistics test and your friend also scores 80/100 on their test in another section. "Hey, congratulations," your friend says, "we are both doing equally well in statistics." What do you need to know to decide whether the two scores are equivalent?
The mean?
What if the mean of both tests was 75?
You also need to know the standard deviation.
What would you say about the two test scores if S in your class was 5 and S in your friend's class was 10?
Calculating z scores
What is the z score for your test: raw score = 80, mean = 75, S = 5?
z = (X – X̄) / S = (80 – 75) / 5 = 1
What is the z score of your friend's test: raw score = 80, mean = 75, S = 10?
z = (X – X̄) / S = (80 – 75) / 10 = 0.5
Who do you think did better on their test? Why do you think this?
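One way to approach the question is to convert both z scores to percentile ranks. The sketch below does this with Python's `statistics.NormalDist`; it assumes that scores in both sections are approximately normally distributed, which the example does not state, and the function name `z_score` is hypothetical.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Raw score expressed in standard deviation units."""
    return (raw - mean) / sd


z_you = z_score(80, 75, 5)      # your section: mean 75, S = 5
z_friend = z_score(80, 75, 10)  # friend's section: mean 75, S = 10

standard_normal = NormalDist()  # mean 0, SD 1

# Percentage of each class scoring below 80, assuming normality.
pct_you = standard_normal.cdf(z_you) * 100
pct_friend = standard_normal.cdf(z_friend) * 100

print(f"you:    z = {z_you:.1f}, about {pct_you:.1f}% of the class below")
print(f"friend: z = {z_friend:.1f}, about {pct_friend:.1f}% of the class below")
```

The same raw score sits further above the mean, in SD units, in the class with the smaller spread, so it outranks a larger share of that class.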
Area under curve
Procedure:
To find areas, first compute Z scores.
Substitute the score of interest for Xi.
Use the sample mean X̄ in place of µ and the sample standard deviation S in place of σ.
The formula changes a "raw" score (Xi) to a standardized score (Z):
Z = (Xi – X̄) / S
Finding Probabilities
If a distribution has: X̄ = 13, s = 4
What is the probability of randomly selecting a score of 19 or more?
Find the Z score: for Xi = 19, Z = (19 – 13) / 4 = 1.50.
Find the area in the Z table: the area to the left of Z = 1.50 is 0.9332.
The probability is 1 – 0.9332 = 0.0668, or about 0.07.
Areas under the curve can also be expressed as probabilities
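The table lookup in the example above (mean 13, s = 4, score of 19 or more) can be reproduced without a printed Z table. A minimal sketch, using `statistics.NormalDist` from the Python standard library in place of the table:

```python
from statistics import NormalDist

scores = NormalDist(mu=13, sigma=4)  # distribution from the example

z = (19 - scores.mean) / scores.stdev  # Z score for a raw score of 19
p_below = scores.cdf(19)               # area to the left of 19 (the table value)
p_at_least = 1 - p_below               # area to the right: P(score >= 19)

print(f"Z = {z:.2f}, area below = {p_below:.4f}, P(19 or more) = {p_at_least:.4f}")
```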
In Class Example
After an exam, you learn that the mean for the class is 60, with a standard deviation of 10. Suppose your exam score is 70.
What is your Z-score?
Where, relative to the mean, does your score lie?
What is the probability associated with your score (use the Z table)?
To solve:
Available information: Xi = 70, X̄ = 60, S = 10
Formula: Z = (Xi – X̄) / S
= (70 – 60) / 10
= +1.0
Your Z-score of +1.0 is exactly 1 s.d. above the mean (an area of 34.13% + 50% = 84.13%). You are at the 84.13th percentile.
[Figure: standard normal curve, Z axis from -5 to 5, with the mean (60) at Z = 0 and your score at Z = +1.0 marked. The area is 50% on each side of the mean and 34.13% between the mean and Z = 1.0 on either side.]
What if your score is 72?
Calculate your Z-score.
What percentage of students have a score below your score? Above?
How many students are between you and the mean?
What percentile are you at?
Answer:
Z = (72 – 60) / 10 = 1.2; area = 0.8849 (from the left tail up to Z)
88.49% of marks are below yours.
The area beyond Z = 1 – 0.8849 = 0.1151 (11.51% of marks are above yours).
Area between the mean and Z = 0.8849 – 0.50 = 0.3849, or about 38%.
Your mark is at the 88th percentile!
What if your mark is 55%?
Calculate your Z-score.
What percentage of students have a score below your score? Above?
What percentile are you at?
Answer:
Z = (55 – 60) / 10 = –0.5
The area below Z = 0.3085 (30.85% of the marks are below yours).
The area above Z = 1 – 0.3085 = 0.6915 (69.15% of the marks are above yours).
Your mark is only at the 31st percentile!
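The three worked marks (70, 72 and 55, with class mean 60 and SD 10) all follow the same recipe, so they can be checked together. A sketch with a hypothetical helper `percentile_rank`, assuming the marks are normally distributed as in the examples:

```python
from statistics import NormalDist

exam = NormalDist(mu=60, sigma=10)  # class mean and SD from the example


def percentile_rank(mark):
    """Percentage of marks falling below the given mark."""
    return exam.cdf(mark) * 100


results = {}
for mark in (70, 72, 55):
    z = (mark - exam.mean) / exam.stdev
    below = percentile_rank(mark)
    results[mark] = (z, round(below, 2), round(100 - below, 2))
    print(f"mark {mark}: Z = {z:+.1f}, {below:.2f}% below, {100 - below:.2f}% above")
```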
Another Question…
What if you want to know how much better or worse you did than someone else? Suppose you have 72% and your classmate has 55%.
How much better is your score?
Answer:
Z for 72% = 1.2; area between the mean and Z = 0.8849 – 0.5 = 0.3849 (above the mean)
Z for 55% = –0.5; area between the mean and Z = 0.5 – 0.3085 = 0.1915 (below the mean; 0.3085 is the table value for the left tail)
Area between Z = –0.5 and Z = 1.2 = 0.3849 + 0.1915 = 0.5764
About 57.64% of the class have marks between your classmate's mark and yours.
Probability:
Let’s say your classmate won’t show you the mark….
How can you make an informed guess about what your neighbour’s mark might be?
What is the probability that your classmate has a mark between 60% (the mean) and 70% (1 s.d. above the mean)?
Answer:
Calculate Z for 70%: Z = (70 – 60) / 10 = 1.0
Looking at the Z table, you see that the area between the mean and Z is 0.3413.
There is a 0.34 probability (a 34% chance) that your classmate has a mark between 60% and 70%.
[Figure: standard normal curve, Z axis from -5 to 5, with the mean (60) at Z = 0 and Z = +1.0 (70%) marked; the area between them is 34.13%.]
Medical problem
The mean cholesterol of a sample is 210 mg%, with SD = 20 mg%. Cholesterol is normally distributed in the sample of 1000 persons.
Find the number of persons with cholesterol 1) > 210, 2) > 260, 3) < 250, 4) between 210 and 230.
1) Z1 = (210 – 210) / 20 = 0; area above = 0.5; persons = 1000 × 0.5 = 500
2) Z2 = (260 – 210) / 20 = 2.5; area below = 0.9938; area above = 1 – 0.9938 = 0.0062; persons = 1000 × 0.0062 = 6.2, about 6
3) Z3 = (250 – 210) / 20 = 2; area below = 0.9772; persons = 1000 × 0.9772 = 977.2, about 977
4) Z4 = (230 – 210) / 20 = 1; area between the mean and Z = 0.3413; persons = 1000 × 0.3413 = 341.3, about 341
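The four parts of the cholesterol problem can be verified numerically. This sketch replaces the Z table with `statistics.NormalDist` and uses only the figures given in the problem (mean 210 mg%, SD 20 mg%, 1000 persons); the variable names are illustrative.

```python
from statistics import NormalDist

cholesterol = NormalDist(mu=210, sigma=20)  # mg%, from the problem
N = 1000                                    # persons in the sample

above_210 = N * (1 - cholesterol.cdf(210))                    # part 1
above_260 = N * (1 - cholesterol.cdf(260))                    # part 2
below_250 = N * cholesterol.cdf(250)                          # part 3
between = N * (cholesterol.cdf(230) - cholesterol.cdf(210))   # part 4

print(f"> 210:           {above_210:.1f}")
print(f"> 260:           {above_260:.1f}")
print(f"< 250:           {below_250:.1f}")
print(f"210 to 230:      {between:.1f}")
```

Since the answers are counts of people, the fractional results are rounded to whole persons when reported.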
Multiple Transformation of Data
Why z-scores?
Transforming scores makes comparisons possible, especially when the scores use different scales
A z-score gives information about the relative standing of a score in relation to the characteristics of the sample or population:
Location relative to the mean, measured in standard deviations
Relative frequency and percentile
It gives us information about the location of that score relative to the "average" deviation of all scores