8.1 graphing data - governors state university · 2009. 1. 5. · 8.1 graphing data in this...
TRANSCRIPT
8.1 Graphing Data
In this chapter, we will study techniques for graphing data. We will see the importance of visually displaying large sets of data so that meaningful interpretations of the data can be made.
Bar Graphs
Bar graphs are used to represent data that can be classified into categories. The height of each bar represents the frequency of the category. For ease of reading, there is a space between each bar. The bar graph displayed below represents how consumers obtain their information for purchasing a new or used automobile. There are four categories (Consumer Guide, dealership, word of mouth and the internet). The graph illustrates that the category most used by consumers is the Consumer Guide.
[Figure: bar graph of information sources. Vertical axis: 0 to 200. Categories: Consumer Guide, Dealership, Word of Mouth, Internet.]
Broken line graph: This graph is obtained from a bar graph by connecting the midpoints of the tops of consecutive bars with straight lines.
[Figure: broken line graph of the same data. Vertical axis: 0 to 200. Categories: Consumer Guide, Dealership, Word of Mouth, Internet.]
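A pie graph is used to show how a whole is divided among several categories. The amount of each category is expressed as a percentage of the whole. The percentage, written as a decimal, is multiplied by 360 to determine the number of degrees of the central angle in the pie graph. For example, a category that makes up 52% of the whole has a central angle of 0.52 × 360 = 187.2 degrees.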
[Figure: pie graph "Source of Information". Slice labels: 52%, 28%, 12%, 8%. Legend: Consumer Guide, Dealership, Word of Mouth, Internet.]
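As a small illustration of the central angle rule, the sketch below converts the four slice percentages shown on the pie graph into degrees. The variable names are chosen here for illustration only.

```python
# Slice percentages read from the pie graph.
percentages = [52, 28, 12, 8]

# Central angle in degrees = (percentage / 100) * 360.
angles = [p / 100 * 360 for p in percentages]

for p, angle in zip(percentages, angles):
    print(f"{p}% of the whole -> central angle of {angle:.1f} degrees")

# A full circle has 360 degrees, so the angles should add up to 360.
print(f"Sum of angles: {sum(angles):.1f}")
```

Checking that the angles add up to 360 is a quick way to catch a percentage that was entered incorrectly.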
A Frequency Distribution is used to organize a large set of numerical data into classes. A frequency table consists of 5-20 classes of equal width along with the frequency of each class. Here is an example:
Rounds of golf played by golfers
Class      Frequency
[0,7)      0
[7,14)     2
[14,21)    10
[21,28)    21
[28,35)    23
[35,42)    14
[42,49)    5
This table has seven classes. The notation [0,7) includes all numbers that are greater than or equal to zero and less than 7. The class with the highest frequency is the class [28,35), with a class frequency of 23.
A relative frequency distribution is constructed by taking the frequency of each class and dividing that number by the total frequency to get a percentage. Then a new frequency distribution is constructed using the classes and their corresponding relative frequencies:
Relative Frequency Distribution
The total number of observations is 75. The third column of percentages is found by dividing the numbers in the second column by 75 and expressing that result as a percentage.
Class      Frequency   Relative frequency
[0,7)      0           0.00%
[7,14)     2           2.67%
[14,21)    10          13.33%
[21,28)    21          28.00%
[28,35)    23          30.67%
[35,42)    14          18.67%
[42,49)    5           6.67%
Total      75
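The relative frequencies can be reproduced with a few lines of Python. This is only a sketch; the list names are illustrative and the data are the golf frequencies from the table.

```python
classes = ["[0,7)", "[7,14)", "[14,21)", "[21,28)", "[28,35)", "[35,42)", "[42,49)"]
frequencies = [0, 2, 10, 21, 23, 14, 5]

# Total number of observations.
total = sum(frequencies)

# Relative frequency = class frequency / total, expressed as a percentage.
relative = [f / total * 100 for f in frequencies]

for c, f, r in zip(classes, frequencies, relative):
    print(f"{c:>8} {f:>4} {r:7.2f}%")
print(f"{'Total':>8} {total:>4}")
```

The relative frequencies always add up to 100%, which serves as a check on the table.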
A histogram is similar to a vertical bar graph with the exception that there are no spaces between the bars and the horizontal axis always consists of numerical values. We will represent the frequency distribution of the previous slides with a histogram:
The histogram shows a symmetric distribution with the most frequent classes in the middle between 21 and 35 rounds of golf.
[Figure: histogram "Rounds of Golf". Horizontal axis: Bin (0, 7, 14, 21, 28, 35, 42, More). Vertical axis: Frequency (0 to 30).]
A frequency polygon is constructed from a histogram by connecting the midpoints of each vertical bar with a line segment. This is also called a broken-line graph.
[Figure: frequency polygon "Rounds of Golf". Horizontal axis: Bin (0, 7, 14, 21, 28, 35, 42, More). Vertical axis: Frequency (0 to 30).]
8.2 Measures of Central Tendency
In this section, we will study three measures of central tendency: the mean, the median and the mode. Each of these values determines the “center” or middle of a set of data.
Measures of Center
Mean
Most common measure of center
Sum of the numbers divided by the number of numbers
Notation: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Example: The salaries of 5 employees (in thousands of dollars) are: 14, 17, 21, 18, 15
Find the mean: Sum = 14 + 17 + 21 + 18 + 15 = 85. Divide 85 by 5 to get 17. Thus, the average salary is 17,000 dollars.
The Mean as Center of Gravity
We will represent each data value on a “teeter-totter”. The teeter-totter serves as number line.
You can think of each point's deviation from the mean as the influence the point exerts on the tilt of the teeter totter. Positive values push down on the right side; negative values push down on the left side. The farther a point is from the fulcrum, the more influence it has.
Note that the mean deviation of the scores from the mean is always zero. That is why the teeter totter is in balance when the fulcrum is at the mean. This makes the mean the center of gravity for all the data points.
Data balances at 17. The sum of the deviations from the mean equals zero: (-3) + (-2) + 0 + 1 + 4 = 0.
[Figure: teeter-totter number line. Data values: 14, 15, 17, 18, 21. Deviation scale: -3 to 4.]
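The balance property can be checked numerically. The sketch below uses the five salaries from the example; the variable names are illustrative.

```python
salaries = [14, 17, 21, 18, 15]  # in thousands of dollars

# Mean = sum of the values divided by the number of values.
mean = sum(salaries) / len(salaries)

# Deviation of each (sorted) value from the mean.
deviations = [x - mean for x in sorted(salaries)]

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

Moving the fulcrum to any value other than the mean would make the deviations sum to a nonzero number, which is the "tilt" described above.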
To find the mean for grouped data, find the midpoint of each class by adding the lower class limit to the upper class limit and dividing by 2. For example (0 + 7)/2 = 3.5. Multiply the midpoint value by the frequency of the class. Find the sum of the products x and f. Divide this sum by the total frequency.
class      midpoint   frequency   x*f
[0,7)      3.5        0           0
[7,14)     10.5       2           21
[14,21)    17.5       10          175
[21,28)    24.5       21          514.5
[28,35)    31.5       23          724.5
[35,42)    38.5       14          539
[42,49)    45.5       5           227.5
Total                 75          2201.5

$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{2201.5}{75} = 29.35333$
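The grouped-data mean can be computed directly from the class limits and frequencies. The following is a minimal sketch; the names `classes`, `midpoints` and `products` are chosen here for illustration.

```python
# (lower limit, upper limit) of each class and the class frequencies.
classes = [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35), (35, 42), (42, 49)]
frequencies = [0, 2, 10, 21, 23, 14, 5]

# Midpoint of each class = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Product x * f for each class.
products = [x * f for x, f in zip(midpoints, frequencies)]

# Grouped mean = sum of the products / total frequency.
grouped_mean = sum(products) / sum(frequencies)

print("sum of x*f:", sum(products))
print("total frequency:", sum(frequencies))
print(f"grouped mean: {grouped_mean:.5f}")
```

Because every value in a class is replaced by the class midpoint, the result is an approximation to the mean of the original, ungrouped data.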
Median
The mean is not always the best measure of central tendency, especially when the data has one or more “outliers” (numbers which are unusually large or unusually small and not representative of the data as a whole).
Definition: The median of a data set is the number that divides the bottom 50% of the data from the top 50% of the data.
To obtain the median:
1. Arrange the data in ascending order.
2. Determine the location of the median. This is done by adding one to n, the total number of scores, and dividing this number by 2.
Position of the median = $\frac{n+1}{2}$
Median example
Find the median of the following data set: 14, 17, 21, 18, 15
1. Arrange the data in order: 14, 15, 17, 18, 21
2. Determine the location of the median: $\frac{n+1}{2}$ = (5+1)/2 = 3.
3. Count from the left until you reach the number in the third position (17).
4. The value of the median is 17.
Median example 2: This example illustrates the case when the number of observations is an even number. The value of the median in this case will not be one of the original pieces of data.
Determine the median of the data: 14, 15, 17, 19, 23, 25
The data is arranged in order. The position of the median of n data values is $\frac{n+1}{2}$. In this example, n = 6, so the position of the median is (6 + 1)/2 = 3.5. Take the average of the 3rd and 4th data values: (17 + 19)/2 = 18. Thus, the median is 18.
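The position rule for odd and even n can be sketched as a short function. The function name `median` is illustrative; Python's standard library also offers `statistics.median`, which gives the same results for these data.

```python
def median(data):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(data)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (0-based index n // 2).
        return ordered[mid]
    # Even n: position falls halfway between two values; average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([14, 17, 21, 18, 15]))       # odd number of observations
print(median([14, 15, 17, 19, 23, 25]))   # even number of observations
```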
Which is better? Median or Mean?
The yearly salaries of 5 employees of a small company are : 19, 23, 25, 26, and 57 (in thousands)
1. Find the mean salary (30)
2. Find the median salary (25)
3. Which measure is more appropriate and why?
4. The median is better since the mean is skewed (affected) by the outlier 57.
Properties of the mean
1. Mean takes into account all values
2. Mean is sensitive to extreme values (outliers)
3. Mean is called a non-resistant measure of central tendency since it is affected by extreme values (the median is thus resistant)
4. Population mean: mean of all values of the population
5. Sample mean: mean of sample data
6. Mean of a representative sample tends to best estimate the mean of the population (for repeated sampling)
Properties of the median
1. Not sensitive to extreme values; resistant measure of central tendency
2. Takes into account only the middle value of a data set or the average of the two middle values.
3. Should be used for data sets that have outliers, such as personal income, or prices of homes in a city
Mode
Definition: The mode is the most frequently occurring value in a data set. To obtain the mode, find the frequency of occurrence of each value and then note the value that has the greatest frequency. If the greatest frequency is 1, then the data set has no mode. If two values occur with the same greatest frequency, then we say the data set is bimodal.
Example of mode
Ex. 1: Find the mode of the following data set: 45, 47, 68, 70, 72, 72, 73, 75, 98, 100
Answer: The mode is 72.
Ex. 2: The mode should be used to determine the greatest frequency of qualitative data: Shorts are classified as small, medium, large, and extra large. A store has on hand 12 small, 15 medium, 17 large and 8 extra large pairs of shorts. Find the mode.
Solution: The mode is large. This is the modal class (the class with the greatest frequency). It would not make sense to find the mean or median for nominal data.
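The two mode examples can be sketched in Python. The helper name `modes` is an illustrative choice; it returns an empty list when the greatest frequency is 1 (no mode) and two values when the data set is bimodal.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return [value for value, count in counts.items() if count == top]

print(modes([45, 47, 68, 70, 72, 72, 73, 75, 98, 100]))

# Qualitative data: the mode is the category with the greatest frequency.
shorts = {"small": 12, "medium": 15, "large": 17, "extra large": 8}
modal_class = max(shorts, key=shorts.get)
print(modal_class)
```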
8.3 Measures of Dispersion
In this section, you will study measures of variability of data. In addition to being able to find measures of central tendency for data, it is also necessary to determine how “spread out” the data are. Two measures of variability of data are the range and the standard deviation.
Measures of variation
Example 1. Data for 5 starting players from two basketball teams:
A: 72, 73, 76, 76, 78
B: 67, 72, 76, 76, 84
Verify that the two teams have the same mean heights, the same median and the same mode.
Measures of Variation
Ex. 1 continued. To describe the difference in the two data sets, we use a descriptive measure that indicates the amount of spread , or dispersion, in a data set.
Range: difference between maximum and minimum values of the data set.
Measures of Variation
Range of team A: 78 - 72 = 6
Range of team B: 84 - 67 = 17
Advantage of range: easy to compute
Disadvantage: only two values are considered.
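A quick check of Example 1, sketched with Python's standard `statistics` module: both teams share the same mean, median and mode, and only the range tells them apart. The helper `data_range` is a name introduced here for illustration.

```python
from statistics import mean, median, mode

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

def data_range(data):
    """Range = maximum value minus minimum value."""
    return max(data) - min(data)

for name, heights in (("A", team_a), ("B", team_b)):
    print(f"Team {name}: mean={mean(heights)}, median={median(heights)}, "
          f"mode={mode(heights)}, range={data_range(heights)}")
```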
Unlike the range, the sample standard deviation takes into account all data values. The following procedure is used to find the sample standard deviation:
Step 1: Find the mean of the data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{72 + 73 + 76 + 76 + 78}{5} = 75$
Step 2: Find the deviation of each score from the mean
x      $x - \bar{x}$
72     72 - 75 = -3
73     73 - 75 = -2
76     76 - 75 = 1
76     76 - 75 = 1
78     78 - 75 = 3
Sum    0
Note that the sum of the deviations is zero: $\sum (x - \bar{x}) = 0$.
The sum of the deviations from the mean will always be zero. This can be used as a check to determine if your calculations are correct.
Step 3: Square each deviation from the mean. Find the sum of the squared deviations.
Height   deviation   squared deviation
72       -3          9
73       -2          4
76       1           1
76       1           1
78       3           9
$\sum_{i=1}^{n} (X_i - \bar{X})^2 = 24$
Step 4: The sample variance is determined by dividing the sum of the squared deviations by (n-1) (the number of scores minus one)
Note that the sum of the squared deviations is 24.
Sample variance is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{24}{5 - 1} = 6$
The four steps can be combined into one mathematical formula for the sample standard deviation. The sample standard deviation is the square root of the quotient of the sum of the squared deviations and (n-1):
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Sample Standard Deviation: $s = \sqrt{6} \approx 2.45$
Four step procedure to calculate sample standard deviation:
1. Find the mean of the data
2. Set up a table which lists the data in the left hand column and the deviations from the mean in the next column.
3. In the third column from the left, square each deviation and then find the sum of the squares of the deviations.
4. Divide the sum of the squared deviations by (n-1) and then take the positive square root of the result.
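The four-step procedure can be sketched as a small Python function. The function name `sample_std` is illustrative; it returns both the sample variance and the sample standard deviation so the intermediate result can be checked against the worked example for team A.

```python
import math

def sample_std(data):
    """Return (sample variance, sample standard deviation)."""
    n = len(data)
    mean = sum(data) / n                        # step 1: mean
    deviations = [x - mean for x in data]       # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]      # step 3: squared deviations
    variance = sum(squared) / (n - 1)           # step 4: divide by n - 1 ...
    return variance, math.sqrt(variance)        # ... and take the square root

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

for name, heights in (("A", team_a), ("B", team_b)):
    variance, std = sample_std(heights)
    print(f"Team {name}: variance={variance:.3f}, standard deviation={std:.3f}")
```

Running the same function on team B shows how the standard deviation, unlike the mean, median and mode, separates the two teams.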
Problem for students:
By hand: Find the variance and standard deviation of the data: 5, 8, 9, 7, 6
Answer: The mean is 7 and the sum of the squared deviations is 10, so the variance is 10/4 = 2.5 and the standard deviation is the square root of 2.5, approximately 1.581.
Standard deviation of grouped data:
1. Find each class midpoint.
2. Find the deviation of each class midpoint from the mean.
3. Each deviation is squared and then multiplied by the class frequency.
4. Find the sum of these values and divide the result by (n-1) (one less than the total number of observations).
5. Take the square root of the result.
$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 \cdot f_i}{n - 1}}$
Here is the frequency distribution of the number of rounds of golf played by a group of golfers. The class midpoints are in the second column. The mean is 29.35. The third column is the square of the difference between the class midpoint and the mean. The fifth column is the product of the frequency and the value in the third column. The final result, the standard deviation, is given below the table.

class      midpoint   (midpoint - mean) squared   frequency   (x - mean)^2 * frequency   x*f
[0,7)      3.5        668.3948                    0           0                          0
[7,14)     10.5       355.4482                    2           710.8963556                21
[14,21)    17.5       140.5015                    10          1405.015111                175
[21,28)    24.5       23.55484                    21          494.6517333                514.5
[28,35)    31.5       4.608178                    23          105.9880889                724.5
[35,42)    38.5       83.66151                    14          1171.261156                539
[42,49)    45.5       260.7148                    5           1303.574222                227.5
Total                                             75          5191.386667

Mean = 29.35333
$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 \cdot f_i}{n - 1}} = \sqrt{\frac{5191.386667}{75 - 1}} = 8.37579094$
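The grouped standard deviation can be reproduced with the sketch below, which follows the steps listed for grouped data. Variable names such as `weighted_sq` are illustrative.

```python
import math

classes = [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35), (35, 42), (42, 49)]
frequencies = [0, 2, 10, 21, 23, 14, 5]

# Class midpoints and total number of observations.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
n = sum(frequencies)

# Grouped mean: sum of midpoint * frequency divided by n.
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations, each weighted by its class frequency.
weighted_sq = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))

# Divide by n - 1 and take the square root.
s = math.sqrt(weighted_sq / (n - 1))

print(f"mean = {mean:.5f}")
print(f"sum of (x - mean)^2 * f = {weighted_sq:.6f}")
print(f"s = {s:.6f}")
```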
Interpreting the standard deviation
1. The more variation in a data set, the greater the standard deviation.
2. The larger the standard deviation, the more “spread” in the shape of the histogram representing the data.
3. Standard deviation is used for quality control in business and industry. If there is too much variation in the manufacturing of a certain product, the process is out of control and adjustments to the machinery must be made to ensure more uniformity in the production process.
Three standard deviations rule
“Almost all” the data will lie within 3 standard deviations of the mean. Mathematically, nearly 100% of the data will fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.
Empirical Rule
If a data set is “mound shaped” or “bell-shaped”, then:
1. Approximately 68% of the data lies within one standard deviation of the mean.
2. Approximately 95% of the data lies within 2 standard deviations of the mean.
3. About 99.7% of the data falls within 3 standard deviations of the mean.
Yellow region is 68% of the total area. This includes all data within one standard deviation of the mean. Yellow region plus brown regions include 95% of the total area. This includes all data that are within two standard deviations from the mean.
Example of Empirical Rule
The shape of the distribution of IQ scores is a mound shape with a mean of 100 and a standard deviation of 15.
A) What proportion of individuals have IQ’s ranging from 85 – 115 ? (about 68%)
B) between 70 and 130 ? (about 95%)
C) between 55 and 145? (about 99.7%)
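The percentages in the Empirical Rule can be checked against an exactly normal curve. The sketch below assumes a perfectly normal distribution and uses the error function `math.erf` from Python's standard library to compute the area within k standard deviations of the mean; the helper names `phi` and `prob_within` are illustrative.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return phi(k) - phi(-k)

mean, sd = 100, 15  # IQ scores from the example
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"IQ between {low} and {high}: {prob_within(k):.4f}")
```

The computed proportions round to the 68%, 95% and 99.7% figures quoted in the rule.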
Bernoulli Trials
http://www.math.wichita.edu/history/topics/probability.html#bern-trials
Boy? Girl? Heads? Tails? Win? Lose? Do any of these sound familiar? When there is the possibility of only two outcomes occurring during any single event, it is called a Bernoulli Trial. Jakob Bernoulli, a profound mathematician of the late 1600s, from a family of mathematicians, spent 20 years of his life studying probability. During this study, he arrived at an equation that calculates probability in a Bernoulli Trial. His proofs are published in his 1713 book Ars Conjectandi (Art of Conjecturing).
Jacob Bernoulli:
Hofmann sums up Jacob Bernoulli's contributions as follows:
Bernoulli greatly advanced algebra, the infinitesimal calculus, the calculus of variations, mechanics, the theory of series, and the theory of probability. He was self-willed, obstinate, aggressive, vindictive, beset by feelings of inferiority, and yet firmly convinced of his own abilities. With these characteristics, he necessarily had to collide with his similarly disposed brother. He nevertheless exerted the most lasting influence on the latter.
Bernoulli was one of the most significant promoters of the formal methods of higher analysis. Astuteness and elegance are seldom found in his method of presentation and expression, but there is a maximum of integrity.
What constitutes a Bernoulli Trial?
http://www.math.wichita.edu/history/topics/probability.html#bern-trials
To be considered a Bernoulli trial, an experiment must meet each of three criteria:
1. There must be only 2 possible outcomes, such as: black or red, sweet or sour. One of these outcomes is called a success, and the other a failure. Successes and failures are denoted as S and F, though the terms given do not mean one outcome is more desirable than the other.
2. Each outcome has a fixed probability of occurring; a success has the probability of p, and a failure has the probability of 1 - p.
3. Each experiment and result are completely independent of all others.
Some examples of Bernoulli Trials
http://en.wikipedia.org/wiki/Bernoulli_trial
1. Flipping a coin. In this context, obverse ("heads") denotes success and reverse ("tails") denotes failure. A fair coin has the probability of success 0.5 by definition.
2. Rolling a die, where for example we designate a six as "success" and everything else as a "failure".
3. In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote "yes" in an upcoming referendum.
4. Call the birth of a baby of one sex "success" and of the other sex "failure." (Take your pick.)
Introduction to Binomial Probability
A manager of a department store has determined that there is a probability of 0.30 that a particular customer will buy at least one product from his store. If three customers walk into the store, find the probability that two of the three customers will buy at least one product.
1. Determine which two will buy at least one product. The outcomes are b b b’ (first two buy and third does not buy), or b b’ b, or b’ b b. There are three possible outcomes, each consisting of two b’s along with one not b (b’). Considering “buy” as a success, the probability of success is 0.30. Each customer is independent of the others and there are two possible outcomes, success or failure (not buy).
Introduction to Binomial probability
Since the trials are independent, we can use the probability rule for independence: p(A and B and C) = p(A)*p(B)*p(C). For the outcome b b b’, the probability is P(b b b’) = p(b)p(b)p(b’) = 0.30(0.30)(0.70). For the other two outcomes, the probability will be the same. For example, P(b b’ b) = 0.30(0.70)(0.30).
Since the order in which the customers buy or do not buy is not important, we can use the formula for combinations to determine the number of subsets of size 2 that can be obtained from a set of 3 elements. This corresponds to the number of ways two “buying” customers can be selected from a set of three customers: C(3, 2) = 3. For each of these three combinations, the probability is the same:
$0.30^2 \cdot 0.70^1$
Thus, we have the following formula to compute the probability that two out of three customers will buy at least one product:
$C(3, 2) \cdot 0.30^2 \cdot 0.70^1$
This turns out to be 0.189. Using the results of this problem, we can generalize the result. Suppose you have n customers and you wish to calculate the probability that x out of the n customers will buy at least one product. Let p represent the probability that a given customer will buy at least one product. Then (1-p) is the probability that a given customer will not buy a product.
$p(x) = C(n, x) \cdot p^x (1 - p)^{n - x}$
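The reasoning above can be checked two ways in Python: by listing every buy / not-buy outcome for three customers and adding the probabilities of those with exactly two buyers, and by applying the formula. This is a sketch only; the variable names are illustrative, and `math.comb` (Python 3.8 or later) plays the role of C(n, x).

```python
from itertools import product
from math import comb

p = 0.30  # probability that a given customer buys at least one product

# Enumerate all outcomes for three customers (True = buy, False = not buy)
# and add up the probabilities of the outcomes with exactly two buyers.
enumerated = 0.0
for outcome in product([True, False], repeat=3):
    if sum(outcome) == 2:
        prob = 1.0
        for buys in outcome:
            prob *= p if buys else 1 - p  # independence: multiply probabilities
        enumerated += prob

# Same probability from the formula C(n, x) * p^x * (1 - p)^(n - x).
formula = comb(3, 2) * p ** 2 * (1 - p) ** 1

print(f"by enumeration: {enumerated:.3f}")
print(f"by formula:     {formula:.3f}")
```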
Binomial Probability Formula
The binomial distribution gives the discrete probability distribution of obtaining exactly n successes out of N Bernoulli trials (where the result of each Bernoulli trial is true with probability p and false with probability 1-p). The binomial distribution is therefore given by
$P_p(n \mid N) = \binom{N}{n} p^n (1 - p)^{N - n}$   (1)
$= \frac{N!}{n!\,(N - n)!} p^n (1 - p)^{N - n}$   (2)
where $\binom{N}{n}$ is a binomial coefficient. The plot on the next slide shows the distribution of n successes out of N = 20 trials.
Plot of Binomial probabilities with n = 20 trials, p = 0.5
To find a binomial probability formula
Assumptions:
1. n identical trials
2. Two outcomes, success or failure, are possible for each trial
3. Trials are independent
4. The probability of success, p, remains constant on each trial
Step 1: Identify a success
Step 2: Determine p, the success probability
Step 3: Determine n, the number of trials
Step 4: The binomial probability formula for the number of successes, x, is
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
Example
Studies show that 60 % of US families use physical aggression to resolve conflict. If 10 families are selected at random, find the probability that the number that use physical aggression to resolve conflict is:
exactly 5
between 5 and 7, inclusive
over 80% of those surveyed
fewer than nine
Solution: $P(x = 5) = \binom{10}{5} \cdot 0.6^5 \cdot (1 - 0.6)^{10 - 5} = 0.201$
Example continued
Probability (between 5 and 7, inclusive) = Prob(5) or Prob(6) or Prob(7) =
$\binom{10}{5} (0.6)^5 (0.4)^5 + \binom{10}{6} (0.6)^6 (0.4)^4 + \binom{10}{7} (0.6)^7 (0.4)^3$
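All four parts of this example can be evaluated with one helper function. This is a sketch under one stated assumption: "over 80% of those surveyed" is read here as more than 8 of the 10 families, that is, 9 or 10. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.60

exactly_5 = binomial_pmf(5, n, p)
between_5_and_7 = sum(binomial_pmf(x, n, p) for x in range(5, 8))
# Assumption: "over 80%" of 10 families means 9 or 10 families.
over_80_percent = sum(binomial_pmf(x, n, p) for x in range(9, 11))
fewer_than_9 = sum(binomial_pmf(x, n, p) for x in range(0, 9))

print(f"exactly 5:             {exactly_5:.3f}")
print(f"between 5 and 7:       {between_5_and_7:.4f}")
print(f"over 80% (9 or 10):    {over_80_percent:.4f}")
print(f"fewer than nine:       {fewer_than_9:.4f}")
```

Note that "fewer than nine" is the complement of "9 or 10", so the last two probabilities add up to 1.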
Mean of a Binomial distribution
Mean = np
To find the mean of a binomial distribution, multiply the number of trials, n, by the success probability of each trial
(Note: This formula can only be used for the binomial distribution and not for probability distributions in general.)
Example
A large university has determined from past records that the probability that a student who registers for fall classes will have his or her schedule rejected (due to overfilled classrooms, clerical error, etc.) is 0.25.
Find the probability that in a sample of 19 students, exactly 8 will have their schedules rejected.
Example
Suppose 15% of major league baseball players are left-handed. In a sample of 12 major league baseball players, find the probability that :
a) none are left-handed: $0.85^{12} \approx 0.14$
b) at most six are left-handed. Find the probabilities of 0, 1, 2, 3, 4, 5 and 6 and then add the probabilities: 0.1422 + 0.30122 + 0.29236 + 0.17198 + 0.06828 + 0.01928 + 0.00397 ≈ 0.9993
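An "at most" probability is a running sum of individual binomial probabilities. The sketch below prints each term for the left-handed example and adds them up; the names `binomial_pmf` and `at_most_six` are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.15  # 12 players, 15% left-handed

# a) none are left-handed
none_left = binomial_pmf(0, n, p)
print(f"P(X = 0) = {none_left:.4f}")

# b) at most six are left-handed: add P(X = 0) through P(X = 6)
at_most_six = 0.0
for x in range(0, 7):
    term = binomial_pmf(x, n, p)
    at_most_six += term
    print(f"P(X = {x}) = {term:.5f}")
print(f"P(X <= 6) = {at_most_six:.4f}")
```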
Another example
A basketball player shoots 10 free throws. The probability of success on each shot is 0.90.
1) Is this a binomial experiment? Why?
2) Create the probability distribution of x, the number of shots made out of 10.
Use Excel to compute the probabilities and draw the histogram of the results.
Standard deviation of the binomial distribution
To find the standard deviation of the binomial distribution, multiply the number of trials, n, by the success probability, p, and multiply the result by (1-p), then take the square root of the result:
$\sigma = \sqrt{np(1 - p)}$
Use Excel to Determine binomial probability distribution
1. Use Excel to create the binomial distribution of x, the number of heads that appear when 25 coins are tossed. In column 1, display values for x: 0, 1, 2, 3, … 25. In column 2, display P( X = x).
2. Create the histogram of the probability distribution of x. Note the shape of the histogram. (It should resemble a normal distribution.)
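For readers without Excel, the same exercise can be sketched in Python. This is an alternative to the spreadsheet steps above, not a reproduction of them: it builds the distribution for 25 fair coin tosses, prints a rough text histogram, and evaluates the mean np and standard deviation $\sqrt{np(1-p)}$. The scale factor for the bar length is an arbitrary choice.

```python
from math import comb, sqrt

n, p = 25, 0.5  # 25 fair coins

# Column 1: x = 0, 1, ..., 25. Column 2: P(X = x).
distribution = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# Mean and standard deviation of the binomial distribution.
mean = n * p
sd = sqrt(n * p * (1 - p))

# Rough text histogram: one '#' per 0.005 of probability.
for x, prob in distribution.items():
    print(f"{x:2d}  {prob:.5f}  {'#' * round(prob * 200)}")

print(f"mean = {mean}, standard deviation = {sd}")
```

The bars rise to a peak near the mean and fall off symmetrically, which is the bell shape the exercise asks you to notice.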
8.5 Normal Distributions
We have seen that the histogram for a binomial distribution with n = 20 trials and p = 0.50 was shaped like a bell if we join the tops of the rectangles with a smooth curve.
Real world data, such as IQ scores, weights of individuals, heights, test scores have histograms that have a symmetric bell shape. We call such distributions Normal distributions. This will be the focus of this section.
http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/De_Moivre.html
Three mathematicians contributed to the mathematical foundation for this curve. They are Abraham De Moivre, Pierre Laplace and Carl Friedrich Gauss.
De Moivre pioneered the development of analytic geometry and the theory of probability. He published The Doctrine of Chances in 1718. The definition of statistical independence appears in this book together with many problems with dice and other games. He also investigated mortality statistics and the foundation of the theory of annuities.
De Moivre
Laplace
Laplace also systematized and elaborated probability theory in "Essai Philosophique sur les Probabilités" (Philosophical Essay on Probability, 1814). He was the first to publish the value of the Gaussian integral, $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.
Bell shaped curves
Many frequency distributions have a symmetric, bell-shaped histogram. For example, the frequency distribution of heights of males is symmetric about a mean of 69.5 inches. Example 2: IQ scores are symmetrically distributed about a mean of 100 with a standard deviation of 15 or 16. The frequency distribution of IQ scores is bell-shaped. Example 3: SAT test scores have a bell-shaped, symmetric distribution.
Graph of a generic normal distribution
[Figure: bell-shaped curve. Horizontal axis: -4 to 4. Vertical axis: 0 to 0.5.]
Values on X axis represent the number of standard deviation units a particular data value is from the mean. Values on the y axis represent probabilities of the random variable x.
Area under the Normal Curve
1. Normal distribution: a smoothed-out histogram
2. P(a < x < b) = the probability that the random variable x is between a and b. It is determined by the area under the normal curve between x = a and x = b.
Properties of Normal distributions
1. Symmetric about its mean, $\mu$
2. Approaches, but never touches, the horizontal axis as x gets very large (or x gets very small)
3. Almost all observations lie within 3 standard deviations from the mean.
Area under normal curve
Example: A midwestern college has an enrollment of 3264 female students whose mean height is 64.4 inches and the standard deviation is 2.4 inches. By constructing a relative frequency distribution, with class boundaries of 56, 57, 58, … 74, we find that the frequency distribution resembles a bell shaped symmetrical distribution.
Heights of Females at a College (Relative frequency distribution with class width = 1 is smoothed out to form a normal, bell-shaped curve.)
Normal curve areas
Key fact: For a normally distributed variable, the percentage of all possible observations that lie within any specified range equals the corresponding area under its associated normal curve expressed as a percentage. This holds true approximately for a variable that is approximately normally distributed.
The area of the red portion of the graph is equal to the prob( 66 < x < 68); the probability that a female student chosen at random from the population of all students at the college has a height between 66 and 68 in.
Finding areas under a normal, bell-shaped curve
The problem with attempting to find the area under a normal curve between x = a and x = b (and thus finding the probability that x is between a and b, P(a < x < b)) is that calculus is needed. However, we can circumvent this problem by using results from calculus. Tables have been constructed to find areas under what is called the standard normal curve. The standard normal curve will be discussed shortly. A normal curve is characterized by its mean and standard deviation. The scale for the x axis will be different for each normal curve. The shape of each normal curve will differ since the shape is determined by the standard deviation; the greater the standard deviation, the “flatter” and more spread out the normal curve will be.
Standardizing a Normally Distributed Variable
To find percentage of scores that lie within a certain interval, we need to find the area under the normal curve between the desired x values. To do this, we need a table of areas for each normal curve. The problem is that there are infinitely many normal curves so that we would need infinitely many tables.
Non-standard normal curves
For example, the distribution of IQ scores is normal with mean = 100 and standard deviation = 16.
Ex. 2. The heights of females at a certain mid-western college are normally distributed with a mean of 64.4 inches and a standard deviation of 2.4 inches.
Ex. 3. The probability distribution of x, the diameter of CDs produced by a company, is normally distributed with a mean of 4 inches and a standard deviation of .03 inches.
Thus, for these three examples we would need three separate tables giving the areas under the normal curve for each separate distribution. Obviously, this poses a problem.
Standard normal curve
The way out of this problem is to standardize each normal curve, which will transform individual normal distributions into one particular standardized distribution. To find P(a < x < b) for the non-standard normal curve, we can find
$P\left(\frac{a - \mu}{\sigma} < z < \frac{b - \mu}{\sigma}\right)$
Thus $P(a < x < b) = P\left(\frac{a - \mu}{\sigma} < z < \frac{b - \mu}{\sigma}\right)$. The variable z is called the standard normal variable.
Standard normal distribution
The standard normal distribution will have a mean of 0 and a standard deviation of 1. Values on the horizontal axis are called z values. Z will be defined shortly. Values on the y axis are probabilities and will be decimal numbers between 0 and 1, inclusive.
[Figure: standard normal curve. Horizontal axis: z values, -4 to 4. Vertical axis: 0 to 0.5.]
Standardized Normally Distributed Variable
The formula below for z can be used to standardize any normally distributed variable x. Z is referred to as the number of standard deviations from the mean (A. S. D. M. = z). $\mu$ and $\sigma$ represent the mean and standard deviation of the distribution, respectively.
$z = \frac{x - \mu}{\sigma}$
For example, if IQ scores are distributed normally with a mean of 100 and standard deviation of 16, then if x = IQ of an individual = 124,
$z = \frac{124 - 100}{16} = 1.5$
Areas under the standard normal curve
Find the following probabilities: A) P( 0 < z < 1.2) =
Use table or TI 83 to find area. Answer: .3849
Areas under the Standard Normal Curve
Let z be the standard normal variable. Find the following probabilities: Be sure to sketch a normal curve and shade the appropriate area. If you use a TI 83, give the appropriate commands required to do the problem.
Examples
Probability(-1.3 < z < 0)
1. Draw diagram
2. Shade appropriate area
3. Use table or calculator to find area.
4. Answer: .4032
Examples (continued)
Probability(-1.25 < z < .89) =
1. Draw picture
2. Shade appropriate area
3. Use table to find two different areas
4. Find the sum of the two percentages.
5. Answer: .7076
More examples:
Probability(z > .75)
1. Draw diagram
2. Shade appropriate area
3. Use table to find p(0 < z < 0.75)
4. Subtract this area from 0.5000.
5. Answer: 0.2266
More examples (continued)
Probability(-1.13 < z < -.79) =
1. Draw diagram
2. Shade appropriate area
3. Use table to find p(0 < z < 1.13)
4. Use table to find p(0 < z < 0.79)
5. Subtract the smaller percentage from the larger percentage.
6. Answer: 0.0855
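In place of a printed table or a TI 83, the areas in these examples can be computed with the error function from Python's standard library. This is a sketch of an alternative tool, not the method used on the slides; `phi` and `area_between` are illustrative names, and `phi(z)` gives the area to the left of z.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(a, b):
    """P(a < z < b) for the standard normal variable z."""
    return phi(b) - phi(a)

print(f"P(0 < z < 1.2)       = {area_between(0, 1.2):.4f}")
print(f"P(-1.3 < z < 0)      = {area_between(-1.3, 0):.4f}")
print(f"P(-1.25 < z < 0.89)  = {area_between(-1.25, 0.89):.4f}")
print(f"P(z > 0.75)          = {1 - phi(0.75):.4f}")
print(f"P(-1.13 < z < -0.79) = {area_between(-1.13, -0.79):.4f}")
```

Because `phi` measures area from the far left rather than from 0, no adding or subtracting of table entries is needed: every interval is simply the difference of two left-hand areas.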
Finding probabilities for non-standard normal curves.
P(a < x < b) is the same as
$P\left(\frac{a - \mu}{\sigma} < z < \frac{b - \mu}{\sigma}\right)$
Example 1
IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. Find the probability that a randomly chosen person has an IQ greater than 120.
Step 1. Draw a normal curve and shade appropriate area. State probability: P(x > 120), where x is IQ.
Example
Step 2. Convert x score to a standardized z score: z = (120 - 100)/16 = 20/16 = 5/4 = 1.25. Probability = P(z > 1.25)
Step 3. Draw standard normal curve and shade appropriate area.
Step 4. Use table or TI 83 to find area. Answer: .1056
Areas under the Non-standard normal curve
A traffic study at one point on an interstate highway shows that vehicle speeds are normally distributed with a mean of 61.3 mph and a standard deviation of 3.3 miles per hour. If a vehicle is randomly checked, find the probability that its speed is between 55 and 60 miles per hour.
Solution:
1. Draw diagram
2. Shade appropriate area
3. Use $z = \frac{x - \mu}{\sigma}$ to standardize
4. Find $P\left(\frac{55 - 61.3}{3.3} < z < \frac{60 - 61.3}{3.3}\right)$
5. Answer: 0.3187
Non standard normal curve areas
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 16, find the probability that a randomly chosen person will have an IQ greater than 84. Answer: approximately .84
IQ scores example
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 16, find the probability that a person’s IQ is between 85 and 95.
1. Draw diagram
2. Shade appropriate area
3. Standardize variable x using $z = \frac{x - \mu}{\sigma}$
4. Find $P\left(\frac{x_1 - \mu}{\sigma} < z < \frac{x_2 - \mu}{\sigma}\right)$
5. Answer: 0.2031
Areas under non-standard normal curves
The lengths of a certain snake are normally distributed with a mean of 73 inches and a standard deviation of 6.5 inches. Find the following probability. Let x represent the length of a particular snake.
P(65 < x < 75). Answer: 0.5116
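The standardize-then-look-up procedure used in the last several examples can be sketched as one function. The helper names `phi` and `normal_prob` are illustrative, and `math.inf` stands in for an interval with no upper or lower limit.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a < x < b) for a normal variable x with mean mu and standard deviation sigma."""
    z_a = (a - mu) / sigma   # standardize the lower limit
    z_b = (b - mu) / sigma   # standardize the upper limit
    return phi(z_b) - phi(z_a)

examples = [
    ("IQ greater than 120",        120, math.inf, 100, 16),
    ("speed between 55 and 60",    55, 60, 61.3, 3.3),
    ("IQ greater than 84",         84, math.inf, 100, 16),
    ("IQ between 85 and 95",       85, 95, 100, 16),
    ("snake length 65 to 75 in.",  65, 75, 73, 6.5),
]

for label, a, b, mu, sigma in examples:
    print(f"{label}: {normal_prob(a, b, mu, sigma):.4f}")
```

Small differences in the last decimal place compared with table answers can occur because a table requires z to be rounded to two decimal places first.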
Mathematical Equation for bell-shaped curves
Carl Friedrich Gauss, a mathematician, was probably the first to realize that certain data had bell-shaped distributions. He determined that the following equation could be used to describe these distributions:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
where $\mu$ and $\sigma$ are the mean and standard deviation of the data.
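To see how the equation produces the bell shape, the sketch below evaluates it for the standard normal curve (mean 0, standard deviation 1) and approximates the total area under the curve with thin rectangles. The function name `normal_pdf`, the interval from -4 to 4 and the rectangle width of 0.001 are choices made here for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Height of the standard normal curve at a few points.
for x in (-2, -1, 0, 1, 2):
    print(f"f({x:2d}) = {normal_pdf(x):.4f}")

# Approximate the area under the curve from -4 to 4 with thin rectangles.
width = 0.001
area = sum(normal_pdf(-4 + i * width) * width for i in range(8000))
print(f"area from -4 to 4 is about {area:.4f}")
```

The curve is highest at the mean, symmetric on either side of it, and almost all of the area lies within a few standard deviations of the mean.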
Using the Normal Curve to approximate binomial probabilities
Example: Binomial Distribution for n = 20 and p = 0.5 (A coin is tossed 20 times and the probability of x = 0, 1, 2, 3, …, 20 is calculated. Each vertical bar represents one outcome of x.)
We have seen that the histogram for a binomial distribution with n = 20 trials and p = 0.50 was shaped like a bell if we join the tops of the rectangles with a smooth curve.
If we wanted to find the probability that x (the number of heads) is 12 or more, we would have to use the binomial probability formula and calculate P(x = 12) + P(x = 13) + P(x = 14) + … + P(x = 20). The calculations would be very tedious, to say the least.
Using the Normal curve to approximate binomial probabilities
We could, instead, treat the binomial distribution as a normal curve, since its shape is pretty close to being a bell-shaped curve, and then find the probability that x is 12 or more using the procedure for finding areas under a normal curve.
Prob(x ≥ 12) ≈ P(x > 11.5) = total area in yellow
Because the normal curve is continuous and the binomial distribution is discrete (x = 0, 1, 2, …, 20), we have to make what is called a correction for continuity. Since we want P(x ≥ 12), we must include the rectangular area corresponding to x = 12. The base of this rectangle starts at 11.5 and ends at 12.5. Therefore, we must find P(x > 11.5).
The rectangle representing the prob(x = 12) extends from 11.5 to 12.5 on the horizontal axis.
Solution:
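Using the procedure for finding area under a non-standard normal curve, with mean np = 20(0.5) = 10 and standard deviation $\sqrt{np(1 - p)} = \sqrt{5} \approx 2.24$, we have the following result:
$P(x > 11.5) = P\left(z > \frac{11.5 - 10}{2.24}\right) = P(z > 0.67) \approx 0.25$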
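The quality of the approximation can be checked by computing the "tedious" binomial sum exactly and comparing it with the normal-curve area to the right of 11.5. This is a sketch; the variable names are illustrative, and the normal area is computed with `math.erf` rather than a table.

```python
from math import comb, erf, sqrt

n, p = 20, 0.5

# Exact binomial probability: P(x = 12) + P(x = 13) + ... + P(x = 20).
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(12, n + 1))

# Normal approximation with the correction for continuity.
mu = n * p                      # mean of the binomial distribution
sigma = sqrt(n * p * (1 - p))   # standard deviation of the binomial distribution
z = (11.5 - mu) / sigma         # standardize the corrected boundary 11.5
approx = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # area to the right of z

print(f"exact binomial probability: {exact:.4f}")
print(f"normal approximation:       {approx:.4f}")
```

The two values agree to about two decimal places, which is why the approximation is a practical shortcut when n is large.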