8.1 graphing data - governors state university · 2009. 1. 5. · 8.1 graphing data in this...


8.1 Graphing Data

In this chapter, we will study techniques for graphing data. We will see the importance of visually displaying large sets of data so that meaningful interpretations of the data can be made.


Bar Graphs

Bar graphs are used to represent data that can be classified into categories. The height of each bar represents the frequency of its category. For ease of reading, there is a space between adjacent bars. The bar graph displayed below represents how consumers obtain their information when purchasing a new or used automobile. There are four categories (consumer guide, dealership, word of mouth, and the internet). The graph illustrates that the category most used by consumers is the Consumer Guide.

[Bar graph: frequency (vertical axis, 0 to 200) for the categories Consumer Guide, Dealership, Word of Mouth, and Internet]


Broken line graph: This graph is obtained from a bar graph by connecting the midpoints of the tops of consecutive bars with straight lines.

[Broken-line graph: the same four categories (Consumer Guide, Dealership, Word of Mouth, Internet); frequency axis 0 to 200]


A pie graph is used to show how a whole is divided among several categories. The amount in each category is expressed as a percentage of the whole. The percentage, written as a decimal, is multiplied by 360 to determine the number of degrees in the central angle of that category's sector of the pie graph.

[Pie graph: Source of Information. Consumer Guide 52%, Dealership 28%, Word of Mouth 12%, Internet 8%]
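The central-angle rule above can be sketched in a few lines of Python. This is only an illustration; it assumes the four percentages pair with the categories in the order the pie graph lists them, and the function name `central_angles` is hypothetical.

```python
# Whole-number percentages from the "Source of Information" pie graph
# (assumed to pair with the categories in the order listed).
shares = {"Consumer Guide": 52, "Dealership": 28, "Word of Mouth": 12, "Internet": 8}

def central_angles(percentages):
    """Central angle in degrees = (percentage / 100) * 360."""
    return {name: pct * 360 / 100 for name, pct in percentages.items()}

angles = central_angles(shares)
for name, degrees in angles.items():
    print(f"{name:<15} {degrees:6.1f} degrees")
```

The angles come out to 187.2, 100.8, 43.2, and 28.8 degrees, which add up to the full 360 degrees of the circle.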


A frequency distribution is used to organize a large set of numerical data into classes. A frequency table consists of 5 to 20 classes of equal width, along with the frequency of each class. Here is an example:

Rounds of golf played by golfers

Class       Frequency
[0, 7)      0
[7, 14)     2
[14, 21)    10
[21, 28)    21
[28, 35)    23
[35, 42)    14
[42, 49)    5

This table has seven classes. The notation [0, 7) includes all numbers that are greater than or equal to 0 and less than 7. The class with the highest frequency is [28, 35), with a class frequency of 23.
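The raw observations behind the golf table are not given, so the following Python sketch tallies a small hypothetical data set into the same seven half-open classes of width 7. The function name `frequency_distribution` and the sample values are illustrative assumptions, not data from the slides.

```python
def frequency_distribution(data, start, width, num_classes):
    """Tally data into half-open classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for value in data:
        k = int((value - start) // width)  # index of the class containing value
        if not 0 <= k < num_classes:
            raise ValueError(f"{value} lies outside every class")
        counts[k] += 1
    classes = [(start + k * width, start + (k + 1) * width) for k in range(num_classes)]
    return list(zip(classes, counts))

# Hypothetical rounds-of-golf values (not the 75 observations behind the table).
sample = [12, 19, 23, 25, 27, 29, 30, 31, 33, 36, 38, 44]
result = frequency_distribution(sample, start=0, width=7, num_classes=7)
for (lo, hi), f in result:
    print(f"[{lo}, {hi})  {f}")
```

Because each class includes its lower limit and excludes its upper limit, a value such as 28 falls in [28, 35), never in [21, 28).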


A relative frequency distribution is constructed by dividing the frequency of each class by the total frequency and expressing the result as a percentage. A new table is then constructed using the classes and their corresponding relative frequencies:

Relative Frequency Distribution

The total number of observations is 75. The third column of percentages is found by dividing each number in the second column by 75 and expressing the result as a percentage.

Class       Frequency   Relative frequency
[0, 7)      0           0.00%
[7, 14)     2           2.67%
[14, 21)    10          13.33%
[21, 28)    21          28.00%
[28, 35)    23          30.67%
[35, 42)    14          18.67%
[42, 49)    5           6.67%
Total       75          100%
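A short Python sketch of the same computation, using the class frequencies from the table above:

```python
classes = ["[0, 7)", "[7, 14)", "[14, 21)", "[21, 28)", "[28, 35)", "[35, 42)", "[42, 49)"]
freqs = [0, 2, 10, 21, 23, 14, 5]

total = sum(freqs)                           # 75 observations
relative = [100 * f / total for f in freqs]  # each frequency as a percentage of the total

for c, f, r in zip(classes, freqs, relative):
    print(f"{c:9s} {f:3d} {r:6.2f}%")
```

The unrounded percentages add up to exactly 100; the rounded values in the table can differ from 100 by a hundredth or so.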


A histogram is similar to a vertical bar graph with the exception that there are no spaces between the bars and the horizontal axis always consists of numerical values. We will represent the frequency distribution of the previous slides with a histogram:

The histogram shows a roughly symmetric, mound-shaped distribution, with the most frequent classes in the middle, between 21 and 35 rounds of golf.

[Histogram: Rounds of Golf; horizontal axis: Bin (class boundaries 0, 7, 14, 21, 28, 35, 42); vertical axis: Frequency]
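As a rough stand-in for the chart, the Python sketch below prints a sideways text histogram of the golf frequencies: one row per class, with bar length equal to the class frequency. The helper name `text_histogram` is hypothetical; the slides themselves produce the chart in Excel.

```python
def text_histogram(edges, freqs, symbol="#"):
    """Return one row per class [edges[i], edges[i+1]); bar length equals its frequency."""
    rows = []
    for lo, hi, f in zip(edges, edges[1:], freqs):
        rows.append(f"[{lo:2d}, {hi:2d}) | {symbol * f} {f}")
    return "\n".join(rows)

edges = [0, 7, 14, 21, 28, 35, 42, 49]  # class boundaries
freqs = [0, 2, 10, 21, 23, 14, 5]
print(text_histogram(edges, freqs))
```

Adjacent rows share a boundary, mirroring how histogram bars touch with no gaps between them.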


A frequency polygon is constructed from a histogram by connecting the midpoints of the tops of consecutive bars with line segments. This is also called a broken-line graph.

[Frequency polygon: Rounds of Golf; horizontal axis: Bin (class boundaries 0, 7, 14, 21, 28, 35, 42); vertical axis: Frequency]


8.2 Measures of Central Tendency

In this section, we will study three measures of central tendency: the mean, the median, and the mode. Each of these values describes the “center” or middle of a set of data.


Measures of Center

Mean: the most common measure of center. It is the sum of the numbers divided by how many numbers there are.

Notation: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Example: The salaries of 5 employees (in thousands of dollars) are 14, 17, 21, 18, 15.

Find the mean: Sum = 14 + 17 + 21 + 18 + 15 = 85. Dividing 85 by 5 gives 17. Thus, the mean salary is 17,000 dollars.
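The definition of the mean translates directly into Python. This is a minimal sketch; the function name `mean` is our own.

```python
def mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

salaries = [14, 17, 21, 18, 15]  # in thousands of dollars
print(mean(salaries))            # 17.0
```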


The Mean as Center of Gravity

We will represent each data value on a “teeter-totter”. The teeter-totter serves as a number line.

You can think of each point's deviation from the mean as the influence the point exerts on the tilt of the teeter-totter. Positive deviations push down on the right side; negative deviations push down on the left side. The farther a point is from the fulcrum, the more influence it has.

Note that the deviations of the scores from the mean always sum to zero. That is why the teeter-totter is in balance when the fulcrum is at the mean. This makes the mean the center of gravity of all the data points.


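Data balances at 17. The sum of the deviations from the mean equals zero: (-3) + (-2) + 0 + 1 + 4 = 0.

[Figure: the data values 14, 15, 17, 18, 21 on a teeter-totter number line with the fulcrum at 17; deviations marked from -3 to 4]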


To find the mean for grouped data, find the midpoint of each class by adding the lower class limit to the upper class limit and dividing by 2; for example, (0 + 7)/2 = 3.5. Multiply each midpoint x by the frequency f of its class, find the sum of the products x·f, and divide this sum by the total frequency.

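Class       Midpoint x   Frequency f   x·f
[0, 7)      3.5          0             0
[7, 14)     10.5         2             21
[14, 21)    17.5         10            175
[21, 28)    24.5         21            514.5
[28, 35)    31.5         23            724.5
[35, 42)    38.5         14            539
[42, 49)    45.5         5             227.5
Total                    75            2201.5

$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{2201.5}{75} \approx 29.35333$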
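The grouped-data procedure can be sketched in Python as follows. The data are the golf classes from the table; the function name `grouped_mean` is hypothetical.

```python
# (lower limit, upper limit, frequency) for the rounds-of-golf table
classes = [(0, 7, 0), (7, 14, 2), (14, 21, 10), (21, 28, 21),
           (28, 35, 23), (35, 42, 14), (42, 49, 5)]

def grouped_mean(classes):
    """Approximate the mean of grouped data using class midpoints."""
    total_xf = 0.0
    total_f = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2  # e.g. (0 + 7) / 2 = 3.5
        total_xf += midpoint * f        # running sum of x * f
        total_f += f                    # running total frequency
    return total_xf / total_f, total_xf, total_f

m, sum_xf, n = grouped_mean(classes)
print(sum_xf, n, round(m, 5))  # 2201.5 75 29.35333
```

The result is an approximation, because every observation in a class is treated as if it sat at the class midpoint.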


Median

The mean is not always the best measure of central tendency, especially when the data have one or more “outliers” (numbers that are unusually large or unusually small and not representative of the data as a whole).

Definition: the median of a data set is the number that divides the bottom 50% of the data from the top 50% of the data.

To obtain the median:
1. Arrange the data in ascending order.
2. Determine the location of the median. This is done by adding one to n, the total number of scores, and dividing this number by 2:

Position of the median = $\frac{n + 1}{2}$


Median example

Find the median of the following data set: 14, 17, 21, 18, 15
1. Arrange the data in order: 14, 15, 17, 18, 21
2. Determine the location of the median: $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$.
3. Count from the left until you reach the number in the third position (17).
4. The value of the median is 17.


Median example 2: This example illustrates the case when the number of observations is even. In this case, the value of the median need not be one of the original pieces of data.

Determine the median of the data: 14, 15, 17, 19, 23, 25
The data are already arranged in order. The position of the median of n data values is $\frac{n + 1}{2}$. In this example, n = 6, so the position of the median is (6 + 1)/2 = 3.5. Take the average of the 3rd and 4th data values: (17 + 19)/2 = 18. Thus, the median is 18.
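The (n + 1)/2 position rule covers both the odd and even cases above. Here is a minimal Python sketch of it; the function name `median` is our own, and positions are counted from 1 as on the slides.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule; positions are 1-based."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    data = sorted(values)              # step 1: ascending order
    n = len(data)
    position = (n + 1) / 2             # step 2: location of the median
    if position.is_integer():
        return data[int(position) - 1]  # odd n: a single middle value
    lower = data[int(position) - 1]     # even n: average the two neighbors
    upper = data[int(position)]
    return (lower + upper) / 2

print(median([14, 17, 21, 18, 15]))      # 17
print(median([14, 15, 17, 19, 23, 25]))  # 18.0
```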


Which is better? Median or Mean?

The yearly salaries of 5 employees of a small company are 19, 23, 25, 26, and 57 (in thousands of dollars).

1. Find the mean salary. (30)

2. Find the median salary. (25)

3. Which measure is more appropriate, and why?

Answer: The median is better, since the mean is pulled upward (affected) by the outlier 57.
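Python's standard `statistics` module can confirm both answers. The sketch also shows what happens when the outlier is dropped: the mean falls from 30 to 23.25, but the median only moves from 25 to 24.

```python
import statistics

salaries = [19, 23, 25, 26, 57]  # thousands of dollars; 57 is the outlier

mean_all = statistics.mean(salaries)      # 30
median_all = statistics.median(salaries)  # 25

# Drop the outlier: the mean shifts far more than the median does.
trimmed = salaries[:-1]
mean_trimmed = statistics.mean(trimmed)      # 23.25
median_trimmed = statistics.median(trimmed)  # 24.0 (average of 23 and 25)
print(mean_all, median_all, mean_trimmed, median_trimmed)
```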


Properties of the mean

1. The mean takes into account all values.
2. The mean is sensitive to extreme values (outliers).
3. The mean is called a non-resistant measure of central tendency, since it is affected by extreme values. (The median, by contrast, is resistant.)
4. Population mean: the mean of all values of the population.
5. Sample mean: the mean of the sample data.
6. The mean of a representative sample tends to best estimate the mean of the population (for repeated sampling).


Properties of the median

1. Not sensitive to extreme values; resistant measure of central tendency

2. Takes into account only the middle value of a data set or the average of the two middle values.

3. Should be used for data sets that have outliers, such as personal incomes or prices of homes in a city.


Mode

Definition: the mode is the most frequently occurring value in a data set. To obtain the mode, find the frequency of occurrence of each value and then note the value that has the greatest frequency. If the greatest frequency is 1, the data set has no mode. If two values occur with the same greatest frequency, we say the data set is bimodal.


Example of mode

Ex. 1: Find the mode of the following data set: 45, 47, 68, 70, 72, 72, 73, 75, 98, 100
Answer: The mode is 72.

Ex. 2: The mode can be used to find the category with the greatest frequency in qualitative data.

Shorts are classified as small, medium, large, and extra large. A store has on hand 12 small, 15 medium, 17 large, and 8 extra large pairs of shorts. Find the mode.
Solution: The mode is large. This is the modal class (the class with the greatest frequency). It would not make sense to find the mean or median for nominal data.
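A small Python sketch of the mode rule, following the convention above: no mode when every value occurs once, and every tied value returned when the data are bimodal. The function name `modes` is hypothetical; it uses `collections.Counter` from the standard library.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the greatest frequency ([] if no value repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # greatest frequency 1: no mode
        return []
    return [v for v, c in counts.items() if c == top]

print(modes([45, 47, 68, 70, 72, 72, 73, 75, 98, 100]))  # [72]
print(modes([1, 1, 2, 2, 3]))                            # [1, 2] (bimodal)

# Qualitative data: the modal class of the shorts inventory.
shorts = {"small": 12, "medium": 15, "large": 17, "extra large": 8}
print(max(shorts, key=shorts.get))                       # large
```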


8.3 Measures of Dispersion

In this section, you will study measures of the variability of data. In addition to finding measures of central tendency for data, it is also necessary to determine how “spread out” the data are. Two measures of variability of data are the range and the standard deviation.


Measures of variation

Example 1. Heights of the 5 starting players on two basketball teams:
A: 72, 73, 76, 76, 78
B: 67, 72, 76, 76, 84

Verify that the two teams have the same mean height, the same median, and the same mode.


Measures of Variation

Ex. 1 continued. To describe the difference between the two data sets, we use a descriptive measure that indicates the amount of spread, or dispersion, in a data set.

Range: the difference between the maximum and minimum values of the data set.


Measures of Variation

Range of team A: 78 - 72 = 6
Range of team B: 84 - 67 = 17
Advantage of the range: it is easy to compute.
Disadvantage: only two values are considered.
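The team comparison can be checked with Python's `statistics` module. The sketch confirms that both teams share the same mean, median, and mode, while their ranges differ.

```python
import statistics

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

for name, heights in (("A", team_a), ("B", team_b)):
    print(name,
          statistics.mean(heights),     # 75 for both teams
          statistics.median(heights),   # 76 for both teams
          statistics.mode(heights),     # 76 for both teams
          max(heights) - min(heights))  # range: 6 for A, 17 for B
```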


Unlike the range, the sample standard deviation takes into account all data values. The following procedure is used to find the sample standard deviation (illustrated with the heights of team A):

Step 1: Find the mean of the data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{72 + 73 + 76 + 76 + 78}{5} = 75$


Step 2: Find the deviation of each score from the mean

x      $x - \bar{x}$
72     72 - 75 = -3
73     73 - 75 = -2
76     76 - 75 = 1
76     76 - 75 = 1
78     78 - 75 = 3

Note that the sum of the deviations is 0: $\sum (x - \bar{x}) = 0$


The sum of the deviations from the mean will always be zero. This can be used as a check to determine whether your calculations are correct.

Note that $\sum (x - \bar{x}) = 0$.


Step 3: Square each deviation from the mean. Find the sum of the squared deviations.

Height   Deviation   Squared deviation
72       -3          9
73       -2          4
76       1           1
76       1           1
78       3           9

$\sum_{i=1}^{n} (x_i - \bar{x})^2 = 24$


Step 4: The sample variance is determined by dividing the sum of the squared deviations by (n - 1), the number of scores minus one.

Note that the sum of the squared deviations is 24.

The sample variance is

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{24}{5 - 1} = 6$


The four steps can be combined into one mathematical formula for the sample standard deviation. The sample standard deviation is the square root of the quotient of the sum of the squared deviations and (n - 1).

Sample Standard Deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{6} \approx 2.449$


Four step procedure to calculate sample standard deviation:

1. Find the mean of the data

2. Set up a table that lists the data in the left-hand column and the deviations from the mean in the next column.

3. In the third column from the left, square each deviation and then find the sum of the squares of the deviations.

4. Divide the sum of the squared deviations by (n-1) and then take the positive square root of the result.
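The four-step procedure above can be sketched in Python as follows, using the team A heights. The function name `sample_std` is our own. It also returns the deviations so that the "sum to zero" check can be made.

```python
import math

def sample_std(values):
    """Sample standard deviation via the four-step procedure."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n                   # step 1: the mean
    deviations = [x - mean for x in values]  # step 2: deviations (these sum to zero)
    squared = [d * d for d in deviations]    # step 3: squared deviations
    variance = sum(squared) / (n - 1)        # step 4: divide by n - 1 ...
    return math.sqrt(variance), variance, deviations  # ... then take the square root

s, var, devs = sample_std([72, 73, 76, 76, 78])
print(devs)               # [-3.0, -2.0, 1.0, 1.0, 3.0]
print(var, round(s, 3))   # 6.0 2.449
```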


Problem for students:

By hand: Find the variance and standard deviation of the data: 5, 8, 9, 7, 6
Answer: The variance is 10/4 = 2.5, and the standard deviation is $\sqrt{2.5} \approx 1.581$.
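To check a hand calculation, Python's `statistics` module offers `variance` and `stdev`, which divide by n - 1 as on these slides. `pvariance` divides by n instead. It is shown here only for contrast.

```python
import statistics

data = [5, 8, 9, 7, 6]
var = statistics.variance(data)       # sample variance (divides by n - 1): 2.5
sd = statistics.stdev(data)           # square root of the sample variance: about 1.581
pop_var = statistics.pvariance(data)  # population variance (divides by n): 2.0
print(var, round(sd, 3), pop_var)
```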


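Standard deviation of grouped data:
1. Find each class midpoint.
2. Find the deviation of each midpoint from the mean.
3. Square each deviation and then multiply it by the class frequency.
4. Find the sum of these values and divide the result by (n - 1), one less than the total number of observations.
5. Take the square root of the result.

$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 \cdot f_i}{n - 1}}$

where k is the number of classes, $x_i$ and $f_i$ are the midpoint and frequency of class i, and n is the total number of observations.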


Here is the frequency distribution of the number of rounds of golf played by a group of golfers. The class midpoints are in the second column. The mean is 29.35. The third column is the square of the difference between the class midpoint and the mean, and the fourth column is the class frequency. The fifth column is the product of the frequency and the value in the third column. The last column, x·f, was used to find the mean. The final result is s ≈ 8.3758.

Class      Midpoint x   (x - mean)^2   Frequency f   (x - mean)^2 · f   x·f
[0, 7)     3.5          668.3948       0             0                  0
[7, 14)    10.5         355.4482       2             710.8964           21
[14, 21)   17.5         140.5015       10            1405.0151          175
[21, 28)   24.5         23.5548        21            494.6517           514.5
[28, 35)   31.5         4.6082         23            105.9881           724.5
[35, 42)   38.5         83.6615        14            1171.2612          539
[42, 49)   45.5         260.7148       5             1303.5742          227.5
Total                                  75            5191.3867          2201.5

Mean = 2201.5/75 ≈ 29.35333

$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 \cdot f_i}{n - 1}} = \sqrt{\frac{5191.3867}{75 - 1}} \approx 8.3758$
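The grouped-data standard deviation can be reproduced with the Python sketch below, which uses the golf table. The function name `grouped_sample_std` is hypothetical, and, as with the grouped mean, each class is represented by its midpoint.

```python
import math

# (lower limit, upper limit, frequency) for the rounds-of-golf table
classes = [(0, 7, 0), (7, 14, 2), (14, 21, 10), (21, 28, 21),
           (28, 35, 23), (35, 42, 14), (42, 49, 5)]

def grouped_sample_std(classes):
    """Sample standard deviation of grouped data, using class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi, _ in classes]  # step 1: class midpoints
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(mids, freqs)) / n
    # steps 2-3: squared deviation of each midpoint, weighted by its frequency
    ss = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs))
    # step 4: divide by n - 1; step 5: take the square root
    return math.sqrt(ss / (n - 1)), mean, ss

s, mean, ss = grouped_sample_std(classes)
print(round(mean, 5), round(ss, 3), round(s, 4))  # 29.35333 5191.387 8.3758
```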


Interpreting the standard deviation

1. The more variation in a data set, the greater the standard deviation.

2. The larger the standard deviation, the more “spread” in the shape of the histogram representing the data.

3. The standard deviation is used for quality control in business and industry. If there is too much variation in the manufacturing of a certain product, the process is out of control, and adjustments to the machinery must be made to ensure more uniformity in the production process.


Three standard deviations rule

“Almost all” the data will lie within 3 standard deviations of the mean. Mathematically, nearly 100% of the data will fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.


Empirical Rule

If a data set is “mound shaped” or “bell-shaped”, then:

1. Approximately 68% of the data lie within one standard deviation of the mean.

2. Approximately 95% of the data lie within 2 standard deviations of the mean.

3. About 99.7% of the data fall within 3 standard deviations of the mean.


In the bell-shaped curve, the yellow region is about 68% of the total area; it includes all data within one standard deviation of the mean. The yellow region plus the brown regions include about 95% of the total area; together they include all data within two standard deviations of the mean.


Example of Empirical Rule

The shape of the distribution of IQ scores is a mound shape with a mean of 100 and a standard deviation of 15.

A) What proportion of individuals have IQs ranging from 85 to 115? (about 68%)

B) Between 70 and 130? (about 95%)

C) between 55 and 145? (about 99.7%)
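The IQ intervals follow from mean ± k standard deviations. This Python sketch also compares the empirical-rule percentages with the exact share of a normal distribution within k standard deviations, which is erf(k/√2). Modeling the mound shape with a normal curve is an assumption of this sketch. The helper names are hypothetical; `math.erf` is in the standard library.

```python
import math

def within_k_sd(mean, sd, k):
    """The interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

def normal_share_within(k):
    """Exact share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    lo, hi = within_k_sd(100, 15, k)  # IQ: mean 100, standard deviation 15
    print(f"{lo:g} to {hi:g}: {normal_share_within(k):.4f}")
# 85 to 115: 0.6827
# 70 to 130: 0.9545
# 55 to 145: 0.9973
```

The rule's 68%, 95%, and 99.7% are these values rounded.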


Bernoulli Trials (source: http://www.math.wichita.edu/history/topics/probability.html#bern-trials)

Boy? Girl? Heads? Tails? Win? Lose? Do any of these sound familiar? When only two outcomes can occur during any single event, the event is called a Bernoulli Trial. Jakob Bernoulli, a profound mathematician of the late 1600s from a family of mathematicians, spent 20 years of his life studying probability. During this study, he arrived at an equation that calculates probability in a Bernoulli Trial. His proofs were published in his 1713 book Ars Conjectandi (Art of Conjecturing).


Jacob Bernoulli:

Hofmann sums up Jacob Bernoulli's contributions as follows: “Bernoulli greatly advanced algebra, the infinitesimal calculus, the calculus of variations, mechanics, the theory of series, and the theory of probability. He was self-willed, obstinate, aggressive, vindictive, beset by feelings of inferiority, and yet firmly convinced of his own abilities. With these characteristics, he necessarily had to collide with his similarly disposed brother. He nevertheless exerted the most lasting influence on the latter. Bernoulli was one of the most significant promoters of the formal methods of higher analysis. Astuteness and elegance are seldom found in his method of presentation and expression, but there is a maximum of integrity.”


What constitutes a Bernoulli Trial? (source: http://www.math.wichita.edu/history/topics/probability.html#bern-trials)

To be considered a Bernoulli trial, an experiment must meet each of three criteria:

1. There must be only 2 possible outcomes, such as black or red, or sweet or sour. One of these outcomes is called a success and the other a failure. Successes and failures are denoted S and F, though these terms do not mean that one outcome is more desirable than the other.

2. Each outcome has a fixed probability of occurring: a success has probability p, and a failure has probability 1 - p.

3. Each experiment and its result are completely independent of all others.


Some examples of Bernoulli Trials (source: http://en.wikipedia.org/wiki/Bernoulli_trial)

1. Flipping a coin. In this context, obverse ("heads") denotes success and reverse ("tails") denotes failure. A fair coin has the probability of success 0.5 by definition.
2. Rolling a die, where, for example, we designate a six as "success" and everything else as "failure".
3. In conducting a political opinion poll, choosing a voter at random to ascertain whether that voter will vote "yes" in an upcoming referendum.
4. Calling the birth of a baby of one sex "success" and of the other sex "failure". (Take your pick.)
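The die-rolling example can be simulated as repeated Bernoulli trials. This Python sketch assumes a fair die, so success ("a six") has p = 1/6. It uses a fixed random seed, chosen arbitrarily, so the run is repeatable. Over many independent trials, the observed share of successes settles near p.

```python
import random

def bernoulli_trial(p, rng):
    """One trial: True ("success") with probability p, otherwise False ("failure")."""
    return rng.random() < p

rng = random.Random(2009)  # arbitrary fixed seed so the run is repeatable
trials = 100_000
successes = sum(bernoulli_trial(1 / 6, rng) for _ in range(trials))  # a six = success
print(successes / trials)  # close to 1/6, about 0.167
```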


Introduction to Binomial Probability

A manager of a department store has determined that there is a probability of 0.30 that a particular customer will buy at least one product from his store. If three customers walk into the store, find the probability that two of the three customers will buy at least one product.

1. Determine which two will buy at least one product.

The outcomes are b b b’ (the first two buy and the third does not buy), b b’ b, or b’ b b. There are three possible outcomes, each consisting of two b’s along with one “not b” (b’). Considering “buy” as a success, the probability of success is 0.30. Each customer is independent of the others, and there are two possible outcomes for each: success (buy) or failure (not buy).


Introduction to Binomial probability

Since the trials are independent, we can use the probability rule for independent events: P(A and B and C) = P(A)·P(B)·P(C). For the outcome b b b’, the probability is P(b b b’) = P(b)P(b)P(b’) = 0.30(0.30)(0.70). For the other two outcomes, the probability is the same; for example, P(b b’ b) = 0.30(0.70)(0.30). Since the order in which the customers buy or do not buy is not important, we can use the formula for combinations to determine the number of subsets of size 2 that can be obtained from a set of 3 elements. This corresponds to the number of ways two “buying” customers can be selected from a set of three customers: C(3, 2) = 3. For each of these three combinations, the probability is the same:

$0.30^2 \cdot 0.70^1 = 0.063$


Thus, we have the following formula to compute the probability that two out of three customers will buy at least one product:

$C(3, 2) \cdot 0.30^2 \cdot 0.70^1$

This turns out to be 0.189. Using the results of this problem, we can generalize the result. Suppose you have n customers and you wish to calculate the probability that x out of the n customers will buy at least one product. Let p represent the probability that a given customer will buy at least one product. Then (1 - p) is the probability that a given customer will not buy a product, and

$P(x) = C(n, x) \cdot p^x (1 - p)^{n - x}$
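The general formula can be written as a small Python function using `math.comb` (Python 3.8 or later). The name `binomial_pmf` is our own. As a cross-check, the sketch also enumerates all 2³ buy/not-buy outcomes for the three customers and adds up the probabilities of those with exactly two buyers.

```python
from itertools import product
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Two of three customers buy at least one product, with p = 0.30.
formula = binomial_pmf(2, 3, 0.30)

# Brute-force check: multiply the independent probabilities of each outcome.
total = 0.0
for outcome in product([True, False], repeat=3):  # True = buys
    if sum(outcome) == 2:
        prob = 1.0
        for buys in outcome:
            prob *= 0.30 if buys else 0.70
        total += prob

print(round(formula, 3), round(total, 3))  # 0.189 0.189
```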


Binomial Probability Formula

The binomial distribution gives the discrete probability distribution of obtaining exactly n successes out of N Bernoulli trials (where the result of each Bernoulli trial is true with probability p and false with probability 1 - p). The binomial distribution is therefore given by

$P_p(n \mid N) = \binom{N}{n} p^n (1 - p)^{N - n}$   (1)

$= \frac{N!}{n!\,(N - n)!} p^n (1 - p)^{N - n}$   (2)

where $\binom{N}{n}$ is a binomial coefficient. The plot on the next slide shows the distribution of n successes out of N = 20 trials with p = 0.5.


[Figure: Plot of binomial probabilities with n = 20 trials, p = 0.5]


Finding a binomial probability

Assumptions:
1. There are n identical trials.
2. Two outcomes, success or failure, are possible for each trial.
3. The trials are independent.
4. The probability of success, p, remains constant on each trial.

Step 1: Identify a success.
Step 2: Determine p, the success probability.
Step 3: Determine n, the number of trials.
Step 4: The binomial probability formula for the number of successes, x, is

$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$


Example

Studies show that 60% of US families use physical aggression to resolve conflict. If 10 families are selected at random, find the probability that the number that use physical aggression to resolve conflict is:

a) exactly 5
b) between 5 and 7, inclusive
c) over 80% of those surveyed
d) fewer than nine

Solution: $P(X = 5) = \binom{10}{5} (0.6)^5 (1 - 0.6)^{10 - 5} \approx 0.201$


Example continued

P(between 5 and 7, inclusive) = P(5) + P(6) + P(7)

$= \binom{10}{5}(0.6)^5(0.4)^5 + \binom{10}{6}(0.6)^6(0.4)^4 + \binom{10}{7}(0.6)^7(0.4)^3$
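All four parts of the families example can be computed from one table of binomial probabilities, as in this Python sketch. We read "over 80% of those surveyed" as more than 8 of the 10 families, that is, 9 or 10. That reading is our assumption.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.60
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]  # P(X = 0), ..., P(X = 10)

exactly_5 = pmf[5]
between_5_and_7 = sum(pmf[5:8])  # x = 5, 6, 7
over_80_percent = sum(pmf[9:])   # assumed: more than 8 of 10, i.e. x = 9 or 10
fewer_than_nine = sum(pmf[:9])   # x = 0, 1, ..., 8

print(round(exactly_5, 3), round(between_5_and_7, 3),
      round(over_80_percent, 3), round(fewer_than_nine, 3))
# 0.201 0.666 0.046 0.954
```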


Mean of a Binomial distribution

Mean = np

To find the mean of a binomial distribution, multiply the number of trials, n, by the success probability of each trial, p.

(Note: This formula can only be used for the binomial distribution and not for probability distributions in general.)
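For the families example (n = 10, p = 0.60), this Python sketch checks that the shortcut np agrees with the general definition of a mean of a discrete distribution: the sum of x · P(X = x) over all x. The helper names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_by_definition(n, p):
    """Sum of x * P(X = x) over every possible number of successes."""
    return sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

n, p = 10, 0.60  # the families example
print(n * p, round(mean_by_definition(n, p), 10))  # 6.0 6.0
```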


Example

A large university has determined from past records that the probability that a student who registers for fall classes will have his or her schedule rejected (due to overfilled classrooms, clerical error, etc.) is 0.25.

Find the probability that, in a sample of 19 students, exactly 8 will have their schedules rejected.
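Following the four steps (success = a rejected schedule, p = 0.25, n = 19, x = 8), the probability can be evaluated in Python as a check on a hand calculation:

```python
from math import comb

n, p, x = 19, 0.25, 8  # 19 students, rejection probability 0.25, exactly 8 rejected
prob = comb(n, x) * p**x * (1 - p) ** (n - x)
print(round(prob, 4))  # 0.0487
```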


Example

Suppose 15% of major league baseball players are left-handed. In a sample of 12 major league baseball players, find the probability that:

(a) none are left-handed: $P(X = 0) = 0.85^{12} \approx 0.1422$

(b) at most six are left-handed. Find the probabilities of 0, 1, 2, 3, 4, 5, and 6 and then add them: 0.1422 + 0.30122 + 0.29236 + 0.17198 + 0.06828 + 0.01928 + 0.00397 ≈ 0.9993
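Both parts of the baseball example can be computed in Python. Part (b) is a cumulative probability, so it sums the binomial probabilities for x = 0 through 6.

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 12, 0.15
none_left_handed = binomial_pmf(0, n, p)                     # (a)
at_most_six = sum(binomial_pmf(x, n, p) for x in range(7))   # (b): x = 0, 1, ..., 6
print(round(none_left_handed, 4), round(at_most_six, 4))     # 0.1422 0.9993
```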


Another example

A basketball player shoots 10 free throws. The probability of success on each shot is 0.90.
1) Is this a binomial experiment? Why?
2) Create the probability distribution of x, the number of shots made out of 10.

Use Excel to compute the probabilities and draw the histogram of the results.
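If Excel is not at hand, a short Python sketch can build the same table and a crude text histogram. It assumes the 10 shots are independent with p = 0.90 on each, which is the condition that would make this a binomial experiment:

```python
from math import comb

n, p = 10, 0.90  # free throws, probability of making each shot (shots assumed independent)

# Probability distribution of x = number of shots made
distribution = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Print the table and a text histogram (one '#' per 0.01 of probability)
for x, prob in distribution.items():
    print(f"{x:2d}  {prob:.4f}  {'#' * round(prob * 100)}")

print("total probability:", round(sum(distribution.values()), 6))
```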


Standard deviation of the binomial distribution

To find the standard deviation of the binomial distribution, multiply the number of trials, n, by the success probability, p, multiply the result by (1 - p), and then take the square root of the result:

$$\sigma = \sqrt{np(1-p)}$$
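A Python check that the shortcut matches the general definition of standard deviation, the square root of the sum of (x - mean)²·P(x). The values n = 10 and p = 0.90 are borrowed from the free-throw example for illustration:

```python
from math import comb, sqrt

n, p = 10, 0.90

pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))

# General definition: sigma = sqrt(sum of (x - mean)^2 * P(x))
sigma_from_definition = sqrt(sum((x - mean) ** 2 * px for x, px in enumerate(pmf)))

# Binomial shortcut: sigma = sqrt(n * p * (1 - p))
sigma_shortcut = sqrt(n * p * (1 - p))

print(round(sigma_from_definition, 4), round(sigma_shortcut, 4))
```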


Use Excel to determine a binomial probability distribution

1. Use Excel to create the binomial distribution of x, the number of heads that appear when 25 coins are tossed. In column 1, display the values of x: 0, 1, 2, 3, …, 25. In column 2, display P(X = x).

2. Create the histogram of the probability distribution of x. Note the shape of the histogram. (It should resemble a normal distribution.)
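In Excel, column 2 can be filled with the built-in BINOM.DIST function, e.g. =BINOM.DIST(A2, 25, 0.5, FALSE) when x sits in cell A2 (that cell layout is an assumption). An equivalent Python sketch, assuming fair coins (p = 0.5), prints the two columns with a sideways text histogram in which the bell shape is visible:

```python
from math import comb

n, p = 25, 0.5  # 25 fair coins

# Column 1: x = 0..25; column 2: P(X = x)
table = [(x, comb(n, x) * p**x * (1 - p) ** (n - x)) for x in range(n + 1)]

# Text histogram: bar length proportional to P(X = x)
for x, prob in table:
    print(f"{x:2d}  {prob:.5f}  {'#' * round(prob * 200)}")
```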


8.5 Normal Distributions

We have seen that the histogram for a binomial distribution with n = 20 trials and p = 0.50 is shaped like a bell if we join the tops of the rectangles with a smooth curve.

Many real-world data sets, such as IQ scores, weights and heights of individuals, and test scores, have histograms with a symmetric bell shape. We call such distributions normal distributions. They are the focus of this section.


http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/De_Moivre.html

Three mathematicians contributed to the mathematical foundation of this curve: Abraham De Moivre, Pierre-Simon Laplace and Carl Friedrich Gauss.

De Moivre pioneered the development of analytic geometry and the theory of probability. He published The Doctrine of Chances in 1718. The definition of statistical independence appears in this book, together with many problems involving dice and other games. He also investigated mortality statistics and laid the foundation of the theory of annuities.



Laplace also systematized and elaborated probability theory in "Essai Philosophique sur les Probabilités" (Philosophical Essay on Probability, 1814). He was the first to publish the value of the Gaussian integral.
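For reference, the value of the Gaussian integral is given below. Substituting u = (x - µ)/(σ√2) turns it into the statement that the total area under any normal curve equals 1:

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$$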


Bell shaped curves

Many frequency distributions have a symmetric, bell-shaped histogram. Example 1: The frequency distribution of the heights of males is symmetric about a mean of 69.5 inches. Example 2: IQ scores are symmetrically distributed about a mean of 100, with a standard deviation of 15 or 16 depending on the test. The frequency distribution of IQ scores is bell-shaped. Example 3: SAT test scores have a bell-shaped, symmetric distribution.


Graph of a generic normal distribution

[Figure: bell-shaped normal curve plotted for x from -4 to 4; vertical axis from 0 to 0.5]


Values on the x axis represent the number of standard deviation units a particular data value is from the mean. Values on the y axis give the height of the curve (the probability density) of the random variable x; probabilities themselves correspond to areas under the curve.

[Figure: the same normal curve, with the x axis marked from -4 to 4 standard deviation units]


Area under the Normal Curve

1. Normal distribution: a smoothed-out histogram.
2. P(a < x < b), the probability that the random variable x is between a and b, is the area under the normal curve between x = a and x = b.
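To make point 2 concrete, here is a minimal Python sketch that approximates such an area directly. It uses Simpson's rule (a numerical-integration technique chosen for this illustration, not one used in these slides) on the standard normal curve, discussed later in this section, between z = 0 and z = 1.2:

```python
from math import exp, pi, sqrt

def std_normal_pdf(z: float) -> float:
    """Height of the standard normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def area_simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the area under f between a and b with Simpson's rule (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# P(0 < z < 1.2) as an area under the standard normal curve
print(round(area_simpson(std_normal_pdf, 0.0, 1.2), 4))
```

The result should agree with the table value for P(0 < z < 1.2) quoted later in this section.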


Properties of Normal distributions

1. Symmetric about its mean, µ.
2. Approaches, but never touches, the horizontal axis as x gets very large (or very small).
3. Almost all observations lie within 3 standard deviations of the mean.
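Property 3 can be quantified. A Python sketch using the standard library's `math.erf`, with the identity P(|z| < k) = erf(k/√2) for the standard normal variable z, gives the share of observations within 1, 2 and 3 standard deviations of the mean (roughly 68%, 95% and 99.7%):

```python
from math import erf, sqrt

def within_k_sd(k: float) -> float:
    """P(mu - k*sigma < x < mu + k*sigma) for any normal distribution.

    Uses the identity P(|z| < k) = erf(k / sqrt(2)).
    """
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```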


Area under normal curve

Example: A midwestern college has an enrollment of 3264 female students whose mean height is 64.4 inches, with a standard deviation of 2.4 inches. By constructing a relative frequency distribution with class boundaries of 56, 57, 58, …, 74, we find that the frequency distribution resembles a symmetric, bell-shaped distribution.


[Figure: Heights of Females at a College. The relative frequency histogram (class width = 1 inch) is smoothed out to form a normal, bell-shaped curve.]


Normal curve areas

Key fact: For a normally distributed variable, the percentage of all possible observations that lie within any specified range equals the corresponding area under its associated normal curve, expressed as a percentage. For a variable that is only approximately normally distributed, this holds approximately.


The area of the red portion of the graph equals P(66 < x < 68), the probability that a female student chosen at random from all female students at the college has a height between 66 and 68 inches.
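The slides do not give a value for this area. Assuming the heights are exactly normal with µ = 64.4 and σ = 2.4 (a model, so the actual share of students may differ slightly), a Python sketch computes it from the normal cumulative distribution, written here with `math.erf`; `normal_cdf` is a hypothetical helper name:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 64.4, 2.4  # mean and SD of female heights (inches) from the example
p_66_to_68 = normal_cdf(68, mu, sigma) - normal_cdf(66, mu, sigma)
print(f"P(66 < x < 68) = {p_66_to_68:.4f}")
```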


Finding areas under a normal, bell-shaped curve

The problem with finding the area under a normal curve between x = a and x = b (and thus the probability P(a < x < b) that x is between a and b) is that calculus is needed. However, we can avoid doing the calculus ourselves by using its results: tables have been constructed that give areas under what is called the standard normal curve, which will be discussed shortly. A normal curve is characterized by its mean and standard deviation. The scale of the x axis is different for each normal curve, and so is the shape, since the shape is determined by the standard deviation: the greater the standard deviation, the “flatter” and more spread out the normal curve.
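The effect of the standard deviation can be seen from the height of the curve at its mean, 1/(σ√(2π)), which follows from Gauss's equation for the normal curve given later in this chapter. Because every normal curve encloses a total area of 1, a lower peak means a wider curve. A minimal Python sketch:

```python
from math import pi, sqrt

def peak_height(sigma: float) -> float:
    """Height of a normal curve at its mean: 1 / (sigma * sqrt(2 * pi))."""
    return 1 / (sigma * sqrt(2 * pi))

# Doubling sigma halves the peak; the total area stays 1, so the curve spreads out.
for sigma in (1, 2, 3):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.4f}")
```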


Standardizing a Normally Distributed Variable

To find the percentage of scores that lie within a certain interval, we need the area under the normal curve between the desired x values. To do this, we would need a table of areas for each normal curve. The problem is that there are infinitely many normal curves, so we would need infinitely many tables.


Non-standard normal curves

Ex. 1. The distribution of IQ scores is normal with a mean of 100 and a standard deviation of 16.

Ex. 2. The heights of females at a certain midwestern college are normally distributed with a mean of 64.4 inches and a standard deviation of 2.4 inches.

Ex. 3. The probability distribution of x, the diameter of CDs produced by a company, is normal with a mean of 4 inches and a standard deviation of 0.03 inches.

Thus, for these three examples alone we would need three separate tables giving the areas under each normal curve. Obviously, this poses a problem.


Standard normal curve

The way out of this problem is to standardize each normal curve, which transforms every individual normal distribution into one particular standardized distribution. To find P(a < x < b) for a non-standard normal curve, we find the corresponding area under the standard normal curve:

$$P(a < x < b) = P\left(\frac{a-\mu}{\sigma} < z < \frac{b-\mu}{\sigma}\right)$$

The variable z is called the standard normal variable.
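The standardizing step can be sketched in Python. `std_normal_cdf` and `prob_between` are hypothetical helper names, and the standard normal area is computed with `math.erf` instead of a printed table. As an illustration, the CD example (µ = 4, σ = 0.03) gives the probability that a diameter lies within 0.03 inches of the mean:

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < x < b) for a normal x with mean mu and SD sigma, found by standardizing a and b."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)

print(round(prob_between(3.97, 4.03, 4, 0.03), 4))
```

One function covers every normal curve, which is exactly the point of standardizing.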


Standard normal distribution

The standard normal distribution has a mean of 0 and a standard deviation of 1. Values on the horizontal axis are called z values; z will be defined shortly. Values on the y axis are heights of the curve (probability densities), not probabilities; probabilities are areas under the curve and are always numbers between 0 and 1, inclusive.

[Figure: standard normal curve; z from -4 to 4 on the horizontal axis, vertical axis from 0 to 0.5]


Standardized Normally Distributed Variable

The formula below can be used to standardize any normally distributed variable x:

$$z = \frac{x - \mu}{\sigma}$$

where µ and σ represent the mean and standard deviation of the distribution, respectively. z is the number of standard deviations that x lies from the mean (amount of standard deviations from the mean: A. S. D. M. = z).

For example, if IQ scores are normally distributed with a mean of 100 and a standard deviation of 16, and x = 124 is the IQ of an individual, then

$$z = \frac{124 - 100}{16} = 1.5$$
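The formula is a one-line function. A minimal Python sketch reproducing the IQ example (`z_score` is a hypothetical name):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

print(z_score(124, 100, 16))  # → 1.5
print(z_score(76, 100, 16))   # → -1.5: an IQ of 76 lies 1.5 SD below the mean
```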


Areas under the standard normal curve

Find the following probabilities: A) P(0 < z < 1.2) =

Use a table or a TI-83 to find the area. Answer: 0.3849
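Tables of this kind list P(0 < z < z₀); on a TI-83 the same area is given by normalcdf(0, 1.2). A Python sketch builds a few table rows using the identity P(0 < z < z₀) = erf(z₀/√2)/2; `area_0_to` is a hypothetical helper name:

```python
from math import erf, sqrt

def area_0_to(z: float) -> float:
    """P(0 < Z < z) for the standard normal variable Z: the quantity many z tables list."""
    return 0.5 * erf(z / sqrt(2))

# A few rows of such a table, in steps of 0.2
for i in range(7):
    z = i * 0.2
    print(f"{z:.1f}  {area_0_to(z):.4f}")
```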


Areas under the Standard Normal Curve

Let z be the standard normal variable. Find the following probabilities. Be sure to sketch a normal curve and shade the appropriate area. If you use a TI-83, give the commands required to do the problem.


Examples

P(-1.3 < z < 0)
1. Draw a diagram.
2. Shade the appropriate area.
3. Use a table or calculator to find the area. By symmetry, P(-1.3 < z < 0) = P(0 < z < 1.3).
4. Answer: 0.4032


Examples (continued)

P(-1.25 < z < 0.89)
1. Draw a picture.
2. Shade the appropriate area.
3. Use the table to find two areas: P(0 < z < 1.25) and P(0 < z < 0.89).
4. Find the sum of the two areas.
5. Answer: 0.7076


More examples:

P(z > 0.75)
1. Draw a diagram.
2. Shade the appropriate area.
3. Use the table to find P(0 < z < 0.75).
4. Subtract this area from 0.5000 (the area to the right of 0).
5. Answer: 0.2266


More examples (continued)

P(-1.13 < z < -0.79)
1. Draw a diagram.
2. Shade the appropriate area.
3. Use the table to find P(0 < z < 1.13).
4. Use the table to find P(0 < z < 0.79).
5. Subtract the smaller area from the larger one. (By symmetry, these equal P(-1.13 < z < 0) and P(-0.79 < z < 0).)
6. Answer: 0.0855
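These four worked examples can be checked with a table-style area function. This Python sketch mirrors the steps in the slides (symmetry, adding areas, subtracting from 0.5000, subtracting areas) rather than calling a cumulative distribution directly; `area_0_to` is a hypothetical helper name:

```python
from math import erf, sqrt

def area_0_to(z: float) -> float:
    """Table-style area P(0 < Z < z) for z >= 0."""
    return 0.5 * erf(z / sqrt(2))

# P(-1.3 < z < 0): by symmetry equals P(0 < z < 1.3)
ex1 = area_0_to(1.3)
# P(-1.25 < z < 0.89): the pieces lie on opposite sides of 0, so add them
ex2 = area_0_to(1.25) + area_0_to(0.89)
# P(z > 0.75): half the curve lies right of 0, so subtract from 0.5
ex3 = 0.5 - area_0_to(0.75)
# P(-1.13 < z < -0.79): both ends on the same side of 0, so subtract the smaller area
ex4 = area_0_to(1.13) - area_0_to(0.79)

for label, value in [("ex1", ex1), ("ex2", ex2), ("ex3", ex3), ("ex4", ex4)]:
    print(label, round(value, 4))
```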


Finding probabilities for non-standard normal curves.

P(a < x < b) is the same as

$$P\left(\frac{a-\mu}{\sigma} < z < \frac{b-\mu}{\sigma}\right)$$


Example 1

IQ scores are normally distributed with a mean of 100 and a standard deviation of 16. Find the probability that a randomly chosen person has an IQ greater than 120.
Step 1. Draw a normal curve and shade the appropriate area. State the probability: P(x > 120), where x is IQ.


Example

Step 2. Convert the x score to a standardized z score: z = (120 - 100)/16 = 20/16 = 5/4 = 1.25, so the probability is P(z > 1.25).
Step 3. Draw the standard normal curve and shade the appropriate area.
Step 4. Use a table or a TI-83 to find the area. Answer: 0.1056
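Steps 2 and 4 as a Python sketch, with the table lookup replaced by the standard normal cumulative distribution computed from `math.erf` (`std_normal_cdf` is a hypothetical helper name):

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100, 16                  # IQ mean and standard deviation
z = (120 - mu) / sigma               # Step 2: standardize x = 120
p_above_120 = 1 - std_normal_cdf(z)  # Step 4: area to the right of z
print(z, round(p_above_120, 4))
```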


Areas under the non-standard normal curve

A traffic study at one point on an interstate highway shows that vehicle speeds are normally distributed with a mean of 61.3 mph and a standard deviation of 3.3 miles per hour. If a vehicle is randomly checked, find the probability that its speed is between 55 and 60 miles per hour.


Solution:
1. Draw a diagram.
2. Shade the appropriate area.
3. Use $z = \frac{x-\mu}{\sigma}$ to standardize x.
4. Find $P\left(\frac{55-61.3}{3.3} < z < \frac{60-61.3}{3.3}\right)$.
5. Answer: 0.3187
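A Python sketch of steps 3 and 4 for the traffic example (µ = 61.3 mph, σ = 3.3 mph), using `math.erf` in place of the table (`std_normal_cdf` is a hypothetical helper name):

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 61.3, 3.3       # vehicle speeds in mph
z_low = (55 - mu) / sigma   # standardize x = 55
z_high = (60 - mu) / sigma  # standardize x = 60
p_55_to_60 = std_normal_cdf(z_high) - std_normal_cdf(z_low)
print(round(z_low, 3), round(z_high, 3), round(p_55_to_60, 4))
```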


Non-standard normal curve areas

If IQ scores are normally distributed with a mean of 100 and a standard deviation of 16, find the probability that a randomly chosen person will have an IQ greater than 84. Answer: approximately 0.84 (z = (84 - 100)/16 = -1, and P(z > -1) ≈ 0.8413).


IQ scores example

If IQ scores are normally distributed with a mean of 100 and a standard deviation of 16, find the probability that a person’s IQ is between 85 and 95.


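1. Draw a diagram.
2. Shade the appropriate area.
3. Standardize the variable x using $z = \frac{x-\mu}{\sigma}$.
4. Find $P\left(\frac{x_1-\mu}{\sigma} < z < \frac{x_2-\mu}{\sigma}\right)$ with $x_1 = 85$ and $x_2 = 95$, that is, $P(-0.9375 < z < -0.3125)$.
5. Answer: 0.2031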
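Both IQ examples as a Python sketch (µ = 100, σ = 16), with `normal_cdf` a hypothetical helper computed from `math.erf` in place of the table:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 100, 16
p_above_84 = 1 - normal_cdf(84, mu, sigma)                        # P(x > 84)
p_85_to_95 = normal_cdf(95, mu, sigma) - normal_cdf(85, mu, sigma)  # P(85 < x < 95)
print(round(p_above_84, 4), round(p_85_to_95, 4))
```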


Areas under non-standard normal curves

The lengths of a certain snake are normally distributed with a mean of 73 inches and a standard deviation of 6.5 inches. Find the following probabilities. Let x represent the length of a particular snake.

P(65 < x < 75). Answer: 0.5116
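The snake example as a Python sketch (µ = 73, σ = 6.5), using the same erf-based cumulative distribution in place of a table (`normal_cdf` is a hypothetical helper name):

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 73, 6.5  # snake lengths in inches
p_65_to_75 = normal_cdf(75, mu, sigma) - normal_cdf(65, mu, sigma)
print(round(p_65_to_75, 4))
```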


Mathematical Equation for bell-shaped curves

Carl Friedrich Gauss was probably the first to realize that certain data have bell-shaped distributions. He determined that the following equation could be used to describe these distributions:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where µ and σ are the mean and standard deviation of the data.
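Gauss's equation translates directly into code. This Python sketch evaluates the standard curve (µ = 0, σ = 1) at x = -4, …, 4, the range plotted in the earlier graphs, and checks with a midpoint Riemann sum that the IQ curve (µ = 100, σ = 16) encloses a total area of about 1; the ±8σ integration range and the step count are arbitrary choices for this illustration:

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gauss's bell-shaped curve f(x) with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Heights of the standard curve (mu = 0, sigma = 1) at x = -4, ..., 4
for x in range(-4, 5):
    print(x, round(normal_pdf(x, 0, 1), 4))

# Total area under the IQ curve by a midpoint Riemann sum over mu +/- 8 sigma
mu, sigma, steps = 100, 16, 10_000
width = 16 * sigma / steps
total_area = width * sum(
    normal_pdf(mu - 8 * sigma + (i + 0.5) * width, mu, sigma) for i in range(steps)
)
print(round(total_area, 6))
```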


Using the Normal Curve to approximate binomial probabilities

Example: Binomial distribution for n = 20 and p = 0.5 (a coin is tossed 20 times and the probability of each value x = 0, 1, 2, 3, …, 20 is calculated; each vertical bar represents one outcome of x).

We have seen that the histogram for a binomial distribution with n = 20 trials and p = 0.50 was shaped like a bell if we join the tops of the rectangles with a smooth curve.

If we wanted to find the probability that x (the number of heads) is at least 12, we would have to use the binomial probability formula and calculate P(x = 12) + P(x = 13) + P(x = 14) + … + P(x = 20). The calculations would be very tedious, to say the least.
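A computer removes the tedium. A minimal Python sketch of the exact binomial sum, assuming a fair coin (p = 0.5); its result is a useful benchmark for the normal approximation that follows:

```python
from math import comb

n, p = 20, 0.5
# Exact binomial P(x >= 12) = P(12) + P(13) + ... + P(20)
p_at_least_12 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(12, n + 1))
print(round(p_at_least_12, 4))
```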


Using the Normal curve to approximate binomial probabilities

We could, instead, treat the binomial distribution as a normal curve, since its shape is very close to a bell-shaped curve, and then find the probability that x is at least 12 using the procedure for finding areas under a normal curve.

P(x ≥ 12) ≈ P(x > 11.5) = total area shaded in yellow


Because the normal curve is continuous and the binomial distribution is discrete (x = 0, 1, 2, …, 20), we have to make what is called a correction for continuity. Since we want P(x ≥ 12), we must include the rectangular area corresponding to x = 12. The base of this rectangle starts at 11.5 and ends at 12.5. Therefore, we must find P(x > 11.5) under the normal curve.

The rectangle representing P(x = 12) extends from 11.5 to 12.5 on the horizontal axis.


Solution:

Using the procedure for finding area under a non-standard normal curve we have the following result:

$$P(x > 11.5) = P\left(z > \frac{11.5 - 10}{2.24}\right) = P(z > 0.67) \approx 0.25$$

Here µ = np = 20(0.5) = 10 and σ = √(np(1 - p)) = √5 ≈ 2.24.
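A Python sketch comparing the continuity-corrected normal approximation with the exact binomial sum for n = 20 and p = 0.5, computing the normal area with `math.erf` instead of a table (`std_normal_cdf` is a hypothetical helper name):

```python
from math import comb, erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 20, 0.5
mu = n * p                     # 10
sigma = sqrt(n * p * (1 - p))  # sqrt(5), about 2.24

# Continuity correction: x >= 12 becomes x > 11.5 on the continuous normal curve
approx = 1 - std_normal_cdf((11.5 - mu) / sigma)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(12, n + 1))
print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```

The two values agree to about two decimal places, which is why the approximation is useful when exact sums are tedious.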