Download - Poisson statistics
Measures of Distribution Shape,Relative Location, and Detecting Outliers
Distribution Shape z-Scoresz-Scores Empirical RuleEmpirical Rule
Detecting OutliersDetecting Outliers
22
An important measure of the shape of a An important measure of the shape of a distribution is called distribution is called skewnessskewness..
The formula for the skewness of sample data The formula for the skewness of sample data isis
Skewness can be easily computed using Skewness can be easily computed using statistical software.statistical software.
3
)2)(1(Skewness
s
xx
nn
n i
3
)2)(1(Skewness
s
xx
nn
n i
Distribution Shape: SkewnessDistribution Shape: Skewness
33
Distribution Shape: SkewnessDistribution Shape: Skewness
Rela
tive F
req
uen
cyR
ela
tive F
req
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Skewness = Skewness = 0 0
• Skewness is zero.Skewness is zero.
• Mean and median are equal.Mean and median are equal.
Symmetric (not skewed)Symmetric (not skewed)
44
Rela
tive F
req
uen
cyR
ela
tive F
req
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Skewness = Skewness = .31 .31
• Skewness is negative.Skewness is negative.
• Mean will usually be less than the median.Mean will usually be less than the median.
Distribution Shape: SkewnessDistribution Shape: Skewness
Moderately Skewed LeftModerately Skewed Left
55
Rela
tive F
req
uen
cyR
ela
tive F
req
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Skewness = .31 Skewness = .31
• Skewness is positive.Skewness is positive.
• Mean will usually be more than the median.Mean will usually be more than the median.
Distribution Shape: SkewnessDistribution Shape: Skewness
Moderately Skewed RightModerately Skewed Right
66
Distribution Shape: SkewnessDistribution Shape: Skewness
Highly Skewed RightHighly Skewed RightR
ela
tive F
req
uen
cyR
ela
tive F
req
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Skewness = 1.25 Skewness = 1.25
• Skewness is positive (often above 1.0).Skewness is positive (often above 1.0).
• Mean will usually be more than the median.Mean will usually be more than the median.
77
Seventy efficiency apartments were randomlySeventy efficiency apartments were randomly
sampled in a college town. The monthly rent sampled in a college town. The monthly rent pricesprices
for the apartments are listed below in for the apartments are listed below in ascending order. ascending order.
Distribution Shape: SkewnessDistribution Shape: Skewness
Example: Apartment RentsExample: Apartment Rents
425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615
88
Rela
tive F
req
uen
cyR
ela
tive F
req
uen
cy
.05.05
.10.10
.15.15
.20.20
.25.25
.30.30
.35.35
00
Skewness = .92 Skewness = .92
Distribution Shape: SkewnessDistribution Shape: Skewness
Example: Apartment RentsExample: Apartment Rents
99
The The z-scorez-score is often called the standardized value. is often called the standardized value. The The z-scorez-score is often called the standardized value. is often called the standardized value.
It denotes the number of standard deviations a dataIt denotes the number of standard deviations a data value value xxii is from the mean. is from the mean. It denotes the number of standard deviations a dataIt denotes the number of standard deviations a data value value xxii is from the mean. is from the mean.
z-Scoresz-Scores
zx xsii
zx xsii
Excel’s STANDARDIZE function can be used toExcel’s STANDARDIZE function can be used to compute the z-score.compute the z-score. Excel’s STANDARDIZE function can be used toExcel’s STANDARDIZE function can be used to compute the z-score.compute the z-score.
1010
z-Scoresz-Scores
A data value less than the sample mean will have aA data value less than the sample mean will have a z-score less than zero.z-score less than zero. A data value greater than the sample mean will haveA data value greater than the sample mean will have a z-score greater than zero.a z-score greater than zero. A data value equal to the sample mean will have aA data value equal to the sample mean will have a z-score of zero.z-score of zero.
An observation’s z-score is a measure of the relativeAn observation’s z-score is a measure of the relative location of the observation in a data set.location of the observation in a data set.
1111
• z-Score of Smallest Value (425)
425 490.80 1.20
54.74ix x
zs
425 490.80
1.2054.74
ix xz
s
Standardized Values for Apartment RentsStandardized Values for Apartment Rents-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Example: Apartment RentsExample: Apartment Rents
z-Scoresz-Scores
1212
Empirical RuleEmpirical Rule
When the data are believed to approximate aWhen the data are believed to approximate a bell-shaped distribution with moderate skew …bell-shaped distribution with moderate skew …
The empirical rule is based on the normalThe empirical rule is based on the normal distribution, which we will discuss later.distribution, which we will discuss later. The empirical rule is based on the normalThe empirical rule is based on the normal distribution, which we will discuss later.distribution, which we will discuss later.
The The empirical ruleempirical rule can be used to determine the can be used to determine the percentage of data values that must be within apercentage of data values that must be within a specified number of standard deviations of thespecified number of standard deviations of the mean.mean.
The The empirical ruleempirical rule can be used to determine the can be used to determine the percentage of data values that must be within apercentage of data values that must be within a specified number of standard deviations of thespecified number of standard deviations of the mean.mean.
1313
For data having a bell-shaped distribution, approximately
of the values are withinof the values are within of its mean.of its mean.
of the values are withinof the values are within of its mean.of its mean.
68.26%68.26%68.26%68.26%
+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation
of the values are within of the values are within of its mean.of its mean.
of the values are within of the values are within of its mean.of its mean.
95.44%95.44%95.44%95.44%
+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations
of the values are within of the values are within of its mean.of its mean. of the values are within of the values are within of its mean.of its mean.
99.72%99.72%99.72%99.72%
+/- 3 standard deviations+/- 3 standard deviations+/- 3 standard deviations+/- 3 standard deviations
Empirical RuleEmpirical Rule
1414
xx – – 33 – – 11
– – 22 + 1+ 1
+ 2+ 2 + 3+ 3
68.26%68.26%95.44%95.44%99.72%99.72%
Empirical RuleEmpirical Rule
1515
An An outlieroutlier is an unusually small or unusually large is an unusually small or unusually large value in a data set.value in a data set. A data value with a z-score less than -3 or greaterA data value with a z-score less than -3 or greater than +3 might be considered an outlier.than +3 might be considered an outlier.
It might be:It might be:• an incorrectly recorded data valuean incorrectly recorded data value• a data value that was incorrectly included in thea data value that was incorrectly included in the data setdata set• a data value that has occurred by chancea data value that has occurred by chance
Detecting OutliersDetecting Outliers
1616
• The most extreme z-scores are -1.20 and 2.27The most extreme z-scores are -1.20 and 2.27
• Using |Using |zz| | >> 3 as the criterion for an outlier, there 3 as the criterion for an outlier, there are no outliers in this data set.are no outliers in this data set.
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Standardized Values for Apartment RentsStandardized Values for Apartment Rents
Example: Apartment RentsExample: Apartment Rents
Detecting OutliersDetecting Outliers
1717
Exploratory Data Analysis
Exploratory data analysisExploratory data analysis is looking at methods is looking at methods toto summarize data.summarize data. Exploratory data analysisExploratory data analysis is looking at methods is looking at methods toto summarize data.summarize data.
For now we simply sort the data values into ascending For now we simply sort the data values into ascending order and identify the order and identify the five-number summaryfive-number summary and then and thenconstruct a construct a box plotbox plot..
For now we simply sort the data values into ascending For now we simply sort the data values into ascending order and identify the order and identify the five-number summaryfive-number summary and then and thenconstruct a construct a box plotbox plot..
1818
Five-Number Summary
1111 Smallest ValueSmallest Value Smallest ValueSmallest Value
First QuartileFirst Quartile First QuartileFirst Quartile
MedianMedian MedianMedian
Third QuartileThird Quartile Third QuartileThird Quartile
Largest ValueLargest Value Largest ValueLargest Value
2222
3333
4444
5555
1919
Five-Number Summary
425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615
425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615
Lowest Value = 425Lowest Value = 425 First Quartile = 445First Quartile = 445
Median = 475Median = 475
Third Quartile = 525Third Quartile = 525Largest Value = 615Largest Value = 615
Example: Apartment RentsExample: Apartment Rents
2020
Box PlotBox Plot
A A box plotbox plot is a graphical summary of data that is is a graphical summary of data that is based on a five-number summary.based on a five-number summary. A A box plotbox plot is a graphical summary of data that is is a graphical summary of data that is based on a five-number summary.based on a five-number summary.
A key to the development of a box plot is theA key to the development of a box plot is the computation of the median and the quartiles computation of the median and the quartiles QQ11 and and QQ33..
A key to the development of a box plot is theA key to the development of a box plot is the computation of the median and the quartiles computation of the median and the quartiles QQ11 and and QQ33..
Box plots provide another way to identify outliers.Box plots provide another way to identify outliers. Box plots provide another way to identify outliers.Box plots provide another way to identify outliers.
2121
They also tell us whether the data are skewed.They also tell us whether the data are skewed. They also tell us whether the data are skewed.They also tell us whether the data are skewed.
400400
425425
450450
475475
500500
525525
550550
575575
600600
625625
• A box is drawn with its ends located at the first andA box is drawn with its ends located at the first and
third quartiles (Qthird quartiles (Q11 & Q & Q33).).
Box PlotBox Plot
• A vertical line is drawn in the box at the location ofA vertical line is drawn in the box at the location of the median (second quartile).the median (second quartile).
Q1 = 445Q1 = 445 Q3 = 525Q3 = 525
Q2 = 475Q2 = 475
Example: Apartment RentsExample: Apartment Rents
2222
Box PlotBox Plot
Limits are located (not drawn) using the Limits are located (not drawn) using the interquartile range (IQR = Qinterquartile range (IQR = Q33-Q-Q11): they are ): they are 1.5IQR below Q1.5IQR below Q11 and 1.5 IQR above Q and 1.5 IQR above Q33..
Data outside these limits are considered Data outside these limits are considered outliersoutliers..
The locations of each outlier is shown with the The locations of each outlier is shown with the
symbolsymbol * * ..
2323
Box PlotBox Plot
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• The lower limit is located 1.5(IQR) below The lower limit is located 1.5(IQR) below QQ1.1.
• The upper limit is located 1.5(IQR) above The upper limit is located 1.5(IQR) above QQ3.3.
• There are no outliers (values less than 325 orThere are no outliers (values less than 325 or greater than 645) in the apartment rent data.greater than 645) in the apartment rent data.
Example: Apartment RentsExample: Apartment Rents
2424
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
400400
425425
450450
475475
500500
525525
550550
575575
600600
625625
Smallest valueSmallest valueinside limits = 425inside limits = 425
Largest valueLargest valueinside limits = 615inside limits = 615
Example: Apartment RentsExample: Apartment Rents
Box PlotBox Plot
2525
An excellent graphical technique for An excellent graphical technique for makingmaking
comparisons among two or more comparisons among two or more groups.groups.
Box PlotBox Plot
2626
Measures of Association Between Two Variables
Thus far we have examined numerical methods usedThus far we have examined numerical methods used to summarize the data for one variable at a time.to summarize the data for one variable at a time. Thus far we have examined numerical methods usedThus far we have examined numerical methods used to summarize the data for one variable at a time.to summarize the data for one variable at a time.
Often a manager or decision maker is interested inOften a manager or decision maker is interested in the the relationship between two variablesrelationship between two variables.. Often a manager or decision maker is interested inOften a manager or decision maker is interested in the the relationship between two variablesrelationship between two variables..
Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are between two variables are covariancecovariance and and correlationcorrelation coefficientcoefficient..
Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are between two variables are covariancecovariance and and correlationcorrelation coefficientcoefficient..
2727
Positive values indicate a positive relationship.Positive values indicate a positive relationship. Positive values indicate a positive relationship.Positive values indicate a positive relationship.
Negative values indicate a negative relationship.Negative values indicate a negative relationship. Negative values indicate a negative relationship.Negative values indicate a negative relationship.
The The covariancecovariance is a measure of the linear association is a measure of the linear association between two variables.between two variables. The The covariancecovariance is a measure of the linear association is a measure of the linear association between two variables.between two variables.
CovarianceCovariance
2828
CovarianceCovariance
The covariance is computed as follows:The covariance is computed as follows:
The covariance is computed as follows:The covariance is computed as follows:
forforsamplessamples
forforpopulationspopulations
1 1 n nxy
(x x)(y y) (x x)(y y)s
n 1
1 1 n n
xy
(x x)(y y) (x x)(y y)s
n 1
1 x 1 y N x N yxy
(x )(y ) (x )(y )
N
1 x 1 y N x N yxy
(x )(y ) (x )(y )
N
2929
Correlation CoefficientCorrelation Coefficient
There are also other types of associations not captured There are also other types of associations not captured by correlation.by correlation. There are also other types of associations not captured There are also other types of associations not captured by correlation.by correlation.
Correlation is a measure of linear association.Correlation is a measure of linear association. Correlation is a measure of linear association.Correlation is a measure of linear association.
3030
The correlation coefficient is computed as follows:The correlation coefficient is computed as follows:
The correlation coefficient is computed as follows:The correlation coefficient is computed as follows:
forforsamplessamples
forforpopulationspopulations
rs
s sxyxy
x yrs
s sxyxy
x y
xyxy
x y
xyxy
x y
Correlation CoefficientCorrelation Coefficient
3131
2 22 1 x N x
x
(x ) (x )
N
2 22 1 x N x
x
(x ) (x )
N
2 22 1 nx
(x x) (x x)s
n 1
2 2
2 1 nx
(x x) (x x)s
n 1
2 21 y N y2
y
(y ) (y )
N
2 21 y N y2
y
(y ) (y )
N
2 22 1 ny
(y y) (y y)s
n 1
2 2
2 1 ny
(y y) (y y)s
n 1
Values near +1 indicate a Values near +1 indicate a strong positive linearstrong positive linear relationshiprelationship.. Values near +1 indicate a Values near +1 indicate a strong positive linearstrong positive linear relationshiprelationship..
Values near -1 indicate a Values near -1 indicate a strong negative linearstrong negative linear relationshiprelationship. . Values near -1 indicate a Values near -1 indicate a strong negative linearstrong negative linear relationshiprelationship. .
The coefficient can take on values between -1 and +1.The coefficient can take on values between -1 and +1. The coefficient can take on values between -1 and +1.The coefficient can take on values between -1 and +1.
The closer the correlation is to zero, the weaker theThe closer the correlation is to zero, the weaker the relationship.relationship. The closer the correlation is to zero, the weaker theThe closer the correlation is to zero, the weaker the relationship.relationship.
Correlation CoefficientCorrelation Coefficient
3232
CorrelationCorrelation
A Positive Relationship: correlation close to 1
xx
yy
3333
CorrelationCorrelation
xx
yy
3434
A Negative Relationship: correlation close to -1
CorrelationCorrelation
No Apparent Relationship: Correlation near 0
xx
yy
3535
A golfer is interested in investigating theA golfer is interested in investigating the
relationship, if any, between driving distance relationship, if any, between driving distance and 18-hole score.and 18-hole score.
277.6277.6259.5259.5269.1269.1267.0267.0255.6255.6272.9272.9
696971717070707071716969
Average DrivingAverage DrivingDistance (yds.)Distance (yds.)
AverageAverage18-Hole Score18-Hole Score
Covariance and Correlation CoefficientCovariance and Correlation Coefficient
Example: Golfing StudyExample: Golfing Study
3636
Covariance and Correlation CoefficientCovariance and Correlation Coefficient
277.6277.6259.5259.5269.1269.1267.0267.0255.6255.6272.9272.9
696971717070707071716969
xx yy
10.6510.65 -7.45-7.45 2.152.15 0.050.05-11.35-11.35 5.955.95
-1.0-1.0 1.01.0 00 00 1.01.0-1.0-1.0
-10.65-10.65 -7.45-7.45 00 00-11.35-11.35 -5.95-5.95
( )ix x( )ix x ( )( )i ix x y y ( )( )i ix x y y ( )iy y( )iy y
AverageAverageStd. Dev.Std. Dev.
267.0267.0 70.070.0 -35.40-35.408.21928.2192.8944.8944
TotalTotal
Example: Golfing StudyExample: Golfing Study
3737
• Sample CovarianceSample Covariance
• Sample Correlation CoefficientSample Correlation Coefficient
Covariance and Correlation CoefficientCovariance and Correlation Coefficient
7.08 -.9631
(8.2192)(.8944)xy
xyx y
sr
s s
7.08
-.9631(8.2192)(.8944)
xyxy
x y
sr
s s
( )( ) 35.40 7.08
1 6 1i i
xy
x x y ys
n
( )( ) 35.40
7.081 6 1
i ixy
x x y ys
n
Example: Golfing StudyExample: Golfing Study
3838
So, increasing driving distance decreases score, and the relationis really strong.
A A random variablerandom variable is a numerical description of the is a numerical description of the outcome of an experiment.outcome of an experiment. A A random variablerandom variable is a numerical description of the is a numerical description of the outcome of an experiment.outcome of an experiment.
Random VariablesRandom Variables
A A discrete random variablediscrete random variable may assume either a may assume either a finite number of values or an infinite sequence offinite number of values or an infinite sequence of values.values.
A A discrete random variablediscrete random variable may assume either a may assume either a finite number of values or an infinite sequence offinite number of values or an infinite sequence of values.values.
A A continuous random variablecontinuous random variable may assume any may assume any numerical value in an interval or collection ofnumerical value in an interval or collection of intervals.intervals.
A A continuous random variablecontinuous random variable may assume any may assume any numerical value in an interval or collection ofnumerical value in an interval or collection of intervals.intervals.
3939
Random VariablesRandom Variables
QuestionQuestion Random Variable Random Variable xx TypeType
FamilyFamilysizesize
xx = Number of dependents = Number of dependents reported on tax returnreported on tax return
DiscreteDiscrete
Distance fromDistance fromhome to storehome to store
xx = Distance in miles from = Distance in miles from home to the store sitehome to the store site
ContinuousContinuous
Own dogOwn dogor cator cat
xx = 1 if own no pet; = 1 if own no pet; = 2 if own dog(s) only; = 2 if own dog(s) only; = 3 if own cat(s) only; = 3 if own cat(s) only; = 4 if own dog(s) and cat(s)= 4 if own dog(s) and cat(s)
DiscreteDiscrete
4040
The The probability distributionprobability distribution for a random variable for a random variable describes how probabilities are distributed overdescribes how probabilities are distributed over the values of the random variable.the values of the random variable.
The The probability distributionprobability distribution for a random variable for a random variable describes how probabilities are distributed overdescribes how probabilities are distributed over the values of the random variable.the values of the random variable.
Discrete Probability DistributionsDiscrete Probability Distributions
4141
The probability distribution is defined by aThe probability distribution is defined by a probability functionprobability function, denoted by , denoted by ff((xx), which provides), which provides the probability for each value of the random variable.the probability for each value of the random variable.
The probability distribution is defined by aThe probability distribution is defined by a probability functionprobability function, denoted by , denoted by ff((xx), which provides), which provides the probability for each value of the random variable.the probability for each value of the random variable.
The required conditions for a discrete probabilityThe required conditions for a discrete probability function are:function are: The required conditions for a discrete probabilityThe required conditions for a discrete probability function are:function are:
Discrete Probability DistributionsDiscrete Probability Distributions
ff((xx) ) >> 0 0 (probabilities are not negative)ff((xx) ) >> 0 0 (probabilities are not negative)
ff((xx) = 1 ) = 1 (sum of all probabilities =1)ff((xx) = 1 ) = 1 (sum of all probabilities =1)
Remember that any probability is a number between 0 and 1.
4242
Expected Value
The The expected valueexpected value, or mean, of a random variable, or mean, of a random variable is a measure of its central location.is a measure of its central location. The The expected valueexpected value, or mean, of a random variable, or mean, of a random variable is a measure of its central location.is a measure of its central location.
The expected value does not have to be a value theThe expected value does not have to be a value the random variable can assume.random variable can assume. The expected value does not have to be a value theThe expected value does not have to be a value the random variable can assume.random variable can assume.
EE((xx) = ) = = = xfxf((xx))EE((xx) = ) = = = xfxf((xx))
4343
Variance and Standard DeviationVariance and Standard Deviation
The The variancevariance summarizes the variability in the summarizes the variability in the values of a random variable.values of a random variable. The The variancevariance summarizes the variability in the summarizes the variability in the values of a random variable.values of a random variable.
Var(Var(xx) = ) = 22 = = ((xx - - ))22ff((xx))Var(Var(xx) = ) = 22 = = ((xx - - ))22ff((xx))
The The standard deviationstandard deviation, , , is defined as the positive, is defined as the positive square root of the variance.square root of the variance. The The standard deviationstandard deviation, , , is defined as the positive, is defined as the positive square root of the variance.square root of the variance.
4444
Four Properties of a Binomial Experiment
3. The probability of a success, denoted by 3. The probability of a success, denoted by pp, does, does not change from trial to trial.not change from trial to trial.3. The probability of a success, denoted by 3. The probability of a success, denoted by pp, does, does not change from trial to trial.not change from trial to trial.
4. The trials are independent.4. The trials are independent.4. The trials are independent.4. The trials are independent.
2. Two outcomes, 2. Two outcomes, successsuccess and and failurefailure, are possible, are possible on each trial.on each trial.2. Two outcomes, 2. Two outcomes, successsuccess and and failurefailure, are possible, are possible on each trial.on each trial.
1. The experiment consists of a sequence of 1. The experiment consists of a sequence of nn identical trials.identical trials.1. The experiment consists of a sequence of 1. The experiment consists of a sequence of nn identical trials.identical trials.
Binomial Probability DistributionBinomial Probability Distribution
4545
Our interest is in the Our interest is in the number of successesnumber of successes occurring in the occurring in the nn trials. trials. Our interest is in the Our interest is in the number of successesnumber of successes occurring in the occurring in the nn trials. trials.
Binomial Probability DistributionBinomial Probability Distribution
4646
where:where:
xx = the number of successes = the number of successes
pp = the probability of a success on one trial = the probability of a success on one trial
nn = the number of trials = the number of trials ff((xx) = the probability of ) = the probability of xx successes in successes in nn trials trials
( )( ) (1 ) n xxn
f x p px
( )( ) (1 ) n xxn
f x p px
Binomial Probability DistributionBinomial Probability Distribution
Binomial Probability FunctionBinomial Probability Function
1 2 3! =
( )! ! 1 2 3 ( ) 1 2 3
n nnx n x x n x x
= `n choose x’ = number of ways x people can be chosen out of n 4747
( )( ) (1 ) n xxn
f x p px
( )( ) (1 ) n xxn
f x p px
Binomial Probability DistributionBinomial Probability Distribution
Binomial Probability Binomial Probability FunctionFunction
Probability of a particularProbability of a particular sequence of trial outcomessequence of trial outcomes with x successes in with x successes in nn trials trials
Probability of a particularProbability of a particular sequence of trial outcomessequence of trial outcomes with x successes in with x successes in nn trials trials
Number of experimentalNumber of experimental outcomes providing exactlyoutcomes providing exactlyxx successes in successes in nn trials trials
Number of experimentalNumber of experimental outcomes providing exactlyoutcomes providing exactlyxx successes in successes in nn trials trials
These values are available in Table 5 of our textbook.
4848
Binomial Probability DistributionBinomial Probability Distribution
Example: IIT EntranceExample: IIT Entrance
It is known that about 10% of the examinees It is known that about 10% of the examinees taking the IIT entrance qualify.taking the IIT entrance qualify.
Choosing 3 examinees at random, what isChoosing 3 examinees at random, what is
the probability that exactly 1 of them will the probability that exactly 1 of them will qualify?qualify?
Thus, for any examinee chosen at random, Thus, for any examinee chosen at random, there is a probability of 0.1 that the person there is a probability of 0.1 that the person will qualify.will qualify.
4949
Binomial Probability DistributionBinomial Probability Distribution
f xn
x n xp px n x( )
!!( )!
( )( )
1f xn
x n xp px n x( )
!!( )!
( )( )
1
1 23!(1) (0.1) (0.9) 3(.1)(.81) .243
1!(3 1)!f
1 23!
(1) (0.1) (0.9) 3(.1)(.81) .2431!(3 1)!
f
pp = .10, = .10, nn = 3, = 3, xx = 1 = 1
Example: IIT EntranceExample: IIT EntranceUsing theUsing theprobabilityprobabilityfunctionfunction
Using theUsing theprobabilityprobabilityfunctionfunction
You can just check the binomial probability table in textbook forn= 3, p = 0.1, x = 1.
Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’
Just f(1) if FALSE,
f(0)+f(1) if TRUE 5050
(1 )np p (1 )np p
EE((xx) = ) = = = npnp
Var(Var(xx) = ) = 22 = = npnp(1 (1 pp))
Expected ValueExpected Value
VarianceVariance
Standard DeviationStandard Deviation
Binomial Probability DistributionBinomial Probability Distribution
5151
3(.1)(.9) .52 em ployees 3(.1)(.9) .52 em ployees
EE((xx) = ) = npnp = 3(.1) = .3 employees out of 3 = 3(.1) = .3 employees out of 3
Var(Var(xx) = ) = npnp(1 – (1 – pp) = 3(.1)(.9) = .27) = 3(.1)(.9) = .27
• Expected ValueExpected Value
• VarianceVariance
• Standard DeviationStandard Deviation
Example: Evans ElectronicsExample: Evans Electronics
Binomial Probability DistributionBinomial Probability Distribution
5252
A Poisson distributed random variable is oftenA Poisson distributed random variable is often useful in estimating the number of occurrencesuseful in estimating the number of occurrences over a over a specified interval of time or spacespecified interval of time or space
A Poisson distributed random variable is oftenA Poisson distributed random variable is often useful in estimating the number of occurrencesuseful in estimating the number of occurrences over a over a specified interval of time or spacespecified interval of time or space
It is a discrete random variable that may assumeIt is a discrete random variable that may assume an an infinite sequence of valuesinfinite sequence of values (x = 0, 1, 2, . . . ). (x = 0, 1, 2, . . . ). It is a discrete random variable that may assumeIt is a discrete random variable that may assume an an infinite sequence of valuesinfinite sequence of values (x = 0, 1, 2, . . . ). (x = 0, 1, 2, . . . ).
Poisson Probability DistributionPoisson Probability Distribution
5353
Examples of a Poisson distributed random variable:Examples of a Poisson distributed random variable: Examples of a Poisson distributed random variable:Examples of a Poisson distributed random variable:
the number of defects in 14 pages of a bookthe number of defects in 14 pages of a book the number of defects in 14 pages of a bookthe number of defects in 14 pages of a book
the number of customers arriving at the postthe number of customers arriving at the postoffice in one houroffice in one hour the number of customers arriving at the postthe number of customers arriving at the postoffice in one houroffice in one hour
Poisson Probability DistributionPoisson Probability Distribution
Bell Labs used the Poisson distribution to model theBell Labs used the Poisson distribution to model the arrival of phone calls.arrival of phone calls. Bell Labs used the Poisson distribution to model theBell Labs used the Poisson distribution to model the arrival of phone calls.arrival of phone calls.
5454
Poisson Probability DistributionPoisson Probability Distribution
Two Properties of a Poisson ExperimentTwo Properties of a Poisson Experiment
2.2. The occurrence or nonoccurrence in any timeThe occurrence or nonoccurrence in any time interval is independent of the occurrence orinterval is independent of the occurrence or nonoccurrence in any other time interval.nonoccurrence in any other time interval.
2.2. The occurrence or nonoccurrence in any timeThe occurrence or nonoccurrence in any time interval is independent of the occurrence orinterval is independent of the occurrence or nonoccurrence in any other time interval.nonoccurrence in any other time interval.
1.1. The probability of an occurrence is the sameThe probability of an occurrence is the same for any two time intervals of equal length.for any two time intervals of equal length.1.1. The probability of an occurrence is the sameThe probability of an occurrence is the same for any two time intervals of equal length.for any two time intervals of equal length.
5555
Poisson Probability Function
f xex
x( )
!
f x
ex
x( )
!
where:where:
xx = the number of occurrences in an interval = the number of occurrences in an interval
f(x) f(x) = the probability of = the probability of xx occurrences in an interval occurrences in an interval
= mean number of occurrences in an interval= mean number of occurrences in an interval
ee = 2.71828 = 2.71828
Poisson Probability DistributionPoisson Probability Distribution
These values are available in Table 7 of our textbook.
5656
Poisson Probability FunctionPoisson Probability Function
In practical applications, In practical applications, xx will eventually become will eventually becomelarge enough so that large enough so that ff((xx) is very small and negligible.) is very small and negligible.In practical applications, In practical applications, xx will eventually become will eventually becomelarge enough so that large enough so that ff((xx) is very small and negligible.) is very small and negligible.
Since there is no stated upper limit for the numberSince there is no stated upper limit for the numberof occurrences, the probability function of occurrences, the probability function ff((xx) is) isapplicable for values applicable for values xx = 0, 1, 2, … without limit. = 0, 1, 2, … without limit.
Since there is no stated upper limit for the numberSince there is no stated upper limit for the numberof occurrences, the probability function of occurrences, the probability function ff((xx) is) isapplicable for values applicable for values xx = 0, 1, 2, … without limit. = 0, 1, 2, … without limit.
Poisson Probability DistributionPoisson Probability Distribution
5757
Example: Mercy HospitalExample: Mercy Hospital
Patients arrive at the emergency room of Patients arrive at the emergency room of MercyMercy
Hospital at the average rate of 6 per hour onHospital at the average rate of 6 per hour on
weekend evenings.weekend evenings.What is the probability of 4 arrivals in 30 What is the probability of 4 arrivals in 30 minutesminutes
on a weekend evening?on a weekend evening?
Poisson Probability DistributionPoisson Probability Distribution
5858
Poisson Probability DistributionPoisson Probability Distribution
4 33 (2.71828)
(4) .168014!
f
4 33 (2.71828)
(4) .168014!
f
= 6/hour = 3/half-hour, = 6/hour = 3/half-hour, xx = 4 = 4
Example: Mercy HospitalExample: Mercy Hospital Using theUsing theprobabilityprobabilityfunctionfunction
Using theUsing theprobabilityprobabilityfunctionfunction
Or, simply check the table for Poisson probabilities in the bookfor μ = 3, x = 4.
In Excel, use ‘=POISSON(4,3,FALSE)’Just f(4) if FALSE, f(0)+f(1)+...+f(4)
if TRUE
5959
Poisson Probability DistributionPoisson Probability Distribution
Poisson Probabilities
0.00
0.05
0.10
0.15
0.20
0.25
0 1 2 3 4 5 6 7 8 9 10
Number of Arrivals in 30 Minutes
Pro
bab
ilit
y
actually, actually, the the sequencesequencecontinues:continues:11, 12, …11, 12, …
Example: Mercy HospitalExample: Mercy Hospital
6060
Poisson Probability DistributionPoisson Probability Distribution
A property of the Poisson distribution is thatA property of the Poisson distribution is thatthe mean and variance are equal.the mean and variance are equal.
A property of the Poisson distribution is thatA property of the Poisson distribution is thatthe mean and variance are equal.the mean and variance are equal.
= = 22
6161
Continuous Probability Distributions
A A continuous random variablecontinuous random variable can assume any can assume any value in an interval on the real line or in a value in an interval on the real line or in a collection of intervals.collection of intervals.
It is not possible to talk about the probability It is not possible to talk about the probability of the random variable assuming a particular of the random variable assuming a particular value.value. Instead, we talk about the probability of the Instead, we talk about the probability of the random variable assuming a value within a random variable assuming a value within a given interval.given interval.
6262
6363
We denote the ‘density function’ by f(x). AlsoWe denote the ‘density function’ by f(x). Also
2
( ) 0; ( ) 1
( ) ( )
( ) ( ) ( )
f x f x dx
E X xf x dx
Var X x E X f x dx
2
( ) 0; ( ) 1
( ) ( )
( ) ( ) ( )
f x f x dx
E X xf x dx
Var X x E X f x dx
Area as a Measure of ProbabilityArea as a Measure of Probability
The area under the graph of The area under the graph of ff((xx) and ) and probability are identical.probability are identical.
This is valid for all continuous random This is valid for all continuous random variables.variables. The probability that The probability that xx takes on a value takes on a value between some lower value between some lower value xx11 and some higher and some higher value value xx22 can be found by computing the area can be found by computing the area under the graph of under the graph of ff((xx) over the interval from ) over the interval from xx11 to to xx22. .
6464
The normal probability distribution is the most important distribution for describing a continuous random variable.
It is used in a wide variety of applicationsIt is used in a wide variety of applications
including:including:
• Heights of Heights of peoplepeople• Rainfall Rainfall amountsamounts
• Test scoresTest scores• Scientific Scientific measurementsmeasurements
Normal Probability DistributionNormal Probability Distribution
For a large number of similar variables that are For a large number of similar variables that are unrelated, sum and average are approximately unrelated, sum and average are approximately normal.normal.
6565
Normal Probability Density Function
2 2( ) / 21( )
2xf x e
2 2( ) / 21( )
2xf x e
= mean= mean
= standard deviation= standard deviation
= 3.14159= 3.14159
ee = 2.71828 = 2.71828
where:where:
Normal Probability DistributionNormal Probability Distribution
6666
The distribution is The distribution is symmetricsymmetric; its skewness; its skewness measure is zero.measure is zero. The distribution is The distribution is symmetricsymmetric; its skewness; its skewness measure is zero.measure is zero.
Normal Probability DistributionNormal Probability Distribution
CharacteristicsCharacteristics
xx
6767
The The highest pointhighest point on the normal curve is at the on the normal curve is at the meanmean, the middle point., the middle point. The The highest pointhighest point on the normal curve is at the on the normal curve is at the meanmean, the middle point., the middle point.
Normal Probability DistributionNormal Probability Distribution
CharacteristicsCharacteristics
xx
6868
Normal Probability DistributionNormal Probability Distribution
CharacteristicsCharacteristics
-10-10 00 2525
The mean can be any numerical value: negative,The mean can be any numerical value: negative, zero, or positive.zero, or positive. The mean can be any numerical value: negative,The mean can be any numerical value: negative, zero, or positive.zero, or positive.
xx
6969
Normal Probability DistributionNormal Probability Distribution
CharacteristicsCharacteristics
= 15= 15
= 25= 25
The standard deviation determines the width of theThe standard deviation determines the width of thecurve: larger values result in wider, flatter curves.curve: larger values result in wider, flatter curves.The standard deviation determines the width of theThe standard deviation determines the width of thecurve: larger values result in wider, flatter curves.curve: larger values result in wider, flatter curves.
xx
7070
Probabilities for the normal random variable areProbabilities for the normal random variable are given by given by areas under the curveareas under the curve. The total area. The total area under the curve is 1 (.5 to the left of the mean andunder the curve is 1 (.5 to the left of the mean and .5 to the right)..5 to the right).
Probabilities for the normal random variable areProbabilities for the normal random variable are given by given by areas under the curveareas under the curve. The total area. The total area under the curve is 1 (.5 to the left of the mean andunder the curve is 1 (.5 to the left of the mean and .5 to the right)..5 to the right).
Normal Probability DistributionNormal Probability Distribution
CharacteristicsCharacteristics
.5.5 .5.5
xx
7171
Normal Probability DistributionNormal Probability Distribution
Characteristics (basis for the empirical rule)Characteristics (basis for the empirical rule)
of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean. of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean.68.26%68.26%68.26%68.26%
+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation
of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean. of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean.95.44%95.44%95.44%95.44%
+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations
of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean. of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean.99.72%99.72%99.72%99.72%
+/- 3 standard deviations+/- 3 standard deviations+/- 3 standard deviations+/- 3 standard deviations
7272
Normal Probability DistributionNormal Probability Distribution
Characteristics (basis for the empirical rule)Characteristics (basis for the empirical rule)
xx – – 33 – – 11
– – 22 + 1+ 1
+ 2+ 2 + 3+ 3
68.26%68.26%95.44%95.44%99.72%99.72%
7373
A random variable having a normal distributionA random variable having a normal distribution with a mean of 0 and a standard deviation of 1 iswith a mean of 0 and a standard deviation of 1 is said to have a said to have a standard normal probabilitystandard normal probability distributiondistribution..
A random variable having a normal distributionA random variable having a normal distribution with a mean of 0 and a standard deviation of 1 iswith a mean of 0 and a standard deviation of 1 is said to have a said to have a standard normal probabilitystandard normal probability distributiondistribution..
CharacteristicsCharacteristics
Standard Normal Probability DistributionStandard Normal Probability Distribution
7474
00zz
The letter The letter z z is used to designate the standardis used to designate the standard normal random variable.normal random variable. The letter The letter z z is used to designate the standardis used to designate the standard normal random variable.normal random variable.
Standard Normal Probability DistributionStandard Normal Probability Distribution
CharacteristicsCharacteristics
7575
Converting to the Standard Normal Converting to the Standard Normal DistributionDistribution
Standard Normal Probability DistributionStandard Normal Probability Distribution
zx
zx
7676
The daily demand of the new ipad in a store The daily demand of the new ipad in a store seems to follow a normal distribution with an seems to follow a normal distribution with an average of 15 and a standard deviation of 6. average of 15 and a standard deviation of 6.
Example: DemandExample: Demand
The manager, who does not want to keep The manager, who does not want to keep more than 20 ipads in his store at a time, more than 20 ipads in his store at a time, would like to know the probability of a would like to know the probability of a stockout, i.e. that the demand in a day will stockout, i.e. that the demand in a day will exceed 20. exceed 20.
PP((xx > 20) = ? > 20) = ?
7777
zz = ( = (xx - - )/)/ = (20 - 15)/6= (20 - 15)/6 = .83= .83
zz = ( = (xx - - )/)/ = (20 - 15)/6= (20 - 15)/6 = .83= .83
Solving for the Stockout ProbabilitySolving for the Stockout Probability
Step 1: Convert Step 1: Convert xx to the standard normal distribution. to the standard normal distribution.Step 1: Convert Step 1: Convert xx to the standard normal distribution. to the standard normal distribution.
7878
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
Cumulative Probability Table for Standard Normal Cumulative Probability Table for Standard Normal DistributionDistribution
PP((zz << .83) .83)
These values are available in Table 1 of our textbook.
Step 2: Find the area under the standard normalStep 2: Find the area under the standard normal curve to the left of curve to the left of zz = .83. = .83.Step 2: Find the area under the standard normalStep 2: Find the area under the standard normal curve to the left of curve to the left of zz = .83. = .83.
7979
8080
In Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’
Just f(0.83) if FALSE, area upto 0.83 if
TRUE
In fact, you can straightaway use ‘=NORMDIST(20,15,6,TRUE)’
P(X ≤ 20) with μ = 15,
σ = 6
PP((z z > .83) = 1 – > .83) = 1 – PP((zz << .83) .83) = 1- .7967= 1- .7967
= .2033= .2033
PP((z z > .83) = 1 – > .83) = 1 – PP((zz << .83) .83) = 1- .7967= 1- .7967
= .2033= .2033
Solving for the Stockout ProbabilitySolving for the Stockout Probability
Step 3: Compute the area under the standard normalStep 3: Compute the area under the standard normal curve to the right of curve to the right of zz = .83. = .83.Step 3: Compute the area under the standard normalStep 3: Compute the area under the standard normal curve to the right of curve to the right of zz = .83. = .83.
ProbabilityProbability of a of a stockoutstockout
PP((xx > > 20)20)
Standard Normal Probability DistributionStandard Normal Probability Distribution
8181
Solving for the Stockout ProbabilitySolving for the Stockout Probability
00 .83.83
Area = .7967Area = .7967Area = 1 - .7967Area = 1 - .7967
= .2033= .2033
zz
Standard Normal Probability DistributionStandard Normal Probability Distribution
These values are available in Table 1 of our textbook.
8282
Standard Normal Probability DistributionStandard Normal Probability Distribution
If the manager of wants the probability of a If the manager of wants the probability of a stockout during replenishment lead-time to stockout during replenishment lead-time to be no more than .05, what should the reorder be no more than .05, what should the reorder point be?point be?
------------------------------------------------------------------------------------------------------------------------------(Hint: Given a probability, we can use the (Hint: Given a probability, we can use the
standardstandard
normal table in an inverse fashion to find thenormal table in an inverse fashion to find the
corresponding corresponding zz value. Give it a try.) value. Give it a try.)
8383
End of Lecture
8484