poisson statistics

Measures of Distribution Shape,Relative Location, and Detecting Outliers

Distribution Shape z-Scoresz-Scores Empirical RuleEmpirical Rule

Detecting OutliersDetecting Outliers

22

An important measure of the shape of a An important measure of the shape of a distribution is called distribution is called skewnessskewness..

The formula for the skewness of sample data The formula for the skewness of sample data isis

Skewness can be easily computed using Skewness can be easily computed using statistical software.statistical software.

3

)2)(1(Skewness

s

xx

nn

n i

3

)2)(1(Skewness

s

xx

nn

n i

Distribution Shape: SkewnessDistribution Shape: Skewness

33


Rela

tive F

req

uen

cyR

ela

tive F

req

uen

cy

.05.05

.10.10

.15.15

.20.20

.25.25

.30.30

.35.35

00

Skewness = Skewness = 0 0

• Skewness is zero.Skewness is zero.

• Mean and median are equal.Mean and median are equal.

Symmetric (not skewed)Symmetric (not skewed)

44

Rela

tive F

req

uen

cyR

ela

tive F

req

uen

cy

.05.05

.10.10

.15.15

.20.20

.25.25

.30.30

.35.35

00

Skewness = Skewness = .31 .31

• Skewness is negative.Skewness is negative.

• Mean will usually be less than the median.Mean will usually be less than the median.


Moderately Skewed LeftModerately Skewed Left

55

Rela

tive F

req

uen

cyR

ela

tive F

req

uen

cy

.05.05

.10.10

.15.15

.20.20

.25.25

.30.30

.35.35

00

Skewness = .31 Skewness = .31

• Skewness is positive.Skewness is positive.

• Mean will usually be more than the median.Mean will usually be more than the median.


Moderately Skewed RightModerately Skewed Right

66


Highly Skewed RightHighly Skewed RightR

ela

tive F

req

uen

cyR

ela

tive F

req

uen

cy

.05.05

.10.10

.15.15

.20.20

.25.25

.30.30

.35.35

00

Skewness = 1.25 Skewness = 1.25

• Skewness is positive (often above 1.0).Skewness is positive (often above 1.0).

• Mean will usually be more than the median.Mean will usually be more than the median.

77

Seventy efficiency apartments were randomlySeventy efficiency apartments were randomly

sampled in a college town. The monthly rent sampled in a college town. The monthly rent pricesprices

for the apartments are listed below in for the apartments are listed below in ascending order. ascending order.


Example: Apartment RentsExample: Apartment Rents

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

88

Rela

tive F

req

uen

cyR

ela

tive F

req

uen

cy

.05.05

.10.10

.15.15

.20.20

.25.25

.30.30

.35.35

00

Skewness = .92 Skewness = .92



99

The The z-scorez-score is often called the standardized value. is often called the standardized value. The The z-scorez-score is often called the standardized value. is often called the standardized value.

It denotes the number of standard deviations a dataIt denotes the number of standard deviations a data value value xxii is from the mean. is from the mean. It denotes the number of standard deviations a dataIt denotes the number of standard deviations a data value value xxii is from the mean. is from the mean.

z-Scoresz-Scores

zx xsii

zx xsii

Excel’s STANDARDIZE function can be used toExcel’s STANDARDIZE function can be used to compute the z-score.compute the z-score. Excel’s STANDARDIZE function can be used toExcel’s STANDARDIZE function can be used to compute the z-score.compute the z-score.

1010

z-Scoresz-Scores

A data value less than the sample mean will have aA data value less than the sample mean will have a z-score less than zero.z-score less than zero. A data value greater than the sample mean will haveA data value greater than the sample mean will have a z-score greater than zero.a z-score greater than zero. A data value equal to the sample mean will have aA data value equal to the sample mean will have a z-score of zero.z-score of zero.

An observation’s z-score is a measure of the relativeAn observation’s z-score is a measure of the relative location of the observation in a data set.location of the observation in a data set.

1111

• z-Score of Smallest Value (425)

425 490.80 1.20

54.74ix x

zs

425 490.80

1.2054.74

ix xz

s

Standardized Values for Apartment RentsStandardized Values for Apartment Rents-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27


z-Scoresz-Scores

1212

Empirical RuleEmpirical Rule

When the data are believed to approximate aWhen the data are believed to approximate a bell-shaped distribution with moderate skew …bell-shaped distribution with moderate skew …

The empirical rule is based on the normalThe empirical rule is based on the normal distribution, which we will discuss later.distribution, which we will discuss later. The empirical rule is based on the normalThe empirical rule is based on the normal distribution, which we will discuss later.distribution, which we will discuss later.

The The empirical ruleempirical rule can be used to determine the can be used to determine the percentage of data values that must be within apercentage of data values that must be within a specified number of standard deviations of thespecified number of standard deviations of the mean.mean.

The The empirical ruleempirical rule can be used to determine the can be used to determine the percentage of data values that must be within apercentage of data values that must be within a specified number of standard deviations of thespecified number of standard deviations of the mean.mean.

1313

For data having a bell-shaped distribution, approximately

of the values are withinof the values are within of its mean.of its mean.

of the values are withinof the values are within of its mean.of its mean.

68.26%68.26%68.26%68.26%

+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation

of the values are within of the values are within of its mean.of its mean.

of the values are within of the values are within of its mean.of its mean.

95.44%95.44%95.44%95.44%

+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations+/- 2 standard deviations

of the values are within of the values are within of its mean.of its mean. of the values are within of the values are within of its mean.of its mean.

99.72%99.72%99.72%99.72%



1414

xx – – 33 – – 11

– – 22 + 1+ 1

+ 2+ 2 + 3+ 3

68.26%68.26%95.44%95.44%99.72%99.72%


1515

An An outlieroutlier is an unusually small or unusually large is an unusually small or unusually large value in a data set.value in a data set. A data value with a z-score less than -3 or greaterA data value with a z-score less than -3 or greater than +3 might be considered an outlier.than +3 might be considered an outlier.

It might be:It might be:• an incorrectly recorded data valuean incorrectly recorded data value• a data value that was incorrectly included in thea data value that was incorrectly included in the data setdata set• a data value that has occurred by chancea data value that has occurred by chance


1616

• The most extreme z-scores are -1.20 and 2.27The most extreme z-scores are -1.20 and 2.27

• Using |Using |zz| | >> 3 as the criterion for an outlier, there 3 as the criterion for an outlier, there are no outliers in this data set.are no outliers in this data set.

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.350.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.451.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27

Standardized Values for Apartment RentsStandardized Values for Apartment Rents



1717

Exploratory Data Analysis

Exploratory data analysisExploratory data analysis is looking at methods is looking at methods toto summarize data.summarize data. Exploratory data analysisExploratory data analysis is looking at methods is looking at methods toto summarize data.summarize data.

For now we simply sort the data values into ascending For now we simply sort the data values into ascending order and identify the order and identify the five-number summaryfive-number summary and then and thenconstruct a construct a box plotbox plot..

For now we simply sort the data values into ascending For now we simply sort the data values into ascending order and identify the order and identify the five-number summaryfive-number summary and then and thenconstruct a construct a box plotbox plot..

1818

Five-Number Summary

1111 Smallest ValueSmallest Value Smallest ValueSmallest Value

First QuartileFirst Quartile First QuartileFirst Quartile

MedianMedian MedianMedian

Third QuartileThird Quartile Third QuartileThird Quartile

Largest ValueLargest Value Largest ValueLargest Value

2222

3333

4444

5555

1919

Five-Number Summary

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 615

Lowest Value = 425Lowest Value = 425 First Quartile = 445First Quartile = 445

Median = 475Median = 475

Third Quartile = 525Third Quartile = 525Largest Value = 615Largest Value = 615


2020

Box PlotBox Plot

A A box plotbox plot is a graphical summary of data that is is a graphical summary of data that is based on a five-number summary.based on a five-number summary. A A box plotbox plot is a graphical summary of data that is is a graphical summary of data that is based on a five-number summary.based on a five-number summary.

A key to the development of a box plot is theA key to the development of a box plot is the computation of the median and the quartiles computation of the median and the quartiles QQ11 and and QQ33..

A key to the development of a box plot is theA key to the development of a box plot is the computation of the median and the quartiles computation of the median and the quartiles QQ11 and and QQ33..

Box plots provide another way to identify outliers.Box plots provide another way to identify outliers. Box plots provide another way to identify outliers.Box plots provide another way to identify outliers.

2121

They also tell us whether the data are skewed.They also tell us whether the data are skewed. They also tell us whether the data are skewed.They also tell us whether the data are skewed.

400400

425425

450450

475475

500500

525525

550550

575575

600600

625625

• A box is drawn with its ends located at the first andA box is drawn with its ends located at the first and

third quartiles (Qthird quartiles (Q11 & Q & Q33).).

Box PlotBox Plot

• A vertical line is drawn in the box at the location ofA vertical line is drawn in the box at the location of the median (second quartile).the median (second quartile).

Q1 = 445Q1 = 445 Q3 = 525Q3 = 525

Q2 = 475Q2 = 475


2222

Box PlotBox Plot

Limits are located (not drawn) using the Limits are located (not drawn) using the interquartile range (IQR = Qinterquartile range (IQR = Q33-Q-Q11): they are ): they are 1.5IQR below Q1.5IQR below Q11 and 1.5 IQR above Q and 1.5 IQR above Q33..

Data outside these limits are considered Data outside these limits are considered outliersoutliers..

The locations of each outlier is shown with the The locations of each outlier is shown with the

symbolsymbol * * ..

2323

Box PlotBox Plot

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• The lower limit is located 1.5(IQR) below The lower limit is located 1.5(IQR) below QQ1.1.

• The upper limit is located 1.5(IQR) above The upper limit is located 1.5(IQR) above QQ3.3.

• There are no outliers (values less than 325 orThere are no outliers (values less than 325 or greater than 645) in the apartment rent data.greater than 645) in the apartment rent data.


2424

• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

400400

425425

450450

475475

500500

525525

550550

575575

600600

625625

Smallest valueSmallest valueinside limits = 425inside limits = 425

Largest valueLargest valueinside limits = 615inside limits = 615


Box PlotBox Plot

2525

An excellent graphical technique for An excellent graphical technique for makingmaking

comparisons among two or more comparisons among two or more groups.groups.

Box PlotBox Plot

2626

Measures of Association Between Two Variables

Thus far we have examined numerical methods usedThus far we have examined numerical methods used to summarize the data for one variable at a time.to summarize the data for one variable at a time. Thus far we have examined numerical methods usedThus far we have examined numerical methods used to summarize the data for one variable at a time.to summarize the data for one variable at a time.

Often a manager or decision maker is interested inOften a manager or decision maker is interested in the the relationship between two variablesrelationship between two variables.. Often a manager or decision maker is interested inOften a manager or decision maker is interested in the the relationship between two variablesrelationship between two variables..

Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are between two variables are covariancecovariance and and correlationcorrelation coefficientcoefficient..

Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are between two variables are covariancecovariance and and correlationcorrelation coefficientcoefficient..

2727

Positive values indicate a positive relationship.Positive values indicate a positive relationship. Positive values indicate a positive relationship.Positive values indicate a positive relationship.

Negative values indicate a negative relationship.Negative values indicate a negative relationship. Negative values indicate a negative relationship.Negative values indicate a negative relationship.

The The covariancecovariance is a measure of the linear association is a measure of the linear association between two variables.between two variables. The The covariancecovariance is a measure of the linear association is a measure of the linear association between two variables.between two variables.

CovarianceCovariance

2828

CovarianceCovariance

The covariance is computed as follows:The covariance is computed as follows:

The covariance is computed as follows:The covariance is computed as follows:

forforsamplessamples

forforpopulationspopulations

1 1 n nxy

(x x)(y y) (x x)(y y)s

n 1

1 1 n n

xy

(x x)(y y) (x x)(y y)s

n 1

1 x 1 y N x N yxy

(x )(y ) (x )(y )

N

1 x 1 y N x N yxy

(x )(y ) (x )(y )

N

2929

Correlation CoefficientCorrelation Coefficient

There are also other types of associations not captured There are also other types of associations not captured by correlation.by correlation. There are also other types of associations not captured There are also other types of associations not captured by correlation.by correlation.

Correlation is a measure of linear association.Correlation is a measure of linear association. Correlation is a measure of linear association.Correlation is a measure of linear association.

3030

The correlation coefficient is computed as follows:The correlation coefficient is computed as follows:

The correlation coefficient is computed as follows:The correlation coefficient is computed as follows:

forforsamplessamples

forforpopulationspopulations

rs

s sxyxy

x yrs

s sxyxy

x y

xyxy

x y

xyxy

x y


3131

2 22 1 x N x

x

(x ) (x )

N

2 22 1 x N x

x

(x ) (x )

N

2 22 1 nx

(x x) (x x)s

n 1

2 2

2 1 nx

(x x) (x x)s

n 1

2 21 y N y2

y

(y ) (y )

N

2 21 y N y2

y

(y ) (y )

N

2 22 1 ny

(y y) (y y)s

n 1

2 2

2 1 ny

(y y) (y y)s

n 1

Values near +1 indicate a Values near +1 indicate a strong positive linearstrong positive linear relationshiprelationship.. Values near +1 indicate a Values near +1 indicate a strong positive linearstrong positive linear relationshiprelationship..

Values near -1 indicate a Values near -1 indicate a strong negative linearstrong negative linear relationshiprelationship. . Values near -1 indicate a Values near -1 indicate a strong negative linearstrong negative linear relationshiprelationship. .

The coefficient can take on values between -1 and +1.The coefficient can take on values between -1 and +1. The coefficient can take on values between -1 and +1.The coefficient can take on values between -1 and +1.

The closer the correlation is to zero, the weaker theThe closer the correlation is to zero, the weaker the relationship.relationship. The closer the correlation is to zero, the weaker theThe closer the correlation is to zero, the weaker the relationship.relationship.


3232

CorrelationCorrelation

A Positive Relationship: correlation close to 1

xx

yy

3333


xx

yy

3434

A Negative Relationship: correlation close to -1


No Apparent Relationship: Correlation near 0

xx

yy

3535

A golfer is interested in investigating theA golfer is interested in investigating the

relationship, if any, between driving distance relationship, if any, between driving distance and 18-hole score.and 18-hole score.

277.6277.6259.5259.5269.1269.1267.0267.0255.6255.6272.9272.9

696971717070707071716969

Average DrivingAverage DrivingDistance (yds.)Distance (yds.)

AverageAverage18-Hole Score18-Hole Score

Covariance and Correlation CoefficientCovariance and Correlation Coefficient

Example: Golfing StudyExample: Golfing Study

3636


277.6277.6259.5259.5269.1269.1267.0267.0255.6255.6272.9272.9

696971717070707071716969

xx yy

10.6510.65 -7.45-7.45 2.152.15 0.050.05-11.35-11.35 5.955.95

-1.0-1.0 1.01.0 00 00 1.01.0-1.0-1.0

-10.65-10.65 -7.45-7.45 00 00-11.35-11.35 -5.95-5.95

( )ix x( )ix x ( )( )i ix x y y ( )( )i ix x y y ( )iy y( )iy y

AverageAverageStd. Dev.Std. Dev.

267.0267.0 70.070.0 -35.40-35.408.21928.2192.8944.8944

TotalTotal


3737

• Sample CovarianceSample Covariance

• Sample Correlation CoefficientSample Correlation Coefficient


7.08 -.9631

(8.2192)(.8944)xy

xyx y

sr

s s

7.08

-.9631(8.2192)(.8944)

xyxy

x y

sr

s s

( )( ) 35.40 7.08

1 6 1i i

xy

x x y ys

n

( )( ) 35.40

7.081 6 1

i ixy

x x y ys

n


3838

So, increasing driving distance decreases score, and the relationis really strong.

A A random variablerandom variable is a numerical description of the is a numerical description of the outcome of an experiment.outcome of an experiment. A A random variablerandom variable is a numerical description of the is a numerical description of the outcome of an experiment.outcome of an experiment.

Random VariablesRandom Variables

A A discrete random variablediscrete random variable may assume either a may assume either a finite number of values or an infinite sequence offinite number of values or an infinite sequence of values.values.

A A discrete random variablediscrete random variable may assume either a may assume either a finite number of values or an infinite sequence offinite number of values or an infinite sequence of values.values.

A A continuous random variablecontinuous random variable may assume any may assume any numerical value in an interval or collection ofnumerical value in an interval or collection of intervals.intervals.

A A continuous random variablecontinuous random variable may assume any may assume any numerical value in an interval or collection ofnumerical value in an interval or collection of intervals.intervals.

3939

Random VariablesRandom Variables

QuestionQuestion Random Variable Random Variable xx TypeType

FamilyFamilysizesize

xx = Number of dependents = Number of dependents reported on tax returnreported on tax return

DiscreteDiscrete

Distance fromDistance fromhome to storehome to store

xx = Distance in miles from = Distance in miles from home to the store sitehome to the store site

ContinuousContinuous

Own dogOwn dogor cator cat

xx = 1 if own no pet; = 1 if own no pet; = 2 if own dog(s) only; = 2 if own dog(s) only; = 3 if own cat(s) only; = 3 if own cat(s) only; = 4 if own dog(s) and cat(s)= 4 if own dog(s) and cat(s)

DiscreteDiscrete

4040

The The probability distributionprobability distribution for a random variable for a random variable describes how probabilities are distributed overdescribes how probabilities are distributed over the values of the random variable.the values of the random variable.

The The probability distributionprobability distribution for a random variable for a random variable describes how probabilities are distributed overdescribes how probabilities are distributed over the values of the random variable.the values of the random variable.

Discrete Probability DistributionsDiscrete Probability Distributions

4141

The probability distribution is defined by aThe probability distribution is defined by a probability functionprobability function, denoted by , denoted by ff((xx), which provides), which provides the probability for each value of the random variable.the probability for each value of the random variable.

The probability distribution is defined by aThe probability distribution is defined by a probability functionprobability function, denoted by , denoted by ff((xx), which provides), which provides the probability for each value of the random variable.the probability for each value of the random variable.

The required conditions for a discrete probabilityThe required conditions for a discrete probability function are:function are: The required conditions for a discrete probabilityThe required conditions for a discrete probability function are:function are:

Discrete Probability DistributionsDiscrete Probability Distributions

ff((xx) ) >> 0 0 (probabilities are not negative)ff((xx) ) >> 0 0 (probabilities are not negative)

ff((xx) = 1 ) = 1 (sum of all probabilities =1)ff((xx) = 1 ) = 1 (sum of all probabilities =1)

Remember that any probability is a number between 0 and 1.

4242

Expected Value

The The expected valueexpected value, or mean, of a random variable, or mean, of a random variable is a measure of its central location.is a measure of its central location. The The expected valueexpected value, or mean, of a random variable, or mean, of a random variable is a measure of its central location.is a measure of its central location.

The expected value does not have to be a value theThe expected value does not have to be a value the random variable can assume.random variable can assume. The expected value does not have to be a value theThe expected value does not have to be a value the random variable can assume.random variable can assume.

EE((xx) = ) = = = xfxf((xx))EE((xx) = ) = = = xfxf((xx))

4343

Variance and Standard DeviationVariance and Standard Deviation

The The variancevariance summarizes the variability in the summarizes the variability in the values of a random variable.values of a random variable. The The variancevariance summarizes the variability in the summarizes the variability in the values of a random variable.values of a random variable.

Var(Var(xx) = ) = 22 = = ((xx - - ))22ff((xx))Var(Var(xx) = ) = 22 = = ((xx - - ))22ff((xx))

The The standard deviationstandard deviation, , , is defined as the positive, is defined as the positive square root of the variance.square root of the variance. The The standard deviationstandard deviation, , , is defined as the positive, is defined as the positive square root of the variance.square root of the variance.

4444

Four Properties of a Binomial Experiment

3. The probability of a success, denoted by 3. The probability of a success, denoted by pp, does, does not change from trial to trial.not change from trial to trial.3. The probability of a success, denoted by 3. The probability of a success, denoted by pp, does, does not change from trial to trial.not change from trial to trial.

4. The trials are independent.4. The trials are independent.4. The trials are independent.4. The trials are independent.

2. Two outcomes, 2. Two outcomes, successsuccess and and failurefailure, are possible, are possible on each trial.on each trial.2. Two outcomes, 2. Two outcomes, successsuccess and and failurefailure, are possible, are possible on each trial.on each trial.

1. The experiment consists of a sequence of 1. The experiment consists of a sequence of nn identical trials.identical trials.1. The experiment consists of a sequence of 1. The experiment consists of a sequence of nn identical trials.identical trials.

Binomial Probability DistributionBinomial Probability Distribution

4545

Our interest is in the Our interest is in the number of successesnumber of successes occurring in the occurring in the nn trials. trials. Our interest is in the Our interest is in the number of successesnumber of successes occurring in the occurring in the nn trials. trials.


4646

where:where:

xx = the number of successes = the number of successes

pp = the probability of a success on one trial = the probability of a success on one trial

nn = the number of trials = the number of trials ff((xx) = the probability of ) = the probability of xx successes in successes in nn trials trials

( )( ) (1 ) n xxn

f x p px

( )( ) (1 ) n xxn

f x p px


Binomial Probability FunctionBinomial Probability Function

1 2 3! =

( )! ! 1 2 3 ( ) 1 2 3

n nnx n x x n x x

= `n choose x’ = number of ways x people can be chosen out of n 4747

( )( ) (1 ) n xxn

f x p px

( )( ) (1 ) n xxn

f x p px


Binomial Probability Binomial Probability FunctionFunction

Probability of a particularProbability of a particular sequence of trial outcomessequence of trial outcomes with x successes in with x successes in nn trials trials

Probability of a particularProbability of a particular sequence of trial outcomessequence of trial outcomes with x successes in with x successes in nn trials trials

Number of experimentalNumber of experimental outcomes providing exactlyoutcomes providing exactlyxx successes in successes in nn trials trials

Number of experimentalNumber of experimental outcomes providing exactlyoutcomes providing exactlyxx successes in successes in nn trials trials

These values are available in Table 5 of our textbook.

4848


Example: IIT EntranceExample: IIT Entrance

It is known that about 10% of the examinees It is known that about 10% of the examinees taking the IIT entrance qualify.taking the IIT entrance qualify.

Choosing 3 examinees at random, what isChoosing 3 examinees at random, what is

the probability that exactly 1 of them will the probability that exactly 1 of them will qualify?qualify?

Thus, for any examinee chosen at random, Thus, for any examinee chosen at random, there is a probability of 0.1 that the person there is a probability of 0.1 that the person will qualify.will qualify.

4949


f xn

x n xp px n x( )

!!( )!

( )( )

1f xn

x n xp px n x( )

!!( )!

( )( )

1

1 23!(1) (0.1) (0.9) 3(.1)(.81) .243

1!(3 1)!f

1 23!

(1) (0.1) (0.9) 3(.1)(.81) .2431!(3 1)!

f

pp = .10, = .10, nn = 3, = 3, xx = 1 = 1

Example: IIT EntranceExample: IIT EntranceUsing theUsing theprobabilityprobabilityfunctionfunction

Using theUsing theprobabilityprobabilityfunctionfunction

You can just check the binomial probability table in textbook forn= 3, p = 0.1, x = 1.

Or, in Excel, use ‘=BINOMDIST(1,3,0.1,FALSE)’

Just f(1) if FALSE,

f(0)+f(1) if TRUE 5050

(1 )np p (1 )np p

EE((xx) = ) = = = npnp

Var(Var(xx) = ) = 22 = = npnp(1 (1 pp))

Expected ValueExpected Value

VarianceVariance

Standard DeviationStandard Deviation


5151

3(.1)(.9) .52 em ployees 3(.1)(.9) .52 em ployees

EE((xx) = ) = npnp = 3(.1) = .3 employees out of 3 = 3(.1) = .3 employees out of 3

Var(Var(xx) = ) = npnp(1 – (1 – pp) = 3(.1)(.9) = .27) = 3(.1)(.9) = .27

• Expected ValueExpected Value

• VarianceVariance

• Standard DeviationStandard Deviation

Example: Evans ElectronicsExample: Evans Electronics


5252

A Poisson distributed random variable is oftenA Poisson distributed random variable is often useful in estimating the number of occurrencesuseful in estimating the number of occurrences over a over a specified interval of time or spacespecified interval of time or space

A Poisson distributed random variable is oftenA Poisson distributed random variable is often useful in estimating the number of occurrencesuseful in estimating the number of occurrences over a over a specified interval of time or spacespecified interval of time or space

It is a discrete random variable that may assumeIt is a discrete random variable that may assume an an infinite sequence of valuesinfinite sequence of values (x = 0, 1, 2, . . . ). (x = 0, 1, 2, . . . ). It is a discrete random variable that may assumeIt is a discrete random variable that may assume an an infinite sequence of valuesinfinite sequence of values (x = 0, 1, 2, . . . ). (x = 0, 1, 2, . . . ).

Poisson Probability DistributionPoisson Probability Distribution

5353

Examples of a Poisson distributed random variable:Examples of a Poisson distributed random variable: Examples of a Poisson distributed random variable:Examples of a Poisson distributed random variable:

the number of defects in 14 pages of a bookthe number of defects in 14 pages of a book the number of defects in 14 pages of a bookthe number of defects in 14 pages of a book

the number of customers arriving at the postthe number of customers arriving at the postoffice in one houroffice in one hour the number of customers arriving at the postthe number of customers arriving at the postoffice in one houroffice in one hour


Bell Labs used the Poisson distribution to model theBell Labs used the Poisson distribution to model the arrival of phone calls.arrival of phone calls. Bell Labs used the Poisson distribution to model theBell Labs used the Poisson distribution to model the arrival of phone calls.arrival of phone calls.

5454


Two Properties of a Poisson ExperimentTwo Properties of a Poisson Experiment

2.2. The occurrence or nonoccurrence in any timeThe occurrence or nonoccurrence in any time interval is independent of the occurrence orinterval is independent of the occurrence or nonoccurrence in any other time interval.nonoccurrence in any other time interval.

2.2. The occurrence or nonoccurrence in any timeThe occurrence or nonoccurrence in any time interval is independent of the occurrence orinterval is independent of the occurrence or nonoccurrence in any other time interval.nonoccurrence in any other time interval.

1.1. The probability of an occurrence is the sameThe probability of an occurrence is the same for any two time intervals of equal length.for any two time intervals of equal length.1.1. The probability of an occurrence is the sameThe probability of an occurrence is the same for any two time intervals of equal length.for any two time intervals of equal length.

5555

Poisson Probability Function

f xex

x( )

!

f x

ex

x( )

!

where:where:

xx = the number of occurrences in an interval = the number of occurrences in an interval

f(x) f(x) = the probability of = the probability of xx occurrences in an interval occurrences in an interval

= mean number of occurrences in an interval= mean number of occurrences in an interval

ee = 2.71828 = 2.71828



5656

Poisson Probability FunctionPoisson Probability Function

In practical applications, In practical applications, xx will eventually become will eventually becomelarge enough so that large enough so that ff((xx) is very small and negligible.) is very small and negligible.In practical applications, In practical applications, xx will eventually become will eventually becomelarge enough so that large enough so that ff((xx) is very small and negligible.) is very small and negligible.

Since there is no stated upper limit for the numberSince there is no stated upper limit for the numberof occurrences, the probability function of occurrences, the probability function ff((xx) is) isapplicable for values applicable for values xx = 0, 1, 2, … without limit. = 0, 1, 2, … without limit.

Since there is no stated upper limit for the numberSince there is no stated upper limit for the numberof occurrences, the probability function of occurrences, the probability function ff((xx) is) isapplicable for values applicable for values xx = 0, 1, 2, … without limit. = 0, 1, 2, … without limit.


5757

Example: Mercy HospitalExample: Mercy Hospital

Patients arrive at the emergency room of Patients arrive at the emergency room of MercyMercy

Hospital at the average rate of 6 per hour onHospital at the average rate of 6 per hour on

weekend evenings.weekend evenings.What is the probability of 4 arrivals in 30 What is the probability of 4 arrivals in 30 minutesminutes

on a weekend evening?on a weekend evening?


5858


4 33 (2.71828)

(4) .168014!

f

4 33 (2.71828)

(4) .168014!

f

= 6/hour = 3/half-hour, = 6/hour = 3/half-hour, xx = 4 = 4

Example: Mercy HospitalExample: Mercy Hospital Using theUsing theprobabilityprobabilityfunctionfunction

Using theUsing theprobabilityprobabilityfunctionfunction

Or, simply check the table for Poisson probabilities in the bookfor μ = 3, x = 4.

In Excel, use ‘=POISSON(4,3,FALSE)’Just f(4) if FALSE, f(0)+f(1)+...+f(4)

if TRUE

5959


Poisson Probabilities

0.00

0.05

0.10

0.15

0.20

0.25

0 1 2 3 4 5 6 7 8 9 10

Number of Arrivals in 30 Minutes

Pro

bab

ilit

y

actually, actually, the the sequencesequencecontinues:continues:11, 12, …11, 12, …

Example: Mercy HospitalExample: Mercy Hospital

6060


A property of the Poisson distribution is thatA property of the Poisson distribution is thatthe mean and variance are equal.the mean and variance are equal.

A property of the Poisson distribution is thatA property of the Poisson distribution is thatthe mean and variance are equal.the mean and variance are equal.

= = 22

6161

Continuous Probability Distributions

A A continuous random variablecontinuous random variable can assume any can assume any value in an interval on the real line or in a value in an interval on the real line or in a collection of intervals.collection of intervals.

It is not possible to talk about the probability It is not possible to talk about the probability of the random variable assuming a particular of the random variable assuming a particular value.value. Instead, we talk about the probability of the Instead, we talk about the probability of the random variable assuming a value within a random variable assuming a value within a given interval.given interval.

6262

6363

We denote the ‘density function’ by f(x). AlsoWe denote the ‘density function’ by f(x). Also

2

( ) 0; ( ) 1

( ) ( )

( ) ( ) ( )

f x f x dx

E X xf x dx

Var X x E X f x dx

2

( ) 0; ( ) 1

( ) ( )

( ) ( ) ( )

f x f x dx

E X xf x dx

Var X x E X f x dx

Area as a Measure of ProbabilityArea as a Measure of Probability

The area under the graph of The area under the graph of ff((xx) and ) and probability are identical.probability are identical.

This is valid for all continuous random This is valid for all continuous random variables.variables. The probability that The probability that xx takes on a value takes on a value between some lower value between some lower value xx11 and some higher and some higher value value xx22 can be found by computing the area can be found by computing the area under the graph of under the graph of ff((xx) over the interval from ) over the interval from xx11 to to xx22. .

6464

The normal probability distribution is the most important distribution for describing a continuous random variable.

It is used in a wide variety of applicationsIt is used in a wide variety of applications

including:including:

• Heights of Heights of peoplepeople• Rainfall Rainfall amountsamounts

• Test scoresTest scores• Scientific Scientific measurementsmeasurements

Normal Probability DistributionNormal Probability Distribution

For a large number of similar variables that are For a large number of similar variables that are unrelated, sum and average are approximately unrelated, sum and average are approximately normal.normal.

6565

Normal Probability Density Function

2 2( ) / 21( )

2xf x e

2 2( ) / 21( )

2xf x e

= mean= mean

= standard deviation= standard deviation

= 3.14159= 3.14159

ee = 2.71828 = 2.71828

where:where:


6666

The distribution is The distribution is symmetricsymmetric; its skewness; its skewness measure is zero.measure is zero. The distribution is The distribution is symmetricsymmetric; its skewness; its skewness measure is zero.measure is zero.


CharacteristicsCharacteristics

xx

6767

The The highest pointhighest point on the normal curve is at the on the normal curve is at the meanmean, the middle point., the middle point. The The highest pointhighest point on the normal curve is at the on the normal curve is at the meanmean, the middle point., the middle point.



xx

6868



-10-10 00 2525

The mean can be any numerical value: negative,The mean can be any numerical value: negative, zero, or positive.zero, or positive. The mean can be any numerical value: negative,The mean can be any numerical value: negative, zero, or positive.zero, or positive.

xx

6969



= 15= 15

= 25= 25

The standard deviation determines the width of theThe standard deviation determines the width of thecurve: larger values result in wider, flatter curves.curve: larger values result in wider, flatter curves.The standard deviation determines the width of theThe standard deviation determines the width of thecurve: larger values result in wider, flatter curves.curve: larger values result in wider, flatter curves.

xx

7070

Probabilities for the normal random variable areProbabilities for the normal random variable are given by given by areas under the curveareas under the curve. The total area. The total area under the curve is 1 (.5 to the left of the mean andunder the curve is 1 (.5 to the left of the mean and .5 to the right)..5 to the right).

Probabilities for the normal random variable areProbabilities for the normal random variable are given by given by areas under the curveareas under the curve. The total area. The total area under the curve is 1 (.5 to the left of the mean andunder the curve is 1 (.5 to the left of the mean and .5 to the right)..5 to the right).



.5.5 .5.5

xx

7171


Characteristics (basis for the empirical rule)Characteristics (basis for the empirical rule)

of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean. of values of a normal random variableof values of a normal random variable are within of its mean.are within of its mean.68.26%68.26%68.26%68.26%

+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation+/- 1 standard deviation





7272


Characteristics (basis for the empirical rule)Characteristics (basis for the empirical rule)

xx – – 33 – – 11

– – 22 + 1+ 1

+ 2+ 2 + 3+ 3

68.26%68.26%95.44%95.44%99.72%99.72%

7373

A random variable having a normal distributionA random variable having a normal distribution with a mean of 0 and a standard deviation of 1 iswith a mean of 0 and a standard deviation of 1 is said to have a said to have a standard normal probabilitystandard normal probability distributiondistribution..

A random variable having a normal distributionA random variable having a normal distribution with a mean of 0 and a standard deviation of 1 iswith a mean of 0 and a standard deviation of 1 is said to have a said to have a standard normal probabilitystandard normal probability distributiondistribution..


Standard Normal Probability DistributionStandard Normal Probability Distribution

7474

00zz

The letter The letter z z is used to designate the standardis used to designate the standard normal random variable.normal random variable. The letter The letter z z is used to designate the standardis used to designate the standard normal random variable.normal random variable.



7575

Converting to the Standard Normal Converting to the Standard Normal DistributionDistribution


zx

zx

7676

The daily demand of the new ipad in a store The daily demand of the new ipad in a store seems to follow a normal distribution with an seems to follow a normal distribution with an average of 15 and a standard deviation of 6. average of 15 and a standard deviation of 6.

Example: DemandExample: Demand

The manager, who does not want to keep The manager, who does not want to keep more than 20 ipads in his store at a time, more than 20 ipads in his store at a time, would like to know the probability of a would like to know the probability of a stockout, i.e. that the demand in a day will stockout, i.e. that the demand in a day will exceed 20. exceed 20.

PP((xx > 20) = ? > 20) = ?

7777

zz = ( = (xx - - )/)/ = (20 - 15)/6= (20 - 15)/6 = .83= .83

zz = ( = (xx - - )/)/ = (20 - 15)/6= (20 - 15)/6 = .83= .83

Solving for the Stockout ProbabilitySolving for the Stockout Probability

Step 1: Convert Step 1: Convert xx to the standard normal distribution. to the standard normal distribution.Step 1: Convert Step 1: Convert xx to the standard normal distribution. to the standard normal distribution.

7878

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

. . . . . . . . . . .

.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224

.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549

.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852

.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133

.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389

. . . . . . . . . . .

Cumulative Probability Table for Standard Normal Cumulative Probability Table for Standard Normal DistributionDistribution

PP((zz << .83) .83)


Step 2: Find the area under the standard normalStep 2: Find the area under the standard normal curve to the left of curve to the left of zz = .83. = .83.Step 2: Find the area under the standard normalStep 2: Find the area under the standard normal curve to the left of curve to the left of zz = .83. = .83.

7979

8080

In Excel, use ‘=NORMDIST(0.83,0,1,TRUE)’

Just f(0.83) if FALSE, area upto 0.83 if

TRUE

In fact, you can straightaway use ‘=NORMDIST(20,15,6,TRUE)’

P(X ≤ 20) with μ = 15,

σ = 6

PP((z z > .83) = 1 – > .83) = 1 – PP((zz << .83) .83) = 1- .7967= 1- .7967

= .2033= .2033

PP((z z > .83) = 1 – > .83) = 1 – PP((zz << .83) .83) = 1- .7967= 1- .7967

= .2033= .2033


Step 3: Compute the area under the standard normalStep 3: Compute the area under the standard normal curve to the right of curve to the right of zz = .83. = .83.Step 3: Compute the area under the standard normalStep 3: Compute the area under the standard normal curve to the right of curve to the right of zz = .83. = .83.

ProbabilityProbability of a of a stockoutstockout

PP((xx > > 20)20)


8181


00 .83.83

Area = .7967Area = .7967Area = 1 - .7967Area = 1 - .7967

= .2033= .2033

zz



8282


If the manager of wants the probability of a If the manager of wants the probability of a stockout during replenishment lead-time to stockout during replenishment lead-time to be no more than .05, what should the reorder be no more than .05, what should the reorder point be?point be?

------------------------------------------------------------------------------------------------------------------------------(Hint: Given a probability, we can use the (Hint: Given a probability, we can use the

standardstandard

normal table in an inverse fashion to find thenormal table in an inverse fashion to find the

corresponding corresponding zz value. Give it a try.) value. Give it a try.)

8383

End of Lecture

8484

poisson statistics

Documents

skewness of sample data

data set

data value greater

data value equal

recorded data value

percentage of data values

sample mean

empirical rulefor data