poisson statistics

Download Poisson statistics

Post on 26-Jan-2015

111 views

Category:

Documents

3 download

Embed Size (px)

DESCRIPTION

POISSON PROBABILITYDISTRIBUTION COVARIANCE BINOMIAL DISTRIBUTIION

TRANSCRIPT

  • 1. Measures of Distribution Shape,Relative Location, and Detecting OutliersDistribution Shapez-ScoresEmpirical RuleDetecting Outliers2

2. Distribution Shape: SkewnessAn important measure of the shape of a distributionis called skewness.The formula for the skewness of sample data is 3 n xi x Skewness = s (n 1)( n 2) Skewness can be easily computed using statisticalsoftware.3 3. Distribution Shape: SkewnessSymmetric (not skewed) Skewness is zero. Mean and median are equal. .35 Skewness = 0 .30Relative Frequency .25 .20 .15 .10 .0504 4. Distribution Shape: Skewness Moderately Skewed Left Skewness is negative. Mean will usually be less than the median. .35 Skewness = .31 .30Relative Frequency .25 .20 .15 .10 .0505 5. Distribution Shape: Skewness Moderately Skewed Right Skewness is positive. Mean will usually be more than the median. .35 Skewness = .31 .30Relative Frequency .25 .20 .15 .10 .0506 6. Distribution Shape: SkewnessHighly Skewed Right Skewness is positive (often above 1.0). Mean will usually be more than the median..35 Skewness = 1.25.30 Relative Frequency.25.20.15.10.05 07 7. Distribution Shape: SkewnessExample: Apartment RentsSeventy efficiency apartments were randomlysampled in a college town. The monthly rent pricesfor the apartments are listed below in ascending order. 425 430 430 435 435 435 435 435 440 440 440 440 440 445 445 445 445 445 450 450 450 450 450 450 450 460 460 460 465 465 465 470 470 472 475 475 475 480 480 480 480 485 490 490 490 500 500 500 500 510 510 515 525 525 525 535 549 550 570 570 575 575 580 590 600 600 600 600 615 615 8 8. Distribution Shape: SkewnessExample: Apartment Rents.35 Skewness = .92.30 Relative Frequency.25.20.15.10.05 0 9 9. z-Scores The z-score is often called the standardized value. The z-score is often called the standardized value. It denotes the number of standard deviations a data It denotes the number of standard deviations a data value xii is from the mean. value x is from the mean.xi x zi = s Excels STANDARDIZE function can be used to Excels STANDARDIZE function can be used to compute the z-score. compute the z-score. 10 10. z-Scores An observations z-score is a measure of the relativelocation of the observation in a data set. A data value less than the sample mean will have az-score less than zero. A data value greater than the sample mean will havea z-score greater than zero. A data value equal to the sample mean will have az-score of zero.11 11. z-Scores Example: Apartment Rents z-Score of Smallest Value (425)xi x 425 490.80 z= = = 1.20 s54.74Standardized Values for Apartment Rents-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.170.350.35 0.440.620.620.62 0.81 1.06 1.08 1.451.451.54 1.541.631.811.99 1.99 1.99 1.99 2.272.2712 12. Empirical RuleWhen the data are believed to approximate abell-shaped distribution with moderate skew The empirical rule can be used to determine the The empirical rule can be used to determine the percentage of data values that must be within a percentage of data values that must be within a specified number of standard deviations of the specified number of standard deviations of the mean. mean. The empirical rule is based on the normal The empirical rule is based on the normal distribution, which we will discuss later. distribution, which we will discuss later. 13 13. Empirical RuleFor data having a bell-shaped distribution, approximately 68.26% of the values are withinof the values are within+/- 1 standard deviationof its mean. of its mean. 95.44% values are within of the values are within of the +/- 2 standard deviations of its mean.of its mean. 99.72% values are within of the values are within of the +/- 3 standard deviations its mean. of its mean. of14 14. Empirical Rule99.72%95.44%68.26% x 3 1 + 1 + 3 2 + 2 15 15. Detecting Outliers An outlier is an unusually small or unusually largevalue in a data set. A data value with a z-score less than -3 or greaterthan +3 might be considered an outlier. It might be: an incorrectly recorded data value a data value that was incorrectly included in the data set a data value that has occurred by chance16 16. Detecting Outliers Example: Apartment Rents The most extreme z-scores are -1.20 and 2.27 Using |z| > 3 as the criterion for an outlier, thereare no outliers in this data set. Standardized Values for Apartment Rents-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20-0.20 -0.11 -0.01 -0.01 -0.010.17 0.170.170.17 0.35 0.350.44 0.620.62 0.620.81 1.061.081.45 1.45 1.541.54 1.631.81 1.991.99 1.991.992.27 2.2717 17. Exploratory Data Analysis Exploratory data analysis is looking at methods Exploratory data analysis is looking at methodsto summarize data.to summarize data. For now we simply sort the data values into ascending For now we simply sort the data values into ascendingorder and identify the five-number summary and thenorder and identify the five-number summary and thenconstruct a box plot..construct a box plot 18 18. Five-Number Summary1 Smallest Value2 First Quartile3 Median4 Third Quartile5 Largest Value19 19. Five-Number Summary Example: Apartment RentsLowest Value = 425First Quartile = 445Median = 475Third Quartile = 525 Largest Value = 615425 430 430 435 435 435 435 435 440 440440 440 440 445 445 445 445 445 450 450450 450 450 450 450 460 460 460 465 465465 470 470 472 475 475 475 480 480 480480 485 490 490 490 500 500 500 500 510510 515 525 525 525 535 549 550 570 570575 575 580 590 600 600 600 600 615 61520 20. Box Plot A box plot is a graphical summary of data that is A box plot is a graphical summary of data that is based on a five-number summary. based on a five-number summary. A key to the development of a box plot is the A key to the development of a box plot is the computation of the median and the quartiles Q11 and computation of the median and the quartiles Q and Q33.. Q Box plots provide another way to identify outliers. Box plots provide another way to identify outliers. They also tell us whether the data are skewed. They also tell us whether the data are skewed. 21 21. Box Plot Example: Apartment Rents A box is drawn with its ends located at the first andthird quartiles (Q1 & Q3). A vertical line is drawn in the box at the location ofthe median (second quartile).400 425 450 475 500 525 550 575 600 625Q1 = 445Q3 = 525 Q2 = 475 22 22. Box Plot Limits are located (not drawn) using the interquartilerange (IQR = Q3-Q1): they are 1.5IQR below Q1 and1.5 IQR above Q3. Data outside these limits are considered outliers. The locations of each outlier is shown with thesymbol * . 23 23. Box Plot Example: Apartment Rents The lower limit is located 1.5(IQR) below Q1. Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325 The upper limit is located 1.5(IQR) above Q3. Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645 There are no outliers (values less than 325 or greater than 645) in the apartment rent data.24 24. Box Plot Example: Apartment Rents Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. 400 425 450 475 500 525 550 575 600 625Smallest valueLargest valueinside limits = 425 inside limits = 61525 25. Box PlotAn excellent graphical technique for making comparisons among two or more groups.26 26. Measures of AssociationBetween Two Variables Thus far we have examined numerical methods used Thus far we have examined numerical methods used to summarize the data for one variable at a time. to summarize the data for one variable at a time. Often a manager or decision maker is interested inOften a manager or decision maker is interested in the relationship between two variables..the relationship between two variables Two descriptive measures of the relationship Two descriptive measures of the relationship between two variables are covariance and correlation between two variables are covariance and correlation coefficient.. coefficient27 27. Covariance The covariance is a measure of the linear association The covariance is a measure of the linear association between two variables. between two variables. Positive values indicate a positive relationship. Positive values indicate a positive relationship. Negative values indicate a negative relationship. Negative values indicate a negative relationship. 28 28. CovarianceThe covariance is computed as follows:The covariance is computed as follows:(x1 x )(y1 y ) + L + (x N x )(y N y ) xy = for Npopulations (x1 x)(y1 y) + L + (x n x)(y n y)s xy = for n1 samples29 29. Correlation Coefficient Correlation is a measure of linear association. Correlation is a measure of linear association. There are also other types of associations not captured There are also other types of associations not captured by correlation. by correlation. 30 30. Correlation Coefficient The correlation coefficient is computed as follows: The correlation coefficient is computed as follows: sxy xy rxy = xy = sx s y x yforforsamplespopulations(x1 x ) + L + (x N x ) 2 2 (y1 y )2 + L + (y N y )2 =2 2= xN y N (x1 x)2 + L + (x n x) 2 (y1 y) + L + (y n y)22s2 = x s =2n1 y n131 31. Correlation Coefficient The coefficient can take on values between -1 and +1. The coefficient can take on values between -1 and +1. Values near -1 indicate a strong negative linear Values near -1 indicate a strong negative linear relationship.. relationship Values near +1 indicate a strong positive linear Values near +1 indicate a strong positive linear relationship.. relationship The closer the