south dakota school of mines & technology introduction to probability & statistics...
TRANSCRIPT
South Dakota School of Mines & Technology
Introduction to Probability & Statistics
Industrial Engineering
Data Analysis
Histograms
Experimental Data
Suppose we wish to estimate the time to failure for a new power supply. Forty units are randomly selected and tested to failure. The failure times are recorded as follows:
 2.7  25.8  19.6   4.5   0.5
 6.4  18.3  41.6   5.8  73.8
13.9  32.2  27.7   5.1  12.0
34.9  21.0  10.2  46.1  37.9
14.9  24.1   1.0  29.8   3.3
 7.1  59.9   9.4  12.9   7.9
11.1   2.1  16.0  22.5   8.6
 3.8  51.8   1.6  17.1  14.7
Histogram
Perhaps the most useful method, histograms give the analyst a feel for the distribution from which the data were obtained.
Count the observations that fall within each of a set of ranges
Aim for an average of about 5 observations per interval class
Range for power supply data: 0.5 - 73.8
Intervals:
 0.0 - 10.0    40.1 - 50.0
10.1 - 20.0    50.1 - 60.0
20.1 - 30.0    60.1 - 70.0
30.1 - 40.0    70.1 - 80.0
Histogram
Class Interval  0.0 - 10.0: Count = 15
Class Interval 10.1 - 20.0: Count = 11
Histogram
Class Intervals   Frequency
 0.0 - 10.0         15
10.1 - 20.0         11
20.1 - 30.0          6
30.1 - 40.0          3
40.1 - 50.0          2
50.1 - 60.0          2
60.1 - 70.0          0
70.1 - 80.0          1
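The class counts can be reproduced with a short script. The following is a minimal Python sketch, not part of the original slides; the names `failure_times` and `frequency_table` are illustrative, and it assumes each class includes its upper limit (so 10.0 belongs to 0.0 - 10.0 and 10.1 to 10.1 - 20.0).

```python
import math

# The 40 power supply failure times listed in the Experimental Data slide.
failure_times = [
    2.7, 25.8, 19.6, 4.5, 0.5,
    6.4, 18.3, 41.6, 5.8, 73.8,
    13.9, 32.2, 27.7, 5.1, 12.0,
    34.9, 21.0, 10.2, 46.1, 37.9,
    14.9, 24.1, 1.0, 29.8, 3.3,
    7.1, 59.9, 9.4, 12.9, 7.9,
    11.1, 2.1, 16.0, 22.5, 8.6,
    3.8, 51.8, 1.6, 17.1, 14.7,
]


def frequency_table(data, width, n_classes):
    """Count observations in classes (0, width], (width, 2*width], ..."""
    counts = [0] * n_classes
    for x in data:
        # Assumption: upper class limits are inclusive; a value of 0 goes to the first class.
        k = max(math.ceil(x / width) - 1, 0)
        counts[k] += 1
    return counts


counts = frequency_table(failure_times, width=10.0, n_classes=8)
for k, c in enumerate(counts):
    print(f"{10 * k:>2} - {10 * (k + 1):<2}  {c}")
```

Calling `frequency_table` with a different `width` and `n_classes` is one way to explore how the choice of interval changes the picture.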
[Figure: Power Supply Failure Times. Histogram of frequency (axis 0 to 20) by time class, 0-10 through 70-80.]
Exponential Distribution
Density: $f(x) = \lambda e^{-\lambda x}$, $x > 0$
Cumulative: $F(x) = 1 - e^{-\lambda x}$
Mean: $1/\lambda$
Variance: $1/\lambda^2$
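As a quick illustration of the exponential density and cumulative formulas, here is a minimal Python sketch. The function names `exp_pdf` and `exp_cdf` and the rate value used in the loop are assumptions made only for this example; they do not come from the slides.

```python
import math


def exp_pdf(x, lam):
    """Density f(x) = lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


def exp_cdf(x, lam):
    """Cumulative F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


lam = 1.0  # hypothetical rate, chosen only for illustration
for x in (0.5, 1.0, 2.0):
    print(x, round(exp_pdf(x, lam), 4), round(exp_cdf(x, lam), 4))
print("mean =", 1 / lam, "variance =", 1 / lam**2)
```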
[Figure: Exponential Life. Density (axis 0.0 to 1.0) versus time to service (0 to 3).]
Histogram; Change Interval
Class Intervals   Frequency
 0.0 - 15.0         22
15.1 - 30.0         10
30.1 - 45.0          4
45.1 - 60.0          3
60.1 - 75.0          1
[Figure: Change of Interval. Histogram of frequency (axis 0 to 25) by failure time class, 0-15 through 60-75.]
Histogram; Change Interval
Class Intervals   Frequency
 0.0 -  5.0          8
 5.1 - 10.0          7
10.1 - 15.0          7
15.1 - 20.0          4
20.1 - 25.0          3
25.1 - 30.0          3
30.1 - 35.0          2
35.1 - 40.0          1
40.1 - 45.0          1
45.1 - 50.0          1
50.1 - 55.0          1
55.1 - 60.0          1
[Figure: Change of Interval. Histogram of frequency (axis 0 to 10) versus failure times.]
Histogram; Change Class Mark
Class Intervals   Frequency
-5.0 -  5.0          8
 5.1 - 15.0         14
15.1 - 25.0          7
25.1 - 35.0          5
35.1 - 45.0          2
45.1 - 55.0          2
55.1 - 65.0          1
65.1 - 75.0          1
[Figure: Change of Class Mark. Histogram of frequency (axis 0 to 14) by failure time class, -5-5 through 65-75.]
Class Problem
The following data represent independent observations on deviations from the desired diameter of ball bearings produced on a new high-speed machine.
Deviations from desired diameters of ball bearings
 2.31   0.94   1.70   1.00  -0.16
 1.49   2.48   2.58   0.19   1.71
 2.10   1.97   0.56   2.28   1.18
 0.30   0.59   0.38   0.01   1.59
 0.48   1.15   0.77   0.31   1.63
 1.71   0.95   2.29  -0.12   0.44
 0.19   0.45   1.55   0.89   2.44
 0.00  -0.51   0.27   0.60   2.20
 0.66   2.36   2.29   0.21   2.30
-1.27   1.03   1.55   1.90   1.30
 1.01   0.24  -0.54   2.66   0.22
Class Problem
[Figure: Diameter Errors. Histogram of frequency (axis 0 to 15) versus error.]
Class Problem
[Figure: Diameter Error. Histogram of frequency (axis 0 to 12) versus error.]
Histogram
Intervals & class marks can alter the histogram
   too many intervals leave too many voids
   too few intervals do not give a good picture
Rule of Thumb: # Intervals = n/5
Sturges' Rule: $k = [1 + \log_2 n] = [1 + 3.322 \log_{10} n]$
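Both interval rules are easy to compare numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the square brackets in Sturges' Rule denote the integer part (floor) and that n/5 is taken as a whole number of intervals.

```python
import math


def sturges_intervals(n):
    """Sturges' Rule k = [1 + log2(n)], reading [ ] as the integer part (an assumption)."""
    return math.floor(1 + math.log2(n))


def rule_of_thumb_intervals(n, per_class=5):
    """Rule of thumb: about 5 observations per class, i.e. n/5 intervals (integer division)."""
    return n // per_class


# Sample sizes of the three data sets used in these slides.
for n in (40, 55, 70):
    print(n, rule_of_thumb_intervals(n), sturges_intervals(n))
```

For the 40 failure times, the rule of thumb suggests 8 intervals, which is the number used in the first frequency table, while Sturges' Rule under this reading suggests fewer.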
Class Problem
The following data represent demand for a particular inventory during a 70-day period. Construct a histogram and hypothesize a distribution.
Inventory Demand Data
 2   7   1   3   6   1   3
 2   0   1   5  11   5   3
 2   8   1   7   4   8   4
 4   0   2  20   0   2   5
 1   6  12   7   0   5  11
 8   6   2   0   4   2   4
 8  10   6   6   5   2   6
 3   6   5   0   1   3   1
 0   2   1   8   5   6   1
 0   1   9   4   1   4   2
Relative Histogram
Class           Freq   Rel.
 0.0 - 10.0      15    0.375
10.1 - 20.0      11    0.275
20.1 - 30.0       6    0.150
30.1 - 40.0       3    0.075
40.1 - 50.0       2    0.050
50.1 - 60.0       2    0.050
60.1 - 70.0       0    0.000
70.1 - 80.0       1    0.025
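The relative frequency column is each class count divided by the total number of observations. A minimal Python sketch follows; the variable names are illustrative and the counts are copied from the frequency table for the failure times.

```python
# Class frequencies for the failure times, classes 0-10 through 70-80.
counts = [15, 11, 6, 3, 2, 2, 0, 1]

n = sum(counts)
rel_freq = [c / n for c in counts]

for k, (c, r) in enumerate(zip(counts, rel_freq)):
    print(f"{10 * k:>2} - {10 * (k + 1):<2}  freq={c:>2}  rel={r:.3f}")
```

Because the relative frequencies sum to 1, the relative histogram has the same shape as the frequency histogram; only the vertical scale changes.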
[Figure: Relative Frequency Histogram. Relative frequency (axis 0 to 0.4) by failure time class, 0-10 through 70-80.]
Relative Histogram
[Figure: Histogram vs Relative Histogram. Freq. (axis 0 to 16) and Rel. Freq. (axis 0.00 to 0.40) by failure time class, 0-10 through 70-80.]
Histogram
Class Excel Exercise
Data Analysis
Empirical Distributions
Industrial Engineering
Empirical Cumulative
Rank order the data from smallest to largest
Example: Suppose we collect GPAs on 10 students: 3.5, 2.8, 2.7, 3.3, 3.0, 3.9, 2.9, 3.0, 2.4, 3.1
For the observation of rank $i$ out of $n$: $F(x_i) = \frac{i - 0.5}{n}$
Empirical Cumulative
$F(x_i) = \frac{i - 0.5}{n}$
Rank   Obs   F(x_i)
  1    2.4    0.05
  2    2.7    0.15
  3    2.8    0.25
  4    2.9    0.35
  5    3.0    0.45
  6    3.0    0.55
  7    3.1    0.65
  8    3.3    0.75
  9    3.5    0.85
 10    3.9    0.95
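The ranking and the plotting positions $(i - 0.5)/n$ can be computed directly. This is a minimal Python sketch using the ten GPAs from the example; the function name `empirical_cdf` is illustrative.

```python
gpas = [3.5, 2.8, 2.7, 3.3, 3.0, 3.9, 2.9, 3.0, 2.4, 3.1]


def empirical_cdf(data):
    """Return (x_i, F(x_i)) pairs with F(x_i) = (i - 0.5) / n for ranks i = 1..n."""
    ranked = sorted(data)
    n = len(ranked)
    return [(x, (i - 0.5) / n) for i, x in enumerate(ranked, start=1)]


ecdf = empirical_cdf(gpas)
for rank, (x, f) in enumerate(ecdf, start=1):
    print(f"{rank:>2}  {x:.1f}  {f:.2f}")
```

Plotting the second element of each pair against the first gives the empirical cumulative chart.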
Empirical Cumulative
[Figure: Empirical Cumulative. F(x) (axis 0.0 to 1.0) versus GPA (2.0 to 4.0), plotting F(x_i) against each ranked observation.]
Time to Failure
[Figure: Time to Fail. Cumulative (axis 0 to 1.2) versus time (0.0 to 80.0).]
Exponential Distribution
Density: $f(x) = \lambda e^{-\lambda x}$, $x > 0$
Cumulative: $F(x) = 1 - e^{-\lambda x}$
Mean: $1/\lambda$
Variance: $1/\lambda^2$
[Figure: Exponential Distribution. F(x) (axis 0.0 to 1.2) versus x (0 to 5).]
Inventory Data
[Figure: Inventory Demand. Cumulative (axis 0.0 to 1.2) versus demand size (0 to 25).]
Diameter Errors
[Figure: Diameter Errors. Cumulative (axis 0.000 to 1.200) versus error (-2.00 to 3.00).]
Normal Distribution
[Figure: Normal Distribution. F(Z) (axis 0.0 to 1.2) versus Z (-4.00 to 4.00).]
Scatter Plots (Paired Data)
Shows the relationship between paired data
Example: Suppose we wish to look at state per-student expenditures versus achievement results on the Stanford Achievement Test
Scatter Plots (Paired Data)
[Figure: SAT vs Student Expenditure. Average SAT score (axis 30 to 60) versus $ per student (0 to 12,000).]
Data Analysis
Box Plots
Industrial Engineering
Box Plots
The problem with the empirical distribution is that we may simply not have enough data
For small data sets, analysts often like to provide a rough graphical measure of how the data are dispersed
Consider our student data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Box Plots
Ranked student GPA data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Min = 2.4   Max = 3.9
Median = (3.0 + 3.0)/2 = 3.0
Median Bottom = (2.7 + 2.8)/2 = 2.75
Median Top = (3.3 + 3.5)/2 = 3.4
Box plot markers: 2.4 | 2.75 | 3.0 | 3.4 | 3.9
Fail Time Data
Min = 0.5   Max = 73.8
Lower Quartile = 6.1
Median = 14.3
Upper Quartile = 26.7
Box plot markers: 0.5 | 6.1 | 14.3 | 26.7 | 73.8
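The five box plot markers can be computed with a small helper. The sketch below is one possible reading of the slides' worked GPA example, not a definitive quartile method: it assumes the quartile sits at rank position p(n + 1) and, when that position is fractional, averages the two neighbouring ranked values. The function names are hypothetical, and statistical packages often use other quartile conventions that give slightly different values.

```python
import math


def quantile_by_rank(ranked, p):
    """Value at rank position p*(n+1); average the two neighbours if the position is fractional."""
    n = len(ranked)
    pos = p * (n + 1)
    lo = min(max(math.floor(pos), 1), n)  # keep the rank inside 1..n
    if pos == lo or lo == n:
        return ranked[lo - 1]
    return (ranked[lo - 1] + ranked[lo]) / 2


def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    ranked = sorted(data)
    return (
        ranked[0],
        quantile_by_rank(ranked, 0.25),
        quantile_by_rank(ranked, 0.50),
        quantile_by_rank(ranked, 0.75),
        ranked[-1],
    )


gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]
print(five_number_summary(gpas))
```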
Class Problem
The following data represent sorted observations on deviations from desired diameters of ball bearings (read down each column). Construct a box plot.
-1.27   0.24   0.66   1.49   2.20
-0.54   0.27   0.77   1.55   2.28
-0.51   0.30   0.89   1.55   2.29
-0.16   0.31   0.94   1.59   2.29
-0.12   0.38   0.95   1.63   2.30
 0.00   0.44   1.00   1.70   2.31
 0.01   0.45   1.01   1.71   2.36
 0.19   0.48   1.03   1.71   2.44
 0.19   0.56   1.15   1.90   2.48
 0.21   0.59   1.18   1.97   2.58
 0.22   0.60   1.30   2.10   2.66
Data Analysis
Statistical Measures
Industrial Engineering
Aside: Mean, Variance
Mean: $\mu = \sum_x x\,p(x)$, x discrete
Variance: $\sigma^2 = \sum_x (x - \mu)^2\,p(x)$
Example
Consider the discrete uniform die example:
x       1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6
$\mu = E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5$
$\sigma^2 = E[(X - \mu)^2] = (1 - 3.5)^2(1/6) + (2 - 3.5)^2(1/6) + (3 - 3.5)^2(1/6) + (4 - 3.5)^2(1/6) + (5 - 3.5)^2(1/6) + (6 - 3.5)^2(1/6) = 2.92$
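The die calculation can be checked with exact fractions. This is a minimal Python sketch; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mean = sum(x * p for x in faces)                    # sum of x * p(x)
variance = sum((x - mean) ** 2 * p for x in faces)  # sum of (x - mean)^2 * p(x)

print(mean, float(mean))
print(variance, round(float(variance), 2))
```

The exact variance is 35/12, which rounds to the 2.92 quoted in the example.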
Binomial Mean
$\mu = \sum_x x\,p(x) = 1p(1) + 2p(2) + 3p(3) + \cdots + np(n)$
$\mu = \sum_{x=0}^{n} x \, \frac{n!}{x!\,(n-x)!} \, p^x (1-p)^{n-x}$
Miracle 1 occurs
$\mu = np$
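For readers who want to see what the "miracle" step hides, one standard route is sketched below in LaTeX. It is not taken from the slides; it uses the identity $x\binom{n}{x} = n\binom{n-1}{x-1}$, the substitution $y = x - 1$ (a symbol introduced here), and the binomial theorem.

```latex
\begin{align*}
\mu &= \sum_{x=0}^{n} x \, \frac{n!}{x!\,(n-x)!} \, p^{x} (1-p)^{n-x} \\
    &= \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!} \, p^{x} (1-p)^{n-x}
       && \text{(the $x = 0$ term is zero)} \\
    &= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!} \, p^{x-1} (1-p)^{n-x} \\
    &= np \sum_{y=0}^{n-1} \binom{n-1}{y} \, p^{y} (1-p)^{(n-1)-y}
       && \text{(substituting $y = x - 1$)} \\
    &= np \, \bigl(p + (1-p)\bigr)^{n-1} = np
\end{align*}
```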
Binomial Measures
Mean: $\mu = \sum_x x\,p(x) = np$
Variance: $\sigma^2 = \sum_x (x - \mu)^2\,p(x) = np(1-p)$
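The closed forms can be checked numerically by summing over the binomial probabilities. In this minimal Python sketch the function names are illustrative, and the (n, p) pairs are the ones labelled on the binomial distribution plots in these slides.

```python
from math import comb


def binomial_pmf(x, n, p):
    """p(x) = n! / (x! (n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_var(n, p):
    """Mean and variance computed directly from the definition sums."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, var


for n, p in [(5, 0.3), (8, 0.5), (4, 0.8), (20, 0.5)]:
    mean, var = binomial_mean_var(n, p)
    # Compare the definition sums with np and np(1-p).
    print(n, p, round(mean, 6), round(n * p, 6), round(var, 6), round(n * p * (1 - p), 6))
```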
Binomial Distribution
[Figure: four panels of P(x) (axis 0.0 to 0.5) versus x, labelled n=5, p=.3; n=8, p=.5; n=4, p=.8; n=20, p=.5.]
Measures of Centrality
Mean
Median
Mode
Measures of Centrality
Mean
$\mu = \sum_x x\,p(x)$, x discrete
$\mu = \int x\,f(x)\,dx$, x continuous
Sample Mean
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Measures of Centrality
Exercise: Compute the sample mean for the student GPA data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Measures of Centrality
Failure Data (the 40 power supply failure times recorded earlier)
$\bar{X} = 19.1$
Measures of Centrality
Median
Compute the median for the student GPA data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Median = $\frac{3.0 + 3.0}{2} = 3.0$
Measures of Centrality
Mode
Class mark of the most frequently occurring interval
For the failure data, mode = class mark of the first interval = 5.0
Measures of Centrality
Measure   Student GPA   Failure Data
Mean         3.06          19.10
Median       3.00          14.30
Mode         ---            5.00
The sample mean $\bar{X}$ is a BLUE (best linear unbiased estimator) of the true mean $\mu$
$E[\bar{X}] = \mu$, so $\bar{X}$ is unbiased (u.b.)
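The three measures of centrality can be computed as follows. This is a minimal Python sketch using the student GPA data and the class frequencies for the failure times; the variable names are illustrative, and the mode follows the slides' grouped-data definition (class mark of the most frequent interval), with the class mark assumed to be the midpoint of the interval.

```python
import statistics

gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]
sample_mean = statistics.mean(gpas)
sample_median = statistics.median(gpas)
print("mean   =", round(sample_mean, 2))
print("median =", sample_median)

# Grouped mode for the failure times: class limits and counts from the frequency table.
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
counts = [15, 11, 6, 3, 2, 2, 0, 1]
low, high = class_limits[counts.index(max(counts))]
modal_class_mark = (low + high) / 2  # midpoint of the most frequent class
print("mode   =", modal_class_mark)
```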
Measures of Dispersion
Range
Sample Variance
Measures of Dispersion
Range
Compute the range for the student GPA data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
Min = 2.4   Max = 3.9
Range = 3.9 - 2.4 = 1.5
Measures of Dispersion
Variance
$\sigma^2 = \sum_x (x - \mu)^2\,p(x)$
$\sigma^2 = \int (x - \mu)^2\,f(x)\,dx$
Sample variance
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
Measures of Dispersion
Exercise: Compute the sample variance for the student GPA data: 2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9
   i     X_i    X_i^2
   1     2.4     5.76
   2     2.7     7.29
   3     2.8     7.84
   4     2.9     8.41
   5     3.0     9.00
   6     3.0     9.00
   7     3.1     9.61
   8     3.3    10.89
   9     3.5    12.25
  10     3.9    15.21
Sum =   30.6    95.3
Avg =    3.1     9.5
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
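The exercise can be worked with the computational formula $s^2 = (\sum x_i^2 - n\bar{x}^2)/(n - 1)$. The following is a minimal Python sketch with illustrative variable names; it also compares the result with `statistics.variance` from the Python standard library, which computes the same sample variance from the definition.

```python
import statistics

gpas = [2.4, 2.7, 2.8, 2.9, 3.0, 3.0, 3.1, 3.3, 3.5, 3.9]

n = len(gpas)
sum_x = sum(gpas)                    # column total of X_i
sum_x2 = sum(x * x for x in gpas)    # column total of X_i^2
x_bar = sum_x / n

# Computational (shortcut) form of the sample variance.
s2 = (sum_x2 - n * x_bar**2) / (n - 1)

print(round(sum_x, 1), round(sum_x2, 2), round(s2, 4))
print(round(statistics.variance(gpas), 4))
```

Using the unrounded sum of squares matters here: the variance is a small difference of two larger numbers, so rounding the column totals too early can visibly change the answer.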
Measures of Dispersion
Exercise: Compute the sample variance for the failure time data (the 40 power supply failure times recorded earlier)
$s^2 = 302.76$
An Aside
For the failure time data, we now have three measures for the data
Exponential??
$s^2 = 302.76$
$\bar{X} = 19.1$
[Figure: Power Supply Failure Times. Histogram of frequency (axis 0 to 20) by time class, 0-10 through 70-80.]
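An Aside
Recall that for the exponential distribution
$\mu = 1/\lambda$
$\sigma^2 = 1/\lambda^2$
With $\bar{X} = 19.1$ and $s^2 = 302.76$, if $E[\bar{X}] = \mu$ and $E[s^2] = \sigma^2$, then
$1/\hat{\lambda} = 19.1$, which gives $\hat{\lambda} = 0.0524$
or
$1/\hat{\lambda}^2 = 302.76$, which gives $\hat{\lambda} = 0.0575$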
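The two rate estimates follow from inverting the exponential mean and variance formulas. This is a minimal Python sketch with illustrative variable names; the inputs are the sample mean and sample variance quoted in the slides.

```python
import math

x_bar = 19.1   # sample mean of the failure times, as quoted in the slides
s2 = 302.76    # sample variance of the failure times, as quoted in the slides

lam_from_mean = 1 / x_bar          # solves 1/lambda = x_bar
lam_from_var = 1 / math.sqrt(s2)   # solves 1/lambda^2 = s2

print(round(lam_from_mean, 4), round(lam_from_var, 4))
```

The two estimates are close but not equal, which is consistent with treating the exponential as a plausible hypothesis for the failure times rather than an established fact.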