M07-Numerical Summaries 1 1 Department of ISM, University of Alabama, 1995-2003
Lesson Objectives
Learn when each measure of a “typical value” is appropriate. Also called “central tendency” or “location.”
Learn when each measure of “variation” is appropriate. Also called “scatter” or “dispersion.”
See how these measures relate to statistical inference, which will be covered later in the course.
Statistics is the science of
• collecting
• organizing
• summarizing
• interpreting
DATA
for making decisions.
Organize / Summarize Data
Graphical    Numerical
Key Features of Data Distributions
Shape
Typical Value
Spread
Outliers
This section covers these two: typical value and spread.
Measures of Location
Give “middle” or “typical” values, or “central tendency.”
Measures of Variation
Describe “spread” or “scatter” or “dispersion” in the data.
Measures of Location
1. Mean: the “center of gravity” of the data (histogram).
Formula for the mean
Sample mean = sum of observations divided by sample size:
$$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
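A minimal Python sketch of the sample mean formula, applied to the Example 1 data used in this lesson. The function name `sample_mean` is illustrative only, not something from the course materials.

```python
def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)


data = [14, 18, 20, 12, 24, 15, 14]  # Example 1 data, n = 7
xbar = sample_mean(data)
print(round(xbar, 3))  # 117 / 7, prints 16.714
```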
The mean is sensitive (not resistant) to extreme values (outliers).
2. Median - midpoint of distribution
At least half of the observations
are at or less than the median,
and at least half are
at or greater than the median.
Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation in the ordered sample.
Example 1
Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, “odd”)
Location of median: $\frac{7+1}{2} = 4$, the 4th observation
Ordered data: 12, 14, 14, 15, 18, 20, 24, so the median is 15.
Median is the middle value of the “ordered” data.
At least half the values are at or above it; at least half are at or below it.
Example 2 (median)
Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, “odd”), with one value replaced by the outlier 94
Original: $\bar{X} = 117/7 \approx 16.71$; with outlier: $\bar{X} =$ ____
Median is still the middle value.
Median is resistant to outliers.
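Example 3
Data: 14, 18, 20, 12, 24, 15, 14, 214 (n = 8, “even,” with outlier)
Location of median: $\frac{8+1}{2} = 4.5$, halfway between the 4th and 5th observations
Ordered data: 12, 14, 14, 15, 18, 20, 24, 214, so the median is (15 + 18)/2 = 16.5.
Median is the average of the two middle values.
Exactly half the values are greater, half lower.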
Summary for finding Median
1. Order the data.
2. For odd n, the median is the center observation.
3. For even n, the median is the average of the two center observations.
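The three steps for finding the median can be sketched in Python as follows, checked against Examples 1 and 3. The last line contrasts the median with the mean for Example 3 to show which one the outlier 214 pulls. The function name `median` is our own; Python's standard library also offers `statistics.median`.

```python
def median(xs):
    """Median: order the data, then take the center (odd n)
    or the average of the two centers (even n)."""
    ordered = sorted(xs)          # Step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # Step 2: odd n -> the center observation
        return ordered[mid]
    # Step 3: even n -> average of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


ex1 = [14, 18, 20, 12, 24, 15, 14]        # Example 1, n = 7
ex3 = [14, 18, 20, 12, 24, 15, 14, 214]   # Example 3, n = 8, outlier 214
print(median(ex1))            # 15
print(median(ex3))            # 16.5
print(sum(ex3) / len(ex3))    # 41.375: the mean is dragged toward the outlier
```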
3. Mode - most frequently occurring number
In a histogram, the modal class is the one having the largest frequency, i.e., the highest bar.
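A small Python sketch of the mode, using the Example 1 data. It returns every value tied for the highest count, since a data set can have more than one mode. The helper name `modes` is hypothetical; Python 3.8+ also provides `statistics.multimode`, which behaves similarly.

```python
from collections import Counter


def modes(xs):
    """Return every value that occurs most frequently (ties are all kept)."""
    counts = Counter(xs)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([14, 18, 20, 12, 24, 15, 14]))  # [14]
```

The same function works on categorical data, such as a list of strings, which is the case where the mode is the only sensible “typical value.”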
When should each estimator be used?
What type of variable is it?
If categorical, use the mode. “Average” is meaningless; look at “percentages” of occurrences.
If the variable is quantitative, first look at a graph:
Skewed or outliers? Use median.
More or less symmetric? Use mean.
Numerical Summary
Location: Mean, Median, Mode
Variation: Range, Std. Deviation, IQR
Why does variation matter?
Mountain climbing rope. Two suppliers; sample and test three ropes from each.
“Snap Breaking Strength”
Measures of Variation
1. Range
2. Variance & Standard Deviation
3. Mean Absolute Deviation (MAD)
4. Interquartile Range (IQR)
1. Range
Highest minus lowest value in the sample.
Example 4: 3, 4, 1, 7, 4, 5
(Dot plot on a 1 to 7 scale)
Range = 7 - 1 = 6
Example 5: 1, 1, 1, 7, 7, 7
(Dot plot on a 1 to 7 scale)
Range = 7 - 1 = 6
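A quick Python check that Examples 4 and 5 have the same range even though their values are spread very differently. The helper name `data_range` is our own; `statistics.stdev` is the standard-library sample standard deviation, included only for contrast.

```python
import statistics


def data_range(xs):
    """Highest minus lowest value in the sample."""
    return max(xs) - min(xs)


ex4 = [3, 4, 1, 7, 4, 5]
ex5 = [1, 1, 1, 7, 7, 7]
print(data_range(ex4), data_range(ex5))            # 6 6
print(statistics.stdev(ex4), statistics.stdev(ex5))  # 2.0 and roughly 3.29
```

The range looks only at the two extreme values, so it cannot tell these two data sets apart.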
Range
Advantage: _________ _________________.
Disadvantages:
Ignores most of the data.
Very sensitive to outliers.
2. Variance & Standard Deviation
How far are the data from the middle, on average?
Notation:
Sample variance = $s^2$    Sample std. dev. = $s$
Population variance = $\sigma^2$    Population std. dev. = $\sigma$
Example 4: 3, 4, 1, 7, 4, 5 (mean $\bar{X} = 24/6 = 4$)
(Dot plot on a 1 to 7 scale)
We need to keep the negatives from canceling the positives.
We can do this by
1. squaring the deviations (variance), or
2. taking their absolute values (MAD).
Note: The average of the deviations from the mean will always be zero.
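A short Python check, using the Example 4 data, that the deviations from the mean sum to zero, and that squaring them or taking their absolute values keeps the negatives from canceling the positives.

```python
data = [3, 4, 1, 7, 4, 5]                 # Example 4
xbar = sum(data) / len(data)              # 4.0
deviations = [x - xbar for x in data]     # -1, 0, -3, 3, 0, 1

print(sum(deviations))                    # 0.0: positives and negatives cancel
print(sum(d ** 2 for d in deviations))    # 20.0: squared deviations (variance)
print(sum(abs(d) for d in deviations))    # 8.0: absolute deviations (MAD)
```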
Equation for Variance (see page 88):
For a population:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
For a sample:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
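Equation for Variance (see page 88):
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
Example 4 data:
$$s^2 = \frac{(3-4)^2 + (4-4)^2 + (1-4)^2 + (7-4)^2 + (4-4)^2 + (5-4)^2}{6 - 1} = \frac{1 + 0 + 9 + 9 + 0 + 1}{5} = \frac{20}{5} = 4.0$$
Units? The variance is in the squared units of the data.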
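The two variance definitions can be sketched in Python as below, applied to the Example 4 data. The function names are illustrative. The only difference between them is the divisor: n - 1 for a sample, N for a population.

```python
def sample_variance(xs):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """sigma^2: sum of squared deviations from the population mean, divided by N."""
    big_n = len(xs)
    mu = sum(xs) / big_n
    return sum((x - mu) ** 2 for x in xs) / big_n


example4 = [3, 4, 1, 7, 4, 5]
print(sample_variance(example4))      # 20 / 5 = 4.0, in squared units
print(population_variance(example4))  # 20 / 6, if these six values were the whole population
```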
Equations for Variance:
1. (see page 88)
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
2. (see page 90)
$$s^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1}$$
3.
$$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$$
Example 4: 3, 4, 1, 7, 4, 5
X     X²
3      9
4     16
1      1
7     49
4     16
5     25
ΣX = 24    ΣX² = 116
$$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1} = \frac{116 - \frac{24^2}{6}}{6 - 1} = \frac{116 - 96}{5} = 4.0$$
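The shortcut formula needs only two running totals, ΣX and ΣX², which is why it suits larger data sets. A minimal Python sketch for the Example 4 data:

```python
data = [3, 4, 1, 7, 4, 5]              # Example 4
n = len(data)
sum_x = sum(data)                      # ΣX  = 24
sum_x2 = sum(x * x for x in data)      # ΣX² = 116

# s^2 = (ΣX² - (ΣX)² / n) / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
print(sum_x, sum_x2, s2)               # 24 116 4.0
```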
Comments
• Both equations should give the same answer.
• The first is easier when the data and the mean are integers.
• The second is easier for larger data sets, or for data that are not integers.
• More chance of round-off error with the first equation.
Variance
Advantage: ________________; ________________.
Disadvantages: Units are squared.
Not resistant to outliers.
Standard Deviation
$$s = \sqrt{s^2}$$
“The square root of the variance.”
Example 4: $s^2 = 4.0$, so $s = \sqrt{4.0} = 2.0$.
Advantage: Easier to interpret than the variance; units are the same as the data.
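A Python sketch taking the square root of the Example 4 sample variance, cross-checked against the standard library's `statistics.stdev`:

```python
import math
import statistics

data = [3, 4, 1, 7, 4, 5]                            # Example 4
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)    # sample variance: 4.0
s = math.sqrt(s2)                                    # std. dev. = sqrt(variance)

print(s)                                             # 2.0, in the same units as the data
print(math.isclose(s, statistics.stdev(data)))       # True
```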
3. Mean Absolute Deviation, MAD
For population data:
$$\text{MAD} = \frac{\sum |x_i - \mu|}{N}$$
For sample data (see page 87):
$$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$$
This will be used extensively in OM 300.
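A minimal Python sketch of the sample MAD formula, applied to the Example 4 data. Note that the divisor is n, not n - 1. The function name is illustrative.

```python
def mean_absolute_deviation(xs):
    """Average absolute distance from the sample mean (divisor n)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) for x in xs) / n


print(mean_absolute_deviation([3, 4, 1, 7, 4, 5]))  # 8 / 6, about 1.333
```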
4. Interquartile Range (IQR)
$$\text{IQR} = Q_3 - Q_1$$
IQR is the range of the middle 50% of the data.
Observations more than 1.5 IQRs beyond the quartiles are considered outliers.
More on this later.
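A Python sketch of the IQR and the 1.5 × IQR outlier rule, using the potato chip weights from the Minitab example later in this lesson. Quartile conventions differ between textbooks and software. Python's `statistics.quantiles` (default `method="exclusive"`) places Q1 at position (n + 1)/4 in the ordered data. For this data set that gives the same Q1 = 15.825 and Q3 = 16.065 that appear in the Minitab output, but other packages may differ slightly.

```python
import statistics

weights = [16.05, 16.01, 15.92, 15.68, 16.10, 16.01, 15.72, 15.80, 16.21, 15.70,
           15.95, 16.24, 16.02, 15.90, 16.07, 16.05, 16.18, 15.45, 16.04, 16.05]

q1, _, q3 = statistics.quantiles(weights, n=4)   # default method="exclusive"
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr                     # about 15.465
upper_fence = q3 + 1.5 * iqr                     # about 16.425
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print(round(q1, 3), round(q3, 3), round(iqr, 3))  # 15.825 16.065 0.24
print(outliers)                                   # [15.45]
```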
Statistical Inference
Generalizing from a sample to a population, by using a statistic to estimate a parameter.
                      Statistic        Parameter
                      (from sample)    (from entire population)
Mean:                 X̄   estimates   μ
Standard deviation:   s    estimates   σ
Proportion:           p    estimates   π
Descriptive Statistics
Numerical    Graphical
Example 6:
Estimate the true mean net weight of 16 oz. bags of Golden Flake Potato Chips with a 95% confidence interval.
Measured weights in ounces (not real data):
16.05  16.01  15.92  15.68  16.10  16.01  15.72  15.80  16.21  15.70
15.95  16.24  16.02  15.90  16.07  16.05  16.18  15.45  16.04  16.05
Is the filling machine doing what it should be doing?
Use Minitab.
(Screenshot of the Minitab window: worksheet file name, toolbar with the most commonly used features, Session window, and Data window.)
Menu path: “Stat” > “Basic Statistics” > “Display Descriptive Statistics”
Results for: c07 Weight of chips.MTW
Descriptive Statistics: Weights
Variable N Mean Median TrMean StDev SE Mean
Weights 20 15.958 16.015 15.970 0.199 0.045
Variable Minimum Maximum Q1 Q3
Weights 15.450 16.240 15.825 16.065
Executing from file: C:\Program Files\MTBWIN\MACROS\Describe.MAC
Descriptive Statistics Graph: Weights
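The Session window numbers can be reproduced in Python with the standard `statistics` module, as a sketch. Two assumptions are labeled here. First, we take Minitab's TrMean to be a 5% trimmed mean (5% of the observations, rounded, dropped from each end; for n = 20 that is one value at each end). Second, quartiles use `statistics.quantiles` with its default "exclusive" method, which matches Minitab's Q1 and Q3 for this data. The exact sample mean is 15.9575, which Minitab displays as 15.958.

```python
import math
import statistics

weights = [16.05, 16.01, 15.92, 15.68, 16.10, 16.01, 15.72, 15.80, 16.21, 15.70,
           15.95, 16.24, 16.02, 15.90, 16.07, 16.05, 16.18, 15.45, 16.04, 16.05]

n = len(weights)                          # Minitab N: 20
mean = statistics.mean(weights)           # Minitab Mean: 15.958
median = statistics.median(weights)       # Minitab Median: 16.015
stdev = statistics.stdev(weights)         # Minitab StDev: 0.199 (n - 1 divisor)
se_mean = stdev / math.sqrt(n)            # Minitab SE Mean: 0.045

k = round(0.05 * n)                       # assumption: trim 5% from each end
tr_mean = statistics.mean(sorted(weights)[k:n - k])   # Minitab TrMean: 15.970

q1, _, q3 = statistics.quantiles(weights, n=4)        # Minitab Q1, Q3

print(f"N={n} Mean={mean:.4f} Median={median:.3f} TrMean={tr_mean:.3f} "
      f"StDev={stdev:.3f} SE Mean={se_mean:.3f}")
print(f"Min={min(weights):.3f} Max={max(weights):.3f} Q1={q1:.3f} Q3={q3:.3f}")
```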
“Session Window” results
The graph window shows:
• a “five number” summary,
• a histogram with a Normal distribution curve superimposed,
• a box plot, and
• a “95% Confidence Interval” for the population mean.
Our sample mean was 15.958 oz. This is less than 16.000. Should we be concerned?
A confidence interval gives the limits of the plausible values of the true population mean, μ.
No, because 16.000 is a plausible value for the true population mean.
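A sketch of the 95% confidence interval for the mean chip weight, computed as a t interval: $\bar{X} \pm t \cdot s/\sqrt{n}$. We assume this is the interval the Minitab graph reports, since it is the standard interval for a mean when σ is unknown. The critical value 2.093 is the t-table value for 95% confidence with n - 1 = 19 degrees of freedom. It is hard-coded here because Python's standard library has no t distribution.

```python
import math
import statistics

weights = [16.05, 16.01, 15.92, 15.68, 16.10, 16.01, 15.72, 15.80, 16.21, 15.70,
           15.95, 16.24, 16.02, 15.90, 16.07, 16.05, 16.18, 15.45, 16.04, 16.05]

n = len(weights)
xbar = statistics.mean(weights)
se = statistics.stdev(weights) / math.sqrt(n)   # standard error of the mean
t_crit = 2.093                                  # t table: 95% confidence, 19 df

lower = xbar - t_crit * se
upper = xbar + t_crit * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")     # 95% CI: (15.864, 16.051)
print("16.000 plausible?", lower <= 16.0 <= upper)  # True
```

Because 16.000 falls inside the interval, the data do not give strong evidence that the filling machine is underfilling.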