dicky dermawan
Nature & Purpose of Statistics
In statistics we are concerned with methods for designing and evaluating experiments to obtain information about practical problems.
In most cases the inspection of each item of the population would be too expensive, time-consuming, or even impossible. Hence a few samples are drawn at random, and from their inspection conclusions about the population are inferred.
Parameter Estimation
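POPULATION versus SAMPLE
Size: large number N (population); small number n (sample)
Function: probability function/density f(x) (population); relative frequency function (sample)
Cumulative function: distribution function F(x) (population); cumulative frequency function (sample)
Mean (population): $\mu = \sum_{j=1}^{N} x_j f(x_j)$
Average (sample): $\hat{\mu} = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
Variance (population): $\sigma^2 = \sum_{j} (x_j - \mu)^2 f(x_j)$
Variance (sample): $\hat{\sigma}^2 = S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$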
Processing of Sample: Frequency Table
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
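The frequency functions discussed on these slides can all be tabulated from the raw sample. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the class marks 300, 310, ..., 440 are assumed from the chart axes.

```python
from collections import Counter
from itertools import accumulate

# The 100 splitting tensile strength values (lb/in^2) from the sample above.
sample = [
    320, 380, 340, 410, 380, 340, 360, 350, 320, 370,
    350, 340, 350, 360, 370, 350, 380, 370, 300, 420,
    370, 390, 390, 440, 330, 390, 330, 360, 400, 370,
    320, 350, 360, 340, 340, 350, 350, 390, 380, 340,
    400, 360, 350, 390, 400, 350, 360, 340, 370, 420,
    420, 400, 350, 370, 330, 320, 390, 380, 400, 370,
    390, 330, 360, 380, 350, 330, 360, 300, 360, 360,
    360, 390, 350, 370, 370, 350, 390, 370, 370, 340,
    370, 400, 360, 350, 380, 380, 360, 340, 330, 370,
    340, 360, 390, 400, 370, 410, 360, 400, 340, 360,
]

n = len(sample)
counts = Counter(sample)

# Class marks assumed from the chart axes: 300, 310, ..., 440.
values = list(range(300, 450, 10))
abs_freq = [counts[v] for v in values]      # absolute frequency (0 if value absent)
rel_freq = [f / n for f in abs_freq]        # relative frequency
cum_abs = list(accumulate(abs_freq))        # cumulative absolute frequency
cum_rel = [c / n for c in cum_abs]          # cumulative relative frequency
mean = sum(sample) / n                      # sample average

for v, f, r, ca, cr in zip(values, abs_freq, rel_freq, cum_abs, cum_rel):
    print(f"{v:5d} {f:4d} {r:6.2f} {ca:5d} {cr:6.2f}")
print("average =", mean)
```

The four columns printed after each strength value correspond to the four "Processing of Sample" charts: absolute, relative, cumulative absolute, and cumulative relative frequency.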
Processing of Sample: Absolute Frequency Function
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
[Figure: Absolute Frequency (0 to 18) versus Tensile Strength (300 to 440)]
Processing of Sample: Relative Frequency Function
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
[Figure: Relative Frequency (0 to 0.18) versus Tensile Strength (300 to 440)]
Processing of Sample: Cumulative Absolute Frequency
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
[Figure: Cumulative Absolute Frequency (0 to 120) versus Tensile Strength (300 to 440)]
Processing of Sample: Cumulative Relative Frequency
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
[Figure: Cumulative Relative Frequency (0 to 1) versus Tensile Strength (300 to 440)]
Processing of Sample: Box & Whisker Plot
Min
Lower Quartile
Middle Quartile = Median
Upper Quartile
Interquartile range
Max
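The quantities labelled on the box and whisker plot can be computed directly. This is a minimal sketch using the first ten tensile strength values from the sample table for brevity; note that several quartile conventions exist, and the "inclusive" method of Python's `statistics.quantiles` is only one assumed choice, so other software may report slightly different quartiles.

```python
import statistics

# First ten tensile strength values (lb/in^2) from the sample table.
data = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]

# quantiles() sorts the data internally; n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {
    "min": min(data),
    "lower quartile": q1,
    "median": median,
    "upper quartile": q3,
    "max": max(data),
    "interquartile range": q3 - q1,
}

for name, value in summary.items():
    print(f"{name:>20}: {value}")
```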
DOX 6E Montgomery 10
Assignment: Box & Whisker Plot, Portland Cement Formulation
Inferential Statistics
Experimental error
Hypothesis testing: null hypothesis, alternative hypothesis
Type I error α: rejecting a true hypothesis
Type II error β: accepting a false hypothesis
One-tail test vs two-tail test
Confidence level = 1 − significance level
P-value
Confidence interval
Confidence Interval of a normal distribution if σ known: Comparing a single mean to a specified/'standard' value
If Y1, ..., Yn are independent normal random variables, each of which has mean μ and variance σ², then the random variable
$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + Y_3 + \dots + Y_n)$
is normal with mean μ and variance σ²/n, and the random variable
$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$
is normal with mean 0 and variance 1.
The confidence interval for μ is
$\mathrm{CONF}\left\{ \bar{y} - c\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{y} + c\,\frac{\sigma}{\sqrt{n}} \right\}$
where c is the standard normal critical value for the chosen confidence level (for example, c = 1.960 for a two-sided 95% interval).
So far we have regarded the values y1, y2, ... of a sample as n observed values of a single random variable Y. We may equally well regard these n values as single observations of n random variables Y1, Y2, ... that have the same distribution and are independent.
Problem: Example 2.1
A vendor submits lots of fabric to a textile manufacturer.
The manufacturer wants to know if the lot average breaking strength exceeds 200 psi. If so, she wants to accept the lot.
Past experience indicates that a reasonable value for the variance of breaking strength is 100 (psi)2.
Four specimens are randomly selected, and the average breaking strength observed is $\bar{y} = 214$ psi.
Example 2.1
The hypotheses to be tested are:
$H_0: \mu = 200$
$H_1: \mu > 200$
This is a one-sided alternative hypothesis. The value of the test statistic is:
$Z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} = \frac{214 - 200}{10/\sqrt{4}} = 2.80$
If the confidence level of 95% is chosen, i.e. type I error α = 0.05, we find Zα = 1.645.
Since Z0 = 2.80 > 1.645, the difference is significant: H0 is rejected and we conclude that the lot average breaking strength exceeds 200 psi.
Thus, we accept the lot.
The interval ȳ ± 1.645 σ/√n is 205.8 ≤ μ ≤ 222.2. Because it is built with Z0.05 = 1.645, it is a two-sided 90% interval, and its lower limit 205.8 is the one-sided 95% lower confidence bound for μ. Clearly, 200 is outside the interval.
The P-value is 0.0026.
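The numbers in Example 2.1 can be checked with a short calculation. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and the one-sided lower bound is shown as one reasonable way to express the interval for a one-sided test.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in Example 2.1.
y_bar = 214.0        # observed average breaking strength, psi
mu0 = 200.0          # value specified in H0, psi
sigma = sqrt(100.0)  # known standard deviation from the variance of 100 (psi)^2
n = 4                # number of specimens
alpha = 0.05         # type I error

std_normal = NormalDist()                 # mean 0, standard deviation 1
std_error = sigma / sqrt(n)               # standard deviation of the sample mean
z0 = (y_bar - mu0) / std_error            # test statistic
z_alpha = std_normal.inv_cdf(1 - alpha)   # one-sided critical value
p_value = 1 - std_normal.cdf(z0)          # upper-tail P-value
lower_bound = y_bar - z_alpha * std_error # one-sided 95% lower confidence bound
reject_h0 = z0 > z_alpha

print(f"Z0 = {z0:.2f}, Z_alpha = {z_alpha:.3f}, P = {p_value:.4f}")
print(f"lower bound = {lower_bound:.1f} psi, reject H0: {reject_h0}")
```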
Problems
Comparing a single mean to a specified/'standard' value: Confidence Interval of a normal distribution if σ unknown
The same as previous, but we use:
the t distribution instead of the normal distribution
the sample standard deviation S instead of σ
The test statistic is
$t_0 = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$
at (n−1) degrees of freedom.
The confidence interval is
$\mathrm{CONF}\left\{ \bar{y} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{y} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \right\}$
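As an illustration of the σ-unknown case, here is a minimal sketch that applies the t statistic and interval to the first ten tensile strength values from the earlier sample. The reference value `mu0 = 350` is a hypothetical choice made only for this example, and the critical value 2.262 is the tabulated t for α/2 = 0.025 with 9 degrees of freedom, entered by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# First ten tensile strength values (lb/in^2) from the sample table.
sample = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]
mu0 = 350.0     # hypothetical 'standard' value, assumed for illustration only
t_crit = 2.262  # tabulated t_{0.025, 9}

n = len(sample)
y_bar = mean(sample)
s = stdev(sample)          # sample standard deviation (divisor n - 1)
std_error = s / sqrt(n)

t0 = (y_bar - mu0) / std_error
conf_interval = (y_bar - t_crit * std_error, y_bar + t_crit * std_error)
reject_h0 = abs(t0) > t_crit  # two-sided test at the 5% level

print(f"y_bar = {y_bar}, S = {s:.2f}, t0 = {t0:.3f}")
print(f"95% interval: {conf_interval[0]:.1f} to {conf_interval[1]:.1f}")
print("reject H0:", reject_h0)
```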
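Comparing 2 Treatment Means If Variance Known
The test statistic is
$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
The confidence interval is
$\mathrm{CONF}\left\{ \bar{y}_1 - \bar{y}_2 - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right\}$
The reference distribution is Normal. Compared with the single-mean case, ȳ is replaced by ȳ1 − ȳ2 and σ²/n by σ1²/n1 + σ2²/n2.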
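Comparing 2 Treatment Means If Variance Unknown, but σ1² = σ2²
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \ne \mu_2$
The test statistic is
$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where the pooled variance is
$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$
and the degrees of freedom are $\nu = n_1 + n_2 - 2$.
Choose a confidence level, usually 95%, then find the critical t value at the associated degrees of freedom, i.e. $t_{\alpha/2,\,\nu}$.
If $|t_0| > t_{\alpha/2,\,\nu}$, we have enough reason to reject the null hypothesis and conclude that the two methods differ significantly.
Alternatively, calculate the P-value, i.e. the smallest risk of wrongly rejecting the null hypothesis at which the observed data would still lead to rejection.
Or set the confidence interval and reject the null hypothesis if 0 is not included in the interval.
Compared with the known-variance case, the Normal distribution is replaced by the t distribution with n1 + n2 − 2 degrees of freedom, and the standard error becomes $S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.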
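The pooled calculation can be sketched as follows. The two samples here are hypothetical numbers invented purely to show the arithmetic (they are not from the slides), `pooled_t` is an illustrative helper name, and 2.306 is the tabulated t value for α/2 = 0.025 with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t0, degrees of freedom, pooled variance) assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    dof = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / dof
    t0 = (mean(sample1) - mean(sample2)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t0, dof, sp2

# Hypothetical data, for illustration only.
method_a = [10, 12, 11, 13, 14]
method_b = [13, 15, 14, 16, 17]

t0, dof, sp2 = pooled_t(method_a, method_b)
t_crit = 2.306  # tabulated t_{0.025, 8}
reject_h0 = abs(t0) > t_crit

print(f"Sp^2 = {sp2:.2f}, t0 = {t0:.2f}, dof = {dof}, reject H0: {reject_h0}")
```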
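Comparing 2 Treatment Means If Variance Unknown, σ1² ≠ σ2²
The test statistic is
$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
with degrees of freedom
$\nu = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$
Compared with the known-variance case, the Normal distribution is replaced by the t distribution with ν degrees of freedom, and σ1²/n1 + σ2²/n2 by S1²/n1 + S2²/n2.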
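For the unequal-variance case, the statistic and its approximate degrees of freedom can be computed from summary statistics. This is a minimal sketch: `welch_t` is an illustrative helper name, the degrees-of-freedom expression uses the common form with (n − 1) denominators, and the input numbers are hypothetical.

```python
from math import sqrt

def welch_t(y1_bar, s1_sq, n1, y2_bar, s2_sq, n2):
    """Return (t0, nu) for two samples whose variances are not assumed equal."""
    v1 = s1_sq / n1  # estimated variance of the first sample mean
    v2 = s2_sq / n2  # estimated variance of the second sample mean
    t0 = (y1_bar - y2_bar) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, nu

# Hypothetical summary statistics, for illustration only.
t0, nu = welch_t(y1_bar=12.0, s1_sq=2.5, n1=5, y2_bar=15.0, s2_sq=10.0, n2=5)

print(f"t0 = {t0:.3f}, approximate degrees of freedom = {nu:.2f}")
```

Because ν is generally not an integer, it is commonly rounded down before the critical value is read from a t table.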
Problems
Example: Portland Cement Formulation, Dot Diagram
Tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. He collected 10 observations (Table 2.1).
Plot the dot diagram.
Plot the Box & Whisker plot.
Are the two formulations really different?
Or perhaps the observed difference is the result of sampling fluctuation and the two formulations are really identical?
Problems
Inference about the difference in means. Blocking: Paired Comparison Design
Blocking is a design technique used to improve the precision with which the comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, i.e. factors that may influence the experimental response but in which we are not interested.
The term block refers to a relatively homogeneous experimental unit, and the block represents a restriction on complete randomization because the treatment combinations are only randomized within the block. Blocking is carried out by making comparisons within matched pairs of experimental material.
The confidence interval based on the paired analysis is usually much narrower than that from the independent analysis. This illustrates the noise reduction property of blocking.
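Inference about the difference in means. Blocking: Paired Comparison Design
Statistical model for complete randomization:
$y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, \dots, n_i$
with 2(n_i − 1) degrees of freedom
Statistical model with blocking:
$y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2; \; j = 1, 2, \dots, n_i$
with only (n_pairs − 1) degrees of freedom
The paired differences are
$d_j = y_{1j} - y_{2j}$
The test statistic:
$t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$
The confidence interval for 2-sided test:
$\bar{d} \pm t_{\alpha/2,\,n-1}\, S_d/\sqrt{n}$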
Inference about the difference in means. Blocking: Paired Comparison Design. Example: The Story
Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. Two different tips are available for this machine, and it is suspected that one tip produces different hardness readings than the other.
The test could be performed as follows: a number of metal specimens could randomly be selected. Half are tested by tip 1 and the other half by tip 2.
The metal specimens might be cut from different bar stock that were not exactly identical in their hardness. To protect against this possibility, an alternative experimental design should be considered: divide each specimen into two parts and randomly assign each tip to one half of each specimen.
Inference about the difference in means. Blocking: Paired Comparison Design. Example: Data
- Use the paired data to determine a 95% confidence interval for the difference
- What if we use pooled or independent analysis?
Specimen  Tip 1  Tip 2
1         7      6
2         3      3
3         3      5
4         4      3
5         8      8
6         3      2
7         2      4
8         9      9
9         5      4
10        4      5
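Both questions posed with the data can be explored numerically. This is a minimal sketch using the hardness readings from the table; the critical values 2.262 (9 degrees of freedom) and 2.101 (18 degrees of freedom) are tabulated t values for α/2 = 0.025 entered by hand, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hardness readings from the table, one pair per specimen.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)

# Paired analysis: work with the within-specimen differences d_j = y_1j - y_2j.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar = mean(d)
s_d = stdev(d)
t0_paired = d_bar / (s_d / sqrt(n))
half_width_paired = 2.262 * s_d / sqrt(n)   # t_{0.025, 9}

# Pooled (independent) analysis: ignores the pairing, 2n - 2 degrees of freedom.
sp = sqrt(((n - 1) * variance(tip1) + (n - 1) * variance(tip2)) / (2 * n - 2))
half_width_pooled = 2.101 * sp * sqrt(2 / n)  # t_{0.025, 18}

print(f"paired: {d_bar:.2f} +/- {half_width_paired:.2f} (t0 = {t0_paired:.2f})")
print(f"pooled: {d_bar:.2f} +/- {half_width_pooled:.2f}")
```

On these data the paired interval comes out considerably narrower than the pooled one even though it has fewer degrees of freedom, which is the noise reduction effect of blocking described earlier.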
Problems
Inference about The Variances of Normal Distributions
In some experiments it is the comparison of variability in the data that is important. For example, in chemical laboratories, we may wish to compare the variability of two analytical methods. Unlike the tests on means, the procedures for tests on variances are rather sensitive to the normality assumption.
Suppose we wish to test the hypothesis whether or not the variance of a normal population equals a constant, viz. σ0². The test statistic is:
$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{(n-1) S^2}{\sigma_0^2}$
The appropriate distribution for χ0² is the chi-square distribution with (n−1) degrees of freedom. The confidence interval for σ² is
$\frac{(n-1) S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{1-\alpha/2,\,n-1}}$
Inference about The Variances of Normal Distributions
Suppose we wish to test equality of the variances of two normal populations. If independent random samples of size n1 and n2 are taken from populations 1 & 2, respectively, the test statistic for:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
is the ratio of the sample variances:
$F_0 = \frac{S_1^2}{S_2^2}$
The appropriate distribution for F0 is the F distribution with (n1−1) numerator degrees of freedom and (n2−1) denominator degrees of freedom. The null hypothesis would be rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.
The confidence interval for σ1²/σ2² is
$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$
Note: $F_{1-\alpha,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha,\,n_2-1,\,n_1-1}}$
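As an illustration of the variance-ratio test, here is a minimal sketch that treats the two columns of hardness readings from the paired-comparison example as if they were independent samples (an assumption made only for the sake of the example). The upper critical value 4.03 is the tabulated F for α/2 = 0.025 with 9 and 9 degrees of freedom, entered by hand; the lower critical value follows from the reciprocal relation in the note above.

```python
from statistics import variance

# Hardness readings, treated here as two independent samples (assumption).
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]

s1_sq = variance(tip1)  # sample variance, divisor n1 - 1
s2_sq = variance(tip2)  # sample variance, divisor n2 - 1
f0 = s1_sq / s2_sq

f_upper = 4.03          # tabulated F_{0.025, 9, 9}
f_lower = 1 / f_upper   # F_{0.975, 9, 9} via the reciprocal relation
reject_h0 = f0 > f_upper or f0 < f_lower

print(f"S1^2 = {s1_sq:.3f}, S2^2 = {s2_sq:.3f}, F0 = {f0:.3f}")
print("reject H0 (equal variances):", reject_h0)
```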
Checking for Normality: Normal Probability Plot
Probability plotting is a graphical technique for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data.
To construct a probability plot, the observations in the sample are first ranked from smallest to largest. That is, the sample y1, y2, ..., yn is arranged as y(1), y(2), ..., y(n), where y(1) is the smallest observation and y(n) the largest. The ordered observations y(j) are then plotted against their observed cumulative frequency (j − 0.5)/n. The cumulative frequency scale is arranged so that, if the hypothesized distribution adequately describes the data, the plotted points will fall approximately along a straight line. Judging whether the points are close enough to a straight line is usually subjective.
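The plotting positions described above can be computed as follows. This is a minimal sketch that again uses the first ten tensile strength values for brevity; converting the cumulative frequencies to standard normal scores with `NormalDist.inv_cdf` is an assumed way of producing the specially scaled axis, so that the ordered data can be plotted against the scores on ordinary linear axes.

```python
from statistics import NormalDist

# First ten tensile strength values (lb/in^2) from the sample table.
sample = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]

ordered = sorted(sample)  # y(1) <= y(2) <= ... <= y(n)
n = len(ordered)

# Observed cumulative frequency (j - 0.5)/n for j = 1, ..., n.
cum_freq = [(j - 0.5) / n for j in range(1, n + 1)]

# Standard normal scores corresponding to each cumulative frequency.
std_normal = NormalDist()
z_scores = [std_normal.inv_cdf(p) for p in cum_freq]

for y, p, z in zip(ordered, cum_freq, z_scores):
    print(f"{y:5d} {p:6.2f} {z:7.3f}")
```

If the normal distribution describes the data adequately, the points (z, y) fall roughly on a straight line.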
Problems
Introduction to DOX
An experiment is a test or a series of tests
Experiments are used widely in the engineering world
- Process characterization & optimization
- Evaluation of material properties
- Product design & development
- Component & system tolerance determination
“All experiments are designed experiments, some are poorly designed, some are well-designed”
The Basic Principles of DOX
Randomization
- Running the trials in an experiment in random order
- Notion of balancing out effects of "lurking" variables
Replication
- Sample size (improving precision of effect estimation, estimation of error or background noise)
- Replication versus repeat measurements? (see page 13)
Blocking
- Dealing with nuisance factors
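Randomization and replication can be illustrated with a small run sheet. This is a minimal sketch: the treatment labels, the number of replicates, and the fixed seed are all assumptions chosen for illustration, and the seed is there only to make the sketch reproducible.

```python
import random

# Illustrative treatments and replicate count (assumed, not from the slides).
treatments = ["modified", "unmodified"]
replicates = 3

# Replication: every treatment is run several times as separate trials.
runs = [(t, r) for t in treatments for r in range(1, replicates + 1)]

# Randomization: carry out the trials in random order.
rng = random.Random(2024)  # fixed seed only so the sketch is reproducible
run_order = rng.sample(runs, k=len(runs))

for position, (treatment, replicate) in enumerate(run_order, start=1):
    print(f"run {position}: {treatment}, replicate {replicate}")
```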
Strategy of Experimentation
"Best-guess" experiments
- Used a lot
- More successful than you might suspect, but there are disadvantages…
One-factor-at-a-time (OFAT) experiments
- Sometimes associated with the "scientific" or "engineering" method
- Devastated by interaction, also very inefficient
Statistically designed experiments
- Based on Fisher's factorial concept