dicky dermawan [email protected]

Statistics & Experimental Design (Statistika & Rancangan Percobaan)


2- Statistics

http://www.dickydermawan.net78.net/

Nature & Purpose of Statistics

In statistics we are concerned with methods for designing and evaluating experiments to obtain information about practical problems.

In most cases, inspecting every item of a population would be too expensive, too time-consuming, or even impossible. Hence a few samples are drawn at random, and conclusions about the population are inferred from inspecting them.

Parameter Estimation

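POPULATION versus SAMPLE

Size. Population: large number N. Sample: small number n.

Function. Population: probability function/density f(x). Sample: relative frequency function.

Cumulative function. Population: distribution function F(x). Sample: cumulative frequency function.

Mean / Average. Population: $\mu = \sum_{j=1}^{N} x_j f(x_j)$. Sample: $\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

Variance. Population: $\sigma^2 = \sum_{j=1}^{N} (x_j - \mu)^2 f(x_j)$. Sample: $\hat{\sigma}^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$.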

Processing of Sample: Frequency Table

Sample of 100 Values of the Splitting Tensile Strength (lb/in²)

320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
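The frequency functions named on the following slides can be tabulated directly from these 100 values. The sketch below is one possible way to do it with the Python standard library; the variable names are illustrative and not part of the original slides. It also evaluates the sample average and the sample variance (with the n − 1 divisor) from the Parameter Estimation slide.

```python
from collections import Counter
from itertools import accumulate
from statistics import mean, variance

# The 100 splitting tensile strengths (lb/in^2), ten values per row.
ROWS = """
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
"""
sample = [int(v) for v in ROWS.split()]
n = len(sample)

counts = Counter(sample)                        # absolute frequency per value
values = sorted(counts)
absolute = [counts[v] for v in values]
relative = [c / n for c in absolute]            # relative frequency
cum_absolute = list(accumulate(absolute))       # cumulative absolute frequency
cum_relative = [c / n for c in cum_absolute]    # cumulative relative frequency

print("value  abs   rel  cum_abs  cum_rel")
for v, a, r, ca, cr in zip(values, absolute, relative, cum_absolute, cum_relative):
    print(f"{v:5d} {a:4d} {r:5.2f} {ca:8d} {cr:8.2f}")

y_bar = mean(sample)       # sample average
s2 = variance(sample)      # sample variance, divisor n - 1
print(f"average = {y_bar:.1f}, variance = {s2:.1f}")
```

Values that never occur in the sample (310 and 430) simply do not appear in the table; they correspond to the empty positions on the plot axes.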

Processing of Sample: Absolute Frequency Function

[Figure: absolute frequency (0 to 18) plotted against tensile strength (300 to 440).]

Processing of Sample: Relative Frequency Function

[Figure: relative frequency (0 to 0.18) plotted against tensile strength (300 to 440).]

Processing of Sample: Cumulative Absolute Frequency

[Figure: cumulative absolute frequency (0 to 120) plotted against tensile strength (300 to 440).]

Processing of Sample: Cumulative Relative Frequency

[Figure: cumulative relative frequency (0 to 1) plotted against tensile strength (300 to 440).]

Processing of Sample: Box & Whisker Plot

[Figure: box and whisker plot marking Min, Lower Quartile, Middle Quartile = Median, Upper Quartile, and Max; the box spans the interquartile range.]
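The five quantities marked on a box and whisker plot can be computed as in the following sketch, which uses the tensile strength sample summarized as value/count pairs. Quartile conventions differ between textbooks and software; here the default "exclusive" method of Python's `statistics.quantiles` is assumed, and the names are illustrative only.

```python
from statistics import quantiles

# Frequency table of the 100 tensile strengths (value: count),
# tallied from the sample in the Frequency Table slide.
FREQ = {300: 2, 320: 4, 330: 6, 340: 11, 350: 14, 360: 16, 370: 15,
        380: 8, 390: 10, 400: 8, 410: 2, 420: 3, 440: 1}

# Expand the table back into the sorted list of observations.
data = sorted(v for v, c in FREQ.items() for _ in range(c))

# Quartiles: lower quartile, median, upper quartile.
q1, median, q3 = quantiles(data, n=4)

summary = {
    "min": data[0],
    "lower quartile": q1,
    "median": median,
    "upper quartile": q3,
    "max": data[-1],
    "interquartile range": q3 - q1,
}
for name, value in summary.items():
    print(f"{name:>20}: {value}")
```

For this sample the quartiles fall inside runs of repeated values, so other common quartile conventions would give the same box.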

Source: Montgomery, Design and Analysis of Experiments (DOX), 6th ed.

Assignment: Box & Whisker Plot, Portland Cement Formulation

Inferential Statistics

- Experimental error
- Hypothesis testing: null hypothesis, alternative hypothesis
- Type I error α: rejecting a true hypothesis
- Type II error β: accepting a false hypothesis
- One-tail test vs two-tail test
- Confidence level = 1 − significance level α
- P-value
- Confidence interval

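Confidence Interval of a Normal Distribution if σ Known: Comparing a Single Mean to a Specified ('Standard') Value

If $Y_1, \ldots, Y_n$ are independent normal random variables, each of which has mean $\mu$ and variance $\sigma^2$, then the random variable

$$\bar{Y} = \frac{1}{n}\left(Y_1 + Y_2 + Y_3 + \cdots + Y_n\right)$$

is normal with mean $\mu$ and variance $\sigma^2/n$, and the random variable

$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$

is normal with mean 0 and variance 1.

The confidence interval for $\mu$ is

$$\mathrm{CONF}\left\{\bar{y} - \frac{c\,\sigma}{\sqrt{n}} \le \mu \le \bar{y} + \frac{c\,\sigma}{\sqrt{n}}\right\}$$

where c is the standard normal critical value for the chosen confidence level (for example, c = 1.960 for a two-sided 95% interval).

So far we have regarded the values y1, y2, … of a sample as n observed values of a single random variable Y. We may equally well regard these n values as single observations of n random variables Y1, Y2, … that have the same distribution and are independent.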

Problem: Example 2.1

A vendor submits lots of fabric to a textile manufacturer.

The manufacturer wants to know if the lot average breaking strength exceeds 200 psi. If so, she wants to accept the lot.

Past experience indicates that a reasonable value for the variance of breaking strength is 100 (psi)².

Four specimens are randomly selected, and the average breaking strength observed is $\bar{y} = 214$ psi.

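Example 2.1

The hypotheses to be tested are:

$$H_0: \mu = 200 \qquad H_1: \mu > 200$$

This is a one-sided alternative hypothesis. The value of the test statistic is:

$$Z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} = \frac{214 - 200}{10/\sqrt{4}} = 2.80$$

If a confidence level of 95% is chosen, i.e. type I error α = 0.05, we find $Z_\alpha = 1.645$.

Since $Z_0 = 2.80 > 1.645$, the difference is significant: H0 is rejected and we conclude that the lot average breaking strength exceeds 200 psi.

Thus, we accept the lot.

The interval $\bar{y} \pm Z_\alpha\,\sigma/\sqrt{n}$ with $Z_\alpha = 1.645$ is $205.8 \le \mu \le 222.2$. Its lower limit, 205.8, is the one-sided 95% lower confidence bound for $\mu$; read as a two-sided interval it has 90% confidence. Clearly, 200 is outside the interval.

The P-value is 0.0026.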
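The arithmetic of Example 2.1 can be checked with a few lines of Python. This is a minimal sketch that assumes only the numbers stated in the example (average 214 psi, specified value 200 psi, variance 100 psi², n = 4, α = 0.05) and uses `statistics.NormalDist` for the standard normal distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in Example 2.1.
y_bar = 214.0        # observed average breaking strength, psi
mu_0 = 200.0         # specified value under H0, psi
sigma = sqrt(100.0)  # known standard deviation, psi (variance 100 psi^2)
n = 4                # number of specimens
alpha = 0.05         # type I error

std_normal = NormalDist()  # mean 0, variance 1

# Test statistic for H0: mu = 200 against H1: mu > 200.
z0 = (y_bar - mu_0) / (sigma / sqrt(n))

# One-sided critical value and P-value.
z_alpha = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z0)

# One-sided 95% lower confidence bound for mu.
lower_bound = y_bar - z_alpha * sigma / sqrt(n)

reject_h0 = z0 > z_alpha
print(f"Z0 = {z0:.2f}, Z_alpha = {z_alpha:.3f}, P-value = {p_value:.4f}")
print(f"lower 95% bound for mu = {lower_bound:.1f} psi, reject H0: {reject_h0}")
```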

Problems

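Comparing a Single Mean to a Specified ('Standard') Value: Confidence Interval of a Normal Distribution if σ Unknown

The same as the previous case, but we use:

- the t distribution instead of the normal distribution (Normal → t)
- the sample standard deviation S instead of σ (σ → S)

The test statistic is

$$t_0 = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$$

The confidence interval is

$$\mathrm{CONF}\left\{\bar{y} - \frac{t\,S}{\sqrt{n}} \le \mu \le \bar{y} + \frac{t\,S}{\sqrt{n}}\right\}$$

with t taken at (n − 1) degrees of freedom.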

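Comparing 2 Treatment Means: If Variance Known

The test statistic is

$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The confidence interval is

$$\mathrm{CONF}\left\{\bar{y}_1 - \bar{y}_2 - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right\}$$

Compared with the single-mean case, the distribution is still normal, with the substitutions

$$\bar{y} \rightarrow \bar{y}_1 - \bar{y}_2, \qquad \frac{\sigma^2}{n} \rightarrow \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$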

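Comparing 2 Treatment Means: If Variance Unknown, but $\sigma_1^2 = \sigma_2^2$

The test statistic is

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where the pooled variance and its degrees of freedom are

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}, \qquad \nu = n_1 + n_2 - 2$$

Choose a confidence level, usually 95%, then find the critical t value at the associated degrees of freedom, i.e. $t_{\alpha/2,\nu}$.

- If $|t_0| > t_{\alpha/2,\nu}$, we have enough reason to reject the null hypothesis and conclude that the two methods differ significantly.
- Alternatively, calculate the P-value, i.e. the risk of wrongly rejecting the null hypothesis.
- Or set up the confidence interval and reject the null hypothesis if 0 is not included in the interval.

Compared with the single-mean case with σ known, the substitutions are

$$\text{Normal} \rightarrow t_{\,n_1+n_2-2}, \qquad \bar{y} \rightarrow \bar{y}_1 - \bar{y}_2, \qquad \frac{\sigma^2}{n} \rightarrow S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$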

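Comparing 2 Treatment Means: If Variance Unknown, $\sigma_1^2 \ne \sigma_2^2$

The test statistic is

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

with degrees of freedom

$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$

Compared with the single-mean case with σ known, the substitutions are

$$\text{Normal} \rightarrow t_{\nu}, \qquad \bar{y} \rightarrow \bar{y}_1 - \bar{y}_2, \qquad \frac{\sigma^2}{n} \rightarrow \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$$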

Problems


Example: Portland Cement Formulation – Dot Diagram

Tension bond strength of portland cement mortar is an important characteristic of the product. An engineer is interested in comparing the strength of a modified formulation, in which polymer latex emulsions have been added during mixing, to the strength of the unmodified mortar. He collected 10 observations (Table 2.1).

- Plot the dot diagram.
- Plot the Box & Whisker plot.

Are the two formulations really different?

Or perhaps the observed difference is the result of sampling fluctuation and the two formulations are really identical?

Problems

Inference about the difference in means. Blocking: Paired Comparison Design

Blocking is a design technique used to improve the precision with which comparisons among the factors of interest are made. Often blocking is used to reduce or eliminate the variability transmitted from nuisance factors, i.e. factors that may influence the experimental response but in which we are not interested.

The term block refers to a relatively homogeneous experimental unit, and the block represents a restriction on complete randomization because the treatment combinations are only randomized within the block. Blocking is carried out by making comparisons within matched pairs of experimental material.

The confidence interval based on paired analysis is usually much narrower than that from the independent analysis. This illustrates the noise reduction property of blocking.

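Inference about the difference in means. Blocking: Paired Comparison Design

Statistical model for complete randomization:

$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n_i$$

with 2(n − 1) degrees of freedom when $n_1 = n_2 = n$.

Statistical model with blocking:

$$y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \ldots, n$$

with only (n − 1) degrees of freedom, where n is the number of pairs.

The paired differences are $d_j = y_{1j} - y_{2j}$, with average $\bar{d}$ and sample standard deviation $S_d$.

The test statistic:

$$t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$$

The confidence interval for 2-sided test:

$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{S_d}{\sqrt{n}}$$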

Inference about the difference in means. Blocking: Paired Comparison Design. Example: The Story

Consider a hardness testing machine that presses a rod with a pointed tip into a metal specimen with a known force. Two different tips are available for this machine, and it is suspected that one tip produces different hardness readings than the other.

The test could be performed as follows: a number of metal specimens could be randomly selected. Half are tested by tip 1 and the other half by tip 2.

The metal specimens might be cut from different bar stock and so might not be exactly identical in their hardness. To protect against this possibility, an alternative experimental design should be considered: divide each specimen into two parts and randomly assign each tip to one half of each specimen.

Inference about the difference in means. Blocking: Paired Comparison Design. Example: Data

- Use the paired data to determine a 95% confidence interval for the difference.
- What if we use pooled or independent analysis?

Specimen   Tip 1   Tip 2
1          7       6
2          3       3
3          3       5
4          4       3
5          8       8
6          3       2
7          2       4
8          9       9
9          5       4
10         4       5
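Both questions on this slide can be explored with the sketch below, which applies the paired formulas and the pooled (independent) formulas to the hardness data. The critical values 2.262 (9 degrees of freedom) and 2.101 (18 degrees of freedom) are two-sided 95% values taken from a standard t table and are entered by hand as an assumption, since the Python standard library has no t distribution; all names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hardness readings from the data slide, one entry per specimen.
tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)

# Assumed tabulated critical values for a two-sided 95% interval.
T_025_9 = 2.262    # t(alpha/2 = 0.025, 9 degrees of freedom)
T_025_18 = 2.101   # t(alpha/2 = 0.025, 18 degrees of freedom)

# Paired analysis: work with the differences d_j = y_1j - y_2j.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar = mean(d)
s_d = stdev(d)
t0 = d_bar / (s_d / sqrt(n))
half_paired = T_025_9 * s_d / sqrt(n)
paired_ci = (d_bar - half_paired, d_bar + half_paired)

# Pooled (independent) analysis, ignoring the pairing.
diff = mean(tip1) - mean(tip2)
s_p = sqrt(((n - 1) * variance(tip1) + (n - 1) * variance(tip2)) / (2 * n - 2))
half_pooled = T_025_18 * s_p * sqrt(1 / n + 1 / n)
pooled_ci = (diff - half_pooled, diff + half_pooled)

print(f"d_bar = {d_bar:.2f}, S_d = {s_d:.2f}, t0 = {t0:.2f}")
print(f"paired 95% CI: {paired_ci[0]:.2f} to {paired_ci[1]:.2f}")
print(f"pooled 95% CI: {pooled_ci[0]:.2f} to {pooled_ci[1]:.2f}")
```

The paired interval is much narrower than the pooled one, and both contain 0, so neither analysis finds a significant difference between the tips for these data.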

Problems

Inference about the Variances of Normal Distributions

In some experiments it is the comparison of variability in the data that is important. For example, in chemical laboratories, we may wish to compare the variability of two analytical methods. Unlike the tests on means, the procedures for tests on variances are rather sensitive to the normality assumption.

Suppose we wish to test the hypothesis whether or not the variance of a normal population equals a constant, viz. $\sigma_0^2$. The test statistic is:

$$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2}$$

The appropriate distribution for $\chi_0^2$ is the chi-square distribution with (n − 1) degrees of freedom. The confidence interval for $\sigma^2$ is

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$

Inference about the Variances of Normal Distributions

Suppose we wish to test equality of the variances of two normal populations. If independent random samples of size n1 and n2 are taken from populations 1 and 2, respectively, the test statistic for

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2$$

is the ratio of the sample variances:

$$F_0 = \frac{S_1^2}{S_2^2}$$

The appropriate distribution for $F_0$ is the F distribution with (n1 − 1) numerator degrees of freedom and (n2 − 1) denominator degrees of freedom. The null hypothesis would be rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.

The confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\frac{S_1^2}{S_2^2}\,F_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\,F_{\alpha/2,\,n_2-1,\,n_1-1}$$

Note:

$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$$

Checking for Normality: Normal Probability Plot

Probability plotting is a graphical technique for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data.

To construct a probability plot, the observations in the sample are first ranked from smallest to largest. That is, the sample y1, y2, …, yn is arranged as y(1), y(2), …, y(n), where y(1) is the smallest observation and y(n) the largest. The ordered observations y(j) are then plotted against their observed cumulative frequency (j − 0.5)/n. The cumulative frequency scale is arranged so that if the hypothesized distribution adequately describes the data, the plotted points fall approximately along a straight line. Judging whether the points are straight enough is usually subjective.
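The construction described above can be sketched as follows. The sample values are hypothetical and serve only as an illustration. Instead of using special probability paper, this sketch converts each cumulative frequency (j − 0.5)/n to a standard normal score with `statistics.NormalDist().inv_cdf`, which is one common way to obtain the same straight-line behaviour on ordinary axes.

```python
from statistics import NormalDist

# Hypothetical sample, for illustration only (not data from the slides).
sample = [12.1, 9.8, 10.4, 11.3, 10.9, 9.5, 10.1, 11.8, 10.6, 11.0]

# Step 1: rank the observations from smallest to largest: y(1), ..., y(n).
ordered = sorted(sample)
n = len(ordered)

# Step 2: observed cumulative frequency (j - 0.5)/n for j = 1, ..., n.
cum_freq = [(j - 0.5) / n for j in range(1, n + 1)]

# Step 3: standard normal score for each cumulative frequency.
std_normal = NormalDist()
z_scores = [std_normal.inv_cdf(p) for p in cum_freq]

# Plotting y(j) against z_j should give roughly a straight line
# if the normal distribution describes the data adequately.
print("  j    y(j)  (j-0.5)/n       z_j")
for j, (y, p, z) in enumerate(zip(ordered, cum_freq, z_scores), start=1):
    print(f"{j:3d} {y:7.1f} {p:10.2f} {z:9.3f}")
```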

Problems


Introduction to DOX

An experiment is a test or a series of tests

Experiments are used widely in the engineering world:

- Process characterization & optimization
- Evaluation of material properties
- Product design & development
- Component & system tolerance determination

“All experiments are designed experiments, some are poorly designed, some are well-designed”


The Basic Principles of DOX

- Randomization
  - Running the trials in an experiment in random order
  - Notion of balancing out effects of “lurking” variables
- Replication
  - Sample size (improving precision of effect estimation, estimation of error or background noise)
  - Replication versus repeat measurements? (see page 13)
- Blocking
  - Dealing with nuisance factors
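Randomization, the first principle listed above, can be illustrated with a short sketch that puts the runs of a replicated two-treatment experiment into random order. The treatment labels, the number of replicates, and the seed are all hypothetical; the fixed seed is used only so that the sketch gives the same order each time it is run.

```python
import random

# Hypothetical experiment: two treatments, five replicates each.
treatments = ["A", "B"]
replicates = 5

# Every (treatment, replicate) combination is one trial to be run.
runs = [(t, r) for t in treatments for r in range(1, replicates + 1)]

# Shuffle a copy so the planned list of runs stays intact.
rng = random.Random(2024)  # fixed seed, for reproducibility of the sketch only
run_order = runs[:]
rng.shuffle(run_order)

for position, (treatment, replicate) in enumerate(run_order, start=1):
    print(f"run {position:2d}: treatment {treatment}, replicate {replicate}")
```

In a real experiment the seed would normally be left unset or recorded alongside the run sheet.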


Strategy of Experimentation

- “Best-guess” experiments
  - Used a lot
  - More successful than you might suspect, but there are disadvantages…
- One-factor-at-a-time (OFAT) experiments
  - Sometimes associated with the “scientific” or “engineering” method
  - Devastated by interaction, also very inefficient
- Statistically designed experiments
  - Based on Fisher’s factorial concept
