
Upload: neil-lucas

Post on 03-Jan-2016


1 of 45

How Many Samples do I Need? Part 1

Presenter: Sebastian Tindall

60 minutes (15-minute 1st Afternoon Break)

DQO Training Course, Day 1

Module 4

2 of 45

Topics to Discuss in Module 4

How many samples, based on:

– Census

– Sampling

Types of decision error

Definitions of common statistical terms

3 of 45

How Many Samples do I Need?

Quick & Dirty Method: n = 5

Budget Method: n = (total $) / ($ per sample)
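As a minimal sketch of the two shortcut rules above, the Python below hard-codes the Quick & Dirty value and computes the Budget Method count, rounding down to whole samples. The function name `budget_sample_count` and the $50,000 budget are hypothetical; the $3,000 per-sample cost is borrowed from the cost slide later in this module. Note that neither rule looks at the decision, the tolerance for mistakes, or the variability of the material.

```python
QUICK_AND_DIRTY_N = 5  # Quick & Dirty Method: always take 5 samples


def budget_sample_count(total_budget: float, cost_per_sample: float) -> int:
    """Budget Method: n = (total $) / ($ per sample), rounded down to whole samples."""
    if cost_per_sample <= 0:
        raise ValueError("cost_per_sample must be positive")
    return int(total_budget // cost_per_sample)


# Hypothetical budget of $50,000 at $3,000 per sample.
n_budget = budget_sample_count(50_000, 3_000)
print(f"Quick & Dirty: n = {QUICK_AND_DIRTY_N}; Budget: n = {n_budget}")  # Budget: n = 16
```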

4 of 45

How Many Samples do I Need?

It depends!

– What is the decision?

– What is the tolerance for mistakes?

– How will the data be used?

– What is the underlying variation in the material being sampled?

5 of 45

How Many Samples do I Need?

(The Real Answer)

Just Enough!

6 of 45

How Many Samples do I Need?

REMEMBER:

HETEROGENEITY

IS THE RULE!

7 of 45

Decisions with Absolute Certainty

Requires knowing the “true condition” of the population in question

– Perform a census: collect and analyze every possible member of the population in question

8 of 45

Decisions with Absolute Certainty (cont.)

Population

– Universe of items (elements) within the spatial boundary, for example:

All the possible soil samples in the Smiths' backyard

All the people in the U.S.A.

– Translation: you have to count/measure (sample) EVERY single member of the population

9 of 45

One-Acre Football Field

[Figure: a football field used to illustrate a one-acre area]

10 of 45

Number of Samples in a One-Acre Field

How many surface soil samples can I take from a one-acre field?

The dimensions of a one-acre field are 272.25 feet by 160 feet.

If one surface soil sample = 2.5" x 2.5" x 6" deep, then...

...there are ≈ 1,000,000 possible surface soil samples in a one-acre field.
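The arithmetic on this slide can be checked with a short Python sketch. It assumes the square sample footprints tile the field exactly, with no gaps or partial samples; the 6" depth does not change the count because only one surface layer is taken. The constant and function names are illustrative.

```python
ACRE_FT2 = 43_560          # square feet in one acre
FIELD_LENGTH_FT = 272.25   # one-acre field dimensions from the slide
FIELD_WIDTH_FT = 160.0
SAMPLE_SIDE_IN = 2.5       # surface sample footprint: 2.5" x 2.5" (6" deep)


def possible_surface_samples(length_ft: float, width_ft: float, side_in: float) -> int:
    """Count non-overlapping square sample footprints that fit in the field.

    Assumes the footprints tile the surface exactly (no gaps, no partial samples).
    """
    area_in2 = (length_ft * 12) * (width_ft * 12)
    return int(area_in2 // (side_in * side_in))


field_area_ft2 = FIELD_LENGTH_FT * FIELD_WIDTH_FT
n_units = possible_surface_samples(FIELD_LENGTH_FT, FIELD_WIDTH_FT, SAMPLE_SIDE_IN)
print(field_area_ft2 == ACRE_FT2)               # True
print(f"{n_units:,} possible surface samples")  # 1,003,622 possible surface samples
```

The exact count, just over one million, is what the slide rounds to 1,000,000.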

11 of 45

Cost of Sampling Entire One-Acre Field

How much would it cost to know the true condition of the one-acre field?

If it costs $3,000 to test one surface soil sample, it would cost $3,000,000,000 to test all possible population units.
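A one-line check of the census cost, sketched in Python using the slide's rounded count of 1,000,000 population units and its $3,000 per-sample cost; the function name is illustrative.

```python
COST_PER_SAMPLE = 3_000        # dollars per surface soil sample (from the slide)
POPULATION_UNITS = 1_000_000   # the slide's rounded count for a one-acre field


def census_cost(units: int, cost_per_unit: int) -> int:
    """Cost of testing every possible population unit (a census)."""
    return units * cost_per_unit


total = census_cost(POPULATION_UNITS, COST_PER_SAMPLE)
print(f"${total:,}")  # $3,000,000,000
```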

12 of 45

Testing All Possible Samples

CENSUS: Testing all possible population units (samples) is the ONLY way to know the true condition of the site with absolute certainty

However, time and money considerations usually prevent us from doing this

13 of 45

Decisions with Absolute Certainty

Perform a census

– Totally impractical

Therefore, we can never make a decision with absolute certainty

So what's left to do?

14 of 45

Testing a Few Samples (from the larger population)

ESTIMATION: Estimates of the true condition of the site are usually made from a few (representative) samples

– Taking a few samples (making a few measurements) and using them to represent the site

– Making inferences (even sweeping claims) about the population of interest based on these few samples

15 of 45

The Process of Estimation

An estimate is just an educated guess based on incomplete information

Educated guesses will be wrong, to some degree

In other words, the process of estimation contains inherent errors

16 of 45

Estimation Errors

Are NOT mistakes. They do not suggest that anything was done improperly

Are an inherent part of the process of estimation

Are simply deviations from the true condition of the site

Introduce uncertainty into the decision-making process

17 of 45

Consequences of Uncertainty

Decision errors are true mistakes. Examples:

– Walking away from a dirty site

– Cleaning up a clean site

Decision errors can be managed

[Figure: Estimation Errors vs. Decision Errors]

18 of 45

Decision Errors

Are acceptable or tolerable… within limits

We set tolerable limits on the percentage of time we are willing to:

– Walk away from a dirty site

– Clean up a clean site

19 of 45

Where do errors occur?

[Figure: errors can occur during Planning, Sampling, and Analysis; Data vs. Decision]

20 of 45

Definition of Terms

Population

– Everyone or everything of interest

– Example: All the people in this class

Sample

– Some subset of the population

– Example: Five people randomly chosen from the class

21 of 45

Definition of Terms

Population Parameter

– The true value of the population characteristic (e.g., age) that can only be known if all possible samples are measured

– Example: true mean age of all the people in the class, calculated using data from every member of the population

Sample Statistic

– The estimated value of the population characteristic that is calculated from sample data

– Example: estimate of the true mean age of all people in the class, calculated using data from a subset (sample) of the population

22 of 45

Comparison

Population Parameter

– Represents “true condition” of the population

– Decisions can be made with 100% certainty (0% uncertainty)

Sample Statistic

– Represents “estimated condition” of the population

– Decisions cannot be made with 100% certainty

23 of 45

Class Question

What is the true mean age in this class?

What is the estimated mean age in this class?

– Randomly select 5 ages

What is a 2nd estimated mean age in this class?

– Randomly select 15 ages

(See Computer Age Demo)
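The Computer Age Demo itself is not reproduced here. As a hypothetical stand-in, the Python sketch below draws a made-up class of 30 ages, computes the population parameter (the true mean, by census) and two sample statistics (estimated means from 5 and 15 randomly selected ages), and averages each estimate's deviation from the true mean over many repeated draws. The roster, class size, seed, and function names are assumptions for illustration only.

```python
import random
import statistics

# Hypothetical class roster: 30 made-up ages (illustrative data only).
rng = random.Random(42)
class_ages = [rng.randint(22, 65) for _ in range(30)]


def true_mean(population):
    """Population parameter: uses every member of the population (a census)."""
    return statistics.mean(population)


def estimated_mean(population, n, rng):
    """Sample statistic: mean of n members chosen at random, without replacement."""
    return statistics.mean(rng.sample(population, n))


def typical_deviation(population, n, trials, rng):
    """Average absolute deviation of the sample mean from the true mean."""
    mu = true_mean(population)
    return statistics.mean(
        abs(estimated_mean(population, n, rng) - mu) for _ in range(trials)
    )


mu = true_mean(class_ages)
print(f"True mean age (census of {len(class_ages)}): {mu:.1f}")
for n in (5, 15):
    est = estimated_mean(class_ages, n, rng)
    print(f"Estimate from {n:2d} ages: {est:.1f} (deviation {est - mu:+.1f})")
    avg_dev = typical_deviation(class_ages, n, 2000, rng)
    print(f"  average deviation over 2,000 draws: {avg_dev:.2f}")
```

Any single estimate deviates from the true mean by some amount (an estimation error, not a mistake); the deviations tend to shrink as more ages are sampled, and only a census of all 30 removes them.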

24 of 45

True Mean Age of All the People in This Class

In this case, where we are only interested in measuring a small group of people who are all in the same room at the same time, it is not too difficult to determine the true mean age with 100% certainty. But:

– What if some people failed to respond?

– What if some people “fudged” a little?

– What if some of the response forms got lost?

25 of 45

Types of Decision Errors

Before we can talk about acceptable limits for making decision errors, we must first understand what correct decisions and decision errors look like and define some terms

There are two types of correct decisions and two types of decision errors that can be made

26 of 45

Graph of Perfect Decision Making

[Figure: Ideal Decision Rule. y-axis: chance of deciding site is dirty (0.0 to 1.0); x-axis: true mean ²²⁶Ra concentration, low to high; action level at 6 pCi/g]

27 of 45

Graph of Typical Decision Making

[Figure: Typical Curve. y-axis: chance of deciding site is dirty (0.0 to 1.0); x-axis: true mean ²²⁶Ra concentration, low to high; action level at 6 pCi/g]

28 of 45

Decision Performance Goal Diagram

The Gray Region

Null Hypothesis: The site is dirty.

[Figure: Typical curve. y-axis: probability of deciding that the site is dirty (0.0 to 1.0); x-axis: true mean COPC concentration, marking the Lower Bound of the Gray Region (75) and the Action Level (100). True state of site: site is clean / site is dirty. Alternative action: walk away from site / clean up site.]

29 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked) to ∞, showing Sample Mean A (X̄A), UCL 1A, UCL 1B, and the Action Level; values shown: 75, 95, 100, 110]

30 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked) to ∞, showing Sample Mean B (X̄B), UCL B, and the Action Level; values shown: 100, 110, 120]
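The slides do not say how the 95% UCL is computed. One common choice, assumed here, is the one-sided Student's t upper confidence limit on the mean, which presumes roughly normal data. The Python sketch below applies the decision rule with the null hypothesis that the site is dirty: conclude "clean" only when the 95% UCL falls below the action level. The table holds standard one-sided 95% t critical values; using the normal value 1.645 beyond 30 degrees of freedom is an approximation. The data sets and function names are hypothetical.

```python
import math
import statistics

# One-sided 95% Student's t critical values t(0.95, df) for df = 1..30.
T95 = (6.314, 2.920, 2.353, 2.132, 2.015, 1.943, 1.895, 1.860, 1.833, 1.812,
       1.796, 1.782, 1.771, 1.761, 1.753, 1.746, 1.740, 1.734, 1.729, 1.725,
       1.721, 1.717, 1.714, 1.711, 1.708, 1.706, 1.703, 1.701, 1.699, 1.697)
Z95 = 1.645  # large-sample (normal) approximation used when df > 30


def ucl95(data):
    """One-sided 95% Student's t UCL on the mean (assumes roughly normal data)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two results")
    df = n - 1
    t = T95[df - 1] if df <= len(T95) else Z95
    return statistics.mean(data) + t * statistics.stdev(data) / math.sqrt(n)


def apply_decision_rule(data, action_level):
    """Null hypothesis: the site is dirty.

    Reject it (conclude 'clean') only if the 95% UCL is below the action level.
    """
    if ucl95(data) < action_level:
        return "site is clean: walk away from site"
    return "site is dirty: clean up site"


ACTION_LEVEL = 100
for results in ([70, 82, 68, 79, 75], [80, 100, 90, 115, 85], [95, 118, 102, 110, 125]):
    print(f"mean={statistics.mean(results):.1f} UCL={ucl95(results):.1f} -> "
          f"{apply_decision_rule(results, ACTION_LEVEL)}")
```

The middle data set shows why the rule uses the UCL rather than the sample mean: its mean (94) is below the action level, but its UCL (about 107) is not, so the data are not strong enough to reject "the site is dirty."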

31 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked), showing the True Mean, the Sample Mean UCL, the deviation between them, and the Action Level (100)]

Conclusion: Site is dirty.

Action: Clean up a dirty site. A correct decision.

32 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked), showing the True Mean, the Sample Mean UCL, the deviation between them, and the Action Level (100)]

Conclusion: Site is clean.

Action: Walk away from a dirty site. An incorrect decision.

33 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked), showing the True Mean, the Sample Mean UCL, the deviation between them, and the Action Level (100)]

Conclusion: Site is clean.

Action: Walk away from a clean site. A correct decision.

34 of 45

Is Site dirty? Is Site clean?

Decision-Making Procedure: Apply Decision Rule

Alternative Action: Walk away from site / Clean up site

[Figure: number line of 95% UCL COPC concentration from DL (PSQ marked), showing the True Mean, the Sample Mean UCL, the deviation between them, and the Action Level (100)]

Conclusion: Site is dirty.

Action: Clean up a clean site. An incorrect decision.
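The four slides above reduce to comparing two yes/no answers: is the true mean at or above the action level (the true state of the site), and is the sample mean UCL at or above it (the decision)? A minimal Python sketch of that bookkeeping follows. The true mean is knowable only in a simulation or thought experiment; the function name and numbers are hypothetical.

```python
def classify_outcome(true_mean: float, ucl: float, action_level: float) -> str:
    """Label a decision as one of the two correct decisions or two decision errors."""
    site_is_dirty = true_mean >= action_level   # true state of site
    decided_dirty = ucl >= action_level         # conclusion from the decision rule
    action = "clean up" if decided_dirty else "walk away from"
    state = "dirty" if site_is_dirty else "clean"
    verdict = "correct" if site_is_dirty == decided_dirty else "incorrect"
    return f"{action} a {state} site: {verdict} decision"


ACTION_LEVEL = 100
# (true mean, sample mean UCL) pairs mirroring the four slides, in order.
for true_mean, ucl in [(120, 130), (110, 95), (80, 90), (85, 105)]:
    print(classify_outcome(true_mean, ucl, ACTION_LEVEL))
```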

35 of 45

The Gray Region

Null Hypothesis: The site is dirty.

[Figure: Decision performance goal diagram. y-axis: probability of deciding that the true mean is greater than or equal to the action level (0.0 to 1.0); x-axis: true mean COPC concentration, marking the Lower Bound of the Gray Region (75) and the Action Level (100). True state of site: site is clean / site is dirty; alternative action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, and the deviation between them.]

When the True Mean is well above the Action Level...

... then there should be a high probability that the Sample Mean UCL will also be above the Action Level...

... and it is highly likely that we will correctly decide to clean up a dirty site.

36 of 45

The Gray Region

Null Hypothesis: The site is dirty.

[Figure: Decision performance goal diagram. y-axis: probability of deciding that the site is dirty (0.0 to 1.0); x-axis: true mean COPC concentration, marking the Lower Bound of the Gray Region (75) and the Action Level (100). True state of site: site is clean / site is dirty; alternative action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, and the deviation between them.]

If the True Mean is well below the Lower Bound of the Gray Region...

... then there should be a very low probability that the Sample Mean UCL will be above the Action Level...

... and it is highly unlikely that we will incorrectly decide to clean up a clean site.

37 of 45

The Gray Region

Null Hypothesis: The site is dirty.

[Figure: Decision performance goal diagram. y-axis: probability of deciding that the site is dirty (0.0 to 1.0); x-axis: true mean COPC concentration, marking the Lower Bound of the Gray Region (75) and the Action Level (100). True state of site: site is clean / site is dirty; alternative action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, and the deviation between them.]

When the True Mean is IN the gray region...

... then there is an increased probability that the Sample Mean UCL will be above the Action Level...

... and we agree to accept an increased chance that we will incorrectly decide to clean up a clean site.

38 of 45

Decision Performance Goal Diagram

The Gray Region

Null Hypothesis: The site is dirty.

[Figure: Typical curve. y-axis: probability of deciding that the site is dirty (0.0 to 1.0); x-axis: true mean COPC concentration, marking the Lower Bound of the Gray Region (75) and the Action Level (100). True state of site: site is clean / site is dirty. Alternative action: walk away from site / clean up site.]
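To see where a typical curve like this comes from, the Python sketch below computes the probability of deciding that the site is dirty as a function of the true mean. It assumes normally distributed results with a known standard deviation (a hypothetical sigma = 20) and a z-based 95% UCL, a simplification of a t-based UCL; the action level (100) and lower bound of the gray region (75) are read from the diagram, and the function name is illustrative. Under these assumptions the probability at the action level is 0.95 regardless of n, while the chance of cleaning up a clean site at the lower bound of the gray region shrinks as n grows.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def prob_decide_dirty(true_mean, action_level, sigma, n, confidence=0.95):
    """P(sample mean + z * sigma / sqrt(n) >= action level).

    Assumes normal data with known sigma, so the sample mean is
    Normal(true_mean, sigma / sqrt(n)).
    """
    z = STD_NORMAL.inv_cdf(confidence)
    shift = (action_level - true_mean) * n ** 0.5 / sigma
    return 1.0 - STD_NORMAL.cdf(shift - z)


ACTION_LEVEL = 100  # from the diagram
LBGR = 75           # lower bound of the gray region, from the diagram
SIGMA = 20          # hypothetical standard deviation

for n in (5, 10):
    at_al = prob_decide_dirty(ACTION_LEVEL, ACTION_LEVEL, SIGMA, n)
    at_lbgr = prob_decide_dirty(LBGR, ACTION_LEVEL, SIGMA, n)
    print(f"n={n:2d}: P(dirty | mean = AL) = {at_al:.2f}, "
          f"P(dirty | mean = LBGR) = {at_lbgr:.3f}")
```

Evaluating `prob_decide_dirty` over a range of true means traces the S-shaped curve in the diagram; more samples make the curve steeper across the gray region.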

39 of 45

Managing Uncertainty is a Balancing Act

[Figure: two balances. PRP (potentially responsible party) focus: Sampling and Analyses Cost vs. Unnecessary Disposal and/or Cleanup Cost. Regulatory focus: Sampling and Analyses Cost vs. Threat to Public Health and Environment.]

40 of 45

Key Points

We will never know the true condition of the site; time and money prevent this

Therefore, we must estimate the true condition through sampling

Estimates based on samples are not factual statements about the site; they are educated guesses

Estimates will always contain some error, because they use incomplete information

41 of 45

Key Points (cont.)

Errors are not mistakes, just deviations from the truth

Errors (deviations) introduce uncertainty into the decision-making process

Errors and uncertainty can be managed so that you can still get the job done and prove that you did it

42 of 45

Key Points (cont.)

The DQO Process is designed to help you manage uncertainty and:

– Get the job done efficiently

– Prove that you did it defensibly

43 of 45

Primary Benefit of the DQO Process:

Managing uncertainty through

“FAILING TO PLAN…..IS PLANNING TO FAIL”

44 of 45

How Many Samples do I Need?

REMEMBER:

HETEROGENEITY

IS THE RULE!

45 of 45

Summary of Parts 1, 2, 3 will be at the end of Module 6

End of Module 4

Questions?

Thank you

We will now take a 15-minute break. Please be back in 15 minutes.