TRANSCRIPT
1 of 45
How Many Samples do I Need? Part 1
Presenter: Sebastian Tindall
60 minutes (15-minute 1st Afternoon Break)
DQO Training Course, Day 1
Module 4
2 of 45
Topics to Discuss in Module 4
How many samples based on
– Census
– Sampling
Types of decision error
Definitions of common statistical terms
3 of 45
How Many Samples do I Need?
Quick & Dirty Method: n = 5
Budget Method: n = (total $) / ($ per sample)
4 of 45
How Many Samples do I Need?
It depends!
What is the decision?
What is the tolerance for mistakes?
How will the data be used?
What is the underlying variation in the material being sampled?
7 of 45
Decisions with Absolute Certainty
Requires knowing the “true condition” of the population in question
– Perform a census
Collect and analyze every possible member of the population in question
8 of 45
Decisions with Absolute Certainty (cont.)
Population
– Universe of items (elements) within the spatial boundary
All the possible soil samples in the Smiths' backyard
All the people in the U.S.A.
– Translation: you have to count/measure (sample) EVERY single member of the population
10 of 45
Number of Samples in a One-Acre Field
How many surface soil samples can I take from a one-acre field?
A one-acre field measures 272.25 feet by 160 feet.
If one surface soil sample = 2.5” x 2.5” x 6” deep, then...
...there are approximately 1,000,000 possible surface soil samples in a one-acre field.
11 of 45
Cost of Sampling Entire One-Acre Field
How much would it cost to know the true condition of the one-acre field?
If it costs $3000 to test one surface soil sample, it would cost $3,000,000,000 to test all possible population units.
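The arithmetic behind the one-acre census can be checked with a short sketch. It assumes the field is a 272.25 ft by 160 ft rectangle and that samples tile the surface with no gaps or overlap; the variable names are illustrative only. The exact count comes out slightly above the rounded 1,000,000 used on the slides.

```python
# One-acre field: 272.25 ft by 160 ft (43,560 square feet).
FIELD_LENGTH_FT = 272.25
FIELD_WIDTH_FT = 160.0
SAMPLE_SIDE_IN = 2.5          # each sample covers 2.5" x 2.5" of surface
COST_PER_SAMPLE_USD = 3000    # analysis cost quoted on the slide

# Convert the field area to square inches (144 sq in per sq ft).
field_area_sq_in = FIELD_LENGTH_FT * FIELD_WIDTH_FT * 144
sample_area_sq_in = SAMPLE_SIDE_IN ** 2

# Assumption: samples tile the surface exactly, so count whole samples only.
possible_samples = int(field_area_sq_in // sample_area_sq_in)
census_cost_usd = possible_samples * COST_PER_SAMPLE_USD

print(f"Possible surface soil samples: {possible_samples:,}")
print(f"Cost of a full census: ${census_cost_usd:,}")
```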
12 of 45
Testing All Possible Samples
CENSUS
Testing all possible population units (samples) is the ONLY way to know the true condition of the site with absolute certainty
However, time and money considerations usually prevent us from doing this
13 of 45
Decisions with Absolute Certainty
Perform a census
– totally impractical
Therefore, we can never make a decision with absolute certainty
So what’s left to do?
14 of 45
Testing a Few Samples (from the larger population)
ESTIMATION
Estimates of the true condition of the site are usually made from a few (representative) samples
– Taking a few samples (making a few measurements) and using them to represent the site
– Make inferences (even sweeping claims) about the population of interest based on these few samples
15 of 45
The Process of Estimation
An estimate is just an educated guess based on incomplete information
Educated guesses will be wrong, to some degree
In other words, the process of estimation contains inherent errors
16 of 45
Estimation Errors
Are NOT mistakes. They do not suggest that anything was done improperly
Are an inherent part of the process of estimation
Are simply deviations from the true condition of the site
Introduce uncertainty into the decision-making process
17 of 45
Consequences of Uncertainty
Decision errors are true mistakes
Examples:
– Walking away from a dirty site
– Cleaning up a clean site
Decision errors can be managed
Estimation Errors Decision Errors
18 of 45
Decision Errors
Are acceptable or tolerable... within limits
We set tolerable limits on the percentage of time we are willing to:
– Walk away from a dirty site
– Clean up a clean site
20 of 45
Definition of Terms
Population
– Everyone or everything of interest
– Example: All the people in this class
Sample
– Some subset of the population
– Example: Five people randomly chosen from the class
21 of 45
Definition of Terms
Population Parameter
– The true value of the population characteristic (e.g., age) that can only be known if all possible samples are measured
– Example: true mean age of all the people in the class, calculated using data from every member of the population
Sample Statistic
– The estimated value of the population characteristic that is calculated from sample data
– Example: estimate of the true mean age of all people in the class, calculated using data from a subset (sample) of the population
22 of 45
Comparison
Population Parameter
– Represents “true condition” of the population
– Decisions can be made with 100% certainty (0% uncertainty)
Sample Statistic
– Represents “estimated condition” of the population
– Decision cannot be made with 100% certainty
23 of 45
Class Question?
What is the true mean age in this class?
What is the estimated mean age in this class?
– Randomly select 5 ages
2nd estimated mean age in this class?
– Randomly select 15 ages
(See Computer Age Demo)
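The Computer Age Demo itself is not part of this transcript. As a rough stand-in, the sketch below builds a hypothetical class of 30 people with made-up ages, computes the true mean from the full census, and then draws random samples of 5 and 15 ages to produce two estimates. The class size, age range, and seed are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(4)  # fixed seed so the demo is repeatable

# Hypothetical class of 30 people; these ages are made up for illustration.
class_ages = [random.randint(22, 64) for _ in range(30)]

# Population parameter: uses every member of the population (a census).
true_mean = statistics.mean(class_ages)

# Sample statistics: estimates from random subsets of 5 and 15 people.
estimate_5 = statistics.mean(random.sample(class_ages, 5))
estimate_15 = statistics.mean(random.sample(class_ages, 15))

print(f"True mean age (census of 30): {true_mean:.1f}")
print(f"Estimate from 5 ages:  {estimate_5:.1f} (deviation {estimate_5 - true_mean:+.1f})")
print(f"Estimate from 15 ages: {estimate_15:.1f} (deviation {estimate_15 - true_mean:+.1f})")
```

Each estimate deviates from the true mean by some amount. Rerunning with a different seed gives different deviations, which is the estimation error the earlier slides describe.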
24 of 45
True Mean Age of All the People in This Class
In this case - where we are only interested in measuring a small group of people who are all in the same room at the same time - it is not too difficult to determine the true mean age with 100% certainty. But:
– What if some people failed to respond?
– What if some people “fudged” a little?
– What if some of the response forms got lost?
25 of 45
Types of Decision Errors
Before we can talk about acceptable limits for making decision errors, we must first understand what correct decisions and decision errors look like and define some terms
There are two types of correct decisions and two types of decision errors that can be made
26 of 45
Graph of Perfect Decision Making
[Figure: Ideal Decision Rule. Y-axis: chance of deciding site is dirty (0.0 to 1.0). X-axis: true mean 226Ra concentration (low to high). Action Level: 6 pCi/g.]
27 of 45
Graph of Typical Decision Making
[Figure: Typical Curve. Y-axis: chance of deciding site is dirty (0.0 to 1.0). X-axis: true mean 226Ra concentration (low to high). Action Level: 6 pCi/g.]
28 of 45
Decision Performance Goal Diagram
Null Hypothesis: The site is dirty.
[Figure: Typical Curve. Y-axis: probability of deciding that the site is dirty (0.0 to 1.0). X-axis: true mean COPC concentration. Lower Bound of Gray Region: 75. Action Level: 100. The Gray Region lies between them. True State of Site: site is clean / site is dirty. Alternative Action: walk away from site / clean up site.]
29 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞. Action Level: 100. X A = 75, UCL 1A = 95, UCL 1B = 110.]
30 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞. Action Level: 100. X B = 110, UCL B = 120.]
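A minimal sketch of the decision rule these two slides apply. The slides do not spell the rule out, so two things are assumed here: that the values 95, 110, and 120 belong to UCL 1A, UCL 1B, and UCL B respectively, and that the site is declared clean only when the 95% UCL falls below the Action Level of 100. The function name is hypothetical.

```python
ACTION_LEVEL = 100  # action level shown on the slides

def apply_decision_rule(ucl_95, action_level=ACTION_LEVEL):
    """Return the alternative action chosen for a given 95% UCL.

    Assumed rule: the null hypothesis is "the site is dirty", so we only
    walk away when the UCL falls below the action level.
    """
    if ucl_95 < action_level:
        return "Walk away from site"
    return "Clean up site"

# UCL values as read from the slides (assignment to labels is an assumption).
for label, ucl in [("UCL 1A", 95), ("UCL 1B", 110), ("UCL B", 120)]:
    print(f"{label} = {ucl}: {apply_decision_rule(ucl)}")
```

Note that sample A has a mean of 75 in both cases; whether the decision is to walk away depends on how far the UCL sits above that mean.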
31 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞, with Action Level = 100, showing the True Mean, the Sample Mean UCL, and the Deviation.]
Conclusion: Site is dirty.
Action: Clean up a dirty site. A correct decision.
32 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞, with Action Level = 100, showing the True Mean, the Sample Mean UCL, and the Deviation.]
Conclusion: Site is clean.
Action: Walk away from a dirty site. An incorrect decision.
33 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞, with Action Level = 100, showing the True Mean, the Sample Mean UCL, and the Deviation.]
Conclusion: Site is clean.
Action: Walk away from a clean site. A correct decision.
34 of 45
Decision-Making Procedure: Apply Decision Rule
PSQ: Is site dirty? Is site clean?
Alternative Action: Walk away from site / Clean up site
[Figure: axis of 95% UCL COPC concentration from DL to ∞, with Action Level = 100, showing the True Mean, the Sample Mean UCL, and the Deviation.]
Conclusion: Site is dirty.
Action: Clean up a clean site. An incorrect decision.
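The four outcomes on the preceding slides (two correct decisions, two decision errors) can be summarized in one small function. This is a sketch under assumptions: the site counts as dirty when the true mean is at or above the Action Level, the decision is "dirty" when the 95% UCL is at or above it, and the function name and example numbers are hypothetical. In practice the true mean is never known, so this only illustrates the logic.

```python
def classify_outcome(true_mean, ucl_95, action_level=100):
    """Label a decision as correct or as one of the two decision errors."""
    site_is_dirty = true_mean >= action_level   # true state (unknown in practice)
    decide_dirty = ucl_95 >= action_level       # what the data lead us to decide

    if site_is_dirty and decide_dirty:
        return "Clean up a dirty site (correct decision)"
    if site_is_dirty and not decide_dirty:
        return "Walk away from a dirty site (incorrect decision)"
    if not site_is_dirty and not decide_dirty:
        return "Walk away from a clean site (correct decision)"
    return "Clean up a clean site (incorrect decision)"

# Hypothetical (true mean, UCL) pairs, one for each of the four outcomes.
for true_mean, ucl in [(120, 130), (105, 95), (70, 90), (85, 110)]:
    print(f"true mean {true_mean}, UCL {ucl}: {classify_outcome(true_mean, ucl)}")
```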
35 of 45
Null Hypothesis: The site is dirty.
[Figure: Y-axis: probability of deciding that the True Mean is greater than or equal to the Action Level (0.0 to 1.0). X-axis: true mean COPC concentration. Lower Bound of Gray Region: 75. Action Level: 100. The Gray Region lies between them. True State of Site: site is clean / site is dirty. Alternative Action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, Deviation.]
When the True Mean is well above the Action Level...
... then there should be a high probability that the Sample Mean UCL will also be above the Action Level...
... and it is highly likely that we will correctly decide to clean up a dirty site.
36 of 45
Null Hypothesis: The site is dirty.
[Figure: Y-axis: probability of deciding that the site is dirty (0.0 to 1.0). X-axis: true mean COPC concentration. Lower Bound of Gray Region: 75. Action Level: 100. The Gray Region lies between them. True State of Site: site is clean / site is dirty. Alternative Action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, Deviation.]
If the True Mean is well below the Lower Bound of the Gray Region...
... then there should be a very low probability that the Sample Mean UCL will be above the Action Level...
... and it is highly unlikely that we will incorrectly decide to clean up a clean site.
37 of 45
Null Hypothesis: The site is dirty.
[Figure: Y-axis: probability of deciding that the site is dirty (0.0 to 1.0). X-axis: true mean COPC concentration. Lower Bound of Gray Region: 75. Action Level: 100. The Gray Region lies between them. True State of Site: site is clean / site is dirty. Alternative Action: walk away from site / clean up site. Markers: True Mean, Sample Mean UCL, Deviation.]
When the True Mean is IN the gray region...
... then there is an increased probability that the Sample Mean UCL will be above the Action Level...
... and that we will agree to incorrectly decide to clean up a clean site.
38 of 45
Decision Performance Goal Diagram
Null Hypothesis: The site is dirty.
[Figure: Typical Curve. Y-axis: probability of deciding that the site is dirty (0.0 to 1.0). X-axis: true mean COPC concentration. Lower Bound of Gray Region: 75. Action Level: 100. The Gray Region lies between them. True State of Site: site is clean / site is dirty. Alternative Action: walk away from site / clean up site.]
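The shape of a typical decision performance curve can be approximated by simulation. The sketch below is illustrative only and rests on assumptions not found in the slides: measurements are normally distributed with a standard deviation of 20, each decision uses 5 samples, and the 95% UCL is the one-sided Student-t limit, mean + t * s / sqrt(n), with t = 2.132 for 4 degrees of freedom. The decision is "dirty" whenever the UCL is at or above the Action Level of 100.

```python
import math
import random
import statistics

ACTION_LEVEL = 100
N_SAMPLES = 5
T_95_DF4 = 2.132   # one-sided 95% Student-t value; valid only for N_SAMPLES = 5

def prob_decide_dirty(true_mean, sd=20.0, trials=4000, seed=1):
    """Estimate P(95% UCL >= action level) for a given true mean."""
    rng = random.Random(seed)
    dirty = 0
    for _ in range(trials):
        # Assumption: measurements are normal around the true mean.
        data = [rng.gauss(true_mean, sd) for _ in range(N_SAMPLES)]
        ucl = statistics.mean(data) + T_95_DF4 * statistics.stdev(data) / math.sqrt(N_SAMPLES)
        if ucl >= ACTION_LEVEL:
            dirty += 1
    return dirty / trials

for true_mean in (40, 60, 75, 90, 100, 110, 130):
    print(f"true mean {true_mean:>3}: P(decide dirty) ~ {prob_decide_dirty(true_mean):.2f}")
```

Under these assumptions the probability is near 0 well below the Lower Bound of the Gray Region, near 1 well above the Action Level, and close to 0.95 when the true mean equals the Action Level. Increasing the number of samples (with the matching t value) or reducing the variability steepens the curve.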
39 of 45
Managing Uncertainty is a Balancing Act
[Figure: two comparisons. PRP 1 Focus: Sampling and Analyses Cost ($) versus Unnecessary Disposal and/or Cleanup Cost ($). Regulatory 1 Focus: Sampling and Analyses Cost ($) versus Threat to Public Health and Environment.]
40 of 45
Key Points
We will never know the true condition of the site - time and money prevent this
Therefore we must estimate the true condition through sampling
Estimates based on samples are not factual statements about the site. They are educated guesses
Estimates must be in error - because they use incomplete information
41 of 45
Key Points (cont.)
Errors are not mistakes - just deviations from the truth
Errors (deviations) introduce uncertainty into the decision-making process
Errors and uncertainty can be managed so that you can still get the job done and prove that you did it
42 of 45
Key Points (cont.)
The DQO Process is designed to help you manage uncertainty and:
– Get the job done efficiently
– Prove that you did it defensibly
43 of 45
Primary Benefit of the DQO Process:
Managing uncertainty through
“FAILING TO PLAN…..IS PLANNING TO FAIL”