sample size, variability, and effect size) provides the α level with the lowest possible combination of Type I and II errors for a given study design (Figure 1A) (if we assume that the a priori probabilities of the alternative and null hypotheses are equal). For cases where the costs of Type I and II errors are known to be unequal, α can instead be set to minimize the probability-weighted cost of Type I and Type II errors at the critical effect size (CIα + CIIβ, where CI and CII represent the relative costs of Type I and Type II errors) to provide the α level with the lowest possible combined cost of Type I and II errors for a given study design (Figure 1B). As estimates of relative costs of Type I and Type II errors are subjective, minimizing average costs of Type I and II errors, instead of minimizing average probabilities of Type I and II errors, should only be considered when there is a clear and defensible a priori justification that 1 type of error has higher costs than the other. Optimal α can be calculated for any test for which statistical power can be calculated. R-code has been developed to perform the iterative steps necessary to calculate an optimal α for t tests, linear regression, and ANOVA, and is available from the authors.
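The iterative procedure described above can be sketched in a few lines. This is a minimal Python illustration (not the authors' R code), assuming an independent, 2-sample, 2-tailed t test with equal group sizes; the function names and the grid search over α are my own.

```python
import numpy as np
from scipy import stats

def type2_error(alpha, n, effect_size):
    """Beta for an independent, 2-sample, 2-tailed t test with n per group
    and a critical effect size expressed in pooled-SD units (Cohen's d)."""
    df = 2 * n - 2
    ncp = effect_size * np.sqrt(n / 2)       # noncentrality under H1
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # 2-tailed critical value
    # Beta = P(fail to reject H0 | H1): noncentral-t mass between -t_crit and +t_crit
    return stats.nct.cdf(t_crit, df, ncp) - stats.nct.cdf(-t_crit, df, ncp)

def optimal_alpha(n, effect_size, cost_I=1.0, cost_II=1.0):
    """Alpha minimizing the (cost-weighted) average of Type I and II error rates.
    Equal costs reproduce the (alpha + beta)/2 criterion of Figure 1A."""
    alphas = np.linspace(1e-4, 0.5, 5000)
    betas = type2_error(alphas, n, effect_size)
    weighted = (cost_I * alphas + cost_II * betas) / (cost_I + cost_II)
    best = np.argmin(weighted)
    return alphas[best], weighted[best]
```

With equal costs, the optimum falls where the marginal decrease in β no longer outweighs the marginal increase in α; weighting Type I errors more heavily pushes the optimal α down, and vice versa.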
IMPLICATIONS AND FUTURE RESEARCH

The α–β optimization method provides transparency of Type I and Type II error rates and yields a clear representation of the amount of evidence provided by a study (Mudge et al. 2012). Alpha–beta optimization can also allow sample sizes to be estimated for a desired average probability or cost of error. A cost-based approach has important implications for research questions where the costs of mistakes can be reasonably estimated because it provides an objective approach to decision-making that minimizes the dollars spent on mistakes. Where the costs of errors are more subjective and difficult to estimate, one important consequence of setting an optimal α is that it forces stakeholders to confront disagreements about the magnitude of the critical effect size and the relative costs of Type I and II errors (providing the option to minimize probabilities of error when consensus on relative costs cannot be reached). Implementation of optimal α would demand dramatic changes in the way we approach hypothesis testing, would accelerate progress in an already burgeoning area—identifying critical effect sizes—and would encourage increased attention to the almost unexamined question of relative costs of Type I and Type II errors in pure and applied studies.
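The sample-size use of α–β optimization can be sketched as follows: for each candidate n, find the smallest achievable average error rate at the optimal α, and take the smallest n that meets the desired target. A self-contained Python sketch, again assuming an independent, 2-sample, 2-tailed t test (the function names and grid are my own):

```python
import numpy as np
from scipy import stats

def min_avg_error(n, d):
    """Smallest achievable (alpha + beta)/2 over alpha, for an independent,
    2-sample, 2-tailed t test with n per group and effect size d (pooled-SD units)."""
    alphas = np.linspace(1e-4, 0.5, 2000)
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)                    # noncentrality under H1
    t_crit = stats.t.ppf(1 - alphas / 2, df)
    betas = stats.nct.cdf(t_crit, df, ncp) - stats.nct.cdf(-t_crit, df, ncp)
    return float(((alphas + betas) / 2).min())

def sample_size_for(target, d, n_max=1000):
    """Smallest per-group n whose optimal-alpha average error rate is <= target."""
    for n in range(2, n_max + 1):
        if min_avg_error(n, d) <= target:
            return n
    return None                                  # target unreachable within n_max
```

The same search works for a cost-weighted target by swapping the averaging rule, since the achievable error (or cost) shrinks monotonically as n grows.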
Acknowledgment—We thank Kelly Munkittrick, Remy Rochette, Thijs Bosker, Peter Chapman, and attendees of the 2010 Aquatic Toxicology Workshop (Oct 3–6; Toronto, Ontario) for discussions and encouragement that aided in the development of this work.
REFERENCES

Dayton PK. 1998. Reversal of the burden of proof in fisheries management. Science 279:821–822.
Field SA, Tyre AJ, Jonzen N, Rhodes JR, Possingham HP. 2004. Minimizing the cost
of environmental management decisions by optimizing statistical thresholds.
Ecol Lett 7:669–675.
Mapstone BD. 1995. Scalable decision rules for environmental impact studies:
effect size, Type 1, and Type 2 errors. Ecol Appl 5:401–410.
Mudge JF, Baker LF, Edge CB, Houlahan JE. 2012. Setting an optimal a that
minimizes errors in null hypothesis significance tests. PLoS One 7:e32734.
Munkittrick KR, Arens CJ, Lowell RB, Kaminski GP. 2009. A review of potential
methods of determining critical effect size for designing environmental
monitoring programs. Environ Toxicol Chem 28:1361–1371.
Newman MC. 2008. What exactly are you inferring? A closer look at hypothesis
testing. Environ Toxicol Chem 27:1013–1019.
SHOULD WE FORGET NOECs?
Francisco Sanchez-Bayo*
University of Technology Sydney, Lidcombe, Australia
DOI: 10.1002/ieam.1312
In recent months there has been a great deal of criticism about the no-observed effect concentration (NOEC) and related concepts (e.g., lowest-observed effect concentration, LOEC) in the two SETAC journals, Environmental Toxicology and Chemistry (ET&C) and Integrated Environmental Assessment and Management (IEAM), to the point that a ban has been proposed on their use in the scientific literature (Landis and Chapman 2011). Although not everyone agrees on imposing a ban (Fox 2012), it is my understanding that unless such drastic measures are implemented, most authors will continue to estimate and use those outdated concepts.
564 Integr Environ Assess Manag 8, 2012—PM Chapman, Editor

[Figure 1. Influence of α on (A) the average of the probabilities of Type I and II error, (α+β)/2, and (B) the probability-weighted costs of Type I and II errors, CIα + CIIβ, when Type I errors are 4 times as costly as Type II errors (dotted line) and when Type I errors are 1/4 as costly as Type II errors (dashed line). Data shown are for an independent, 2-sample, 2-tailed t test with n1 = n2 = 20 and a critical effect size equal to the pooled standard deviation. Drop lines indicate optimal α levels under each circumstance.]

The debate on this topic is overdue, as it is now 2 decades since the statistical flaws of such concepts were pointed out (Skalski 1981). As a scientific community we still seem incapable of changing our way of doing things. And if we cannot change our entrenched bad habits, can we expect regulators to do otherwise?
Some critics have blamed a lack of understanding among most ecotoxicologists about the limitations of NOECs and LOECs and their statistical flaws as the main reason for not having changed direction (Warne and van Dam 2008). We have reached this situation because most textbooks on toxicology teach how to estimate these measures of toxicity without any criticism. Correcting this problem may require: i) teaching undergraduate students the pitfalls of these procedures, while teaching them new alternative approaches and methods for assessing dose-response relationships (Olmstead and LeBlanc 2005; Fox 2010; Ashauer et al. 2011), and ii) re-training staff at universities, toxicity testing laboratories, chemical laboratories, government departments and agencies involved in ecotoxicology and risk assessment. Some of us would be happy to provide such teaching and training.
However, training alone will not solve the problem of continued use of NOECs and LOECs in scientific publications. As Jager (2012) indicates, there are already large numbers of NOEC values estimated for thousands of chemical compounds in the literature and in databases (e.g., ECOTOX). Should we ignore this information built upon so many years of research? Maybe not. Despite all their statistical flaws, many NOEC values are similar to the no-effect concentration (NEC) values estimated by other techniques. For instance, Fox (2010) reports that 3 of 4 'no effect estimates' for waters from a mine in northern Australia obtained by traditional NOEC methods were comparable to calculated NEC values. The implication is clear: existing NOEC and LOEC data could be sufficiently reliable in many cases to be used in risk assessments and, therefore, should be allowed in regulatory assessments whenever more reliable data (e.g., NEC, EC10) are lacking or unavailable.
Perhaps we should read carefully what Landis and Chapman (2011) said about their proposed ban. They called on the editors of the SETAC journals to ban "statistical hypothesis tests for the reporting of exposure-response from their journals." The ban should be for publishing new NOEC or LOEC values that have been derived in accordance with the traditional methods. It shouldn't apply, as I understand it, when authors use such data (taken from databases, for instance) to justify some point in their research. For example, in a recent work aimed at evaluating suitable endpoints for assessing the impacts of 2 insecticides at the community level (Sanchez-Bayo and Goka 2012), we compared the protective levels estimated by our methods with those derived from species sensitivity distributions (SSDs) of the insecticides. We collected both LC50 and NOEC data from the ECOTOX database to generate the respective SSDs using BurrliOz (http://www.cmis.csiro.au/envir/burrlioz/) and compared the protective values derived from these distributions with those obtained by our methods. We found that NOECs were under-protective by a factor of 10 for one of the 2 insecticides, whereas for the other one there was no significant difference in the protective levels estimated by either method.
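The SSD workflow just described—fit a distribution to per-species toxicity values, then read off a protective concentration such as the HC5—can be sketched briefly. BurrliOz fits Burr-family distributions; the log-normal fit below is a simplified stand-in, and the NOEC values are invented purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical per-species NOEC values (mg/L) -- invented for illustration only.
noecs = np.array([0.8, 1.5, 2.3, 4.0, 6.5, 9.1, 12.0, 20.0])

# Fit a log-normal SSD: a normal distribution on log-concentration.
mu, sigma = stats.norm.fit(np.log(noecs))

# HC5: the concentration expected to protect 95% of species,
# i.e., the 5th percentile of the fitted SSD.
hc5 = float(np.exp(stats.norm.ppf(0.05, loc=mu, scale=sigma)))
```

Building one SSD from NOECs and another from LC50s and comparing their HC5 values is the kind of comparison the paragraph above reports; the distribution family and input data are what BurrliOz handles more carefully.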
Unfortunately, regulation has been and still is the driver that encourages most ecotoxicologists to use NOECs. While recently reviewing a set of ecotoxicity testing protocols for a regulatory agency, it struck me that the authors kept estimating NOEC values for all the surrogate species they used. It seems they did so because they wanted to compare the NOEC values estimated for the species they tested with NOEC values available for other species in other countries. Were they unaware of the various warnings concerning the use of this metric? Or rather, did they regard the use of NOECs as plausible for the regulatory purpose they were intended for, in spite of other statistical or 'academic' considerations? As Jager (2012) has indicated, the Organization for Economic Co-operation and Development, the European Chemicals Agency, and the International Organisation for Standardization all recommend the phasing out of NOECs in their regulatory advisories. So, what can be done to ensure that the very people involved in producing regulatory protocols abide by the new rules? I do not have an answer to this crucial question, but I am sure readers will come up with some practical solutions.
Coming back to my initial query, what can be done to extirpate the bad habits among us? As an advocate of using alternative measures to assess toxicity and toxic relationships (e.g., time-to-event toxicity protocols; Ashauer et al. 2011), I give my support to the ban proposed by Landis and Chapman (2011) in their own terms, i.e., as quoted above. In fact, I would like the ban to be extended to other journals that publish environmental toxicology research, like Ecotoxicology, Ecotoxicology and Environmental Safety, Archives of Environmental Contamination and Toxicology, etc. Can I suggest that our Society approach the editors of such journals to call for this ban? Having said this, I would warn the editors to be aware of the difficulties in stopping the publication of NOEC and LOEC data that are already in the public domain. Science progresses step by step, using former concepts and data as stepping stones for building new concepts and better data. Discarding all that has been done in the past would do no good to ecotoxicology.
REFERENCES

Ashauer R, Agatz A, Albert C, Ducrot V, Galic N, Hendriks J, Jager T, Kretschmann A, O'Connor I, Rubach MN, and others. 2011. Toxicokinetic-toxicodynamic modeling of quantal and graded sublethal endpoints: A brief discussion of concepts. Environ Toxicol Chem 30:2519–2524.
Fox DR. 2010. A Bayesian approach for determining the no effect concentration
and hazardous concentration in ecotoxicology. Ecotoxicol Environ Saf 73:
123–131.
Fox DR. 2012. Response to Landis and Chapman (2011). Integr Environ Assess Manag 8:4.
Jager T. 2012. Bad habits die hard: The NOEC’s persistence reflects poorly on
ecotoxicology. Environ Toxicol Chem 31:228–229.
Landis WG, Chapman PM. 2011. Well past time to stop using NOELs and LOELs.
Integr Environ Assess Manag 7(4):vi–viii.
Olmstead AW, LeBlanc GA. 2005. Toxicity assessment of environmentally relevant
pollutant mixtures using a heuristic model. Integr Environ Assess Manag 1:114–122.
Sanchez-Bayo F, Goka K. 2012. Evaluation of suitable endpoints for assessing
the impacts of toxicants at the community level. Ecotoxicology 21:667–
680.
Skalski JR. 1981. Statistical inconsistencies in the use of no-observed-effect-levels in
toxicity testing. In: Branson DR, Dickson KL, (eds) Aquatic Toxicology and
Hazard Evaluation. Philadelphia (PA): American Society for Testing and
Materials, pp 337–387.
Warne MSJ, van Dam R. 2008. NOEC and LOEC data should no longer be
generated or used. Australas J Ecotoxicol 14:1–5.