    sample size, variability, and effect size) provides the α level with the lowest possible combination of Type I and II errors for a given study design (Figure 1A) (if we assume that the a priori probabilities of the alternative and null hypotheses are equal). For cases where the costs of Type I and II errors are known to be unequal, α can instead be set to minimize the probability-weighted cost of Type I and Type II errors at the critical effect size (C_I α + C_II β, where C_I and C_II represent the relative costs of Type I and Type II errors) to provide the α level with the lowest possible combined cost of Type I and II errors for a given study design (Figure 1B). As estimates of relative costs of Type I and Type II errors are subjective, minimizing average costs of Type I and II errors, instead of minimizing average probabilities of Type I and II errors, should only be considered when there is a clear and defensible a priori justification that 1 type of error has higher costs than the other. Optimal α can be calculated for any test for which statistical power can be calculated. R code has been developed to perform the iterative steps necessary to calculate an optimal α for t tests, linear regression, and ANOVA, and is available from the authors.
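The authors' R code is not reproduced here. As a rough illustration of the iterative search described above, the following Python sketch scans a grid of α values for an independent, 2-sample, 2-tailed test and returns the α with the lowest cost-weighted average of α and β. Several things are assumptions rather than the authors' method: power is computed with a normal approximation instead of the noncentral t distribution an exact t-test calculation would use, the weighted objective is normalized by C_I + C_II, and the names `type_ii_error` and `optimal_alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def type_ii_error(alpha, n_per_group, effect_size):
    """Approximate beta for a 2-sample, 2-tailed test with equal group sizes.

    Assumption: normal approximation to the t test. `effect_size` is the
    critical effect size in units of the pooled standard deviation.
    """
    shift = effect_size * sqrt(n_per_group / 2.0)  # expected test statistic
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic falls inside the non-rejection region.
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


def optimal_alpha(n_per_group, effect_size, cost_i=1.0, cost_ii=1.0, grid=5000):
    """Grid-search alpha in (0, 0.5] for the lowest cost-weighted error."""
    best_alpha, best_score = None, float("inf")
    for k in range(1, grid + 1):
        alpha = 0.5 * k / grid
        beta = type_ii_error(alpha, n_per_group, effect_size)
        score = (cost_i * alpha + cost_ii * beta) / (cost_i + cost_ii)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


if __name__ == "__main__":
    # Setting of Figure 1: n1 = n2 = 20, effect size of 1 pooled SD.
    scenarios = [("equal costs", 1.0, 1.0),
                 ("Type I 4x as costly", 4.0, 1.0),
                 ("Type I 1/4 as costly", 1.0, 4.0)]
    for label, c1, c2 in scenarios:
        alpha, score = optimal_alpha(20, 1.0, c1, c2)
        print(f"{label}: optimal alpha = {alpha:.3f}, weighted error = {score:.3f}")
```

With equal costs the search minimizes the plain average of α and β. Raising the relative cost of Type I errors pushes the selected α down, and raising the relative cost of Type II errors pushes it up. Because of the normal approximation, the values will differ somewhat from those an exact t-test power calculation would give.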

    IMPLICATIONS AND FUTURE RESEARCH

    The α-β optimization method provides transparency of Type I and Type II error rates and yields a clear representation of the amount of evidence provided by a study (Mudge et al. 2012). Alpha-beta optimization can also allow sample sizes to be estimated for a desired average probability or cost of error. A cost-based approach has important implications for research questions where the costs of mistakes can be reasonably estimated because it provides an objective approach to decision-making that minimizes the dollars spent on mistakes. Where the costs of errors are more subjective and difficult to estimate, one important consequence of setting an optimal α is that it forces stakeholders to confront disagreements about the magnitude of the critical effect size and the relative costs of Type I and II errors (providing the option to minimize probabilities of error when consensus on relative costs cannot be reached). Implementation of optimal α would demand dramatic changes in the way we approach hypothesis testing, would accelerate progress in an already burgeoning area (identifying critical effect sizes), and would encourage increased attention to the almost unexamined question of relative costs of Type I and Type II errors in pure and applied studies.
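The sample-size use mentioned in this section can be sketched in the same spirit. The Python below looks for the smallest per-group sample size at which the minimized average of α and β drops to or below a chosen target. It is an illustration under stated assumptions, not the authors' R code: an independent, 2-sample, 2-tailed test with equal group sizes and equal error costs, power from a normal approximation rather than the noncentral t distribution, and hypothetical function names (`min_average_error`, `required_n`).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def min_average_error(n_per_group, effect_size, grid=1000):
    """Lowest (alpha + beta) / 2 over a grid of alpha values in (0, 0.5].

    Assumption: normal approximation to a 2-sample, 2-tailed t test;
    `effect_size` is in units of the pooled standard deviation.
    """
    shift = effect_size * sqrt(n_per_group / 2.0)
    best = float("inf")
    for k in range(1, grid + 1):
        alpha = 0.5 * k / grid
        z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
        beta = STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)
        best = min(best, (alpha + beta) / 2.0)
    return best


def required_n(target_error, effect_size, max_n=1000):
    """Smallest per-group n whose minimized average error meets the target."""
    for n in range(2, max_n + 1):
        if min_average_error(n, effect_size) <= target_error:
            return n
    raise ValueError("target average error not reached within max_n")


if __name__ == "__main__":
    # Example: critical effect size of 1 pooled SD, target average error 0.05.
    n = required_n(0.05, 1.0)
    print(f"per-group n = {n}, average error = {min_average_error(n, 1.0):.4f}")
```

The linear scan over n relies on the minimized average error shrinking as the sample size grows, so the first n that meets the target is the smallest one.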

    Acknowledgment: We thank Kelly Munkittrick, Remy Rochette, Thijs Bosker, Peter Chapman, and attendees of the 2010 Aquatic Toxicology Workshop (Oct 3-6; Toronto, Ontario) for discussions and encouragement that aided in the development of this work.

    Francisco Sanchez-Bayo*

    University of Technology Sydney, Lidcombe, Australia


    DOI: 10.1002/ieam.1312

    In recent months there has been a great deal of criticism about the no-observed effect concentration (NOEC) and related concepts (e.g., lowest-observed effect concentration, LOEC) in the two SETAC journals, Environmental Toxicology and Chemistry (ET&C) and Integrated Environmental Assessment and Management (IEAM), to the point that a ban on their use in the scientific literature has been proposed (Landis and Chapman 2011). Although not everyone agrees on imposing a ban (Fox 2012), it is my understanding that unless such drastic measures are implemented, most authors will continue to estimate and use those outdated concepts.

    The debate on this topic is overdue, as it is now 2 decades since the statistical flaws of such concepts were pointed

    Figure 1. Influence of α on (A) the average of the probabilities of Type I and II error, and (B) the probability-weighted costs of Type I and II errors when Type I errors are 4 times as costly as Type II errors (dotted line) and when Type I errors are 1/4 as costly as Type II errors (dashed line). Data shown are for an independent, 2-sample, 2-tailed t test with n1 = n2 = 20 and a critical effect size equal to the pooled standard deviation. Drop lines indicate optimal α levels under each circumstance.

    out (Skalski 1981). As a scientific community we still seem incapable of changing our way of doing things. And if we cannot change our entrenched bad habits, can we expect regulators to do otherwise?

    Some critics have blamed a lack of understanding among most ecotoxicologists about the limitations of NOECs and LOECs and their statistical flaws as the main reason for not having changed direction (Warne and van Dam 2008). We have reached this situation because most textbooks on toxicology teach how to estimate these measures of toxicity without any criticism. Correcting this problem may require: i) teaching undergraduate students the pitfalls of these procedures, while teaching them new alternative approaches and methods for assessing dose-response relationships (Olmstead and LeBlanc 2005; Fox 2010; Ashauer et al. 2011), and ii) re-training staff at universities, toxicity testing laboratories, chemical laboratories, government departments and agencies involved in ecotoxicology and risk assessment. Some of us would be happy to provide such teaching and training.

    However, training alone will not solve the problem of continued use of NOECs and LOECs in scientific publications. As Jager (2012) indicates, there are already large numbers of NOEC values estimated for thousands of chemical compounds in the literature and in databases (e.g., ECOTOX). Should we ignore this information built upon so many years of research? Maybe not. Despite all their statistical flaws, many NOEC values are similar to the no-effect concentration (NEC) values estimated by other techniques. For instance, Fox (2010) reports that 3 of 4 no-effect estimates for waters from a mine in northern Australia obtained by traditional NOEC methods were comparable to calculated NEC values. The implication is clear: existing NOEC and LOEC data could be sufficiently reliable in many cases to be used in risk assessments, and, therefore, should be allowed in regulatory assessments whenever more reliable data (e.g., NEC, EC10) are lacking or unavailable.

    Perhaps we should read carefully what Landis and Chapman (2011) said about their proposed ban. They called on the editors of the SETAC journals to ban statistical hypothesis tests for the reporting of exposure-response from their journals. The ban should be for publishing new NOEC or LOEC values that have been derived in accordance with the traditional methods. It shouldn't apply, as I understand it, when authors use such data (taken from databases, for instance) to justify some point in their research. For example, in a recent work aimed at evaluating suitable endpoints for assessing the impacts of 2 insecticides at the community level (Sanchez-Bayo and Goka 2012), we compared the protective levels estimated by our methods with those derived from species sensitivity distributions (SSDs) of the insecticides. We collected both LC50 and NOEC data from the ECOTOX database to generate the respective SSDs using BurrliOz and compared the protective values derived from these distributions with those obtained by our methods. We found that NOECs were under-protective by a factor of 10 for one of the 2 insecticides, whereas for the other one there was no significant difference in the protective levels estimated by either method.

    Unfortunately, regulation has been and still is the driver that encourages most ecotoxicologists to use NOECs. While recently reviewing a set of ecotoxicity testing protocols for a regulatory agency, it struck me that the authors kept estimating NOEC values for all the surrogate species they used. It seems they did so because they wanted to compare the NOEC values estimated for the species they tested with NOEC values available for other species in other countries. Were they unaware of the various warnings concerning the use of this metric? Or rather, did they regard the use of NOECs as plausible for the regulatory purpose they were intended for, in spite of other statistical or academic considerations? As Jager (2012) has indicated, the Organization for Economic Co-operation and Development, the European Chemicals Agency, and the International Organisation for Standardization all recommend the phasing out of NOECs in their regulatory advisories. So, what can be done to ensure that the very people involved in producing regulatory protocols abide by the new rules? I do not have an answer to this crucial question, but I am sure readers will come up with some practical solutions.

    Coming back to my initial query, what can be done to extirpate the bad habits among us? As an advocate of using alternative measures to assess toxicity and toxic relationships (e.g., time-event toxicity protocols; Ashauer et al. 2011), I give my support to the ban proposed by Landis and Chapman (2011) in their own terms, i.e., as quoted above. In fact, I would like the ban to be extended to other journals that publish environmental toxicology research, like Ecotoxicology, Ecotoxicology and Environmental Safety, Archives of Environmental Contamination and Toxicology, etc. Can I suggest that our Society approaches the editors of such journals to call for this ban? Having said this, I would warn the editors to be aware of the difficulties in stopping the publication of NOEC and LOEC data that are already in the public domain. Science progresses step by step, using former concepts and data as stepping stones for building new concepts and better data. Discarding all that has been done in the past would do no good to ecotoxicology.

    Integr Environ Assess Manag 8, 2012; PM Chapman, Editor; p. 565