
Science of the Total Environment 407 (2009) 5199–5205

journal homepage: www.elsevier.com/locate/scitotenv

Review

Weight-of-evidence evaluation in environmental assessment: Review of qualitative and quantitative approaches

Igor Linkov a,⁎, Drew Loney a,b, Susan Cormier c, F. Kyle Satterstrom d, Todd Bridges a

a US Army Engineer Research and Development Center, 3909 Halls Ferry Rd, Vicksburg, MS 39180, United States
b Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139-4307, United States
c US Environmental Protection Agency, National Center for Environmental Assessment, 26 W. Martin Luther King Drive, Cincinnati, OH 45268, United States
d Harvard University School of Engineering and Applied Sciences, 29 Oxford St., Cambridge, MA 02138, United States

⁎ Corresponding author. Tel.: +1 617 233 9869; fax: +1 601 6342263. E-mail address: [email protected] (I. Linkov).

0048-9697/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.scitotenv.2009.05.004

Article info

Article history:
Received 6 November 2008
Received in revised form 27 April 2009
Accepted 4 May 2009
Available online 19 July 2009

Keywords:
Weight of evidence
Environmental risk assessment
Ecological risk assessment
Human health risk assessment
Multi-criteria decision analysis

Abstract

Assessments of human health and ecological risk draw upon multiple types and sources of information, requiring the integration of multiple lines of evidence before conclusions may be reached. Risk assessors often make use of weight-of-evidence (WOE) approaches to perform the integration, whether integrating evidence concerning potential carcinogenicity, toxicity, and exposure from chemicals at a contaminated site, or evaluating processes concerned with habitat loss or modification when managing a natural resource. Historically, assessors have relied upon qualitative WOE approaches, such as professional judgment, or limited quantitative methods, such as direct scoring, to develop conclusions from multiple lines of evidence. Current practice often lacks transparency, resulting in risk estimates that lack quantified uncertainty. This paper reviews recent applications of weight of evidence used in human health and ecological risk assessment. Applications are sorted based on whether the approach relies on qualitative or quantitative methods in order to reveal trends in the use of the term weight of evidence, especially as a means to facilitate structured and transparent development of risk conclusions from multiple lines of evidence.

Published by Elsevier B.V.

Contents

1. Introduction
2. Weight of evidence as a regulatory tool
3. Literature review and classification methodology
3.1. Weight of evidence method classification
3.2. Literature review
4. WOE application statistics and trends
5. Discussion and conclusion
Acknowledgements
References

1. Introduction

The scientific community has long operated without a defined and transparent process for integrating different, and sometimes conflicting, sources of information in reaching conclusions about a specific phenomenon or question (Good, 1991). In fact, different scientific disciplines have adopted different methods for developing, analyzing, and combining information, which presents an additional level of challenge when an assessment involves multiple disciplines (Gough, 2007). Some practices, such as data quality assessment, systematic scientific literature review, and peer review, are common elements of all scientific disciplines. While these common elements make important contributions to the overall decision process, they do not represent a comprehensive and structured approach for integrating information and lines of evidence. The process of synthesizing heterogeneous information and forming conclusions requires exercising judgment (Good, 1991); in view of the complex problems and decisions motivating human health and ecological risk assessments, both supporting and evaluating those judgments through the application of consistent and transparent analytical methods will strengthen risk conclusions and resulting decisions.

Developing individual lines of evidence from available data to address a specific question requires describing the degree to which those lines of evidence support a specific conclusion or alternative conclusions. Weight of evidence (WOE) can be defined as a framework for synthesizing individual lines of evidence, using methods that are either qualitative (examining distinguishing attributes) or quantitative (measuring aspects in terms of magnitude) to develop conclusions regarding questions concerned with the degree of impairment or risk. In general, qualitative methods include presentation of individual lines of evidence without an attempt at integration, or integration through a standardized evaluation of individual lines of evidence based on qualitative considerations. Quantitative methods include integration of multiple lines of evidence using weighting, ranking, or indexing as well as structured decision or statistical models.

The concept of WOE is familiar and therefore potentially easy to communicate to decision makers and the public. Its familiarity may be due to the fact that comparisons of alternative causes, actions, or effects seem to be inherent in human cognition (Wilson and Bar-Anan, 2008; for a historical review, see Good, 1991). To illustrate the point, consider that the Greeks embodied the concept of justice in the form of a goddess often depicted holding a scale. In Western societies, WOE has been directly embraced by judiciaries for more than 300 years (Jackson, 1996), with the ubiquitous standards of evaluating evidence based on qualities such as preponderance, clear and convincing, and beyond a reasonable doubt (Krimsky, 2005). However, judicial WOE is predicated on individuals aggregating information subjectively, rather than using formal analytical procedures.

The theoretical foundation for integrating lines of evidence using statistical theory was advanced by Bernoulli (1738), Peirce (1878), Good (1991) and others. This work focused on the use of probability and other statistical techniques to formulate WOE as a measure of how individual data add to or subtract from evidence of risk. Good champions a Bayesian approach in which new data are used to update prior information. However, implementation of these statistical methods requires assigning probabilities of impairment for all individual lines of evidence, which may be extremely difficult for practical applications with uncertain or variable data interacting through unknown mechanisms (Good, 1991).
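The Bayesian formulation can be illustrated with a minimal sketch. It assumes Good's usual definition of the weight of evidence as the logarithm of a likelihood ratio, and it assumes the lines of evidence are conditionally independent so that their weights can simply be added to the prior log-odds. The line-of-evidence names, the likelihood values, and the prior of 0.5 are hypothetical and are not taken from any study reviewed here.

```python
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Weight of evidence for hypothesis H from evidence E: log likelihood ratio."""
    return math.log(p_e_given_h / p_e_given_not_h)

def update_probability(prior, weights):
    """Add the weights to the prior log-odds and convert back to a probability."""
    log_odds = math.log(prior / (1.0 - prior)) + sum(weights)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical likelihoods: (P(E | impaired), P(E | not impaired))
lines_of_evidence = {
    "sediment chemistry": (0.8, 0.3),
    "toxicity test": (0.7, 0.2),
    "benthic survey": (0.4, 0.5),
}

weights = {
    name: weight_of_evidence(p_h, p_not_h)
    for name, (p_h, p_not_h) in lines_of_evidence.items()
}
posterior = update_probability(0.5, weights.values())

for name, w in weights.items():
    print(f"{name}: weight {w:+.2f}")
print(f"posterior probability of impairment: {posterior:.3f}")
```

A positive weight adds to the evidence of impairment and a negative weight subtracts from it. The practical difficulty noted above is visible in the inputs: every line of evidence needs two defensible conditional probabilities before the update can be run.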

Use of the term WOE in the environmental literature ranges from casual and vague remarks to quantitatively well-defined analytical methods. WOE is often used in a descriptive way without sufficient proof for an argument (Weed, 2005). In other instances, WOE has been used in a qualitative sense to refer to a preponderance of evidence, such as when literature reviews are used in reaching determinations as to whether a chemical is a carcinogen (Goodman et al., 2006). Some questions have motivated use of more quantitative methods, such as Indexing, Causal Criteria, Scoring, or other Quantitative methods (Weed, 2005; Chapman et al., 2002). However, these infrequently applied methods are not often transferable and a clear consensus on methodology remains elusive.

This paper explores the current application of WOE methods, with a focus on human health and ecological risk assessment. The paper begins with a review of applicable regulatory requirements and follows with an examination of literature from 2000 to the present for trends in methods and application. It then develops a taxonomy of specific WOE approaches practiced in the field and presents a literature review summarizing the current state of WOE applications in the field of risk assessment.

2. Weight of evidence as a regulatory tool

The US Environmental Protection Agency first introduced WOE as a component of health risk assessment in 1986 to measure chemical carcinogenicity and mutagenicity (US EPA, 1986a,b,c). The framework includes criteria for evaluating animal studies and epidemiological data for risk assessment. Additionally, the framework directs assessors to look at multiple criteria but does not include requirements for quantifying or weighing information streams. The framework is further refined in the 2005 “Guidelines for Carcinogen Risk Assessment” (US EPA, 2005a). In the updated framework, the US EPA lays out WOE criteria used for a chemical's qualitative placement within the five-category system with classifications ranging from “Not Likely to be Carcinogenic to Humans” to “Carcinogenic to Humans.”

In addition to establishing chemical carcinogenicity and mutagenicity, US EPA uses WOE for evaluating the toxic mode of action (i.e., the mechanism through which chemicals cause toxic effects) (US EPA, 2005a, 2007a). The WOE mode-of-action framework used by US EPA draws upon the causal considerations articulated by Hill (1965). As adopted for US EPA usage, a mode-of-action determination must be supported by a qualitative WOE assessment of the strength and specificity of association, dose–response concordance, temporal relationship, and biological plausibility and coherence, amongst other requirements. Recent US EPA applications of the WOE framework include risk assessments for 1,1,1-trichloroethane, 2,2,4-trimethylpentane, and phosgene (US EPA, 2005b, 2006, 2007c).

The US EPA also incorporates WOE within ecological risk assessments (US EPA, 1998). The ecological risk assessment framework requires the selection of assessment endpoints (potentially affected receptors and their attributes) and measurement endpoints (metrics used to assess the potential impact of contaminants on the receptors). A WOE evaluation treats each assessment and measurement endpoint as an individual line of evidence. Risk assessors are required to evaluate each line of evidence individually and form a conclusion about risks using Logic and Best Professional Judgment (BPJ). For example, a WOE evaluation might combine risk characterizations of single chemicals, ambient toxicity, biological field surveys, physiological biomarkers, or other sources of data (Suter, 2007; Chapman, 2007a,b). The US EPA instructs users to look critically at each line of evidence and consider uncertainty and variability when individual lines point to different conclusions. A similar approach is recommended by the US EPA for other ecological risk assessments at different scales and stressors, including watershed management (US EPA, 2008), metals risk assessment (US EPA, 2007b), and risk assessment for suspended and bedded sediments (Cormier et al., 2008).

Parallel development of WOE has occurred in other US agencies to address problem-specific agency needs (Huguenin et al., 1996; Reinharz and Burlington, 1996; Reinharz and Michel, 1996; French, 1996; California Office of Environmental Health Hazard Assessment, 2008; US NRC, 2003; US DOE, 2005, 2007; US FWS, 2006, 2009; US DOT, 2006; USDA, 2008). In general, government and industrial risk assessments closely parallel those developed by US EPA. Beyond the US, the EU has independently adopted its own standards covering both human health and environmental protection (Bardos, 2003; SCHER, 2008a,b,c) that integrate WOE in a method similar to the US EPA.

3. Literature review and classification methodology

The goal of this study is to characterize current WOE practices within risk assessment and related areas. Thus, we have conducted a literature review of the current state of WOE applications in human health and ecological risk assessment, and we have developed a taxonomy describing WOE approaches practiced in the field.

3.1. Weight of evidence method classification

WOE is a broad term, and its practitioners employ numerous analytical techniques for combining lines of evidence to reach conclusions. Though no classification system is able to fully characterize WOE methods, a system inclusive of the majority of the risk assessment literature was fashioned from a combination of Weed's (2005) and Chapman et al.'s (2002) classification approaches (Table 1). The proposed classification includes Listing Evidence, Best Professional Judgment, Causal Criteria, Logic, Scoring, Indexing, and Quantification. Definitions of Causal Criteria and Listing Evidence (or “metaphor” in Weed, 2005) and Indices and Scoring are taken directly from Weed (2005) and Chapman et al. (2002) with slight alterations. Narrative Reviews and BPJ were combined into the BPJ category. The definition for the Logic category was expanded from Chapman et al. (2002) to include qualitative methodologies that follow a predefined process. The Quantification category incorporates methods involving formal use of decision analysis tools or framing the problems as statistical hypothesis testing.

Fig. 1. Weight of evidence classification system.

Even though all WOE methods may include both qualitative and quantitative considerations, the methods are ordered by increasing quantification. Listing Evidence and BPJ are considered primarily qualitative methods. Logic and Causal Criteria are often based on qualitative decision criteria, although the logic itself is a form of mathematics. The Indexing and Scoring approaches use rating, statistical, and arithmetic manipulations, and the final product may be numerical. Generally, Quantitative WOE methods statistically derive risk probabilities from several lines of evidence and use formal decision-analytical tools. When more than one method is depicted at the same level (e.g., Logic/Causal Criteria and Indexing/Scoring), this reflects that several methods have comparable quantitative rigor within the qualitative/quantitative continuum (Fig. 1).

Listing Evidence is the simplest application of WOE; it does not attempt to integrate lines of evidence together or into a larger pool of knowledge. Rather, lines of evidence are simply presented, although the assessor will at times make claims that the WOE points to specific conclusions. Examples include King and Richardson (2003) and Oller and Erexson (2007).

All other methods include a form of integration. BPJ differs from Listing Evidence by attempting to integrate lines of evidence to form a conclusion, although usually by invoking a professional opinion that is case specific. Examples of BPJ include Kavlock and Cummings (2005) and Staples et al. (2004).

Table 1
Weight of evidence methods.

| Method | Method description |
| --- | --- |
| Listing Evidence | Presentation of individual lines of evidence without attempt at integration |
| Best Professional Judgment | Qualitative integration of multiple lines of evidence |
| Causal Criteria | A criteria-based methodology for determining cause and effect relationships |
| Logic | Standardized evaluation of individual lines of evidence based on qualitative logic models |
| Scoring | Quantitative integration of multiple lines of evidence using simple weighting or ranking |
| Indexing | Integration of lines of evidence into a single measure based on empirical models |
| Quantification | Integrated assessment using formal decision analysis and statistical methods |

Methods for Causal Criteria (Lowell et al., 2000; Moraes et al., 2003) and Logic (Weeks and Comber, 2005; Chapman and Hollert, 2006) provide a consistent structure for analysis, thus improving transferability of the method. Methods for Causal Criteria contain a structure for evaluating cause and effect relationships. Hill's (1965) criteria often serve as a basis, though some authors use a similarly designed process with their own criteria. Assessments using Causal Criteria generally step through the outlined criteria, providing evidence that criteria are met and thus establishing a cause and effect relationship. Logic-based processes use previously outlined methods to integrate lines of evidence, such as US EPA carcinogenicity or ecological risk assessment guidelines (US EPA, 1998, 2005a). As a comparison of Causal Criteria and Logic-based methods, consider the differences when evaluating a toxic contamination. Utilizing Logic methods, one enters the lines of evidence that either refute, discount, or corroborate one or more possible causes following a standardized framework to judge contamination. For WOE by Causal Criteria, the process may be standardized, but there is no absolute list of evidence that must be developed. So, if evidence shows that the toxicant is not present, a Logic-based system would judge no risk from the contaminant, whereas with Causal Criteria other evidence, such as episodic exposures, bioavailability, or specific symptoms, may still be considered and lead to a judgment that there is risk from the contaminant. Causal Criteria and Logic methods make the integration more transparent, but the information may be qualitative and the practice may be biased by experience. Furthermore, both Causal Criteria and Logic rely on BPJ to synthesize lines of evidence.

Scoring (e.g., Coo and Aronson, 2004; McDonald et al., 2007) and Indexing (Hertzberg and Teuschler, 2002; Semenzin et al., 2008) are considered similar with respect to quantitative rigor. Scoring is the simplest WOE method that assigns weights to various lines of evidence. Various Scoring methods exist, with most determining weights using BPJ based on qualities such as consistency, specificity, or strength of the association. The weights assigned to the individual lines of evidence are often combined to develop a numerical WOE score. Indexing assigns weights to lines of evidence to integrate the lines into a single value that determines the outcome of the analysis. Weights can be specified both in numerical values given to the lines and the proportion of the final index value that each line comprises. A key consideration, however, is that neither Scoring nor Indexing quantifies judgments using formal decision analysis or probabilistic techniques. As a result, the transparency and reproducibility of these methods, as well as their ability to handle nonlinearity and correlation across criteria, serve as the delineating factors between Scoring/Indexing and Quantitative methods.
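The Scoring approach described above can be sketched as a weighted average. This is only an illustration of the general idea, not the scheme used in any of the cited studies: the three lines of evidence, their weights, and their ratings on a 0 to 1 scale are hypothetical values standing in for numbers an assessor would assign by professional judgment.

```python
def woe_score(lines):
    """Weighted average of line-of-evidence ratings; lines maps name -> (weight, rating)."""
    total_weight = sum(weight for weight, _ in lines.values())
    return sum(weight * rating for weight, rating in lines.values()) / total_weight

def contributions(lines):
    """Share of the summed weighted ratings contributed by each line of evidence."""
    weighted = {name: weight * rating for name, (weight, rating) in lines.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}

# Hypothetical weights and ratings (0 = no indication of risk, 1 = strong indication)
lines = {
    "sediment chemistry": (3, 0.9),
    "toxicity": (2, 0.5),
    "resident organisms": (1, 0.2),
}

score = woe_score(lines)
shares = contributions(lines)
print(f"WOE score: {score:.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the score")
```

Because the score is a fixed linear combination, two lines of evidence that measure nearly the same thing are counted twice, and interactions between lines cannot change the result. That limitation is the nonlinearity and correlation issue that separates Scoring and Indexing from the Quantification category.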

Table 3
Breakdown of literature review results by use and method.

| Method | Human health | Ecological | Generic | Total |
| --- | --- | --- | --- | --- |
| Listing Evidence | 1 | 1 | | 2 |
| Best Professional Judgment | 35 | 8 | | 43 |
| Causal Criteria | 3 | 10 | | 13 |
| Logic | 10 | 15 | 1 | 26 |
| Scoring | 1 | 2 | | 3 |
| Indexing | 4 | 2 | | 6 |
| Quantification | | 5 | | 5 |
| Unclassifiable | | 1 | 15 | 16 |
| Total | 54 | 44 | 16 | 114 |

WOE approaches listed under the Quantification category use formalized mathematical methods both to weight individual lines of evidence and to weigh the body of evidence as a whole. Quantification methods, unlike Scoring/Indexing methods, are able to integrate nonlinearity and correlations into their methodologies. They also allow transparent and reproducible integration of scientific results with individual expert or decision maker judgment and comparison across multiple experts. Multi-criteria decision analysis (MCDA) is an example of a Quantification method that uses likelihoods to synthesize weights of evidence. It can be used to weigh scientific evidence for assessments and can incorporate utility theory, value theory, and similar methods to integrate value-based lines of evidence into a management assessment and decision making (Linkov et al., 2006, 2007; Critto et al., 2007; Semenzin et al., 2007).

3.2. Literature review

We searched the Science Citation Index (SCI) database through the Web of Science, which accesses articles from 5800 scholarly journals. The search phrase “weight of evidence” returned 1217 papers from these publications without restrictions. To limit search results to current risk analysis papers, the term “weight of evidence” was combined with “risk analysis” or “risk assessment,” and the search was limited to full papers published after 2000. A search of the Web of Science returned 323 articles: 212 for WOE and risk assessment, and 111 for WOE and risk analysis. One hundred forty-four of these papers were available through the Massachusetts Institute of Technology library system. Papers were included in our review only if the phrase “weight of evidence” occurred within the abstract or keywords and in at least one other location within the text. Also, we selected important papers published prior to 2000 to complete the search. The Google® search engine was used to locate nongovernmental reports and federal and state reports. Including supplemental articles, 114 papers matched the outlined criteria. This literature search was not intended to be exhaustive, but rather was designed to identify a representative and accessible subset of papers for detailed review.
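The inclusion rule stated above (the phrase must appear in the abstract or keywords and at least once more in the text) could be applied to a set of candidate papers along the following lines. This is a hypothetical sketch: the record fields, the example papers, and the decision to treat the hyphenated form “weight-of-evidence” as a match are all assumptions made for illustration, and the original screening may have been done by hand.

```python
PHRASE = "weight of evidence"

def normalize(text):
    """Lowercase and treat hyphens as spaces (assumption: hyphenated forms count)."""
    return text.lower().replace("-", " ")

def meets_inclusion_rule(abstract, keywords, body):
    """True if the phrase is in the abstract or keywords and also in the body text."""
    in_front_matter = PHRASE in normalize(abstract) or any(
        PHRASE in normalize(keyword) for keyword in keywords
    )
    in_body = PHRASE in normalize(body)
    return in_front_matter and in_body

# Hypothetical paper records
papers = [
    {"title": "A", "abstract": "A weight-of-evidence approach to sediment quality.",
     "keywords": ["sediment"], "body": "We apply weight of evidence to three lines."},
    {"title": "B", "abstract": "Risk assessment of a metal mixture.",
     "keywords": ["Weight of evidence"], "body": "Toxicity tests were performed."},
    {"title": "C", "abstract": "Exposure modelling for a river basin.",
     "keywords": ["exposure"], "body": "The weight of evidence suggests low risk."},
]

selected = [
    p["title"] for p in papers
    if meets_inclusion_rule(p["abstract"], p["keywords"], p["body"])
]
print(selected)
```

Only paper A passes: B mentions the phrase in its keywords but never in the text, and C uses it once in the text without flagging it in the abstract or keywords.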

Articles were classified by application area (human health, ecological, and generic study) and by the manner of use of WOE within the human health and ecological risk assessment process (Table 2) (US EPA, 1986a,b,c, 1998). A full listing of reviewed articles, along with the WOE classification assigned to each, is included at the end of this paper.

4. WOE application statistics and trends

Table 2
Weight of evidence applications.

| Area | Application | Description |
| --- | --- | --- |
| Human health | Method development | Develops analysis methodology for human health risk assessment |
| Human health | Toxicity analysis | Determines if a substance has an adverse health impact |
| Human health | Mode of action determination | Seeks the method by which a substance causes harm |
| Human health | Benchmark development | Recommends allowable exposure levels for various substances |
| Human health | Other | Uses WOE analysis for human health in a way that does not fit the above categories |
| Ecological | Method development | Develops analysis methodology for ecological risk assessment |
| Ecological | Hazard identification | Part of problem formulation. Provides evidence that an agent is potentially harmful to a susceptible entity |
| Ecological | Exposure characterization | Estimates exposure of organisms from contact and uptake |
| Ecological | Effects characterization | Estimates the nature and magnitude of effects of chemicals or other agents as a function of exposure |
| Ecological | Risk characterization | Estimates and interprets the risks and uncertainties of adverse effects |
| Ecological | Benchmark development | Identifies levels for removal or remediation or levels protective of the resource |
| Ecological | Other | Uses WOE analysis in a way that does not fit the above categories |
| Generic | Other | Uses WOE analysis in a way which does not fit into any other category |

Search results summarized by method and application areas appear in Table 3. The numbers of human health and ecological uses are comparable, with slightly more human health citations. For the most part, human health methodologies are confined to BPJ with little use of quantitatively rigorous methods, which is likely due to the number of published carcinogenicity and mode-of-action determinations that use BPJ, as recommended by the US EPA guidance. Ecological methodologies show increased use of Causal Criteria and Logic. Riverine and marine risk assessments tended to use Causal Criteria. Sediment quality studies relied heavily on the Logic approach, usually citing the sediment quality “triad” (which evaluates sediment chemistry, toxicity, and effects to resident organisms). Overall, this review indicates that Qualitative WOE approaches tend to be used most often, greatly overshadowing Indexing, Scoring, and other Quantification methods.

Search results are partitioned into specific application areas within risk assessment (Table 4). Among human health assessments, 52% of the citations indicated that WOE was used to evaluate toxicity analyses, 22% indicated that they were developing WOE methods for risk assessment, and the remaining 26% occurred in other applications of human health risk assessment. For all health categories, most of the WOE methods were qualitative. Within ecological applications, Logic (25%) and effects characterization using Causal Criteria (18%) were more common than other methods. Other WOE approaches and application categories had approximately equal usage (2–7%) with the exception of exposure characterization, where we found only one citation using Scoring.
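The percentages quoted above can be reproduced from the counts in Table 4, as the short check below shows. The human health figures are row totals from the table. Matching the ecological counts of 11 and 8 to the Logic and Causal Criteria columns is an assumption, made because those are the values that reproduce the stated 25% and 18%.

```python
def share(count, total):
    """Percentage of the total, rounded to the nearest whole percent."""
    return round(100 * count / total)

# Human health row totals from Table 4
human_health_total = 54
toxicity_analysis = 28
method_development = 12
other_human_health = human_health_total - toxicity_analysis - method_development

# Ecological counts (column assignment is an assumption, see the note above)
ecological_total = 44
eco_logic = 11
eco_effects_causal_criteria = 8

human_health_shares = {
    "toxicity analysis": share(toxicity_analysis, human_health_total),
    "method development": share(method_development, human_health_total),
    "other applications": share(other_human_health, human_health_total),
}
ecological_shares = {
    "Logic": share(eco_logic, ecological_total),
    "effects characterization, Causal Criteria": share(
        eco_effects_causal_criteria, ecological_total
    ),
}
print(human_health_shares)
print(ecological_shares)
```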

This literature review yielded only one citation of WOE using the Listing Evidence approach, whereas metaphorical uses of WOE accounted for almost 50% of the citations reported by Weed (2005). The reason for this discrepancy may be the stricter screening procedures implemented in this study. Weed (2005) analyzed PubMed® for articles published between 1994 and 2002 with “weight of evidence” in the title, as well as articles published between 2003 and 2004 with “weight of evidence” anywhere in the text, resulting in 92 articles included in his paper. In contrast, using Weed's sampling time frame but with the more stringent methodology from this paper requiring repeated use of WOE in the abstract and body of the paper, a search of PubMed® resulted in 74 articles for WOE and “risk assessment” and another 10 for WOE and “risk analysis.” Of these 74 articles, only eight have WOE in the title, abstract, or keyword and again in the text body. This leads one to believe that Weed's search methodology was more inclusive of publications using WOE in a colloquial sense, as also suggested by Weed, rather than research that deliberately incorporated WOE as a method for interpreting findings.

Within papers classifiable by both method and application, there was no dominant ecological risk assessment method. Though Logic methods appeared slightly less than twice as often as Causal Criteria (its next highest counterpart), the difference between the number of citations is small enough to cast doubt on any trend. Additionally, the similar frequencies in other categories further confound any dominance of a Logic approach or any other method. Very few used WOE for risk characterization, the final step in risk assessment.

Within the ecological risk assessment literature, the number of citations for papers describing the development of WOE methods is similar to the number of citations for WOE application, suggesting that scientific consensus for integrative approaches is still under active investigation and debate. In the human health literature, by contrast, development accounted for only 12 of 54 articles, suggesting either that the debate is settled or that researchers are willing to apply WOE while debate continues.

Table 4
Literature review search results.

Column order: Listing Evidence; Best Professional Judgment; Causal Criteria; Logic; Scoring; Indexing; Quantification; Unclassifiable; Total

Human Health
Analytical methods development: 7 1 4 12
Benchmark development: 3 2 5
Mode of action determination: 2 1 3
Other: 6 6
Toxicity analysis: 1 17 2 7 1 28

Ecological
Analytical methods development: 3 2 11 4 20
Hazard identification or effects characterization: 3 8 3 1 2 1 18
Exposure characterization: 1 1
Other: 1 2 1 1 5

Generic
Other: 1 15 16

Total: 2 43 13 26 3 6 5 16 114

I. Linkov et al. / Science of the Total Environment 407 (2009) 5199–5205

Another trend is the lack of quantitative methods within reviewed WOE applications. Quantification, Indexing, and Scoring approaches were identified in 14 publications, while Listing Evidence, BPJ, Logic, and Causal Criteria were used 84 times.
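The shares and counts quoted in this section can be re-derived from the row and column totals of Table 4. The following Python sketch does so. The dictionary and variable names are ours, and only totals are used, since those are the values that can be read unambiguously from the table.

```python
# Row totals (rightmost value of each row) transcribed from Table 4.
human_health = {
    "Analytical methods development": 12,
    "Benchmark development": 5,
    "Mode of action determination": 3,
    "Other": 6,
    "Toxicity analysis": 28,
}
ecological = {
    "Analytical methods development": 20,
    "Hazard identification or effects characterization": 18,
    "Exposure characterization": 1,
    "Other": 5,
}
generic = {"Other": 16}

# Column totals transcribed from the "Total" row of Table 4.
column_totals = {
    "Listing Evidence": 2,
    "Best Professional Judgment": 43,
    "Causal Criteria": 13,
    "Logic": 26,
    "Scoring": 3,
    "Indexing": 6,
    "Quantification": 5,
    "Unclassifiable": 16,
}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


hh_total = sum(human_health.values())
toxicity_share = percent(human_health["Toxicity analysis"], hh_total)
methods_share = percent(human_health["Analytical methods development"], hh_total)
other_share = 100 - toxicity_share - methods_share

qualitative = sum(
    column_totals[m]
    for m in ("Listing Evidence", "Best Professional Judgment", "Causal Criteria", "Logic")
)
quantitative = sum(column_totals[m] for m in ("Scoring", "Indexing", "Quantification"))

rows_total = hh_total + sum(ecological.values()) + sum(generic.values())
grand_total = sum(column_totals.values())

print(hh_total, toxicity_share, methods_share, other_share)
print(qualitative, quantitative, rows_total, grand_total)
```

The row totals and the column totals both sum to the same grand total, which gives a simple consistency check on the transcription.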

5. Discussion and conclusion

Risk management decisions and policies based upon risk assessments result in benefits and costs to human well-being, ecological resources, and economies, affecting both the private and public sectors. Generic appeals for the use of “sound science” imply that a scientific approach to addressing a problem or question inevitably leads to only one certain conclusion or one reasonable option. This is far from reality, given the complexities and uncertainties characteristic of most environmental problems. However, it should be the case that policy decisions are founded upon evidence-based conclusions that draw from good scientific practice.

Dissatisfaction over the “interpretation of science” frequently results from concerns about the lack of sufficient objectivity, certainty, transparency, repeatability, and consistency in the approaches used to integrate lines of evidence in reaching conclusions about environmental risks. Although all risk assessments make use of multiple sources of information in developing evidence, half of the papers reviewed in this study use Listing Evidence or BPJ rather than more formal techniques for data integration. Authors who used analytical methods felt compelled to introduce the concept and framework of WOE prior to beginning their respective analyses. Nevertheless, in application areas where a WOE methodology is well defined and documented, structured data integration appears more common. For example, ecological risk professionals often used the sediment quality triad, a structured WOE process classified as Logic in this paper, but seemed to hesitate to use similar processes with nonsediment applications. Furthermore, WOE seems to be associated with developing individual lines of evidence, but not with integrating these independent lines for the final risk characterization. Nevertheless, there appears to be a desire for more structured and quantitative approaches to WOE (Stahl et al., 2002; Weed, 2005; McDonald et al., 2007).

Our review shows that WOE, as it is practiced in the fields of human and ecological risk assessment, includes several methodologies ranging from qualitative evaluation through formal quantitative decision analytical approaches. Our results showed that BPJ is the most widely used method, but it does not lend itself to transparency or repeatability except in simple cases. While Logic methods were often adopted, they were limited to hazard assessments or risk effects characterization (Chapman, 2007a,b) and did not play a large role in the final risk assessment. Causal Criteria methods were less consistently applied and were primarily applied to hazard assessment and effects characterization rather than risk estimation, and they often depended on the integration of qualitative concepts that are more difficult to transfer to new users. Indexing and Scoring methods, though more quantitative, are generally not generic and have to be adapted by project teams in an ad hoc manner to each application. Quantification methods (statistics and MCDA) are the most quantitative and transparent, since they bring scientific disciplines to bear on information aggregation. Nevertheless, application of statistical methods requires significant amounts of information to develop probabilities, which could become prohibitive for many applications. Moreover, although qualitative analysis of individual lines of evidence or even quantitative analysis using Scoring, Indexing, and Statistical methods can be powerful for informing a decision process, they do not include options for quantitatively integrating decision-maker values and judgment. As a result, stakeholder values, a crucial aspect of many controversial policy decisions, are not explicitly considered.

Multi-criteria decision analysis-based tools may be a meaningful way to incorporate both qualitative and quantitative information (Linkov et al., 2006). With MCDA, scientific lines of evidence can be weighed independently from social, political, logistical, and economic considerations. All of these can later be weighed together in a management assessment and decision-making process. Thus MCDA can preserve rigorous scientific assessments while also considering value-based assessment and expert judgment.

The main advantage of MCDA-based WOE integration is its documented and repeatable method of integrating individual lines of evidence, as well as its ability to evaluate the sensitivity of the conclusions to changes in the specific parameters or logic used to perform the integration. Most MCDA applications rank alternative choices against a set of objectives (e.g., selection of the best environmental management alternatives; see Linkov et al. (2006) for review). USACE, DHS, US EPA, NOAA, and other agencies are exploring MCDA (Linkov et al., 2006).

Finally, quantitative WOE methods, especially MCDA, provide transparency in decision making and an opportunity for consensus building. Assigning a numerical value to each line of evidence allows stakeholders to debate each line's weight and adjust the weights as necessary to reach consensus. Most qualitative WOE methods found in this review lack a process for discussing and recording a particular weight, thus losing the transparency provided to stakeholders. Quantitative methods allow stakeholders to understand the underlying considerations involved in many decision-making steps. This is especially important for transitioning from risk assessment to risk management, which involves the incorporation of social, political, and economic considerations into the formulation of a decision and plan of action.
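To illustrate how explicit weights make the integration repeatable and open to sensitivity analysis, the sketch below applies a simple weighted sum, one of the most basic MCDA aggregation rules, to the three sediment quality triad lines of evidence. It is a minimal example under stated assumptions: the site names, scores, weights, and function names are all hypothetical and are not taken from any study reviewed here, and a weighted sum is only one of several aggregation rules used in MCDA.

```python
# Hypothetical scores (0 = no indication of risk, 1 = strong indication)
# for three lines of evidence at two hypothetical sites.
scores = {
    "site_A": {"chemistry": 0.8, "toxicity": 0.4, "resident_organisms": 0.3},
    "site_B": {"chemistry": 0.5, "toxicity": 0.6, "resident_organisms": 0.6},
}

# Hypothetical stakeholder weights for each line of evidence.
weights = {"chemistry": 0.5, "toxicity": 0.3, "resident_organisms": 0.2}


def normalize(w):
    """Rescale weights so that they sum to 1."""
    total = sum(w.values())
    return {line: value / total for line, value in w.items()}


def weighted_sum(site_scores, w):
    """Aggregate one site's lines of evidence into a single score."""
    w = normalize(w)
    return sum(w[line] * site_scores[line] for line in w)


def rank(all_scores, w):
    """Order sites from highest to lowest aggregated score."""
    return sorted(all_scores, key=lambda s: weighted_sum(all_scores[s], w), reverse=True)


def sensitivity(all_scores, w, line, trial_values):
    """Re-rank the sites while one line's raw weight is varied."""
    results = {}
    for value in trial_values:
        trial = dict(w)
        trial[line] = value
        results[value] = rank(all_scores, trial)
    return results


baseline = rank(scores, weights)
sweep = sensitivity(scores, weights, "chemistry", [0.2, 0.5, 0.8])
print(baseline)
print(sweep)
```

With these invented numbers, lowering the weight on sediment chemistry reverses the ranking of the two sites. Recording the weights makes that dependence visible, so stakeholders can see which weight drives the conclusion and debate it directly.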

As discussed within this paper, we believe there are a number of benefits that support the use of quantitative WOE in environmental risk assessment. One of the intended purposes of this review paper is to encourage a dialogue on the subject of WOE among risk analysts and managers with the expectation that such a dialogue will lead to advances in the development and use of quantitative WOE methods. This dialogue would be facilitated by commitments within and among the government agencies that rely upon human health and ecological risk assessment to use repeatable, defensible, and transparent integrative methodologies. Such a commitment will include expanding the descriptions of how evidence is developed and integrated within environmental risk assessments to reach conclusions and ultimately decisions. Given the importance that evidence development holds within the practice of risk assessment, we expect that additional research and development focused on WOE methods would lead to a more effective use of science in environmental management.

Acknowledgements

We would like to thank Drs. Jongbum Kim, Joshua Gold, Burton Suedel, and Tom Seager for their comments and useful discussions. Funding was provided by the US Army Corps of Engineers' Dredging Operations and Environmental Research (DOER) Program. Permission was granted by the USACE Chief of Engineers to publish this material.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.scitotenv.2009.05.004.

References

Bardos P. A review of the Contaminated Land Rehabilitation Network for Environmental Technologies in Europe (CLARINET). Part 2: Working group findings. Land Contam Reclam 2003;11(1):15–30.

Bernoulli D. Exposition of a new theory on the measurement of risk. Translated by L. Sommer in Econometrica 1954 [1738 original];22:22–36.

California Office of Environmental Health Hazard Assessment. Risk assessment. Accessed August 2008 from http://www.oehha.ca.gov/risk.html.

Chapman PM. Determining when contamination is pollution — weight of evidence determinations for sediments and effluents. Environ Int 2007a;33(4):492–501.

Chapman PM. Traditional ecological knowledge (TEK) and scientific weight of evidence determinations. Mar Pollut Bull 2007b;54(12):1839–40.

Chapman PM, Hollert H. Should the sediment quality triad become a tetrad, a pentad, or possibly even a hexad? J Soils Sediments 2006;6(1):4–8.

Chapman PM, McDonald BG, Lawrence GS. Weight-of-evidence issues and frameworks for sediment quality (and other) assessments. Hum Ecol Risk Assess 2002;8(7):1489–515.

Coo H, Aronson KJ. A systematic review of several potential non-genetic risk factors for multiple sclerosis. Neuroepidemiology 2004;23(1–2):1–12.

Cormier SM, Paul JF, Spehar RL, Shaw-Allen P, Berry WJ, Suter II GW. Using field data and weight of evidence to develop water quality criteria. Integr Environ Assess Manag 2008;4:490–504.

Critto A, Torresan S, Semenzin E, Giove S, Mesman M, Schouten AJ, et al. Development of a site-specific ecological risk assessment for contaminated sites: Part 1. A multi-criteria based system for the selection of ecotoxicological tests and ecological observations. Sci Total Environ 2007;379:16–33.

French D. Specifications for use of NRDAM/CME version 2.4 to generate compensation formulas guidance document for natural resource damage assessment under the Oil Pollution Act of 1990. National Oceanic and Atmospheric Administration Damage Assessment and Restoration Program; 1996. http://www.darrp.noaa.gov/library/pdf/cfd.pdf.

Good IJ. Weight of evidence and the Bayesian likelihood ratio. In: Aitken CGG, Stoney D, editors. The Use of Statistics in Forensic Science. Boca Raton: CRC Press; 1991. p. 85–106.

Goodman JE, McConnell EE, Sipes IG, Witorsch RJ, Slayton TM, Yu CJ, et al. An updated weight of the evidence evaluation of reproductive and developmental effects of low doses of Bisphenol A. Crit Rev Toxicol 2006;36(5):387–457.

Gough D. Weight of evidence: A framework for the appraisal of the quality and relevance of evidence. In: Furlong J, Oancea A, editors. Applied and Practice-Based Research, 22(2). Spec Ed Resh Papers Edu; 2007. p. 213–28.

Hertzberg RC, Teuschler LK. Evaluating quantitative formulas for dose–response assessment of chemical mixtures. Environ Health Perspect 2002;110(Suppl 6):965–70.

Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;58:295–300.

Huguenin MT, Haury DH, Weiss JC, Helton D, Manen C, Reinharz E, et al. Injury assessment guidance document for natural resource damage assessment under the Oil Pollution Act of 1990. National Oceanic and Atmospheric Administration Damage Assessment and Restoration Program; 1996. http://www.elaw.org/system/files/us.nrda.injuryassessment.pdf.

Jackson J. Analysing the new evidence scholarship: towards a new conception of the law of evidence. Oxford J Legal Studies 1996;16(2):309–28.

Kavlock R, Cummings A. Mode of action: reduction of testosterone availability—molinate-induced inhibition of spermatogenesis. Crit Rev Toxicol 2005;35(8–9):685–90.

King RS, Richardson CJ. Integrating bioassessment and ecological risk assessment: an approach to developing numerical water-quality criteria. Environ Manage 2003;31(6):795–809.

Krimsky S. The weight of scientific evidence in policy and law. Am J Public Health 2005;95(Suppl 1):S129–136.

Linkov I, Satterstrom FK, Kiker G, Batchelor C, Bridges T, Ferguson E. From comparative risk assessment to multi-criteria decision analysis and adaptive management: recent developments and applications. Environ Int 2006;32(8):1072–93.

Linkov I, Satterstrom FK, Steevens J, Ferguson E, Pleus RC. Multi-criteria decision analysis and environmental risk assessment for nanomaterials. J Nanopart Res 2007;9(4):543–54.

Lowell RB, Culp JM, Dube MG. A weight-of-evidence approach for Northern River risk assessment: integrating the effects of multiple stressors. Environ Toxicol Chem 2000;19(4):1182–90.

McDonald BG, deBruyn AM, Wernick BG, Patterson L, Pellerin N, Chapman PM. Design and application of a transparent and scalable weight-of-evidence framework: an example from Wabamun Lake, Alberta, Canada. Integr Environ Assess Manag 2007;3(4):476–83.

Moraes R, Gerhard P, Andersson L, Sturve J, Rauch S, Molander S. Establishing causality between exposure to metals and effects on fish. Hum Ecol Risk Assess 2003;9(1):149–69.

Oller A, Erexson G. Lack of micronuclei formation in bone marrow of rats after repeated oral exposure to nickel sulfate hexahydrate. Mutat Res 2007;626(1–2):102–10.

Pierce CS. Pop Sci Mon 1878;12:707–9.

Reinharz E, Burlington LB. Restoration planning guidance document for natural resource damage assessment under the Oil Pollution Act of 1990. National Oceanic and Atmospheric Administration Damage Assessment and Restoration Program; 1996. http://www.darrp.noaa.gov/library/pdf/rpd.pdf.

Reinharz E, Michel J. Preassessment phase guidance document for the natural resource damage assessment under the Oil Pollution Act of 1990. National Oceanic and Atmospheric Administration Damage Assessment and Restoration Program; 1996. http://www.darrp.noaa.gov/library/pdf/PPD-TP.PDF.

SCHER (Scientific Committee on Health and Environmental Risks). Risk assessment report on 1,3,4,6,7,8-hexahydro-4,6,6,7,8,8-hexamethylcyclopenta-γ-2-benzopyran (HHCB) (CAS No.: 1222-05-5); 2008a.

SCHER (Scientific Committee on Health and Environmental Risks). Risk assessment report on 4-tert-butylbenzoic acid (PTBBA) (CAS No.: 98-73-7); 2008b.

SCHER (Scientific Committee on Health and Environmental Risks). Risk assessment report on 6-acetyl-1,1,2,4,4,7-hexamethyltetraline (AHTN) (CAS No.: 1506-02-1); 2008c.

Semenzin E, Critto A, Carlon C, Rutgers M, Marcomini A. Development of a site-specific ecological risk assessment for contaminated sites: Part II. A multi-criteria based system for the selection of bioavailability assessment tools. Sci Total Environ 2007;379(1):34–45.

Semenzin E, Critto A, Rutgers M, Marcomini A. Integration of bioavailability, ecology and ecotoxicology by three lines of evidence into ecological risk indexes for contaminated soil assessment. Sci Total Environ 2008;389(1):71–86.

Stahl C, Cimorelli A, Chow A. A new approach to environmental decision analysis: multi-criteria integrated resource assessment (MIRA). Bull Sci Technol Soc 2002;22(6):443–59.

Staples C, Mihaich E, Carbone J, Woodburn K, Klecka G. A weight of evidence analysis of the chronic ecotoxicity of nonylphenol ethoxylates, nonylphenol ether carboxylates, and nonylphenol. Hum Ecol Risk Assess 2004;10(6):999–1017.

Suter II GW. Ecological Risk Assessment. 2nd ed. Boca Raton, FL: CRC Press; 2007.

USDA (United States Department of Agriculture). Water quality information center; 2008. Accessed August 20, 2008, from http://www.nal.usda.gov/wqic/risk.shtml.

US DOE (United States Department of Energy). Weldon spring site LTS&M plan No. Doc. No. S0079000; 2005. 12 pp.

US DOE (United States Department of Energy). Final risk assessment report for the FutureGen project environmental impact statement No. Contract No. DE-AT26-06NT42921; 2007. 398 pp.

US DOT (United States Department of Transportation). Risk assessment and allocation for highway construction management No. FFHWA-PL-06-032; 2006. 72 pp.

US EPA (United States Environmental Protection Agency). Guidelines for carcinogen risk assessment, EPA/630/R-00/004; 1986a. 38 pp.

US EPA (United States Environmental Protection Agency). Guidelines for mutagenicity risk assessment, EPA/630/R-98/003; 1986b. 23 pp.

US EPA (United States Environmental Protection Agency). Guidelines for the health risk assessment of chemical mixtures, EPA/630/R-98/002; 1986c. 29 pp.

US EPA (United States Environmental Protection Agency). Guidelines for ecological risk assessment, EPA/630/R-95/002F; 1998. 188 pp.

US EPA (United States Environmental Protection Agency). Guidelines for carcinogen risk assessment, EPA/630/P-03/001F; 2005a. 166 pp.

US EPA (United States Environmental Protection Agency). Toxicological review of phosgene No. CAS No. 75-44-5, EPA/635/R-06/001; 2005b. 102 pp.

US EPA (United States Environmental Protection Agency). Integrated Risk Information System (IRIS). IRIS toxicological review and summary documents for 2,2,4-trimethylpentane (External Review Draft). Washington, DC: Office of Research and Development, National Center for Environmental Assessment; 2006. Accessed April 2009, from http://cfpub2.epa.gov/ncea/cfm/recordisplay.cfm?deid=161905 (November).

US EPA (United States Environmental Protection Agency). Framework for determining a mutagenic mode of action for carcinogenicity—Draft, EPA 120/R-07/002-A; 2007a. 50 pp.

US EPA (United States Environmental Protection Agency). Framework for metals risk assessment, EPA 120/R-07/001; 2007b. 172 pp.

US EPA (United States Environmental Protection Agency). Toxicological review of 1,1,1-trichloroethane, CAS No. 71-55-6, EPA/635/R-03/013; 2007c. 237 pp.

US EPA (United States Environmental Protection Agency). Application of watershed ecological risk assessment methods to watershed management. Washington, DC: U.S. Environmental Protection Agency, Office of Research and Development, National Center for Environmental Assessment; 2008. No. EPA/600/R-06/037F.

US FWS (United States Fish & Wildlife Service). Cerulean warbler risk assessment & conservation planning workshop; 2006. Accessed May 23, 2008, from http://www.fws.gov/midwest/eco_serv/soc/birds/cerw/documents/cerw_ra06.pdf (June).

US FWS (United States Fish & Wildlife Service). Managing invasive plants: Concepts, principles, and practices; 2009. Accessed February 23, 2009, from http://www.fws.gov/invasives/staffTrainingModule/index.html (last updated February 18).

US NRC (United States Nuclear Regulatory Commission). Handbook of parameter estimation for probabilistic risk assessment. Washington, DC: U.S. Nuclear Regulatory Commission, Office of Nuclear Regulatory Research; 2003. No. NUREG/CR-6823. SAND2003-3348P.

Weed D. Weight of evidence: a review of concept and methods. Risk Anal 2005;25(6):1545–57.

Weeks JM, Comber SDW. Ecological risk assessment of contaminated soil. Mineral Mag 2005;69(5):601–13.

Wilson TD, Bar-Anan Y. The unseen mind. Science 2008;321:1046–7.