evaluation of the causal framework used for setting …...evaluation of the causal framework used...
TRANSCRIPT
2013
http://informahealthcare.com/txcISSN: 1040-8444 (print), 1547-6898 (electronic)
Crit Rev Toxicol, 2013; 43(10): 829–849! 2013 Informa Healthcare USA, Inc. DOI: 10.3109/10408444.2013.837864
REVIEW ARTICLE
Evaluation of the causal framework used for setting National AmbientAir Quality Standards
Julie E. Goodman, Robyn L. Prueitt, Sonja N. Sax, Lisa A. Bailey, and Lorenz R. Rhomberg
Gradient, Cambridge, MA, USA
Abstract
A scientifically sound assessment of the potential hazards associated with a substance requiresa systematic, objective and transparent evaluation of the weight of evidence (WoE) for causalityof health effects. We critically evaluated the current WoE framework for causal determinationused in the United States Environmental Protection Agency’s (EPA’s) assessments of thescientific data on air pollutants for the National Ambient Air Quality Standards (NAAQS) reviewprocess, including its methods for literature searches; study selection, evaluation andintegration; and causal judgments. The causal framework used in recent NAAQS evaluationshas many valuable features, but it could be more explicit in some cases, and some features aremissing that should be included in every WoE evaluation. Because of this, it has not alwaysbeen applied consistently in evaluations of causality, leading to conclusions that are not alwayssupported by the overall WoE, as we demonstrate using EPA’s ozone Integrated ScienceAssessment as a case study. We propose additions to the NAAQS causal framework based onbest practices gleaned from a previously conducted survey of available WoE frameworks. Arevision of the NAAQS causal framework so that it more closely aligns with these best practicesand the full and consistent application of the framework will improve future assessments of thepotential health effects of criteria air pollutants by making the assessments more thorough,transparent, and scientifically sound.
Keywords
Air quality, causal framework, criteriapollutants, risk assessment, systematicreview, weight of evidence
History
Received 1 July 2013Revised 20 August 2013Accepted 20 August 2013Published online 1 October 2013
Table of Contents
Abstract ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 829Introduction ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 829The NAAQS causal framework ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 830WoE best practices ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 834
Phase 1: Define the causal question and develop criteria for studyselection ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 835
Phase 2: Develop and apply criteria for review of individualstudies ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 839
Phase 3: Integrate and evaluate evidence ... ... ... ... ... ... ... ... ... 840Phase 4: Draw conclusions based on inferences ... ... ... ... ... ... 841
Case study: Ozone ISA ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 842Phase 1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 843Phase 2 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 843
Evaluation of study quality ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...843Consideration of study limitations ... ... ... ... ... ... ... ... ... ... ... ...843
Measurement error ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 843Confounders ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 844Model specification ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 844
Phase 3 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 845Consideration of modified Bradford Hill criteria ... ... ... ... ...845Weighing alternative views ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...846
Phase 4 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 846Conclusions ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 847
Acknowledgements ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 847Declaration of interest ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 848References ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 848
Introduction
A scientifically sound assessment of the potential risks
associated with a substance requires a systematic, objective
and transparent evaluation of the weight of evidence (WoE)
for causality of health effects. Many different methods for
conducting WoE evaluations have been used by authoritative
bodies and reported in the scientific literature (e.g. Adami
et al., 2011; Borgert et al., 2011; ECETOC, 2009; IARC,
2006; Rhomberg et al., 2010, 2011a, 2013; Swaen & van
Amelsvoort, 2009). The United States Environmental
Protection Agency (EPA) uses WoE approaches to assess
the risks of health effects from exposures to chemicals in the
environment, and these assessments are used to support a
wide range of regulatory activities. For example, as part of the
process for setting health-based National Ambient Air Quality
Standards (NAAQS), EPA periodically evaluates the WoE for
potential health risks of exposure to each of six substances
that are classified as criteria air pollutants [particulate matter
(PM), sulfur dioxide, nitrogen dioxide, carbon monoxide,
ozone and lead] because of their national-scale occurrence
and public health significance. These evaluations are pre-
sented as an Integrated Science Assessment (ISA), in which
Address for correspondence: Julie E. Goodman, Gradient, 20 UniversityRoad, Cambridge, MA 02138, USA. Tel: 617-395-5000. E-mail:[email protected]
EPA assesses the body of relevant scientific literature to draw
conclusions regarding the causal relationships between
exposure to each criteria pollutant and human health and
welfare effects (US EPA, 2013a). Information and conclu-
sions from the ISA are used by EPA in the Risk and Exposure
Assessment (REA) to evaluate human health or environmen-
tal risks associated with recent air quality conditions or air
quality that is estimated to just meet current or alternative
standards (US EPA, 2013b). In its Policy Assessment, EPA
bridges the gap between the scientific assessments in the ISA
and REA and judgments required by the EPA Administrator
regarding the retention or revision of any of the four elements
of each NAAQS (i.e. indicator, averaging time, level, and
statistical form) (US EPA, 2013b). In the ISA, decisions
regarding causality near the level of the standard carry
substantial weight in the Administrator’s judgments
(Bachmann, 2007; McClellan, 2012).
The current WoE framework for causal determination in
the NAAQS process (referred to herein as the ‘‘NAAQS
causal framework’’) was first introduced in the ISAs for
oxides of nitrogen and sulfur (US EPA, 2008a,b) and updated
in the ISA for PM (US EPA, 2009); it has been refined since
in the ISAs for carbon monoxide (US EPA, 2010a), ozone (US
EPA, 2013a), and lead (US EPA, 2012).
In addition to assessments of air pollutants in the NAAQS
process, EPA evaluates potential human health effects from a
broad variety of chemicals in its Integrated Risk and
Information System (IRIS) Program. An example of this is
EPA’s 2010 draft IRIS assessment of formaldehyde (US EPA,
2010b), for which the evidence for potential adverse effects
was compiled and evaluated in a similar manner as was the
evidence in the criteria pollutant ISAs. In 2011, a panel
assembled by the National Research Council (NRC) of the
National Academy of Sciences (NAS) published its Review of
the Environmental Protection Agency’s Draft IRIS Assessment
of Formaldehyde (NRC, 2011). In this report, the NAS panel
identified several issues with the WoE methodology used by
EPA, including some that were identified by other NRC
committees that had reviewed a number of IRIS assessments
over the previous decade. The panel called on EPA to
undertake a program to develop a transparent and defensible
methodology for WoE evaluations to improve the agency’s
IRIS assessments.
The NAS panel proposed a ‘‘roadmap’’ for reform and
improvement of the risk assessment process, noting that there
are many proven models for evidence-based reviews that
might provide guidance to EPA, including systematic WoE
approaches for hazard identification. The panel recommended
that EPA’s WoE evaluation include clear narratives that
provide the rationale for conclusions; furthermore, the panel
noted that conclusions suggesting an association in the face of
mixed results (e.g. positive, weak, and null studies for a given
endpoint) should be accompanied by a thorough discussion of
the strengths and weaknesses of the underlying studies. The
NAS panel stated that EPA’s current process for NAAQS
reviews, including the development of an ISA, is a useful
example of revising an established process for conducting
evaluations in a relatively short period of time. It noted,
however, that the current NAAQS process is not an example
of a specific approach for evidence review that should be
adopted for revision of EPA’s IRIS process. In fact, several
issues that the NAS panel noted with the formaldehyde
assessment could also be considered applicable to revision
and improvement of the NAAQS WoE evaluation
methodology.
After describing the NAAQS causal framework below, we
discuss best practices for WoE evaluations gleaned from other
available WoE frameworks and note approaches that should
be stated explicitly in the NAAQS causal framework to
decrease the likelihood of biased evaluations. We then present
a case study in which we cite several examples from the ozone
ISA that demonstrate how EPA’s use of the current NAAQS
causal framework leads to an inconsistent evaluation that
often leads to conclusions of causality that are not fully
supported by the evidence.
The NAAQS causal framework
The current NAAQS causal framework used by EPA to
conduct evaluations of potential health effects associated with
criteria air pollutants is described in the Preamble section of
the recent lead and ozone ISAs (US EPA, 2012, 2013a). The
Preamble states that the NAAQS causal framework draws
from language in sources across the federal government and
scientific community, particularly the Institute of Medicine
(IOM) report Improving the Presumptive Disability Decision-
making Process for Veterans (IOM, 2008). Although the IOM
(2008) report provides guidance for making decisions
regarding health effects in veterans, and not from exposures
to criteria air pollutants, the Preamble notes that it is
applicable because it is a comprehensive report on the
evaluation of causality.
The NAAQS causal framework is used to determine the
WoE in support of causation and characterize the strength of
any resulting causal classification after a consideration of the
evidence. In addition, EPA uses the causality classification to
decide if a quantitative risk assessment will be conducted on a
particular health effect. Table 1 lists the general steps of the
NAAQS causal framework.
In the Preamble of the ozone ISA, EPA first discusses how
studies are identified for inclusion in the evaluation and then
describes the framework itself. For brevity, and because study
identification and selection are necessary steps prior to an
evaluation of causality, hereafter we include these initial steps
of study identification and selection methods described in the
Preamble, as well as study evaluation and integration, when
referring to the NAAQS causal framework. As part of the
identification step, for each periodic NAAQS review, EPA
maintains an ongoing literature search process to identify
relevant studies published since the last NAAQS review of a
given pollutant. The Preamble states that search strategies are
designed for each pollutant and iteratively modified to
optimize the identification of pertinent studies. Studies are
also identified for inclusion through reviewing the tables of
contents of journals in which relevant papers may be
published, identification of relevant studies by expert scien-
tists, reviewing citations from previous assessments, and
identification of papers by advisory committees and the
public. Studies are considered for inclusion if they have
undergone scientific peer review or if they include analyses
830 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
conducted by EPA using publicly available data. EPA states
that all relevant controlled human exposure, epidemiology
and toxicology studies are considered.
The Preamble in the ozone ISA states that the selection of
studies for inclusion is based on the general scientific quality
of the study, with consideration of the extent to which the
study is informative and relevant to the NAAQS process. The
assessment of study quality and relevance considers factors
such as whether the study subjects, populations or animal
models are adequately selected and sufficiently well-defined
to allow for meaningful comparisons between groups; the
statistical analyses are appropriate and properly performed;
the exposure data are of adequate quality and representative of
ambient conditions; the health effect measurements are
meaningful, valid and reliable; and the analytical methods
provide adequate sensitivity and precision to support conclu-
sions. Other considerations discussed by EPA include whether
potential confounders and effect modifiers are assessed in
epidemiology studies; control exposures to filtered air are
included in controlled human exposure studies; and there is
sufficient statistical power to assess findings in both
controlled human exposure and animal toxicology studies.
EPA states that of most relevance for study inclusion is
whether it provides useful qualitative or quantitative infor-
mation regarding exposure–response relationships at expos-
ures relevant to ambient conditions that can inform decisions
on whether to retain or change the NAAQS.
After the discussion of study selection, the Preamble
describes the NAAQS causal framework. This includes a
discussion of some of the general strengths and limitations
that should be considered when evaluating controlled human
exposure, epidemiology, and animal toxicology studies. The
framework uses a modified Bradford Hill approach (Hill,
1965) to aid in judgments regarding causal determinations,
including consideration of criteria (as they are commonly
referred to) such as consistency of observed associations,
coherence, biological plausibility, biological gradient,
strength of the observed association, experimental evidence,
temporality, specificity and analogy (Table 2). The original
Bradford Hill criteria were developed mainly for the inter-
pretation of epidemiology results and are not meant to be
specific rules to follow. EPA modified them in the NAAQS
causal framework for use with the broad array of study types
that are considered in the ISAs (e.g. epidemiology, toxicology,
mechanistic), to be more consistent with EPA’s Guidelines for
Carcinogen Risk Assessment (US EPA, 2005). Regarding how
Table 1. EPA NAAQS process for causal determination.
Step
Literature searching Specialized searches on specific topicsReview tables of contents of relevant journalsIdentification by expert scientistsReview citations in previous assessmentsIdentification by the public and advisory committeesIterative modification to optimize identification of pertinent publications
Selection of studies for inclusion Peer-reviewedEPA analyses of publicly-available dataBased on study qualityConsiders policy-relevance of studiesConsiders relevance to ambient exposures
Consideration of general limitations of each study type Controlled human exposureEpidemiologyToxicology
Use of modified Bradford Hill aspects to aid in judging causality ConsistencyCoherenceBiological plausibilityBiological gradientStrength of associationExperimental evidenceTemporalitySpecificityAnalogy
Evaluate evidence for major health outcome categories
Integrate evidence from across disciplines (controlled exposure, epi-demiology, toxicology) and across health endpoints
Incorporate peer and public comment and advice
Weigh alternative views on controversial issues
Characterize strength of evidence into causal conclusions Causal relationshipLikely to be a causal relationshipSuggestive of a causal relationshipInadequate to infer a causal relationshipNot likely to be a causal relationship
Assess adversity of effects
Source: US EPA (2013a).
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 831
the modified criteria, or ‘‘aspects’’ as EPA refers to them, are
to be used in the framework, EPA states:
Although these aspects provide a framework for assessing
the evidence, they do not lend themselves to being
considered in terms of simple formulas or fixed rules of
evidence leading to conclusions about causality (Hill,
1965). For example, one cannot simply count the number
of studies reporting statistically significant results or
statistically nonsignificant results and reach credible
conclusions about the relative weight of the evidence and
the likelihood of causality. Rather, these aspects provide a
framework for systematic appraisal of the body of
evidence, informed by peer and public comment and
advice, which includes weighing alternative views on
controversial issues. (US EPA, 2013a)
The NAAQS causal framework suggests that evidence be
evaluated for major health outcome categories (e.g. respira-
tory effects) and conclusions be drawn based on the
integration of evidence from across health disciplines and
across the spectrum of related health endpoints (e.g. health
effects ranging from inflammation of the lungs to respiratory
disease mortality). In drawing judgments, there is a focus on
evidence of effects in the range of relevant human exposures
to a given pollutant, with a general consideration of studies
with doses or exposures in the range of no more than one or
two orders of magnitude above current or ambient conditions.
In discussing causal determinations, EPA characterizes the
evidence on which the judgment is based, including the
strength of evidence for individual endpoints within each
major health outcome category. Based on these characteriza-
tions, conclusions regarding the WoE for causation are
classified using a five-level hierarchy: ‘‘Causal relationship’’;
‘‘Likely to be a causal relationship’’; ‘‘Suggestive of a causal
relationship’’; ‘‘Inadequate to confer a causal relationship’’
and ‘‘Not likely to be a causal relationship’’ (Table 1). The
framework also discusses several factors for consideration in
delineating between adverse and non-adverse health effects
resulting from exposure to air pollution.
Table 2. EPA aspects to aid in judging causality.
Aspect Description
Consistency of the observed association An inference of causality is strengthened when a pattern of elevated risks is observedacross several independent studies. The reproducibility of findings constitutes one ofthe strongest arguments for causality. If there are discordant results amonginvestigations, possible reasons such as differences in exposure, confounding factors,and the power of the study are considered.
Coherence An inference of causality from one line of evidence (e.g. epidemiologic, controlledhuman exposure [clinical], or animal studies) may be strengthened by other lines ofevidence that support a cause-and-effect interpretation of the association. Thecoherence of evidence from various fields greatly adds to the strength of an inferenceof causality. In addition, there may be coherence in demonstrating effects acrossmultiple study designs or related health endpoints within one scientific line ofevidence.
Biological plausibility An inference of causality tends to be strengthened by consistency with data fromexperimental studies or other sources demonstrating plausible biological mechan-isms. A proposed mechanistic linking between an effect and exposure to the agent isan important source of support for causality, especially when data establishing theexistence and functioning of those mechanistic links are available.
Biological gradient (exposure–response relationship) A well-characterized exposure–response relationship (e.g. increasing effects associatedwith greater exposure) strongly suggests cause and effect, especially when suchrelationships are also observed for duration of exposure (e.g. increasing effectsobserved following longer exposure times).
Strength of the observed association The finding of large, precise risks increases confidence that the association is not likelydue to chance, bias or other factors. However, it is noted that a small magnitude in aneffect estimate may represent a substantial effect in a population.
Experimental evidence Strong evidence for causality can be provided through ‘‘natural experiments’’ when achange in exposure is found to result in a change in occurrence or frequency of healthor welfare effects.
Temporal relationship of the observed association Evidence of a temporal sequence between the introduction of an agent, and appearanceof the effect, constitutes another argument in favor of causality.
Specificity of the observed association Evidence linking a specific outcome to an exposure can provide a strong argument forcausation. However, it must be recognized that rarely, if ever, does exposure to apollutant invariably predict the occurrence of an outcome, and that a given outcomemay have multiple causes.
Analogy Structure activity relationships and information on the agent’s structural analogs canprovide insight into whether an association is causal. Similarly, information on modeof action for a chemical, as one of many structural analogs, can inform decisionsregarding likely causality.
Source: US EPA (2013a, Table I).
832 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
As discussed above, the NAAQS causal framework is
largely based on an IOM (2008) WoE framework. The current
IOM framework has four categories of causal determination,
with only the top two designating evidence that establishes a
causal relationship (top category) or a relationship for which
causality is ‘‘at least as likely as not’’ (second category)
(Table 3). In contrast, as shown in Table 4, the NAAQS
causal framework has five categories of causal determination,
with the top three designating evidence that establishes a
causal relationship (i.e. causal, likely causal, or suggestive
of a causal relationship). Further, in contrast to the IOM
approach, the NAAQS causal framework states that only
one positive study is sufficient to establish a suggestive
causal relationship when the results of other studies are
inconsistent; this can greatly increase the frequency of
positive causal conclusions.
Table 4. EPA’s weight of evidence for causal determination.
Causal determination Health effects
Causal relationship Evidence is sufficient to conclude that there is a causal relationship with relevant pollutant exposures (i.e.doses or exposures generally within one to two orders of magnitude of current levels). That is, thepollutant has been shown to result in health effects in studies in which chance, bias, and confoundingcould be ruled out with reasonable confidence. For example: (a) controlled human exposure studiesthat demonstrate consistent effects; or (b) observational studies that cannot be explained by plausiblealternatives or are supported by other lines of evidence (e.g. animal studies or mode of actioninformation). Evidence includes multiple high-quality studies.
Likely to be a causal relationship Evidence is sufficient to conclude that a causal relationship is likely to exist with relevant pollutantexposures, but important uncertainties remain. That is, the pollutant has been shown to result in healtheffects in studies in which chance and bias can be ruled out with reasonable confidence but potentialissues remain. For example: (a) observational studies show an association, but copollutant exposuresare difficult to address and/or other lines of evidence (controlled human exposure, animal, or mode ofaction information) are limited or inconsistent; or (b) animal toxicological evidence from multiplestudies from different laboratories that demonstrate effects, but limited or no human data are available.Evidence generally includes multiple high-quality studies.
Suggestive of a causal relationship Evidence is suggestive of a causal relationship with relevant pollutant exposures, but is limited. Forexample, (a) at least one high-quality epidemiologic study shows an association with a given healthoutcome but the results of other studies are inconsistent; or (b) a well-conducted toxicological study,such as those conducted in the National Toxicology Program (NTP), shows effects in animal species.
Inadequate to infer a causal relationship Evidence is inadequate to determine that a causal relationship exists with relevant pollutant exposures.The available studies are of insufficient quantity, quality, consistency, or statistical power to permit aconclusion regarding the presence or absence of an effect.
Not likely to be a causal relationship Evidence is suggestive of no causal relationship with relevant pollutant exposures. Several adequatestudies, covering the full range of levels of exposure that human beings are known to encounter andconsidering at-risk populations, are mutually consistent in not showing an effect at any level ofexposure.
Source: US EPA (2013a, Table II).
Table 3. IOM recommended categories for the level of evidence for causation.
Causal determination Evidence
Sufficient The evidence is sufficient to conclude that a causal relationship exists. For example: (a) replicated and consistent evidenceof an association from several high-quality epidemiologic studies that cannot be explained by plausible noncausalalternatives (e.g. chance, bias or confounding); or (b) evidence of causation from animal studies and mechanisticknowledge; or (c) compelling evidence from animal studies and strong mechanistic evidence from studies in exposedhumans, consistent with (i.e. not contradicted by) the epidemiologic evidence.
Equipoise and above The evidence is sufficient to conclude that a causal relationship is at least as likely as not, but not sufficient to conclude thata causal relationship exists. For example: (a) evidence of an association from the preponderance of several high-qualityepidemiologic studies that cannot be explained by plausible noncausal alternatives (e.g. chance, bias or confounding) aswell as animal evidence and biological knowledge consistent with a causal relationship; or (b) strong evidence fromanimal studies or mechanistic evidence that is not contradicted by human or other evidence.
Below equipoise The evidence is not sufficient to conclude that a causal relationship is at least as likely as not, or is not sufficient to makea scientifically informed judgment. For example: (a) consistent human evidence of an association that is limited bythe inability to rule out chance, bias, or confounding with confidence, and weak animal or mechanistic evidence; or(b) animal evidence suggestive of a causal relationship, but weak or inconsistent human and mechanistic evidence; or(c) mechanistic evidence suggestive of a causal relationship, but weak or inconsistent animal and human evidence; or(d) the evidence base is very thin.
Against The evidence suggests the lack of a causal relationship. For example: (a) consistent human evidence of no causalassociation from multiple studies covering the full range of exposures encountered by humans; or (b) animal ormechanistic evidence supportive of a lack of a causal relationship.
Source: IOM (2008).
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 833
There are many valuable features in the current NAAQS
causal framework. These include literature searching that is
iterative, consideration of modified Bradford Hill criteria to
aid in judgments regarding causal determinations, weighing
alternative views on controversial issues, integration of
judgments of the evidence across disciplines and endpoints,
focusing on evidence of health effects in the range of human-
relevant exposures, consideration of the adversity of health
outcomes and a hierarchy for categorization of the strength of
the evidence.
Some aspects of the NAAQS causal framework could be
more specific, however, and the framework does not explicitly
include several features that, we would argue, are critical for
a thorough and internally consistent WoE evaluation. For
example, there is no explicit guidance for literature search
strategies or determining study exclusion criteria. There is
also no explicit guidance for evaluating the strengths and
weaknesses of studies and no information on how to use study
quality criteria to assign a quality weight to individual studies.
There are no clear statements regarding how the Bradford Hill
criteria should be applied (e.g. there is no indication of what
constitutes a strong association) or how the causality judg-
ments should consider all criteria jointly. There is also no
guidance on how to evaluate alternative hypotheses that are
supported by the data (e.g. a given substance is not a causal
factor for a health effect and apparent associations are
explicable by other factors) or what criteria should be used
to determine which hypothesis is most supported by the data.
As indicated in the NAS roadmap, EPA needs to state
the justification for its judgments explicitly – including the
reasoning behind judgments – to aid transparency.
Because of these and other limitations discussed below,
EPA’s NAAQS causal framework has not always been applied
consistently in evaluations of causality for individual sub-
stances – much less across substances – and adequate
transparency regarding the justification of specific causal
judgments is not achieved in the criteria air pollutant ISAs.
We provide specific examples of this with a case study of the
ozone ISA (US EPA, 2013a) below.
WoE best practices
We recently conducted a survey of existing WoE frameworks,
including the NAAQS causal framework, to evaluate WoE
best practices (Rhomberg et al., 2013). From these frame-
works, we identified four successive phases to the overall
WoE process that are consistent with the critical steps for the
development of a scientifically sound assessment of the WoE
for causation, as identified by the NAS panel that reviewed
EPA’s formaldehyde assessment. These critical steps should
be considered by EPA in developing an updated NAAQS
framework. Below, we describe these phases; Table 5 presents
their general features.
Phase 1. Define the causal question and develop criteria
for study selection. Frame the purpose of the evaluation and
the causal questions to be evaluated, and define the criteria for
selecting studies relevant to the evaluation to ensure trans-
parency. Prior to this phase, there is a separate role for
problem formulation, which could be considered as ‘‘Phase
0’’ and entails the evaluation of what is needed to address the
decisions to be made, along with an assessment of the
capacity of an analysis of available data to provide answers
addressing those needs.
Phase 2. Develop and apply criteria for review of
individual studies. Conduct and present a systematic and
consistent review of available studies relevant to the causal
Table 5. Weight-of-evidence best practices.
Phase Steps
Phase 1Define the causal question and develop criteria for study selection Define causal question or hypothesis
Define criteria for study inclusion and exclusionPlan literature searchDesign literature search strategiesScreen and select studies, with justification for exclusionRecord search strategies, results, and decisions
Phase 2Develop and apply criteria for review of individual studies Extract study characteristics
Extract dataAssess study qualityCategorize study qualityAssess individual study resultsCategorize study relevance and adequacy
Phase 3Integrate and evaluate evidence Evaluate data within and across realms of evidence
(e.g. toxicology, epidemiology, MoA)Assess MoAAssess adversity of effectsCompare alternative accounts of observationsFormulate WoE conclusionsIdentify data gaps and propose next steps
Phase 4Draw conclusions based on inferences Propose categories for causal relationships
Propose recommendations for risk assessmentPropose recommendations for research or policies
Source: Rhomberg et al. (2013).
834 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
question. Evaluate the rigor and quality of individual study
results using pre-defined criteria applied uniformly across
studies.
Phase 3. Integrate and evaluate evidence. Make sound
and defensible scientific judgments about the existence and
nature of causative processes for the health outcome under
consideration. This is one of the more challenging phases for
any WoE framework; no matter how one lays out procedures
and methods for synthesizing across studies, in the end, the
question is about how studies in one setting (e.g. animal or
in vitro assays) should affect our assessment of potential
causality or risks in another (e.g. the general human
population exposed environmentally).
Phase 4. Draw conclusions based on inferences. Apply the
results of the WoE evaluation from Phase 3 to make
conclusions that can be used to inform regulatory decision-
making. Although this phase is not risk management itself, it
can be influenced by risk management considerations. In a
regulatory setting, decisions about WoE categories (‘‘known’’
causative agent, ‘‘likely’’ causative agent, etc.) or findings
about the science [sufficient evidence for a mode of action
(MoA) or to replace a default assumption for developing a
toxicity value] are influenced in this stage by policy questions
and regulatory consequences for those decisions and, ultim-
ately, by policies and judgments about the sufficiency of
evidence to support those decisions.
The most important aspect of a WoE framework is that it
provides specific and transparent guidance for how to work
through the WoE process. At the same time, the framework
needs to be flexible so that it can be applied in a consistent
manner across different types of datasets for evaluations of
causality. A WoE evaluation is only useful and applicable to
constructive scientific debate if the logic behind it is made
clear, and, with that, it is often necessary to take the reader
through alternative interpretations of the data so that the
various interpretations can be compared logically. This
approach does not eliminate the need for scientific judgment,
and often may not lead to a definitive choice of one
interpretation over the other, but it will clearly lay out the
logic for how one weighs the evidence for and against each
interpretation. Only in this way is it possible to have
constructive scientific debate about potential causality that
is focused on an organized, logical ‘‘weighing’’ of the
evidence.
The NAAQS causal framework provides guidance on
many features of the WoE phases noted above. Some of them
are explicit and some are clearly included in the framework,
but specific guidance is not always provided (e.g. study
exclusion criteria). As a result, evaluations are not always
consistent either among studies for a particular criteria
pollutant or among pollutants, as discussed below in the
case study of ozone.
Table 5 outlines WoE best practices generally and Table 6
outlines features of the NAAQS causal framework, as well as
additional features that we recommend based on best
practices. These additions will enable the framework to
include these WoE best practices and help ensure that they are
applied consistently. Several of the recommended features
may have, in practice, been incorporated into WoE evalu-
ations in the criteria pollutant ISAs; however, if they are not
explicitly stated as guidance in the discussion of the NAAQS
causal framework in the ISA Preamble, or if they are
discussed as occurring in a Phase that does not match
where they should occur according to WoE best practices,
they are still noted in Table 6 as additional, best practice
features. Below, we discuss these features by phase.
Phase 1: Define the causal question and developcriteria for study selection
The causal question and breadth of scope should be explicitly
stated in Phase 1 of a WoE evaluation to provide a clear
purpose for the assessment as well as direction for the
remaining steps. Defining the causal question at the beginning
of the process helps ensure that the correct question is posed,
and it sets the context within which the bearing and utility of
studies can be evaluated. Study selection criteria and the
literature search design should be articulated clearly (Tables 5
and 6). The end product of Phase 1 is a clearly defined causal
question, a documented search strategy with clear inclusion
and exclusion criteria, a list of included and excluded studies
with reasons for inclusion or exclusion, and a record of any
deviations from the original plan.
In the NAAQS causal framework, the definition of the
causal question is inferred from the purpose of the ISA in the
NAAQS process, which is to provide a ‘‘concise review,
synthesis, and evaluation of the most policy-relevant science
to serve as a scientific foundation for the review of the
[NAAQS]’’ (US EPA, 2013a). EPA does not, but in our view
should, explicitly resolve the overarching causal question into
specific issues for problem formulation. For example, EPA
could specify aspects of the analysis regarding what is
relevant to existing human exposures, the basis for what
constitutes sensitive subpopulations, whether exposure to
other agents affects response, and interactions with other
stressors. In other words, by specifying whether the purpose is
a ‘‘hazard in context’’ decision or a more general identifica-
tion of a potential hazard at any exposure level, this would
help to guide decisions on whether or not it is found in the
actual population. Problem formulation should try to identify
the specific sub-judgments that need to be made to support
the overall goal, with the aim of enabling assessment of
whether the available data speak to and enable sound
decisions on them.
With regard to the literature search, the Preamble of the
ozone ISA provides an overview of EPA’s general literature
search strategy and the various ways in which studies are
identified for potential inclusion. Some of the study inclusion
criteria are also presented, and an evaluation of study quality
is included in the study selection process. However, the
Preamble does not provide guidance regarding study exclu-
sion criteria or justification for inclusion or exclusion of
studies. As shown in Table 6, the NAAQS causal framework
should include explicit guidance regarding literature search
strategies for identifying all available evidence, as well as
criteria for both study inclusion and exclusion (e.g. study
type, types of participants, exposures, outcomes). While
relevance to ambient exposures to criteria pollutants is
important to consider at this point, allowances may be made
for scientific information that does not meet inclusion criteria
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 835
Tab
le6
.M
od
ifie
dN
AA
QS
cau
sal
fram
ewo
rk.
Ste
pN
ote
s
Ph
ase
1
Def
ine
cau
sal
qu
esti
on
or
hy
po
thes
isIn
ferr
edfr
om
NA
AQ
Sp
roce
ss(a
sis
pro
ble
mfo
rmu
lati
on
)D
eter
min
eb
rea
dth
of
sco
pe
Exp
lore
revi
ewq
ues
tio
ns
no
tid
enti
fied
ap
rio
riD
evel
oped
duri
ng
the
iter
ativ
epro
cess
of
the
lite
ratu
rese
arch
Def
ine
crit
eria
for
stu
dy
incl
usi
on
an
dex
clu
sio
nM
ayocc
ur
inse
ver
alst
ages
asnew
revie
wques
tions
are
iden
tifi
edbas
edon
lite
ratu
rese
arch
Det
erm
ine
stu
dy
typ
esD
efin
ety
pes
of
par
tici
pan
tsD
efin
eex
posu
res/
risk
fact
ors
/inte
rven
tions
Def
ine
ou
tco
mes
Co
nsi
der
do
se
Pla
nli
tera
ture
searc
h�
Iter
ativ
em
od
ific
atio
ns
too
pti
miz
eid
enti
fica
tio
no
fp
erti
nen
tp
ub
lica
tio
ns
�S
pec
iali
zed
sear
ches
on
spec
ific
topic
sIn
volv
eli
bra
ria
ns
Ch
oo
sed
ata
ba
ses
Iden
tify
arti
cles
by
exp
erts
,th
ep
ub
lic,
and
adv
iso
ryco
mm
itte
esR
evie
wre
fere
nce
list
s,ta
ble
of
con
ten
tso
fre
levan
tjo
urn
als,
cita
tio
ns
inp
rev
iou
sas
sess
men
tsS
earc
hgre
yli
tera
ture
Iden
tify
on
go
ing
/un
pu
bli
shed
stu
die
s
Des
ign
lite
ratu
rese
arc
hst
rate
gie
sC
ho
ose
key
wo
rds
Det
erm
ine
lan
gu
age,
da
te,
an
dd
ocu
men
tfo
rma
tre
qu
irem
ents
Scr
een
an
dse
lect
stu
die
s,w
ith
just
ific
ati
on
for
excl
usi
on
�A
sses
sag
reem
ent
of
stu
dy
sele
ctio
nam
on
gin
ves
tigat
ors
�In
clu
de
pee
r-re
vie
wed
stu
die
san
dE
PA
eval
uat
ion
so
fp
ub
lica
lly
avai
lab
led
ata
�C
on
sid
erre
levan
ceto
amb
ien
tex
po
sure
s�
Mak
eal
low
ance
sfo
rp
iece
so
fin
form
atio
nfr
om
area
so
uts
ide
of
incl
usi
on
/excl
usi
on
crit
eria
that
may
imp
act
eval
uat
ion
�D
on
ot
excl
ud
est
ud
ies
ba
sed
on
qu
ali
tyo
rif
stu
dy
rep
ort
sn
ull
or
neg
ati
vefi
nd
ing
sa
tth
isp
oin
t�
Do
no
tco
nsi
der
po
licy
-rel
eva
nce
of
stu
die
s
Rec
ord
sea
rch
stra
teg
ies,
resu
lts,
an
dd
ecis
ion
sS
ort
refe
ren
ces
inb
ibli
og
rap
hic
soft
war
e
Ph
ase
2
Ex
tra
ctst
ud
ych
ara
cter
isti
csT
hes
ein
clu
de
pa
rtic
ipa
nts
an
dse
ttin
g,
met
ho
ds,
inte
rven
tio
ns,
ou
tco
me
mea
sure
s,re
sult
s,a
nd
qu
ali
tya
spec
tsD
evel
op
qu
ali
tyco
ntr
ol
for
da
taen
try
Det
erm
ine
dis
agre
emen
tre
solu
tio
n
Ex
tra
ctd
ata
�S
ho
uld
be
ba
sed
on
rele
van
ceto
cau
sal
qu
esti
on
�M
ake
sure
all
rele
van
td
ata
are
extr
act
ed(i
.e.
itis
no
tsu
ffic
ien
tto
extr
act
asu
bse
to
fre
leva
nt
da
ta)
Det
erm
ine
da
taca
teg
ori
esO
rga
niz
eev
iden
cein
toco
nsi
sten
tse
tso
fca
teg
ori
esC
ate
go
rize
hea
lth
effe
ct(s
)b
ysp
ecif
icit
ya
nd
late
ncy
836 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
Ass
ess
stu
dy
qu
ali
tyD
iscu
ssgen
eral
lim
itat
ion
sto
con
sid
erfo
rea
chst
ud
yty
pe
(e.g
.co
ntr
oll
edh
um
anex
po
sure
,ep
idem
iolo
gy,
tox
ico
log
y)
Stu
dy
des
ign
Incl
ud
ea
sses
smen
to
fco
un
terf
act
ua
lity
of
stu
dy
des
ign
Str
eng
ths
an
dw
eakn
esse
sE
xp
osu
re/i
nte
rven
tio
nas
sess
men
tA
ssig
nh
igh
erw
eig
ht
tost
ud
ies
wit
hd
irec
tex
po
sure
mea
sure
men
t,ab
sen
ceo
fco
-po
llu
tan
tex
po
sure
s,o
ru
seo
fm
ult
i-p
oll
uta
nt
mo
del
sC
on
fou
nd
ers
Det
erm
ine
ifp
ote
nti
alco
nfo
un
der
sid
enti
fied
and
ifp
arti
ally
or
full
yad
dre
ssed
by
auth
ors
Sys
tem
ati
cer
ror
(bia
s)P
reci
sio
no
fm
easu
rem
ents
Rep
lica
bil
ity
of
ob
serv
ati
on
sR
elia
bil
ity
of
da
taS
tati
stic
alm
eth
od
sA
ssig
nh
igh
erw
eig
ht
tost
ud
ies
that
use
dst
atis
tica
lm
eth
od
sto
det
ect
and
con
tro
lfo
rco
nfo
un
din
gan
dm
ult
iple
com
par
iso
ns,
wh
enn
eces
sary
Ou
tlie
rsSel
ecti
veoutc
om
ere
port
ing
Iden
tify
fra
ud
ule
nt
stu
die
sE
lim
inat
efr
om
anal
ysi
s.
Ca
teg
ori
zest
ud
yq
ua
lity
�Q
ua
lita
tive
lyo
rb
yra
nki
ng
,gra
din
g,
or
sco
rin
g�
Do
no
tel
imin
ate
an
yst
ud
ies
ba
sed
on
wea
knes
s(u
nle
ssfa
tall
yfl
aw
eda
nd
no
tin
terp
reta
ble
)
Ass
ess
ind
ivid
ua
lst
ud
yre
sult
sS
tren
gth
of
ass
oci
ati
on
Det
erm
ine
ap
rio
rith
ed
efin
itio
no
fa
‘‘st
ron
g’’
ass
oci
ati
on
(e.g
.ri
skes
tim
ate�
3).
Use
the
sam
ecr
iter
iafo
rju
dg
ing
som
eth
ing
as
ari
skfa
cto
rvs
.a
pro
tect
ive
fact
or
(e.g
.a
risk
esti
ma
teo
f0
.99
2sh
ou
ldca
rry
the
sam
ew
eig
ht
as
ari
skes
tim
ate
of
1.0
08
)In
tern
al
con
sist
ency
Res
ult
su
sin
gd
iffe
ren
tm
od
els
sho
uld
be
rela
tive
lyco
nsi
sten
tB
iolo
gic
al
pla
usi
bil
ity
Bo
thd
ata
ind
ica
tin
gb
iolo
gic
al
pla
usi
bil
ity
an
da
lack
of
bio
log
ica
lp
lau
sib
ilit
ysh
ou
ldb
eco
nsi
der
edTem
po
rali
tyIt
isn
ot
on
lyth
at
the
expo
sure
mu
sto
ccu
rb
efo
rea
nef
fect
,b
ut
tha
tit
occ
urs
wit
hin
an
ap
pro
pri
ate
tim
efr
am
eD
ose
-res
po
nse
or
expo
sure
–re
spo
nse
Ifin
crea
sin
gef
fect
sa
reo
bse
rved
wit
hin
crea
sin
gex
po
sure
so
rd
ura
tio
no
fex
po
sure
s,th
isis
evid
ence
for
aca
usa
lre
lati
on
ship
;a
lack
of
this
ass
oci
ati
on
isev
iden
cea
ga
inst
aca
usa
lre
lati
on
ship
Va
ria
tio
n/h
isto
rica
lco
ntr
ol
rate
sD
eter
min
eh
ow
dep
end
ent
resu
lts
are
on
valu
esin
the
con
tro
lgro
up
,p
art
icu
larl
yif
they
dif
fer
fro
mh
isto
rica
lco
ntr
ols
Ou
tco
me
ass
essm
ent
Ra
nd
om
erro
r(c
ha
nce
)
Ca
teg
ori
zest
ud
yre
leva
nce
an
da
deq
ua
cy�
Det
erm
ine
wh
eth
eran
dto
wh
atex
ten
tre
sult
sar
ere
levan
tto
amb
ien
tex
po
sure
s�
Det
erm
ine
app
rop
riat
enes
san
du
sefu
lnes
so
fth
ed
ata
for
haz
ard
and
risk
asse
ssm
ent
Ph
ase
3
Eva
lua
ted
ata
wit
hin
an
da
cro
ssre
alm
so
fev
iden
ce(e
.g.
tox
ico
log
y,ep
idem
iolo
gy,
Mo
A)
�In
tegra
ted
ata
acr
oss
all
lin
eso
fev
iden
ceso
inte
rpre
tati
on
of
on
ew
ill
info
rmin
terp
reta
tio
no
fth
eo
ther
�A
sses
sa
lld
ata
,in
clu
din
gn
ega
tive
,n
ull
,a
nd
po
siti
vere
sult
s�
Ass
ign
less
wei
gh
tto
resu
lts
of
stu
die
so
flo
wer
qu
ali
ty�
Inco
rpo
rate
pee
ran
dp
ub
lic
com
men
tan
dad
vic
eS
tren
gth
of
asso
ciat
ion
Det
erm
ine
ap
rio
rith
ed
efin
itio
no
fa
‘‘st
ron
g’’
ass
oci
ati
on
(e.g
.ri
skes
tim
ate�
3).
Use
the
sam
ecr
iter
iafo
rju
dg
ing
som
eth
ing
as
ari
skfa
cto
rvs
.a
pro
tect
ive
fact
or
(e.g
.a
risk
esti
ma
teo
f0
.99
2sh
ou
ldca
rry
the
sam
ew
eig
ht
as
ari
skes
tim
ate
of
1.0
08
)C
on
sist
ency
of
asso
ciat
ion
sA
sso
cia
tio
ns
sho
uld
be
con
sist
ent
bo
thw
ith
ina
nd
acr
oss
stu
die
s,p
art
icu
larl
yth
ose
wit
hd
iffe
ren
tst
ud
yd
esig
ns
Co
her
ence
Res
ult
sa
cro
ssa
llli
nes
of
evid
ence
sho
uld
be
coh
eren
t;if
no
t,sp
ecie
scl
ose
stto
hu
ma
ns
sho
uld
be
con
sid
ered
toh
ave
mo
rere
leva
nce
toh
um
an
s
(co
nti
nu
ed)
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 837
Tab
le6
.C
on
tin
ued
Ste
pN
ote
s
Bio
log
ical
pla
usi
bil
ity
Bo
thd
ata
ind
ica
tin
gb
iolo
gic
al
pla
usi
bil
ity
an
da
lack
of
bio
log
ica
lp
lau
sib
ilit
ysh
ou
ldb
eco
nsi
der
edB
iolo
gic
alg
rad
ien
t(d
ose
-res
po
nse
)If
incr
easi
ng
effe
cts
are
ob
serv
edw
ith
incr
easi
ng
exp
osu
res
or
du
rati
on
of
expo
sure
s,th
isis
evid
ence
for
aca
usa
lre
lati
on
ship
;a
lack
of
this
ass
oci
ati
on
isev
iden
cea
ga
inst
aca
usa
lre
lati
on
ship
Ex
per
imen
tal
evid
ence
Tem
po
rali
tyIt
isn
ot
on
lyth
at
the
expo
sure
mu
sto
ccu
rb
efo
rea
nef
fect
,b
ut
tha
tit
occ
urs
wit
hin
an
ap
pro
pri
ate
tim
efr
am
eS
pec
ific
ity
An
alo
gy
Co
nfo
un
din
gD
eter
min
eif
con
sist
ent
fin
din
gs
ma
yb
ea
resu
lto
fa
con
sist
ent
con
fou
nd
erB
ias
Ass
ess
da
tare
leva
nt
tom
od
eo
fa
ctio
n(M
oA
)P
ost
ula
teM
oA
theo
ryD
escr
ibe
key
even
tsin
Mo
AE
valu
ate
Mo
Ab
ase
do
nB
rad
ford
Hil
lcr
iter
iaC
on
sid
ero
ther
po
ssib
leM
oA
sC
on
sid
eru
nce
rta
inti
es,
inco
nsi
sten
cies
,a
nd
da
taga
ps
for
Mo
AD
eter
min
ew
het
her
Mo
Aes
tab
lish
edin
an
ima
lsA
sses
sw
het
her
key
even
tsin
anim
alM
oA
pla
usi
ble
inh
um
ans
Ass
ess
do
se-r
esp
on
sefo
rke
yev
ents
inM
oA
Ass
ess
ad
vers
ity
of
effe
cts
�A
dap
tive,
com
pen
sato
ry,
tran
sien
t,an
dre
ver
sib
leef
fect
sar
ele
ssli
kel
yto
be
adver
se�
Pre
curs
ors
toap
ical
effe
cts,
sever
eef
fect
s,an
dir
rever
sib
leef
fect
sar
em
ore
likel
yto
be
adver
se
Co
mp
are
alt
ern
ati
ve
acc
ou
nts
of
ob
serv
ati
on
s,in
clu
din
gco
ntr
ove
rsia
lis
sues
�A
sses
sw
het
her
ad
dit
ion
al
evid
ence
sup
po
rts
or
wea
ken
sth
eca
usa
la
ssu
mp
tio
n�
Pro
po
sen
ewh
ypo
thes
is/a
cco
un
to
fev
iden
ceif
itp
rovi
des
mo
reco
nvi
nci
ng
resu
lt�
Iter
ate
ass
essm
ent
ba
sed
on
new
evid
ence
ifre
sult
sa
rea
mb
igu
ou
so
rif
new
hyp
oth
esis
req
uir
essu
pp
ort
Fo
rmu
late
Wo
Eco
ncl
usi
on
sN
ote
ass
um
pti
on
s,es
pec
iall
yw
hen
they
are
adh
oc
Iden
tify
da
taga
ps
an
dp
rop
ose
nex
tst
eps
Ph
ase
4
Pro
po
seca
teg
ori
esfo
rca
usa
lre
lati
on
ship
s
Pro
po
sere
com
men
da
tio
ns
for
risk
ass
essm
ent
Pro
po
sere
com
men
da
tio
ns
for
rese
arc
ho
rp
oli
cies
Cu
rren
tst
eps
inth
eN
AA
QS
cau
sal
fram
ewo
rkar
ep
rese
nte
din
no
n-i
tali
cfo
nt
wit
had
dit
ion
alb
est
pra
ctic
esp
rese
nte
din
ital
icfo
nt.
So
me
of
the
add
itio
nal
bes
tp
ract
ices
may
hav
e,in
pra
ctic
e,b
een
inco
rpo
rate
din
toW
oE
eval
uat
ion
sin
the
crit
eria
po
llu
tan
tIS
As,
bu
tif
they
are
no
tex
pli
citl
yst
ated
asg
uid
ance
inth
ed
iscu
ssio
no
fth
eN
AA
QS
cau
sal
fram
ewo
rkin
the
ISA
Pre
amb
leo
rar
ed
iscu
ssed
aso
ccu
rrin
gin
aP
has
eth
atd
oes
no
tm
atch
wh
ere
they
sho
uld
occ
ur
acco
rdin
gto
Wo
Eb
est
pra
ctic
es,
they
are
pre
sen
ted
init
alic
fon
tas
add
itio
nal
bes
tp
ract
ices
.
838 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
but may nonetheless affect the assessment through their
bearing on interpretation of those studies that do meet
narrower criteria. Studies should be screened and selected,
with justification provided for inclusion or exclusion. All
strategies and decisions should be recorded, particularly those
that occur after the evaluation is underway. It is critical that
studies should not be excluded at this point in the analysis
based on quality, type (e.g. case studies), or if they report null
or negative findings. These factors should be considered in
subsequent phases (as we discuss further below). In addition,
in a scientific evaluation such as in the ISA, no study with
potential scientific use should be excluded based on policy-
relevance. Detailed instructions for conducting literature
searches can be found in the Cochrane Handbook for
Systematic Reviews of Interventions (Higgins & Green,
2011). Although this handbook is focused on medical
interventions, its guidance for literature searches can be
valuable in application to literature searches of any WoE
evaluation.
Phase 2: Develop and apply criteria for review ofindividual studies
In Phase 2, study characteristics and data are extracted, and
study quality is assessed and categorized. Individual study
results are assessed, and study relevance and adequacy are
determined (Tables 5 and 6).
It is crucial that all relevant data presented, (and not just a
subset of those data) are extracted from each study. This may
be an iterative process, as what is considered relevant for
inclusion may change as more studies are reviewed. In
general, however, the data extracted should be based on the
causal question. If positive data are extracted more often than
null data from individual studies, the data will appear to be
more consistent than they actually are. The amount of data
extracted per study is generally quite large; thus, data should
be organized into consistent categories so that they can be
compared within and across studies. For example, specific
health outcomes should be captured for each time period that
they are evaluated in each study, based on all relevant
statistical models, so that results can be compared for
consistency.
The NAAQS causal framework does not provide a clear
account of its study characteristic and data extraction process
(Table 6). Regarding study quality, as EPA notes, one should
consider general limitations for each study type (e.g.
controlled human exposure, epidemiology, toxicology)
(Table 1). For example, EPA states that it assigns higher
weight to studies with direct (i.e. personal) exposure meas-
urement, an absence of co-pollutant exposures, or use of
multi-pollutant models, and studies that use statistical
methods to detect and control for confounding and multiple
comparisons, when necessary. However, there also should be
a discussion of each of these factors that is independent of the
description of individual studies. EPA should discuss all of
the ways in which exposure can be measured, the strengths
and limitations of each method, the possibility for exposure
measurement error, and which methods carry the most
weight. For example, studies that use central-site monitors
to estimate personal exposure should be assigned less weight
than studies that use more refined methods, such as land-use
regression models, or personal monitoring data in the rare
cases it is available, to estimate exposure. There should also
be a discussion of statistical methods used among all studies
evaluated and which specific methods are more robust and
why (e.g. Have multiple comparisons been addressed? Are
assumptions in Cox proportional hazard models appropri-
ate?). Specific confounders should be addressed (e.g.
co-pollutants, socioeconomic status, age) in terms of how
they are handled in different studies and their likely impact on
results. If confounders are evaluated only at the beginning of a
study and not during follow-up, the potential for residual
confounding should be considered. Other factors that should
be considered in detail include bias, measurement precision,
replicability of observations, data reliability, outliers, select-
ive outcome reporting and fraudulent reporting. What is
crucial is that the way these quality measures are evaluated
is the same across studies. If a particular statistical model
is considered a limitation in one study, it must be determined
whether it should be considered a limitation in other studies
that use it for similar analyses. An evaluation of study quality
should be independent of the results and funding source of
that study. It should be based purely on the methods, in a
consistent manner across studies, and studies with more
robust methods should receive more weight in the overall
analysis.
Study quality can be categorized in a number of ways; it
can be done qualitatively or quantitatively using a ranking,
grading, or scoring method. EPA does not state explicitly how
it characterizes studies within the NAAQS causal framework.
Using a qualitative approach, all studies could be listed in a
table or set of tables, with strengths and limitations in each
column, so one can look across rows and columns to get an
idea of strengths and limitations across studies. When a group
of studies is then evaluated together in Phase 3, one can refer
to this table to determine what weight to give each study
based on its relative strengths and limitations.
There are several quantitative weighting approaches that
have been recommended for evaluating studies based on
quality. For example, Linkov et al. (2011) presented a
quantitative multi-criteria decision analysis framework for
evaluating individual lines of evidence in a WoE evaluation.
Similarly, Lavelle et al. (2012) proposed a step-wise approach
to systematically evaluate human and animal studies, which
includes rating and ranking studies based on data quality
criteria, and Klimisch et al. (1997) proposed a scheme for
evaluating human data into four ranking categories for
inclusion in a WoE evaluation (i.e. reliable without restric-
tion, reliable with restrictions, not reliable and not assign-
able). While all these approaches have merits, any
quantitative scheme can be arbitrary in terms of the weight
given to different facets (e.g. Should exposure measurement
error be given the same weight as an unaccounted for
confounder?). Thus, a qualitative approach should also be
considered.
Regardless of the approach used, studies should not be
eliminated from an analysis simply based on weakness or a
low score in a quantitative ranking; rather, they should be
given less weight in the final analysis. Low quality studies
(e.g. studies with several methodological limitations) need to
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 839
be considered in Phase 3, particularly if these studies are the
basis of an explanation of the observations at hand that
contrasts with another explanation (as discussed further
below). Study quality information should be used to incorp-
orate results into the larger arguments, as in Phase 3, but study
quality should not drive this process. Studies with fatal flaws
(e.g. uninterpretable findings resulting from methodological
errors or falsified data) can be eliminated, however.
After study quality is addressed, the results of individual
studies should be assessed with consideration of strength of
association, internal consistency, biological plausibility, tem-
porality, dose-response, historical control rate, outcome
assessment and random error. Regarding strength of associ-
ation, EPA indicates that larger, precise risks increase
confidence that the association is not due to chance, bias or
other factors. EPA also notes, however, that ‘‘a small
magnitude in an effect estimate may represent a substantial
effect in a population’’ (Table 2). While this statement is true,
consistent with WoE best practices, EPA should apply the
same consideration to the converse: a small effect estimate
that is not due to chance, bias or another factor could
represent a substantial effect, but, because it is small, it may
actually indicate that the association is not causal. A great
deal of effect estimates in the air pollution epidemiology
literature are extremely small (e.g. 51.1); EPA should
acknowledge that associations of small magnitude do not
provide strong evidence of causality, as the hypothesis
attributing the association to a causal effect is not markedly
more plausible than is attributing the association to one or
several of the many factors (e.g. confounders, random chance)
that could partially or fully account for the statistical
association.
Internal consistency is not always, but always should be,
considered when assessing individual studies in the ISAs.
While it is not necessary that every statistical model produce
the same results (if that were the case, one would not need to
assess more than one model), the results should be relatively
consistent. The consideration of internal consistency also
should extend to the evaluation of results for multiple cities or
among regions, as well as for different time periods, as
discussed further below. If results are not consistent, the
reasons why should be explored. That is, the process of
evaluating the consistency and plausibility of support for a
causal interpretation should consider such issues as the
tentative reasons proposed for why apparent discordance
exists; why particular findings among the discordant results
should be considered informative about potential human risk
while other contrary findings are not; and why the contrary
findings should not be viewed as refutations of the causality
assertion. If a lack of causality is as likely an explanation as
another, the study should not be considered evidence for a
causal association.
EPA discusses the importance of biological plausibility, as
a known biologically plausible MoA increases the likelihood
that an association is causal. If there is evidence suggesting an
association is not biologically plausible (e.g. the body’s
defense mechanisms prevent an adverse effect up to a certain
point), however, this must be considered as well. Similarly, as
discussed by EPA (Table 2), a well-characterized exposure–
response relationship increases confidence for a cause-effect
association, but EPA should also consider that a lack of
exposure–response in a study with the statistical power to
evaluate it is evidence against causation.
Temporality is also considered by EPA, but the NAAQS
causal framework is missing a discussion (or evaluation) of
whether effects occur within an appropriate time frame.
Effects that occur at time points that cannot be linked to an
exposure in a biologically plausible manner cannot be
considered evidence of an effect. If a biologically plausible
time frame is not known a priori, one should look for
consistency among the time frames during which effects
occur across studies. Another important factor is the consid-
eration of controls. A control population might not be truly
representative of unexposed individuals, and this may influ-
ence the outcome. It is essential that the control population be
evaluated to determine if and how it influences results.
Another point regarding the assessment of individual study
results is the consideration of whether results are likely a
result of random error or chance. If these are just as likely
explanations as a causal association, one should not consider
an association indicative of causation.
The final step in Phase 2 is to determine whether and to
what extent studies are relevant to ambient exposures and the
relevance (i.e. appropriateness) and adequacy (i.e. usefulness)
of data for hazard and risk assessment. The NAAQS causal
framework indicates this should be done, but this is often not
explicitly discussed in ISAs.
Phase 3: Integrate and evaluate evidence
Phase 3 involves integrating and evaluating data across realms
of evidence (Tables 5 and 6). The NAAQS causal framework
includes modified Bradford Hill criteria (Table 2) to aid in
judgments regarding causality, which is consistent with Phase
3 best practices, but it should also update its guidance on how
the criteria should be applied to the available evidence and
how all criteria should be considered jointly (i.e. as a whole)
and consistently. As discussed above, descriptions for strength
of association, coherence, consistency, biological plausibility
and temporality are missing several key components that
should always be considered in a WoE evaluation (Table 6).
The EPA NAAQS causal framework looks separately at the
collective human studies and animal toxicology evidence,
first coming to a synthesized WoE judgment for each of these
realms and then integrating these separate judgments for
different lines of evidence into an overall qualitative statement
about causality and integration of human and animal toxicity
data with data on exposure levels (US EPA, 2013a). It is
preferable that the data evaluation be integrated across all
lines of evidence based on their mutual illumination of
potential biological processes that may underlie toxicity.
In this way, interpretation of each line of evidence informs the
interpretation of the others. This approach emphasizes the
evaluation of how the results of particular studies can be
applied more generally to inform the potential for similar
causal processes in other studies, including studies in other
realms of investigation. It is the potential for such common-
ality of causal processes that makes each particular result
constitute evidence about the causality question being
evaluated, and it is the reason that animal data can be
840 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
considered as evidence for potential effects in humans. EPA’s
method of integrating judgments at the end of the evaluation
does not allow data from one realm of evidence to influence
conclusions from another. For example, if an epidemiology
analysis can be interpreted two ways, and animal studies can
shed light on whether one is more plausible than the other,
this should be considered when making judgments about the
epidemiology analysis. An empirical finding of, say, a dose-
rate effect or a sex difference may be plausible without any
direct evidence among human data, but animal or mechanistic
studies may provide insight into whether such a phenomenon
has a plausible biological basis in humans. A related aspect is
that one should ensure that added assumptions – even
reasonable and biologically plausible ones – that are invoked
to account for variations within a realm of investigation (say,
among human studies) are not contradicted by assumptions
made when coming to integrated evaluation within another
realm (say, to explain variations among animal bioassay
outcomes in different rodent species). This cannot be done
if judgments are made separately within realms, and only
the conclusions (and not their basis) are integrated at the
end of the process.
An assessment of MoA also plays a crucial role in data
integration because it provides a means of understanding the
degree of parallelism to be expected in underlying causal
processes in human and animal responses, and it can give
insight into the potential reasons behind differences in these
processes at different exposure levels, exposure rates or study
populations. The NAAQS causal framework acknowledges
the importance of considering MoA, but it does not state
explicitly how data relevant to MoA are assessed. To the
degree possible, a proposed MoA and its key events, as well
as any alternative MoAs and their key events, should be
described and evaluated. Uncertainties, inconsistencies and
data gaps should be considered. It should be determined
whether a MoA has been established in animals and whether
key events in animals are plausible in humans, although less
than fully established MoAs can still be valuable tools in
assessing the plausibility of tentative understandings about the
bearing of different kinds of data on the overall causality
question. Finally, exposure–response relationships for par-
ticular MoAs should be evaluated, as some MoAs are
dependent upon exposure rate or cumulative exposure.
While it is not necessary to know the specific MoA to
establish causality, it certainly adds to the WoE of the
likelihood of a causal association. It can also help bound
exposure levels at which a MoA occurs. In addition, a known
MoA may provide evidence against causality if it can be
shown that certain key events in the MoA are not biologically
plausible in humans at relevant exposures.
The NAAQS causal framework includes an evaluation of
the adversity of effects, but this occurs after the WoE
decisions are made and causal relationships are categorized
(in Phase 4). It could also occur in Phase 3, however, when
making judgments about the WoE for causality. This would be
appropriate if the causal question is to establish whether an
agent can cause effects that are specifically deemed adverse,
rather than whether it can cause any effect at all. As Phase 4 is
an assessment of the sufficiency of evidence to make
decisions, and if a decision depends on an adversity question
(as is often the case, such as in the NAAQS process), then the
adequacy of the Phase 3 evaluation to support a Phase 4
conclusion on such matters requires a discussion of adversity,
and of the scientific basis for distinguishing adverse from
non-adverse effects. In general, if effects are determined to be
adaptive, compensatory, transient, or reversible, they are less
likely to be adverse; if they are precursors to apical effects,
severe or irreversible, they are more likely to be adverse
(Goodman et al., 2010). As there are exceptions, these should
not be considered as hard and fast rules; rather, they may
serve as guidelines for the evaluation of the adversity of
effects.
The NAAQS causal framework includes the weighing of
alternative views on controversial issues, but this should be
expanded to an evaluation of alternative explanations of the
same sets of results, regardless of whether the data are
controversial. The evidence for and against a hypothesis
should be discussed, and a clear illustration of how one
explanation compares to another should be provided.
Conclusions regarding causality should be presented not just
as the result of judgments but with their context of reasons for
choosing them over competing conclusions. Assumptions in
the evaluation should be noted, especially when they are
ad hoc (i.e. introduced to explain some phenomenon already
seen). One should also assess whether additional evidence
supports or weakens the causal assumption and provide new
hypotheses or accounts of the evidence if it provides a more
convincing result. The assessment should be updated based on
new evidence if results are ambiguous or a new hypothesis
requires support. At this point, one can formulate a WoE
conclusion, identify data gaps, and propose next steps. The
NAAQS causal framework could be strengthened by utilizing
this final step – particularly identifying data gaps – in Phase 3.
Phase 4: Draw conclusions based on inferences
The NAAQS causal framework proposes categories for causal
relationships (Table 4); based on these relationships, EPA
determines which health effects will be evaluated in quanti-
tative risk assessments. An important issue is that EPA’s
guidance for causality categorization is not congruent with the
judgments based on the Bradford Hill criteria. The framework
claims to rely heavily on the criterion of consistency across
studies in its categorization scheme, but, in practice, it does
not fully evaluate consistency or incorporate other criteria
such as coherence, biological plausibility, biological gradient
or strength of association (discussed below in the case study
of ozone).
EPA states that evidence is sufficient to conclude a causal
relationship if ‘‘chance, bias, and confounding [can] be ruled
out with reasonable confidence’’ (US EPA, 2013a), yet there
is no guidance on what constitutes ‘‘reasonable confidence’’.
Based on the current framework, EPA cannot make that
determination reliably because there is no guidance for
assessing chance, bias, or confounding in a consistent manner.
Although such an assessment is inevitably a scientific
judgment, the basis for coming to and justifying the judgment
should be included in the evaluation. Moreover, an evaluation
of how confident one is in ruling out chance, bias and
confounding should include an examination of the evidence
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 841
for or against competing explanations of the results being
evaluated. This will help determine whether the results are
attributable to a causal effect of the agent or solely the results
of, or at least severely skewed by, the influence of chance,
bias or confounding. This evaluation should involve a
discussion of the possible scope for the chance, bias and
confounding, as well as the evidence for or against the actual
(rather than merely possible) influence of these factors on the
reported outcomes.
EPA suggests ‘‘controlled human exposure studies that
demonstrate consistent effects’’ constitute evidence for a
causal relationship (US EPA, 2013a), but it should indicate
that this is only true if the results are coherent with other lines
of evidence for particular exposure scenarios. EPA also
indicates that ‘‘observational studies that cannot be explained
by plausible alternatives’’ constitute evidence for a causal
relationship (US EPA, 2013a). Yet, EPA does not fully
explore alternative explanations for study results in practice,
as discussed below in the case study of the ozone ISA.
Currently, EPA sets forth a hypothesis (i.e. a criteria pollutant
causes a particular health effect) and determines whether the
data may support that hypothesis. As discussed above, EPA
does not, but should, fully explore whether and to what degree
the data support other hypotheses (e.g. a confounder, rather
than the criteria pollutant, causes a particular health effect). It
is only in this manner that alternative hypotheses can truly be
explored.
EPA states that evidence is sufficient to conclude a likely
causal relationship if ‘‘copollutant exposures are difficult to
address and/or other lines of evidence (controlled human
exposure, animal, or mode of action information) are limited
or inconsistent’’ or if ‘‘animal toxicological evidence from
multiple studies from different laboratories. . .demonstrate
effects, but limited or no human data are available’’ (US EPA,
2013a). EPA concludes evidence is suggestive of a causal
relationship if ‘‘at least one high-quality epidemiologic study
shows an association with a given health outcome but the
results of other studies are inconsistent’’ or if ‘‘a well-
conducted toxicological study, such as those conducted in the
National Toxicology Program (NTP), shows effects in animal
species’’ (US EPA, 2013a).
Any WoE evaluation, by definition, involves a consider-
ation of all lines of evidence in a consistent manner. It is not
about resolving all uncertainty but, rather, determining
whether the evidence as a whole supports causation more
than it supports a lack of effect. If co-pollutants cannot be
addressed or studies are inconsistent, the impact of the study
results is ambiguous and the WoE might equally well be
interpreted to be consistent with a lack of causality. In other
words, evaluating the base of data ‘‘as a whole’’ means that
one attends not simply to how much evidence can be adduced
to support (or to counter) the hypothesized causal effect, but
also how separate lines of evidence support (or contradict)
one another. How discrepancies are to be accounted for and
why apparent counterexamples to the generality of the
proposed causal process should not be regarded as refutations
of its applicability to the human risk question being evaluated
should also be assessed.
The NAAQS causal framework requires only one ‘‘high-
quality’’ study for evidence of causal relationship to be
deemed ‘‘suggestive’’. However, EPA does not provide
explicit guidance on what constitutes a high-quality study,
and, under this requirement, high-quality studies that are
inconsistent may not be considered as long as one high-
quality study exists that demonstrates an effect. Instead, all
studies should be reviewed using the same criteria, and only if
the WoE indicates that a causal association is more likely than
not, based on all the available data, should one conclude a
suggestive causal association. In a similar vein, because EPA
groups health outcomes into broad categories, one high-
quality study of a single endpoint (e.g. wheezing) can be used
as evidence for a suggestive causal relationship for a much
broader category that includes more severe outcomes (e.g.
respiratory effects).
An assessment of the adversity of health effects is included
in the NAAQS causal framework as a step to consider after
categorization of causal relationships, but, as noted above,
such an assessment can also be considered prior to making
judgments regarding causality (i.e. in Phase 3). Regardless, it
is key that EPA be more critical when defining an adverse
effect, and the assessment should differentiate between
homeostatic and biological changes that can adversely affect
health (Goodman et al., 2010). Another issue that is worth
pursuing in the NAAQS causal framework is the determin-
ation of the portion of an adverse health effect burden that
could be plausibly attributed to the particular criteria pollutant
under review. Even if the WoE indicates causality for a given
health effect, the estimated magnitude of the health burden
contributed by that pollutant may be very small (or even
negligible) in comparison to other causes of or contributors to
the same health effect. Failure to place the total burden of an
effect into its proper context is a step that is currently lacking
in regulatory decision-making.
Many of the issues noted above could be resolved by
updating EPA’s categories for causal determination (Table 4)
to be more consistent with the IOM framework (Table 3) on
which it was based originally, and we recommend such an
update to the NAAQS causal framework. This would not
only be in line with WoE best practices, but also would
improve the use of the causal determination categories for
risk communication. In the end, EPA should evaluate all of
the data in a consistent manner using well-specified criteria
and determine whether, as a whole, they constitute evidence
for causation or are more likely indicative of an alternative
explanation. It is by considering these alternatives that one
sees most directly how the evidence at hand relates to
possible conclusions.
Case study: Ozone ISA
The lack of explicit guidance in the NAAQS causal frame-
work has led to it being applied in a manner that does not
consistently meet its objectives to provide and communicate a
sound scientific basis for causality judgments. As a case
study, below, we provide several examples to demonstrate
how this lack of explicit guidance led to an overall unbalanced
evaluation of health effects in the ozone ISA, which yielded
causality conclusions that were not fully supported by the
evidence (US EPA, 2013a). The purpose of this case study is
not to indicate every instance in which the NAAQS causal
842 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
framework is applied inconsistently in the ISA, nor is the aim
to reargue evaluations of specific causality questions. Rather,
the purpose is to provide several concrete examples of the
systematic challenges that arise with the use of the NAAQS
causal framework in practice, as a way to indicate that the
general issues with the framework, that we describe above,
have real occurrences.
Phase 1
The NAAQS causal framework describes a general literature
search strategy and inclusion criteria for study selection, but
does not provide explicit guidance for this and does not
discuss exclusion criteria. EPA provides a database of the
studies considered for inclusion, noting which studies were
cited in the ozone ISA and which were not; however, it is
unclear why particular studies were excluded and if excluded
studies were relevant or informative.
For example, in the ozone ISA, EPA (2013a, p 2-2) states,
‘‘[l]iterature searches have been conducted routinely since
then to identify studies published since the last review,
focusing on studies published from 2005 (closing date for the
previous scientific assessment) through July 2011’’. EPA
included the study by Zanobetti & Schwartz (2011) in the
ozone ISA but omitted a study by Lipsett et al. (2011) that
was published online the same day (23 June 2011). EPA also
omitted a study by Spencer-Hwang et al. (2011), which was
published online on 21 July 2011. In addition, there were
several studies of both ozone and PM that were not included
in the ozone ISA but played a prominent role in EPA’s PM
evaluation (e.g. Jerrett et al., 2005; Miller et al., 2007). This
indicates that not all relevant studies were captured by the
literature search strategy used.
Study quality is a basis for study selection in the NAAQS
causal framework, but, as discussed above, studies of lower
quality could still provide information and should be included
and weighed in later phases of the evaluation. Some lower
quality studies that, despite their shortcomings, may have
contributed usefully to the WoE were likely excluded early on
from EPA’s analysis. The overall study selection process in
the ozone ISA builds on past analyses and public comments,
so it is possible that some relevant studies that may have been
excluded were considered later in the process. However,
without a systematic method for identifying and selecting
studies up front, the overall assessment may be biased.
Phase 2
Evaluation of study quality
The NAAQS causal framework does not describe a specific
method for evaluating the strengths and limitations of
individual studies and using these to determine the weight
of each study in the evaluation of causality. This led to an
inconsistent evaluation of individual studies throughout the
ozone ISA. As discussed below, there are many examples
in the ozone ISA where studies with positive associations
were emphasized over studies with null associations, as
opposed to studies of greater quality being emphasized over
those of lesser quality. This provided a false perception that
most of the reliable evidence supported a positive causal
association.
In the discussion of respiratory effects in adult day-hikers
in the ozone ISA, positive associations with lung function
decrements reported in one study (Korrick et al., 1998) were
emphasized over the null associations for the same endpoints
reported in another study (Girardot et al., 2006), and there
was no discussion of the strengths and limitations of either
study. In the evaluation of studies examining short-term ozone
exposure and cause-specific mortality, EPA stated that there is
evidence of associations with cardiovascular (CV)- and
respiratory-specific mortality, yet all risk estimates for
respiratory mortality and almost all for CV mortality were
null in analyses of these endpoints year-round; results were
mixed for both endpoints in analyses of the summer months,
when ambient ozone concentrations are highest.
In its summary of epidemiology data for short-term effects
of ozone on pulmonary inflammation and oxidative stress in
the ozone ISA, EPA stated that many recent studies reported
positive associations. Yet throughout its discussion, EPA
noted that the results were mixed and inconsistent. EPA did
not provide an explanation for why studies with positive
associations should carry more weight than those reporting
null associations. Finally, in the discussion of short-term
effects of ozone on respiratory symptoms, EPA stated there is
a ‘‘strong’’ body of evidence demonstrating associations
between ozone and increased respiratory symptoms (e.g.
wheezing) and asthma medication (e.g. bronchodilator) use in
asthmatic children, yet almost all risk estimates were null for
these outcomes. This also suggests that EPA’s evaluation of
the epidemiology data did not fully consider evidence from
studies with null results.
EPA’s evaluation of lung function decrements in asthmatic
children is another example of how the lack of methods for
evaluating the strengths and limitations of studies leads to an
overall inconsistent evaluation of individual studies across
pollutants. EPA relied on the study by Mortimer et al. (2002)
as a key study to indicate that increasing ozone concentrations
are associated with decreasing lung function in children with
asthma, as determined by self-reported peak expiratory flow
rate (PEFR) measurements (i.e. the maximum speed at which
an individual can breathe out air, a measure of the degree of
airway obstruction). By contrast, in the ISA for nitrogen
dioxide (US EPA, 2008a), EPA determined that the same
study was unreliable because of the use of self-reported PEFR
data, which has been demonstrated to have an unacceptably
high rate of error (Kamps et al., 2001).
If the NAAQS causal framework provided explicit guid-
ance on evaluating the strengths and limitations of each study,
it would allow for weight to be assigned to the studies in a
consistent manner based on quality. It could then be
determined whether the large number of studies with null
results carry equal or greater weight than the number with
positive results, strengthening the evidence against a causal
relationship, or whether the positive studies carry more
weight and EPA’s emphasis on these results is justified.
Consideration of study limitations
Measurement error. Exposure and outcome measurement
biases are potentially significant sources of uncertainty in
effect estimates derived from epidemiology studies of ozone
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 843
exposure. Most epidemiology studies rely on data from
central ambient monitoring sites to provide community-
average ambient ozone exposure concentrations (e.g. Gent
et al., 2003; Katsouyanni et al., 2009; Mortimer et al., 2002;
Naeher et al., 1999; Neas et al., 1999; Stieb et al., 2009), and
the interpretation of statistical associations is predicated upon
the assumption that these ambient measurements reflect
personal exposures. Studies have shown that large discrepan-
cies exist in the ambient/personal ozone relationships, how-
ever, and these can vary across cities (e.g. Sarnat et al., 2001,
2005). Thus, the use of fixed ambient monitoring data as
surrogates for personal ozone exposures can result in
exposure measurement error, which can bias the results of
an epidemiology analysis in either direction (Jurek et al.,
2005, 2008; Wacholder et al., 1995). Rhomberg et al. (2011b)
reviewed exposure measurement errors that prevail in envir-
onmental epidemiology studies, demonstrating that the degree
of bias known to apply to these studies is sufficient to produce
a false low-dose linear result. In the ozone ISA, EPA
acknowledged that exposure measurement error is a potential
source of variability in the ozone epidemiology study results,
but it did not give weight to the fact that it is also a potentially
large source of bias that could account for the small statistical
associations reported in many of the studies on which EPA
relies for evidence of causality.
Outcome measurement error results from inaccurate
reporting of health effects, and this was also not always
considered a limitation in the ozone ISA. For example, self-
measured and self-reported lung function measurements are
an important limitation of many ozone respiratory morbidity
studies. These measurements can be unreliable, particularly if
a population is young and/or not highly motivated to comply
and when a long time period of self-reporting is required.
Despite this limitation, EPA relied on a number of studies
using self-reported measures of lung function as evidence of
causality in the ozone ISA. In one key study on which EPA
relied for lung function effects in asthmatic children,
O’Connor et al. (2008) evaluated lung function changes that
were self-measured by the children; EPA did not acknowledge
the high frequency of unreliable values in this study. Some
other studies on which EPA relied also included self-reporting
of symptoms and, as a result, were subject to recall bias. For
example, in a study of children with asthma, Gent et al. (2003)
used a non-standard questionnaire (a daily calendar) and
relied on subjective symptom reporting by participants’
mothers, which could have biased the effect estimates.
Overall, in the ozone ISA, EPA did not give appropriate
weight to the potential bias from outcome measurement error.
Confounders. Confounding is a major limitation of a number
of key ozone respiratory morbidity studies that is not
adequately accounted for in the ozone ISA. For example,
Mortimer et al. (2002) reported associations between ozone
and asthma exacerbation in children in single-pollutant
models, but no statistically significant associations were
observed in multi-pollutant models. This indicates that the
effect may have been influenced partially or fully by other
pollutants. Korrick et al. (1998) also adjusted for several
confounding factors in their study of ozone effects on lung
function in adult hikers, showing that confounding by PM2.5
and aerosol acidity could not be ruled out. Similarly, in a
study by Neas et al. (1999) of associations between decreased
lung function in children and ozone exposure, effects were
attenuated in a co-pollutant model with sulfate. EPA tends to
downplay the effect of co-pollutants, and generally considers
only study results from single-pollutant models as positive
evidence for causality.
Temperature and other environmental factors can also
confound the relationship between ozone and respiratory
morbidity, and it is not clear if this is considered by EPA. For
example, the longitudinal study of children with asthma by
Gent et al. (2003) only considered same-day maximum
temperature in the statistical models, while meteorological
variables such as relative humidity may have been potential
confounders of respiratory symptoms. Air conditioning use
and exposure to tobacco smoke are also important potential
confounders of the associations with respiratory effects that
were not accounted for in some of the key studies on which
EPA relied for its causality judgments (Mortimer et al., 2002;
Stieb et al., 2009).
Confounding by co-pollutants, meteorology, and other
factors is also a major source of uncertainty in time-series
analyses of ozone-mortality associations. This issue is
complicated by the fact that air pollutants can be highly
correlated, and it is often dismissed because of this. Some
studies have reported risk estimates that appear to be robust to
inclusion of co-pollutants (Bell et al., 2007), but findings
from other studies affirm that this issue remains relevant and
important. Specifically, many studies have investigated con-
founding of PM up to 10 micrometers in diameter (PM10),
PM2.5 and some PM components, and found evidence of
confounding effects (Franklin & Schwartz, 2008; Katsouyanni
et al., 2009; Smith et al., 2009), and a few studies have shown
confounding effects of temperature (Smith et al., 2009). For
example, Smith et al. (2009) reported that inclusion of PM10
with ozone in a two-pollutant model resulted in a 22–33%
decrease in the ozone-mortality association. Similarly,
Franklin & Schwartz (2008) reported a 31% decrease in the
ozone-mortality risk estimate when sulfate PM was included
in the model. Smith et al. (2009) also considered alternative
models to control for meteorology that resulted in reduced
mortality estimates, suggestive of additional confounding by
temperature. EPA should consider both the studies that show
confounding as well as those that do not in its assessments.
Although confounding is discussed as part of the evalu-
ation of respiratory morbidity and mortality in the ozone ISA,
it does not appear to play a key role in EPA’s ultimate
conclusions regarding the causal relationship between ozone
and these outcomes.
Model specification. The ozone ISA noted that, in selecting
epidemiology studies, consideration was given to methodo-
logical issues related to the interpretation of the evidence
(including model selection), but this was often not the case.
The results of many studies, including meta-analyses of ozone
mortality time-series studies, have indicated that model
selection has a key role in the determination of results.
The significance of model selection was brought to bear in
2002 when National Morbidity, Mortality, and Air Pollution
Study (NMMAPS) researchers discovered an issue with the
844 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
way the commonly used statistical model, the Generalized
Additive Model (GAM), was implemented by the statistical
software, S-plus. Specifically, these issues led to an investi-
gation of how differences in the level of control for temporal
trends and weather can affect the magnitude of effect
estimates. When compared to other statistical models or
different criteria for controlling these important confounding
factors, the GAM model in S-plus was generating much larger
health effects estimates (HEI, 2003). Similarly, studies have
also shown that ozone mortality estimates can vary depending
on the degrees of freedom selected for smoothing long-term
trend (HEI, 2003; Ito et al., 2005; Katsouyanni et al., 2009).
In the Air Pollution and Health: a European and North
American Approach (APHENA) study, which included
datasets from US, Canadian, and European multi-city studies,
ozone mortality estimates were sensitive to the smoothing
function type applied (Katsouyanni et al., 2009). Despite
extensive sensitivity analyses comparing a number of differ-
ent models, Katsouyanni et al. (2009) were unable to identify
a model deemed most appropriate for comparing health effect
estimates across the different study locations they evaluated.
They reported large differences with penalized versus natural
splines, as results were negative when penalized splines were
used and positive when natural splines were used. In the
ozone ISA, EPA only presented the positive associations that
were reported from the use of natural splines, because
‘‘alternative spline models have been previously shown to
result in similar effect estimates’’. Although EPA provided a
justification for why it did not present the APHENA results
from both smoothing functions, this justification does not
make sense when the large APHENA study (upon which EPA
relied heavily in the ozone ISA for its causal determinations)
indicates that there is sensitivity of risk estimates to the type
of smoothing function used in the model.
These examples highlight the prevalence of model selec-
tion bias in the epidemiology literature. Despite findings from
several researchers that have shown the substantial impact of
model selection bias (e.g. Clyde, 2000; Ito, 2003; Koop &
Tole, 2004; Koop et al., 2010; Moolgavkar et al., 2013), EPA
has not systematically considered this issue in its causality
determinations for ozone.
Phase 3
Consideration of modified Bradford Hill criteria
The NAAQS causal framework indicates that modified
Bradford Hill criteria should be used to aid in causal
judgments, but it does not provide explicit guidance regarding
how these criteria should be applied to the available evidence
and how all criteria should be considered jointly.
Furthermore, several criteria were not fully considered by
EPA in the ozone ISA, including biological plausibility,
biological gradient and strength of association. Several
examples are described below.
EPA’s evaluation of lag times (i.e. the period of time
between the measurement of an exposure and an effect) is an
example of not fully considering biological plausibility. Many
epidemiology studies utilized statistical models that examined
health effects occurring at several time points after measured
ozone exposures. Extensive human clinical and mechanistic
data on ozone indicate that the respiratory effects of ozone
occur soon after exposure. Findings for 0- to 1-day lags or for
cumulative ozone exposure over a few days are most likely to
be biologically plausible, while findings for longer lags are
less likely to be so. Therefore, effects (or a lack thereof)
reported at the earlier time points should have carried more
weight than those reported at later times, but, in the ozone
ISA, effects at any time point were generally considered
evidence for causality.
EPA also often overlooked the criterion of biological
gradient (i.e. exposure–response relationship) in drawing
conclusions in the ozone ISA. For example, EPA noted that
there is no exposure–response relationship for lung function
changes across short-term exposure studies of outdoor
workers (Brauer et al., 1996; Chan & Wu, 2005; Hoppe
et al., 1995; Romieu et al., 1998). This provides evidence
against a causal association. Yet, in the ozone ISA’s summary
and causal conclusion for short-term respiratory effects, this
was not discussed, indicating it carried little, if any, weight in
EPA’s final evaluation.
As a final example, EPA did not fully consider the strength
of associations reported in the ozone epidemiology studies
and did not factor this into the overall judgments of causality
in the ozone ISA. EPA appears to have considered any risk
estimate above the null as supportive of a causal relationship,
regardless of the magnitude of the association. Many
epidemiology studies report very weak findings, and this is
true for studies of ozone. As reviewed by Taubes (1995),
epidemiologists generally consider risk estimates greater than
3–4 to reflect strong associations and to be supportive of a
causal link, while smaller risk estimates (1.5–3) are con-
sidered to be weak and require other lines of evidence to
demonstrate causality. In addition, many small, statistically
significant associations are found to be not statistically
significant when confounders are accounted for. Boffetta
et al. (2008) noted that the importance of unmeasured and
residual confounding as a source of bias has been sometimes
downplayed in the literature, but a recent statistical simulation
study showed that, with plausible assumptions, such con-
founding can generate effect sizes in the range of 1.5–2.0.
EPA relied on many studies with risk estimates even lower
than this. For example, in EPA’s evaluation of associations
between ozone exposure and respiratory symptoms in
children with asthma, 35 of 37 risk estimates above the null
that were presented as evidence of effects were between 1.01
and 1.39, with two additional risk estimates of 1.59 and 2.13.
EPA did not fully consider that the positive associations on
which it relied for causal associations are often weak and
within the range of magnitude that can be attributed to
confounding. If these weak associations are to be considered
as evidence of causality, consistent with WoE best practices,
EPA should also give equal weight to associations of equal
magnitude in the opposite direction (i.e. 0.72 to 0.99) as
evidence for a protective effect of ozone. As a protective
effect of ozone on the respiratory system is not plausible,
associations of this magnitude (either positive or negative) are
likely not associated with ozone exposure and instead can be
attributed to confounding or other factors.
The above examples indicate some of the specific criteria
that were not fully considered by EPA. It is also apparent that
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 845
not all of the criteria are considered jointly. For each health
effect, EPA should systematically consider each of the criteria
and determine where the WoE lies. While every criterion does
not have to be met per se, each needs to be considered.
For those that are not met, EPA should provide an explan-
ation of why that criterion does not affect the interpretation of
the results. With no clear guidance on how to apply the
criteria or how to consider them all jointly, EPA’s evaluations
of individual studies tend to be biased toward findings
of effect.
Weighing alternative views
The NAAQS causal framework indicates one should weigh
alternative views on controversial issues, but it does not
provide guidance on how to do this. The framework should be
more explicit in this regard, e.g. by noting that evidence for
and against competing explanations should be discussed and
clear illustrations of how one explanation compares to the
other should be provided. There are several instances in the
ozone ISA where EPA did not discuss or give due weight to
alternative views.
For example, EPA noted that consideration of the limita-
tions of epidemiology studies, such as potential confounding
and exposure measurement error, must be taken into account
to properly inform the interpretation of epidemiology evi-
dence. Yet, in its final evaluation, EPA did not consider that
another factor may have caused the health effects associated
with ozone in certain epidemiology studies. EPA did not
discuss the reasons why this view (i.e. ozone is not causal)
is less likely to be true than the view that ozone is the
causal factor.
In addition, EPA reviewed several recent controlled human
exposure studies of ozone and decrements in lung function
over a total of 6.6 hours of exposure (Adams, 2006; Schelegle
et al., 2009; Kim et al., 2011). The mean lung function
decrement for the subjects in each study after exposure to 60
parts per billion (ppb) ozone was only statistically significant
in the study by Kim et al. (2011). Because of concern that the
statistical test in the original publication was not sufficiently
powerful to detect an effect on lung function, EPA reanalyzed
the data for the final time point (but not at other interim
hourly time points) in the study by Adams (2006) using
several statistical methods that differed from those of the
study author. This reanalysis, reported by Brown (2007) and
Brown et al. (2008), indicated statistically significant lung
function decrements at 60 ppb ozone. Other investigators have
also reanalyzed the data from the study by Adams (2006)
using various statistical methods and reported no statistically
significant changes in lung function (Lefohn et al., 2010;
Nicolich, 2007). It has been argued that EPA’s approach
produced statistically significant results that can be attributed
to the majority of the data being selectively omitted from the
analysis (Goodman & Sax, 2012). EPA appeared to weigh the
findings of its own reanalyses over the original analyses
conducted by the study authors and reanalyses by other
investigators without fully considering the strengths and
limitations of each analytical method.
As a final example, EPA examined the concentration-
response (C-R) relationship between short-term ozone
exposure and mortality, but it did not weigh alternative
views regarding the shape of the C-R curve and the possible
existence of a threshold (i.e. the minimum ozone concentra-
tion necessary to produce an effect). It is well recognized that
epidemiology analyses such as those that investigate associ-
ations between mortality and air pollution exposures are
complicated by the noise in the observational data and
uncertainties in the epidemiology models, both of which may
mask a threshold. Despite the challenges in identifying ozone
thresholds in ozone-mortality evaluations, several studies
have applied various statistical techniques to evaluate the
nature of the ozone-mortality C-R function and found
evidence suggestive of a threshold for ozone-mortality effects
(Smith et al., 2009; Stylianou & Nicolich, 2009; Xia & Tong,
2006). Similarly, Jerrett et al. (2009) reported evidence of
improved model fit with a threshold model. EPA appeared to
weigh the view that there is no clear threshold over the
alternative view that a threshold exists, even though there is
mounting evidence to support the latter.
Overall, while EPA often discusses alternative views, it
does not appear to weigh them appropriately. To avoid this,
the NAAQS causal framework should provide explicit
guidance regarding how to discuss and compare the evidence
for and against competing explanations.
Phase 4
The NAAQS causal framework claims to rely heavily on the
consistency of results across studies in its categorization of
causal relationships between criteria pollutants and different
health effects, but, in practice, the consistency of results is not
evaluated fully. An example of this can be seen by comparing
the classification of causal relationships for ozone-related
mortality in the ozone ISA to those in the previous NAAQS
review of ozone, as summarized in the 2006 AQCD for ozone
(US EPA, 2006).
EPA’s causal determination for short-term ozone exposure
and total all-cause mortality was upgraded from ‘‘highly
suggestive’’ in the previous review to ‘‘likely causal’’ in the
ozone ISA. In the 2006 AQCD, a large body of epidemiology
studies, including single- and multi-city studies, was eval-
uated. Only a limited number of additional studies have been
published since then, including several that are follow-up
studies or reanalyses of previous studies. Importantly, EPA
stated in the ozone ISA that the new evidence is supportive of
a robust association between short-term ozone exposures and
mortality, but the studies showed very small pooled mortality
effects that are inconsistent across studies. Furthermore,
multi-city studies have demonstrated that there is substantial
heterogeneity in city-specific findings, which are dominated
by ozone associations that are not statistically significant and
close to null, or even negative (i.e. indicating reduced
mortality with increased ozone exposure). The pooled
‘‘national’’ ozone-mortality effect estimates are therefore
misleading, as they do not reflect this heterogeneity.
In addition, EPA did not fully consider many of the important
uncertainties that are still present in the new studies, including
confounding, model selection, appropriate lag times, non-
threshold assumptions in the C-R relationship, and exposure
measurement error.
846 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
With respect to long-term ozone exposure and mortality,
EPA relied on very limited new evidence since the last ozone
review in upgrading its classification from ‘‘insufficient’’ to
‘‘suggestive’’ evidence of a causal relationship in the ozone
ISA. Specifically, EPA appears to have relied solely on one
new study that provided some findings suggesting an
increased risk of respiratory-related mortality for long-term
ozone exposure (but no consistent associations for all-cause
mortality) (Jerrett et al., 2009), and did not fully consider
other key studies that suggest no effects of ozone on total and
cause-specific mortality or counterintuitive geographical
heterogeneity in mortality risks (e.g. Abbey et al., 1999;
Dockery et al., 1993; Jerrett et al., 2005; Lipfert et al., 2000;
Miller et al., 2007). In addition, new studies published since
the last review support a lack of association between long-
term ozone exposure and mortality from specific causes,
including respiratory causes (e.g. Lipsett et al., 2011). This
example illustrates that requiring only one positive (which
may be high-quality, but still may have issues) study to
provide ‘‘suggestive’’ evidence can yield conclusions toward
a causal determination even if the majority of the evidence
does not support this.
Because the NAAQS causal framework does not fully
evaluate the consistency of results across studies or fully
incorporate other Bradford Hill criteria such as coherence,
biological plausibility, biological gradient and strength of
association, the resulting causality categorization is not
congruent with judgments based on these criteria. To address
this, EPA could redefine these categories so that they more
accurately reflect the WoE. Further, in harmony with best
practices, EPA should evaluate all of the data in a consistent
manner, using well-specified criteria that can be applied
consistently across evaluations over time, to determine
whether they constitute evidence for causation.
Conclusions
The current NAAQS framework for causal determination has
several valuable features, but there are certain elements that
are missing or not stated explicitly. To improve the NAAQS
framework, EPA could add explicit guidance regarding study
selection criteria. These criteria should include specifying
that no studies be excluded based on their quality or results;
rather, weak studies should be given less weight in the final
analysis. Also, to reduce bias, analyses conducted by EPA
should not be included unless they are peer reviewed. In
addition, guidance for data extraction, clearly stating that all
data relevant to the causal question should be extracted from
each study and not just a subset of the data, should be added
to the NAAQS framework. In terms of evaluating study
quality and results, this should be done in the same manner
across all studies to ensure that the assessment is thorough
and consistent. Data suggesting a lack of biological plausi-
bility or a lack of an exposure–response relationship for a
specific effect should be considered as evidence against
causation.
The NAAQS causal framework should provide guidance
on how the Bradford Hill criteria should be applied to the
available evidence and how all criteria should be considered
jointly; the lack of such guidance in the current framework
can lead to an assessment that unduly favors a causal
determination. Indeed, in following the current framework,
EPA sets forth a hypothesis and determines whether the data
support that hypothesis. The framework should include
guidance regarding the weighing of alternative hypotheses,
explicitly directing EPA to fully explore whether and to what
degree the data support other hypotheses. The current
framework’s method of integrating judgments for different
realms of evidence at the end the evaluation also does not
allow data from one realm of evidence to inform the
interpretation of the others. The framework should state
explicitly that the data evaluation be integrated across all
realms of evidence to provide insight into potential biological
processes underlying the effects under consideration
for causality.
Lastly, there should be guidance for causality categoriza-
tion that is congruent with the judgments based on the
Bradford Hill criteria, and the use of categories for causal
determination should be based on a consistent evaluation of
all of the data using well-specified criteria for causation.
Although the current NAAQS causal framework includes
several of the elements discussed above, it does not
always contain explicit guidance for them, and this can
lead to conclusions of causality that are not fully supported
by the evidence.
We provided some specific examples of how the limita-
tions of the current NAAQS causal framework led to an
evaluation of ozone-related health effects that was not always
consistent and yielded causality conclusions that were not
fully supported by the evidence. This is also apparent in other
assessments that used the NAAQS causal framework. For
example, in a 9 December 2011, letter to EPA, the Clean Air
Scientific Advisory Committee review panel for the first draft
ISA for lead recommended a ‘‘more rigorous and transparent
‘weight of the evidence’ analysis’’ that should ‘‘devote more
attention to the limitations of the existing studies with respect
to consistency, reproducibility, bias, control for confounders,
and shortcomings in statistical methodology’’ (Frey & Samet,
2011). In addition, many of the same issues were also
identified in EPA’s draft IRIS assessment of formaldehyde,
prompting the NAS panel that reviewed that assessment to
propose a roadmap for improving the risk assessment process
and to recommend that EPA use existing models of systematic
WoE approaches for hazard identification.
If EPA undertakes a program to revise the NAAQS causal
framework to be more consistent with the NAS roadmap, it is
our hope that the suggestions in this paper will be considered
in the process. The NAAQS causal framework could be
revised so that it is consistent with WoE best practices, and
the causal determination categories should be updated to
reflect these revisions. The full and consistent application of
the revised framework is needed to ensure that future
assessments of the potential health effects of criteria air
pollutants will be thorough, transparent, and scientifically
sound.
Acknowledgements
The authors thank the anonymous reviewers whose comments
helped to improve this paper.
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 847
Declaration of interest
The authors are employed by Gradient, a private environ-
mental consulting firm. The Gradient staff has strong
expertise in assessing human, experimental animal, and
mechanistic data in WoE analyses (as is evident in recent
evaluations conducted for bisphenol A, naphthalene, formal-
dehyde, chlorpyrifos, methanol, styrene, nickel, and toluene
diisocyanate) and has presented several of these analyses to
regulatory bodies. In addition, the Gradient staff, including
the authors of this paper, have carefully evaluated the science
underlying EPA’s review of various NAAQS and offered both
oral and written testimony to EPA. Gradient has also
addressed issues on systematic review and integration of
evidence for a number of clients. The work reported in this
paper was conducted by the authors during the normal course
of employment at Gradient with financial support provided by
the American Petroleum Institute (API). API is the major
trade association of corporations in the petroleum sector, from
discovery through production and refining. Drafts of this
paper were reviewed by members or affiliates of API. The
authors have the sole responsibility for the writing, contents,
and conclusions in this paper. The conclusions are not
necessarily those of API.
References
Abbey DE, Nishino N, McDonnell WF, et al. (1999). Long-terminhalable particles and other air pollutants related to mortality innonsmokers. Am J Respir Crit Care Med, 159, 373–82.
Adami HO Berry SC, Breckenridge C, et al. (2011). Toxicology andepidemiology: Improving the science with a framework for combiningtoxicological and epidemiological evidence to establish causalinference. Toxicol Sci, 122, 223–34.
Adams WC. (2006). Comparison of chamber 6.6-h exposures to 0.04–0.08 ppm ozone via square-wave and triangular profiles on pulmonaryresponses. Inhal Toxicol, 18, 127–36.
Bachmann J. (2007). Will the circle be unbroken: a history of the USNational Ambient Air Quality Standards. J Air Waste Manage Assoc,57, 652–97.
Bell ML, Kim JY, Dominici F. (2007). Potential confounding ofparticulate matter on the short-term association between ozone andmortality in multisite time-series studies. Environ Health Perspect,115, 1591–5.
Boffetta P, McLaughlin JK, La Vecchia C, et al. (2008). False-positiveresults in cancer epidemiology: a plea for epistemological modesty.J Natl Cancer Inst, 100, 988–95.
Borgert CJ, Mihaich EM, Ortego LS, et al. (2011). Hypothesis-drivenweight of evidence framework for evaluating data within the US EPA’sEndocrine Disruptor Screening Program. Regul Toxicol Pharmacol,61, 185–91.
Brauer M, Blair J, Vedal S. (1996). Effect of ambient ozone exposure onlung function in farm workers. Am J Respir Crit Care Med, 154,981–7.
Brown JS, Bateson TF, McDonnell WF. (2008). Effects of exposure to0.06 ppm ozone on FEV1 in humans: a secondary analysis of existingdata. Environ Health Perspect, 116, 1023–6.
Brown JS. [US EPA, National Center for Environmental Assessment(NCEA)]. (2007). Memo to Ozone NAAQS Review Docket (OAR-2005-0172) re: the effects of ozone on lung function at 0.06 ppm inhealthy adults. 8p., June 14.
Chan CC, Wu TH. (2005). Effects of ambient ozone exposure on mailcarriers’ peak expiratory flow rates. Environ Health Perspect, 113,735–8.
Clyde M. (2000). Model uncertainty and health effect studies forparticulate matter. Environmetrics, 11, 745–63.
Dockery DW, Pope CA, Xu X, et al. (1993). An association between airpollution and mortality in six U.S. cities. N Engl J Med, 329, 1753–9.
European Centre for Ecotoxicology and Toxicology of Chemicals(ECETOC). (2009). Framework of the Integration of Human and
Animal Data in Chemical Risk Assessment. ECETOC TechnicalReport No. 104. 130p., January.
Franklin M, Schwartz J. (2008). The impact of secondary particles on theassociation between ambient ozone and mortality. Environ HealthPerspect, 116, 453–8.
Frey HC, Samet JM. (2011). Letter to L. Jackson re: CASAC Review ofEPA’s ‘‘Integrated Science Assessment for Lead (First ExternalReview Draft – May 2011)’’. Clean Air Scientific AdvisoryCommittee (CASAC). 120p., December 9. [Online] Available from:http://yosemite.epa.gov/sab/sabproduct.nsf/0/D3E2E8488025344D852579610068A8A1/$File/EPA-CASAC-12-002-unsigned.pdf [lastaccessed 29 June 2012].
Gent JF, Triche EW, Holford TR, et al. (2003). Association of low-levelozone and fine particles with respiratory symptoms in children withasthma. JAMA, 290, 1859–67.
Girardot SP, Ryan PB, Smith SM, et al. (2006). Ozone and PM2.5exposure and acute pulmonary health effects: a study of hikers in theGreat Smoky Mountains National Park. Environ Health Perspect, 114,1044–52.
Goodman JE, Dodge DG, Bailey LA. (2010). A framework for assessingcausality and adverse effects in humans with a case study of sulfurdioxide. Regul Toxicol Pharmacol, 58, 308–22.
Goodman JE, Sax SN. (2012). Comments on the Integrated ScienceAssessment for Ozone and Related Photochemical Oxidants (ThirdExternal Review Draft). 125p. Submitted to EPA on 15 August 2012.Docket ID: EPA-HQ-ORD-2011-0050.
Health Effects Institute (HEI). 2003. Revised analyses of time-seriesstudies of air pollution and health. 306p., May.
Higgins JPT, Green S. (2011). Cochrane handbook for systematicreviews of interventions (Version 5.1.0). The Cochrane Collaboration.March. Available from: http://www.cochrane-handbook.org/ [lastaccessed 03 Feb 2012].
Hill AB. (1965). The environment and disease: association or causation?Proc R Soc Med, 58, 295–300.
Hoppe P, Praml G, Rabe G, et al. (1995). Environmental ozone fieldstudy on pulmonary and subjective responses of assumed risk groups.Environ Res, 71, 109–21.
Institute of Medicine (IOM). (2008). Improving the presumptivedisability decision-making process for veterans. Committee onEvaluation of the Presumptive Disability Decision-Making Processfor Veterans, Board on Military and Veterans Health, NationalAcademies Press. 781p.
International Agency for Research on Cancer (IARC). (2006). Preambleto the IARC monographs on the evaluation of carcinogenic risks tohumans. 27p. Available from: http://monographs.iarc.fr/ENG/Preamble/CurrentPreamble.pdf [last accessed 20 Sep 2012].
Ito K. (2003). Associations of particulate matter components with dailymortality and morbidity in Detroit, Michigan. In: Revised analyses oftime-series studies of air pollution and health. Boston, MA: HealthEffects Institute, 143–156.
Ito K, De Leon SF, Lippmann M. (2005). Associations between ozoneand daily mortality: analysis and meta-analysis. Epidemiology, 16,446–57.
Jerrett M, Burnett RT, Pope CA, et al. (2009). Long-term ozone exposureand mortality. N Engl J Med, 360, 1085–95.
Jerrett M, Burnett RT, Ma R, et al. (2005). Spatial analysis of airpollution and mortality in Los Angeles. Epidemiology, 16, 727–36.
Jurek AM, Greenland S, Maldonado G. (2008). How far from non-differential does exposure or disease misclassification have to be tobias measures of association away from the null? Int J Epidemiol, 37,382–5.
Jurek AM, Greenland S, Maldonado G, Church TR. (2005). Properinterpretation of non-differential misclassification effects: expect-ations vs. observations. Int J Epidemiol, 34, 680–7.
Kamps AW, Roorda RJ, Brand PL. (2001). Peak flow diaries inchildhood asthma are unreliable. Thorax, 56, 180–2.
Katsouyanni K, Samet JM, Anderson HR, et al. (2009). Air pollutionand health: a European and North American Approach (APHENA).HEI Research Report 142. 132p., October 29.
Kim CS, Alexis NE, Rappold AG, et al. (2011). Lung function andinflammatory responses in healthy young adults exposed to 0.06 ppmozone for 6.6 hours. Am J Respir Crit Care Med, 183, 1215–21.
Klimisch HJ, Andreae M, Tillmann U. (1997). A systematic approach forevaluating the quality of experimental toxicological and ecotoxico-logical data. Regul Toxicol Pharmacol, 25, 1–5.
848 J. E. Goodman et al. Crit Rev Toxicol, 2013; 43(10): 829–849
Koop G, McKitrick R, Tole L. (2010). Air pollution, economic activityand respiratory illness: evidence from Canadian cities, 1974–1994.Environ Model Softw, 25, 873–85.
Koop G, Tole L. (2004). Measuring the health effects of air pollution: towhat extent can we really say that people are dying from bad air?J Environ Econ Manage, 47, 30–54.
Korrick SA, Neas LM, Dockery DW, et al. (1998). Effects of ozone andother pollutants on the pulmonary function of adult hikers. EnvironHealth Perspect, 106, 93–9.
Lavelle KS, Schnatter AR, Travis KZ, et al. (2012). Framework forintegrating human and animal data in chemical risk assessment. RegulToxicol Pharmacol, 62, 302–12.
Lefohn AS, Hazucha MJ, Shadwick D, Adams WC. (2010). Analternative form and level of the human health ozone standard. InhalToxicol, 22, 999–1011.
Lipfert FW, Perry HM, Miller JP, et al. (2000). The WashingtonUniversity-EPRI Veterans’ Cohort Mortality Study: preliminaryresults. Inhal Toxicol, 12, 41–73.
Lipsett MJ, Ostro BD, Reynolds P, et al. (2011). Long-term exposure toair pollution and cardiorespiratory disease in the California teachersstudy cohort. Am J Respir Crit Care Med, 184, 828–35.
McClellan RO. (2012). Role of science and judgment in setting nationalambient air quality standards: how low is low enough? Air QualAtmos Health, 5, 243–58.
Miller KA, Siscovick DS, Sheppard L, et al. (2007). Long-term exposureto air pollution and incidence of cardiovascular events in women.N Engl J Med, 356, 447–58.
Moolgavkar SH, McClellan RO, Dewanji A, et al. (2013). Time-seriesanalyses of air pollution and mortality in the United States: asubsampling approach. Environ Health Perspect, 121, 73–8.
Mortimer KM, Neas LM, Dockery DW, et al. (2002). The effect of airpollution on inner-city children with asthma. Eur Respir J, 19,699–705.
Naeher LP, Holford TR, Beckett WS, et al. (1999). Healthy women’sPEF variations with ambient summer concentrations of PM10, PM2.5,SO42-, Hþ, and O3. Am J Respir Crit Care Med, 160, 117–25.
National Research Council (NRC). (2011). Review of the EnvironmentalProtection Agency’s Draft IRIS Assessment of Formaldehyde.National Academies Press. 194p., April. Available from: http://www.nap.edu/catalog/13142.html [last accessed 08 April 2011].
Neas LM, Dockery DW, Koutrakis P, Speizer, FE. (1999). Fine particlesand peak flow in children: Acidity versus mass. Epidemiology, 10,550–3.
Nicolich M. (2007). Letter Report to US EPA, Air and Radiation Docket.Re: Comments on EPA’s Proposed ‘‘National Ambient Air QualityStandards for Ozone’’, 72 Fed Reg. 37, 818 (11 July 2007). DocketNo. EPA-HQ-OAR-2005-0172. October 9.
O’Connor GT, Neas L, Vaughn B, et al. (2008). Acute respiratory healtheffects of air pollution on children with asthma in US inner cities.J Allergy Clin Immunol, 121, 1133–9.
Rhomberg LR, Bailey LA, Goodman JE, et al. (2011a). Is exposureto formaldehyde in air causally associated with leukemia? – ahypothesis-based weight-of-evidence analysis. Crit Rev Toxicol, 41,555–621.
Rhomberg LR, Bailey LA, Goodman JE. (2010). Hypothesis-basedweight of evidence: a tool for evaluating and communicatinguncertainties and inconsistencies in the large body of evidence inproposing a carcinogenic mode of action – naphthalene as an example.Crit Rev Toxicol, 40, 671–96.
Rhomberg LR, Chandalia JK, Long CM, Goodman JE. (2011b).Measurement error in environmental epidemiology and the shape ofexposure–response curves. Crit Rev Toxicol, 41, 651–71.
Rhomberg LR, Goodman JE, Bailey LA, et al. (2013). A surveyof frameworks for best practices in weight-of-evidence analyses.Crit Rev Toxicol. DOI: 10.3109/10408444.2013.832727.
Romieu I, Meneses F, Ramirez M, et al. (1998). Antioxidant supple-mentation and respiratory functions among workers exposed to highlevels of ozone. Am J Respir Crit Care Med, 158, 226–32.
Sarnat JA, Brown KW, Schwartz J, et al. (2005). Ambientgas concentrations and personal particulate matter exposures: impli-cations for studying the health effects of particles. Epidemiology, 16,385–95.
Sarnat JA, Schwartz J, Catalano PJ, Suh HH. (2001). Gaseous pollutantsin particulate matter epidemiology: confounders or surrogates?Environ Health Perspect, 109, 1053–61.
Schelegle ES, Morales CA, Walby WF, et al. (2009). 6.6-Hour inhalationof ozone concentrations from 60 to 87 parts per billion in healthyhumans. Am J Respir Crit Care Med, 180, 265–72.
Smith RL, Xu B, Switzer P. (2009). Reassessing the relationship betweenozone and short-term mortality in U.S. urban communities. InhalToxicol, 21, 37–61.
Spencer-Hwang R, Knutsen SF, Soret S, et al. (2011). Ambient airpollutants and risk of fatal coronary heart disease among kidneytransplant recipients. Am J Kidney Dis, 58, 608–16.
Stieb DM, Szyszkowicz M, Rowe BH, Leech JA. (2009). Air pollutionand emergency department visits for cardiac and respiratory condi-tions: A multi-city time-series analysis. Environ Health, 8, 25.
Stylianou M, Nicolich MJ. (2009). Cumulative effects and thresholdlevels in air pollution mortality: Data analysis of nine large US citiesusing the NMMAPS dataset. Environ Pollut, 157, 2216–23.
Swaen G, van Amelsvoort L. (2009). A weight of evidence approach tocausal inference. J Clin Epidemiol, 62, 270–7.
Taubes G. (1995). Epidemiology faces its limits. Science, 269,164–9.
US EPA. (2005). Guidelines for carcinogen risk assessment. RiskAssessment Forum, EPA/630/P-03/001F. March.
US EPA. (2006). Air quality criteria for ozone and related photochemicaloxidants. Volume I. EPA 600/R-05/004aF. 821p., February.
US EPA. (2008a). Integrated science assessment for oxides of nitrogen.EPA/600/R-08/071. 260p., July.
US EPA. (2008b). Integrated science assessment for sulfur oxides –health criteria. EPA/600/R-08/047F. September.
US EPA. (2009). Integrated science assessment for particulate matter(final). EPA/600/R-08/139F. 2228p., December.
US EPA. (2010a). Integrated science assessment for carbon monoxide.EPA/600/R-09/019F. 593p., January.
US EPA. (2010b). Toxicological review of formaldehyde – inhalationassessment (CAS No. 50-00-0) in support of summary information onthe integrated risk information system (IRIS). Volumes I–IV (Draft).EPA/635/R-10/002A. 1043p., June.
US EPA. (2012). Integrated science assessment for lead (third externalreview draft). EPA/600/R-10/075C. 1762p., November.
US EPA. (2013a). Integrated science assessment for ozone and relatedphotochemical oxidants (final). EPA/600/R–10/076F. 1251p.,February.
US EPA. (2013b). Process of reviewing the national ambient air qualitystandards. 2p., June. Available from: http://www.epa.gov/ttn/naaqs/review.html [last accessed 19 Aug 2013].
Wacholder S, Hartge P, Lubin JH, Dosemeci M. (1995). Non-differentialmisclassification and bias towards the null: a clarification [letter to theeditor]. Occup Environ Med, 52, 557.
Xia Y, Tong H. (2006). Cumulative effects of air pollution on publichealth. Stat Med, 25, 3548–59.
Zanobetti A, Schwartz J. (2008). Is there adaptation in the ozonemortality relationship: a multi-city case-crossover analysis. EnvironHealth, 7, 22.
Zanobetti A, Schwartz J. (2011). Ozone and survival in four cohorts withpotentially predisposing diseases. Am J Respir Crit Care Med, 184,836–41.
DOI: 10.3109/10408444.2013.837864 NAAQS Causal Framework Evaluation 849
Copyright of Critical Reviews in Toxicology is the property of Taylor & Francis Ltd and itscontent may not be copied or emailed to multiple sites or posted to a listserv without thecopyright holder's express written permission. However, users may print, download, or emailarticles for individual use.