Assessment of the High-Resolution Rapid Refresh Model’s Ability to Predict Mesoscale Convective Systems Using Object-Based Evaluation
JAMES O. PINTO, JOSEPH A. GRIM, AND MATTHIAS STEINER
Research Applications Laboratory, National Center for Atmospheric Research,* Colorado
(Manuscript received 24 September 2014, in final form 23 April 2015)
ABSTRACT
An object-based verification technique that keys off the radar-retrieved vertically integrated liquid (VIL) is
used to evaluate how well the High-Resolution Rapid Refresh (HRRR) predicted mesoscale convective
systems (MCSs) in 2012 and 2013. It is found that the modeled radar VIL values are roughly 50% lower than
observed. This mean bias is accounted for by reducing the radar VIL threshold used to identify MCSs in the
HRRR. This allows for a more fair evaluation of the model’s skill at predicting MCSs. Using an optimized
VIL threshold for each summer, it is found that the HRRR reproduces the first (i.e., counts) and second
moments (i.e., size distribution) of the observed MCS size distribution averaged over the eastern United
States, as well as their aspect ratio, orientation, and diurnal variations. Despite threshold optimization, the
HRRR tended to predict too many (few) MCSs at lead times less (greater) than 4 h because of lead time–
dependent biases in the modeled radar VIL. The HRRR predicted too many MCSs over the Great Plains and
too few MCSs over the southeastern United States during the day. These biases are related to the model’s
tendency to initiate too many MCSs over the Great Plains and too few MCSs over the southeastern United
States. Additional low biases found over the Mississippi River valley region at night revealed a tendency for
the HRRR to dissipate MCSs too quickly. The skill of the HRRR at predicting specific MCS events increased
between 2012 and 2013, coinciding with changes in both the model physics and in the methods used to
assimilate the three-dimensional radar reflectivity.
1. Introduction
The improved prediction of mesoscale convective
systems (MCSs) can enhance public safety and improve
the decision-making of a number of industries (such as
aviation, construction, utility and road maintenance,
and outdoor recreation) that require timely and accu-
rate forecasts of high-impact weather. Many of these
industries could use more specific forecast information
regarding the potential properties of convective storms.
For example, aviation planners need information re-
lated to the size, orientation, and organization of con-
vective areas to more efficiently route air traffic across
the United States (e.g., Krozel et al. 2007; Steiner et al.
2010). In addition, understanding the performance
characteristics of high-resolution ensemble forecasts of
convection and their intrinsic properties is an important
component of the National Weather Service’s (NWS)
warn-on-forecast vision (Stensrud et al. 2009, 2013).
High-resolution models continue to evolve as
computational resources continue to increase, higher-
resolution observations become available, and the
representation of physical processes and data assimila-
tion techniques continue to improve. As model resolu-
tion continues to increase, more detail with regard to the
forecasted storm properties becomes available. Un-
derstanding how well storm features, such as their size
and organization, are reproduced is key for continued
model development efforts and for the interpretation of
model forecasts.
During the late 1980s, feature identification tech-
niques were developed and applied to observational
datasets. Augustine and Howard (1988) developed a
method for identifying MCSs in satellite data and others
have done the same for radar (e.g., Dixon and Wiener
1993; Jirak et al. 2003). More recently, feature-based
* The National Center for Atmospheric Research is sponsored
by the National Science Foundation.
Corresponding author address: James O. Pinto, Research Ap-
plications Laboratory, National Center for Atmospheric Research,
3450 Mitchell Ln., Boulder, CO 80301.
E-mail: [email protected]
892 WEATHER AND FORECASTING VOLUME 30
DOI: 10.1175/WAF-D-14-00118.1
© 2015 American Meteorological Society
techniques have been used in the evaluation of numer-
ical models. Some of the first automated, feature-based,
model evaluation studies were performed by Ebert and
McBride (2000), with a proliferation of various im-
plementations since, including the evaluation of high-
resolution deterministic forecasts (e.g., Baldwin and
Lakshmivarahan 2003; Davis et al. 2006; Lack et al.
2010; Ebert and Gallus 2009; Caine et al. 2013) and,
more recently, high-resolution model ensembles (Gallus
2010; Johnson and Wang 2013; Johnson et al. 2013).
The time dimension is less frequently included in
feature-based forecast verification methods. Davis et al.
(2006) used feature-based verification software called
the Method for Object-Based Diagnostic Evaluation
(MODE) to assess how well the Weather Research
and Forecasting (WRF) Model, version 1.3 (WRF1.3;
Skamarock and Klemp 2008), predicts the spatiotemporal
evolution of rain systems exceeding 400 km² in
area. They found that, generally, WRF1.3 produced too
many large storms (maximum dimension L > 120 km)
and that the predicted storms lasted longer than ob-
served. They also found evidence that storm timing
(given by the difference in modeled and observed
storm centroid times) was delayed in the model, possibly
because of their slow formation. Using an object-
based technique, Burghardt et al. (2014) found that
a subkilometer-resolution implementation of WRF, ver-
sion 3.3 (WRF3.3), tended to initiate too many storms
over the high plains in an area of complex terrain. Clark
et al. (2014) evaluated the ability of four members of the
Center for Analysis and Prediction of Storms (CAPS)
high-resolution ensemble with varying treatments of
microphysics to predict the timing and evolution of rain
systems using a modified version of MODE that in-
cluded the time dimension. Their results demonstrated
that predictions of the onset and dissipation of rain ob-
jects (including both mesoscale convective systems and
smaller, less-organized areas of convection) were highly
sensitive to the treatment of cloud and precipitation
physics.
In the present study, storm-tracking software de-
veloped for radar analysis and forecasting has been
modified to evaluate model skill at predicting the oc-
currence, characteristics, and timing (especially the on-
set) of MCSs. A key aspect of this work, which is distinct
from most other object-based model evaluations, is that
we adjust the threshold used to identify storm objects in
the modeled vertically integrated liquid (VIL) field to
remove mean bias prior to performing the model eval-
uation. We also extend recent object-based verification
work by assessing regional variations in model perfor-
mance. The datasets used in this study are described in
section 2. The object-based verification technique is
described in section 3, with results of its application on
radar data given in section 4. In section 5, we describe a
method in which the threshold used to identify MCSs in
the model is optimized in order to more fairly evaluate
the skill of a high-resolution numerical weather pre-
diction model whose performance is charted over two
summers. Finally, the results are summarized and put
into context in section 6.
2. Datasets
a. Radar dataset description
Vertically integrated liquid derived from radar data is
often used by NWS meteorologists to detect areas of
intense, possibly hail-containing convection across the
country (e.g., Kitzmiller et al. 1995; Billet et al. 1997).
Vertically integrated liquid is also used by aviation
meteorologists and aviation planners to identify areas of
convection that are hazardous for flight. In this study, we
use VIL data from the Corridor Integrated Weather
System (CIWS; Crowe and Miller 1999; Evans and Ducot
2006), which was developed for tactical decision-making
by the aviation industry. The VIL field developed for
CIWS is generated following the method described by
Smalley and Bennett (2002), in which the reflectivity
from each sweep of each available radar is first quality
controlled and then converted to VIL, using the em-
pirical relationship developed by Greene and Clark
(1972):
\mathrm{VIL} = 3.44 \times 10^{-6} \sum_{i=0}^{N} \left[ 0.5\,(Z_{e,i} + Z_{e,i+1}) \right]^{4/7} \, dh_i , \qquad (1)
where Ze is the equivalent reflectivity factor, dh is the
depth of each radar volume, and N is the number of
radar volumes. Unlike other VIL products [e.g., the
multiradar/multisensor (MRMS) method uses a maxi-
mum reflectivity of 56 dBZ in the VIL calculation], the
layer mean reflectivity is not capped at any value. The
two-dimensional VIL field obtained from each radar is
then combined by advecting VIL to a common time and
then taking the maximum plausible VIL value in areas
where the radar swaths overlap. The VIL map includes
radar data collected from each of the WSR-88D net-
work radars (134 NEXRADs) operated by the NWS, 11
Terminal Doppler Weather Radars (TDWRs) operated
by the Federal Aviation Administration (FAA), and five
Canadian C-band (5 cm) radars north of the Canadian–
U.S. border that are operated by Environment Canada.
The resulting VIL dataset, which spans the entire
United States and the southern Canadian provinces, and
extends roughly 100 km offshore, has a horizontal
AUGUST 2015 PINTO ET AL. 893
resolution of 1 km, and is updated every 5 min. For this
study, only hourly data are used. In addition, only data
collected over the United States are used in our analy-
ses, because the Canadian radars tend to attenuate
rapidly in the heavy precipitation events that are of in-
terest in this study. Gaps in radar coverage over the
western United States limit the utility of the observa-
tional dataset in this region; thus, much of the analysis
herein focuses on the eastern two-thirds of the country.
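As an illustration of Eq. (1), the following minimal sketch integrates a single reflectivity profile into VIL. It is not the CIWS implementation: the function name, the input layout (reflectivity in dBZ at increasing heights in meters), and the conversion from dBZ to the linear equivalent reflectivity factor are assumptions made for this example, and no quality control or mosaicking is attempted.

```python
def vil_from_profile(dbz, heights_m):
    """Integrate a reflectivity profile into VIL (kg m^-2) following Eq. (1).

    dbz       : reflectivity at each level (dBZ), hypothetical input layout
    heights_m : height of each level (m), increasing
    """
    vil = 0.0
    for i in range(len(dbz) - 1):
        # Convert dBZ to the linear equivalent reflectivity factor (mm^6 m^-3).
        ze_lower = 10.0 ** (dbz[i] / 10.0)
        ze_upper = 10.0 ** (dbz[i + 1] / 10.0)
        # Depth of the layer between adjacent levels (m).
        dh = heights_m[i + 1] - heights_m[i]
        # Layer-mean reflectivity raised to the 4/7 power; no cap is applied.
        vil += 3.44e-6 * (0.5 * (ze_lower + ze_upper)) ** (4.0 / 7.0) * dh
    return vil


# A uniform 40-dBZ column that is 5 km deep yields about 3.3 kg m^-2.
print(round(vil_from_profile([40.0, 40.0], [0.0, 5000.0]), 2))
```

Because the layer-mean reflectivity is left uncapped in this sketch, very high reflectivity (e.g., hail cores) contributes fully to the integral, as described for the CIWS product.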
b. Model dataset description
The High-Resolution Rapid Refresh (HRRR) model
has been under development at the National Oceanic
and Atmospheric Administration/Earth System Re-
search Laboratory/Global Systems Division (NOAA/
ESRL/GSD) since 2008 (Benjamin et al. 2009, 2011).
Model outputs used in this study are from the summer
months [June, July, and August (JJA)] of 2012 and 2013.
The HRRR was run as part of the FAA’s summer
evaluation (Alexander et al. 2011), serving as the model
forecast input field for Consolidated Storm Prediction
for Aviation (CoSPA; Pinto et al. 2010) forecasts. The
HRRR was run once per hour at a convection-permitting
grid spacing of 3 km. While this grid spacing is too coarse
to resolve all scales of motion that are important in the
development, organization, and evolution of convective
storms (Bryan et al. 2003), it is believed that the larger-
scale convective features of MCSs are well resolved
when using this grid spacing (e.g., Weisman et al. 1997;
Schwartz et al. 2009).
The data assimilation and physics of the HRRR and
its parent model were modified between 2012 and 2013.
The model configurations utilized during these two
years are listed in Table 1. The 2013 configuration of the
HRRR became operational at NCEP in September
2014. During both years, the HRRR was initialized and
driven at the lateral boundaries using version 2 of the 13-km
Rapid Refresh (RAP; Weygandt et al. 2011) model run
in experimental mode at NOAA/ESRL. Both the RAP
model and the HRRR use Gridpoint Statistical Inter-
polation analysis system (GSI) based three-dimensional
variational data assimilation (3DVAR) to assimilate a
wealth of observational data (Benjamin et al. 2013).
However, two changes expected to have a positive im-
pact on the simulation of convection were made be-
tween 2012 and 2013. First, the PBL scheme was changed
from Mellor–Yamada–Janjić (MYJ) in 2012 to Mellor–
Yamada–Nakanishi–Niino (MYNN) in 2013. This
change was expected to improve the simulation of the
boundary layer, as found by Coniglio et al. (2013) for
simulations with convection-permitting resolution.
Second, three-dimensional radar reflectivity data,1 which
were only assimilated into RAP in 2012, were assimilated
into both the RAP and the 3-km HRRR in 2013. This
change was expected to improve the initialization of storm
information on the 3-km grid, thus providing a more ac-
curate depiction of ongoing convection at the earlier
forecast lead times. We present evidence in section 5 that
this is indeed the case.
Because VIL is used to identify MCSs in the model,
the treatment of microphysics and the subsequent cal-
culation of radar reflectivity are critical. The cloud mi-
crophysics scheme of Thompson et al. (2008) is used to
simulate cloud and precipitation processes. Profiles of
snow, rain, and graupel produced by the Thompson et al.
microphysics scheme are converted to a reflectivity
factor using the radar equation and assumptions per-
taining to the shape of the size distribution of each
precipitation type [see Koch et al. (2005) for details].
Profiles of reflectivity are then used to estimate the
‘‘radar retrieved’’ VIL using the empirical relationship
developed for WSR-88D NEXRAD data following (1).
While the equation used to calculate VIL from radar
reflectivity data in the model is the same as that used on
observed reflectivity, no attempt has been made to ac-
count for the impact of radar coverage gaps in the cal-
culation of the simulated reflectivity field obtained from
TABLE 1. HRRR configurations used in 2012 and 2013. Note that RAP, the parent model of the HRRR, was also updated. Details of these
updates may be found in Benjamin et al. (2013).
                          2012 configuration            2013 configuration
WRF version               3.3.1                         3.4.1
Radar data assimilation   RAP only                      RAP and HRRR
Radiative transfer        RRTM longwave,                RRTM longwave,
                          Dudhia shortwave              Goddard shortwave
PBL                       MYJ                           MYNN
Land surface model        RUC, version 3.3.1-Smirnova,  RUC, version 3.3.1-Smirnova,
                          six levels                    nine levels
1 The three-dimensional radar reflectivity mosaic was produced
as part of the MRMS data suite produced by the National Severe
Storms Laboratory in Norman, Oklahoma.
the model. Despite this simplifying assumption (which
would lead to overestimates of radar-retrieved VIL), the
model actually underforecasts VIL. In fact, the frequency
of VIL values exceeding 3.5 kg m⁻² is underforecasted
by as much as a factor of 5, especially at the
longer lead times (Fig. 1). Thus, the expected over-
prediction of VIL that would result from neglecting the
impact of radar coverage gaps in the model-derived VIL
is masked by biases associated with assumptions in the
microphysics parameterization and subsequent compu-
tations of radar reflectivity that contribute to the un-
derprediction of VIL.
3. Storm identification
The object-based technique used in this study was
adapted from the Thunderstorm Identification Tracking
Analysis and Nowcasting (TITAN) software, developed
by Dixon and Wiener (1993), which was originally de-
signed to identify and track convective storms in the
reflectivity field of an individual radar. The software
provides several characteristics of each storm it iden-
tifies including location, orientation, size, aspect ratio,
storm trends, and motion. The TITAN software has
since been adapted to work on radar mosaic data and,
more recently, model output (Pinto et al. 2007; Caine
et al. 2013). In this study, the adapted TITAN software is
used to assess the timing and location of MCSs in both
radar mosaic data and HRRR forecasts. Convective
objects are identified in the observed VIL field using a
threshold of 3.5 kg m⁻². This VIL threshold, which is
typically used to identify convection, corresponds to
column maximum reflectivity values ranging from 30 to
40 dBZ, following Hallowell et al. (1999). To be considered
an MCS, the convective object (which may include
gaps between convective cells of up to 20 km) must
have a maximum dimension (measured along the ob-
ject’s major axis) exceeding 100 km for at least two
consecutive hours. This MCS definition is similar to that
used in previous studies to investigate the occurrence of
MCSs (e.g., Houze 1993; Geerts 1998; Parker and
Johnson 2000; Coniglio et al. 2010), except that our
definition includes less-organized convective areas or
clusters of convective cells that are typical of the
southeastern United States.
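The MCS criteria above can be sketched in a few lines. This is a simplified stand-in for the adapted TITAN software, not a reproduction of it: it labels 8-connected regions at or above the VIL threshold, approximates the major-axis length by the largest cell-to-cell distance (TITAN fits ellipses), and omits the 20-km gap tolerance. All function names and the list-of-lists grid layout are hypothetical.

```python
from collections import deque
from math import hypot


def convective_objects(vil, threshold=3.5):
    """Return lists of (row, col) cells for each 8-connected region with VIL >= threshold."""
    ny, nx = len(vil), len(vil[0])
    seen = [[False] * nx for _ in range(ny)]
    objects = []
    for j in range(ny):
        for i in range(nx):
            if seen[j][i] or vil[j][i] < threshold:
                continue
            # Breadth-first search over neighbouring cells above the threshold.
            queue, cells = deque([(j, i)]), []
            seen[j][i] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and not seen[yy][xx] and vil[yy][xx] >= threshold):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
            objects.append(cells)
    return objects


def max_dimension_km(cells, grid_km=1.0):
    """Largest centre-to-centre distance between cells plus one cell width (O(n^2) sketch)."""
    longest = max(hypot(a[0] - b[0], a[1] - b[1]) for a in cells for b in cells)
    return (longest + 1.0) * grid_km


def is_mcs(max_dims_by_hour, min_km=100.0, min_hours=2):
    """True if the object's maximum dimension exceeds min_km for min_hours consecutive hours."""
    run = 0
    for dim in max_dims_by_hour:
        run = run + 1 if dim > min_km else 0
        if run >= min_hours:
            return True
    return False
```

Applying `is_mcs` requires that the same object be followed from hour to hour, which in the study is handled by the tracking component of the software and is not shown here.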
In this study, the initiation of an MCS is defined as
occurring if no preexisting MCSs are present within the
last 2 h and within a search radius of 125 km that ex-
tended from all edges of the MCS in question. The
search radius is based on the median speed of MCSs
reported in the literature (e.g., Carbone et al. 2002) and
comparable to the separation distance of 144 km used by
Davis et al. (2006) to distinguish separate convective
storm objects. The use of look-back times of 1 and 2 h to
identify MCS initiation (MCS-I) is similar to the 1–2-h
MCS ‘‘genesis’’ stage described by Coniglio et al. (2010)
during which time storm cells may first merge to form a
near-continuous line of convection exceeding 100 km in
length. Figure 2 illustrates the procedure used to identify
MCS-I events. In this example, five MCS objects were
identified at the analysis time. Objects 4 and 5 had
clearly existed previously, as indicated by the blue and
magenta objects from 1 and 2 h in the past, respectively.
Farther west, objects 1 and 2 had no preexisting MCSs
within 125 km in the past 2 h and are classified as MCS-I
events. The convective storms evident to the east and
southeast of objects 4 and 5 in the VIL field were too
small and too far apart to be classified as MCSs. Finally,
object 3, which should have been classified as an MCS-I
event, was not identified as such because it formed
within 125 km of object 4 that was present 2 h earlier.
This missed MCS-I event represents a flaw in the algo-
rithm; however, this situation was found to occur less
than 10% of the time in a comparison with manually
classified MCS-Is. Furthermore, it is expected that since
the same technique is used on both observations and
model data, the overall impact on the comparisons will
be small.
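The MCS-I test described above reduces to a proximity check against MCS objects from the previous 2 h. The sketch below is one way to express it, assuming each object is represented by a list of boundary points in kilometers on a common projection; that representation and the function names are assumptions for illustration only.

```python
from math import hypot


def min_separation_km(obj_a, obj_b):
    """Smallest distance between any boundary point of obj_a and any of obj_b."""
    return min(hypot(ax - bx, ay - by) for ax, ay in obj_a for bx, by in obj_b)


def is_initiation(current, earlier_objects, search_km=125.0):
    """True if no MCS from the previous 1 and 2 h lies within search_km of the current MCS.

    current         : list of (x_km, y_km) boundary points of the MCS being tested
    earlier_objects : list of such point lists for MCSs present 1 and 2 h earlier
    """
    return all(min_separation_km(current, prev) > search_km for prev in earlier_objects)


# An earlier MCS 100 km from the nearest edge suppresses the MCS-I classification.
new_storm = [(0.0, 0.0), (100.0, 0.0)]
print(is_initiation(new_storm, [[(200.0, 0.0)]]))
```

As with object 3 in Fig. 2, this test cannot distinguish a genuinely new system from one that merely forms near an older one, which is the source of the missed events noted above.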
4. Observed MCS properties
The algorithm described above was used to detect
MCSs and MCS-I events in the observed VIL mosaic
obtained from CIWS for the summers of 2012 and 2013.
Table 2 lists the mean macroscale properties of MCSs
observed during these two summers. There was a 15%
increase in the number of MCS events observed be-
tween 2012 and 2013, but their median size, shape, and
orientation were basically unchanged (<4% change).
FIG. 1. Observed and modeled frequencies of VIL exceeding
either 1.5 or 3.5 kg m⁻² as a function of lead time for the eastern
United States (i.e., east of 105°W) for JJA 2012.
The median orientation, given as the angle at which the
major axis of the storm is oriented clockwise from due
north (values 0°–180°), ranged between 75° and 80° (indicating that the MCSs tended to extend from the west-
southwest to east-northeast). The aspect ratio, given as
the ratio of the MCS major radius to its minor radius,
indicates that the length of MCSs was 3 times the width.
MCS-I events were recorded a factor of 10 times less
often than MCS occurrences in both years. This large
difference between MCS and MCS-I counts is due to the
fact that a single MCS can be counted multiple times
(each hour of its existence) over the course of its life
cycle whereas each MCS-I is treated as a singular event.
Sensitivity studies, in which each algorithmic parameter
was separately varied, revealed that MCS and MCS-I
counts were most sensitive to the VIL threshold and
permissible gap size, with the MCS counts being more
sensitive than the MCS-I counts to changes in the MCS-
defining characteristics. A 1 kg m⁻² increase in VIL
threshold resulted in a 40% (30%) decrease in the
number of MCSs (MCS-Is) while a 10-km increase in
permissible gap size resulted in a 100% (50%) increase
in the number of MCSs (MCS-Is). The sensitivity of the
observed MCS counts to VIL threshold reveals the im-
portance of accounting for mean bias offsets between
the modeled and observed VIL fields prior to evaluating
model skill.
Maps of the observed MCS frequency of occurrence
were generated using 1° grid boxes (Fig. 3). For each
MCS detected, the grid boxes containing any part of the
MCS were determined and the MCS count in each of
these grid boxes was increased by one. The frequency is
then determined by dividing the MCS count by the
number of observation times in the period of interest.
The frequency is then scaled to give the number of ob-
served MCSs occurring within each grid box per week
for a given time period (daytime is defined as 1100–2000
LST, and nighttime during the remaining hours). The
largest seasonal (JJA) differences in MCS frequency
between 2012 and 2013 are found over the central portion
of both the Great Plains (GP) and Mississippi River
valley (MRV) longitudinal bands at night (Fig. 3). The
TABLE 2. MCS characteristics observed for the summers of 2012 and 2013.
Year   No. of MCSs   No. of MCS-Is   Median MCS (MCS-I) size (km)   Median aspect ratio   Mode orientation (°)
2012   8270          873             143 (123)                      2.8                   77
2013   9529          929             145 (123)                      2.9                   78
FIG. 2. Schematic representation of the MCS-I identification method. Red, blue, and ma-
genta indicate the current, 1-h-previous, and 2-h-previous locations of the MCS, respectively.
Colors from more recent times overlap those from previous times. Filled contour underlay
gives the VIL obtained 1 h before present. The ellipses indicate the approximate search area for
each MCS identified at current time with the orange ellipses indicating the MCS-I events and
the cyan ellipses indicating which MCSs are not considered MCS-Is. Each current MCS is also
numbered to aid discussion in the text.
limited MCS activity observed over the central United
States in 2012 corresponded with one of the driest
summers on record in this region (Hoerling et al. 2013;
Basara et al. 2013). In 2013, MCS activity over the
central United States increased dramatically over 2012
and was much closer to normal.
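The construction of the MCS frequency maps can be sketched as follows. Each MCS is assumed to be supplied as a list of (latitude, longitude) points covering its footprint, and the conversion to a weekly rate is assumed to multiply the fractional frequency by the number of hourly observation times per week (168 when all hours are used). Both are assumptions of this example; the text does not spell out the exact scaling.

```python
from collections import Counter
from math import floor


def mcs_frequency_per_week(mcs_footprints, n_obs_times, obs_per_week=168):
    """Weekly MCS occurrence rate in each 1-degree grid box.

    mcs_footprints : one list of (lat, lon) points per detected MCS (hypothetical layout)
    n_obs_times    : number of observation times in the period of interest
    obs_per_week   : assumed number of observation times per week (hourly data)
    """
    counts = Counter()
    for footprint in mcs_footprints:
        # Each MCS increments every 1-degree box it touches exactly once.
        boxes = {(floor(lat), floor(lon)) for lat, lon in footprint}
        for box in boxes:
            counts[box] += 1
    return {box: count / n_obs_times * obs_per_week for box, count in counts.items()}
```

For the daytime and nighttime maps, `obs_per_week` would be reduced to the number of hourly observation times per week that fall within the respective window.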
Regional variations in the observed diurnal cycle of
MCSs are depicted in Fig. 4. Results from the three
longitudinal bands and two additional evaluation boxes
shown in Fig. 3 are given. Results from the Texas (TX)
box represent a subset of the MCS activity occurring
within the GP band, while the Southeast (SE) box
overlaps the southern portions of both the MRV and
Appalachians (APP) longitudinal bands. Both the am-
plitude (i.e., percentage of all MCSs occurring in a re-
gion) and phase (i.e., timing) of the diurnal cycle vary by
region. Consistent with previous climatological studies
(e.g., Carbone and Tuttle 2008), the center of the time
(LST2) of peak MCS activity (defined as the midpoint
between clear positively and negatively sloping portions
of each curve and given in local solar time) occurs about
4 h later in the GP band (1900 LST) than that found for
the APP band (1500 LST). There is also evidence to
support the existence of an earlier ramp-up in MCS
activity in the two southern regions (i.e., SE and TX)
compared to their corresponding longitudinal band(s).
This north–south variation in the timing of the ramp-up
FIG. 3. Observed seasonal mean MCS occurrence rates (No. per week) for (a) all times, (b) daytime (1100–2000
LST), and (c) nighttime (2100–1100 LST) for JJA (left) 2012 and (right) 2013. Labeled bands and boxes indicate
regions used in additional analyses described in the text.
2 LST is used instead of UTC for many analyses, since the contiguous
United States spans ~58° of longitude and, thus, local solar time varies by ~3.9 h across the country.
is related to the later initiation of MCSs to the north, as
seen in Fig. 5b. The diurnal cycles with the largest am-
plitudes occur in the SE and APP regions, while the
smallest-amplitude diurnal cycle is found for the MRV
band. Very similar diurnal cycles were observed for each
region in 2013 (not shown).
While the observed ratio of MCS to MCS-I events is
consistent between the two summers (Table 2), differ-
ences are evident in the observed spatial patterns of
MCS-I frequency (Fig. 5a). These maps were created by
counting all MCS-I events occurring within overlapping
5° × 5° boxes spaced 1° apart. This larger box size
(compared to that used to build the MCS frequency
maps) was necessary because of the relatively infrequent
nature of MCS-I events. To remove the effects of off-
shore convection and the Canadian radar attenuation
issue, the MCS-I frequencies are limited to only those
occurring within nonoceanic grid boxes and within the
continental United States. MCS-I frequency maxima are
observed over the GP and APP bands peaking over the
Florida peninsula. In both years, the MCS-I frequency
maximum in the APP band extends from the Gulf Coast
northward along the spine of the Appalachians into
western New York. This peak frequency of MCS-I over
the Appalachians is consistent with that reported by
Parker and Ahijevych (2007). Peak MCS-I activity in the
GP band varies notably between 2012 and 2013, shifting
from the Dakotas in 2012 to Nebraska and western
Kansas in 2013 and increasing in magnitude.
The median time of MCS-I events shown in Fig. 5b is
found following a two-step method. First, a distribution
of MCS-I counts versus LST is generated and smoothed
using a cyclic 5-h boxcar average. Then, the time cor-
responding with the minimum MCS-I count in the
smoothed distribution is found and used as the starting
point to compute the median LST value using a circular
time of day coordinate system. The median time of
MCS-I events increases as one moves north and west
across the United States (Fig. 5b) varying from around
1200 LST in the southeastern United States to 1800–
2200 LST over much of the central and northern por-
tions of the GP band. Localized areas of even later
MCS-I activity are evident in the GP and MRV bands
especially in 2013. Summer-to-summer variations are
also evident in northern Texas, where MCS-I occurred notably
later in 2013, and in the northern third of the MRV band,
which experienced much earlier MCS-I in 2013. The
observed MCS-I frequency maxima found over the GP
region in Fig. 5 are generally located upstream of the
MCS frequency maxima shown in Fig. 3. The relation-
ship between MCS-I location and subsequent down-
stream MCS frequency maxima is consistent with that
previously discussed by Carbone and Tuttle (2008).
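The two-step median calculation described at the start of this discussion can be sketched as below. The sketch assumes MCS-I times are given as hours (0–24) in LST and binned hourly; the function name and these input conventions are hypothetical.

```python
def circular_median_hour(event_hours):
    """Median time of day on a circular clock, following the two-step method.

    event_hours : MCS-I times in LST hours (0 <= h < 24), hypothetical input layout
    """
    # Step 1: hourly distribution of counts, smoothed with a cyclic 5-h boxcar.
    counts = [0] * 24
    for h in event_hours:
        counts[int(h) % 24] += 1
    smoothed = [sum(counts[(h + k) % 24] for k in range(-2, 3)) / 5.0 for h in range(24)]

    # Step 2: start the clock at the hour of minimum smoothed count, then take
    # an ordinary median of the times measured from that starting hour.
    start = min(range(24), key=lambda h: smoothed[h])
    shifted = sorted((h - start) % 24 for h in event_hours)
    n = len(shifted)
    if n % 2:
        mid = shifted[n // 2]
    else:
        mid = 0.5 * (shifted[n // 2 - 1] + shifted[n // 2])
    return (mid + start) % 24


# Events clustered around midnight: the circular median is 0000 LST,
# whereas a naive median of the raw hours would give 0200 LST.
print(circular_median_hour([22, 23, 0, 1, 2]))
```

Starting the clock at the quietest hour keeps the bulk of the distribution away from the wrap-around point, so nocturnal initiation peaks are not split between late evening and early morning.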
In the SE region, MCSs tend to form through the
clustering and upscale growth of convective storms (e.g.,
Geerts 1998). The MCS-I frequency maxima in the SE
region are coincident in time and space (Figs. 5a,b) with
the peaks in the MCS frequency map shown in Fig. 3. In
the region denoted by the SE box in Fig. 3, MCS-I
generally occurs between 1100 and 1400 LST and peak
MCS activity occurs between 1300 and 1700 LST
(Fig. 4). Similar to that described by Geerts (1998), the
timing of peak MCS-I and MCS frequencies in the SE
FIG. 4. Normalized diurnal cycle of MCSs by region (GP, MRV, and APP bands, as well as
two smaller boxes for SE and TX, as shown in Fig. 3) for JJA 2012. Total numbers of MCS
events for each region are listed in the legend.
region indicates that MCSs tend to form and dissipate
over a short time period with limited propagation.
5. Evaluation of HRRR
The same storm identification algorithm used to detect
MCSs and MCS-Is in the observations (see section 4) is
used to evaluate the HRRR. Evaluations are performed
both in a statistical sense (i.e., in which the statistical
properties of observed and modeled storms are compared
without being matched in time and location) and using
standard pairwise metrics (in which modeled and ob-
served storms are matched based on proximity to one
another, as described in more detail below). Since MCS
identification can be highly sensitive to the VIL threshold
used to identify convection and because the modeled VIL
field can be significantly biased depending on the choice of
microphysical parameterization (e.g., Clark et al. 2014)
and the assumptions used to compute the reflectivity, an
attempt is made to remove the mean bias from the
synthetic-radar-based VIL field (as shown in Fig. 1)
prior to applying the MCS identification software.
a. The VIL threshold optimization
To account for model bias caused by assumptions in
the microphysics and computation of radar-derived VIL
and also potential biases in the observations, the
threshold used to identify MCSs is chosen using an op-
timization procedure. This is done by comparing mod-
eled MCS size distributions obtained using a range of
VIL thresholds to that obtained from observations using
the standard VIL threshold of 3.5 kg m⁻². This procedure
is illustrated in Fig. 6. It is seen that using the
standard threshold of 3.5 kg m⁻² on the model VIL results
in large underforecasts of the number of MCSs in
each size bin. In the example shown in Fig. 6, a VIL
threshold of 1.5 kg m⁻² produces the best match between
the 6-h forecast of MCSs and those that were
observed. This finding is consistent with that shown in
Fig. 1, where the frequency of 6-h modeled VIL exceeding
1.5 kg m⁻² corresponds with the frequency of
observed VIL exceeding 3.5 kg m⁻². However, because
bias in the modeled VIL field varies with forecast
lead time (Fig. 1), using a single threshold to detect
MCSs in the model forecasts will translate into lead
FIG. 5. Maps of (a) observed MCS-I frequency (No. per week per unit area for the actual land area within a 500 km² search box) and
(b) median time of MCS-I event during (left) 2012 and (right) 2013. The frequency is computed by summing over all times of day. The
median time of MCS-I is only computed at points with at least one MCS-I per week per unit land area.
time–dependent biases in the modeled MCS frequen-
cies. These lead time–dependent biases are an intrinsic
property of the model and are not related to assump-
tions used in the computation of radar reflectivity.
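The threshold optimization can be sketched as a search over candidate VIL thresholds for the one whose modeled MCS size distribution best matches the observed distribution. The misfit measure used here (sum of squared differences in bin counts) is an assumption of this example, since the text does not specify one; the function names and the input dictionary are likewise hypothetical.

```python
def size_histogram(sizes_km, bin_edges):
    """Count MCS sizes falling in each [edge_k, edge_k+1) bin."""
    counts = [0] * (len(bin_edges) - 1)
    for size in sizes_km:
        for k in range(len(counts)):
            if bin_edges[k] <= size < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts


def best_threshold(obs_sizes, model_sizes_by_threshold, bin_edges):
    """Pick the model VIL threshold whose size histogram is closest to the observed one.

    obs_sizes                : MCS sizes (km) found in observations at 3.5 kg m^-2
    model_sizes_by_threshold : {threshold: MCS sizes (km) found in the model at that threshold}
    """
    observed = size_histogram(obs_sizes, bin_edges)

    def misfit(threshold):
        modeled = size_histogram(model_sizes_by_threshold[threshold], bin_edges)
        # Assumed metric: sum of squared differences between bin counts.
        return sum((m - o) ** 2 for m, o in zip(modeled, observed))

    return min(model_sizes_by_threshold, key=misfit)
```

Only the threshold is varied in this search; the size and persistence criteria used to define an MCS would be held fixed for both the model and the observations.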
Sensitivity studies revealed that this optimized
threshold can be obtained using a period as short as a
week depending on the amount of convective activity. It
is important to note that only the VIL threshold is varied
during the optimization procedure. All other criteria
used to identify MCSs in the model are identical to those
used to identify MCSs in the observations. Using this
optimization procedure, it was found that VIL thresholds
of 1.5 and 1.75 kg m⁻² worked best for the summers
of 2012 and 2013, respectively. These thresholds are
roughly a factor of 2 less than those used to detect MCSs
in the observations. Such low thresholds might appear to
admit storms that are too weak to be considered MCSs, but
the thresholds were chosen to account for model biases that
are predominantly in the model data postprocessing
used to generate the radar-derived VIL field. Comparing
the retrieved VIL field with the composite
reflectivity field shows that the threshold
optimization method detects convective
storms that would otherwise have been missed
when using the retrieved VIL field alone (Fig. 7).
Application of these optimized thresholds resulted in
median storm sizes, aspect ratios, and modal orienta-
tions that are nearly identical to the observed values
reported in Table 2. The 2013 data provide a represen-
tative comparison of observed MCS counts with those
obtained from the model using a threshold of
1.75 kg m⁻² (Table 3). Biases in the counts of MCSs and
MCS-Is obtained from all available 6-h forecasts are
small (<10%) when evaluated over the United States
east of 105°W (eUS), but can reach nearly a factor of 2
when evaluated regionally. Results given in Table 3 reveal
that there were 45% too many MCSs predicted over
the GP region and 15% too few predicted over the SE
region in 2013 (where the MCS-I count was also dramatically
underforecasted by 39%). Temporal and regional
variations in model performance will be discussed
in more detail below, along with how the model per-
formance changed between 2012 and 2013.
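The regional percentages quoted above follow directly from the counts in Table 3. A short worked check, with the counts transcribed from the table and a hypothetical helper name `percent_bias`:

```python
# Counts transcribed from Table 3 (summer 2013): (observed, 6-h model forecast).
mcs_counts = {
    "eUS": (9529, 10240),
    "GP": (2497, 3622),
    "MRV": (3483, 3013),
    "APP": (3164, 3198),
    "SE": (2990, 2550),
}
mcs_i_counts = {
    "eUS": (929, 850),
    "GP": (288, 322),
    "MRV": (288, 196),
    "APP": (313, 296),
    "SE": (264, 160),
}


def percent_bias(observed, forecast):
    """Relative count bias in percent; positive means overforecasting."""
    return 100.0 * (forecast - observed) / observed


for region, (observed, forecast) in mcs_counts.items():
    print(region, round(percent_bias(observed, forecast)))
```

The GP and SE values reproduce the 45% overforecast and 15% underforecast stated in the text, and the SE MCS-I counts reproduce the 39% underforecast.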
FIG. 6. Sensitivity of the MCS size distribution to choice of VIL
threshold using 6-h forecasts and observations collected during 6–
12 Jul 2012. The black line depicts the observed MCS size distribution
obtained using a threshold of 3.5 kg m⁻².
FIG. 7. HRRR 6-h forecast valid at 0500 UTC 9 Jul 2013 of (a) column max reflectivity (dBZ) and (b) VIL (kg m⁻²) derived from
modeled radar reflectivity. The white contours indicate storms identified as MCSs using a VIL threshold of 1.75 kg m⁻². Only the large
area of convection over southwestern North Dakota would have been classified as an MCS using the standard VIL threshold of 3.5 kg m⁻²
despite the fact that the modeled composite reflectivity within each MCS contour exceeds 35 dBZ.
b. Space–time variations in model performance
Spatiotemporal variations in the model’s skill at predicting
MCSs were evaluated in two ways. First, regional
variations in the seasonal mean statistical properties of
modeled and observed MCSs and MCS-Is are com-
pared. Then, a simple technique that matches modeled
and observed objects while allowing for small position
errors is used to assess the model’s skill at predicting
specific MCS events.
1) STATISTICAL ASSESSMENT
While threshold optimization removes the impact of
mean bias in the VIL field, biases in the prediction of
MCSs that are a function of lead time, valid time period
(i.e., day vs night), and region remain (Fig. 8). The
model generally predicts too many MCSs during the
earlier lead times and too few MCSs at the longer lead
times. While this dependence on lead time is evident in
both years, it was somewhat reduced in 2013. In both
years, the model tended to overpredict the frequency of
MCSs during both daytime and nighttime periods in the
GP band while underpredicting the daytime MCS frequency
in the SE and APP regions. While the model bias
in some regions was consistent from one year to the next
(e.g., nighttime high bias in the GP region), other re-
gions experienced notable changes in performance. For
example, the frequency of daytime MCSs in the APP
and MRV regions was better captured in 2013.
The diurnal cycle of MCS frequency of occurrence
obtained from model forecasts of varying lead times is
compared to the observed MCS frequency in Fig. 9. The
MCS frequency for a given longitudinal band is obtained
by dividing the number of MCSs obtained for a given
band, lead time, and valid time by the corresponding
total number of modeled MCSs obtained for a given
band and lead time summed over all valid times. This
normalization effectively removes the model biases
shown in Fig. 8 so that any deviations between the
forecasted and observed diurnal cycles may be in-
terpreted as amplitude and/or timing errors for a given
forecast lead time. All lead times of the model generally
capture the observed regional variations in the phase
TABLE 3. Modeled and observed MCS and MCS-I counts for the
summer of 2013.

            Observed           6-h model forecast, T = 1.75 kg m⁻²
Region      MCS      MCS-I     MCS       MCS-I
eUS         9529     929       10 240    850
GP          2497     288       3622      322
MRV         3483     288       3013      196
APP         3164     313       3198      296
SE          2990     264       2550      160

FIG. 8. Modeled minus observed MCS frequency vs lead time obtained using optimized thresholds for
(a) daytime (1100–2000 LST) and (b) nighttime (2100–1100 LST) for three longitudinal bands (GP, MRV, and
APP), and two regions (TX and SE) during (left) 2012 and (right) 2013. The actual observed MCS frequencies (No.
per week) for each region and time period are given in the legend.
and amplitude of the MCS diurnal cycle in both years,
with the 2013 forecasts deviating from the observations
less than the 2012 forecasts. Comparing results for the
three longitudinal bands, the largest deviations are evi-
dent for the GP band in 2012 for lead times of 10 h or
greater. The GP diurnal cycle obtained at these longer
lead times appears to be several hours out of phase
compared to the observed diurnal cycle, with both the
ramp-up of MCS activity and the center time of the MCS
frequency maximum occurring several hours later than
observed (Fig. 9a). The opposite effect is found for the
GP band in 2013, when the modeled ramp-up, peak, and
decay phase of the diurnal cycle occur about 1–2 h ahead
of that observed for all forecast lead times. The modeled
timing and amplitude of the MCS diurnal cycle in the
MRV band are well captured in both years, with the 2013
forecasts showing slightly smaller deviations from the observed
diurnal cycle than those found in 2012. Finally,
the amplitude of MCS activity in the APP band was
closer to that observed in 2013, but the timing (ramp-up
and decay of MCS frequency) remains delayed by
about 2 h.
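The normalization used for the diurnal cycles in Fig. 9 can be illustrated with a short sketch. The counts below are hypothetical and only four valid hours are used; the point is that dividing by the total over all valid times removes a uniform count bias, so that only differences in phase and amplitude remain.

```python
def normalized_diurnal_cycle(counts_by_valid_hour):
    """Divide the MCS count at each valid hour by the total over all valid hours."""
    total = sum(counts_by_valid_hour.values())
    if total == 0:
        return {hour: 0.0 for hour in counts_by_valid_hour}
    return {hour: count / total for hour, count in counts_by_valid_hour.items()}


# Hypothetical counts for one band: valid hour -> number of MCSs.
observed_counts = {0: 10, 6: 20, 12: 40, 18: 30}
# A forecast with the same diurnal shape but a uniform 50% high bias.
forecast_counts = {hour: 1.5 * count for hour, count in observed_counts.items()}

observed_cycle = normalized_diurnal_cycle(observed_counts)
forecast_cycle = normalized_diurnal_cycle(forecast_counts)
print(observed_cycle)
print(forecast_cycle)
```

Both normalized cycles are the same here, even though the raw forecast counts are 50% too high at every valid hour.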
A more detailed inspection of how well the model
captures temporal and regional variations in MCS fre-
quency of occurrence is shown in Fig. 10. The frequency
differences shown in Fig. 10 are for 6-h forecasts valid
FIG. 9. Frequency of modeled and observed MCSs for the (a) GP, (b) MRV, and (c) APP longitudinal belts for JJA (left) 2012 and
(right) 2013. Frequencies are found by dividing the MCS count found at a given valid time by the total number of MCSs detected in the
observations or the model forecast (for a given lead time) during the 92-day period.
during three time periods: daytime, nighttime, and all
times. In 2012, the largest positive biases are found over
Louisiana and Florida for forecasts valid during day-
time hours, where the model predicts up to five oc-
currences per week more than observed (i.e., an
overprediction by more than a factor of 2). The model
also overpredicts the frequency of MCSs over vast
areas of the GP band (particularly the northern half)
and northeastern United States during the day and at
night. In contrast, the model tended to underforecast
the frequency of daytime MCSs in the southeastern
United States especially near the coast of the Carolinas
and northern Florida in both years. Other regions
characterized by underforecasting biases include the
western slope of the southern Appalachians and east-
ern Texas during the day and parts of the MRV band at
night. The daytime high biases in the northeastern
United States, Louisiana, and Florida were less pro-
nounced in 2013. In contrast, the magnitude of biases in
the GP region (positive bias) and central MRV region
(negative bias) increased in 2013.
The MCS frequency biases shown in Fig. 10 can be
attributed, in part, to biases in the initiation of the MCSs
in the model. Comparing Figs. 5 and 11, it is seen that
the model roughly approximated the location of the
observed MCS-I hot spots (especially the high plains
portion of the GP band and Florida). However, it
overforecasted the MCS-I frequency maximum over
portions of the GP band (in terms of the extent of the
local maximum) and grossly underforecasted the MCS-I
frequency in the SE region and APP band (in terms of
both extent and magnitude), except over parts of Florida
where the model overforecasts the frequency of MCS-I
(Fig. 11a). These MCS-I biases are generally consistent
FIG. 10. Model 6-h forecasted MCS occurrence rate minus the observed occurrence rate for (a) all times,
(b) daytime (1100–2000 LST), and (c) nighttime (2000–1100 LST) for JJA (left) 2012 and (right) 2013. Black box
denotes central MRV region used in Fig. 18.
with MCS frequency biases shown in Fig. 10. Lesser
MCS-I underforecasting biases are also evident
throughout the MRV band, which does not seem to fully
explain the nighttime MCS forecasting bias that is evi-
dent in 2013 (Fig. 10).
The model seems to capture the larger-scale varia-
tions in the timing of MCS-I fairly well in both years. As
seen in Fig. 4, MCS-I generally occurs later in the day as
one moves north and west, with the latest MCS-Is in the
central and northern portions of the MRV and GP
bands. South-to-north variations in the median timing of
MCS-Is are generally captured by the model fairly well
in 2012; however, in 2013 the modeled MCS-I is much
too early in the central portions of the GP and much too
late over southern Texas (Fig. 11). In the APP band,
there is a tendency for the modeled MCS-I to occur later
than observed, with this late timing bias being larger in
2013. Finally, the model shows an area of later-forming
MCSs similar in size to that observed; however, the
modeled area of late-forming MCSs is displaced several
hundred kilometers to the southeast of the observed
area of late-forming MCSs.
2) MATCHED MODELED AND OBSERVED MCS ANALYSES
Thus far, we have only compared the statistical
properties of modeled and observed MCSs. In this sec-
tion, we discuss the model’s skill at predicting specific
MCS events using a simple object-matching technique.
In this object-matching technique, modeled MCS objects
that are within 100 km of an observed MCS are
given credit for a hit, as shown in Fig. 12 and Table 4.
The position and area covered by each MCS are defined
using a centroid location and a 72-vertex polygon (one
vertex every 5°). Each modeled and observed MCS object
is then expanded by extending each vertex of the
polygon by 50 km. Extending both modeled and observed
MCS objects by 50 km allows for forecast errors
of up to 100 km to be considered a hit. Hits, misses, false
alarms, and correct nulls are determined following the
rules given in Table 4. It is important to note that the
standard skill scores computed using this technique are
impacted by the size of the modeled and observed
storms and how much they overlap. This upscaling of the
FIG. 11. Maps of (a) seasonal mean MCS-I occurrence rate obtained from 6-h forecasts and (b) median time of MCS-I in the 6-h
forecasts for JJA (left) 2012 and (right) 2013. The median time of MCS-I is only computed at points with at least one MCS-I per week per
500 km². The VIL threshold for identifying MCSs is shown at the top right.
modeled and observed MCSs is designed to give credit
for near misses but not penalize for additional false
alarms caused by the MCS expansions.
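The classification rules of Table 4 amount to a lookup on the pair (modeled value, observed value) at each grid square. A minimal sketch, assuming the modeled and observed fields have already been gridded to location values of 2 (inside an MCS), 1 (inside the 50-km buffer), or 0 (neither); the function name and the small example grids are hypothetical:

```python
from collections import Counter

# Keys are (modeled location value, observed location value), following Table 4.
TABLE_4 = {
    (2, 2): "hit", (2, 1): "hit", (2, 0): "false alarm",
    (1, 2): "hit", (1, 1): "correct null", (1, 0): "correct null",
    (0, 2): "miss", (0, 1): "correct null", (0, 0): "correct null",
}


def contingency_counts(modeled, observed):
    """Classify every grid square and count the outcomes."""
    counts = Counter()
    for model_row, obs_row in zip(modeled, observed):
        for model_value, obs_value in zip(model_row, obs_row):
            counts[TABLE_4[(model_value, obs_value)]] += 1
    return counts


# Hypothetical 2 x 3 grids of location values.
modeled_grid = [[2, 1, 0],
                [0, 0, 2]]
observed_grid = [[1, 2, 0],
                 [2, 1, 0]]

counts = contingency_counts(modeled_grid, observed_grid)
print(dict(counts))
```

Note that a modeled buffer value of 1 with no observed MCS nearby is a correct null, which is how the expansion avoids creating additional false alarms.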
The hit, miss, false alarm, and correct null counts are
summed over the period from 1 June to 31 August for a
given forecast lead time using the 3 × 3 contingency
table provided in Table 4, from which the following
standard skill scores are computed (Wilks 2006):
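$$\text{frequency bias (FBIAS)} = n_{\text{fcst}}/n_{\text{obs}}, \tag{2}$$
$$\text{false alarm ratio (FAR)} = \text{false alarm}/(\text{hit} + \text{false alarm}), \text{ and} \tag{3}$$
$$\text{probability of detection (POD)} = \text{hit}/(\text{hit} + \text{miss}). \tag{4}$$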
Using Table 4, the frequency bias is obtained from
the number of forecasts greater than or equal to two
(nfcst) and the number of observations greater than or
equal to two (nobs).
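The three scores can be computed from the cell counts of the 3 × 3 table as sketched below. The cell counts are hypothetical, and the reading of nfcst and nobs as the row and column totals for a location value of 2 is an interpretation of Table 4 rather than something stated explicitly in the text.

```python
# Hypothetical seasonal cell counts, keyed by (modeled value, observed value).
# Correct-null cells are omitted because Eqs. (2)-(4) do not use them.
cells = {(2, 2): 60, (2, 1): 10, (1, 2): 10, (2, 0): 40, (0, 2): 20}

hits = cells[(2, 2)] + cells[(2, 1)] + cells[(1, 2)]
false_alarms = cells[(2, 0)]
misses = cells[(0, 2)]

# Assumed reading of Table 4: totals over the row/column with location value 2.
n_fcst = sum(n for (model_value, _), n in cells.items() if model_value == 2)
n_obs = sum(n for (_, obs_value), n in cells.items() if obs_value == 2)

fbias = n_fcst / n_obs                      # Eq. (2)
far = false_alarms / (hits + false_alarms)  # Eq. (3)
pod = hits / (hits + misses)                # Eq. (4)

print(f"FBIAS={fbias:.2f} FAR={far:.2f} POD={pod:.2f}")
```

Because hits include the two buffer cells, FBIAS computed this way is not in general equal to (hit + false alarm)/(hit + miss).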
In addition, because we are trying to compare the skill
for two seasons with varying event frequencies, we use
the symmetric extreme dependency score, which is equitable
and not impacted by the frequency of occurrence
of the event being verified (Hogan et al. 2009):
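$$\text{symmetric extreme dependency score (SEDS)} = \frac{\ln[\mathrm{FBIAS}\,(\mathrm{FRQ})^{2}]}{\ln[\mathrm{POD}\,(\mathrm{FRQ})]} - 1, \text{ and} \tag{5}$$
$$\text{frequency (FRQ)} = (\text{hit} + \text{miss})/(\text{hit} + \text{miss} + \text{null} + \text{false alarm}). \tag{6}$$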
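Equations (5) and (6) can be evaluated as in the sketch below. The counts are hypothetical, and for simplicity the frequency bias is taken from a 2 × 2 table as (hit + false alarm)/(hit + miss), which is an assumption that differs slightly from the nfcst/nobs definition used with Table 4. Two limiting cases are useful checks: a perfect forecast (FBIAS = POD = 1) gives SEDS = 1, and a forecast whose hits occur at the rate expected by chance gives SEDS = 0.

```python
import math


def event_frequency(hits, misses, false_alarms, nulls):
    """Observed event frequency, Eq. (6)."""
    return (hits + misses) / (hits + misses + nulls + false_alarms)


def seds(fbias, pod, frq):
    """Symmetric extreme dependency score, Eq. (5)."""
    return math.log(fbias * frq ** 2) / math.log(pod * frq) - 1.0


# Hypothetical seasonal counts.
hits, misses, false_alarms, nulls = 80, 20, 40, 860

frq = event_frequency(hits, misses, false_alarms, nulls)
pod = hits / (hits + misses)
fbias = (hits + false_alarms) / (hits + misses)  # 2 x 2 simplification (assumption)
score = seds(fbias, pod, frq)

# Algebraically equivalent form in terms of the forecast frequency q = FBIAS * FRQ:
# SEDS = ln(q / POD) / ln(FRQ * POD).
equivalent = math.log(fbias * frq / pod) / math.log(frq * pod)

print(round(score, 3))
```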
The seasonal mean skill scores are computed for 2012
and 2013 for three regions: the eastern United States
(east of 105°W), and the SE and GP regions shown in
Fig. 3. As with the statistical comparisons discussed in
the previous section, the skill scores are a strong func-
tion of valid time, forecast lead time, and region
(Figs. 13–16). The dashed line shown in these figures can
be used to determine how the skill score of the 0000 UTC
forecast varies with lead time. The mean skill scores for
any other forecast issuance time may be found by following
a line parallel to the dashed line that originates on the
issuance time of interest.
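The geometry of the dashed line follows from the relation valid time = issuance time + lead time (modulo 24 h). A small sketch of that bookkeeping; the maximum lead time of 15 h and the function names are assumptions made for illustration:

```python
def valid_hour_utc(issue_hour, lead_hours):
    """Valid hour (UTC) of a forecast issued at issue_hour with the given lead time."""
    return (issue_hour + lead_hours) % 24


def issue_hour_utc(valid_hour, lead_hours):
    """Issuance hour (UTC) of the forecast valid at valid_hour for the given lead time."""
    return (valid_hour - lead_hours) % 24


MAX_LEAD_HOURS = 15  # assumed upper limit of the lead-time axis

# (lead time, valid hour) coordinates traced by the 0000 UTC forecast ...
dashed_line = [(lead, valid_hour_utc(0, lead)) for lead in range(MAX_LEAD_HOURS + 1)]
# ... and by a parallel line for a forecast issued at 1200 UTC.
parallel_line = [(lead, valid_hour_utc(12, lead)) for lead in range(MAX_LEAD_HOURS + 1)]

print(dashed_line[:4])
print(parallel_line[:4])
```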
The diurnal variation in frequency bias is shown for
each region in Fig. 13. This figure provides more detailed
information on the diurnal variation of bias than that
given in Fig. 8. Figure 13 also includes curves that in-
dicate the diurnal variation in the observed MCS fre-
quency for each region. Comparing the curves in each
panel reveals that the observed diurnal cycle of MCS
frequency for the GP region is almost completely out of
phase with that found for the SE region. Diurnal varia-
tions in the skill scores may be described in terms of the
observed phase of MCS activity (i.e., dissipation vs ini-
tiation and growth). For example, in the SE region, the
frequency bias peaks (more so in 2013) in forecasts is-
sued between 0000 and 0800 UTC, which are valid
during the dissipation phase of MCSs in this region, in-
dicating the model’s tendency to dissipate storms too
slowly in this region. At the same time, the frequency
bias in the SE region was reduced during the storm
FIG. 12. Schematic demonstrating how contingency tables are
created. The red-filled contours depict the truth field (given a value
of 2) and the yellow-filled contours represent a 50-km expansion of
the truth field (given a value of 1). The dark blue contours indicate
the model-predicted MCS locations (=2) and the light blue contour
indicates the model-predicted MCS locations extended by 50 km
(=1). The grid lines represent a 1° grid, with each grid square being
classified as either a hit (indicated by an “H”), miss (indicated by an
“M”), false alarm (indicated by an “F”), or correct null (indicated by
an “N”) based on the classification method shown in Table 4.
TABLE 4. Description of 3 × 3 contingency table used in the matched comparison of modeled and observed MCSs. For the “Modeled
MCS location value” and “Observed MCS location value,” a value of 2 is used to indicate the exact location of an MCS while a value of 1
indicates a 50-km buffer around the MCS. A value of zero indicates that no MCSs are present within 50 km.

                                 Observed MCS location value
Modeled MCS location value       2       1              0              Totals
2                                Hit     Hit            False alarm    nfcst
1                                Hit     Correct null   Correct null   Not used
0                                Miss    Correct null   Correct null   nfcst,null
Totals                           nobs    Not used       nobs,null

initiation and growth phase (1600–2200 UTC) in 2013.
In the GP region, forecasts issued after 1200 and before
0000 UTC had large frequency biases. These high bia-
ses are found primarily during the initiation phase of
MCSs in this region, which is consistent with the
overprediction of MCS-Is and MCSs in this region
(Figs. 10 and 11). These high biases expanded in 2013
despite a 40% increase in observed MCS activity
over 2012.
Like the frequency bias, each of the skill scores (i.e.,
SEDS, POD, and FAR) varies with lead time, valid
time, region, and year (Figs. 14–16). For the eastern
U.S. domain, the best performing forecasts were issued
between 2000 and 0400 UTC (note the skill pickup
along either side of the dashed line that identifies all of
the valid times for the 0000 UTC forecast in Fig. 14).
This valid time period corresponds with the observed
period of enhanced MCS activity in the GP region
(Fig. 13c). This period of peak skill is characterized by
higher POD and lower FAR values than at other times
of day. In contrast, model forecasts issued between
0600 and 1600 UTC (i.e., period corresponding with
reduced observed MCS activity compared to other
times of day) perform worst (Fig. 14). These relation-
ships are consistent with the conjecture that the as-
similation of radar reflectivity improves model skill
when appreciable convective activity is present at
forecast initialization time.
Overall, the skill (given by SEDS) of the MCS
forecasts is greater in 2013, particularly for longer-
lead forecasts verifying between 0600 and 1200 UTC
(i.e., overnight period). These increases in SEDS
are coincident with simultaneous increases in POD
and decreases in the FAR in the 2013 forecasts
(Figs. 14b,c). This skill pickup was not simply due to the
increased frequency of the observed MCS events (seen
in Fig. 13c), as discussed by Jolliffe (2008), since SEDS is
not sensitive to these fluctuations (Hogan et al. 2009).
FIG. 13. Model bias as a function of lead time and valid time of day in the (a) eastern U.S. and the (b) SE and
(c) GP domains during (left) 2012 and (right) 2013. Solid curves indicate relative variations in observed MCS
frequency with time of day for a given region (the max observed frequency is given for each region). Dashed
diagonal lines indicate the lead time/valid time coordinates of the 0000 UTC forecasts.
However, the year-to-year change in observed MCS
frequency indicates that there were likely appreciable
differences in the mean environmental conditions of this
region between 2012 and 2013, which may have contrib-
uted to changes in model performance.
Regional variations in model performance are ex-
plored by studying two regions (SE and GP) with vastly
different storm formation mechanisms and statistical
performance characteristics [see section 5b(1)]. Values
of SEDS vary more strongly with lead time in the SE
region and more strongly with valid time in the GP re-
gion (Figs. 15 and 16, respectively). In the SE region,
SEDS values are consistently highest for the shortest
lead times, while in the GP region, SEDS peaks during
the evening or overnight period (between 0000 and
1200 UTC). Variations in SEDS are related to variations
in POD in the SE region, and both POD and the FAR in
the GP region. While the POD values are comparable
between the two regions, the FAR values are much
greater in the GP region. The cause of the high FAR
values in the GP region after ~1200 UTC, which is
consistent with the overprediction of MCS frequencies
in this region discussed in section 5b(1), is further ex-
plored in section 5c.
Values of SEDS in 2013 tend to be higher than those
found for the 2012 forecasts in both regions. In the
SE region, this increase in skill is limited to forecast
lead times of 3–8 h that are valid between 1400 and
1700 UTC. In the GP region, SEDS increases in 2013 for
forecasts issued between 0000 and 0600 UTC and for
lead times of up to 10 h, while it decreases slightly at lead
times longer than 8 h for forecasts issued between 1600
and 0000 UTC (i.e., issue times for which convective
activity is limited, as seen in Fig. 13c). In both regions,
the POD of the first few lead hours of forecasts issued in
2013 increases dramatically over that found in 2012.
This increase in POD at the short lead times is consis-
tent with what one might expect through improved
FIG. 14. Model performance as a function of lead time and valid time of day as indicated by (a) SEDS, (b) POD,
and (c) FAR for the eastern U.S. domain during (left) 2012 and (right) 2013. Dashed diagonal lines indicate the
location of lead times associated with the 0000 UTC forecasts.
assimilation of radar reflectivity. The increase in SEDS
extends to longer lead times in the GP region where larger,
longer-lived storms are more common than in the SE region;
thus, their assimilation into the model can have a
longer impact on the model’s performance. These findings
are also consistent with Stratman et al. (2013), who dis-
cussed improvements associated with the assimilation of
radar reflectivity into the CAPS 4-km model ensemble.
c. Exploring the cause for model biases
The evaluation presented above indicates that there
are three consistent biases in the model’s prediction of
MCSs. The three main biases are 1) underprediction of
daytime MCSs in the SE, 2) overprediction of MCSs in
the high plains and northern GP throughout the day, and
3) underprediction of nighttime MCSs in the central
MRV. In this section, each of these biases is explored in
more detail.
1) DAYTIME MCS UNDERFORECASTING BIAS IN
THE SOUTHEASTERN UNITED STATES
The importance of evaluating the model at finer scales
is evident when comparing longitudinal band averages
of MCS frequencies (Fig. 8) with those obtained using a
100-km grid (Fig. 10). It is seen that while model-
forecasted MCS frequencies had limited bias in the
APP longitudinal band (Fig. 8), the southern and eastern
areas within the APP band had large negative biases,
especially during the daytime (Fig. 10). As mentioned
earlier, a large fraction of the observed MCS-I events
are missed in the southeastern third of the APP band in
both years. Matched pair analyses revealed that only
18% of the observed MCS-I events in this region were
simulated to within ±2 h in 2013. Of the missed MCS-I
events, 25% of the time the model failed to initiate
convection at any scale while the rest of the time con-
vection was predicted, but storms failed to grow large
FIG. 15. As in Fig. 14, but for the SE domain.
enough to attain MCS status (Fig. 17). Here, it is seen
that, for each observed MCS-I event, the size of the
modeled convective storms tended to remain below the
MCS criterion (i.e., 100 km). Quantification of the biases
in the modeled environmental conditions associated
with these missed MCS-I events is needed to guide fu-
ture model improvement efforts.
2) MCS OVERFORECASTING BIAS OVER THE HIGH
PLAINS AND NORTHERN PLAINS
Model forecasts of MCSs exhibited a high bias in the
GP longitudinal band throughout the day at all but the
longest forecast lead times (Fig. 8). The high bias was
most pronounced over the high plains (i.e., western half
of the GP band) and northern Great Plains (Fig. 10),
with areas experiencing the highest biases correspond-
ing directly with areas where MCS-Is were also over-
predicted. The overprediction of MCS-Is over the high
plains is consistent with that reported by Burghardt et al.
(2014), who found similar biases (the modeled MCS-Is
being too early and occurring too often) in a set of high-
resolution simulations of terrain-aided convective initi-
ation over the high plains.
In terms of the matched skill scores, the model was
able to detect more than 80% of the MCS events occurring
during the period of enhanced MCS activity at
night, but at the expense of FAR values of over 50%
(Fig. 16). These FAR values are much larger than the
30% values found during the period of peak activity in
the SE region (Fig. 15). Further inspection of the false
alarm cases in the GP region revealed that they were
primarily due to the model having too many MCS-I
events, with many of these events producing an MCS
lasting several hours. About 10% of the time, areas of
primarily nonconvective precipitation were incorrectly
identified as MCSs using the optimized threshold values.
FIG. 16. As in Fig. 14, but for the GP domain.
3) NIGHTTIME MCS UNDERFORECASTING BIASES
OVER THE CENTRAL MISSISSIPPI RIVER
VALLEY
In contrast to the GP region, the model tended to
underforecast the occurrence of MCSs in the MRV re-
gion during overnight hours (Fig. 8), especially in 2013.
The largest underforecasts in this longitudinal band
(with a deficit of nearly 40 per week in 2013) occurred
over the northern and central parts of the MRV band
(Fig. 10), where the model was unable to reproduce the
local maximum in MCS activity observed to the south
and west of Lake Michigan in 2013 (Fig. 3). Similar to
the SE region, this underforecasting bias can be linked,
in part, to an underprediction of MCS-I events. This is
evident when comparing Figs. 5 and 11, which indicate
that the model MCS-I frequency in 2013 is about 50% of
the observed frequency over much of the central MRV
band centered near Missouri. In addition to this MCS-I
underprediction, the MCS underforecasting bias can
also be attributed to the model dissipating nighttime
MCSs too quickly, as demonstrated in Fig. 18. In this
figure the numbers of observed and forecasted MCSs are
summed over the entire summer, with the forecasted numbers
given as a function of lead time for every other issue time.
The forecasted MCSs are shown for all forecast lead times of
forecasts issued every even hour of the day; thus, each
curve overlaps the next by 12 h.
The number of MCSs predicted between 2000 and 1200
LST starts near the observed number of MCSs, but then
decreases with forecast lead time. In other words, over
the course of each model run valid during this time pe-
riod, the model tends to dissipate MCSs too quickly.
Thus, the low bias in the MRV band at night is caused
by a combination of underforecasted MCS-Is and a
tendency to dissipate storms too quickly.
6. Summary and context of results
An object-based verification technique has been used
to evaluate HRRR’s skill at predicting the occurrence,
properties, and initiation of MCSs. TITAN (Dixon and Wiener
1993), software originally developed to identify and track
convective storms in radar data, has been adapted to assess
the performance of high-resolution model forecasts of
convection (e.g., Pinto et al. 2007; Caine et al. 2013). While a number of
studies have used object-based techniques to evaluate
convective precipitation forecasts in the past, few have
attempted to quantify regional variations in perfor-
mance, and fewer still have quantified the model’s
ability to predict the initiation of larger storm systems
over a region spanning many different climate zones. In
addition, unlike most studies, this work employs a
method to remove the mean bias in the modeled VIL
field prior to identification of storm objects and the
computation of performance statistics. This is done by
adjusting the VIL threshold used to identify MCSs in the
model forecasts until the modeled MCS size distribution
closely matches that obtained from radar observations
using a VIL threshold of 3.5 kg m⁻². This procedure effectively
removes the impact of mean biases in both the
model and the observational dataset.
It was found that using an optimized VIL threshold
was critical in evaluating the HRRR’s skill at predicting
MCSs. Optimized thresholds of 1.5 and 1.75 kg m⁻² were
used to detect MCSs in HRRR runs produced in 2012 and
FIG. 17. Distributions of modeled (red) and observed (black)
storm sizes occurring within regions bound by 29.5°N, 86.5°W and
33.5°N, 80.5°W (a subregion within the SE region) that were obtained
for a subset of MCS-I events observed in 2013 in which max
dimension of the modeled convective storms failed to reach
100 km. Note that the modeled storms larger than 100 km that were
present in the subregion during an observed MCS-I event did not
initiate within ±2 h of the observed event.
FIG. 18. Total number of modeled MCSs (3–14-h lead times of
forecasts issued every even hour are shown, with the line for the
0000 LST run being blue and more reddish and brown colors being
forecasts issued later in the day) and observed (connected dots)
MCSs obtained during JJA 2013 for the central MRV region de-
noted in Fig. 10.
2013, respectively. Both of these thresholds are at least
50% less than the 3.5 kg m⁻² value used to identify areas
of convection in the observed VIL field. While these
thresholds are relatively low, and not what one would
typically relate to convection, it was shown that these
thresholds effectively capture areas of convection that
might be seen using other fields such as composite
maximum reflectivity. As such, a key aspect of this study
is to make users of these data aware of potential biases in
the modeled VIL field and to allow for a fair assessment
of the model skill when using this field to identify MCSs
rather than penalize the model for issues predominantly
related to postprocessing.
Once the mean bias is removed using a single VIL
threshold, other biases related to model physics and
predictability issues can be explored. It is found that the
remaining biases are a function of lead time, region, and
time of day. Generally, the frequency of occurrence of
predicted MCSs decreases with forecast lead time.
When using a single optimized threshold, the HRRR
predicts too many MCSs (by as much as a factor of 2 more than
observed in the GP region during the day) at the
shorter lead times and too few (nearly 50% less than
observed in the SE region at night) at the longer lead
times (see Fig. 8 for details). This dependence on lead
time was less pronounced in 2013, which is indicative of
an overall improvement in the HRRR’s skill at predicting
MCSs. Regional variations in bias of the forecasted
MCS frequencies obtained with the HRRR
using a single optimized threshold ranged from −33%
in the SE during the day to +75% in the GP region at
night in both years (Fig. 7). It should be noted that if one
had used 3.5 kg m⁻² to detect MCSs, the range of variations
in the frequency bias reported above would be
similar but values would all be negative.
More detailed analyses revealed that the HRRR
consistently overestimated the frequency of occurrence
of MCSs over the Great Plains during both day and night
(Fig. 10). This consistent overforecasting bias was related
to the simulation of too many MCS-I events on the
high plains in the late afternoon. Recent studies have
shown that this bias may be insensitive to model reso-
lution, as Burghardt et al. (2014) found similar biases in
the initiation of convection over the high plains at a
model grid spacing of 400 m. In contrast to the high bias
found over the Great Plains, the model tended to un-
derestimate MCS frequency in the SE during the day
and in the MRV band at night. The low bias in the SE
region was related to the model underforecasting the
number of MCS-I events while the low bias in the MRV
region was related to the model’s tendency to dissipate
storms too quickly. While the model did a better job
simulating MCS-I frequency over the Appalachians in
2013, the timing in this region was worse than in 2012
with MCS-I occurring 2–3 h later than observed. Nonetheless,
broadly speaking, the model did well in
capturing the regional variations in the timing of MCS-Is
with good representation of the later initiation of MCSs
as one moves north and west across the United States.
Smaller-scale interannual variability appears to have
been captured as well. Future changes in the HRRR’s
skill at predicting MCSs and MCS-I events can be
tracked at the NCAR/Research Applications Labora-
tory MCS verification website, which provides weekly
and monthly performance graphics (http://rap.ucar.edu/
projects/mcsprediction/statistics/).
Using optimized thresholds also allowed for a fairer
evaluation of model skill. It was found that both SEDS and
POD increased notably between 2012 and 2013 while FAR
was reduced. This increase in POD was observed
throughout the day for lead times of 3–5 h, with notable
increases extending to longer lead times during the storm
growth period in the SE region and at night in the GP region.
Both periods in which the increase in SEDS extended
to longer lead times coincided with forecast issue times that
were more likely to have existing storms. The large increase
in POD found at short lead times and the increase in SEDS
of longer-lead forecasts that likely had ongoing convective
activity at forecast issue time are consistent with what one
might expect in response to improved assimilation of radar
reflectivity. A more detailed investigation of the relation-
ship between radar reflectivity assimilation and model
performance is beyond the scope of this paper.
The object-based technique used in this study could
easily be extended to evaluate model performance as a
function of storm properties. For example, one could
explore the relationship between skill and storm size or
skill and storm organization (e.g., linear vs circular
MCSs, broken squall lines, etc.) via computing skill
scores as a function of permissible gap size or aspect
ratio. The technique could also be used to assess the
performance of high-resolution model ensemble fore-
casts following the work of Gallus (2010). Taking this a
step further, one could apply the object-based technique
to model ensemble forecasts to generate probabilistic
forecasts of storm properties. Pinto et al. (2013) dem-
onstrated this type of forecast system in which they
generated forecasts of the likelihood of convective
storms exceeding a given size threshold to aid aviation
planners in optimally selecting air traffic flow structures
in the presence of large convective storms.
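A minimal sketch of the ensemble idea described above is given below in Python. It assumes that the object-identification step has already been run on each ensemble member and has returned a list of MCS object sizes (here, lengths in km) within a region of interest; the probability is then estimated as the fraction of members containing at least one object at or above the size threshold, with all members weighted equally. The function name, the equal weighting, and the input layout are illustrative assumptions and do not reproduce the system of Pinto et al. (2013).

```python
def exceedance_probability(member_object_sizes, size_threshold):
    """Estimate the probability that a storm exceeds a size threshold.

    member_object_sizes: one list of object sizes per ensemble member
        (hypothetical layout; an empty list means no objects were found).
    size_threshold: size (same units as the object sizes) to be exceeded.
    """
    if not member_object_sizes:
        raise ValueError("at least one ensemble member is required")

    # Count members that contain at least one object at or above the threshold.
    n_exceeding = sum(
        1
        for sizes in member_object_sizes
        if any(size >= size_threshold for size in sizes)
    )

    # Equal weighting of members is an assumption of this sketch.
    return n_exceeding / len(member_object_sizes)
```

For a four-member ensemble in which two members contain an object of at least 100 km, the function returns 0.5. The same counting approach could be applied to other object attributes, such as aspect ratio or orientation.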
Acknowledgments. The authors greatly appreciated
access to CIWS data provided by Dr. Haig Iskenderian
of the MIT Lincoln Laboratory. We would also like to
thank Drs. Stan Benjamin, Steve Weygandt, and Curtis
Alexander of NOAA/ESRL/GSD for providing the
HRRR data as well as several stimulating discussions
on HRRR performance over the years. The thoughtful
comments by Drs. Tammy Weckwerth, Russ Schumacher,
Matthew Bunkers, and another reviewer (anonymous)
were greatly appreciated and helped sharpen the dis-
cussion of the material.
This research has been conducted in response to re-
quirements of the Federal Aviation Administration
(FAA), supplemented by funding provided by the Na-
tional Science Foundation (NSF) to NCAR. The views
expressed are those of the authors and do not necessarily
represent the official policy or position of either the
FAA or NSF.
REFERENCES
Alexander, C., S. Weygandt, S. G. Benjamin, T. G. Smirnova, J. M.
Brown, P. Hofmann, and E. James, 2011: The High Resolution
Rapid Refresh (HRRR): Recent and future enhancements,
time-lagged ensembling, and 2010 forecast evaluation activi-
ties. Proc. 24th Conf. on Weather and Forecasting/20th Conf.
on Numerical Weather Prediction, Seattle, WA, Amer. Me-
teor. Soc., 12B.2. [Available online at https://ams.confex.com/
ams/91Annual/webprogram/Paper183065.html.]
Augustine, J. A., and K. W. Howard, 1988: Mesoscale convective
complexes over the United States during 1985. Mon. Wea.
Rev., 116, 685–701, doi:10.1175/1520-0493(1988)116<0685:
MCCOTU>2.0.CO;2.
Baldwin, M. E., and S. Lakshmivarahan, 2003: Development of an
events oriented verification system using data mining and
image processing algorithms. Preprints, Third Conf. on Arti-
ficial Intelligence, Long Beach, CA, Amer. Meteor. Soc., 4.6.
[Available online at http://ams.confex.com/ams/pdfpapers/
57821.pdf.]
Basara, J. B., J. N. Maybourn, C. M. Peirano, J. E. Tate, P. J.
Brown, J. D. Hoey, and B. R. Smith, 2013: Drought and as-
sociated impacts in the Great Plains of the United States—A
review. Int. J. Geosci., 4, 72–81, doi:10.4236/ijg.2013.46A2009.
Benjamin, S. G., S. Weygandt, T. G. Smirnova, M. Hu, S. E.
Peckham, J. M. Brown, K. Brundage, and G. S. Manikin, 2009:
Assimilation of radar reflectivity data using a diabatic digital
filter: Applications to the Rapid Update Cycle and Rapid
Refresh and initialization of High Resolution Rapid Refresh
forecasts with RUC/RR grids. Preprints, 13th Conf. on In-
tegrated Observing and Assimilation Systems for Atmosphere,
Oceans, and Land Surface (IOAS-AOLS), Phoenix, AZ,
Amer. Meteor. Soc., 7B.3. [Available online at https://ams.
confex.com/ams/pdfpapers/150469.pdf.]
——, ——, C. Alexander, J. M. Brown, T. G. Smirnova,
P. Hofmann, E. James, and G. Dimego, 2011: NOAA’s
hourly-updated 3km HRRR and RUC/Rapid Refresh—Recent
(2010) and upcoming changes toward improving weather
guidance for air-traffic management. Proc. Second Aviation,
Range, and Aerospace Meteorology Special Symp. on
Weather–Air Traffic Management Integration, Seattle, WA,
Amer. Meteor. Soc., 3.2. [Available online at https://ams.
confex.com/ams/91Annual/webprogram/Paper185659.html.]
——, and Coauthors, 2013: Data assimilation and model updates in
the 2013 Rapid Refresh (RAP) and High-Resolution Rapid
Refresh (HRRR) analysis and forecast systems. NCEP/EMC
Meeting, Washington, DC, NCEP/EMC/Model Evaluation
Group. [Available online at http://ruc.noaa.gov/pdf/NCEP_
HRRR_RAPv2_6jun2013-Benj-noglob.pdf.]
Billet, J., M. DeLisi, B. G. Smith, and C. Gates, 1997: Use of re-
gression techniques to predict hail size and the probability of
large hail. Wea. Forecasting, 12, 154–164, doi:10.1175/
1520-0434(1997)012<0154:UORTTP>2.0.CO;2.
Bryan, G. H., J. C. Wyngaard, and J. M. Fritsch, 2003: Resolution
requirements for the simulation of deep moist convection. Mon.
Wea. Rev., 131, 2394–2416, doi:10.1175/1520-0493(2003)131<2394:
RRFTSO>2.0.CO;2.
Burghardt, B. J., C. Evans, and P. J. Roebber, 2014: Assessing the
predictability of convection initiation in the high plains using
an object-based approach. Wea. Forecasting, 29, 403–418,
doi:10.1175/WAF-D-13-00089.1.
Caine, S., T. P. Lane, P. May, C. Jakob, S. T. Siems, M. J. Manton,
and J. O. Pinto, 2013: Statistical assessment of tropical
convection-permitting model simulations using a cell-tracking
algorithm. Mon. Wea. Rev., 141, 557–581, doi:10.1175/
MWR-D-11-00274.1.
Carbone, R. E., and J. D. Tuttle, 2008: Rainfall occurrence in the U.S.
warm season: The diurnal cycle. J. Climate, 21, 4132–4146,
doi:10.1175/2008JCLI2275.1.
——, ——, D. Ahijevych, and S. B. Trier, 2002: Inferences
of predictability associated with warm season precipita-
tion episodes. J. Atmos. Sci., 59, 2033–2056, doi:10.1175/
1520-0469(2002)059<2033:IOPAWW>2.0.CO;2.
Clark, A. J., R. G. Bullock, T. L. Jensen, M. Xue, and F. Kong,
2014: Application of object-based time-domain diagnostics
for tracking precipitation systems in convection allowing
models. Wea. Forecasting, 29, 517–542, doi:10.1175/
WAF-D-13-00098.1.
Coniglio, M. C., J. Y. Hwang, and D. J. Stensrud, 2010: Environmental
factors in the upscale growth and longevity of MCSs
derived from Rapid Update Cycle analyses. Mon. Wea. Rev.,
138, 3514–3539, doi:10.1175/2010MWR3233.1.
——, J. Correia, P. T. Marsh, and F. Kong, 2013: Verification of
convection-allowing WRF Model forecasts of the planetary
boundary layer using sounding observations. Wea. Fore-
casting, 28, 842–862, doi:10.1175/WAF-D-12-00103.1.
Crowe, B. A., and D. W. Miller, 1999: The benefits of using
NEXRAD vertically integrated liquid water as an aviation
weather product. Preprints, Eighth Conf. on Aviation, Range,
and Aerospace Meteorology, Dallas, TX, Amer. Meteor. Soc.,
168–171.
Davis, C. A., B. G. Brown, and R. G. Bullock, 2006: Object-based
verification of precipitation forecasts. Part II: Application to
convective rain systems. Mon. Wea. Rev., 134, 1785–1795,
doi:10.1175/MWR3146.1.
Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm identifi-
cation, tracking, analysis and nowcasting—A radar-based
methodology. J. Atmos. Oceanic Technol., 10, 785–797,
doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.
Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation
in weather systems: Determination of systematic errors.
J. Hydrol., 239, 179–202, doi:10.1016/S0022-1694(00)00343-7.
——, and W. A. Gallus Jr., 2009: Toward better understanding of
the contiguous rain area (CRA) method for spatial forecast
verification. Wea. Forecasting, 24, 1401–1415, doi:10.1175/
2009WAF2222252.1.
Evans, J. E., and E. R. Ducot, 2006: Corridor Integrated Weather
System. MIT Lincoln Lab. J., 16, 59–80.
Gallus, W. A., Jr., 2010: Application of object-based verification
techniques to ensemble precipitation forecasts. Wea. Fore-
casting, 25, 144–158, doi:10.1175/2009WAF2222274.1.
Geerts, B., 1998: Mesoscale convective systems in the southeast
United States during 1994–95: A survey. Wea. Forecasting,
13, 860–869, doi:10.1175/1520-0434(1998)013<0860:
MCSITS>2.0.CO;2.
Greene, D. R., and R. A. Clark, 1972: Vertically integrated liquid
water—A new analysis tool. Mon. Wea. Rev., 100, 548–552,
doi:10.1175/1520-0493(1972)100<0548:VILWNA>2.3.CO;2.
Hallowell, R. G., and Coauthors, 1999: The Terminal Convective
Weather Forecast Demonstration. Preprints, Eighth Conf. on
Aviation, Range, and Aerospace Meteorology, Dallas, TX,
Amer. Meteor. Soc., 200–204.
Hoerling, M., and Coauthors, 2013: An interpretation of the origins
of the 2012 Central Great Plains drought. NOAA Drought
Task Force Assessment Rep., 50 pp. [Available online at
ftp://ftp.oar.noaa.gov/CPO/pdf/mapp/reports/2012-Drought-
Interpretation-final.web-041113.pdf.]
Hogan, R. J., E. J. O’Connor, and A. J. Illingworth, 2009: Verification
of cloud fraction forecasts. Quart. J. Roy. Meteor. Soc.,
135, 1494–1511, doi:10.1002/qj.481.
Houze, R. A., Jr., 1993: Cloud Dynamics. Academic Press, 573 pp.
Jirak, I. L., W. R. Cotton, and R. L. McAnelly, 2003: Satellite and
radar survey of mesoscale convective system development. Mon.
Wea. Rev., 131, 2428–2449, doi:10.1175/1520-0493(2003)131<2428:
SARSOM>2.0.CO;2.
Johnson, A., and X. Wang, 2013: Object-based evaluation of a
storm-scale ensemble during the 2009 NOAA Hazardous
Weather Testbed Spring Experiment. Mon. Wea. Rev., 141,
1079–1098, doi:10.1175/MWR-D-12-00140.1.
——, ——, F. Kong, and M. Xue, 2013: Object-based evaluation of
the impact of horizontal grid spacing on convection-allowing
forecasts. Mon. Wea. Rev., 141, 3413–3425, doi:10.1175/
MWR-D-13-00027.1.
Jolliffe, I. T., 2008: The impenetrable hedge: A note on propriety,
equitability and consistency. Meteor. Appl., 15, 25–29,
doi:10.1002/met.60.
Kitzmiller, D. H., W. E. McGovern, and R. F. Saffle, 1995: The
WSR-88D severe weather potential algorithm. Wea. Forecasting,
10, 141–159, doi:10.1175/1520-0434(1995)010<0141:
TWSWPA>2.0.CO;2.
Koch, S. E., B. S. Ferrier, J. S. Kain, M. T. Stoelinga, E. J. Szoke,
and S. J. Weiss, 2005: The use of simulated radar reflectivity
fields in the diagnosis of mesoscale phenomena from high-
resolution WRF Model forecasts. Preprints, 11th Conf. on
Mesoscale Processes/32nd Conf. on Radar Meteorology, Al-
buquerque, NM, Amer. Meteor. Soc., J4J.7. [Available online
at https://ams.confex.com/ams/pdfpapers/97032.pdf.]
Krozel, J., J. S. B. Mitchell, V. Polishchuk, and J. Prete, 2007:
Maximum flow rates for capacity estimation in level flight with
convective weather constraints. Air Traffic Control Quart., 15,
209–238.
Lack, S., G. J. Limpert, and N. I. Fox, 2010: An object-oriented
multiscale verification scheme. Wea. Forecasting, 25, 79–92,
doi:10.1175/2009WAF2222245.1.
Parker, M. D., and R. H. Johnson, 2000: Organizational modes of
midlatitude mesoscale convective systems. Mon. Wea. Rev.,
128, 3413–3436, doi:10.1175/1520-0493(2001)129<3413:
OMOMMC>2.0.CO;2.
——, and D. Ahijevych, 2007: Convective episodes in the east-
central United States. Mon. Wea. Rev., 135, 3707–3727,
doi:10.1175/2007MWR2098.1.
Pinto, J. O., C. Phillips, M. Steiner, R. Rasmussen, N. Oien,
M. Dixon, W. Wang, and M. Weisman, 2007: Assessment of
the statistical characteristics of thunderstorms simulated with
the WRF Model using convection permitting resolution. 33rd
Int. Conf. on Radar Meteorology, Cairns, QLD, Australia, 5.5.
[Available online at https://ams.confex.com/ams/pdfpapers/
123712.pdf.]
——, W. Dupree, S. Weygandt, M. Wolfson, S. Benjamin, and
M. Steiner, 2010: Advances in the Consolidated Storm Pre-
diction for Aviation (CoSPA). Preprints, 14th Conf. on Avi-
ation, Range, and Aerospace Meteorology, Atlanta, GA,
Amer. Meteor. Soc., J11.2. [Available online at https://ams.
confex.com/ams/pdfpapers/163811.pdf.]
——, J. A. Grim, D. Ahijevych, and M. Steiner, 2013: An auto-
mated system for detecting large-scale convective storms:
Application to model evaluation. Proc. 16th Conf. on Avia-
tion, Range, and Aerospace Meteorology, Austin, TX, Amer.
Meteor. Soc., 9.4A. [Available online at https://ams.confex.
com/ams/93Annual/webprogram/Paper222079.html.]
Schwartz, C. S., and Coauthors, 2009: Next-day convection-
allowing WRF Model guidance: A second look at 2-km ver-
sus 4-km grid spacing. Mon. Wea. Rev., 137, 3351–3372,
doi:10.1175/2009MWR2924.1.
Skamarock, W. C., and J. B. Klemp, 2008: A time-split non-
hydrostatic atmosphere model for weather research and
forecasting applications. J. Comput. Phys., 227, 3465–3485,
doi:10.1016/j.jcp.2007.01.037.
Smalley, D. J., and B. J. Bennett, 2002: Using ORPG to enhance
NEXRAD products to support FAA critical systems. Pre-
prints, 10th Conf. on Aviation, Range, and Aerospace Meteo-
rology, Portland, OR, Amer. Meteor. Soc., 3.6. [Available
online at https://ams.confex.com/ams/pdfpapers/38861.pdf.]
Steiner, M., R. Bateman, D. Megenhardt, Y. Liu, M. Xu,
M. Pocernich, and J. Krozel, 2010: Translation of ensemble
weather forecasts into probabilistic air traffic capacity impact.
Air Traffic Control Quart., 18, 229–254.
Stensrud, D. J., and Coauthors, 2009: Convective-scale
warn-on-forecast: A vision for 2020. Bull. Amer. Meteor. Soc., 90,
1487–1499, doi:10.1175/2009BAMS2795.1.
——, and Coauthors, 2013: Progress and challenges with
warn-on-forecast. Atmos. Res., 123, 2–16, doi:10.1016/
j.atmosres.2012.04.004.
Stratman, D. R., M. C. Coniglio, S. E. Koch, and M. Xue, 2013: Use of
multiple verification methods to evaluate forecasts of convection
from hot- and cold-start convection allowing models.
Wea. Forecasting, 28, 119–138, doi:10.1175/WAF-D-12-00022.1.
Thompson, G., P. R. Field, R. M. Rasmussen, and W. D. Hall, 2008:
Explicit forecasts of winter precipitation using an improved
bulk microphysics scheme: Part II: Implementation of a new
snow parameterization. Mon. Wea. Rev., 136, 5095–5115,
doi:10.1175/2008MWR2387.1.
Weisman, M. L., W. C. Skamarock, and J. B. Klemp, 1997:
The resolution dependence of explicitly modeled convective
systems. Mon. Wea. Rev., 125, 527–548, doi:10.1175/
1520-0493(1997)125<0527:TRDOEM>2.0.CO;2.
Weygandt, S., and Coauthors, 2011: The Rapid Refresh—
Replacement for the RUC, pre-implementation development
and evaluation. Proc. 24th Conf. on Weather and Forecasting/
20th Conf. on Numerical Weather Prediction, Seattle, WA,
Amer. Meteor. Soc., 12B.1. [Available online at https://ams.
confex.com/ams/91Annual/webprogram/Paper183027.html.]
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences.
2nd ed. Academic Press, 648 pp.