DET: Testing and Evaluation Plan
Barbara Brown (1), Ed Tollerud (2), and Tara Jensen (1)
(1) NCAR/RAL, Boulder, CO and DTC; (2) NOAA/GSD, Boulder, CO and DTC
DET: Testing and Evaluation Plan
Wally Clark
DTC and DET Testing and Evaluation
T&E is one of the most important activities undertaken by the DTC
DTC testing has involved WRF core comparisons, boundary layer schemes, and other aspects of NWP
DTC has created “Reference Configurations” (RCs) that are to be re-tested in conjunction with model changes
DET infrastructure is being developed to allow testing and evaluation and intercomparison of ensemble systems and system components
Major categories of testing
Forecasting system comparisons
  Compare forecasts based on one configuration with forecasts based on a different model configuration
  Examples
    Two types of model initialization
    Two or more methods of statistical post-processing
Individual reference configuration
  Model “setup” is evaluated
  Setup is re-evaluated when model changes are implemented
  Reference configurations may be defined by
    Operational centers
    Users
  RCs may also be community-contributed
Forecasts contributed by a modeling group
  Ex: Forecasts evaluated in HWT and HMT projects
DTC Testing and Evaluation Principles
A formal test plan is developed, defining all of the important aspects of the testing and evaluation
  Developer may have a role in helping to create the test plan
Execution of test is independent of the developer
Focus of test depends on the questions that are of interest
  Module being used
  Variables of interest
Many cases evaluated for statistical significance
  Not just a few case studies
  Multiple seasons, times of day, etc.
Meaningful stratifications
  Location/region
  Season
  Other user-based criteria
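The "many cases" and "meaningful stratifications" principles above can be illustrated with a small sketch. It assumes two configurations have been verified on the same cases, that each case carries a stratum label (for example a season), and that a paired bootstrap on the error difference is an acceptable significance check. The function names, the 2000 resamples, and the 95% interval are illustrative assumptions, not part of the DTC plan.

```python
import random
from statistics import mean


def paired_bootstrap_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for mean(errors_a - errors_b) over matched cases."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both configurations must be verified on the same cases")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    resampled_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


def compare_by_stratum(cases):
    """cases: iterable of (stratum, abs_error_config_a, abs_error_config_b)."""
    grouped = {}
    for stratum, err_a, err_b in cases:
        errs_a, errs_b = grouped.setdefault(stratum, ([], []))
        errs_a.append(err_a)
        errs_b.append(err_b)
    results = {}
    for stratum, (errs_a, errs_b) in grouped.items():
        diff, lower, upper = paired_bootstrap_ci(errs_a, errs_b)
        results[stratum] = {
            "n_cases": len(errs_a),
            "mean_diff": diff,
            "ci": (lower, upper),
            # Treated as significant only if the interval excludes zero.
            "significant": lower > 0 or upper < 0,
        }
    return results
```

Calling `compare_by_stratum` on a list of `(season, error_a, error_b)` tuples returns one summary per season. A stratum with only a handful of cases will tend to give a wide interval that includes zero, which is the practical reason for evaluating many cases rather than a few case studies.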
Components of a test plan (example)
Goals
Experiment design
Codes
  Specification of the codes that will be run as part of the test
Model output
  What kinds of output will be produced?
Forecast periods
Post-processing
Verification
  Statistical methods and measures
Graphics generation and display
Data archival and dissemination of results
Computer resources
Deliverables
Example from QNSE evaluation (surface T and wind)
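The test plan components listed above could be captured in a simple record so that an incomplete plan is easy to spot before a test is run. This is only a sketch: the class name `EvaluationPlan`, its field names, and the `missing_components` helper are hypothetical and are not taken from any DTC software. The field list simply mirrors the example components.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EvaluationPlan:
    """Hypothetical container mirroring the example test plan components."""
    goals: list = field(default_factory=list)
    experiment_design: list = field(default_factory=list)
    codes: list = field(default_factory=list)
    model_output: list = field(default_factory=list)
    forecast_periods: list = field(default_factory=list)
    post_processing: list = field(default_factory=list)
    verification: list = field(default_factory=list)
    graphics: list = field(default_factory=list)
    data_archival: list = field(default_factory=list)
    computer_resources: list = field(default_factory=list)
    deliverables: list = field(default_factory=list)

    def missing_components(self):
        """Return the names of components that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative entries only; a real plan would be written with the developer.
plan = EvaluationPlan(
    goals=["Compare two types of model initialization"],
    verification=["Statistical methods and measures to be decided"],
)
print(plan.missing_components())
```

Keeping every component as an explicit field means an empty section shows up as a gap rather than being silently skipped.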
Questions to address when developing a test plan
Which aspect(s) (or modules) of the ensemble system will be evaluated?
What performance aspects are we trying to compare? Or evaluate?
Who are the “users”?
What are the variables of interest?
Answers to these questions will lead to determination of the other aspects of the plan
Considerations for ensemble T&E
Number of cases will likely need to be increased (over non-ensemble evaluations)
  Many probabilistic and ensemble verification scores (e.g., reliability) require relatively large subsamples
  Subsamples must be large enough to assess statistical significance
  But: sampling must be focused enough for representativeness
Verification approaches and metrics are somewhat unique
Computer resources may be a limitation
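The subsample-size point above is easiest to see in a reliability calculation, where the cases are split across probability bins and each bin needs enough cases of its own. The sketch below assumes that forecast probabilities come from the fraction of ensemble members exceeding a threshold and that ten equal-width bins are used. The `min_count` of 30 is an arbitrary illustrative cutoff, not a recommended value, and all function names are hypothetical.

```python
def ensemble_probability(members, threshold):
    """Forecast probability as the fraction of members exceeding a threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


def reliability_table(probabilities, outcomes, n_bins=10, min_count=30):
    """Compare mean forecast probability with observed frequency in each bin.

    outcomes are 1 if the event occurred and 0 otherwise.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("each forecast probability needs a matching outcome")
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probabilities, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, o))
    table = []
    for index, members in enumerate(bins):
        count = len(members)
        if count == 0:
            continue  # empty bins carry no information
        table.append({
            "bin": (index / n_bins, (index + 1) / n_bins),
            "count": count,
            "mean_forecast": sum(p for p, _ in members) / count,
            "observed_freq": sum(o for _, o in members) / count,
            # Flag bins too sparsely populated to support a conclusion.
            "enough_cases": count >= min_count,
        })
    return table
```

With ten bins, even a few hundred cases can leave the high-probability bins nearly empty for rare events, which is one reason ensemble evaluations tend to need more cases than deterministic ones.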
Other considerations
Real-time vs. post-analysis
  DTC intensive tests generally done in post-analysis
  Real-time demonstrations also have many benefits (e.g., HMT, HWT)
Subjective evaluations – should these be considered for DET T&E?
How much rigorous end-to-end testing is required vs. evaluation of individual components?
Example for HMT evaluation – winter 2010