abstract

1
Statistical Methods for Model Evaluation – Moving Beyond the Comparison of Matched Observations and Output for Model Grid Cells Kristen M. Foley 1 , Jenise Swall 1 1 Atmospheric Sciences Modeling Division, Air Resources Laboratory, NOAA, Research Triangle Park, NC, USA On Assignment to the National Exposure Research Laboratory, USEPA Abstract Standard evaluations of air quality models rely heavily on a direct comparison of monitoring data matched with the model output for the grid cell containing the monitor's location (e.g. Eder and Yu, 2006). While such techniques may be adequate for many applications, conclusions are limited by such factors as the sparseness of the available observations (limiting the number of grid cells at which the model can be evaluated), potential measurement error in the modeled observations, and the incommensurability between volume- averages and point-referenced observations. While we focus most closely on the latter problem, we find that it cannot be addressed without some discussion of the others. Simulated datasets are used to demonstrate cases in which incommensurability is more likely to adversely affect a traditional analysis. Analysis of observed and modeled ozone data is used to further compare standard evaluation methods and more complex statistical modeling in an operational setting. Comparing Kriging Techniques for Use in Model Assessment Example with Daily Maximum 8-hour Average Ozone Simulated “Perfect-World” Example with Weak vs Strong Spatial Correlation Discussion Analysis such as block kriging provides one approach to address the difference in variability between a point measurement and a volume-average prediction by using a statistical model to characterize sub-grid variability based on observed values. More sophisticated statistical modeling, e.g. Swall and Davis, 2006, provides additional information, which is not available in matched model to observation type comparisons. Spatial analysis of model errors is used to determine regions where model output is significantly different from observation-based estimates. These areas may be used for diagnostic evaluation to identify the source of consistent model errors. The added benefit of this extra layer of analysis will depend on the goals of a particular model evaluation. References Eder, B., and S. Yu, 2006: A performance evaluation of the 2004 release of Models-3 CMAQ. Atmos. Envir., 40, 4894-4905. Gelfand, A. E., Zhu, L., and B. P. Carlin, 2001: On the change of support problem for spatio-temporal data, Biostatistics, 2, 31-45. Goovaerts, P. 1997: Geostatistics for Natural Resources Evaluation. Oxford University Press, 483 pp. Swall, J. and J. Davis, 2006: A Bayesian Statistical Approach for Evaluation of CMAQ, Atmos. Envir., 40, 4883-4893. Fig. 1. Simulated data with short- range spatial correlation structure. Fig. 2. Grid cell averages based on the simulated data in Fig. 1. Each grid cell is a square with a side length of 12 units. Fig. 4. Simulated data with long-range spatial correlation structure. Fig. 5. Grid cell averages based on the simulated data portrayed in Fig.4. Fig. 7. Observations simulated with measurement error at the locations shown in Fig. 4 (long-range spatial correlation) Fig. 15. Estimates yielded by block kriging the observations in Fig 7. Disclaimer: The research presented here was performed under the Memorandum of Understanding between the U.S. Environmental Protection Agency (EPA) and the U.S. Department of Commerce's National Oceanic and Atmospheric Administration (NOAA) and under agreement number DW13921548. This work constitutes a contribution to the NOAA Air Quality Program. Although it has been reviewed by EPA and NOAA and approved for publication, it does not necessarily reflect their policies or views. 0 50 100 150 0 50 100 150 40 50 60 70 80 0 50 100 150 0 50 100 150 40 50 60 70 80 40 45 50 55 60 65 70 40 45 50 55 60 65 70 G rid cellaverages O bservations r:0.84 NM B :-1.12% NM E :6.61% R M SE:5.02 0 50 100 150 0 50 100 150 50 55 60 65 70 0 50 100 150 0 50 100 150 50 55 60 65 70 0 50 100 150 0 50 100 150 50 55 60 65 70 0 50 100 150 0 50 100 150 50 55 60 65 70 Fig. 3. Observations vs. grid cell averages, based on the data shown in Fig. 1 and Fig. 2. Incommensurability and the impact of spatial correlation: Statistical reasoning (Gelfand et al. 2001) indicates that estimates of variability and error are generally sensitive to the incommensurability issue, with estimates of averages less variable than estimates for individual points. However, the discrepancy between point values and spatially averaged values lessens when there is a high degree of spatial correlation. 50 55 60 65 70 50 55 60 65 70 Grid cellaverages Observations r:0.99 NM B:-0.41% N M E :1.15% R M SE:0.9 Fig. 6. Observations vs. grid cell averages, based on the data shown in Fig. 4 and Fig. 5. Considering other sources of error: Here the observations are simulated with measurement error. In this case, most of the variability around the red 1:1 line comes from the observational errors rather than from the in- commensurability issue. Operational evaluation: Using scatterplots such as Fig. 3 and Fig. 6, only grid cells which contain observations can be evaluated. Block kriging the observations allows for a spatial evaluation of model output across the entire domain, while accounting for the incommensurability issue. Confidence intervals can be used to identify regions that are significantly different from the observation-based estimates. However, as with any statistical approach, we must take into account errors related to statistical model formulation, imperfect assumptions and estimation. 0 50 100 150 0 50 100 150 50 55 60 65 70 0 50 100 150 0 50 100 150 0 2 4 6 8 0 50 100 150 0 50 100 150 0 2 4 6 8 45 50 55 60 65 70 75 45 50 55 60 65 70 75 G rid cellaverages Block kriging results 45 50 55 60 65 70 75 45 50 55 60 65 70 75 G rid cellaverages R esults ofkriging to cellctrs. 50 55 60 65 70 50 55 60 65 70 G rid cellaverages O bservations /block kriging results Observed (pointwise)values,r= 0.9 Block kriging results,r= 0.94 0 10 20 30 40 50 60 Kriging to cellctrs. Block kriging 50 55 60 65 70 50 55 60 65 70 Grid cellaverages Observations r:0.9 NM B :-1.44% NM E :3.65% R M SE:2.64 50 55 60 65 70 O bser- vations C ellavgs. K riging to cell ctrs. Block kriging Block kriging: The block kriging technique (e.g., Goovaerts 1997) allows us to adjust for incommensurability by estimating the spatial field at a lattice of points and then averaging these points within each of the grid cells. The error estimates and confidence intervals (C.I.) from block kriging are better calibrated than the results from kriging to the grid cell centers. More accurate error estimates are important for model evaluation in order to better differentiate between statistical estimation error and model error. Fig. 8. Observations vs. grid cell averages, based on the data shown in Fig. 7 and Fig. 5. Fig. 13. Standard errors yielded by point kriging. Fig. 12. Estimates yielded by kriging the observations in Fig. 7 to the grid cell centers. Fig. 14. Sample of results from kriging to the grid cell centers. Fig. 16. Standard errors yielded by block kriging. Fig. 17. Sample of results from block kriging. CMAQ output: v4.6 at 12km output for max 8hr average Ozone Regions of significant differences: Model (CMAQ) output – Observation-based estimates using 95% C.I. from block kriging analysis Matched differences: Model (CMAQ) output – Observations (AIRS, CASTNET sites) 0 20 40 60 80 -60 -40 -20 0 20 40 60 -6 -4 -2 0 20 40 60 0 20 40 60 80 -60 -40 -20 0 20 40 60 -6 -4 -2 0 20 40 60 0 20 40 60 80 -60 -40 -20 0 20 40 60 -6 -4 -2 0 20 40 60 July 22, 2004 July 23, 2004 July 24, 2004

Upload: wirt

Post on 06-Jan-2016

20 views

Category:

Documents


2 download

DESCRIPTION

Statistical Methods for Model Evaluation – Moving Beyond the Comparison of Matched Observations and Output for Model Grid Cells Kristen M. Foley 1 , Jenise Swall 1 1 Atmospheric Sciences Modeling Division, Air Resources Laboratory, NOAA, Research Triangle Park, NC, USA - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Abstract

Statistical Methods for Model Evaluation – Moving Beyond the Comparison of Matched Observations and Output for Model Grid Cells

Kristen M. Foley1, Jenise Swall11Atmospheric Sciences Modeling Division, Air Resources Laboratory, NOAA, Research Triangle Park, NC, USA

On Assignment to the National Exposure Research Laboratory, USEPA

AbstractStandard evaluations of air quality models rely heavily on a direct comparison of monitoring data matched with the model output for the grid cell containing the monitor's location (e.g. Eder and Yu, 2006). While such techniques may be adequate for many applications, conclusions are limited by such factors as the sparseness of the available observations (limiting the number of grid cells at which the model can be evaluated), potential measurement error in the modeled observations, and the incommensurability between volume-averages and point-referenced observations. While we focus most closely on the latter problem, we find that it cannot be addressed without some discussion of the others. Simulated datasets are used to demonstrate cases in which incommensurability is more likely to adversely affect a traditional analysis. Analysis of observed and modeled ozone data is used to further compare standard evaluation methods and more complex statistical modeling in an operational setting.

Comparing Kriging Techniques for Use in Model Assessment

Example with Daily Maximum 8-hour Average Ozone

Simulated “Perfect-World” Example with Weak vs Strong Spatial Correlation

DiscussionAnalysis such as block kriging provides one approach to address the difference in variability between a point measurement and a volume-average prediction by using a statistical model to characterize sub-grid variability based on observed values. More sophisticated statistical modeling, e.g. Swall and Davis, 2006, provides additional information, which is not available in matched model to observation type comparisons. Spatial analysis of model errors is used to determine regions where model output is significantly different from observation-based estimates. These areas may be used for diagnostic evaluation to identify the source of consistent model errors. The added benefit of this extra layer of analysis will depend on the goals of a particular model evaluation.

ReferencesEder, B., and S. Yu, 2006: A performance evaluation of the 2004 release of Models-3 CMAQ. Atmos. Envir., 40, 4894-4905.Gelfand, A. E., Zhu, L., and B. P. Carlin, 2001: On the change of support problem for spatio-temporal data, Biostatistics, 2, 31-45.Goovaerts, P. 1997: Geostatistics for Natural Resources Evaluation. Oxford University Press, 483 pp.Swall, J. and J. Davis, 2006: A Bayesian Statistical Approach for Evaluation of CMAQ, Atmos. Envir., 40, 4883-4893.

Fig. 1. Simulated data with short-range spatial correlation structure.

Fig. 2. Grid cell averages based on the simulated data in Fig. 1. Each grid cell is a square with a side length of 12 units.

Fig. 4. Simulated data with long-range spatial correlation structure.

Fig. 5. Grid cell averages based on the simulated data portrayed in Fig.4.

Fig. 7. Observations simulated with measurement error at thelocations shown in Fig. 4 (long-range spatial correlation)

Fig. 15. Estimates yielded by block kriging the observations in Fig 7.

Disclaimer: The research presented here was performed under the Memorandum of Understanding between the U.S. Environmental Protection Agency (EPA) and the U.S. Department of Commerce's National Oceanic and Atmospheric Administration (NOAA) and under agreement number DW13921548. This work constitutes a contribution to the NOAA Air Quality Program. Although it has been reviewed by EPA and NOAA and approved for publication, it does not necessarily reflect their policies or views.

0 50 100 150

05

01

00

15

0

40

50

60

70

80

0 50 100 150

05

01

00

15

0

40

50

60

70

80

40 45 50 55 60 65 70

40

45

50

55

60

65

70

Grid cell averages

Ob

serv

atio

ns

r: 0.84NMB: -1.12%NME: 6.61%RMSE: 5.02

0 50 100 150

05

01

00

15

0

50

55

60

65

70

0 50 100 150

05

01

00

15

0

50

55

60

65

70

0 50 100 150

05

01

00

15

0

50

55

60

65

70

0 50 100 150

05

01

00

15

0

y

50

55

60

65

70

Fig. 3. Observations vs. grid cell averages, based on the data shown in Fig. 1 and Fig. 2.

Incommensurability and the impact of spatial correlation: Statistical reasoning (Gelfand et al. 2001) indicates that estimates of variability and error are generally sensitive to the incommensurability issue, with estimates of averages less variable than estimates for individual points. However, the discrepancy between point values and spatially averaged values lessens when there is a high degree of spatial correlation.

50 55 60 65 70

50

55

60

65

70

Grid cell averages

Ob

serv

atio

ns

r: 0.99NMB: -0.41%NME: 1.15%RMSE: 0.9

Fig. 6. Observations vs. grid cell averages, based on the data shown in Fig. 4 and Fig. 5.

Considering other sources of error:Here the observations are simulated with measurement error. In this case, most of the variability around the red 1:1 line comes from the observational errors rather than from the in-commensurability issue.

Operational evaluation: Using scatterplots such as Fig. 3 and Fig. 6, only grid cells which contain observations can be evaluated. Block kriging the observations allows for a spatial evaluation of model output across the entire domain, while accounting for the incommensurability issue. Confidence intervals can be used to identify regions that are significantly different from the observation-based estimates. However, as with any statistical approach, we must take into account errors related to statistical model formulation, imperfect assumptions and estimation.

0 50 100 150

05

01

00

15

0

y

50

55

60

65

70

0 50 100 150

05

01

00

15

0

y

0

2

4

6

8

0 50 100 150

05

01

00

15

0

y

0

2

4

6

8

45 50 55 60 65 70 75

45

50

55

60

65

70

75

Grid cell averages

Blo

ck k

rig

ing

re

sults

45 50 55 60 65 70 75

45

50

55

60

65

70

75

Grid cell averages

Re

sults

of

krig

ing

to

ce

ll ct

rs.

50 55 60 65 70

50

55

60

65

70

Grid cell averages

Ob

serv

atio

ns

/ b

lock

kri

gin

g r

esu

lts

Observed (pointwise) values, r=0.9Block kriging results, r=0.94

01

02

03

04

05

06

0

Kri

gin

g t

oce

ll ct

rs.

Blo

ckkr

igin

g

50 55 60 65 70

50

55

60

65

70

Grid cell averages

Ob

serv

atio

ns

r: 0.9NMB: -1.44%NME: 3.65%RMSE: 2.64

50

55

60

65

70

Ob

ser-

vatio

ns

Ce

ll a

vgs.

Kri

gin

g t

oce

ll ct

rs.

Blo

ckkr

igin

g

Block kriging: The block kriging technique (e.g., Goovaerts 1997) allows us to adjust for incommensurability by estimating the spatial field at a lattice of points and then averaging these points within each of the grid cells. The error estimates and confidence intervals (C.I.) from block kriging are better calibrated than the results from kriging to the grid cell centers. More accurate error estimates are important for model evaluation in order to better differentiate between statistical estimation error and model error.

Fig. 8. Observations vs. grid cell averages, based on the data shown in Fig. 7 and Fig. 5.

Fig. 13. Standard errors yielded by point kriging.

Fig. 12. Estimates yielded by kriging the observations in Fig. 7 to the grid cell centers.

Fig. 14. Sample of results from kriging to the grid cell centers.

Fig. 16. Standard errors yielded by block kriging.

Fig. 17. Sample of results from block kriging.

CMAQ output: v4.6 at 12km output for max 8hr average Ozone

Regions of significant differences: Model (CMAQ) output – Observation-based estimates using 95% C.I. from block kriging analysis

Matched differences:Model (CMAQ) output – Observations (AIRS, CASTNET sites)July 22, 2004

0

20

40

60

80

-60

-40

-20

0

20

40

60

-60

-40

-20

0

20

40

60

July 23, 2004

0

20

40

60

80

-60

-40

-20

0

20

40

60

-60

-40

-20

0

20

40

60

July 24, 2004

0

20

40

60

80

-60

-40

-20

0

20

40

60

-60

-40

-20

0

20

40

60

July

22,

200

4Ju

ly 2

3, 2

004

July

24,

200

4