the use of a multi-model ensemble in local operational ozone forecasting

The use of a multi-model ensemble in local operational ozone forecasting

William F. RyanJeremy Geiger

Department of MeteorologyThe Pennsylvania State University

3rd International Workshop on Air Quality Research, Washington, DC, November, 2011

Brief Background on Operational O3 Forecasting in the United States

• Forecasts are typically issued on the metropolitan scale by local (non-NOAA) forecasters, and are verified against the domain-wide peak 8-hour O3 at monitors operated by state or local AQ agencies.– In PHL, roughly 10-12 monitors.

• Key forecast threshold is an 8-hour average concentration ≥ 76 ppbv (Code Orange).

• In order to activate “Air Quality Action Day” plans, forecasts are issued ~ 1800 UTC on the previous day – an 18-30 hour forecast.

Standard Forecast Techniques

• Outside the margins of the season, climatology is not useful but persistence (one day lag) is a skillful forecast.– Persistence “explains” ~ 36% of variance in peak 8-h O3 in PHL.

• Persistence can be coupled with forecast back trajectories to assess regional transport (expected residual layer concentrations).

• Simple statistical models using meteorological and persistence predictors were useful but have become less so in the changing emissions environment post-2002.

• Conceptual models are useful at the regional scale, and for longer range forecasts, as multi-day, severe high O3 events in the mid-Atlantic typically follow certain “classic” synoptic scale patterns.

• However, daily metropolitan scale peak O3 concentrations vary on the meso-scale. These effects that may be successfully simulated by the latest generation of coupled chemistry-meteorological models.

• Post-processed model guidance is underway in Canada but has not been implemented in the US.– Though bias correction methods are available, see AQMOS later.

Hypothesis

• Several skillful numerical O3 forecast models are now reliably available in a timely manner for use in preparing operational O3 forecasts.– These models vary in terms of their

meteorological model drivers, emissions data bases and chemical mechanisms.

• An “ensemble” of these models will provide better forecasts at the critical forecast threshold (Code Orange) than a single model.

Useful Reading

• Djalalova, I., et al, 2010: Ensemble and bias-correction techniques for air quality model forecasts of surface O3 and PM2.5 during the TEXAQS-II experiment of 2006, Atmos. Environ., 44, 455-467.

• McKeen, S., et al, 2005: Assessment of an ensemble of seven real-time ozone forecasts over eastern North America during the summer of 2004, J. Geophys. Res., 110, doi:10.1029/2005JD005858.

Philadelphia Metropolitan Forecast Area:Test Case for Ensemble Hypothesis

Philadelphia is a large metropolitanarea embedded in the “I-95 Corridor”.It has large local emissions and is alsosubject to transported emissions fromother large cities as well as regional NOx from large power generation sources in the Ohio River Valley to the west

I-95 CorridorOhio River Valley

NOAA Operational Forecast Model Results - PhiladelphiaPeak 8-Hour Domain-Wide O3 (May-September, 2007-2010)

0 20 40 60 80 100 120 140

NAQC Forecast 8-h Max (ppbv)

0

20

40

60

80

100

120

140

Ob

serv

ed

8-h

Ma

x (p

pb

v)

Bias: +3.3 ppbv

Median Absolute Error:7.0 ppbv

Mean Absolute Error:8.0 ppbv

Best Linear Fit: [O3]OBS = 4.7 + 0.875*[O3]FC

Increased spread in the higher end of the distribution.

For 76 ppbv Threshold:Hit Rate: 0.77False Alarm: 0.37Threat Score: 0.58

NOAA Operational Model Sunday False Alarms

Sun Mon Tues Wed Thurs Fri Sat0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

False Alarm Hit CSI(Threat)

Skill

Mea

sure

NOAA Model Seasonal Drift

1-May

8-May

15-May

22-May

29-May

5-Jun

12-Jun

19-Jun

26-Jun

3-Jul

10-Jul

17-Jul

24-Jul

31-Jul

7-Aug

14-Aug

21-Aug

28-Aug

4-Sep

-40

-30

-20

-10

0

10

20

30

40

AverageBias07Bias08Bias09Bias10

Fore

cast

Bia

s (p

pbv)

Forecast Models Used in Ensemble

• NAQC (NOAA) – Queried at Monitor Locations– 1200 UTC Run valid following day (24-36 h forecast)– http://www.emc.ncep.noaa.gov/mmb/aq/

• ZIP/NAQC – NOAA Model Queried at all Domain Land Areas– Data extracted from AQMOS (Sonoma Tech)– http://aqmos.sonomatech.com/login.cfm

• AQMOS – NOAA Model with Seasonal Bias Correction – http://aqmos.sonomatech.com/faq/

• Barons Meteorological Services – MAQSIP RT– 0600 UTC Run– http://www.baronams.com/products/

• SUNY-Albany– 1200 UTC Run, “NYSDEC_3x12z”, CMAQ 4.7.1– http://asrc.albany.edu/research/aqf/aqvis/tomorrowforecast_maps.htm

http://www.emc.ncep.noaa.gov/mmb/aq/

http://aqmos.sonomatech.com/login.cfm

http://aqmos.sonomatech.com/faq/

http://www.baronams.com/products/

http://asrc.albany.edu/research/aqf/aqvis/tomorrowforecast_maps.htm

Comment on Determination of Peak Model 8-Hour O3

Numerical forecast models tend to develop strong sea-land O3 gradients.

At left: See high O3

in embayments andalong NJ coast.

If use entire area to verify model, rather thanforecasts at monitor locations, the number of false alarms of high O3 more than double and bias increases by 7 ppbv (2011 results, NOAA Operational Model).

Ensembles Tested in 2011

• ENS1: NAQC, SUNY, BARONS,ZIP • ENS2: NAQC, SUNY, BARONS, ZIP, AQMOS• ENS3: NAQC, SUNY, BARONS, AQMOS• ENS4: NAQC, ZIP• ENS5: NAQC, ZIP, AQMOS• ENS6: ZIP, AQMOS• ENS7: NAQC, NAQC Experimental• ENS8: NAQC, SUNY, AQMOS• ENS9: NAQC, SUNY, ZIP, AQMOS• ENS10: NAQC, SUNY, Barons• ENS11: SUNY, Barons

NAQC Forecast Results for 2011 (PHL)

• Bias: + 1.7 ppbv• r = 0.75• Best Fit:

– [O3]OBS = 18.3 + 0.68*[O3]NAQC

• For Code Orange Threshold:– Hit Rate: 0.74– False Alarm: 0.42– Threat: 0.48

20 40 60 80 100 120

NAQC Forecast 8-h Ozone (ppbv)

20

40

60

80

100

120

Ob

serv

ed

8-h

Ozo

ne

(p

pb

v)

Hit and False Alarm Rates for Individual ModelsThreshold: 76 ppbv 8-h Average (Code Orange)

NAQC SUNY BARONS ZIP AQMOS PERS0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

Hit False Alarm

Selected Skill Scores for Individual Models

NAQC SUNY BARONS ZIP AQMOS PERS0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

HeidkeCSI(Threat)

Heidke score measures skill relative to random forecast with range [-1,+1].Threat score removes influence of “correct null” forecasts, useful for rare events

Hit and False Alarm Rates for Ensembles

ENS1 ENS2 ENS3 ENS4 ENS5 ENS6 ENS7 ENS8 ENS9 ENS10 ENS110.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

False Alarm Hit

Skill

Mea

sure

ENS2: All models; ENS9: All models except Barons; ENS11: SUNY, Barons only.Dashed lines are stand alone NAQC model results.

Selected Skill Scores for Ensembles

ENS1 ENS2 ENS3 ENS4 ENS5 ENS6 ENS7 ENS8 ENS9 ENS10 ENS110.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Heidke Threat

Skill

Mea

sure


Skill Measures for Selected Ensembles,NAQC Model and Persistence

False Alarm Hit Heidke CSI(Threat) PSS0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

ens2 ens9 ens11 NAQC PERS

Skill

Mea

sure


Summary: Overall Ensemble Results

• Two groups performed best– Ensembles 1-3: NAQC, SUNY and Barons models with

various combinations of NAQC-based forecasts (AQMOS, ZIP)

– Ensembles 8-10: NAQC and SUNY, with various combinations of the other models.

– Ensemble 11: SUNY and Barons only.• Do the ensembles address other shortcomings of the

NAQC model?– Seasonal drift in bias– Weekday/Weekend effects

Seasonal Drift: NAQC Skill Scores for Early and Late Summer for 2011 (left) and 2007-2010 (right)

False Alarm

Hit Heidke Threat0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

< July 7 >= July 7

Skill

Mea

sure

False Alarm

Hit Heidke Threat0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

< July 7 > July 7

Skill

Mea

sure

2011 2007-2010

Seasonal Drift: Ensemble Skill

False Alarm Hit Heidke CSI(Threat)0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

NAQC ENS2 ENS9 ENS11

Skill

Mea

sure


0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0


Skill

Mea

sure

Early Summer (< July 7) Late Summer (≥ July 7)

Forecast Skill by Day of Week


0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0


Skill

Mea

sure

Sunday-Monday


0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0


Skill

Mea

sure

Tuesday-Saturday

Conclusions

• A variety of ensemble combinations can improve on the performance of the NAQC model.

• The seasonal drift problem with the NAQC can be partially corrected using non-NAQC models.

• The Sunday-Monday false alarm problem can also be improved with non-NAQC members.

Acknowledgements

• Funding for air quality forecasting in the Philadelphia metropolitan area is supported by the Delaware Valley Regional Planning Commission (Sean Greene), the Commonwealth of Pennsylvania Department of the Environmental Protection and the State of Delaware.

• Thanks to Dianne Miller (Sonoma Tech), John McHenry (Barons), as well as SUNY-Albany and the New York Department of Environmental Conservation for access to their model forecasts.

Supporting Slides

Skill Score Measures

• Heidke Skill Score– Range: [-1,+1]– Compares the proportion of correct forecasts to a no skill

random forecast that is constrained in that marginal total (hit, miss, false alarm) are equal to observed.

• Threat Score– Also known as Critical Success Index (CSI) or Gilbert Skill

Score– Range: [0,1]– Excludes the “null” forecast, so that is a measure of hits

divided by sum of hits, misses and false alarms.

Example of Use of Back Trajectories Coupled with Observations (August 10, 2011)

24-Hour Forward Trajectories at 1000 m AGLNOAA ARL HYSPLIT Trajectory Modelhttp://ready.arl.noaa.gov/HYSPLIT.phpSuperimposed on AirNow Tech Hourly O3

observations:http://www.airnowtech.org/

http://ready.arl.noaa.gov/HYSPLIT.php

http://www.airnowtech.org/

Weather Summary for Summer 2011

• In Philadelphia, and the mid-Atlantic as a region, 2011 was a warmer than average summer.– 30 days ≥ 90⁰ F compared to average of 23 (1997-2010)

• Warmest temperatures occurred from May 30-August 3 – the heart of the O3 season – with very wet conditions in August and September.

• In PHL, 19 days were Code Orange or Red compared to an average of 24 (2004-2010).– Prior to regional NOx controls, average of 40 Code Orange

or Red days (1997-2003).

Weather Summary for Philadelphia (Summer, 2011):Higher than Average Frequency of Hot Weather

1997 1999 2001 2003 2005 2007 2009 20110

10

20

30

40

50

60

T > 90 JJAOr JJA

Dotted orange line is mean number of Code Orange days in two periods:1997-2003 and 2004-2010, reflecting “NOx SIP Rule” emissions changes in 2002-2003

Daily Peak and 7-Day Running Average for PHL

20-May

24-May

28-May

1-Jun

5-Jun

9-Jun13-Ju

n17-Ju

n21-Ju

n25-Ju

n29-Ju

n3-Ju

l7-Ju

l11-Ju

l15-Ju

l19-Ju

l23-Ju

l27-Ju

l31-Ju

l4-A

ug8-A

ug

12-Aug

16-Aug

20-Aug

24-Aug

0

20

40

60

80

100

120

PHL7DayRA

Warmest weather was concentrated during the climatological peak of the O3 season

1-May5-M

ay9-M

ay

13-May

17-May

21-May

25-May

29-May

2-Jun6-Ju

n

10-Jun

14-Jun

18-Jun

22-Jun

26-Jun

30-Jun

4-Jul8-Ju

l

12-Jul

16-Jul

20-Jul

24-Jul

28-Jul

1-Aug

5-Aug

9-Aug

13-Aug

17-Aug

21-Aug

25-Aug

29-Aug

-10

-5

0

5

10

15

20

0

20

40

60

80

100

120

Delta Tmax

Dep

artu

re fr

om D

aily

Ave

rage

Tem

pera

ture

(d

eg F

)

Max

imum

Sur

face

Tem

pera

ture

(deg

F)

Blue bars are departure from average temperature at PHL, red lineis daily maximum surface temperature (deg F).

Ozone conducive weather comes to an end with very wet conditions in August

1-May

6-May

11-May

16-May

21-May

26-May

31-May

5-Jun

10-Jun

15-Jun

20-Jun

25-Jun

30-Jun

5-Jul

10-Jul

15-Jul

20-Jul

25-Jul

30-Jul

4-Aug

9-Aug

14-Aug

19-Aug

24-Aug

29-Aug

0

0.5

1

1.5

2

2.5

3Rainfall at PHL (May-August, 2011)

Dai

ly R

ainf

all (

inch

es)

August 15 (4.84”) and August 29 (4.55”) are off scale, wettest Auguston record at PHL with 19.21”

Seasonal Drift in Model BiasNAQC for PHL (2007-2011)

1-May

7-May

13-May

19-May

25-May

31-May

6-Jun

12-Jun

18-Jun

24-Jun

30-Jun

6-Jul

12-Jul

18-Jul

24-Jul

30-Jul

5-Aug

11-Aug

17-Aug

23-Aug

29-Aug

4-Sep

-40

-30

-20

-10

0

10

20

30

40

50

Bias07Bias08Bias09Bias10Bias11Average

NA

QC

Fore

cast

Mod

el B

ias

(ppb

v)

Seasonal Drift in Model BiasNAQC for PHL, 2011

21-May

26-May

31-May

5-Jun

10-Jun

15-Jun

20-Jun

25-Jun

30-Jun

5-Jul

10-Jul

15-Jul

20-Jul

25-Jul

30-Jul

4-Aug

9-Aug

14-Aug

19-Aug

24-Aug

-30

-20

-10

0

10

20

30

OPERCbias

Ope

ratio

nal M

odel

For

ecas

t Bia

s (p

pbv)

1-May 21-May 10-Jun 30-Jun 20-Jul 9-Aug 29-Aug

-6

-4

-2

0

2

4

6

cbias07cbias08cbias09cbias10cbias11

Cum

ulati

ve B

ias

(ppb

v)

23-May

26-May

29-May

1-Jun

4-Jun

7-Jun

10-Jun

13-Jun

16-Jun

19-Jun

22-Jun

25-Jun

28-Jun

1-Jul

4-Jul

7-Jul

10-Jul

13-Jul

16-Jul

19-Jul

22-Jul

25-Jul

28-Jul

31-Jul

3-Aug

6-Aug

9-Aug

12-Aug

15-Aug

18-Aug

21-Aug

24-Aug

-10.0

-8.0

-6.0

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

SUNYBaronsAQMOS

24-May

27-May

30-May

2-Jun

5-Jun

8-Jun

11-Jun

14-Jun

17-Jun

20-Jun

23-Jun

26-Jun

29-Jun

2-Jul

5-Jul

8-Jul

11-Jul

14-Jul

17-Jul

20-Jul

23-Jul

26-Jul

29-Jul

1-Aug

4-Aug

7-Aug

10-Aug

13-Aug

16-Aug

19-Aug

22-Aug

25-Aug

-40

-30

-20

-10

0

10

20

30

-7

-6

-5

-4

-3

-2

-1

0

AQMOS

BiasCbias

the use of a multi-model ensemble in local operational ozone forecasting

Documents

hour forecast

hour o3

operational o3 forecasts

skillful forecast

operational o3 forecasting

multimodel ensemble

hour domainwide o3

key forecast threshold