The XM Tool Project: The Lessons Learned in Non-Linear Statistical Modeling for Air Quality Forecasting in Canada

Andrew Teakles 1, Sean Perry 1, Aiming Wu 1, Qian Li 1, Stavros Antonopoulos 1, Ken Lau 1, Harry Yau 1, Rita So 1, Johannes Jenkner 2

IWAQFR – Washington, DC, Nov 28th – Dec 1st, 2011

1 – Environment Canada, Canada
2 – University of British Columbia, Canada



Overview

• XM Tool project and its goals
• Project History
• Optimizing MLR
• Use of Antecedent Conditions
• Non-Linear Methods
• Lessons Learned
• Future Work


XM Tool Project

Mandate:
• Develop and evaluate new non-linear tools for post-processing air quality forecasts of O3, NO2, and PM2.5

Purpose:
• Support the AQHI forecast program

Key Objectives:
– Improve guidance for air quality episodes.
– Improve timing of air quality episodes.
– Improve overall model forecast skill.


Project History

• Project launch – Fall 2009
• Data gathering and infrastructure – 2010
• Round 1 and 2 testing – 2010 to 2011
• XM Tool prototype (pending)

Science Phases:
1. Round 1: trials to narrow prospective non-linear techniques
2. Round 2: explore the use of antecedent conditions
3. Round 3: introduce adaptive learning algorithms and test learning rates relative to UMOS-AQ


Techniques Tested

Method Tested | Status | Comments
Modified MLR with BIC predictors | Prototype |
Boosted Regression | Minor refinements | Optimization routine needed
Bayesian Neural Network | Needs refinement | Revise predictors; Matlab code
Support Vector Regression | Needs refinement |
Neural Network with cross-validation | Rejected | Data loss
MDA | Rejected |
CART | Rejected |


List of Predictors

Predictors:
• 84 AQ model predictors
• 3 persistence predictors: OBS at Hr 00
• 27 antecedent predictors: Max, Min, and OBS at lags of 24/48/72 hrs

AQ model predictor types:
• Meteorological variables
• Sine Julian Day
• Day of the week
• NO2, O3, PM2.5
• Spatial Average, Min, Max

6 AQ sites tested:
• Halifax
• Montreal
• Toronto
• Winnipeg
• Edmonton
• Vancouver
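
The persistence and antecedent predictors listed on this slide could be derived from an hourly observation series roughly as follows. This is a minimal Python sketch, not the project's code. The function name, the feature names, and the 24-hour window used for Max and Min are assumptions; the slides name only the lags (24/48/72 hrs) and the quantities (Max, Min, OBS).

```python
from typing import Optional, Sequence

LAGS_HOURS = (24, 48, 72)  # lags named on the slide


def antecedent_predictors(obs: Sequence[Optional[float]], t0: int,
                          window: int = 24) -> Optional[dict]:
    """Build persistence and antecedent predictors for forecast start index t0.

    obs holds hourly observations (None = missing). The window length used
    for Max/Min is an assumption; the slides do not define it.
    Returns None when any required observation is missing.
    """
    if t0 < 0 or t0 >= len(obs) or obs[t0] is None:
        return None
    features = {"obs_hr00": obs[t0]}  # persistence predictor
    for lag in LAGS_HOURS:
        end = t0 - lag            # index of the lagged observation
        start = end - window + 1  # first hour of the window ending at the lag
        if start < 0:
            return None
        chunk = obs[start:end + 1]
        if any(v is None for v in chunk):
            return None
        features[f"obs_lag{lag}"] = chunk[-1]
        features[f"max_lag{lag}"] = max(chunk)
        features[f"min_lag{lag}"] = min(chunk)
    return features
```

Returning None whenever an input hour is missing mirrors the data-loss concern raised later in the talk: each additional lagged predictor is one more way for a training case to be dropped.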


MLR approaches

 | UMOS-AQ | XM version
Technique | MLR | MLR
Predictors | AQ model + Persistence | AQ model + Persistence and/or Antecedents
Predictor Selection | Forward Stepwise Regression | Leaps Minimum BIC (Ken Lau)
Seasonality | 2 season | 1 season
QA/QC | Min, Max, rate | Min, Max, rate
Minimum cases | 250 | 250
Updatable | Yes | Not yet


Flow Diagram for Leaps BIC Selection

1. All Predictors
2. Leaps Forward Selection: 35 predictors
3. Data subset with 35 predictors
4. Leaps Exhaustive Search: Model Size 3, Model Size 4, Model Size 11
5. Minimum BIC
6. Optimized Predictors

• Leaps package in the R statistical programming language
• Uses a combination of Forward and Exhaustive Search
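
The two-stage selection in the flow diagram can be illustrated in plain Python. This is a sketch of the idea only, not the R leaps package and not the project's code. All function names are hypothetical, the BIC is taken as n·ln(RSS/n) + k·ln(n) for a Gaussian linear model (an assumption about the exact form used), and the least-squares solver assumes the predictors are not collinear.

```python
import math
from itertools import combinations


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on cols plus an intercept."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations, solved by Gaussian elimination with pivoting.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] +
         [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
               for i in range(n))


def bic(cols, y):
    """Gaussian BIC up to an additive constant (assumed form)."""
    n = len(y)
    rss = max(ols_rss(cols, y), 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + (len(cols) + 1) * math.log(n)


def forward_select(predictors, y, keep):
    """Greedy forward selection: keep adding the predictor that lowers RSS most."""
    chosen = []
    while len(chosen) < min(keep, len(predictors)):
        rest = [name for name in predictors if name not in chosen]
        best = min(rest, key=lambda nm: ols_rss(
            [predictors[c] for c in chosen + [nm]], y))
        chosen.append(best)
    return chosen


def min_bic_subset(predictors, y, keep, sizes):
    """Forward-select `keep` candidates, then search every subset of the given sizes."""
    pool = forward_select(predictors, y, keep)
    best = None
    for size in sizes:
        for subset in combinations(pool, size):
            score = bic([predictors[c] for c in subset], y)
            if best is None or score < best[0]:
                best = (score, subset)
    return best
```

Here `predictors` is a dict mapping a name to a list of values. The forward stage exists to make the exhaustive stage affordable: the number of subsets grows combinatorially with the pool size, which is presumably why the slide reduces the full predictor list to 35 before the exhaustive search.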


MLR for O3: with Persistence

• Higher bias
• No updating

Test data from May 30, 2011 to Oct 28, 2011


MLR for O3: with Persistence

• Reduced bias
• Lower RMSE


MLR for O3: Persistence + Antecedents

The value of antecedents


Bayesian Neural Network (BNN)

Categories | Details | Comments
Predictor Selection | Stepwise Regression ensemble + frequency analysis | Select any predictor that occurs more than 20% of the time
Optimization | Weights by Bayesian Theory | Full use of training data
Scaling | Yes |
Seasonality | 1 Season |
Code | Matlab | R code exists to issue forecasts
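
The frequency-analysis step for BNN predictor selection might look like the following Python sketch. It assumes that each ensemble member is the list of predictors chosen by one stepwise-regression run; how the ensemble is formed is not stated on the slide. The function name and the example predictor names in the usage note are hypothetical.

```python
from collections import Counter


def frequent_predictors(selections, min_fraction=0.20):
    """Return predictors chosen in more than min_fraction of ensemble members.

    selections: one list of predictor names per stepwise-regression run
    (assumed ensemble layout). A predictor counts once per member.
    """
    if not selections:
        return []
    counts = Counter(name for chosen in selections for name in set(chosen))
    total = len(selections)
    return sorted(name for name, k in counts.items()
                  if k / total > min_fraction)
```

For example, with ten runs in which "temp" is chosen three times and "wind" twice, "temp" passes the 20% threshold and "wind" does not, because the comparison is strictly greater than.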


BNN vs MLR: Persistence + Antecedents

• Improved R2
• Lower RMSE


Impact of Data Loss on Predictor Selection

Panels: MLR w/ Pers. + Ante.; MLR w/ Pers.; BNN w/ Pers. + Ante.; Availability of antecedent preds.

Loss of training data due to a model's choice of antecedent predictors

*Last column indicates data availability for a given model (black space = loss of data)


Generalized Boosted Regression Model (GBM)

[Figure: fitted values vs. obs at iterations m=1, m=2, and m=M, built from the training data as Basis 1, then Basis 1 + Basis 2, up to Basis 1 + ... + Basis M]

• Iterative regression method
• Minimizes a certain loss function
• Applicable to many techniques [MLR, CART, SVR, …]
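
The iterative scheme on this slide (each new basis is fitted to what the previous bases left unexplained) can be sketched in Python as below. This is an illustration under stated assumptions, not the gbm package and not the project's configuration: the loss is squared error, the weak learner is a single-split regression stump, and the function names and default values are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-split regression stump (one 'basis') by squared error."""
    n, best = len(X), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [residuals[i] for i in range(n) if X[i][j] <= thr]
            right = [residuals[i] for i in range(n) if X[i][j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left) +
                   sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # no valid split: fall back to a constant
        mean = sum(residuals) / n
        return (0, float("inf"), mean, mean)
    return best[1:]


def boost(X, y, n_iter=100, shrinkage=0.1):
    """Gradient boosting for squared error: fit each stump to current residuals."""
    base = sum(y) / len(y)
    fitted = [base] * len(y)
    stumps = []
    for _ in range(n_iter):
        # For squared error, the negative gradient is the residual.
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        j, thr, lmean, rmean = fit_stump(X, residuals)
        stumps.append((j, thr, lmean, rmean))
        fitted = [fi + shrinkage * (lmean if row[j] <= thr else rmean)
                  for fi, row in zip(fitted, X)]
    return base, shrinkage, stumps


def predict(model, row):
    """Sum the shrunken contributions of all stumps on top of the base value."""
    base, shrinkage, stumps = model
    return base + sum(shrinkage * (lmean if row[j] <= thr else rmean)
                      for j, thr, lmean, rmean in stumps)
```

The shrinkage factor scales down each basis, so a smaller value needs more iterations to reach the same fit. Swapping the stump for a deeper tree or another regressor changes the basis functions but not the loop, which is the sense in which boosting is applicable to many techniques.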


Verification Results for Cases Above the 75th Percentile

Shrinkage: 0.001, Iterations: 6000, Interactions: 9
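
Verification restricted to cases above a quantile threshold could be computed as in the Python sketch below. Two points are assumptions, since the slide does not state them: cases are selected by the observed value rather than the forecast value, and the scores are bias and RMSE. The function names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def verify_above_quantile(obs, fcst, q=75.0):
    """Bias and RMSE over cases whose observation exceeds the q-th percentile."""
    threshold = percentile(obs, q)
    pairs = [(o, f) for o, f in zip(obs, fcst) if o > threshold]
    if not pairs:
        return None
    errors = [f - o for o, f in pairs]  # forecast minus observation
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"threshold": threshold, "n": len(pairs), "bias": bias, "rmse": rmse}
```

Scoring only the upper tail keeps the many low-concentration hours from dominating the statistics, which matters when the stated objective is better guidance for air quality episodes.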


Lessons Learned

• Quantile thresholds for verification
• Minimum BIC method based on the “leaps” module in R
• Antecedent predictors are a double-edged sword
– Add skill but increase data reliability issues
– Need a good backup model
• GBM and BNN show promise


Path Forward

• Address data loss issues in predictor selection and training stages
– Fill in missing data
– Account for data reliability in predictor selection
• Continue testing boosting and forecast calibration methods above case bias
• Test new predictors (observed meteorology, trajectories, new GEM-MACH fields)
• New statistical techniques (Recursive NN, Fuzzy Logic)
• Prototypes for Summer 2012



Thank You


MLR for NO2: Persistence + Antecedents

Pers. = Ante.


MLR for PM2.5: Persistence + Antecedents

Antecedent for PM2.5