New Resources in the Research Data Archive Doug Schuster


Post on 14-Dec-2015



Topic Outline

• New Resources
  – Search/Discovery and Data Delivery
  – TIGGE
  – JRA-25
• Routine Updates

Data Search, Discovery and Delivery

• Popular Datasets
• Google Style Search
• Drill Down Style Search
• File Level Metadata

Example: Search for model generated tropical cyclone track data using “Drill Down” method.

Data Search, Discovery, and Delivery

Data Search, Discovery, and Delivery (Drill Down)

Data Search, Discovery, and Delivery (File Level Metadata)

Background on TIGGE

WMO World Weather Research Programme THORPEX
  – THe Observing system Research and Predictability EXperiment
  – THORPEX Interactive Global Grand Ensemble (TIGGE) Archive supports research
    • Grand Ensemble = multiple NWP centers' ensembles are combined (an ensemble of ensembles)
    • 10 international NWP Centers contributing to TIGGE

Background on TIGGE

• Three mirrored archive centers {Shared System Development!}
  – NCAR
  – ECMWF
  – CMA
• Daily Data Flow Metrics
  – 245 GB
  – 1.6 Million gridded fields as separate data packets
  – 3000+ Files/day
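To give a feel for the daily flow figures above, the averages they imply can be worked out directly. This is a rough sketch only: it assumes 1 GB = 10^9 bytes, an even spread over 24 hours, and treats "3000+" as exactly 3000 files (so the fields-per-file figure is an upper bound). The function and constant names are illustrative.

```python
# Back-of-the-envelope averages from the daily TIGGE flow figures on the slide.
# Assumptions: 1 GB = 10**9 bytes, flow spread evenly over 24 hours.
DAILY_BYTES = 245 * 10**9   # 245 GB per day
DAILY_FIELDS = 1_600_000    # 1.6 million gridded fields per day
DAILY_FILES = 3000          # "3000+" files per day, so a lower bound


def daily_flow_averages(daily_bytes=DAILY_BYTES, fields=DAILY_FIELDS, files=DAILY_FILES):
    """Return average field size, fields per file, and sustained transfer rate."""
    seconds_per_day = 24 * 60 * 60
    return {
        "kb_per_field": daily_bytes / fields / 1e3,
        "fields_per_file": fields / files,
        "mb_per_second": daily_bytes / seconds_per_day / 1e6,
    }


averages = daily_flow_averages()
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions an average field is roughly 153 KB, a file holds at most about 533 fields, and the sustained rate is about 2.8 MB/s.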

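Data Receipt

[Diagram: data flow from current data providers to the archive centres. Archive centres: NCAR, ECMWF, CMA. Current data providers: NCEP, CMC, UKMO, ECMWF, MeteoFrance, JMA, KMA, CMA, BoM, CPTEC. Transfer methods: IDD/LDM, HTTP, FTP. NCDC also appears in the diagram.]

Unidata IDD/LDM
  – Internet Data Distribution / Local Data Manager
  – Commodity internet application to send and receive data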

Archive Summary

• Online Data
  – Period, most recent two weeks
  – ~4 TB, public products
  – ~2 TB, data preparation, subsetting, DB
• Offline Data
  – Full period of record
  – ~200 TB, NCAR MSS system

Major Challenges

• Ensure data receipt, build complete archive
  – Exchange manifest files as part of IDD/LDM data transmission between Archive centers
  – Verify send, receive
  – Automated resend requests for missing fields
• Collate data fields into different file types
• Harvest and hold metadata in MySQL DBs
  – Identify location of every field in file set
  – Updated often
  – Critical for user interface and background data processing
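The manifest check described above amounts to comparing what the sender says it sent with what actually arrived. The following is a minimal sketch of that idea, not the TIGGE implementation: the field ID strings, the function names, and the shape of the resend request are all hypothetical.

```python
def missing_fields(manifest_ids, received_ids):
    """Return IDs listed in the sender's manifest but not received, sorted."""
    return sorted(set(manifest_ids) - set(received_ids))


def build_resend_request(sender, manifest_ids, received_ids):
    """Build a resend request for the sender, or None if nothing is missing."""
    missing = missing_fields(manifest_ids, received_ids)
    if not missing:
        return None
    return {"to": sender, "resend": missing}


# Hypothetical field identifiers: center/init time/parameter/level/member.
manifest = [
    "ecmf/2008083112/gh/500/m01",
    "ecmf/2008083112/gh/500/m02",
    "ecmf/2008083112/2t/sfc/m01",
]
received = [
    "ecmf/2008083112/gh/500/m01",
    "ecmf/2008083112/2t/sfc/m01",
]

request = build_resend_request("ECMWF", manifest, received)
print(request)
```

Running the check automatically after each transmission is what lets missing fields be re-requested without manual intervention.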

Major Challenges

• Access system must accurately display what common parameters are available as users make selections
  – Driven by multi-center research (Grand Ensemble)
  – Parameters vary between centers.
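One way to picture the "common parameters" problem is as a set intersection over the centers a user has selected so far. The sketch below assumes a simple in-memory inventory; the center codes and parameter names are made up for illustration and do not reflect the real TIGGE inventory.

```python
def common_parameters(available, selected_centers):
    """Return the parameters offered by every selected center."""
    sets = [set(available[center]) for center in selected_centers]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical inventory: which parameters each center provides.
inventory = {
    "center_a": {"gh", "2t", "tp", "msl"},
    "center_b": {"gh", "2t", "msl"},
    "center_c": {"gh", "tp", "msl"},
}

print(sorted(common_parameters(inventory, ["center_a", "center_b"])))
print(sorted(common_parameters(inventory, ["center_a", "center_b", "center_c"])))
```

Each additional center can only shrink the common set, which is why the interface has to recompute the available choices after every selection.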

Variance between centers

[Chart: Spatial Resolution. Model resolution (N200, N128, 0.56x0.56, 1.00x1.00, 1.25x0.83, 1.25x1.25, 1.50x1.50) against number of data providers (0 to 4).]

[Chart: Conforming parameters and ensemble members by center (ECMWF, UKMO, JMA, NCEP, CMA, CMC, BOM, MF, KMA, CPTEC). Axis: # fields, # ensemble members (0 to 80).]

[Chart: Forecast Length, Initialization by center (ECMWF, UKMO, JMA, NCEP, CMA, CMC, BOM, MF, KMA, CPTEC). Series: forecast length (days), forecasts/day. Axis: 0 to 18.]

Get Forecast Data

NCAR online file archive

• Selection options (Portal or RDA)
  – Center(s)
  – Date
  – File type (sl, pl, etc.)
  – Initialization time
  – Forecast length
• Download options
  – Point and click using browser, one file at a time
  – Script to run on local machine
    • User and password encrypted ‘wget’ commands
    • Background process to access all files

User customized files

• Selection options (Portal)
  – Same as for files, plus
  – Parameter subsets
  – Grid interpolation
  – Spatial subsets
  – Formats: GRIB2, NetCDF

Delayed Mode
Real Time

Two User Interfaces

User access selection demonstration

Animation, what you will see
  – Multiple centers
    • (ECMWF, UKMO, NCEP, CMA, CMC, KMA)
  – Fields/Parameters
    • (Geopotential Height, 2m Temperature)
  – Levels
    • (500 hPa, Single Level)
  – Spatial and temporal ranges
    • (Global, 3-days, 12Z initializations, 48 hour forecasts)
  – Regridding to common spatial resolution
    • (1.5°)
  – Output format
    • (netCDF)
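The selections in the demonstration can be thought of as a single request specification. The sketch below writes them down as a plain dictionary and counts the (center, field, day) combinations it spans. The key names are hypothetical, the pairing of each field with a level is an assumption, and days are given as offsets because the slide does not state the dates.

```python
from itertools import product

# Pairing of each field with a level is assumed from the order on the slide.
fields = [("Geopotential Height", "500 hPa"), ("2m Temperature", "Single Level")]

request = {
    "centers": ["ECMWF", "UKMO", "NCEP", "CMA", "CMC", "KMA"],
    "fields": fields,
    "days": [0, 1, 2],            # three consecutive days, as offsets
    "initialization": "12Z",
    "forecast_hours": 48,
    "area": "global",
    "grid_degrees": 1.5,          # regrid everything to a common 1.5 degree grid
    "format": "netCDF",
}

combinations = list(product(request["centers"], request["fields"], request["days"]))
print(len(combinations))  # 6 centers x 2 fields x 3 days = 36
```

Each of those combinations still expands over ensemble members and forecast steps, which vary by center.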

Sample Data Request for an Event

Retrieve Completed Subset

Subset Request Animation

Gustav/Hannah Animation

Features of JRA-25/JCDAS at NCAR

• All data available through web/RDA portal and NCAR MSS, 11 TB
• Available dates, 1979 through 2007
• 23 different data products
  – 4 x daily, GRIB1 format
  – Monthly mean, netCDF (NCAR derived from binary) format
• All data users are registered and must agree to JMA’s ‘Condition of Use’

Typhoon Sepat, 16 August 2007

Images courtesy Dave Stepaniak

Routine Updates

• NCEP
  – FNL Global Tropospheric Analysis (Daily)
  – BUFR/PREPBUFR obs. data (Weekly)
• Unidata IDD data (Daily)
  – NetCDF format obs collected from GTS
  – IDD model data (GRIB-2)
    • GFS
    • NAM
    • RUC

Routine Updates

• SST
  – NCEP OI Global SST 1x1 Deg (weekly)
  – NOAA OI Global 0.25 x 0.25 SST (monthly)
  – Hadley Centre Global Sea Ice and SST (monthly)
• Reanalysis
  – NNR Yearly updates
  – NARR Yearly updates
  – JRA-25

Questions?

Lessons Learned

• Manifest files and automated resend are critical for a complete archive
• The impact of different contributions from the NWP centers across the archive should not be underestimated
• There are important design considerations to ensure prompt browser interactions
  – Caching data from the DB

Lessons Learned

• Computational resource requirements ramp up quickly with multi-dimensional problems
  – D’s: center, ensemble member, parameter, forecast length, etc.
• Archive file structure choices greatly impact subsetting ability
  – TIGGE currently based on synoptic order
  – Time-series by parameter could be better?
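Both points can be illustrated with simple counting. The sketch below multiplies hypothetical dimension sizes to show how quickly the number of fields grows, then compares how many files a request touches under two simplified layouts: "synoptic" (one file per initialization time, all parameters inside) and "timeseries" (one file per parameter, all times inside). The sizes and the layout model are assumptions, not the actual TIGGE file structure.

```python
from math import prod


def total_fields(dimension_sizes):
    """Number of fields is the product of the sizes of all dimensions."""
    return prod(dimension_sizes.values())


def files_opened(layout, n_times, n_params):
    """Files touched by a request for n_params parameters at n_times times."""
    if layout == "synoptic":    # one file per time, holding every parameter
        return n_times
    if layout == "timeseries":  # one file per parameter, holding every time
        return n_params
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical dimension sizes, for illustration only.
sizes = {"centers": 10, "members": 20, "parameters": 50, "forecast_steps": 30}
print(total_fields(sizes))

# One parameter over a year of daily initializations.
print(files_opened("synoptic", n_times=365, n_params=1))
print(files_opened("timeseries", n_times=365, n_params=1))
```

In this simplified model a one-parameter time series touches 365 files in synoptic order but a single file in time-series order, while a request for every parameter at one time favors the synoptic layout.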

Major Challenges

• Limited online storage
  – 4 TB, ≅ 2 weeks temporal coverage
  – Full archive on NCAR Mass Storage System
• User registration and metrics required
  – Accept data policy; for research and education only
  – 48 hour delay from forecast initialization time
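The release delay and the limited online window together determine where a given forecast can be found. The sketch below is a rough illustration: it assumes the two-week online window is measured from the initialization time and is exactly 14 days, and the function name and status strings are hypothetical.

```python
from datetime import datetime, timedelta

RELEASE_DELAY = timedelta(hours=48)  # from the slide: 48 hour delay
ONLINE_WINDOW = timedelta(days=14)   # assumption: "about 2 weeks" taken as 14 days


def access_status(init_time, now):
    """Classify a forecast by its initialization time relative to the current time."""
    if now < init_time + RELEASE_DELAY:
        return "not yet released"
    if now - init_time <= ONLINE_WINDOW:
        return "online"
    return "offline (mass storage)"


now = datetime(2008, 9, 15, 12)
print(access_status(datetime(2008, 9, 14, 12), now))  # 24 hours old
print(access_status(datetime(2008, 9, 10, 12), now))  # 5 days old
print(access_status(datetime(2008, 8, 1, 12), now))   # over 6 weeks old
```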