Experiences of a Earth Science Data User: Confessions of a Data Hoarder. Rob Carver, The Weather Company

Post on 08-Jan-2018



TRANSCRIPT

Experiences of a Earth Science Data User: Confessions of a Data Hoarder
Rob Carver, The Weather Company

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." (Andrew S. Tanenbaum)

Open Data and The Weather Company
❖ Our business model is taking open data and using it to tell interesting stories that engage our users.
❖ Over the years, we've archived over 100 TB of data.
❖ GRIB1, GRIB2, NIDS, shapefiles, netCDF, HDF5
❖ NWS/NCEP, NCDC, FEMA, Census Bureau, NASA DAACs

Locating Data
1. Google and literature searches
2. ???
3. Data!

100+ TB of Weather Models
❖ Most data arrives through Unidata's LDM and FTP pull scripts.
❖ ECMWF pushes data to our FTP site. (All of it is GRIB1/GRIB2.)
❖ Data is ingested into the forecast system, and GrADS handles the model visualization.
❖ Data is archived to local disk arrays and Amazon S3.

Level-III NIDS Archive
❖ NCDC maintains an archive of the WSR-88D radar network's products from 1995 to present (>10 TB).
❖ Datasets are ordered from a tape-based archive.
❖ It took two years to acquire using a set of PHP scripts.
❖ It was easier to acquire the entire archive than to figure out which subset to acquire.
❖ We already had a NIDS parser for visualization.

FEMA Flood Maps
❖ Data Acquisition Method: DVD for each state
❖ Format: ESRI shapefiles (1 shapefile of a feature class per state)
❖ Data Display: Split state shapefiles by county and then pre-render tiles for moderate to coarse zoom levels on a map mashup.

Suggestions
❖ Data in a difficult/proprietary format just wastes disk space.
❖ Please use data formats that are well supported by open-source software packages (e.g., OGR/GDAL): netCDF, TIFF, ESRI shapefiles, HDF5, GeoJSON.
❖ Instead of complex CSV or fixed-width text files, use self-describing formats (JSON, XML, SQLite).

Suggestions (cont.)
❖ Data/navigation files should use the same naming conventions/sequences.
❖ Don't use overly large archive files.
❖ Data pools/FTP servers attached to large disk arrays are awesome data providers (as long as limits are in place).
❖ For really large, static datasets (>10 GB), BitTorrent would be really useful.

Questions/Comments/Answers?
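The suggestion to prefer self-describing formats over fixed-width text can be illustrated with a small sketch. Everything below is hypothetical and not taken from the talk: the record layout, the field names (`station`, `temperature_c`, `wind_speed_kt`), and the `obs` table are invented for illustration. The point is that a fixed-width line is unreadable without an external column specification, while JSON and SQLite carry their field names with the data.

```python
import json
import sqlite3

# Hypothetical fixed-width record (layout invented for this example):
#   cols 0-3  station identifier
#   cols 4-7  temperature in tenths of a degree Celsius
#   cols 8-10 wind speed in knots
FIXED_WIDTH_LINE = "KATL 215 12"


def parse_fixed_width(line):
    """Decode one record; only works if you already know the column layout."""
    return {
        "station": line[0:4].strip(),
        "temperature_c": int(line[4:8]) / 10.0,
        "wind_speed_kt": int(line[8:11]),
    }


def to_json(record):
    """Serialize the record with its field names attached."""
    return json.dumps(record, sort_keys=True)


def to_sqlite(record):
    """Store the record in an in-memory SQLite table with named, typed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obs (station TEXT, temperature_c REAL, wind_speed_kt INTEGER)"
    )
    conn.execute(
        "INSERT INTO obs VALUES (:station, :temperature_c, :wind_speed_kt)",
        record,
    )
    return conn


record = parse_fixed_width(FIXED_WIDTH_LINE)
as_json = to_json(record)

# A consumer can discover the schema from the database itself.
conn = to_sqlite(record)
cursor = conn.execute("SELECT * FROM obs")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A reader of `as_json` or of the `obs` table can recover the field names without any side documentation, which is the property the slide asks data providers for.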
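The Tanenbaum quote and the two-year NIDS acquisition can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic only: it treats the archive as exactly 10 TB (the talk says more than 10 TB, so this is a lower bound), uses decimal units (1 TB = 10^12 bytes), takes two years as 730 days, and compares against a hypothetical 100 Mbit/s link that is not mentioned in the talk. The two years also included waiting on tape orders, so the result is an effective rate, not a network measurement.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_TB = 1e12 * 8  # decimal terabytes, an assumption of this sketch


def effective_throughput_mbit_s(size_tb, days):
    """Average rate in Mbit/s needed to move size_tb terabytes in the given days."""
    return size_tb * BITS_PER_TB / (days * SECONDS_PER_DAY) / 1e6


def transfer_days(size_tb, link_mbit_s):
    """Days needed to move size_tb terabytes over a fully used link."""
    return size_tb * BITS_PER_TB / (link_mbit_s * 1e6) / SECONDS_PER_DAY


# Effective rate of the tape-order workflow: at least 10 TB in about two years.
nids_rate = effective_throughput_mbit_s(size_tb=10, days=730)

# Same volume over a hypothetical, fully saturated 100 Mbit/s connection.
direct_days = transfer_days(size_tb=10, link_mbit_s=100)
```

Under these assumptions the tape-based workflow averaged a little under 1.3 Mbit/s, while the same 10 TB would take a bit over nine days on the hypothetical 100 Mbit/s link. That gap is the practical argument behind the slides' requests for data pools, FTP servers on large disk arrays, and BitTorrent.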