tim pugh-speddexes 2014


Post on 20-May-2015






How OPeNDAP has transformed the way we do science plus snapshots of recent developments and BoM’s operational systems built on this technology


  • 1. How OPeNDAP has transformed the way we do science, plus snapshots of recent developments and BoM’s operational systems built on this technology. Tim Pugh, SPEDDEXES workshop, 17-21 March 2014

  • 2. Evolution
      Traditionally
        - Scientific research is conducted in a quiet room, in isolation, utilising unique data, scripts, and code
        - Scientific collaboration is conducted at conferences, with file sharing by FTP or HTTP bulk download
      Today
        - Scientific research is being driven to shared research services and supported infrastructure:
            - To relieve the scientist of laborious developments
            - To manage more complex machinery
            - To improve scientific integrity and collaboration
            - To work within managed and supported infrastructure
        - Science is moving from file sharing to data-sharing collaboration

  • 3. CAWCR Research Data Server
        - Location: http://opendap.bom.gov.au:8080/thredds
        - Unidata THREDDS Data Server v4.2.8: http://www.unidata.ucar.edu/projects/THREDDS/tech/TDS.html
        - The THREDDS Data Server (TDS) is a Java servlet contained in a single war file, which allows very easy installation into the Tomcat web server.

  • 4. OPeNDAP Now Is:
        - An acronym: Open-source Project for a Network Data Access Protocol
        - Often a synonym for DAP
        - A not-for-profit corp. developing/supporting DAPx, a web-services protocol for data access
            - Deployed by hundreds of data providers internationally
            - Employed in many analysis packages (e.g. MATLAB)
            - Designated a Community Standard by NASA
        - Server & client implementations* of DAP
      *Note: there are other implementations

  • 5. BROAD VISION
      1. A world in which a single data access protocol is used for the exchange of data between network-based applications regardless of discipline.
      2. A layer above TCP/IP providing for syntactic and semantic consistency not available in existing protocols such as FTP.

  • 6. Fundamental Objective of OPeNDAP
      The fundamental objective of OPeNDAP and OPeNDAP Inc. is to facilitate internet access to scientific data. This is done by:
        - Providing a protocol (DAP) to access data over the internet,
        - Hiding the format (and organization) in which the data are stored from the user, and
        - Providing subsetting (and other) capabilities for the data at the server
      OPeNDAP is based on a multi-tier architecture
      OPeNDAP software is open source

  • 7. OPeNDAP Data-Type Philosophy
        - The OPeNDAP data model has few data types
            - simplified programming / lowered risk of errors
        - They are intentionally discipline-neutral
            - better trans-domain utility & programmer uptake
        - They nonetheless fill discipline-specific needs
            - netCDF-like (good in contexts where, e.g., data might represent functions with 4- or 5-D domains)
            - sequences & selections match DBMS sensibilities

  • 8. TDS Server
        - TDS is THREDDS Data Server
        - THREDDS is Thematic Real-time Environmental Distributed Data Services
        - Middleware to bridge the gap between data providers and data users
        - The THREDDS Data Server (TDS) is a web server that provides catalog, metadata, and data access services for scientific datasets.
        - The TDS is open source, 100% Java, and runs inside the open source Tomcat Servlet container.
        - Unidata's Common Data Model merges the OPeNDAP, netCDF, and HDF5 data models to create a common API for scientific data, implemented by the NetCDF Java library, which:
            - reads netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3, GEMPAK, MCIDAS, GINI, among others
            - has a pluggable framework that allows other developers to add readers for their own specialized formats
            - provides standard APIs for geo-referencing coordinate systems, and specialized queries for scientific feature types like Grid, Point, and Radial datasets

  • 9. Some of the Technology in the TDS
      1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and associated metadata.
      2. The NetCDF-Java/CDM library reads NetCDF, OPeNDAP, and HDF5 datasets, as well as other binary formats such as GRIB and NEXRAD, presenting essentially an (extended) netCDF view of the data.
      3. TDS can use the NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets.
      4. An integrated server provides the OPeNDAP data access method, with subsetting.
      5. An integrated server provides bulk file access through the HTTP protocol.
      6. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Coverage Service (WCS) protocol, for any "gridded" dataset whose coordinate system information is complete.
      7. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Map Service (WMS) protocol, for any "gridded" dataset whose coordinate system information is complete.
      8. The integrated ncISO server provides automated metadata analysis and ISO metadata generation.

  • 10. THREDDS Catalog
        - The goal is to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data.
        - The initial focus was to allow data users to find datasets that are pertinent to their specific education and research needs, access the data, and use them without necessarily downloading the entire file to their local system.
        - Catalogs are the heart of the data access services, and are the core THREDDS concept.
        - Catalogs consist of XML documents that describe on-line datasets.
        - Catalogs can contain arbitrary metadata; however, we also defined a standard set of metadata to bridge to discovery centers:
            - CF (Climate & Forecast) and Unidata Data Discovery metadata

  • 11. Spectrum of Use Cases
      Application Data Representation
        - OGC data model: domain specific; geospatial, 1-D, 2-D
        - DAP2 data model: domain neutral; n-D, time series
        - **DAP4 data model: domain neutral; new data types and data structures; streaming, compressed, chunked
        - Common Data Model (CDM): domain specific
        - Future data model: domain neutral??
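Because THREDDS catalogs are plain XML documents (slide 10), a client can walk one with a standard XML parser. The sketch below is illustrative only: the sample catalog, the server root `http://example.org:8080`, and the helper name `opendap_urls` are assumptions made for the example, and it ignores compound services and inherited service names that real catalogs may use.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog, made up for this example.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example collection">
    <dataset name="sst_2014.nc" urlPath="example/sst_2014.nc"/>
    <dataset name="wind_2014.nc" urlPath="example/wind_2014.nc"/>
  </dataset>
</catalog>
"""


def opendap_urls(catalog_xml, server_root):
    """Return OPeNDAP URLs for every dataset in the catalog that has a urlPath."""
    root = ET.fromstring(catalog_xml.strip())

    # Find the base path of the first service declared as OPeNDAP.
    base = None
    for service in root.iter("{%s}service" % THREDDS_NS):
        if service.get("serviceType", "").lower() == "opendap":
            base = service.get("base")
            break
    if base is None:
        return []

    # Container datasets have no urlPath; only leaf datasets are accessible.
    urls = []
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        url_path = dataset.get("urlPath")
        if url_path:
            urls.append(server_root.rstrip("/") + base + url_path)
    return urls


if __name__ == "__main__":
    for url in opendap_urls(SAMPLE_CATALOG, "http://example.org:8080"):
        print(url)
```

In practice the XML would be fetched from a server's `catalog.xml` rather than embedded as a string.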
      Application Types
        - Programmatic / Language API: FORTRAN, C/C++, Java, Python, NetCDF, Java NetCDF
        - Programmatic / Tools: NetCDF, NCO, PyDAP; custom tools: OPeNDAP crawler, ocean_prep
        - Interactive Data Viewer: IDV, Panoply, IDL, MATLAB, IPython (matplotlib), NCL, web browser (metadata)
        - Interactive Analysis: MATLAB, IDL, IPython, NCL; custom application: Inundation Modeller
        - Web Application: Live Access Server, IMOS Data Portal (WMS), custom Java servlet
      Programming
        - DAP2: legacy code, existing tools
        - DAP2: new code, new tools
        - **DAP4 programming: legacy code support
        - **DAP4 programming: new data model and protocols, streaming support
        - **DAP4 programming: asynchronous access modes, server-side processing
      Data Access Protocol
        - Metadata request: das, dds, ddx
        - ASCII/binary data request: simple data representation
        - DAP binary object request
        - NcML data request: aggregation, virtual data sets
        - **DAP4: server-side operations, async access mode, new data model, posting
      Syntax
        - Return data set info: file.nc.dds (readable), file.nc.ddx (XML), file.nc.asc (ASCII data return)
        - Select variables: file.nc.dods?var1,var2,var3
        - Subset arrays: file.nc.dods?var1[0:1:10]
        - Return file translations: file.nc.netcdf (NetCDF file)
        - Server-side operations: file.nc?GEOLOC()
        - Async access mode: ??
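The request syntax listed above can be sketched as a small URL builder. This is a minimal illustration, not part of any OPeNDAP library: the function name `dap2_url` and the example server address are hypothetical. It relies on DAP2 constraint expressions indexing arrays with square brackets, written `var[start:stride:stop]`.

```python
# Response suffixes appended to a dataset URL, as listed on the slide
# (".ascii" is included as the common long form of ".asc").
DAP2_RESPONSES = {"das", "dds", "ddx", "asc", "ascii", "dods"}


def dap2_url(dataset_url, response="dods", variables=None):
    """Build a DAP2 request URL.

    variables is an optional list whose items are either a variable name,
    or a (name, [(start, stride, stop), ...]) tuple with one index triple
    per array dimension.
    """
    if response not in DAP2_RESPONSES:
        raise ValueError("unknown DAP2 response type: %r" % response)

    url = "%s.%s" % (dataset_url, response)
    if not variables:
        return url

    projections = []
    for item in variables:
        if isinstance(item, str):
            # Whole variable, no subsetting.
            projections.append(item)
        else:
            name, triples = item
            hyperslab = "".join("[%d:%d:%d]" % t for t in triples)
            projections.append(name + hyperslab)

    # The projection is a comma-separated list after the "?".
    return url + "?" + ",".join(projections)


if __name__ == "__main__":
    base = "http://example.org/thredds/dodsC/file.nc"  # hypothetical dataset
    print(dap2_url(base, "dds"))
    print(dap2_url(base, "dods", ["var1", "var2", "var3"]))
    print(dap2_url(base, "asc", [("var1", [(0, 1, 10)])]))
```

Some HTTP clients treat square brackets specially, so the constraint may need to be percent-encoded before the request is sent.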
      Clients
        - Programmatic access: tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL
        - Interactive access: web browser (catalog), MATLAB, IDL, Python, Panoply
        - Data library & catalog service: metadata harvesting, directory listings, remote THREDDS services
        - Web service: Java servlet, Java applet
        - Geospatial information service: OPeNDAP data service
        - Analysis service: Live Access Server
      Service Capabilities
        - DAP2 response: metadata, dods, ASCII / binary
        - **DAP4 response: async access mode, server-side, streaming
        - NcML aggregation service: virtual data set service, remote data access
        - Metadata conversion and RDF: metadata definitions, translations (-> ISO); semantics, ontology; CF->ISO, CF->WMS, CF->WCS
      Layered Services
        - Catalogue service
        - WMS, WCS services
        - Authentication
        - Conformance checks: CF metadata check, ISO metadata check
      **DAP4 features listed are my estimation and not the official specification

  • 12. Use Case Limitations
      Time to access data is dependent on the following factors:
        - Hardware and network performance
        - Selection of variables and dimensions
        - Number of data requests to be issued
        - Latency inherent in the data request
        - Number of concurrent accesses to the server

  • 13. DAP-enabled client tools/applications
      OPeNDAP Clients (partial list): http://opendap.org/whatClients
      1. Web browser: returns ASCII data
      2. Pydap: a pure Python library implementation of DAP2
      3. NetCDF: a set of software libraries and self-describing, machine-independent data formats with interfaces to Python, FORTRAN, C/C++, and Java
      4. NCO: a dozen standalone, command-line programs that take netCDF files as input
      5. MATLAB: a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation
      6. Panoply: a cross-platform application which plots geo-gridded arrays from netCDF, HDF and GRIB datasets

  • 14. Developments by Bureau and CSIRO
      Development of web portals for data access services and information systems in climate and environment
        - Seasonal Climate Outlook Rebuild (Roald de Wit)
        - Natural Resource Management (NRM) Climate Change Portal (Tim Erwin)
        - eReefs Marine Quality Dashboard and data services (Jonathon Hodge)
        - National Environmental Information Infrastructure (NEII) (Andrew Woolf)
        - CAWCR research data services (Duan Beckett)
      Establish Climate Data Publishing services at NCI
        - NCI, CSIRO, Bureau of Meteorology, CoE CSS
        - Earth System Grid (ESG)
        - Climate and Weather Science Laboratory (CWSLab)

  • 15. SCO-R Project overview

  • 16. Project overview
        - More interactivity and functionality needed
        - Demand for POAMA multi-week forecast products
        - Long term view of seamless transition between forecasts
        - Building upon experiences / technologies from other BoM projects (e.g. MetEye and PASAP/PACCSAP)

  • 17. SCO-R architecture
        - MapCache
        - BOM.Map / BOM.App
        - Custom WMS