NetCDF-4 Interoperability with HDF4 and HDF5
Ed Hartnett, Unidata, 8/4/9

Page 1: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

NetCDF-4 Interoperability with HDF4 and HDF5

Ed Hartnett, Unidata, 8/4/9

Page 2: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Purpose of Interoperability Features: World Conquest

• The purpose of the interoperability features is to allow users to run netCDF programs on non-netCDF data archives.

• NetCDF-Java can read many data formats; the idea is to bring some of this functionality to the C/Fortran/C++ libraries.

Page 3: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Warning and Request

• HDF4 and HDF5 interoperability features are still being tested. They are not ready for operational use yet.

• The interoperability features are available in the netCDF daily snapshot release.

• Please use them and send feedback to:

[email protected]

Page 4: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Overview

• HDF4 Interoperability

– What is HDF4 and why bother with it?

– Reading HDF4 files with netCDF.

– Limitations and request for help.

• HDF5 Interoperability

– What is HDF5 and why bother with it?

– Reading HDF5 files with netCDF.

– Limitations.

Page 5: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

What is HDF4?

• The original HDF format, superseded by HDF5.

• HDF4 has built-in 32-bit limits that make it unattractive for new data sets. It is still actively supported by The HDF Group, but no new features are added.

• Get more info about HDF4 at: http://www.hdfgroup.org/products/hdf4

Page 6: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Why Read HDF4?

Some important data sets are distributed in HDF4, for example the Aqua/Terra satellite data.

Page 7: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF4 Background

• HDF4 has several different APIs. The one of greatest interest to netCDF users is the SD (Scientific Data) API.

• The SD API is (intentionally) very similar to the netCDF classic data model.

Page 8: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Confusing: HDF4 Includes NetCDF v2 API

• A netCDF v2 API is provided with HDF4; it writes SD data files.

• This must be turned off at HDF4 install-time if netCDF and HDF4 are to be linked in the same application.

• There is no easy way to use both HDF4's netCDF API and netCDF's HDF4 read capability in the same program.

Page 9: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Reading HDF4 SD Files

Starting with version 4.1, netCDF will be able to read HDF4 files created with the “Scientific Dataset” (SD) API.

This is read-only: netCDF can't write HDF4!

The intention is to make netCDF software work automatically with important HDF4 scientific data collections.
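
As an illustration, here is a minimal sketch (not from the slides; the file name "some_hdf4_file.hdf", the variable name "Temperature", and the read sizes are hypothetical) of a netCDF program reading part of a variable from an HDF4 SD file exactly as it would from a netCDF file:

#include <stdio.h>
#include <netcdf.h>

int
main()
{
   int ncid, varid, status;
   size_t start[2] = {0, 0}, count[2] = {1, 10};
   float temp[10];

   /* HDF4 files must be opened read-only; no special flag is needed to
      identify the file as HDF4. */
   if ((status = nc_open("some_hdf4_file.hdf", NC_NOWRITE, &ncid)))
      return status;

   /* Look up a variable by name and read part of it, just as for any
      netCDF file. (Assumes a 2D variable with at least 10 columns.) */
   if ((status = nc_inq_varid(ncid, "Temperature", &varid)))
      return status;
   if ((status = nc_get_vara_float(ncid, varid, start, count, temp)))
      return status;

   printf("first value: %g\n", temp[0]);
   return nc_close(ncid);
}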

Page 10: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Building NetCDF to Read HDF4

• This is only available for those who also build netCDF with HDF5.

• HDF4, HDF5, zlib, and other compression libraries must exist before netCDF is built.

• Build like this:

./configure --with-hdf5=/home/ed --enable-hdf4

Page 11: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Compiling with HDF4

• Include netcdf header file as usual.

• Include locations of netCDF, HDF5, and HDF4 include directories:

-I/loc/of/netcdf/include -I/loc/of/hdf5/include -I/loc/of/hdf4/include

Page 12: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Linking with HDF4

The HDF4 and HDF5 libraries (and associated libraries) are needed and must be linked into all netCDF applications. The locations of the lib directories must also be provided:

-L/loc/of/netcdf/lib -L/loc/of/hdf5/lib -L/loc/of/hdf4/lib

-lmfhdf -ldf -ljpeg -lhdf5_hl -lhdf5 -lz

Page 13: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Use nc-config to Help with Compile Flags

• The nc-config utility is provided to help with compiler flags:

$ ./nc-config --cflags
-I/usr/local/include
$ ./nc-config --libs
-L/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4
$ ./nc-config --flibs
-M/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4

Page 14: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Implementation Notes

• You don't need to identify the file as HDF4 when opening it with netCDF, but you do have to open it read-only.

• The HDF4 SD API provides a named, shared dimension, which fits easily into the netCDF model.

• The HDF4 SD API uses other HDF4 APIs (like vgroups) to store metadata. This can be confusing when using the HDF4 data dumping tool hdp.

Page 15: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

C Code to Read HDF4 SD File

/* Create a file with one SDS, containing our phony data. */
sd_id = SDstart(FILE_NAME, DFACC_CREATE);
sds_id = SDcreate(sd_id, PRES_NAME, DFNT_INT32, DIMS_2, dim_size);
SDwritedata(sds_id, start, NULL, edge, (void *)data_out);
if (SDendaccess(sds_id)) ERR;
if (SDend(sd_id)) ERR;

/* Now open with netCDF and check the contents. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in, &unlimdim_in)) ERR;
...

Page 16: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

ncdump and HDF4 SD Files

• With HDF4 reading enabled, ncdump works on HDF4 files.

• Sample MODIS file:

../ncdump/ncdump -h MOD29.A2000055.0005.005.2006267200024.hdf
netcdf MOD29.A2000055.0005.005.2006267200024 {
dimensions:
    Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice = 406 ;
    Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice = 271 ;
    Along_swath_lines_1km\:MOD_Swath_Sea_Ice = 2030 ;
    Cross_swath_pixels_1km\:MOD_Swath_Sea_Ice = 1354 ;
variables:
    float Latitude(Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice, Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice) ;
        Latitude:long_name = "Coarse 5 km resolution latitude" ;
        Latitude:units = "degrees" ;
...

Page 17: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF-EOS Not Understood

• Many HDF4 data sets of interest follow the HDF-EOS metadata standard.

• Stored as a long text string in global attributes, the HDF-EOS metadata looks messy.

// global attributes:
    :HDFEOSVersion = "HDFEOS_V2.9" ;
    :StructMetadata.0 = "GROUP=SwathStructure\n\tGROUP=SWATH_1\n\t\tSwathName=\"MOD_Swath_Sea_Ice\"\n\t\tGROUP=Dimension\n\t\t\tOBJECT=Dimension_1\n\t\t\t\tDimensionName=\"Coarse_swath_lines_5km\"\n\t\t\t\tSize=406\n\t\t\tEND_OBJECT=Dimension_1\n\t\t\tOBJECT=Dimension_2\n\t\t\t\tDimensionName=\"Coarse_swath_pixels_5km\"\n\t\t\t\tSize=271\n\t\t\t...
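
Although netCDF does not interpret this metadata, a netCDF program can still retrieve the raw string as an ordinary global text attribute and parse it itself. A minimal sketch (not from the slides; it uses the sample MODIS file and the StructMetadata.0 attribute shown above):

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

int
main()
{
   int ncid, status;
   size_t len;
   char *meta;

   if ((status = nc_open("MOD29.A2000055.0005.005.2006267200024.hdf",
                         NC_NOWRITE, &ncid)))
      return status;

   /* The HDF-EOS structural metadata is just a global text attribute. */
   if ((status = nc_inq_attlen(ncid, NC_GLOBAL, "StructMetadata.0", &len)))
      return status;
   if (!(meta = malloc(len + 1)))
      return NC_ENOMEM;
   if ((status = nc_get_att_text(ncid, NC_GLOBAL, "StructMetadata.0", meta)))
      return status;
   meta[len] = '\0';

   printf("%s\n", meta);   /* application code would parse this string */
   free(meta);
   return nc_close(ncid);
}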

Page 18: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF4 Read Testing

• Tested in libsrc4/tst_interops2.c, which creates some HDF4 files with the SD API, and then reads them with netCDF.

• If --enable-hdf4-file-tests is used with netCDF configure, some Aura/Terra satellite data files are downloaded from the Unidata FTP site, then read by libsrc4/tst_interops3.c.

Page 19: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF4 Interoperability Limitations

• File must be opened read-only.

• Only HDF4 SD data files are currently understood.

• This feature cannot be used at the same time as HDF4's netCDF v2 API, because HDF4 steals the netCDF v2 API function names. So you must use --disable-netcdf when building HDF4. (It might also work to use --disable-v2 when building netCDF.)

Page 20: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Future HDF4 Work

• More tests.

• Support for HDF4 image types.

• Test support for compressed data.

• Add some support for HDF-EOS metadata in the libcf library, using the HDF-EOS toolkit.

Page 21: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Request for User Help – What Data to Read?

• Please send me pointers to scientifically important HDF4 datasets.

• The intention is not to read every kind of HDF4 data, just datasets of wide scientific interest.

Page 22: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Contribute Code to Write HDF4?

• Some programmers use the netCDF v2 API to write HDF4 files.

• It would not be too hard to write the glue code to allow the v2 API -> HDF4 output from the netCDF library.

• The next step would be to allow netCDF v3/v4 API code to write HDF4 files.

• Writing HDF4 seems like a low priority to our users. I would be happy to help any user who would like to undertake this task.

Page 23: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

What is HDF5?

• HDF5 is an extremely general data storage format with many advanced features: on-the-fly compression, parallel I/O, a rich data model, etc.

• Starting with netCDF-4.0, netCDF has been able to use HDF5 as a storage layer, exposing some of the advanced features (a brief sketch follows this list).

• But, until version 4.1, only HDF5 files created with netCDF-4 could be understood by netCDF-4.
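
As background, here is a minimal sketch (not from the slides; the file name "example.nc", the dimension and variable names, the sizes, and the deflate level are all hypothetical) of a program using netCDF-4's HDF5 storage layer, including one of the exposed HDF5 features, on-the-fly compression:

#include <netcdf.h>

#define NDIMS 2
#define NX 6
#define NY 12

int
main()
{
   int ncid, dimids[NDIMS], varid, status;
   float data[NX][NY] = {{0}};

   /* NC_NETCDF4 selects the HDF5-based storage format. */
   if ((status = nc_create("example.nc", NC_NETCDF4, &ncid)))
      return status;
   if ((status = nc_def_dim(ncid, "x", NX, &dimids[0])))
      return status;
   if ((status = nc_def_dim(ncid, "y", NY, &dimids[1])))
      return status;
   if ((status = nc_def_var(ncid, "pres", NC_FLOAT, NDIMS, dimids, &varid)))
      return status;

   /* On-the-fly zlib compression, an HDF5 feature exposed by netCDF-4. */
   if ((status = nc_def_var_deflate(ncid, varid, 0, 1, 4)))
      return status;

   if ((status = nc_put_var_float(ncid, varid, &data[0][0])))
      return status;
   return nc_close(ncid);
}

A file written this way is itself an HDF5 file that uses dimension scales and creation ordering, which is why (as the later slides note) only such files could be read back by netCDF-4 before version 4.1.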

Page 24: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Why Read HDF5 Files?

• Many important datasets are available in HDF5 format, including data from the Aqua satellite.

Page 25: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Rules for Reading HDF5 Files

• NetCDF-4.1 provides read-only access to existing HDF5 files if they do not violate some rules:

– Must not use circular group structure.

– The HDF5 reference type (and some other obscure types) is not understood.

– Write access still only possible with netCDF-4/HDF5 files.

Page 26: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF5 Version 1.8 Background

• In version 1.8, HDF5 introduced “dimension scales” as a way of supporting shared dimensions (see the sketch after this list).

• Also in version 1.8, HDF5 introduced ordering by creation, rather than ordering alphabetically.

• Most data providers don't use these features, however; they use HDF5 1.6 instead.
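
To make the terminology concrete, here is a minimal sketch (not from the slides; the file name "scales.h5" and the "lat" and "pres" datasets and sizes are hypothetical) of an HDF5 1.8 dimension scale created with the high-level H5DS API:

#include <hdf5.h>
#include <hdf5_hl.h>

#define LAT_LEN 3
#define LON_LEN 4

int
main()
{
   hid_t fileid, lat_spaceid, pres_spaceid, lat_scaleid, pres_datasetid;
   hsize_t lat_dims[1] = {LAT_LEN};
   hsize_t pres_dims[2] = {LAT_LEN, LON_LEN};

   /* Create a file, a 1D dataset to serve as the "lat" dimension, and a
      2D "pres" dataset that will share it (HDF5 1.8 API calls). */
   fileid = H5Fcreate("scales.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
   lat_spaceid = H5Screate_simple(1, lat_dims, lat_dims);
   lat_scaleid = H5Dcreate2(fileid, "lat", H5T_NATIVE_FLOAT, lat_spaceid,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
   pres_spaceid = H5Screate_simple(2, pres_dims, pres_dims);
   pres_datasetid = H5Dcreate2(fileid, "pres", H5T_NATIVE_FLOAT, pres_spaceid,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

   /* Mark "lat" as a dimension scale and attach it to the first dimension
      of "pres"; this is how HDF5 1.8 represents a shared dimension. */
   H5DSset_scale(lat_scaleid, "lat");
   H5DSattach_scale(pres_datasetid, lat_scaleid, 0);

   H5Dclose(pres_datasetid);
   H5Dclose(lat_scaleid);
   H5Sclose(pres_spaceid);
   H5Sclose(lat_spaceid);
   return H5Fclose(fileid);
}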

Page 27: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

NetCDF-4.1 Relaxes Some Restrictions for HDF5 Files

• Before netCDF-4.1, HDF5 files had to use creation ordering and dimension scales in order to be understood by netCDF-4.

• Starting with netCDF-4.1, read-only access is possible to HDF5 files with alphabetical ordering and no dimension scales (perhaps created with HDF5 1.6).

• An HDF5 file may have dimension scales for all dimensions, or for none, but not for just some of them.

Page 28: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

HDF5 C Code to Write HDF5 File

/* Create file. */
if ((fileid = H5Fcreate(FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) < 0) ERR;

/* Create the space for the dataset. */
dims[0] = LAT_LEN;
dims[1] = LON_LEN;
if ((pres_spaceid = H5Screate_simple(DIMS_2, dims, dims)) < 0) ERR;

/* Create a variable. It will not have dimension scales. */
if ((pres_datasetid = H5Dcreate(fileid, PRES_NAME, H5T_NATIVE_FLOAT, pres_spaceid, H5P_DEFAULT)) < 0) ERR;

if (H5Dclose(pres_datasetid) < 0 ||
    H5Sclose(pres_spaceid) < 0 ||
    H5Fclose(fileid) < 0) ERR;

Page 29: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

NetCDF C Code to Read HDF5 File

/* Read the data with netCDF. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in, &unlimdim_in)) ERR;
if (ndims_in != 2 || nvars_in != 1 || natts_in != 0 || unlimdim_in != -1) ERR;
if (nc_close(ncid)) ERR;

Page 30: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Future Plans for HDF5 Interoperability

• More testing.

• Proper handling of reference types. This will probably require an extension of the netCDF APIs.

• Better handling of strange group structures, if this proves necessary to read important data.

Page 31: NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Summary

• With the 4.1 release, the netCDF C/Fortran/C++ libraries allow read-only access to some existing HDF4 and HDF5 data archives.

• The intention is not to develop a completely general translation, but instead to focus on datasets of significance to the Earth science community.

• Write capability is quite possible, but we don't plan on providing it because the demand for this is low.