introduction to netcdf-4

Introduction to NetCDF4 MuQun Yang The HDF Group 11/6/2007 HDF and HDF-EOS Workshop XI, Landover, MD

DESCRIPTION

This tutorial targets NetCDF application developers and users who are interested in the NetCDF-4 library features based on the underlying HDF5 library and file format. We will discuss how to use new NetCDF-4/HDF5 features and APIs to achieve optimal I/O performance.

TRANSCRIPT

Page 1: Introduction to NetCDF-4

Introduction to NetCDF4

MuQun Yang, The HDF Group

11/6/2007, HDF and HDF-EOS Workshop XI, Landover, MD

Page 2: Introduction to NetCDF-4

Notes

• Requires basic knowledge of HDF5 and netCDF-3
• Covers general NetCDF-4 concepts: several new features and their performance
• Covers some NetCDF-4 APIs but does not review all new APIs
• Is not a netCDF-3 tutorial

Page 3: Introduction to NetCDF-4

Contents

• History review
• Overview of NetCDF-4 features, builds, etc.
• Performance issues
• Suggestions for users

Page 4: Introduction to NetCDF-4

History Review

• Funded by the NASA ESTO AIST Program
• Joint project between Unidata and The HDF Group
• Uses HDF5 as the storage layer of netCDF

Page 5: Introduction to NetCDF-4

NetCDF-4/HDF5 Goals

• Combine desirable characteristics of netCDF and HDF5, while taking advantage of their separate strengths:
- Widespread use and simplicity of netCDF
- Generality and performance of HDF5
• Preserve format and API compatibility for netCDF users
• Demonstrate benefits of the combination in advanced Earth science modeling efforts

(From Russ Rew et al.'s talk at the VII HDF and HDF-EOS Workshop)

Page 6: Introduction to NetCDF-4

NetCDF-4 Architecture

[Figure: architecture diagram. Labels: netCDF-3 applications, netCDF-4 applications, HDF5 applications; netCDF-3 interface, netCDF-4 library, HDF5 library; netCDF files, netCDF-4 HDF5 files, HDF5 files.]

(From Russ Rew et al.'s talk at the VII HDF and HDF-EOS Workshop)

Page 7: Introduction to NetCDF-4


Page 8: Introduction to NetCDF-4

Contents

• History review
• Overview of NetCDF-4 features, builds, etc.
• Performance issues
• Suggestions for users

Page 9: Introduction to NetCDF-4

Current Status

• http://www.unidata.ucar.edu/software/netcdf/netcdf-4/

• 4.0 beta 1, based on HDF5 1.8 beta 1, was released in April 2007
• The 4.0 beta 2 release is coming soon

Page 10: Introduction to NetCDF-4

Compilers, platforms and language support

• Platforms: Linux, IBM AIX, Sun OS, HP-UX, OSF1, IRIX, Cygwin
• Programming languages: C/C++ and Fortran
• Compilers: vendor compilers on the supported platforms
• Watch for snapshots: http://www.unidata.ucar.edu/software/netcdf/builds/snapshot/netcdf-4

Page 11: Introduction to NetCDF-4

Configuration

• Only netCDF-3 is built if you just type ./configure
• Before building NetCDF-4, you must:
- install HDF5 1.8 beta 1 or later (note: parallel HDF5 needs a separate build)
- install the zlib library if using data compression
• To build the sequential version:
./configure --enable-netcdf-4 --with-hdf5=/HDF5path --with-zlib=/zlibpath
• To build the parallel version:
./configure --enable-netcdf-4 --enable-parallel --disable-shared --with-hdf5=/parallelHDF5path --with-zlib=/zlibpath

Parallel NetCDF-4 needs more work. It has been tested on IBM AIX.

Page 12: Introduction to NetCDF-4

API Changes

• Existing APIs: essentially no differences, but with new flags
NetCDF3: nc_create(FILE_NAME, NC_NOCLOBBER, &ncid);
NetCDF4: nc_create(FILE_NAME, NC_NETCDF4, &ncid);

• New APIs are added for new features, such as:
nc_def_var_deflate(ncid, varid, shuffle, deflate, deflate_level)

Hereafter, blue text in API signatures marks an output parameter.

Page 13: Introduction to NetCDF-4

Overview of NetCDF4 new features

• Data types
- Compound data type
- Variable-length type
• Groups
• Multiple unlimited dimensions
• Compression
• Parallel I/O

Page 14: Introduction to NetCDF-4

A compound datatype example

types:
  compound wind_vector_t {
    float eastward ;
    float northward ;
  }
dimensions:
  lat = 18 ;
  lon = 36 ;
  pres = 15 ;
  time = 4 ;
variables:
  wind_vector_t gwind(time, pres, lat, lon) ;
    gwind:long_name = "geostrophic wind vector" ;
    gwind:standard_name = "geostrophic_wind_vector" ;
data:
  gwind = {1, -2.5}, {-1, 2}, {20, 10}, {1.5, 1.5}, ...;

Page 15: Introduction to NetCDF-4

Variable length type

Simple example: ragged array

types:
  float(*) row_of_floats ;
dimensions:
  m = 50 ;
variables:
  row_of_floats ragged_array(m) ;

Page 16: Introduction to NetCDF-4

An Example – variable length and compound datatype

#include <netcdf.h>

struct sea_sounding {
    int sounding_no;
    nc_vlen_t temp_vl;
} data[DIM_LEN];

int ncid, dimid, varid;
nc_type temp_typeid, sounding_typeid;

/* 1. Create a netCDF-4 file. */
nc_create(FILE_NAME, NC_NETCDF4, &ncid);

/* 2. Create the vlen type, with a float base type. */
nc_def_vlen(ncid, "temp_vlen", NC_FLOAT, &temp_typeid);

/* 3. Create the compound type to hold a sea sounding. */
nc_def_compound(ncid, sizeof(struct sea_sounding), "sea_sounding", &sounding_typeid);
nc_insert_compound(ncid, sounding_typeid, "sounding_no",
                   NC_COMPOUND_OFFSET(struct sea_sounding, sounding_no), NC_INT);
nc_insert_compound(ncid, sounding_typeid, "temp_vl",
                   NC_COMPOUND_OFFSET(struct sea_sounding, temp_vl), temp_typeid);

/* 4. Define a dimension, and a 1D var of sea sounding compound type. */
nc_def_dim(ncid, DIM_NAME, DIM_LEN, &dimid);
nc_def_var(ncid, "fun_soundings", sounding_typeid, 1, &dimid, &varid);

/* 5. Write our array of sounding data to the file, all at once. */
nc_put_var(ncid, varid, data);

/* 6. Close the file. */
nc_close(ncid);
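The field offsets passed to nc_insert_compound matter because the library copies each field from that position inside the in-memory struct. The sketch below shows the kind of values involved, assuming NC_COMPOUND_OFFSET behaves like the standard offsetof macro. It does not call netCDF: vlen_sketch_t, sea_sounding_sketch and sounding_layout are hypothetical stand-ins, and the layout of vlen_sketch_t (a length plus a data pointer) is an assumption about nc_vlen_t.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for nc_vlen_t (assumption: a length plus a data pointer).
   Real code should use the type from netcdf.h. */
typedef struct {
    size_t len;
    void *p;
} vlen_sketch_t;

/* Same field order as the sea_sounding struct on the slide. */
struct sea_sounding_sketch {
    int sounding_no;
    vlen_sketch_t temp_vl;
};

/* Offsets and total size that would be handed to the compound-type calls. */
typedef struct {
    size_t sounding_no_offset;
    size_t temp_vl_offset;
    size_t total_size;
} sounding_layout_t;

sounding_layout_t sounding_layout(void)
{
    sounding_layout_t layout;

    layout.sounding_no_offset = offsetof(struct sea_sounding_sketch, sounding_no);
    layout.temp_vl_offset = offsetof(struct sea_sounding_sketch, temp_vl);
    layout.total_size = sizeof(struct sea_sounding_sketch);

    /* The last field must fit inside the struct, padding included. */
    assert(layout.temp_vl_offset + sizeof(vlen_sketch_t) <= layout.total_size);
    return layout;
}
```

Because the compiler may insert padding after sounding_no, the second offset is not necessarily sizeof(int); taking offsets from the struct itself, rather than adding field sizes by hand, avoids that mistake.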

Page 17: Introduction to NetCDF-4

Group

• Use of Groups is optional, with backward compatibility maintained by putting everything in the top-level unnamed Group.

• Unlike HDF5, netCDF-4 requires that Groups form a strict hierarchy.

• Potential uses for Groups include:
o Factoring out common information
o Containers for data within regions, ensembles
o Organizing a large number of variables
o Providing name spaces for multiple uses of the same names for dimensions, variables, attributes
o Modeling large hierarchies

Page 18: Introduction to NetCDF-4

Group APIs

• APIs for creating groups (define APIs)
nc_def_grp(parent_group_id, group_name, &group_id)
Examples:
nc_def_grp(ncid, HENRY_VII, &henry_vii_id)
nc_def_grp(henry_vii_id, MARGARET, &margaret_id)

• APIs for inquiring information from a group (inquiry APIs)
Number of child groups: nc_inq_grps(group_id, &num_grps, NULL);
Child group ID list: nc_inq_grps(group_id, NULL, group_id_list);
Child group name: nc_inq_grpname(group_id_list[0], children_group_name);

Page 19: Introduction to NetCDF-4

Multiple Unlimited Dimension APIs

• APIs for defining multiple unlimited dimensions
Old API with the same flag: nc_def_dim(ncid, dimension_name, NC_UNLIMITED, int *idp)
Examples:
nc_def_dim(ncid, dimname_1, NC_UNLIMITED, &dimid[0])
nc_def_dim(ncid, dimname_2, NC_UNLIMITED, &dimid[1])

• APIs for inquiring about multiple unlimited dimensions
Old API: nc_inq_unlimdim(ncid, int *idp)
New API: nc_inq_unlimdims(ncid, int *nunlimdimsp, int unlimdimid[])

• How to use the new API
1) First obtain the number of unlimited dimensions: nc_inq_unlimdims(ncid, &nunlimdims, NULL)
2) Then obtain the list of unlimited dimension IDs: nc_inq_unlimdims(ncid, &nunlimdims, unlimdimid)
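The two-step inquiry (ask for the count, allocate, then ask for the list) can be sketched as follows. So that the sketch runs without the netCDF library, mock_inq_unlimdims is a hypothetical stand-in that only mimics the calling convention described above and always reports two dimension IDs (0 and 3); list_unlimited_dims is likewise an illustrative helper, not a library function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for nc_inq_unlimdims (no ncid argument here).
   Either output pointer may be NULL, matching the convention above. */
static int mock_inq_unlimdims(int *nunlimdimsp, int *unlimdimidsp)
{
    static const int ids[] = {0, 3};

    if (nunlimdimsp != NULL)
        *nunlimdimsp = 2;
    if (unlimdimidsp != NULL) {
        unlimdimidsp[0] = ids[0];
        unlimdimidsp[1] = ids[1];
    }
    return 0; /* 0 plays the role of NC_NOERR */
}

/* Returns a malloc'd array of unlimited dimension IDs (caller frees) and
   stores its length in *countp. Returns NULL on error or if there are none. */
int *list_unlimited_dims(int *countp)
{
    int n = 0;
    int *ids;

    assert(countp != NULL);
    *countp = 0;

    /* Step 1: ask only for the count. */
    if (mock_inq_unlimdims(&n, NULL) != 0 || n <= 0)
        return NULL;

    ids = malloc((size_t)n * sizeof *ids);
    if (ids == NULL)
        return NULL;

    /* Step 2: ask again, this time with room for the IDs. */
    if (mock_inq_unlimdims(&n, ids) != 0) {
        free(ids);
        return NULL;
    }
    *countp = n;
    return ids;
}
```

In real code the two mock calls would be replaced by nc_inq_unlimdims with the file or group ID as first argument, and the return value compared against NC_NOERR.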

Page 20: Introduction to NetCDF-4

Compression

• Deflate now
• Scaleoffset, N-bit and maybe szip in the future
• Only one routine needs to be added:
nc_def_var_deflate(int ncid, int varid, int shuffle, int deflate, int deflate_level);
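The shuffle argument turns on a byte-shuffle step before deflate. As a rough illustration of the idea (a sketch of the general technique, not the HDF5 filter's source), the code below regroups the bytes of fixed-size elements so that all first bytes come first, then all second bytes, and so on; for slowly varying numeric data this tends to produce long runs of similar bytes, which deflate compresses well. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Regroup bytes: byte j of element i moves to position j * nelem + i. */
void shuffle_bytes(const unsigned char *in, unsigned char *out,
                   size_t nelem, size_t elem_size)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < elem_size; j++)
            out[j * nelem + i] = in[i * elem_size + j];
}

/* Inverse of shuffle_bytes: restores the original element layout. */
void unshuffle_bytes(const unsigned char *in, unsigned char *out,
                     size_t nelem, size_t elem_size)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t i = 0; i < nelem; i++)
        for (size_t j = 0; j < elem_size; j++)
            out[i * elem_size + j] = in[j * nelem + i];
}
```

For two 4-byte elements with bytes 1 2 3 4 and 5 6 7 8, the shuffled order is 1 5 2 6 3 7 4 8. The step is lossless and is undone automatically on read when the filter is used through the library.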

Page 21: Introduction to NetCDF-4

Compression example code

----- Data writing -----
1. Define the variable
nc_def_var(ncid, VAR_BYTE_NAME, NC_BYTE, 2, dimids, &byte_varid);
2. Set deflate compression
nc_def_var_deflate(ncid, byte_varid, 0, 1, DEFLATE_LEVEL_3);
3. Write the data
nc_put_var_schar(ncid, byte_varid, (signed char *)byte_out);

----- Data reading -----
nc_get_var_schar(ncid, byte_varid, (signed char *)byte_in);

Page 22: Introduction to NetCDF-4

Parallel IO

• Supports either collective or independent access
• Supports MPI-IO or MPI-POSIX I/O via parallel HDF5
• Special functions are used to create/open a netCDF file in parallel.

Page 23: Introduction to NetCDF-4

New APIs to do parallel IO

• nc_create_par
nc_create_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp)
“mode” must be NC_NETCDF4|NC_MPIIO or NC_NETCDF4|NC_MPIPOSIX

• nc_var_par_access
nc_var_par_access(int ncid, int varid, int data_access)
data_access can be either NC_COLLECTIVE or NC_INDEPENDENT

• nc_open_par
nc_open_par(const char *path, int mode, MPI_Comm comm, MPI_Info info, int *ncidp)
“mode” must be either NC_MPIIO or NC_MPIPOSIX

Page 24: Introduction to NetCDF-4

Parallel IO Programming Model

Data writing:

/* 1. Initialize MPI. */
MPI_Init(&argc, &argv);

/* 2. Create a parallel netCDF-4 file. */
nc_create_par(FILE, NC_NETCDF4|NC_MPIIO, comm, info, &ncid);
/* (Dimension and variable definitions that produce v1id are not shown on the slide.) */
nc_var_par_access(ncid, v1id, NC_COLLECTIVE);

/* 3. Write data. */
nc_put_vara_int(ncid, v1id, start, count, data);

/* 4. Close the file. */
nc_close(ncid);

/* 5. Shut down MPI. */
MPI_Finalize();

Data reading: use nc_open_par instead of nc_create_par.
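In the writing model above, each MPI process passes its own start and count to nc_put_vara_int. One common way to choose them, shown here as a sketch under the assumption of a simple block decomposition along one dimension, is to split n elements as evenly as possible across the processes. The function name block_decompose is hypothetical, and in a real program rank and nprocs would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>
#include <stddef.h>

/* Split n elements over nprocs processes. The first (n % nprocs) ranks
   get one extra element, so the blocks are contiguous and cover [0, n). */
void block_decompose(size_t n, int nprocs, int rank,
                     size_t *start, size_t *count)
{
    size_t p, r, base, extra;

    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    assert(start != NULL && count != NULL);

    p = (size_t)nprocs;
    r = (size_t)rank;
    base = n / p;   /* elements every rank gets */
    extra = n % p;  /* leftover elements, one each to the lowest ranks */

    *count = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}
```

With 10 elements on 4 processes this gives blocks of 3, 3, 2 and 2 elements starting at 0, 3, 6 and 8. For a multidimensional variable the same split would be applied to one dimension, with the other entries of start set to 0 and of count set to the full dimension length.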

Page 25: Introduction to NetCDF-4

Other features

• Data types
- More atomic data types: unsigned integers (1, 2, 4 and 8 bytes)
- Strings: replace character arrays
- Enums, opaque types
- User-defined data types
• Fletcher32 checksum filter
• UTF-8 support
• Reader-makes-right conversion
• Using HDF5 dimension scales

Page 26: Introduction to NetCDF-4

Content

• History review
• Overview of NetCDF-4 features, builds, etc.
• Performance issues
• Suggestions for users

Page 27: Introduction to NetCDF-4

<2 %

NetCDF4 Data Compression: Size

[Chart not preserved in this transcript; the “<2 %” label above is its only surviving annotation.]

Page 28: Introduction to NetCDF-4

NetCDF4 Data Compression: Data Write time


Page 29: Introduction to NetCDF-4

NetCDF4 Data Compression: Data Read Time


Page 30: Introduction to NetCDF-4

WRF Output in HDF5: File Size

[Bar chart: file size in MB (axis 0 to 2500) for four model runs (Run 1 to Run 4), comparing no compression with szip compression.]

Page 31: Introduction to NetCDF-4

WRF Output in HDF5: Data Writing Time

[Bar chart: data write time in minutes (axis 0 to 6) for four model runs (Run 1 to Run 4), comparing no compression with szip compression.]

Page 32: Introduction to NetCDF-4

EUMETNET OPERA Report in 2006

They evaluated the following data formats: FM 92 GRIB, NORDRAD, Universal Format, netCDF, HDF4, HDF5, XML and Scalable Vector Graphics (SVG), and GeoTIFF

Their recommendation:
• Based on the results of the detailed evaluation, HDF5 is recommended for consideration as an official European standard format for weather radar data and products.

Why?
• Compared to other formats, HDF5’s compression algorithm (ZLIB) is more efficient…
• A file format with efficient compression and platform independence is essential

PyTables

One of the beauties of PyTables is that it supports compression on tables and arrays

Page 33: Introduction to NetCDF-4

Evaluation of Parallel NetCDF4 Performance

• Regional Oceanographic Modeling System (ROMS)
• History file writer in parallel NetCDF4 (PnetCDF4)
• History file writer in parallel NetCDF from Argonne (PnetCDF)
• Data: 60 1D-4D double-precision float and integer arrays

Page 34: Introduction to NetCDF-4

PnetCDF4 and PnetCDF performance comparison

• Fixed problem size = 995 MB
• Performance of PnetCDF4 is close to PnetCDF

[Chart: bandwidth in MB/s (axis 0 to 160) versus number of processors (0 to 144) for PnetCDF collective and NetCDF4 collective.]

Page 35: Introduction to NetCDF-4

ROMS Output with Parallel NetCDF4

[Chart: bandwidth in MB/s (axis 0 to 300) versus number of processors (0 to 144) for output sizes of 995 MB and 15.5 GB.]

• I/O performance improves as the file size increases.
• It can provide decent I/O performance for big problem sizes.

Page 36: Introduction to NetCDF-4

Chunking

• Use chunking wisely
• Review the chunking tips for HDF5
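Why chunk shape matters can be seen with a little arithmetic: with chunked storage, a read has to fetch every chunk that the requested region overlaps. The sketch below counts those chunks for a 2D region. It is plain arithmetic, not a netCDF or HDF5 call, and the function names and the example sizes in the note that follows are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks overlapped along one dimension by [start, start + count). */
static size_t chunks_along(size_t start, size_t count, size_t chunk)
{
    assert(chunk > 0);
    if (count == 0)
        return 0;
    return (start + count - 1) / chunk - start / chunk + 1;
}

/* Number of chunks overlapped by a 2D region of a chunked variable. */
size_t chunks_touched_2d(const size_t start[2], const size_t count[2],
                         const size_t chunk[2])
{
    return chunks_along(start[0], count[0], chunk[0]) *
           chunks_along(start[1], count[1], chunk[1]);
}
```

For a 1000 x 1000 variable, reading a single column (count 1000 x 1) touches 1000 chunks when each chunk is one full row (1 x 1000), but only 10 chunks when chunks are 100 x 100 squares. Matching the chunk shape to the dominant access pattern is the main point of the HDF5 chunking tips.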

Page 37: Introduction to NetCDF-4

Content

• History review
• Overview of NetCDF-4 features, builds, etc.
• Performance issues
• Suggestions for users

Page 38: Introduction to NetCDF-4

NetCDF Classic Model


Page 39: Introduction to NetCDF-4

Using the NetCDF Classic Model

• NetCDF-4 files can be created with the NC_CLASSIC_MODEL flag. This enforces the rules of the classic netCDF data model on this file.
nc_create(FILE_NAME, NC_NETCDF4|NC_CLASSIC_MODEL, &ncid)

• Once a classic model file, always a classic model file. The flag sticks with the file and there is no way to change it within the netCDF API.

• Classic model files don't use any elements of the expansion of the data model in netCDF-4. They don't have groups, user-defined types, multiple unlimited dimensions, or the new atomic types.

• Since they conform to the classic model, they can be read and understood by any existing netCDF software (as soon as that software upgrades to netCDF-4 and HDF5 1.8.0).

• NetCDF-4 features which don't affect the data model are still available: compression, parallel I/O.


Page 40: Introduction to NetCDF-4

HDF5 Features not in current NetCDF4.0

• No Scaleoffset, N-bit or szip filters (planned for the 4.1 release)
• No support for user-defined filters
• Can only read HDF5 files that have dimension scales
• Can only write data in chunked storage
• No Fortran 90 APIs
• No corresponding APIs for optimizations
- cache, MPI-IO

Page 41: Introduction to NetCDF-4

NetCDF 4.1 Plan

• http://www.unidata.ucar.edu/software/netcdf/netcdf-4/req_4_1.html


Page 42: Introduction to NetCDF-4

NetCDF4 or HDF5: which one should I use?

Evaluate the following:
• Familiarity
• Features
• Performance
• Compatibility
• Release/feature lags

Page 43: Introduction to NetCDF-4

Priority and Recommendation

• Priority: high performance plus many advanced HDF5 features. Recommendation: HDF5, definitely.
• Priority: care about performance, possibly need to use many new advanced features. Recommendation: HDF5, maybe.
• Priority: avoid the transition cost from NetCDF to HDF5. Recommendation: NetCDF4, maybe.
• Priority: (1) just need one or two HDF5 features (compression, parallel I/O) for intensive NetCDF applications, using NetCDF4/CLASSIC_MODEL; or (2) existing NetCDF software or applications that don’t care about performance. Recommendation: NetCDF4, definitely.

Based on stability of NetCDF4

Page 44: Introduction to NetCDF-4

More NetCDF4 information

• Release and snapshot: http://www.unidata.ucar.edu/software/netcdf/netcdf-4/

• Tutorial in the 2007 NetCDF workshop: http://www.unidata.ucar.edu/software/netcdf/workshops/2007/

• Paper in the 2006 AMS annual meeting: http://www.unidata.ucar.edu/software/netcdf/papers/2006-ams.pdf

Page 45: Introduction to NetCDF-4

Acknowledgements

• Thanks to Russ Rew and Ed Hartnett from Unidata for generously allowing me to use their slides and for sharing their compression performance results in this workshop
• Some content describing the new features is copied from the 2007 Unidata NetCDF workshop
• The radar NetCDF data compression performance results were provided by Ed Hartnett at Unidata