HDF Update

Mike Folk, National Center for Supercomputing Applications. HDF and HDF-EOS Workshop IX, December 1, 2005.


DESCRIPTION

Update on HDF, including recent changes to the software, upcoming releases, collaborations, future plans. Will include an overview of the upcoming HDF5 1.8 release, and updates on the netCDF4/HDF5 merge, HDF5 support for indexing, BioHDF, the HDF5-Storage Resource Broker project, and the HDF spin-off THG.

TRANSCRIPT

Page 1: HDF Update

HDF Update

Mike Folk

National Center for Supercomputing Applications

HDF and HDF-EOS Workshop IX

December 1, 2005

Page 2: HDF Update

Outline

• Organizational info

• HDF Software Update

• Other Activities of Interest

Page 3: HDF Update

Organizational info

Page 4: HDF Update

The HDF Team

Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Fang Guo, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird

Raymond Lu, John Mainzer, Pedro Nunes, Elena Pourmal, Binh-minh Ribler, Eric Shapiro, Rishi Sinha, Arash Termehchy, Kent Yang

And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.

Page 5: HDF Update

The HDF Group is Moving

Page 6: HDF Update

“The HDF Group” = “THG”

Page 7: HDF Update

THG

• Why spin off from U of Illinois?
  – Creating a sustainable organization
  – We do more than R&D

• THG already exists

Page 8: HDF Update

How will THG be different from the NCSA HDF Group?

• Business model

• Location

• Staff

• THG – NCSA – UIUC relations

• Effect on NASA and other affiliations

• Intellectual property

Page 9: HDF Update

HDF Software Update

Page 10: HDF Update

Major software milestones since Oct. 2004

[Timeline figure spanning November 2004 to November 2005, marking the following releases]

• HDF Java 2.1

• HDF Web browser plug-in

• HDF5 1.6.5

• HDF 4.2r1

• HDF5 1.6.4

• HDF4-to-HDF5 conversion tools 1.2

• HDF Java 2.2

Page 11: HDF Update

Release highlights

Page 12: HDF Update

HDF 4.2r1 – February 2005

• Szip compression fixed

• Windows
  – hdiff and hrepack added
  – Config, build, testing procedures improved

• h4fc utility fixed

Page 13: HDF Update

HDF 4.2r1 – new compilers and platforms

• Mac OS X
  – Fortran IBM xlf v. 8.1
  – Absoft f95 v. 8.2

• AMD Opteron

• Cray TS IEEE

• Linux 2.4
  – Absoft Fortran f95 v. 9.0
  – PGI C and Fortran
  – Intel C and Fortran

Page 14: HDF Update

HDF5-1.6.4 – March 2005

• High-Level (HL) library
  – Some new C APIs added
  – Fortran APIs added
  – HL library now built and installed by default

• Library built and tested with SZIP 2.0

• Many changes to improve library performance
  – Especially for variable length types and metadata cache

• H5jam – a new utility
  – Allows a text file to be added to the "user block" at the beginning of an HDF5 file
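The user block idea above can be sketched in a few lines of Python. This is only an illustration of the layout, not the H5jam implementation: it assumes the HDF5 rule that a user block is a power of two of at least 512 bytes, and the function names `userblock_size` and `jam` are hypothetical. The "file" here is just the 8-byte HDF5 signature standing in for real file contents.

```python
def userblock_size(payload_len: int) -> int:
    """Smallest power of two >= 512 that can hold payload_len bytes."""
    size = 512
    while size < payload_len:
        size *= 2
    return size


def jam(text: bytes, hdf5_bytes: bytes) -> bytes:
    """Pad text with zero bytes to a valid user block size, then append the HDF5 bytes."""
    size = userblock_size(len(text))
    return text + b"\0" * (size - len(text)) + hdf5_bytes


# The 8-byte HDF5 format signature; a real file continues with the superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

jammed = jam(b"Notes about this file\n", HDF5_SIGNATURE)
offset = jammed.find(HDF5_SIGNATURE)  # where the HDF5 data now begins
```

Because the text sits in front of the HDF5 signature, ordinary text tools can read it while the HDF5 content follows at the padded offset.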

Page 15: HDF Update

Platforms to be dropped in future releases

• Operating systems
  – Solaris 2.8
  – HPUX B.11.00
  – Crays T3E and T90
  – Linux RH 7.* and 8.*
  – Windows 2000

• Compilers
  – We use the latest versions of vendors' compilers as they become available and drop the previous ones

Page 16: HDF Update

Platforms to be added

• Systems
  – Solaris 2.10
  – Cray X1
  – Cray XT3
  – NEC SX6
  – HP 64-bit (HPUX 11.23)
  – Mac OS 10.4

• Compilers
  – gcc 4.*
  – HDF5 Fortran: Lahey, NAG, G95
  – MPI-2

Page 17: HDF Update

Coming next: Major release HDF5 1.8

• Windows MPICH support: prototype

• Integer to float conversions
  – Will support integer to float conversions during I/O
  – http://hdf.ncsa.uiuc.edu/RFC/dtype_conv_overflow/Overflow.html

• New error-handling API

• Dimension scales
  – Similar to dimension scales in HDF4
  – http://hdf.ncsa.uiuc.edu/RFC/H5DimScales/H5dimscale_Specification_1_0-5.pdf

Page 18: HDF Update

• N-bit compression filter
  – Compact storage for user-defined datatypes.
  – http://hdf.ncsa.uiuc.edu/RFC/NBitPacking/NBitPacking.html

• Offset+size storage filter
  – Performs a scale and/or offset operation on each data value, truncating the resulting value to a lesser number of bits before storing it.
  – http://hdf.ncsa.uiuc.edu/RFC/ScaleOffsetCompress/ScaleOffsetCompress.html
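A rough sketch of the scale-and-offset idea described above, in plain Python. It is not the HDF5 filter itself: the function names and the choice of the minimum value as the offset are assumptions made for illustration, and the sketch only reports how many bits each stored value would need rather than packing them.

```python
def scale_offset_encode(values, scale_factor=1):
    """Scale each value, subtract the minimum, and report the bits needed per value."""
    scaled = [round(v * scale_factor) for v in values]
    offset = min(scaled)
    residuals = [s - offset for s in scaled]
    bits = max(residuals).bit_length()  # bits needed for the largest residual
    return offset, bits, residuals


def scale_offset_decode(offset, residuals, scale_factor=1):
    """Undo the offset and scale to recover (approximately) the original values."""
    return [(r + offset) / scale_factor for r in residuals]


# Four values near 1000 need only 3 bits each once the offset is removed.
offset, bits, residuals = scale_offset_encode([1000, 1003, 1001, 1007])
```

With a `scale_factor` such as 10 or 100, the same steps would apply to floating-point data kept to a fixed number of decimal digits, at the cost of losing the remaining precision.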

Page 19: HDF Update

• Group revisions
  – Option to access objects according to creation order
  – Improved performance for groups containing a large number of objects.
  – http://hdf.ncsa.uiuc.edu/RFC/ReviseGroups/

• Improved metadata cache
  – New metadata cache improves performance and memory usage in the library.
  – Apps that access files with a large number of objects should see significant performance improvement and should use less memory.

Page 20: HDF Update

• Data transformation filter
  – Performs data transformation during I/O operations.
  – Transform expressed by algebraic formula (e.g. a*x + b)
  – http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/html/RM_H5P.html#Property-SetDataTransform

• Ph5diff – parallel h5diff
  – Compares two files in an MPI parallel environment.
  – Compares multiple datasets simultaneously.
  – http://hdf.ncsa.uiuc.edu/RFC/PH5DIFF/
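To make the data transformation bullet concrete, here is a minimal Python sketch of applying a formula of the form a*x + b to each value as a buffer is transferred. The names `linear_transform` and `transfer` are hypothetical and stand in for the library's I/O path; the real feature is configured through an HDF5 property list, which this sketch does not attempt to reproduce.

```python
def linear_transform(a, b):
    """Build a function that computes a*x + b for a single value x."""
    def apply(x):
        return a * x + b
    return apply


def transfer(buffer, transform=None):
    """Copy a buffer as an I/O call would, applying the transform to each value if given."""
    if transform is None:
        return list(buffer)
    return [transform(x) for x in buffer]


# Hypothetical calibration: stored value = 2 * raw count + 10.
calibrated = transfer([0, 1, 5], linear_transform(2, 10))
```

The point of doing this during I/O is that the application never holds a second, transformed copy of the data in its own code.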

Page 21: HDF Update

• HDFpacket API
  – Data collected in “packets”
  – “Horizontal” view, per time step
  – Efficient access to fixed- and variable-length records
  – http://hdf.ncsa.uiuc.edu/RFC/HDF5Packet/Tech_reprt_HDF5Packet.pdf

• Possible: HDFtime_history API
  – Archival, viewing, analysis
  – “Vertical” view, per parameter
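The difference between the "horizontal" packet view and the "vertical" time-history view can be shown with ordinary Python data structures. The parameter names (`altitude`, `speed`) and the function `to_time_history` are made up for this sketch and do not come from either API.

```python
# "Horizontal" view: one record (packet) per time step, holding every parameter.
packets = [
    {"time": 0, "altitude": 100, "speed": 50},
    {"time": 1, "altitude": 110, "speed": 55},
    {"time": 2, "altitude": 125, "speed": 61},
]


def to_time_history(records):
    """Regroup per-time-step records into one series per parameter."""
    history = {}
    for record in records:
        for name, value in record.items():
            history.setdefault(name, []).append(value)
    return history


# "Vertical" view: one series per parameter, spanning all time steps.
history = to_time_history(packets)
```

Appending packets suits fast ingest, while the per-parameter layout suits analysis that reads one parameter across a whole run.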

Page 22: HDF Update

SZIP integration with HDF4 and HDF5

• Development and integration completed
  – Includes libraries and tools

• SZIP documentation web page
  – http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/
  – Examples and performance studies for HDF5

Page 23: HDF Update

Parallel I/O and chunking

• Collective I/O – key to improving performance for parallel HDF5

• Current versions only allow collective I/O for regular selection in contiguous storage

• Expanding use of collective I/O in HDF5
  – For regular selection in chunked storage
  – For irregular selection for both chunked and contiguous storage
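One way to see why chunked storage complicates collective I/O is to map a regular selection onto chunks. The one-dimensional sketch below is an illustration only, not how the library works internally; the function name `chunks_touched` is hypothetical, and the parameters follow the usual start/stride/count/block description of a regular selection.

```python
def chunks_touched(start, stride, count, block, chunk_size):
    """Map each selected element index to the chunk that stores it (1-D case)."""
    touched = {}
    for i in range(count):
        first = start + i * stride
        for element in range(first, first + block):
            touched.setdefault(element // chunk_size, []).append(element)
    return touched


# Three blocks of two elements, starting at index 2 with stride 4, in chunks of five.
selection = chunks_touched(start=2, stride=4, count=3, block=2, chunk_size=5)
```

In contiguous storage the same selection is a single regular pattern in the file, whereas here it splits into one piece per chunk, and each piece has to be located separately.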

Page 24: HDF Update

Java and other tools

Page 25: HDF Update

Tools development

• HDF4
  – hrepack and hdiff performance improved

• H4 to H5 Conversion Tools
  – Updated to HDF4.2r1, HDF5-1.6.4

• H5jam
  – New tools to add/remove user block in front of file

• H5dump
  – Faster for files with large numbers of objects
  – Can dump contents of the boot block
  – Can dump dataset filters, storage layout, fill value

• Parallel h5diff
  – Enables h5diff to run in parallel

Page 26: HDF Update

HDF Java Products

Page 27: HDF Update

HDFView changes

• Support for Storage Resource Broker (SRB)
  – HDF5 object level access to remote files

• Display HDF5 compound datatypes with arrays

• Create/display HDF5 named datatypes

• Create links in HDF5

• Improve ability to manipulate palette

• Select row/column for xy plot in the table view

Page 28: HDF Update

New Functions in Java API

• Request an individual object without loading entire structure of file

• Send client request to SRB server and receive result from server

• Create HDF5 indexing table

• Query for HDF5 datasets

Page 29: HDF Update

HDF Web-browser Plug-in

• Extends browser to display HDF4/5 files

• A “lite” version of HDFView

• Analogous to PDF reader

• Fewer browsing features

• No editing features

• Windows Only

Page 30: HDF Update

Page 31: HDF Update

HDF Web-browser Plug-in

• Not an applet
  – It is downloaded and installed once
  – An applet is downloaded with each invocation

• http://hdf.ncsa.uiuc.edu/plugins/

Page 32: HDF Update

HDF-EOS module for HDFView

• Developed by HDF-EOS team

• Optional module for HDF-EOS files
  – Reads, displays HDF-EOS grid, swath, etc.
  – (Generic modules show native HDF5 objects)

• Tested with HDFView 2.3

• To do: get permission to release with HDFView

Page 33: HDF Update

Page 34: HDF Update

Future work for Java

• Add OPeNDAP client support to HDFView
  – Seamlessly retrieve data from any OPeNDAP server

• Support HDF5 Dimension Scales
  – Recognize geospatial coordinates

• Support for HDF5 Indexing
  – Create indexing table and query HDF5 datasets

• H5Gen
  – Generate HDF5 file from XML file

Page 35: HDF Update

Other Activities of Interest

Page 36: HDF Update

DOE/ASC*

• Massively parallel computing and I/O

• Complex data models and big data

• HDF5 a standard format for ASC apps

* “Advanced Simulation and Computing Program”

“ASC provides the integrating simulation and modeling capabilities and technologies needed …for future design assessment and certification of nuclear weapons and their components”

Page 37: HDF Update

Boeing

Page 38: HDF Update

Boeing: HDF5 for flight test data

• Commercial (Boeing 787) and military planes

• 787 active archive
  – HDFtime_history
  – 10 TB per flight-test day
  – Also post-testing data

• Must handle raw, real-time data
  – Variable-length datatypes/records
  – High speed ingest
  – HDFpacket API

Page 39: HDF Update

Boeing High-Level APIs

• HDFpacket (see above)

• HDFtime_history
  – Structured records for archive, analysis, viewing
  – “Vertical” view, per parameter

Page 40: HDF Update

Object encryption to support access control

• For Boeing

• Investigated the role of encryption in developing access control

• Developed a prototype, now being tested

Page 41: HDF Update

Indexing

Page 42: HDF Update

Projection Indexes in HDF5

• Standardize indexing in HDF5

• Make indexes portable

• Just a prototype

• See Rishi Sinha’s talk

Page 43: HDF Update

Product model data

Page 44: HDF Update

Product data exchange – STEP

• STEP is an ISO data transfer standard.

• Defines characteristics of product throughout its life cycle.

• Widely used in design and manufacturing.

• Uses EXPRESS data modeling language to describe data.


Page 45: HDF Update

STEP Limitations

• Currently text-based format

• Requires all the objects to be in memory

• Apps starting to produce very large data volumes

• EU looking for a binary equivalent for STEP

Page 46: HDF Update

HDF5 as binary format for STEP

• EU identified HDF5 as best candidate

• Prototype in the works
  – EXPRESS HDF5 mappings
  – Convert sample data collections

• Workshop at U of Illinois next week.

• National Archives also funding HDF study.

Page 47: HDF Update

Bioinformatics

Page 48: HDF Update

DNA sequencing workflows

• Diverse formats, some proprietary

• Highly redundant data

• Repeated file processing

• Disconnected programs

• In-core processing models

• Lack of persistence

Page 49: HDF Update

Multiple Levels of Information

[Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, Trace, SNP Score]

Page 50: HDF Update

HDF5 as binary format for bioinformatics

Page 51: HDF Update

netCDF and OPeNDAP

Page 52: HDF Update

netCDF-HDF Project

• Enhanced netCDF-4 Interface to HDF5
  – Combine features of netCDF and HDF5
  – Take advantage of their separate strengths

• Collaboration between NCSA and Unidata

• Currently in Alpha Release

Page 53: HDF Update

New OPeNDAP HDF5 Project

• Four parts
  – Bring existing prototype into conformance with the DAP2 NASA/ESE RFC
  – Develop a DAP4 server for HDF5
  – Develop server-side utilities to convert DAP4 data responses to an HDF5 file
  – Investigate an integrated DAP-aware HDF5 library that could provide seamless access to both local and remote data

• Funded by NASA ROSES “Advancing Collaborative Connections for Earth-Sun System Science”

Page 54: HDF Update

Archival formats

Page 55: HDF Update

Archival formats

• Ruth Duerr (NSIDC) initiated investigations

• How to preserve the content & performance features of complex scientific data formats

• At the same time provide the requisite simplicity needed for long term archival storage.

• Ruth will speak about this

Page 56: HDF Update

Hydroinformatics

Page 57: HDF Update

Hydroinformatics

• HDF5 as exchange format for hydroinformatics data
  – Groundswell of interest lately
  – Sometimes in combination with netCDF 4
  – Talk to Mike Folk

Page 58: HDF Update

“Hydroinformatics”

Page 59: HDF Update

Thank You

Page 60: HDF Update

Questions/comments?

Page 61: HDF Update

Information Sources

• HDF website
  – http://hdf.ncsa.uiuc.edu/

• HDF5 Information Center
  – http://hdf.ncsa.uiuc.edu/HDF5/

• HDF Helpdesk
  – [email protected]

• HDF users mailing list
  – [email protected]