HDF Update
DESCRIPTION
Update on HDF, including recent changes to the software, upcoming releases, collaborations, and future plans. Includes an overview of the upcoming HDF5 1.8 release and updates on the netCDF4/HDF5 merge, HDF5 support for indexing, BioHDF, the HDF5-Storage Resource Broker project, and the HDF spin-off THG.
TRANSCRIPT
- 1 -
HDF Update
Mike Folk
National Center for Supercomputing Applications
HDF and HDF-EOS Workshop IX
December 1, 2005
- 2 -
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
- 3 -
Organizational info
- 4 -
The HDF Team
Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Fang Guo, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird,
Raymond Lu, John Mainzer, Pedro Nunes, Elena Pourmal, Binh-minh Ribler, Eric Shapiro, Rishi Sinha, Arash Termehchy, Kent Yang
And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
- 7 -
The HDF Group is Moving
- 8 -
“The HDF Group” = “THG”
- 9 -
THG
• Why spin off from U of Illinois?
– Creating a sustainable organization
– We do more than R&D
• THG already exists
- 10 -
How will THG be different from the NCSA HDF Group?
• Business model
• Location
• Staff
• THG – NCSA – UIUC relations
• Effect on NASA and other affiliations
• Intellectual property
- 11 -
HDF Software Update
- 12 -
Major software milestones since Oct. 2004
[Timeline figure, Nov 2004 to Nov 2005. Milestones shown:]
• HDF Java 2.1
• HDF Web browser plug-in
• HDF5 1.6.5
• HDF 4.2r1
• HDF5 1.6.4
• HDF4-to-HDF5 conversion tools 1.2
• HDF Java 2.2
- 13 -
Release highlights
- 14 -
HDF 4.2r1 – February 2005
• Szip compression fixed
• Windows
– hdiff and hrepack added
– Config, build, testing procedures improved
• h4fc utility fixed
- 15 -
HDF 4.2r1 – new compilers and platforms
• Mac OS X
– Fortran IBM xlf v. 8.1
– Absoft f95 v. 8.2
• AMD Opteron
• Cray TS IEEE
• Linux 2.4
– Absoft Fortran f95 v. 9.0
– PGI C and Fortran
– Intel C and Fortran
- 16 -
HDF5-1.6.4 – March 2005
• High-Level (HL) library
– Some new C APIs added
– Fortran APIs added
– HL library now built and installed by default
• Library built and tested with SZIP 2.0
• Many changes to improve library performance
– Especially for variable-length types and metadata cache
• h5jam – a new utility
– Allows a text file to be added to the "user block" at the beginning of an HDF5 file
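As a rough illustration of the user block idea behind h5jam, the sketch below sizes and pads a block for a text payload. It relies on the HDF5 rule that a user block is either empty or a power of two of at least 512 bytes. The function names and the zero-padding choice are assumptions made for this example; this is not h5jam's actual implementation.

```python
def userblock_size(payload_len: int) -> int:
    """Return the smallest valid user block size that holds payload_len bytes.

    Valid HDF5 user block sizes are 0, or a power of two >= 512.
    """
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    if payload_len == 0:
        return 0
    size = 512
    while size < payload_len:
        size *= 2
    return size


def build_userblock(text: str) -> bytes:
    """Encode text and pad it to a valid user block size.

    Zero-byte padding is an assumption of this sketch.
    """
    payload = text.encode("utf-8")
    size = userblock_size(len(payload))
    return payload + b"\x00" * (size - len(payload))


if __name__ == "__main__":
    block = build_userblock("Provenance notes for this file.\n")
    print(len(block))
```

In this sketch the padded block would be written ahead of the HDF5 data, so plain-text tools can read the notes while HDF5 readers skip to the offset where the format signature begins.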
- 17 -
Platforms to be dropped in future releases
• Operating systems
– Solaris 2.8
– HPUX B.11.00
– Crays T3E and T90
– Linux RH 7.* and 8.*
– Windows 2000
• Compilers
– We use the latest versions of vendors' compilers as they become available and drop the previous ones
- 18 -
Platforms to be added
• Systems
– Solaris 2.10
– Cray X1
– Cray XT3
– NEC SX6
– HP 64-bit (HPUX 11.23)
– Mac OS 10.4
• Compilers
– gcc 4.*
– HDF5 Fortran: Lahey, NAG, G95
– MPI-2
- 19 -
Coming next: Major release HDF5 1.8
• Windows MPICH support: prototype
• Integer to float conversions
– Will support integer to float conversions during I/O
– http://hdf.ncsa.uiuc.edu/RFC/dtype_conv_overflow/Overflow.html
• New error-handling API
• Dimension scales
– Similar to dimension scales in HDF4
– http://hdf.ncsa.uiuc.edu/RFC/H5DimScales/H5dimscale_Specification_1_0-5.pdf
- 20 -
• N-bit compression filter
– Compact storage for user-defined datatypes.
– http://hdf.ncsa.uiuc.edu/RFC/NBitPacking/NBitPacking.html
• Offset+size storage filter
– Performs a scale and/or offset operation on each data value, truncating the resulting value to a smaller number of bits before storing it.
– http://hdf.ncsa.uiuc.edu/RFC/ScaleOffsetCompress/ScaleOffsetCompress.html
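The scale-and-offset idea above can be sketched in a few lines. This is only a conceptual illustration under simplifying assumptions (a decimal scale factor, the minimum value used as the offset, and hypothetical function names); it is not the filter's actual algorithm or storage layout.

```python
def scale_offset_encode(values, decimals):
    """Scale values by 10**decimals, subtract the minimum, and report
    how many bits are needed for the largest remaining value."""
    if not values:
        raise ValueError("need at least one value")
    scaled = [round(v * 10 ** decimals) for v in values]
    offset = min(scaled)
    deltas = [s - offset for s in scaled]
    bits = max(deltas).bit_length()  # bits needed per stored value
    return offset, bits, deltas


def scale_offset_decode(offset, decimals, deltas):
    """Reverse the encoding (exact only to the chosen number of decimals)."""
    return [(d + offset) / 10 ** decimals for d in deltas]


if __name__ == "__main__":
    temps = [20.1, 20.4, 21.0, 19.8]
    offset, bits, deltas = scale_offset_encode(temps, decimals=1)
    print(offset, bits, deltas)
    print(scale_offset_decode(offset, 1, deltas))
```

Values that would otherwise take a full 32 or 64 bits each can then be packed into `bits` bits, at the cost of losing precision beyond the chosen number of decimals.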
- 21 -
• Group revisions
– Option to access objects according to creation order
– Improved performance for groups containing a large number of objects.
– http://hdf.ncsa.uiuc.edu/RFC/ReviseGroups/
• Improved metadata cache
– New metadata cache improves performance and memory usage in the library.
– Apps that access files with a large number of objects should see significant performance improvement and should use less memory.
- 22 -
• Data transformation filter
– Performs data transformation during I/O operations.
– Transform expressed by algebraic formula (e.g. a*x + b)
– http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/html/RM_H5P.html#Property-SetDataTransform
• ph5diff – parallel h5diff
– Compares two files in an MPI parallel environment.
– Compares multiple datasets simultaneously.
– http://hdf.ncsa.uiuc.edu/RFC/PH5DIFF/
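To make the data transformation bullet above concrete, here is a minimal sketch of applying a formula of the form a*x + b to values as they pass from the application buffer to storage. The helper names are hypothetical and the "storage" is just a returned list; in the library itself the formula is supplied through a transfer property rather than a Python function.

```python
def make_linear_transform(a, b):
    """Build a function computing a*x + b for a single value."""
    def transform(x):
        return a * x + b
    return transform


def write_with_transform(buffer, transform):
    """Return the values that would be stored.

    The caller's buffer is left unchanged; only the stored copy is transformed.
    """
    return [transform(x) for x in buffer]


if __name__ == "__main__":
    # Fahrenheit to Celsius, written as a*x + b with a = 5/9 and b = -160/9.
    f_to_c = make_linear_transform(5 / 9, -160 / 9)
    readings_f = [32.0, 212.0]
    print(write_with_transform(readings_f, f_to_c))
    print(readings_f)
```

The point of the design is that unit conversions and similar rescalings happen inside the I/O path, so the application does not need a second copy of the data just to convert it.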
- 23 -
• HDFpacket API
– Data collected in “packets”
– “Horizontal” view, per time step
– Efficient access to fixed- and variable-length records
– http://hdf.ncsa.uiuc.edu/RFC/HDF5Packet/Tech_reprt_HDF5Packet.pdf
• Possible: HDFtime_history API
– Archival, viewing, analysis
– “Vertical” view, per parameter
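The difference between the "horizontal" (per time step) and "vertical" (per parameter) views can be sketched as a simple pivot. The record layout, field names, and function name below are invented for illustration and do not reflect the HDFpacket or HDFtime_history APIs.

```python
from collections import defaultdict


def to_time_history(packets):
    """Pivot per-time-step packets into per-parameter time series.

    Each packet is a dict with a "time" key plus any number of parameters,
    so packets may have different lengths (variable-length records).
    """
    history = defaultdict(list)
    for packet in packets:
        t = packet["time"]
        for name, value in packet.items():
            if name != "time":
                history[name].append((t, value))
    return dict(history)


if __name__ == "__main__":
    # Horizontal view: one record per time step, as data arrives.
    packets = [
        {"time": 0, "altitude": 100, "speed": 250},
        {"time": 1, "altitude": 120},
        {"time": 2, "altitude": 150, "speed": 260},
    ]
    # Vertical view: one series per parameter, for archive and analysis.
    for name, series in to_time_history(packets).items():
        print(name, series)
```

Ingest favors the horizontal layout because each arriving packet is appended as-is, while analysis of a single parameter over time favors the vertical layout.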
- 24 -
SZIP integration with HDF4 and HDF5
• Development and integration completed
– Includes libraries and tools
• SZIP documentation web page
– http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/
– Examples and performance studies for HDF5
- 25 -
Parallel I/O and chunking
• Collective I/O – key to improving performance for parallel HDF5
• Current versions only allow collective I/O for regular selection in contiguous storage
• Expanding use of collective I/O in HDF5
– For regular selection in chunked storage
– For irregular selection for both chunked and contiguous storage
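One reason chunked storage complicates collective I/O is that a single regular selection can span several chunks. The sketch below shows only the index arithmetic for finding which chunks a contiguous rectangular selection touches. The function name is hypothetical, stride is assumed to be 1, and no MPI or HDF5 calls are involved.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """List the chunk coordinates overlapped by a rectangular selection.

    start, count, and chunk_shape are per-dimension tuples; the selection
    covers indices start[d] .. start[d] + count[d] - 1 in each dimension.
    """
    if any(c <= 0 for c in count):
        raise ValueError("count must be positive in every dimension")
    per_dim = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # chunk holding the first selected index
        last = (s + c - 1) // ch   # chunk holding the last selected index
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


if __name__ == "__main__":
    # A 4x4 selection starting at (2, 3) in a dataset stored as 4x4 chunks.
    print(chunks_touched(start=(2, 3), count=(4, 4), chunk_shape=(4, 4)))
```

In a parallel setting each process would make such a selection, and a collective operation has to coordinate the accesses that fall on the same chunk.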
- 26 -
Java and other tools
- 27 -
Tools development
• HDF4
– hrepack and hdiff performance improved
• H4 to H5 Conversion Tools
– Updated to HDF4.2r1, HDF5-1.6.4
• h5jam
– New tools to add/remove user block in front of file
• h5dump
– Faster for files with large numbers of objects
– Can dump contents of the boot block
– Can dump dataset filters, storage layout, fill value
• Parallel h5diff
– Enables h5diff to run in parallel
- 28 -
HDF Java Products
- 29 -
HDFView changes
• Support for Storage Resource Broker (SRB)
– HDF5 object level access to remote files
• Display HDF5 compound datatypes with arrays
• Create/display HDF5 named datatypes
• Create links in HDF5
• Improve ability to manipulate palette
• Select row/column for xy plot in the table view
- 30 -
New Functions in Java API
• Request an individual object without loading entire structure of file
• Send client request to SRB server and receive result from server
• Create HDF5 indexing table
• Query for HDF5 datasets
- 31 -
HDF Web-browser Plug-in
• Extends browser to display HDF4/5 files
• A “lite” version of HDFView
• Analogous to PDF reader
• Fewer browsing features
• No editing features
• Windows only
- 32 -
- 33 -
HDF Web-browser Plug-in
• Not an applet
– It is downloaded and installed once
– An applet is downloaded with each invocation
• http://hdf.ncsa.uiuc.edu/plugins/
- 34 -
HDF-EOS module for HDFView
• Developed by HDF-EOS team
• Optional module for HDF-EOS files
– Reads, displays HDF-EOS grid, swath, etc.
– (Generic modules show native HDF5 objects)
• Tested with HDFView 2.3
• To do: get permission to release with HDFView
- 35 -
- 36 -
Future work for Java
• Add OPeNDAP client support to HDFView
– Seamlessly retrieve data from any OPeNDAP server
• Support HDF5 Dimension Scales
– Recognize geospatial coordinates
• Support for HDF5 Indexing
– Create indexing table and query HDF5 datasets
• H5Gen
– Generate HDF5 file from XML file
- 37 -
Other Activities of Interest
- 38 -
DOE/ASC*
• Massively parallel computing and I/O
• Complex data models and big data
• HDF5 a standard format for ASC apps
* “Advanced Simulation and Computing Program”
“ASC provides the integrating simulation and modeling capabilities and technologies needed … for future design assessment and certification of nuclear weapons and their components”
- 39 -
Boeing
- 40 -
Boeing
HDF5 for flight test data
• Commercial (Boeing 787) and military planes
• 787 active archive
– HDFtime_history
– 10 TB per flight-test day
– Also post-testing data
• Must handle raw, real-time data
– Variable-length datatypes/records
– High speed ingest
– HDFpacket API
- 41 -
Boeing High-Level APIs
• HDFpacket (see above)
• HDFtime_history
– Structured records for archive, analysis, viewing
– “Vertical” view, per parameter
- 42 -
Object encryption to support access control
• For Boeing
• Investigated the role of encryption in developing access control
• Developed a prototype, now being tested
- 43 -
Indexing
- 44 -
Projection Indexes in HDF5
• Standardize indexing in HDF5
• Make indexes portable
• Just a prototype
• See Rishi Sinha’s talk
- 45 -
Product model data
- 46 -
Product data exchange – STEP
• STEP is an ISO data transfer standard.
• Defines characteristics of product throughout its life cycle.
• Widely used in design and manufacturing.
• Uses EXPRESS data modeling language to describe data.
- 47 -
STEP Limitations
• Currently text-based format
• Requires all the objects to be in memory
• Apps starting to produce very large data volumes
• EU looking for a binary equivalent for STEP
- 48 -
HDF5 as binary format for STEP
• EU identified HDF5 as best candidate
• Prototype in the works
– EXPRESS-to-HDF5 mappings
– Convert sample data collections
• Workshop at U of Illinois next week.
• National Archives also funding HDF study.
- 49 -
Bioinformatics
- 50 -
DNA sequencing workflows
• Diverse formats, some proprietary
• Highly redundant data
• Repeated file processing
• Disconnected programs
• In-core processing models
• Lack of persistence
- 51 -
Multiple Levels of Information
[Figure labels: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, Trace, SNP Score]
- 52 -
HDF5 as binary format for bioinformatics
- 54 -
netCDF and OPeNDAP
- 55 -
netCDF-HDF Project
• Enhanced NetCDF-4 Interface to HDF5
– Combine features of netCDF and HDF5
– Take advantage of their separate strengths
• Collaboration between NCSA and Unidata
• Currently in alpha release
- 56 -
New OPeNDAP HDF5 Project
• Four parts
– Bring existing prototype into conformance with the DAP2 NASA/ESE RFC
– Develop a DAP4 server for HDF5
– Develop server-side utilities to convert DAP4 data responses to an HDF5 file
– Investigate an integrated DAP-aware HDF5 library that could provide seamless access to both local and remote data
• Funded by NASA ROSES “Advancing Collaborative Connections for Earth-Sun System Science”
- 57 -
Archival formats
- 58 -
Archival formats
• Ruth Duerr (NSIDC) initiated investigations
• How to preserve the content & performance features of complex scientific data formats
• At the same time provide the requisite simplicity needed for long term archival storage.
• Ruth will speak about this
- 62 -
Hydroinformatics
- 63 -
Hydroinformatics
• HDF5 as exchange format for hydroinformatics data
– Groundswell of interest lately
– Sometimes in combination with netCDF 4
– Talk to Mike Folk
- 64 -
“Hydroinformatics”
- 65 -
Thank You
- 66 -
Questions/comments?
- 67 -
Information Sources
• HDF website
– http://hdf.ncsa.uiuc.edu/
• HDF5 Information Center
– http://hdf.ncsa.uiuc.edu/HDF5/
• HDF Helpdesk
– [email protected]
• HDF users mailing list
– [email protected]