04/12/23 HDF and HDF-EOS Workshop X, Landover, MD
Using HDF5 tools for performance tuning and troubleshooting
Introduction
• HDF5 tools can be very useful for performance tuning and troubleshooting:
  • Discover objects and their properties in HDF5 files: h5dump -p
  • Get file size overhead information: h5stat
  • Get locations of the objects in a file: h5ls
  • Discover differences: h5diff, h5ls
  • Find the location of raw data: h5ls -vra
h5stat
• Prints various statistics about an HDF5 file
• Helps:
  • To troubleshoot size overhead in HDF5 files
  • To choose object properties and storage strategies
• To use:
  h5stat --help
  h5stat file.h5
• The specification can be found at http://www.hdfgroup.org/RFC/h5stat/
• Let us know if you need some “special” type of statistics
h5stat
• Reports two types of statistics:
  • High-level information about objects (examples):
    • Number of different objects (groups, datasets, datatypes) in a file
    • Number of unique datatypes
    • Size of raw data in a file
  • Information about objects' structural metadata:
    • Sizes of structural metadata (total/free)
    • Object headers, local and global heaps
    • Sizes of B-trees
    • Object header fragmentation
h5stat
• Examples of high-level information:
    File information
        # of unique groups: 10008
        # of unique datasets: 30
        # of unique named datatypes: 0
        ………
        Max. # of links to object: 1
        Max. depth of hierarchy: 4
        Max. # of objects in group: 19
        ………
    Group bins:
        # of groups of size 0: 10000
        # of groups of size 1 - 9: 7
        # of groups of size 10 - 99: 1
        ………
    Max. dimension size of 1-D datasets: 1643
    ………
    Dataset filters information:
        Number of datasets with
            ………
            SZIP filter: 2
            ………
            NBIT filter: 10
            USER-DEFINED filter: 1
h5stat
• Conclusions:
  • There are a lot of empty groups in the file (10000 of 10008). This makes the file a good candidate for the compact group feature.
  • Some datasets use “user-defined” filters and may not be readable by an HDF5 library that lacks those filters.
  • The SZIP filter is needed to read some datasets.
Oh… my application uses buffers of size 1024 to read data, but some 1-D datasets have up to 1643 elements. No wonder it crashes on reading… Do I have all the filters needed to read the data?
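The checks behind these conclusions can be scripted. Below is a minimal Python sketch. It assumes that h5stat prints one "label: integer" pair per line, as in the excerpt on the previous slide; real output may differ between HDF5 versions. The helper names parse_h5stat and review, the 50% empty-group threshold, and the 1024-element buffer argument are illustrative assumptions and are not part of h5stat.

```python
import re

# Assumed line shape "label: integer", taken from the h5stat excerpt;
# other h5stat versions may format their output differently.
_STAT_LINE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_h5stat(text):
    """Collect every 'label: integer' line of h5stat output into a dict."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:
            stats[match.group("key").strip()] = int(match.group("value"))
    return stats


def review(stats, read_buffer_elems):
    """Return warnings similar to the slide's conclusions (hypothetical rules)."""
    warnings = []
    groups = stats.get("# of unique groups", 0)
    empty = stats.get("# of groups of size 0", 0)
    if groups and empty / groups > 0.5:  # illustrative threshold
        warnings.append(f"{empty} of {groups} groups are empty: consider compact groups")
    for name in ("SZIP filter", "NBIT filter", "USER-DEFINED filter"):
        if stats.get(name, 0) > 0:
            warnings.append(f"{stats[name]} dataset(s) need the {name} to be readable")
    longest = stats.get("Max. dimension size of 1-D datasets", 0)
    if longest > read_buffer_elems:
        warnings.append(f"1-D datasets of up to {longest} elements overflow a "
                        f"{read_buffer_elems}-element read buffer")
    return warnings


# Excerpt of the h5stat output shown on the previous slide.
SAMPLE = """\
File information
        # of unique groups: 10008
        # of unique datasets: 30
Group bins:
        # of groups of size 0: 10000
        # of groups of size 1 - 9: 7
        # of groups of size 10 - 99: 1
Max. dimension size of 1-D datasets: 1643
Dataset filters information:
        SZIP filter: 2
        NBIT filter: 10
        USER-DEFINED filter: 1
"""

for warning in review(parse_h5stat(SAMPLE), read_buffer_elems=1024):
    print(warning)
```

Each rule corresponds to one conclusion above. The buffer check compares the 1024-element buffer with the 1643-element maximum that h5stat reports.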
h5stat
• Examples of structural metadata information:
    Object header size: (total/unused)
        Groups: 1808/72
        Datasets: 15792/832
    ………
    Dataset storage information:
        Total raw data size: 6140688
    ………
    Dataset datatype #3:
        Count (total/named) = (2/0)
        Size (desc./elmt) = (10/65535)
    Dataset datatype #4:
        Count (total/named) = (1/0)
        Size (desc./elmt) = (10/32000)
h5stat
• Conclusions:
  • File size: 6228197 bytes
  • About 1.4% overhead ((6228197 - 6140688) / 6228197), which is not bad at all!
  • Some elements are 65535 and 32000 bytes in size
Oh… Is that really what I want? Should I use another datatype and take advantage of compression?
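The overhead estimate is the share of the file that raw data does not occupy. Below is a minimal Python sketch of that arithmetic. It assumes the file size comes from the file system and the raw data size comes from h5stat's "Total raw data size" line. The function name metadata_overhead is hypothetical.

```python
def metadata_overhead(file_size, raw_data_size):
    """Fraction of an HDF5 file that is not raw dataset data.

    This lumps together structural metadata (object headers, heaps,
    B-trees) and any unused space; h5stat reports those parts separately.
    """
    if file_size <= 0 or not 0 <= raw_data_size <= file_size:
        raise ValueError("need file_size > 0 and 0 <= raw_data_size <= file_size")
    return (file_size - raw_data_size) / file_size


FILE_SIZE = 6228197      # bytes, file size quoted on the slide
RAW_DATA_SIZE = 6140688  # bytes, "Total raw data size" reported by h5stat

overhead = metadata_overhead(FILE_SIZE, RAW_DATA_SIZE)
print(f"{FILE_SIZE - RAW_DATA_SIZE} bytes of overhead ({overhead:.2%})")
```

For this file, 87509 bytes, or roughly 1.4%, is not raw data.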
Case study: Using HDF5 tools to debug a problem
• My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files without problems and shows no differences. What am I doing wrong?
• h5diff good.h5 bad.h5
    Datatype: </Definitions/timespec> and </Definitions/timespec>
    1 differences found
• h5ls -vr good.h5
    /Definitions/timespec Type
        Location: 0:1:0:900
• h5debug good.h5 900   (900 is taken from the Location field above)
    Message Information:
    Type class: compound
    Size: 8 bytes
• h5debug bad.h5 900
    Message Information:
    Type class: compound
    Size: 16 bytes
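The lookup in this case study can be automated: take the address from h5ls, pass it to h5debug, and compare the reported sizes. Below is a minimal Python sketch that assumes the text layout shown on this slide. On the slide, the address passed to h5debug is the last field of the Location line. The helper names location_address and datatype_size are hypothetical.

```python
import re


def location_address(h5ls_output):
    """Return the last field of the first 'Location:' line in h5ls -v output."""
    match = re.search(r"Location:\s*([\d:]+)", h5ls_output)
    if match is None:
        raise ValueError("no Location line found")
    return int(match.group(1).split(":")[-1])


def datatype_size(h5debug_output):
    """Return the 'Size: N bytes' value from h5debug output for a datatype."""
    match = re.search(r"Size:\s*(\d+)\s*bytes", h5debug_output)
    if match is None:
        raise ValueError("no Size line found")
    return int(match.group(1))


# Output fragments copied from the slide.
H5LS_GOOD = "/Definitions/timespec Type\n    Location: 0:1:0:900\n"
H5DEBUG_GOOD = "Message Information:\nType class: compound\nSize: 8 bytes\n"
H5DEBUG_BAD = "Message Information:\nType class: compound\nSize: 16 bytes\n"

address = location_address(H5LS_GOOD)  # the argument given to h5debug
good, bad = datatype_size(H5DEBUG_GOOD), datatype_size(H5DEBUG_BAD)
if good != bad:
    print(f"object at {address}: {good} bytes in good.h5 vs {bad} bytes in bad.h5")
```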
Case study: Using HDF5 tools to debug a problem
• Conclusions:
  • Compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2 x 8 bytes) than on VS2003 (8 bytes; 2 x 4 bytes)
Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory.
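The size mismatch can be reproduced without HDF5. Below is a minimal Python sketch. It assumes that the two timespec members are signed integers of 4 bytes each under VS2003 and 8 bytes each under VS2005. The slide gives only these sizes, so the member names tv_sec and tv_nsec are assumptions. The sketch also shows what happens when 16-byte records are decoded with the 8-byte layout. It does not call HDF5; in the C API, the slide's remedy is H5Tget_native_type.

```python
import ctypes
import struct


# Hypothetical member names; only the member sizes come from the slide.
class TimespecVS2003(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_int32), ("tv_nsec", ctypes.c_int32)]  # 2 x 4 bytes


class TimespecVS2005(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_int64)]  # 2 x 8 bytes


def decode(buffer, fmt):
    """Split a byte buffer into fixed-size records using a struct format."""
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, buffer, offset) for offset in range(0, len(buffer), size)]


print(ctypes.sizeof(TimespecVS2003), ctypes.sizeof(TimespecVS2005))  # 8 16

# Two records written with the 16-byte layout, little-endian as on Windows x86.
written = struct.pack("<qq", 1, 500) + struct.pack("<qq", 2, 750)
print(decode(written, "<qq"))  # [(1, 500), (2, 750)]
print(decode(written, "<ii"))  # [(1, 0), (500, 0), (2, 0), (750, 0)]
```

When the reader's record size does not match the writer's, every field after the first is shifted. This is why the in-memory type must match the struct that the compiler actually lays out.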
Where is my data?
• h5ls -var be_data.h5
    Opened "be_data.h5" with sec2 driver.
    /Array Dataset {5/5, 6/6}
        Location: 0:1:0:792
        Links: 1
        Modified: 2006-04-07 15:08:39 CDT
        Storage: 240 logical bytes, 240 allocated bytes, 100.00% utilization
        Type: IEEE 64-bit big-endian float
        Address: 2048
• The 30 8-byte elements (5 x 6 = 30 elements, 240 bytes) can be read directly from address 2048 by a non-HDF5 application, because the dataset is stored contiguously without compression
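A non-HDF5 reader needs only the address, the element count, and the byte order from this listing. Below is a minimal Python sketch that uses only the standard library. It assumes a contiguous, unfiltered dataset like /Array above; chunked or compressed data is not stored as one block at a single address. The names read_raw_doubles and to_rows are hypothetical.

```python
import struct


def read_raw_doubles(path, address, count):
    """Read `count` big-endian IEEE 64-bit floats starting at byte `address`."""
    with open(path, "rb") as f:
        f.seek(address)
        data = f.read(8 * count)
    if len(data) != 8 * count:
        raise ValueError(f"expected {8 * count} bytes at {address}, got {len(data)}")
    return list(struct.unpack(f">{count}d", data))  # '>' selects big-endian


def to_rows(values, n_cols):
    """Split a flat list into rows; HDF5 stores array elements in row-major (C) order."""
    return [values[i:i + n_cols] for i in range(0, len(values), n_cols)]


# Numbers from the h5ls listing: dataset {5, 6}, Address: 2048.
ROWS, COLS, ADDRESS = 5, 6, 2048
```

For example, to_rows(read_raw_doubles("be_data.h5", ADDRESS, ROWS * COLS), COLS) should return the 5 x 6 array as a list of rows.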
Questions? Comments?
Thank you!