March 9, 2009 10th International LCI Conference - HDF5 Tutorial 1
HDF5 Advanced Topics
Outline
• Part I
  • Overview of HDF5 datatypes
• Part II
  • Partial I/O in HDF5
    • Hyperslab selection
    • Dataset region references
  • Chunking and compression
• Part III
  • Performance issues (how to do it right)
• Part IV
  • Performance benefits of HDF5 version 1.8
Part I: HDF5 Datatypes
Quick overview of the most difficult topics
HDF5 Datatypes
• HDF5 has a rich set of pre-defined datatypes and supports the creation of an unlimited variety of complex user-defined datatypes.
• Datatype definitions are stored in the HDF5 file with the data.
• Datatype definitions include information such as byte order (endianness), size, and floating point representation to fully describe how the data is stored and to ensure portability across platforms.
• Datatype definitions can be shared among objects in an HDF5 file, providing a powerful and efficient mechanism for describing data.
Example
[Figure: datatype conversion between two platforms through an HDF5 file]
• Writer: array of integers on IA32 platform; native integer is little-endian, 4 bytes. H5Dwrite with memory type H5T_NATIVE_INT.
• File: little-endian 4 bytes integer (H5T_STD_I32LE).
• Reader: array of integers on SPARC64 platform; native integer is big-endian, 8 bytes. H5Dread with memory type H5T_NATIVE_INT.
• The figure also shows VAX G-floating data passed to H5Dwrite.
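The conversion in this example can be imitated in plain C. The sketch below does not call HDF5; decode_i32le and read_i32le_array are hypothetical helper names. It only illustrates, assuming 32-bit two's-complement values stored little-endian, the kind of work the library does when file type H5T_STD_I32LE is read into a wider native integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 4-byte little-endian two's-complement integer.
 * The result does not depend on the byte order of the host. */
static int32_t decode_i32le(const unsigned char *b)
{
    uint32_t u = (uint32_t)b[0]
               | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[3] << 24);

    /* Map the unsigned bit pattern to a signed value without relying on
     * implementation-defined narrowing conversions. */
    if (u & 0x80000000u)
        return (int32_t)(u & 0x7FFFFFFFu) - INT32_MAX - 1;
    return (int32_t)u;
}

/* Convert n stored elements into a wider native type (long long here,
 * standing in for an 8-byte native integer). */
static void read_i32le_array(const unsigned char *file_bytes, size_t n,
                             long long *out)
{
    assert(file_bytes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = decode_i32le(file_bytes + 4 * i);
}
```

With HDF5 the application never writes such a loop: it passes H5T_NATIVE_INT as the memory type and the library converts from the file type.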
Storing Variable Length Data in HDF5
HDF5 Fixed and Variable Length Array Storage
[Figure: data records laid out along a Time axis, once as fixed-length arrays and once as variable-length arrays]
Storing Strings in HDF5
• Array of characters (Array datatype or extra dimension in dataset)
  • Quick access to each character
  • Extra work to access and interpret each string
• Fixed length
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, size);
  • Wasted space in shorter strings
  • Can be compressed
• Variable length
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, H5T_VARIABLE);
  • Overhead as for all VL datatypes
  • Compression will not be applied to actual data
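The "wasted space in shorter strings" point can be estimated before choosing a storage type. The helper below is a plain C sketch, not an HDF5 call; fixed_length_padding is a hypothetical name, and it counts characters only (it ignores any terminator byte and the per-element overhead of variable-length storage).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes of padding when n strings are stored in fixed-length slots of
 * `size` bytes each. Strings of `size` characters or more are assumed
 * to fill their slot completely and contribute no padding. */
static size_t fixed_length_padding(const char *const *strings, size_t n,
                                   size_t size)
{
    size_t wasted = 0;

    assert(strings != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strings[i]);
        if (len < size)
            wasted += size - len;
    }
    return wasted;
}
```

A large result relative to n * size suggests that either compression of the fixed-length dataset or a variable-length type is worth considering.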
Storing Variable Length Data in HDF5
• Each element is represented by C structure
    typedef struct {
        size_t len;
        void *p;
    } hvl_t;
• Base type can be any HDF5 type
    H5Tvlen_create(base_type)
Example
    hvl_t data[LENGTH];
    for(i=0; i<LENGTH; i++) {
        data[i].p = malloc((i+1)*sizeof(unsigned int));
        data[i].len = i+1;
    }
    tvl = H5Tvlen_create (H5T_NATIVE_UINT);
[Figure: five variable-length elements; data[0].p points to the data of the first element, data[4].len is the length of the fifth]
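The buffer built in this example can be exercised without the library. The sketch below uses a local stand-in struct with the same two fields as hvl_t; vl_elem_t, vl_fill, vl_total_elements and vl_free are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in with the same two fields as hvl_t (hypothetical name). */
typedef struct {
    size_t len;  /* number of base-type elements */
    void  *p;    /* pointer to the elements */
} vl_elem_t;

/* Element i receives i+1 unsigned ints holding 0..i.
 * Returns 0 on success, -1 if an allocation fails (nothing is leaked). */
static int vl_fill(vl_elem_t *data, size_t n)
{
    assert(data != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned int *row = malloc((i + 1) * sizeof *row);
        if (row == NULL) {
            while (i > 0)
                free(data[--i].p);
            return -1;
        }
        for (size_t j = 0; j <= i; j++)
            row[j] = (unsigned int)j;
        data[i].p = row;
        data[i].len = i + 1;
    }
    return 0;
}

/* Total number of base-type elements across all variable-length entries. */
static size_t vl_total_elements(const vl_elem_t *data, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i].len;
    return total;
}

/* Release the buffers owned by the application after writing. */
static void vl_free(vl_elem_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(data[i].p);
        data[i].p = NULL;
        data[i].len = 0;
    }
}
```

On the write side the application owns these buffers and must free them itself, as vl_free does here.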
Reading HDF5 Variable Length Array
    hvl_t rdata[LENGTH];
    /* Create the memory vlen type */
    tvl = H5Tvlen_create (H5T_NATIVE_UINT);
    ret = H5Dread(dataset, tvl, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
    /* Reclaim the read VL data; pass the dataspace of the dataset */
    space = H5Dget_space(dataset);
    H5Dvlen_reclaim(tvl, space, H5P_DEFAULT, rdata);
On read, the HDF5 Library allocates the memory that holds the data; the application only needs to allocate the array of hvl_t elements (pointers and lengths).
Storing Tables in HDF5 file
Example
a_name (integer)   b_name (float)   c_name (double)
0                  0.               1.0000
1                  1.               0.5000
2                  4.               0.3333
3                  9.               0.2500
4                  16.              0.2000
5                  25.              0.1667
6                  36.              0.1429
7                  49.              0.1250
8                  64.              0.1111
9                  81.              0.1000
Multiple ways to store a table
• Dataset for each field
• Dataset with compound datatype
• If all fields have the same type:
  • 2-dim array
  • 1-dim array of array datatype
• continued…..
Choose to achieve your goal!
• How much overhead will each type of storage create?
• Do I always read all fields?
• Do I need to read some fields more often?
• Do I want to use compression?
• Do I want to access some records?
HDF5 Compound Datatypes
• Compound types
  • Comparable to C structs
  • Members can be atomic or compound types
  • Members can be multidimensional
• Can be written/read by a field or set of fields
• Not all data filters can be applied (shuffling, SZIP)
HDF5 Compound Datatypes
• Which APIs to use?
  • H5TB APIs
    • Create, read, get info and merge tables
    • Add, delete, and append records
    • Insert and delete fields
    • Limited control over table’s properties (i.e. only GZIP compression, level 6, default allocation time for table, extendible, etc.)
  • PyTables http://www.pytables.org
    • Based on H5TB
    • Python interface
    • Indexing capabilities
  • HDF5 APIs
    • H5Tcreate(H5T_COMPOUND), H5Tinsert calls to create a compound datatype
    • H5Dcreate, etc.
    • See H5Tget_member* functions for discovering properties of the HDF5 compound datatype
Creating and Writing Compound Dataset
h5_compound.c example
typedef struct s1_t { int a; float b; double c; } s1_t;
s1_t s1[LENGTH];
Creating and Writing Compound Dataset
    /* Create datatype in memory. */
    s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
    H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
    H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);
Note:
• Use HOFFSET macro instead of calculating offset by hand.
• Order of H5Tinsert calls is not important if HOFFSET is used.
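Why offsets should not be calculated by hand can be seen with the standard C offsetof macro, which plays the same role as HOFFSET. The sketch below is plain C with no HDF5 calls; s1_offsets, s1_packed_size and s1_padding are hypothetical names. The actual offsets and padding depend on the compiler and platform, so the code reports them rather than assuming values.

```c
#include <assert.h>
#include <stddef.h>

typedef struct s1_t { int a; float b; double c; } s1_t;

/* Member offsets as the compiler laid them out; these are the values
 * an offset macro hands to the library, whatever the padding is. */
static const size_t s1_offsets[3] = {
    offsetof(s1_t, a),
    offsetof(s1_t, b),
    offsetof(s1_t, c)
};

/* Sum of the member sizes: the size a layout without padding needs. */
static size_t s1_packed_size(void)
{
    return sizeof(int) + sizeof(float) + sizeof(double);
}

/* Bytes of padding the compiler inserted into s1_t (may be zero). */
static size_t s1_padding(void)
{
    assert(sizeof(s1_t) >= s1_packed_size());
    return sizeof(s1_t) - s1_packed_size();
}
```

Because each member is inserted together with its own offset, the order of the insert calls does not matter, and any padding reported by s1_padding is what packing the type would remove from the file.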
Creating and Writing Compound Dataset
    /* Create dataset and write data */
    dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
Note:
• In this example memory and file datatypes are the same.
• Type is not packed.
• Use H5Tpack to save space in the file.
    status = H5Tpack(s1_tid);
    dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
File Content with h5dump
    HDF5 "SDScompound.h5" {
    GROUP "/" {
       DATASET "ArrayOfStructures" {
          DATATYPE {
             H5T_STD_I32BE "a_name";
             H5T_IEEE_F32BE "b_name";
             H5T_IEEE_F64BE "c_name";
          }
          DATASPACE { SIMPLE ( 10 ) / ( 10 ) }
          DATA {
             {
                [ 0 ],
                [ 0 ],
                [ 1 ]
             },
             {
                [ 1 ],
    …
Reading Compound Dataset
    /* Create datatype in memory and read data. */
    dataset = H5Dopen(file, DATASETNAME, H5P_DEFAULT);
    s2_tid = H5Dget_type(dataset);
    mem_tid = H5Tget_native_type (s2_tid, H5T_DIR_DEFAULT);
    s1 = malloc(H5Tget_size(mem_tid)*number_of_elements);
    status = H5Dread(dataset, mem_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
Note:
• We could construct memory type as we did in writing example.
• For general applications we need to discover the type in the file, find out corresponding memory type, allocate space and do read.
Reading Compound Dataset by Fields
    typedef struct s2_t {
        double c;
        int a;
    } s2_t;
    s2_t s2[LENGTH];
    …
    s2_tid = H5Tcreate (H5T_COMPOUND, sizeof(s2_t));
    H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
    H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
    …
    status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
New Way of Creating Datatypes
Another way to create a compound datatype
    #include "H5LTpublic.h"
    …..
    s2_tid = H5LTtext_to_dtype(
        "H5T_COMPOUND {H5T_NATIVE_DOUBLE \"c_name\"; H5T_NATIVE_INT \"a_name\"; }",
        H5LT_DDL);
Need Help with Datatypes?
Check our support web pages
http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api18-c.html
http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api16-c.html
Part II: Working with subsets
Collect data one way ….
Array of images (3D)
Stitched image (2D array)
Display data another way …
Data is too big to read….
Need to select and access the same elements of a dataset
Refer to a region…
HDF5 Library Features
• HDF5 Library provides capabilities to
  • Describe subsets of data and perform write/read operations on subsets
    • Hyperslab selections and partial I/O
  • Store descriptions of the data subsets in a file
    • Object references
    • Region references
  • Use efficient storage mechanism to achieve good performance while writing/reading subsets of data
    • Chunking, compression
Partial I/O in HDF5
How to Describe a Subset in HDF5?
• Before writing and reading a subset of data one has to describe it to the HDF5 Library.
• HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”.
• If specified, HDF5 Library will perform I/O on a selection only and not on all elements of a dataset.
Types of Selections in HDF5
• Two types of selections
  • Hyperslab selection
    • Regular hyperslab
    • Simple hyperslab
    • Result of set operations on hyperslabs (union, difference, …)
  • Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (See Parallel HDF5 Tutorial)
Regular Hyperslab
Collection of regularly spaced equal size blocks
Simple Hyperslab
Contiguous subset or sub-array
Hyperslab Selection
Result of union operation on three simple hyperslabs
Hyperslab Description
• Start - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is “measured” in number of elements
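The four parameters can be checked with a small plain C model. It makes no HDF5 calls; slab_t, slab_npoints and slab_contains are hypothetical names, and the model assumes a 2-dimensional selection whose blocks do not overlap (stride not smaller than block).

```c
#include <assert.h>
#include <stddef.h>

#define RANK 2

/* Hypothetical container for the four hyperslab parameters. */
typedef struct {
    size_t start[RANK];
    size_t stride[RANK];
    size_t count[RANK];
    size_t block[RANK];
} slab_t;

/* Number of selected elements: (count * block) multiplied over dimensions. */
static size_t slab_npoints(const slab_t *s)
{
    size_t n = 1;
    for (int d = 0; d < RANK; d++)
        n *= s->count[d] * s->block[d];
    return n;
}

/* Returns 1 if coord lies inside the selection, 0 otherwise. */
static int slab_contains(const slab_t *s, const size_t coord[RANK])
{
    for (int d = 0; d < RANK; d++) {
        assert(s->stride[d] > 0 && s->stride[d] >= s->block[d]);
        if (coord[d] < s->start[d])
            return 0;                      /* before the first block */
        size_t off = coord[d] - s->start[d];
        if (off / s->stride[d] >= s->count[d])
            return 0;                      /* past the last block */
        if (off % s->stride[d] >= s->block[d])
            return 0;                      /* in the gap between blocks */
    }
    return 1;
}
```

With the values on this slide, start (1,1), stride (3,2), count (2,6), block (2,1), the model selects rows 1, 2, 4, 5 and every second column starting at column 1, 24 elements in total.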
Simple Hyperslab Description
• Two ways to describe a simple hyperslab
  • As several blocks
    • Stride – (2,1)
    • Count – (2,6)
    • Block – (2,1)
  • As one block
    • Stride – (1,1)
    • Count – (1,1)
    • Block – (4,6)
No performance penalty for one way or another
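That both descriptions cover the same 4x6 sub-array can be verified with a plain C sketch (no HDF5 calls; mark_selection and same_selection are hypothetical names, and the start of (1,1) and the 6x8 grid are assumptions, since the slide gives no start). In the several-blocks form the stride along a dimension has to be at least the block size so that blocks do not overlap, so the sketch uses stride (2,1) for two blocks of two rows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROWS 6
#define COLS 8

/* Set mask[r][c] to 1 for every element of a 2-D hyperslab selection. */
static void mark_selection(unsigned char mask[ROWS][COLS],
                           const size_t start[2], const size_t stride[2],
                           const size_t count[2], const size_t block[2])
{
    memset(mask, 0, ROWS * COLS);
    for (size_t bi = 0; bi < count[0]; bi++)
        for (size_t bj = 0; bj < count[1]; bj++)
            for (size_t i = 0; i < block[0]; i++)
                for (size_t j = 0; j < block[1]; j++) {
                    size_t r = start[0] + bi * stride[0] + i;
                    size_t c = start[1] + bj * stride[1] + j;
                    assert(r < ROWS && c < COLS);
                    mask[r][c] = 1;
                }
}

/* Returns 1 if two masks mark exactly the same elements. */
static int same_selection(unsigned char a[ROWS][COLS],
                          unsigned char b[ROWS][COLS])
{
    return memcmp(a, b, ROWS * COLS) == 0;
}
```

Marking the several-blocks form (stride (2,1), count (2,6), block (2,1)) and the one-block form (stride (1,1), count (1,1), block (4,6)) from the same start produces identical masks.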
H5Sselect_hyperslab Function
space_id   Identifier of dataspace
op         Selection operator: H5S_SELECT_SET or H5S_SELECT_OR
start      Array with starting coordinates of hyperslab
stride     Array specifying which positions along a dimension to select
count      Array specifying how many blocks to select from the dataspace, in each dimension
block      Array specifying size of element block (NULL indicates a block size of a single element in a dimension)
Reading/Writing Selections
Programming model for reading from a dataset in a file
1. Open a dataset.
2. Get file dataspace handle of the dataset and specify subset to read from.
  a. H5Dget_space returns file dataspace handle.
     • File dataspace describes array stored in a file (number of dimensions and their sizes).
  b. H5Sselect_hyperslab selects elements of the array that participate in I/O operation.
3. Allocate data buffer of an appropriate shape and size.
Reading/Writing Selections
Programming model (continued)
4. Create a memory dataspace and specify subset to write to.
  a. Memory dataspace describes data buffer (its rank and dimension sizes).
  b. Use H5Screate_simple function to create memory dataspace.
  c. Use H5Sselect_hyperslab to select elements of the data buffer that participate in I/O operation.
5. Issue H5Dread or H5Dwrite to move the data between file and memory buffer.
6. Close file dataspace and memory dataspace when done.
Example: Reading Two Rows
Data in a file: 4x6 matrix
    1  2  3  4  5  6
    7  8  9  10 11 12
    13 14 15 16 17 18
    19 20 21 22 23 24
Buffer in memory: 1-dim array of length 14
    -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Example: Reading Two Rows
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
    start  = {1,0}
    count  = {2,6}
    block  = {1,1}
    stride = {1,1}
    filespace = H5Dget_space (dataset);
    H5Sselect_hyperslab (filespace, H5S_SELECT_SET, start, NULL, count, NULL);
Example: Reading Two Rows
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
    start[1] = {1}
    count[1] = {12}
    dim[1]   = {14}
    memspace = H5Screate_simple(1, dim, NULL);
    H5Sselect_hyperslab (memspace, H5S_SELECT_SET, start, NULL, count, NULL);
Example: Reading Two Rows
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
-1 7 8 9 10 11 12 13 14 15 16 17 18 -1
H5Dread (…, …, memspace, filespace, …, …);
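The effect of this read can be reproduced with a plain C model of the two selections. The sketch makes no HDF5 calls; read_selection is a hypothetical helper that copies a rectangular file selection, in row-major order, into a 1-dim buffer starting at a given offset.

```c
#include <assert.h>
#include <stddef.h>

#define FILE_ROWS 4
#define FILE_COLS 6

/* Copy count[0] x count[1] elements, starting at start[] in the row-major
 * file array, into buf beginning at index mem_start.
 * Returns the number of elements copied, or 0 if either selection does
 * not fit (the two selections must hold the same number of elements). */
static size_t read_selection(const int *file_data,
                             const size_t start[2], const size_t count[2],
                             int *buf, size_t buf_len, size_t mem_start)
{
    size_t n = count[0] * count[1];
    size_t k = mem_start;

    assert(file_data != NULL && buf != NULL);
    if (start[0] + count[0] > FILE_ROWS || start[1] + count[1] > FILE_COLS)
        return 0;
    if (mem_start + n > buf_len)
        return 0;

    for (size_t i = 0; i < count[0]; i++)
        for (size_t j = 0; j < count[1]; j++)
            buf[k++] = file_data[(start[0] + i) * FILE_COLS + start[1] + j];
    return n;
}
```

Called with start {1,0}, count {2,6} and a memory offset of 1 on a buffer of fourteen -1 values, the model leaves -1 at both ends and fills positions 1 to 12 with 7 through 18, as on the slide.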
Things to Remember
• Number of elements selected in a file and in a memory buffer must be the same
  • H5Sget_select_npoints returns number of selected elements in a hyperslab selection
• HDF5 partial I/O is tuned to move data between selections that have the same dimensionality; avoid choosing subsets that have different ranks (as in example above)
• Allocate a buffer of an appropriate size when reading data; use H5Tget_native_type and H5Tget_size to get the correct size of the data element in memory.
HDF5 Region References and Selections
Need to select and access the same elements of a dataset
Saving Selected Region in a File
Reference Datatype
• Reference to an HDF5 object
  • Pointer to a group or a dataset in a file
  • Predefined datatype H5T_STD_REF_OBJ to describe object references
• Reference to a dataset region (or to selection)
  • Pointer to the dataspace selection
  • Predefined datatype H5T_STD_REF_DSETREG to describe regions
Reference to Dataset Region
[Figure: file REF_REG.h5; the Root group holds two datasets, "Region References" and "Matrix"]
Matrix:
    1 1 2 3 3 4 5 5 6
    1 2 2 3 4 4 5 6 6
Reference to Dataset Region
Example
    dsetr_id = H5Dcreate(file_id, "REGION_REFERENCES", H5T_STD_REF_DSETREG, …);
    H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, …);
    H5Rcreate(&ref[0], file_id, "MATRIX", H5R_DATASET_REGION, space_id);
    H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref);
Reference to Dataset Region
    HDF5 "REF_REG.h5" {
    GROUP "/" {
       DATASET "MATRIX" {
          ……
       }
       DATASET "REGION_REFERENCES" {
          DATATYPE H5T_REFERENCE
          DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
          DATA {
             (0): DATASET /MATRIX {(0,3)-(1,5)},
             (1): DATASET /MATRIX {(0,0), (1,6), (0,8)}
          }
       }
    }
    }
Chunking in HDF5
HDF5 Chunking
• Dataset data is divided into equally sized blocks (chunks).
• Each chunk is stored separately as a contiguous block in HDF5 file.
[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …) and the chunk index; in the file, the dataset header, the chunk index, and chunks A, B, C, D of the dataset data are stored as separate blocks, not necessarily in order]
HDF5 Chunking
• Chunking is needed for
  • Enabling compression and other filters
  • Extendible datasets
HDF5 Chunking
• If used appropriately chunking improves partial I/O for big datasets
Only two chunks are involved in I/O
HDF5 Chunking
• Chunk has the same rank as a dataset
• Chunk’s dimensions do not need to be factors of dataset’s dimensions
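When chunk dimensions are not factors of the dataset dimensions, the last chunk along a dimension is only partly covered by data. The plain C sketch below (no HDF5 calls; chunks_along and chunks_touched are hypothetical names) counts chunks per dimension and the chunks that a contiguous range of elements touches.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks along one dimension. A partly filled chunk at the
 * edge still counts as one chunk (ceiling division). */
static size_t chunks_along(size_t dim, size_t chunk)
{
    assert(chunk > 0);
    return (dim + chunk - 1) / chunk;
}

/* Number of chunks touched along one dimension by the contiguous range
 * of elements [start, start + len). */
static size_t chunks_touched(size_t start, size_t len, size_t chunk)
{
    assert(chunk > 0 && len > 0);
    return (start + len - 1) / chunk - start / chunk + 1;
}
```

For several dimensions the per-dimension results are multiplied: a 250x250 dataset with 100x100 chunks has 3 x 3 = 9 chunks, and a row range that starts in the middle of a chunk touches one chunk more than an aligned range of the same length.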
Creating Chunked Dataset
1. Create a dataset creation property list.
2. Set property list to use chunked storage layout.
3. Create dataset with the above property list.
    dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
    rank = 2;
    ch_dims[0] = 100;
    ch_dims[1] = 100;
    H5Pset_chunk(dcpl_id, rank, ch_dims);
    dset_id = H5Dcreate (…, dcpl_id);
    H5Pclose(dcpl_id);
Writing or Reading Chunked Dataset
1. Chunking mechanism is transparent to application.
2. Use the same set of operations as for contiguous dataset, for example,
    H5Dopen(…);
    H5Sselect_hyperslab (…);
    H5Dread(…);
3. Selections do not need to coincide precisely with the chunk boundaries.
HDF5 Filters
• HDF5 filters modify data during I/O operations
• Available filters:
  1. Checksum (H5Pset_fletcher32)
  2. Shuffling filter (H5Pset_shuffle)
  3. Data transformation (in 1.8.*)
  4. Compression
    • Scale + offset (in 1.8.*)
    • N-bit (in 1.8.*)
    • GZIP (deflate), SZIP (H5Pset_deflate, H5Pset_szip)
    • User-defined filters (BZIP2)
• Example of a user-defined compression filter can be found at http://www.hdfgroup.uiuc.edu/papers/papers/bzip2/
Creating Compressed Dataset
1. Create a dataset creation property list
2. Set property list to use chunked storage layout
3. Set property list to use filters
4. Create dataset with the above property list
    crp_id = H5Pcreate(H5P_DATASET_CREATE);
    rank = 2;
    ch_dims[0] = 100;
    ch_dims[1] = 100;
    H5Pset_chunk(crp_id, rank, ch_dims);
    H5Pset_deflate(crp_id, 9);
    dset_id = H5Dcreate (…, crp_id);
    H5Pclose(crp_id);
Writing Compressed Dataset
[Figure: chunks A, B, C of a chunked dataset pass through the chunk cache (per dataset) and the filter pipeline on their way to the file]
• Default chunk cache size is 1MB.
• Filters including compression are applied when chunk is evicted from cache.
• Chunks in the file may have different sizes.
Chunking Basics to Remember
• Chunking creates storage overhead in the file.
• Performance is affected by
  • Chunking and compression parameters
  • Chunk cache size (H5Pset_cache call)
• Some hints for getting better performance
  • Use a chunk size not smaller than the file system block size (4k).
  • Use a compression method appropriate for your data.
  • Avoid using selections that do not coincide with the chunk boundaries.
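The size hints above can be checked with a little arithmetic before a dataset is created. The following is a minimal sketch, not part of the HDF5 API: `chunk_bytes` and `chunk_size_ok` are hypothetical helper names, and the upper bound (a chunk should not exceed the chunk cache) is an assumption taken from the cache discussion later in this tutorial.

```c
#include <stddef.h>

/* Bytes occupied by one uncompressed chunk: the product of the chunk
 * dimensions times the element size. (Hypothetical helper.) */
size_t chunk_bytes(const size_t *dims, int rank, size_t elem_size)
{
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Returns 1 when a chunk is at least one file system block and no
 * larger than the chunk cache, 0 otherwise. (Hypothetical helper;
 * the cache bound is an assumption, see the lead-in.) */
int chunk_size_ok(size_t bytes, size_t fs_block, size_t cache_bytes)
{
    return bytes >= fs_block && bytes <= cache_bytes;
}
```

For the 100 x 100 chunk of 4-byte integers used earlier, `chunk_bytes` gives 40000 bytes, which passes both checks for a 4096-byte block and a 1 MB cache.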
Example
Creates a compressed 1000x20 integer dataset in a file
% h5dump -p -H zip.h5
HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 )………
         STORAGE_LAYOUT {
            CHUNKED ( 20, 20 )
            SIZE 5316
         }
Example (continued)
         FILTERS {
            COMPRESSION DEFLATE { LEVEL 6 }
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE 0
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_INCR
         }
      }
   }
}
}
Example (bigger chunk)
Creates a compressed 1000x20 integer dataset in a file; the larger chunk achieves a better compression ratio.
% h5dump -p -H zip.h5
HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 )………
         STORAGE_LAYOUT {
            CHUNKED ( 200, 20 )
            SIZE 2936
         }
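The two h5dump listings can be compared numerically. This sketch assumes that SIZE is the number of bytes stored for the dataset and that each H5T_STD_I32BE element takes 4 bytes; the function names are hypothetical and not part of HDF5.

```c
/* Ratio of raw (uncompressed) size to stored size for a 2-D dataset. */
double compression_ratio(unsigned long rows, unsigned long cols,
                         unsigned long elem_size, unsigned long stored_bytes)
{
    return (double)(rows * cols * elem_size) / (double)stored_bytes;
}

/* Number of chunks needed to cover one dimension (ceiling division). */
unsigned long chunks_along(unsigned long extent, unsigned long chunk)
{
    return (extent + chunk - 1) / chunk;
}
```

Under these assumptions the raw data is 1000 x 20 x 4 = 80000 bytes. With 20 x 20 chunks (50 chunks, SIZE 5316) the ratio is about 15; with 200 x 20 chunks (5 chunks, SIZE 2936) it is about 27.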
Part III
Performance Issues (How to Do It Right)
Performance of Serial I/O Operations
• The next slides show the performance effects of using different access patterns and storage layouts.
• We use three test cases, each of which writes a selection to an array of characters.
• Data is stored in row-major order.
• Tests were executed on a THG Linux x86_64 box using h5perf_serial and HDF5 version 1.8.0.
Serial Benchmarking Tool
• Benchmarking tool, h5perf_serial, publicly released with HDF5 1.8.1
• Features include:
  • Support for POSIX and HDF5 I/O calls.
  • Support for datasets and buffers with multiple dimensions.
  • Entire dataset access using a single or several I/O operations.
  • Selection of contiguous and chunked storage for HDF5 operations.
Contiguous Storage (Case 1)
• Rectangular dataset of size 48K x 48K, with write selections of 512 x 48K.
• HDF5 storage layout is contiguous.
• Good I/O pattern for POSIX and HDF5 because each selection is contiguous.
• POSIX: 5.19 MB/s
• HDF5: 5.36 MB/s
[Diagram: selections 1 to 4 are horizontal bands of the dataset and are stored one after another in the file as 1 2 3 4.]
Contiguous Storage (Case 2)
• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is contiguous.
• Bad I/O pattern for POSIX and HDF5 because each selection is noncontiguous.
• POSIX: 1.24 MB/s
• HDF5: 0.05 MB/s
[Diagram: selections 1 to 4 are vertical strips of the dataset; in the file their pieces are interleaved as 1 2 3 4 1 2 3 4 …]
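The difference between Case 1 and Case 2 comes down to how many contiguous byte runs one selection covers in a row-major, contiguously stored array. The sketch below counts them; the helper names are hypothetical, "48K" is assumed to mean 48 x 1024 = 49152, and elements are taken to be 1-byte characters as stated for these tests.

```c
/* Number of contiguous runs covered by a sel_rows x sel_cols block of a
 * row-major array whose rows hold total_cols elements. A block that
 * spans full rows is a single run; otherwise each row is its own run. */
unsigned long contiguous_runs(unsigned long sel_rows, unsigned long sel_cols,
                              unsigned long total_cols)
{
    return (sel_cols == total_cols) ? 1UL : sel_rows;
}

/* Size in bytes of each such run. */
unsigned long run_bytes(unsigned long sel_rows, unsigned long sel_cols,
                        unsigned long total_cols, unsigned long elem_size)
{
    if (sel_cols == total_cols)
        return sel_rows * sel_cols * elem_size;
    return sel_cols * elem_size;
}
```

Under these assumptions a Case 1 selection (512 x 49152) is one run of 25165824 bytes, while a Case 2 selection (49152 x 512) is 49152 separate runs of 512 bytes each.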
Chunked Storage
• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is chunked. Chunk and selection sizes are equal.
• Bad I/O case for POSIX because selections are noncontiguous.
• Good I/O case for HDF5 since selections are contiguous due to chunking layout settings.
• POSIX: 1.51 MB/s
• HDF5: 5.58 MB/s
[Diagram: selections 1 to 4 are vertical strips of the dataset. The POSIX file interleaves their pieces as 1 2 3 4 1 2 3 4 …; the chunked HDF5 file stores each selection whole, as 1 2 3 4.]
Conclusions
• Access patterns made up of many small I/O operations pay latency and overhead costs on every operation.
• Chunked storage may improve I/O performance by affecting the contiguity of the data selection.
Writing Chunked Dataset
• 1000x100x100 dataset
  • 4 byte integers
  • Random values 0-99
• 50x100x100 chunks (20 total)
  • Chunk size: 2 MB
• Write the entire dataset using 1x100x100 slices
  • Slices are written sequentially
Test Setup
• 20 chunks
• 1000 slices
• Chunk size is 2 MB
Test Setup (continued)
• Tests performed with 1 MB and 5 MB chunk cache size
• Cache size set with H5Pset_cache function
    H5Pget_cache(fapl, NULL, &rdcc_nelmts, &rdcc_nbytes, &rdcc_w0);
    H5Pset_cache(fapl, 0, rdcc_nelmts, 5*1024*1024, rdcc_w0);
• Tests performed with no compression and with gzip (deflate) compression
Effect of Chunk Cache Size on Write

No compression

| Cache size | I/O operations | Total data written | File size |
|---|---|---|---|
| 1 MB (default) | 1002 | 75.54 MB | 38.15 MB |
| 5 MB | 22 | 38.16 MB | 38.15 MB |

Gzip compression

| Cache size | I/O operations | Total data written | File size |
|---|---|---|---|
| 1 MB (default) | 1982 | 335.42 MB (322.34 MB read) | 13.08 MB |
| 5 MB | 22 | 13.08 MB | 13.08 MB |
Effect of Chunk Cache Size on Write
• With the 1 MB cache size, a chunk will not fit into the cache
  • All writes to the dataset must be immediately written to disk
  • With compression, the entire chunk must be read and rewritten every time a part of the chunk is written to
    • Data must also be decompressed and recompressed each time
    • Non-sequential writes could result in a larger file
  • Without compression, the entire chunk must be written when it is first written to the file
  • If the selection were not contiguous on disk, it could require as much as 1 I/O operation for each element
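The behaviour described above can be turned into a rough cost model for the test setup (20 chunks, 50 slices per chunk, 40000 bytes per 1x100x100 slice of 4-byte integers). This is a sketch under stated assumptions, not how the library counts I/O: the function names are hypothetical, and metadata operations are ignored, which presumably accounts for the 2 extra operations in the "Effect of Chunk Cache Size on Write" tables.

```c
/* I/O operations for sequential slice writes when a chunk does not fit
 * in the chunk cache. Every slice causes one write. With compression,
 * every slice after the first in a chunk also causes one read of the
 * existing chunk. (Simplified model; metadata I/O is not counted.) */
unsigned long small_cache_io_ops(unsigned long n_chunks,
                                 unsigned long slices_per_chunk,
                                 int compressed)
{
    unsigned long writes = n_chunks * slices_per_chunk;
    unsigned long reads = compressed ? n_chunks * (slices_per_chunk - 1) : 0;
    return writes + reads;
}

/* Bytes written without compression: the first write to a chunk writes
 * the whole chunk, each later slice writes only the slice itself. */
unsigned long long small_cache_bytes_written(unsigned long n_chunks,
                                             unsigned long slices_per_chunk,
                                             unsigned long long slice_bytes)
{
    unsigned long long chunk_bytes = slices_per_chunk * slice_bytes;
    return n_chunks * (chunk_bytes + (slices_per_chunk - 1) * slice_bytes);
}
```

For the test setup the model gives 1000 operations without compression and 1980 with compression, against 1002 and 1982 in the tables. The uncompressed volume comes to 79200000 bytes, roughly 75.5 MB when a megabyte is taken as 2^20 bytes, close to the measured 75.54 MB.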
Effect of Chunk Cache Size on Write
• With the 5 MB cache size, the chunk is written only after it is full
  • Drastically reduces the number of I/O operations
  • Reduces the amount of data that must be written (and read)
  • Reduces processing time, especially with the compression filter
Conclusion
• It is important to make sure that a chunk will fit into the raw data chunk cache
• If you will be writing to multiple chunks at once, you should increase the cache size even more
• Try to design chunk dimensions to minimize the number of chunks you will be writing to at once
Reading Chunked Dataset
• Read the same dataset, again by slices, but the slices cross through all the chunks
• 2 orientations for read plane
  • Plane includes fastest changing dimension
  • Plane does not include fastest changing dimension
• Measure total read operations, and total size read
• Chunk sizes of 50x100x100, and 10x100x100
• 1 MB cache
Test Setup
• Chunks
• Read slices
  • Vertical and horizontal
Results
• Read slice includes fastest changing dimension

| Chunk size | Compression | I/O operations | Total data read |
|---|---|---|---|
| 50 | Yes | 2010 | 1307 MB |
| 10 | Yes | 10012 | 1308 MB |
| 50 | No | 100010 | 38 MB |
| 10 | No | 10012 | 3814 MB |
Results (continued)
• Read slice does not include fastest changing dimension

| Chunk size | Compression | I/O operations | Total data read |
|---|---|---|---|
| 50 | Yes | 2010 | 1307 MB |
| 10 | Yes | 10012 | 1308 MB |
| 50 | No | 10000010 | 38 MB |
| 10 | No | 10012 | 3814 MB |
Effect of Cache Size on Read
• When compression is enabled, the library must always read each entire chunk once for each call to H5Dread.
• When compression is disabled, the library’s behavior depends on the cache size relative to the chunk size.
  • If the chunk fits in cache, the library reads each entire chunk once for each call to H5Dread
  • If the chunk does not fit in cache, the library reads only the data that is selected
    • More read operations, especially if the read plane does not include the fastest changing dimension
    • Less total data read
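The read counts in the Results tables follow from the rules above. The sketch below is a simplified model, assuming the 1000x100x100 dataset is read as 100 slices that each span the full first dimension; the function names are hypothetical, and the 10 to 12 extra operations in the tables are presumably metadata reads, which the model does not count.

```c
/* Whole-chunk reads: every H5Dread call reads each chunk the slice
 * touches exactly once (compressed data, or chunks that fit in cache). */
unsigned long whole_chunk_reads(unsigned long n_slices,
                                unsigned long chunks_per_slice)
{
    return n_slices * chunks_per_slice;
}

/* Selected-data reads (uncompressed chunk that does not fit in cache):
 * one read per contiguous run of elements. d0 and d1 are the first two
 * dataset dimensions. If the read plane includes the fastest changing
 * dimension, each slice is d0 runs; otherwise every element of the
 * slice (d0 * d1 of them) is its own run. */
unsigned long selected_reads(unsigned long n_slices, unsigned long d0,
                             unsigned long d1, int includes_fastest)
{
    unsigned long runs_per_slice = includes_fastest ? d0 : d0 * d1;
    return n_slices * runs_per_slice;
}
```

With 50x100x100 chunks a slice touches 20 chunks, giving 2000 whole-chunk reads; with 10x100x100 chunks it touches 100, giving 10000. Reading only the selected data gives 100000 reads when the plane includes the fastest changing dimension and 10000000 when it does not, matching the tables apart from the small constant.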
Conclusion
• In this case cache size does not matter when reading if compression is enabled.
• Without compression, a larger cache may not be beneficial, unless the cache is large enough to hold all of the chunks.
  • The optimum cache size depends on the exact shape of the data, as well as the hardware.
Hints for Chunk Settings
• Chunk dimensions should align as closely as possible with hyperslab dimensions for read/write
• Chunk cache size (rdcc_nbytes) should be large enough to hold all the chunks in the selection
  • If this is not possible, it may be best to disable chunk caching altogether (set rdcc_nbytes to 0)
• rdcc_nelmts should be a prime number that is at least 10 to 100 times the number of chunks that can fit into rdcc_nbytes
• rdcc_w0 should be set to 1 if chunks that have been fully read/written will never be read/written again
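The rdcc_nelmts hint can be applied mechanically. The following is a minimal sketch with hypothetical helper names (`is_prime`, `suggest_rdcc_nelmts`); it only picks a value and does not call HDF5. The result would then be passed as the rdcc_nelmts argument of H5Pset_cache.

```c
#include <stddef.h>

/* Trial-division primality test; adequate for the small values here. */
int is_prime(size_t n)
{
    if (n < 2)
        return 0;
    for (size_t d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Smallest prime that is at least `factor` times the number of chunks
 * that fit into rdcc_nbytes. The hint suggests a factor of 10 to 100.
 * At least one chunk is assumed, so the result is never below 2. */
size_t suggest_rdcc_nelmts(size_t rdcc_nbytes, size_t chunk_bytes,
                           size_t factor)
{
    size_t chunks = (chunk_bytes > 0) ? rdcc_nbytes / chunk_bytes : 0;
    if (chunks == 0)
        chunks = 1;
    size_t n = chunks * factor;
    while (!is_prime(n))
        n++;
    return n;
}
```

For the 5 MB cache and the 2000000-byte chunks of the write test, two chunks fit in the cache, so a factor of 100 gives 211 and a factor of 10 gives 23.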
Part IV
Performance Benefits of HDF5 version 1.8
What Did We Do in HDF5 1.8?
• Extended File Format Specification
• Reviewed group implementations
• Introduced new link object
• Revamped metadata cache implementation
• Improved handling of datasets and datatypes
• Introduced shared object header message
• Extended error handling
• Enhanced backward/forward APIs and file format compatibility
What Did We Do in HDF5 1.8?
And much more good stuff to make HDF5 Better and Faster
HDF5 File Format Extension
HDF5 File Format Extension
• Why:
  • Address deficiencies of the original file format
  • Address space overhead in an HDF5 file
  • Enable new features
• What:
  • New routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (by default, the library uses the earliest format version in which an object became available, for example, the array datatype)
HDF5 File Format Extension
Example
    /* Use the latest version of a file format for each object created in a file */
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    fid = H5Fcreate(…,…,…,fapl_id);
or
    fid = H5Fopen(…,…,fapl_id);
Group Revisions
Better Large Group Storage
• Why:
  • Faster, more scalable storage and access for large groups
• What:
  • New format and method for storing groups with many links
Informal Benchmark
• Create a file and a group in a file
• Create up to 10^6 groups with one dataset in each group
• Compare file sizes and performance of HDF5 1.8.1 using the latest group format with the performance of HDF5 1.8.1 (default, old format) and 1.6.7
• Note: Default 1.8.1 and 1.6.7 became very slow after 700000 groups
Time to Open and Read a Dataset
File Size
Questions?