hdf5 advanced topics: selections, object's properties, storage methods and filters

74
1 HDF HDF HDF5 Advanced Topics HDF5 Advanced Topics Selections Selections Object’s Properties Object’s Properties Storage Methods and Filters Storage Methods and Filters HDF and HDF HDF and HDF - - EOS Workshop IX EOS Workshop IX November 30, 2005 November 30, 2005

Upload: the-hdf-eos-tools-and-information-center

Post on 26-May-2015

95 views

Category:

Technology


3 download

DESCRIPTION

This Tutorial is designed for the HDF5 users with some HDF5 experience. It will cover advanced features of the HDF5 library for achieving better I/O performance and efficient storage. The following HDF5 features will be discussed: partial I/O, compression and other filters including new n-bit and scale+offset filters, and data storage options. Significant time will be devoted to the discussion of complex HDF5 datatypes such as strings, variable-length, array and compound datatypes. Participants will work with the Tutorial examples and exercises during the hands-on sessions.

TRANSCRIPT

Page 1: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

1 HDFHDF

HDF5 Advanced TopicsHDF5 Advanced TopicsSelectionsSelections

Object’s PropertiesObject’s PropertiesStorage Methods and FiltersStorage Methods and Filters

HDF and HDFHDF and HDF--EOS Workshop IXEOS Workshop IXNovember 30, 2005November 30, 2005

Page 2: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

2 HDFHDF

TopicsTopics

Goal: Introduce HDF5 selections and object’s properties

Hyperslab and Point Selection

HDF5 Dataset propertiesI/O and Storage Properties (filters)

HDF5 File propertiesI/O and Storage Properties (drivers)

Page 3: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

3 HDFHDF

Working with SelectionsWorking with Selections

Page 4: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

4 HDFHDF

What is a Selection?What is a Selection?

A portion of a dataset’s dataspace:

• Hyperslab: It can be a logically contiguous collection of points in a dataspace, or it can be a regular pattern of points or blocks in a dataspace.

• Individual Points: Selected points in the dataspace

• Results of Set Operations on hyperslabsor points (union, difference, …)

Page 5: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

5 HDFHDF

HyperslabHyperslab SelectionSelectionDataset

Hyperslab

+

Hyperslab

=

Union ofHyperslabs

Page 6: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

6 HDFHDF

Reading Dataset into Memory from FileReading Dataset into Memory from File

2D array of 16-bit ints 3D array of 32-bit ints

File Memory

2-d array

Regularlyspaced series

of cubes

The only restriction is that the number of selected elements on the left be the same as on the right.

Page 7: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

7 HDFHDF

Steps for Making SelectionsSteps for Making Selections

• Open the file• Open the dataset• Create a file dataspace for the dataset• Create a memory dataspace for the dataset• Make the selection(s)• Read from or write to the dataset• Close the dataset, file dataspace, memory dataspace,

and file

Page 8: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

8 HDFHDF

herr_therr_t H5Sselect_hyperslab H5Sselect_hyperslab ((hid_t hid_t space_idspace_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const hsize_t *countconst hsize_t *count, , const hsize_t *block)const hsize_t *block)

space_id IN: Identifier of dataspaceop IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslab stride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimension block IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 9: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

9 HDFHDF

herr_t herr_t H5Sselect_hyperslab H5Sselect_hyperslab ((hid_t space_idhid_t space_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const const hsize_thsize_t *count*count, , const hsize_t *block)const hsize_t *block)

space_id IN: Identifier of dataspace op IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslab stride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimension block IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 10: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

10 HDFHDF

herr_t herr_t H5Sselect_hyperslabH5Sselect_hyperslab ((hid_t space_idhid_t space_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const hsize_t *countconst hsize_t *count, , const hsize_t *block)const hsize_t *block)

space_id IN: Identifier of dataspace op IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslabstride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimension block IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 11: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

11 HDFHDF

herr_t herr_t H5Sselect_hyperslab H5Sselect_hyperslab ((hid_t hid_t space_idspace_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const hsize_t *countconst hsize_t *count, , const hsize_t *block)const hsize_t *block)

space_id IN: Identifier of dataspace op IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslab stride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimension block IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 12: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

12 HDFHDF

herr_therr_t H5Sselect_hyperslabH5Sselect_hyperslab ((hid_t space_idhid_t space_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const hsize_t *countconst hsize_t *count, , const hsize_t *block)const hsize_t *block)

space_id IN: Identifier of dataspace op IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslab stride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimensionblock IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 13: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

13 HDFHDF

herr_t herr_t H5Sselect_hyperslabH5Sselect_hyperslab ((hid_t space_idhid_t space_id, , H5S_seloper_t opH5S_seloper_t op, , const const hsize_thsize_t *offset*offset, , const hsize_t *strideconst hsize_t *stride, , const hsize_t *countconst hsize_t *count, , const hsize_t *blockconst hsize_t *block))

space_id IN: Identifier of dataspace op IN: Selection operator to use

H5S_SELECT_SET: replace existing selection w/parameters from this call

offset IN: Array with starting coordinates of hyperslab stride IN: Array specifying which positions along a

dimension to selectcount IN: Array specifying how many blocks to select from

the dataspace, in each dimension block IN: Array specifying size of element block (NULL

indicates a block size of a single element in a dimension)

Page 14: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

14 HDFHDF

HyperslabHyperslab Example (1Example (1--D) D)

offset (0) = 1 block (0) = 1 stride (0) = 2(or NULL)

Page 15: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

15 HDFHDF

HyperslabHyperslab ExampleExampledim1 10

X X

XXX

X

XX

X

X To select X’sDataset size= {8, 10}Offset= {0, 1}Block size= {3, 2}Count= {1, 2}Stride= {4, 5}

X Xdim0

0-based

8

What happens if you change Stride= {2, 5} ? (won’t work)What happens if you change Count = {2, 2} ?

Page 16: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

16 HDFHDF

HyperslabHyperslab ExampleExample10

X X

X XX

X

XX

X

X To select X’sDataset size= {8, 10}Offset= {0, 1}Block size= {3, 2}Count= {2, 2}Stride= {4, 5}

X X

8X XX XX X

X XX XX X

Page 17: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

17 HDFHDF

HyperslabHyperslab ExampleExample10

X X

X XX

X

XX

X

X To select X’sDataset size= {8, 10}Offset= {0, 1}Block size= {3, 2}Count= {2, 2}Stride= {4, 5}

X X

8X XX XX X

X XX XX X

What happens if you changed Block size= {1, 1} ?

Page 18: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

18 HDFHDF

HyperslabHyperslab ExampleExample10

X X To select X’sDataset size= {8, 10}Offset= {0, 1}Block size= {1, 1}Count= {2, 2}Stride= {4, 5}

8X X

Page 19: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

19 HDFHDF

Example: Selection from Dataset Example: Selection from Dataset -- CC

X X X X

X X X X

X X X X

Y = 6

X = 5

count[1] = 4

block[0]=1

offset = {1,2}count[0] = 3block[1]=1

1 offset [0] = 1;2 offset [1] = 2;3 count [0] = 3;4 count [1] = 4;5 status = H5Sselect_hyperslab (dataspace,

H5S_SELECT_SET,offset,NULL, count, NULL);

Page 20: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

20 HDFHDF

Set Up Memory Set Up Memory DataspaceDataspace

dimsm[0] = 3;dimsm[1] = 4;memspace = H5Screate_simple (2, dimsm, NULL);

Page 21: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

21 HDFHDF

Read/Write Using SelectionRead/Write Using Selection

status = H5Dread (…, …, memspace, dataspace, …, …);

number of elements selected in memory space must be the same as the number of elements selected in dataspace

Page 22: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

22 HDFHDF

Individual Points SelectionIndividual Points Selection

Page 23: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

23 HDFHDF

herr_therr_t H5Sselect_elements (H5Sselect_elements (hid_t space_idhid_t space_id, , H5S_seloper_t opH5S_seloper_t op, , size_tsize_t num_elemnum_elem, , const const hsize_thsize_t **coord**coord ))

space_id IN: Identifier of the dataspace

op IN: Selection operator to use H5S_SELECT_SET: replace existing selection

with parameters from this call

num_elem IN: Number of elements to be selected

coord IN: A 2-D array specifying the coordinates of theelements being selected

Page 24: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

24 HDFHDF

herr_therr_t H5Sselect_elements (H5Sselect_elements (hid_t hid_t space_idspace_id, , H5S_seloper_t opH5S_seloper_t op, , size_tsize_t num_elemnum_elem, , const const hsize_thsize_t **coord**coord ))

space_id IN: Identifier of the dataspace

op IN: Selection operator to use H5S_SELECT_SET: replace existing selection

with parameters from this call

num_elem IN: Number of elements to be selected

coord IN: A 2-D array specifying the coordinates of theelements being selected

Page 25: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

25 HDFHDF

herr_therr_t H5Sselect_elements (H5Sselect_elements (hid_t hid_t space_IDspace_ID, H5S_seloper_t op, H5S_seloper_t op, , size_tsize_t num_elemnum_elem, , const const hsize_thsize_t **coord**coord ))

space_id IN: Identifier of the dataspace

op IN: Selection operator to use H5S_SELECT_SET: replace existing selection

with parameters from this call

num_elem IN: Number of elements to be selected

coord IN: A 2-D array specifying the coordinates of theelements being selected

Page 26: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

26 HDFHDF

herr_therr_t H5Sselect_elements (H5Sselect_elements (hid_t spacEH5S_seloper_t ophid_t spacEH5S_seloper_t op, , size_t size_t num_elemnum_elem, , const const hsize_thsize_t **coord**coord ))

space_id IN: Identifier of the dataspace

op IN: Selection operator to use H5S_SELECT_SET: replace existing selection

with parameters from this call

num_elem IN: Number of elements to be selected

coord IN: A 2-D array specifying the coordinates of theelements being selected

Page 27: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

27 HDFHDF

ExampleExample

0 53 0 59

0 0 0 0

0 0 0 0

53(0,1)

59(0,3)53 59val

Writes 53 and 59 to coordinates (0,1) and (0,3) in first dataset.

Page 28: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

28 HDFHDF

Example: C CodeExample: C Code

1 hsize_t coord[2][2];

2 sid = H5Dget_space (dataset1);

3 coord[0][0] = 0; coord[0][1] = 3;4 coord[1][0] = 0; coord[1][1] = 1;

5 ret = H5Sselect_elements (sid, H5S_SELECT_SET, 2, (const hssize_t **)coord);

Get the dataspace identifier from the file

Set the selected point positions

Select the elements in the file space

Page 29: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

29 HDFHDF

Memory Memory DataspaceDataspace

hsize_t marray[] = {2};…mid1 = H5Screate_simple (1, marray, NULL); .

Page 30: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

30 HDFHDF

Read/Write Using SelectionRead/Write Using Selection

status = H5Dread (…, …, memspace, dataspace, …, …);

The number of elements selected in the memory space must be the same number asis selected in the dataspace.

Page 31: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

31 HDFHDF

HDF5 PropertiesHDF5 Properties

Page 32: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

32 HDFHDF

PropertiesPropertiesDefinitionDefinition

• Mechanism to control different features of the HDF5 objects– There are default values for these features– HDF5 H5P (Property List) interface allows users to

modify the default features• At object creation time (creation properties)• At object access time (access or transfer properties)

Page 33: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

33 HDFHDF

PropertiesPropertiesDefinitionsDefinitions

• A property list is a list of name-value pairs

• A property list is passed as an optional parameters to the HDF5 APIs

• Property lists are used/ignored by all the layers of the library, as needed

Page 34: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

34 HDFHDF

Type of PropertiesType of Properties

• Predefined and User defined property lists

• Predefined:– File creation– File access– Dataset creation– Dataset access

Page 35: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

35 HDFHDF

Properties (Example)Properties (Example)HDF5 FileHDF5 File

• H5Fcreate(…,creation_prop_id,…)• Creation properties (how file is created?)

– Library’s defaults• no user’s block• predefined sizes of offsets and addresses of the objects in the

file (64-bit for DEC Alpha, 32-bit on Windows)– User’s settings

• User’s block • 32-bit sizes on 64-bit platform• Control over B-trees for chunking storage (split factor)

Page 36: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

36 HDFHDF

User’s BlockUser’s Block

– User block stores user-defined information (e.gASCII text to describe a file) at the beginning of the file

– h5jam – utility to add user block to HDF5 file

Page 37: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

37 HDFHDF

Properties (Example)Properties (Example)HDF5 FileHDF5 File

• H5Fcreate(…,access_prop_id)• Access properties or drivers (How is file

accessed? What is the physical layout on the disk?)– Library defaults

• STDIO Library (UNIX fwrite, fread)– User’s defined

• MPI I/O for parallel access• Family of files (100 Gb HDF5 represented by 50 2Gb UNIX

files)• Size of the chunk cache

Page 38: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

38 HDFHDF

Properties (Example)Properties (Example)HDF5 DatasetHDF5 Dataset

• H5Dcreate(…,creation_prop_id)• Creation properties (how dataset is created)

– Library’s defaults• Storage: Contiguous• Compression: None• Space is allocated when data is first written• No fill value is written

– User’s settings • Storage: Compact, or chunked, or external • Compression• Fill value• Control over space allocation in the file for raw data

– at creation time– at write time

Page 39: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

39 HDFHDF

Properties (Example)Properties (Example)HDF5 DatasetHDF5 Dataset

• H5Dwrite<read>(…,access_prop_id)• Access (transfer) properties

– Library defaults• 1MB conversion buffer• Error detection on read (if was set during write)• MPI independent I/O for parallel access

– User defined• MPI collective I/O for parallel access• Size of the datatype conversion buffer• Control over partial I/O to improve performance

Page 40: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

40 HDFHDF

Properties Properties Programming modelProgramming model

• Use predefined property type– H5P_FILE_CREATE – H5P_FILE_ACCESS– H5P_DATASET_CREATE– H5P_DATASET_ACCESS

• Create new property instance– H5Pcreate – H5Pcopy– H5Fget_access_plist; H5Fget_create_plist– H5Dget_create_plist

• Modify property (see H5P APIs)• Use property to modify object feature• Close property when done

– H5Pclose

Page 41: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

41 HDFHDF

PropertiesPropertiesProgramming modelProgramming model

• General model of usage: get plist, set values, pass to libraryhid_t plist = H5Pcreate(copy);

H5Pset_foo( plist, vals);H5Xdo_something( Xid, …, plist);H5Pclose(plist);

Page 42: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

42 HDFHDF

HDF5 Dataset Creation HDF5 Dataset Creation Properties and Predefined Properties and Predefined

FiltersFilters

Page 43: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

43 HDFHDF

Dataset Creation PropertiesDataset Creation Properties

• Storage Layout– Contiguous (default)– Compact – Chunked – External

• Filters applied to raw data– Compression– Checksum

• Fill value• Space allocation for raw data in the file

Page 44: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

44 HDFHDF

Dataset Creation Properties Dataset Creation Properties Storage LayoutsStorage Layouts

Storage layout is important for I/O performance and size of the HDF5 files

Page 45: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

45 HDFHDF

Storage Layout: Contiguous (default)Storage Layout: Contiguous (default)

• Used when data will be written/read at once

• Sub-sampling can be faster than chunked• H5Dcreate(…,H5P_DEFAULT)

Page 46: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

46 HDFHDF

Storage Layout: CompactStorage Layout: Compact

• Used for small datasets (order of O(bytes)) for better I/O

• Raw data is written/read at the time when dataset is open

• File is less fragmented

Page 47: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

47 HDFHDF

Storage Layout: ChunkedStorage Layout: Chunked

• Chunked layout is needed for– Extendible datasets– Compression and other filters– To improve partial I/O for big datasets

Better subsetting access time; extendiblechunked

Only two chunks will be written/read

Page 48: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

48 HDFHDF

Storage Layout: ExternalStorage Layout: External

• Dataset’s raw data is stored in an external file• Easy to include existing data into HDF5 file• Easy to export raw data if application needs it• Disadvantage: user has to keep track of additional files

to preserve integrity of the HDF5 file

Metadata for “A”

Dataset “A”HDF5 fileHDF5 file

External fileExternal file

Raw data for “ARaw data for “A””

Raw data can be stored in external file

Page 49: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

49 HDFHDF

Setting Storage LayoutSetting Storage Layout

hid_t plist = H5Pcreate (H5P_DATASET_CREATE);

Compact: H5Pset_layout (plist, H5D_COMPACT)

Chunked: H5Pset_chunk (plist, rank, ch_dims);

External: H5Pset_external (plist, “raw_data.ext”, offset, size);

dset_id = H5Dcreate (…, … ,…, plist);H5Pclose (plist);

Page 50: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

50 HDFHDF

HDF5 Dataset Creation FiltersHDF5 Dataset Creation Filters

Filters are a mechanism to manipulate data while transferring it between memory and disk.

Chunks of a dataset can be arranged in a pipeline so that output of one filter becomes input of the next filter.

Page 51: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

51 HDFHDF

Dataset Creation Properties Dataset Creation Properties Compression and other Pipeline FiltersCompression and other Pipeline Filters

• HDF5 predefined filters (H5P interface)– Compression (gzip, szip)– Shuffling and checksum filters

• User defined filters (H5Z and H5P interfaces)– Example: Bzip2 compression

http://hdf.ncsa.uiuc.edu/HDF5/papers/papers/bzip2/

Page 52: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

52 HDFHDF

Compression and other Pipeline FiltersCompression and other Pipeline Filters(continued)(continued)

• Currently used only with chunked datasets• Filters can be combined together

– Shuffle + checksum filter + GZIP– Checksum filter + user define encryption filter

• Filters are called in the order they are defined on writing and in the reverse order on reading

• The order is important!• User is responsible for “filter pipeline sanity”

– GZIP + SZIP + shuffle doesn’t make sense– Shuffle + SZIP does

Page 53: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

53 HDFHDF

Creating compressed DatasetCreating compressed Dataset

• Compression– Improves transmission speed– Improves storage efficiency– Requires chunking– May increase CPU time needed for compression

Memory File

Compressed

Page 54: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

54 HDFHDF

Checksum FilterChecksum Filter

• HDF5 includes the Fletcher32 checksum algorithm for error detection.

• It is automatically included in HDF5• To use this filter you must add it to the filter pipeline

with H5Pset_filter.

Memory

Checksum value

Page 55: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

55 HDFHDF

Shuffling filterShuffling filter

• Predefined HDF5 filter

• Not a compression; change of byte order in a stream of data

Page 56: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

56 HDFHDF

00 00 00 01 00 00 00 17 00 00 00 2B

00 00 00 00 00 00 01 17 2B

00 00 00 01 00 00 00 17 00 00 00 2B

00 00 00

Page 57: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

57 HDFHDF

Effect of data shuffling Effect of data shuffling (H5Pset_shuffle + H5Pset_deflate)(H5Pset_shuffle + H5Pset_deflate)

• Write 4-byte integer dataset 256x256x1024 (256MB)• Using chunks of 256x16x1024 (16MB)• Values: random integers between 0 and 255

File size Total time Write Time

No Shuffle

Shuffle

102.9MB 671.049 629.45

67.34MB 83.353 78.268

Compression combined with shuffling provides•Better compression ratio•Better I/O performance

Page 58: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

58 HDFHDF

Enabling FiltersEnabling Filters

hid_t plist = H5Pcreate (H5P_DATASET_CREATE);H5Pset_chunk (plist, ndims, chkdims);

GZIP Compression: H5Pset_deflate (plist, level);

SZIP Compression: H5Pset_szip (plist, options-mask, numpixels);Checksum Filter: H5Pset_filter (plist, H5Z_FILTER_FLETCHER32,

0, 0, NULL);Shuffle Filter w/GZIP: H5Pset_shuffle(plist);

H5Pset_deflate(plist, level);

dset_id = H5Dcreate (…, … ,…, plist);H5Pclose (plist);

Page 59: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

59 HDFHDF

UserUser--defined Filtersdefined Filters

Page 60: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

60 HDFHDF

Standard Interface for UserStandard Interface for User--defined Filtersdefined Filters

• H5Zregister : Register filter so that HDF5 knows about it

• H5Zunregister: Unregister a filter• H5Pset_filter: Adds a filter to the filter pipeline• H5Pget_filter: Returns information about a filter

in the pipeline• H5Zfilter_avail: Check if filter is available

Page 61: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

61 HDFHDF

HDF5 Dataset Access (Transfer) HDF5 Dataset Access (Transfer) PropertiesProperties

Page 62: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

62 HDFHDF

Dataset Access/Transfer PropertiesDataset Access/Transfer Properties

• Improve performance• H5Pset_buffer

– Sets the size of the datatype conversion buffer during I/O (default is 1MB)

• Other functions

Page 63: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

63 HDFHDF

File Creation PropertiesFile Creation Properties

Page 64: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

64 HDFHDF

hid_t H5Fcreate (const char *name, unsigned flags, hid_t create_id, hid_t access_id)

name IN: Name of the file to accessflags IN: File access flagscreate_id IN: File creation property list identifier access_id IN: File access property list identifier

Page 65: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

65 HDFHDF

File Creation PropertiesFile Creation Properties

• H5Pset_userblock– User block stores user-defined information (e.g ASCII

text to describe a file) at the beginning of the file– Sets the size of the user block – 512 bytes, 1024 bytes, … (2N for N>7).

• H5Pset_sizes– Sets the byte size of the offsets and lengths used to

address objects in the file

• Others

Page 66: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

66 HDFHDF

File Access PropertiesFile Access Properties

Page 67: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

67 HDFHDF

File Access Properties (Performance)File Access Properties (Performance)

• H5Pset_cache (this function is changing in 5-1.8)– Sets raw data chunk parameters– Improper size will degrade performance

• H5Pset_meta_block_size– Reduces the number of small objects in the file– Block of metadata is written in a single I/O operation

(default 2K)– VFL driver has to set

H5FD_AGGREGATE_METADATA• H5Pset_sieve_buffer

– Improves partial I/O

Page 68: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

68 HDFHDF

File Access Properties (Physical storage File Access Properties (Physical storage and Usage of Lowand Usage of Low--level I/O Libraries)level I/O Libraries)

VFL layer file drivers:• Define physical storage of the HDF5 file

– Memory driver (HDF5 file in the application’s memory)– Stream driver (HDF5 file written to a socket)– Split(multi) files driver– Family driver

• Define low level I/O library– MPI I/O driver for parallel access– STDIO vs. SEC2

Page 69: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

69 HDFHDF

Files needn’t be files Files needn’t be files -- Virtual File LayerVirtual File LayerVFL: A public API for writing I/O drivers

memorympiostdio

Hid_t

Files Memory

““File” HandleFile” Handle

I/O drivers

network

Network

VFL: Virtual File I/O LayerVFL: Virtual File I/O Layer

““Storage”Storage”

splitfamily

Page 70: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

70 HDFHDF

Split FilesSplit Files• Allows you to split metadata and data into separate files• May reside on different file systems for better I/O• Disadvantage: User has to keep track of the files

HDF5 file

Dataset “A”

Dataset “B” Data A

Data B

Metadata file Raw data file

Page 71: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

71 HDFHDF

File FamiliesFile Families

• Allows you to access files larger than 2GB on file systems that don't support large files

• Any HDF5 file can be split into a family of files and vice versa

• A family member size must be a power of two

Page 72: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

72 HDFHDF

Modifying File Access PropertiesModifying File Access Properties

hid_t plist = H5Pcreate (H5P_FILE_ACCESS);

Split Files: H5Pset_fapl_split (plist, “.met”, H5P_DEFAULT, “.dat”, H5P_DEFAULT);

File Family: H5Pset_fapl_family (plist, family_size, H5P_DEFAULT);

file_id = H5Fcreate (…, … ,…, plist);H5Pclose (plist);

Page 73: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

73 HDFHDF

HDF InformationHDF Information

• HDF Information Center– http://hdf.ncsa.uiuc.edu/

• HDF Help email address– [email protected]

• HDF users mailing list– [email protected]

Page 74: HDF5 Advanced Topics: Selections, Object's Properties, Storage Methods and Filters

74 HDFHDF

Thank youThank youThis presentation is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA. Other support provided by NCSA and other sponsors and agencies(http://hdf.ncsa.uiuc.edu/acknowledge.html).