A PLFS Plugin for HDF5 for Improved I/O Performance and Analysis
Kshitij Mehta1, John Bent2, Aaron Torres3, Gary Grider3, Edgar Gabriel1
1 University of Houston, Texas
2 EMC Corp.
3 Los Alamos National Lab
DISCS 2012
Talk Outline
● Background
– HDF5
– PLFS
● Plugin
– Goals and Design
● Semantic Analysis
● Experiments and Results
● Conclusion
HDF5 – An Overview
● Hierarchical Data Format
● Data model, file format, and API
● Tool for managing complex data
● Widely used in industry and academia
● User specifies data objects and the logical relationships between them
● HDF5 maintains data structures, memory management, metadata creation, file I/O
HDF5 – An Overview (II)
● Parallel HDF5
– Build with an MPI library
– File create, dataset create, group create, etc. are collective calls
● User can select POSIX I/O, or parallel I/O using MPI-IO (individual/collective)
● Files are portable between sequential HDF5 and Parallel HDF5 (PHDF5) access
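HDF5 – An Overview (III)
[Figure: an HDF5 File containing two Groups that hold datasets D1, D2 and D3, stored in a single .h5 file laid out as Metadata, D1, D2, D3 and accessed by the PEs]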
HDF5 – An Overview (IV)
● File is a top-level object, a collection of objects
● Dataset is a multi-dimensional array
– Dataspace: number of dimensions, size of each dimension
– Datatype: native (int, float, etc.) or compound (~struct)
● Group is a collection of objects (groups, datasets, attributes)
● Attributes are used to annotate user data
● Hyperslab selection
– Specify offset, stride in the dataspace
– e.g. write a selected hyperslab from a matrix in memory to a selected hyperslab in a dataset in the file
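As a rough illustration of the offset/stride idea behind hyperslab selection, the sketch below enumerates the coordinates a selection picks out of a 2-D dataspace. The function `hyperslab_indices` and its parameters are illustrative names, not part of the HDF5 API, and each block is assumed to be a single element.

```python
from itertools import product


def hyperslab_indices(offset, stride, count):
    """Return the coordinates selected by a simplified hyperslab.

    offset, stride and count are per-dimension tuples. Each selected block
    is a single element, which is a simplifying assumption of this sketch.
    """
    axes = [
        [o + i * s for i in range(c)]
        for o, s, c in zip(offset, stride, count)
    ]
    return list(product(*axes))


# Select every second column of rows 1 and 2 in a 4 x 6 matrix.
selected = hyperslab_indices(offset=(1, 0), stride=(1, 2), count=(2, 3))
print(selected)
```

The same description (offset, stride, count) can be applied to both the memory matrix and the file dataset, which is how a selected region in memory is matched to a selected region in the file.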
HDF5 Virtual Object Layer (VOL)
● Recently introduced by the HDF Group
● New abstraction layer, intercepts API calls
● Forwards calls to an object plugin
● Allows third-party plugin development
● Data can be stored in any format
– netCDF, HDF4, etc.
[Figure: Public API, Object Plugin, and storage formats (.h5, netCDF)]
Opportunities in HDF5
• Preserve semantic information about HDF5 objects
• Single .h5 file is a black box
• Allows performing post-processing on individual HDF5 objects
• Improve I/O performance on certain file systems
• N-1 access often results in sub-optimal I/O performance on file systems like Lustre
PLFS
• Parallel Log Structured File System developed at LANL, CMU, EMC
• Middleware positioned between application and underlying file system
• Transforms N-1 access pattern into N-N
• Processes write to separate files, sufficient metadata maintained to re-create the original shared file
• Demonstrated benefits on many parallel file systems
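The N-1 to N-N transformation can be sketched in a few lines. This is only a toy model under stated assumptions: the per-process log name `data.<rank>`, the in-memory `index` list, and both function names are hypothetical and do not reflect the real PLFS container layout or API.

```python
import os
import tempfile


def plfs_like_write(container, rank, logical_offset, data, index):
    """Append data to this rank's own log file and record where it landed."""
    log_path = os.path.join(container, f"data.{rank}")  # hypothetical naming
    physical_offset = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    with open(log_path, "ab") as log:
        log.write(data)
    # The index entry is the metadata needed to rebuild the shared file.
    index.append((logical_offset, len(data), rank, physical_offset))


def plfs_like_read_all(container, index):
    """Rebuild the logical shared file from the per-rank logs and the index."""
    size = max(off + length for off, length, _, _ in index)
    out = bytearray(size)
    for off, length, rank, phys in index:
        with open(os.path.join(container, f"data.{rank}"), "rb") as log:
            log.seek(phys)
            out[off:off + length] = log.read(length)
    return bytes(out)


container = tempfile.mkdtemp()
index = []
# Two "processes" write interleaved 4-byte records of one logical file.
plfs_like_write(container, 0, 0, b"AAAA", index)
plfs_like_write(container, 1, 4, b"BBBB", index)
plfs_like_write(container, 0, 8, b"CCCC", index)
plfs_like_write(container, 1, 12, b"DDDD", index)
shared = plfs_like_read_all(container, index)
print(shared)
```

Each writer only ever appends to its own file, so no two writers contend for the same file, while the index keeps enough information to present the original shared file on read.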
Goals of the new plugin
• Store data in a new format, different from the native single-file format
• Preserves semantic information
• Perform additional analysis and optimizations
• Use PLFS to read/write data objects
• Tackles the performance problem due to N-1 access
Plugin Design
• Implementation for various object functions
• Provide a raw mapping of HDF5 objects to the underlying file system
• HDF5 file, groups stored as directories
• Datasets as PLFS files
• Attributes as PLFS files stored as dataset_name.attr_name
• Use PLFS API calls in the plugin
• PLFS Xattrs store dataset metadata (datatype, dataspace, ...)
• Xattrs provide key-value type access
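The object-to-file-system mapping described above could look roughly like the following sketch. Ordinary files stand in for PLFS files, Xattrs are not modelled, and the function names and the `units` attribute are hypothetical; this is not the plugin's actual code.

```python
import os
import tempfile


def _fs_path(root, object_path):
    """Translate an HDF5-style object path into a file system path."""
    return os.path.join(root, *object_path.strip("/").split("/"))


def create_group(root, group_path):
    """An HDF5 file or group is represented as a directory."""
    path = _fs_path(root, group_path)
    os.makedirs(path, exist_ok=True)
    return path


def create_dataset(root, dataset_path, payload=b""):
    """A dataset is a file (a plain file stands in for a PLFS file here)."""
    path = _fs_path(root, dataset_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(payload)
    return path


def create_attribute(root, dataset_path, attr_name, value):
    """An attribute is a sibling file named dataset_name.attr_name."""
    path = _fs_path(root, dataset_path) + "." + attr_name
    with open(path, "wb") as f:
        f.write(value)
    return path


root = tempfile.mkdtemp()
create_group(root, "File/Group1")
create_dataset(root, "File/Group1/D1", b"\x00" * 8)
create_attribute(root, "File/Group1/D1", "units", b"metres")

# Collect the resulting layout as relative, '/'-separated paths.
layout = sorted(
    os.path.relpath(os.path.join(d, name), root).replace(os.sep, "/")
    for d, _, files in os.walk(root)
    for name in files
)
print(layout)
```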
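PLFS Plugin
● Relative path describes the relationship between objects
● User still sees the same API
[Figure: the HDF5 hierarchy (File containing two Groups with datasets D1, D2 and D3) mapped to a directory tree: File/ containing Group/ with D1 and D2, and Group/ with D3]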
Semantic Analysis (I)
• Active Analysis
• Application can provide a data parser function
• PLFS applies function on the streaming data
• Function outputs key-value pairs which can be embedded in extensible metadata
• e.g. recording the height of the largest wave in ocean data within each physical file
• Quick retrieval of the largest wave, since only the extensible metadata needs to be searched
• Extensible metadata can be stored on burst buffers for faster retrieval
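The active-analysis idea can be sketched as follows. The parser `max_height_parser`, the shard names, and the dictionary used as "extensible metadata" are all hypothetical stand-ins; the sketch only shows why a later query can avoid reading the data itself.

```python
def max_height_parser(chunk):
    """Hypothetical application-supplied parser: data chunk -> key-value pairs."""
    return {"max_wave_height": max(chunk)}


def write_with_analysis(shards, parser):
    """Simulate streaming writes: store each shard and keep the parser output."""
    stored, metadata = {}, {}
    for name, chunk in shards.items():
        stored[name] = list(chunk)      # stands in for the actual data write
        metadata[name] = parser(chunk)  # small key-value record per physical file
    return stored, metadata


# Toy ocean data: wave heights split across three physical files.
shards = {
    "shard0": [1.2, 3.4, 2.2],
    "shard1": [0.9, 7.5, 4.1],
    "shard2": [5.0, 2.0],
}
stored, metadata = write_with_analysis(shards, max_height_parser)

# Finding the largest wave only touches the metadata, never the stored data.
best = max(metadata, key=lambda name: metadata[name]["max_wave_height"])
print(best, metadata[best]["max_wave_height"])
```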
Active Analysis (II)
[Figure: data flows from the PE through PLFS to the FS; a Parser runs on the data and its Parser Output goes to the Burst Buffer]
Semantic Analysis (II)
• Semantic Restructuring
• Allows re-organizing data into a new set of PLFS shards
• e.g. assume ocean model stored row-wise
• Column-wise access expensive
• Analysis routine can ask for “column-wise re-ordering”
• PLFS knows what it means, since it knows the structure
• Avoids application having to restructure data by calculating a huge list of logical offsets
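To make the cost of column-wise access concrete, the sketch below shows the list of logical offsets an application would otherwise have to compute for a row-wise dataset, and the effect of a column-major re-ordering. The function names and the tiny 3 x 4 matrix are illustrative assumptions.

```python
def column_access_offsets(col, nrows, ncols):
    """Logical offsets of one column when the data is stored row-wise."""
    return [row * ncols + col for row in range(nrows)]


def reorder_column_major(flat, nrows, ncols):
    """Re-order a row-major flat array into column-major order."""
    return [flat[r * ncols + c] for c in range(ncols) for r in range(nrows)]


nrows, ncols = 3, 4
flat = list(range(nrows * ncols))  # row-major: element value equals its offset

# Row-wise storage: reading column 1 needs one strided offset per row.
strided = column_access_offsets(1, nrows, ncols)

# After restructuring, the same column is one contiguous range.
col_major = reorder_column_major(flat, nrows, ncols)
contiguous = col_major[1 * nrows:(1 + 1) * nrows]
print(strided, contiguous)
```

For a real dataset the strided list grows with the number of rows, which is the "huge list of logical offsets" the restructuring avoids.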
Semantic Restructuring (II)
Restructure
HDF5 Datasets
“Re-order wave lengths recorded in October 2012 in column-major (Hour x Day)”
Experiments and Results
● Lustre FS, 12 OSTs, 1M stripe size
● HDF5 performance tool “h5perf”
● Multiple processes write data to multiple datasets in a file
● Bandwidth values presented are the average of 3 runs
● 1, 2, 4, 8, 32, 64 PEs
– 4 PEs/node max
● 10 datasets, minimum total data size 64G
● Comparing MPI-IO on Lustre, Plugin, AD_PLFS (PLFS MPI-IO driver)
● Individual I/O (non-collective) tests
Write Contiguous
• Aligned transfer size of 1M
• For almost all cases, plugin performs better than MPI-IO; AD_PLFS shows the best performance
Write Interleaved
• Unaligned transfer size of (1M + 10 bytes)
• Plugin performance > MPI-IO
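A small arithmetic sketch shows why the extra 10 bytes matter on a striped file system. It assumes that "1M" means 1 MiB, that transfers are packed back to back from offset 0 of a shared file, and that stripes are placed round-robin over all 12 OSTs; none of these details are stated on the slide.

```python
MIB = 1 << 20
STRIPE_SIZE = MIB  # assumed: "1M stripe size" means 1 MiB
NUM_OSTS = 12      # assumed: the file is striped over all 12 OSTs


def stripes_touched(transfer_index, transfer_size, stripe_size=STRIPE_SIZE):
    """Stripe numbers covered by the i-th back-to-back transfer."""
    start = transfer_index * transfer_size
    end = start + transfer_size - 1
    return list(range(start // stripe_size, end // stripe_size + 1))


def osts_touched(stripes, num_osts=NUM_OSTS):
    """Map stripe numbers to OSTs, assuming round-robin placement."""
    return [s % num_osts for s in stripes]


aligned = [stripes_touched(i, MIB) for i in range(3)]
unaligned = [stripes_touched(i, MIB + 10) for i in range(3)]
print(aligned)
print(unaligned)
```

Under these assumptions every aligned transfer stays within one stripe, while every unaligned transfer straddles a stripe boundary and neighbouring transfers share a stripe.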
Read Performance
• Contiguous reads (1M) and interleaved reads (1M + 10 bytes)
• Similar trend as in writes
• MPI-IO < Plugin < AD_PLFS
Conclusion
● New plugin for HDF5 developed using the PLFS API
● New output format allows for Semantic Analysis
● Using PLFS improves I/O performance
● Tests show the plugin performs better than MPI-IO in most cases; AD_PLFS shows the best performance
● Future Work: use AD_PLFS API calls in the plugin instead of native PLFS API calls, provide collective I/O in the plugin
Thank You
Acknowledgements:
• Quincey Koziol, Mohamad Chaarawi – The HDF Group
• University of Dresden for access to Lustre FS
• Why not use AD_PLFS on the default .h5 file?
• Changing the output format allows for semantic analysis
• Provides a more object-based storage (DOE Fast Forward proposal – EMC, Intel, HDF working towards an object stack)
Questions