high performance computing: concepts, methods & means parallel i/o : file systems and libraries


High Performance Computing: Concepts, Methods & Means

Parallel I/O : File Systems and Libraries

Prof. Thomas Sterling

Department of Computer Science

Louisiana State University

March 29th, 2007


Topics

• Introduction

• RAID

• Distributed File Systems (NFS)

• Parallel File Systems (PVFS2)

• Parallel I/O Libraries (MPI-IO)

• Parallel File Formats (HDF5)

• Additional Parallel File Systems (GPFS)

• Summary – Materials for Test



Permanent Storage: Hard Disks Review

• Storage capacity: 1 TB per drive

• Areal density: 132 Gbit/in² (perpendicular recording)

• Rotational speed: 15,000 RPM

• Average latency: 2 ms

• Seek time:
– Track-to-track: 0.2 ms
– Average: 3.5 ms
– Full stroke: 6.7 ms

• Sustained transfer rate: up to 125 MB/s

• Non-recoverable error rate: 1 in 10^17

• Interface bandwidth:
– Fibre Channel: 400 MB/s
– Serial Attached SCSI (SAS): 300 MB/s
– Ultra320 SCSI: 320 MB/s
– Serial ATA (SATA): 300 MB/s
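The latency and seek figures above combine into a rough per-request service time. A minimal sketch, assuming the simple model access time = average seek + average rotational latency (half a revolution) + transfer time at the sustained rate, and ignoring controller overhead and caching; the function names are hypothetical:

```c
#include <assert.h>

/* Average rotational latency in ms: on average the head waits half a revolution. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    double ms_per_rev = 60000.0 / rpm;   /* 60,000 ms per minute */
    return ms_per_rev / 2.0;
}

/* Simplified random-access service time in ms for one request:
   average seek + average rotational latency + transfer at the sustained rate. */
double access_time_ms(double seek_ms, double rpm, double request_bytes,
                      double rate_bytes_per_s)
{
    assert(rate_bytes_per_s > 0.0);
    double transfer_ms = request_bytes / rate_bytes_per_s * 1000.0;
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms;
}
```

With the figures listed above (3.5 ms average seek, 15,000 RPM, 125 MB/s), avg_rotational_latency_ms(15000) reproduces the 2 ms average latency, and a 64 KiB random read costs about 3.5 + 2 + 0.52 ≈ 6.0 ms. Positioning, not transfer, dominates small requests.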


Storage: SATA & Overview (Review)

• Serial ATA (SATA) is the newest commodity hard disk standard.

• SATA uses serial buses, as opposed to the parallel buses used by ATA and SCSI.

• The cables attached to SATA drives are smaller and run faster (around 150 MB/s for first-generation SATA).

• The basic disk technology remains the same across all three buses.

• Disk platters spin at a variety of speeds: the faster the platters spin, the faster data can be read off the disk, and the sooner data on the far side of the platter rotates under the head.

• Rotational speeds range from 5,400 RPM to 15,000 RPM.

• The faster the platters rotate, the lower the latency and the higher the bandwidth.

[Figure: PATA vs SATA]


I/O Needs on Parallel Computers

• High Performance
– Take advantage of parallel I/O paths (when available)
– Support application-level data access and throughput needs

• Data Integrity
– Sanely deal with hardware and power failures

• Single Namespace
– All nodes and users “see” the same file systems
– Equal access from anywhere on the resource

• Ease of Use
– Where possible, a parallel file system should be accessible in a consistent way, in the same manner as a traditional UNIX-style file system

Source: Ohio Supercomputer Center



Parallel I/O - RAID

• RAID (Redundant Array of Inexpensive Disks) provides a mechanism by which the performance and storage properties of individual disks can be aggregated.

• A group of disks appears as a single large disk; the aggregate performance of multiple disks is better than that of a single disk.

• Using multiple disks makes it possible to store data in multiple places, allowing the system to continue functioning after a disk failure (for the redundant RAID levels).

• Both software and hardware RAID solutions are available.

• Hardware solutions are more expensive, but provide better performance without CPU overhead.

• Software solutions offer more flexibility, but impose computational overhead on the host CPU.


RAID: Key Concepts

• There is a variety of RAID allocation schemes.

• RAID 0 (disk striping without redundant storage):
– Data is striped across multiple disks.
– The result of striping is a logical storage device whose capacity is the capacity of each disk times the number of disks in the RAID array.
– Both read and write performance are accelerated.
– Consecutive stripe units reside on different disks, so a large request is served by several disks in parallel, and independent small requests can proceed on different disks at the same time.
– No fault tolerance
– High transfer rates
– High request rates

Source: http://www.drivesolutions.com/datarecovery/raid.shtml
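The striping rule can be made concrete with the address arithmetic a RAID 0 layer performs. A minimal sketch, assuming round-robin placement of fixed-size stripe units (measured in blocks) starting at disk 0; the struct and function names are illustrative, not taken from any particular RAID implementation:

```c
#include <assert.h>

/* Physical location of one logical block in a RAID 0 array. */
typedef struct {
    int disk;              /* which member disk holds the block */
    long long disk_block;  /* block number on that disk */
} raid0_loc;

/* Map a logical block number to (disk, block on disk).
   stripe_unit: blocks per stripe unit; ndisks: number of member disks. */
raid0_loc raid0_map(long long logical_block, int stripe_unit, int ndisks)
{
    assert(stripe_unit > 0 && ndisks > 0 && logical_block >= 0);
    long long unit = logical_block / stripe_unit;    /* global stripe-unit index */
    long long within = logical_block % stripe_unit;  /* offset inside that unit */
    raid0_loc loc;
    loc.disk = (int)(unit % ndisks);                 /* units go round-robin over disks */
    loc.disk_block = (unit / ndisks) * stripe_unit + within;  /* row of units on that disk */
    return loc;
}
```

With 4 disks and 2-block stripe units, logical blocks 0-1 land on disk 0, blocks 2-3 on disk 1, and block 9 on disk 0 at disk block 3. A request for blocks 0-7 touches all four disks, which is where the speedup comes from.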


RAID: Key Concepts

• RAID 1 (disk mirroring):
– Complete copies of data are stored in multiple locations.
– The usable capacity of a two-way mirrored set is half of its raw capacity. Read performance is accelerated and is comparable to RAID 0.
– Writes are slowed down, as new data needs to be transmitted multiple times.

• RAID 5:
– As in RAID 0, data is striped across multiple disks, with parity distributed across the disks.
– For each stripe of data blocks stored across the drives, a parity checksum is computed and stored on one disk; the disk that holds the parity rotates from stripe to stripe.
– Read performance of RAID 5 is slightly below that of RAID 0, because parity blocks are interleaved with data on every drive; write performance lags further behind, because every write must also compute and update the parity.

Source: http://www.drivesolutions.com/datarecovery/raid.shtml
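The RAID 5 parity checksum is a bitwise XOR across the data blocks of a stripe, which is what lets any single lost block be rebuilt from the survivors. A minimal sketch of that arithmetic, assuming equal-sized blocks held in memory; the function names are hypothetical, and rotation of the parity disk is not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* parity = blocks[0] XOR blocks[1] XOR ... XOR blocks[nblocks-1], byte by byte. */
void raid5_parity(const unsigned char *const *blocks, size_t nblocks,
                  size_t len, unsigned char *parity)
{
    assert(nblocks > 0);
    memset(parity, 0, len);
    for (size_t b = 0; b < nblocks; b++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= blocks[b][i];
}

/* Rebuild the block at index 'lost' as the XOR of the parity and every
   surviving block; blocks[lost] itself is never read. */
void raid5_rebuild(const unsigned char *const *blocks, size_t nblocks, size_t lost,
                   const unsigned char *parity, size_t len, unsigned char *out)
{
    assert(lost < nblocks);
    memcpy(out, parity, len);
    for (size_t b = 0; b < nblocks; b++)
        if (b != lost)
            for (size_t i = 0; i < len; i++)
                out[i] ^= blocks[b][i];
}
```

The same XOR identity explains the write penalty: updating one data block requires the old data and old parity, since new parity = old parity XOR old data XOR new data.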



Distributed File Systems

• A distributed file system (DFS) is a file system that is stored locally on one system (the server) but is accessible by processes on many systems (the clients).

• Multiple processes can access multiple files simultaneously.

• Other attributes of a DFS may include:
– Access control lists (ACLs)
– Client-side file replication
– Server- and client-side caching

• Some examples of DFSes:
– NFS (Sun)
– AFS (CMU)
– DCE/DFS (Transarc / IBM)
– CIFS (Microsoft)

• Distributed file systems can be used by parallel programs, but they have significant disadvantages:
– The network bandwidth of the server system is a limiting factor on performance.
– To retain UNIX-style file consistency, the DFS software must implement some form of locking, which has significant performance implications.

Source: Ohio Supercomputer Center


Distributed File System: NFS

• A popular means of accessing remote file systems in a local area network.

• Based on the client-server model: remote file systems are “mounted” via NFS and accessed through the Linux virtual file system (VFS) layer.

• NFS clients cache file data, periodically checking the original file on the server for any changes.

• The loosely synchronous model makes for convenient, low-latency access to shared spaces.

• NFS avoids the common locking systems used to implement POSIX semantics.


Why NFS is Bad for Parallel I/O

• Clients can cache data indiscriminately, and tend to do so in whole blocks.

• When nearby regions of a file are written by different processes on different clients, the result is undefined due to the lack of consistency control.

• All file operations are remote operations, and extensive file locking would be required to implement sequential consistency.

• Communication between client and server typically uses relatively slow communication channels, adding to performance degradation.

• Inefficient specification (e.g., a read operation involves two RPC operations: one to look up the file handle and a second to read the file data).
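The undefined result for nearby writes follows from block-granularity caching: each client writes back a whole cached block, including bytes it never touched. A deliberately simplified sketch of that interleaving, assuming an 8-byte "block", two clients, and write-back of the full block on flush; it models the failure mode only, not the NFS protocol, and all names are hypothetical:

```c
#include <assert.h>
#include <string.h>

#define BLOCK 8

/* A client's cached copy of one file block. */
typedef struct { unsigned char data[BLOCK]; } cache_block;

/* Client fetches the whole block from the server into its cache. */
static void fetch(cache_block *c, const unsigned char *server) { memcpy(c->data, server, BLOCK); }

/* Client flushes its whole cached block, overwriting every byte on the server. */
static void flush(const cache_block *c, unsigned char *server) { memcpy(server, c->data, BLOCK); }

/* Two clients update different bytes of the same block; returns the
   server's final byte at index idx. */
static unsigned char lost_update_demo(int idx)
{
    unsigned char server[BLOCK];
    cache_block a, b;
    assert(idx >= 0 && idx < BLOCK);
    memset(server, '.', BLOCK);
    fetch(&a, server);      /* both clients cache the block before either writes */
    fetch(&b, server);
    a.data[0] = 'A';        /* client A writes byte 0 */
    b.data[7] = 'B';        /* client B writes byte 7 */
    flush(&a, server);      /* A's flush lands first */
    flush(&b, server);      /* then B's stale copy of byte 0 overwrites A's update */
    return server[idx];
}
```

Client A's update to byte 0 is lost even though the two clients never wrote the same byte; with the opposite flush order, B's update would be lost instead.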



Parallel File Systems

• A parallel file system is one in which there are multiple servers as well as multiple clients for a given file system; it is the equivalent of RAID across several file servers.

• Multiple processes can access the same file simultaneously.

• Parallel file systems are usually optimized for high performance rather than general-purpose use; common optimization criteria are:
– Very large block sizes (≥ 64 kB)
– Relatively slow metadata operations (e.g., fstat()) compared to reads and writes
– Special APIs for direct access

• Examples of parallel file systems include:
– GPFS (IBM)
– Lustre (Cluster File Systems)
– PVFS2 (Clemson/ANL)

Source: Ohio Supercomputer Center
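When a file is striped round-robin across I/O servers, one contiguous request is split into pieces that different servers handle concurrently, which is why large block sizes pay off. A minimal sketch, assuming a fixed strip size and round-robin placement starting at server 0 (real systems let these stripe parameters be configured); the function name is hypothetical:

```c
#include <assert.h>

/* Count how many bytes of the request [offset, offset + length) each of
   nservers servers must handle when the file is striped round-robin in
   strip_size-byte units. per_server[] must have nservers entries. */
void split_request(long long offset, long long length, long long strip_size,
                   int nservers, long long *per_server)
{
    assert(strip_size > 0 && nservers > 0 && offset >= 0 && length >= 0);
    for (int s = 0; s < nservers; s++)
        per_server[s] = 0;

    long long pos = offset, end = offset + length;
    while (pos < end) {
        long long strip = pos / strip_size;              /* global strip index */
        long long strip_end = (strip + 1) * strip_size;  /* first byte of the next strip */
        long long chunk = (end < strip_end ? end : strip_end) - pos;
        per_server[strip % nservers] += chunk;           /* round-robin owner */
        pos += chunk;
    }
}
```

With 64 KiB strips over 4 servers, a 256 KiB request keeps every server busy with 64 KiB each, while a 4 KiB request touches only one server.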


Characteristics of Parallel File Systems

• Three key characteristics:
– Various hardware I/O data storage resources
– Multiple connections between these hardware devices and compute resources
– High-performance, concurrent access to these I/O resources

• Multiple physical I/O devices and paths ensure sufficient bandwidth for the desired high performance.

• Parallel I/O systems include both the hardware and a number of layers of software.

[Figure: parallel I/O software stack, top to bottom: High-Level I/O Library; Parallel I/O (MPI I/O); Parallel File System; Storage Hardware]


Parallel File Systems: Hardware Layer

• I/O hardware usually consists of disks, controllers, and interconnects for data movement.

• Hardware determines the maximum raw bandwidth and the minimum latency of the system.

• The bisection bandwidth of the underlying transport determines the aggregate bandwidth of the resulting parallel I/O system.

• At the hardware level, data is accessed at the granularity of blocks, either physical disk blocks or logical blocks spread across multiple physical devices, as in a RAID array.

• Parallel file systems:
– manage data on the storage hardware,
– present this data as a directory hierarchy, and
– coordinate access to files and directories in a consistent manner.

• File systems usually provide a UNIX-like interface, allowing users to access contiguous regions of files.

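Because the hardware sets the ceiling, a first-order estimate of the deliverable aggregate bandwidth is the minimum over the stages the data must cross. A rough sketch, assuming a simple bottleneck model in which disks, server network links, the interconnect's bisection, and client links are the only stages and each runs at its nominal rate; parameter names and the example numbers are hypothetical:

```c
#include <assert.h>

static double min2(double a, double b) { return a < b ? a : b; }

/* First-order upper bound (MB/s) on aggregate parallel I/O bandwidth:
   the slowest of the stages the data must pass through. */
double aggregate_bw_mbs(int ndisks, double disk_mbs,
                        int nservers, double server_link_mbs,
                        double bisection_mbs,
                        int nclients, double client_link_mbs)
{
    assert(ndisks > 0 && nservers > 0 && nclients > 0);
    double bw = ndisks * disk_mbs;                /* storage media */
    bw = min2(bw, nservers * server_link_mbs);    /* server network interfaces */
    bw = min2(bw, bisection_mbs);                 /* interconnect bisection */
    bw = min2(bw, nclients * client_link_mbs);    /* client network interfaces */
    return bw;
}
```

For example, 16 disks at 125 MB/s could supply 2000 MB/s and 4 servers with 250 MB/s links could move 1000 MB/s, but an 800 MB/s bisection caps the whole system at 800 MB/s.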


Parallel File Systems: Other Layers

• Lower-level interfaces may be provided by the file system for higher-performance access.

• Above the parallel file system are the parallel I/O layers, provided in the form of libraries such as MPI-IO.

• The parallel I/O layer provides a low-level interface and operations such as collective I/O.

• Scientific applications work with structured data, for which higher-level APIs written on top of MPI-IO, such as HDF5 or Parallel netCDF, are used.

• HDF5 and Parallel netCDF allow scientists to represent data sets in terms closer to those used in their applications.



PVFS2

• PVFS2 is designed to provide:
– modular networking and storage subsystems
– a structured data request format modeled after MPI datatypes
– flexible and extensible data distribution models
– distributed metadata
– tunable consistency semantics
– support for data redundancy

• Supports a variety of network technologies, including Myrinet, Quadrics, and InfiniBand.

• Also supports a variety of storage devices, including locally attached hardware, SANs, and iSCSI.

• Key abstractions include:
– Buffered Message Interface (BMI): non-blocking network interface
– Trove: non-blocking storage interface
– Flows: mechanism to specify a flow of data between network and storage


PVFS2 Software Architecture

• Buffered Message Interface (BMI):
– A non-blocking interface that can be used with many high-performance network fabrics
– Currently, implementations exist for TCP/IP and Myrinet (GM) networks

• Trove:
– A non-blocking interface that can be used with a number of underlying storage mechanisms
– Trove storage objects consist of a stream of bytes and a keyword/value pair space
– Keyword/value pairs are convenient for arbitrary metadata storage and directory entries, while the byte stream is well suited to storing file data

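[Figure: PVFS2 software architecture. Client side: Client API, Job Sched, BMI, Flows, Dist. Server side: Request Processing, Job Sched, BMI, Flows, Dist, Trove. Client and server communicate over the network; the server stores data on disk.]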


PVFS2 Software Architecture

• Flows:
– Combine the network and storage subsystems by providing a mechanism to describe the flow of data between network and storage.
– Provide a point at which data movement between a particular network and storage pair can be optimized to exploit fast paths.

• The job scheduling layer provides a common interface to interact with BMI, Flows, and Trove and to check on their completion.

• The job scheduler is tightly integrated with a state machine that is used to track operations in progress.

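The post-then-test pattern behind BMI, Trove, and the job layer can be pictured as a per-request state machine that advances only when its current non-blocking operation completes. The following is a hypothetical sketch of that pattern, not the PVFS2 job or state-machine API: the state names, the simulated operation, and its completion delays are all invented for illustration.

```c
#include <assert.h>

/* Hypothetical server-side states for servicing one read request. */
typedef enum { ST_RECV_REQ, ST_READ_DISK, ST_SEND_DATA, ST_DONE } op_state;

/* Simulated non-blocking operation: completes after 'remaining' polls. */
typedef struct { int remaining; } nb_op;

static void nb_post(nb_op *op, int polls_needed) { op->remaining = polls_needed; }

/* Like a test call: makes progress and reports 1 once the operation is complete. */
static int nb_test(nb_op *op)
{
    if (op->remaining > 0)
        op->remaining--;
    return op->remaining == 0;
}

typedef struct { op_state state; nb_op pending; int polls; } request;

/* One scheduler pass over a request: poll its pending operation and, if it
   has completed, move to the next state and post that state's operation. */
static void progress(request *r)
{
    if (r->state == ST_DONE)
        return;
    r->polls++;
    if (!nb_test(&r->pending))
        return;                                  /* not finished; try again later */
    switch (r->state) {
    case ST_RECV_REQ:  r->state = ST_READ_DISK; nb_post(&r->pending, 3); break; /* storage read (Trove-like) */
    case ST_READ_DISK: r->state = ST_SEND_DATA; nb_post(&r->pending, 2); break; /* network send (BMI-like) */
    case ST_SEND_DATA: r->state = ST_DONE; break;
    default: break;
    }
}

/* Drive one request to completion; returns how many polls the scheduler made. */
static int run_read_request(void)
{
    request r = { ST_RECV_REQ, { 0 }, 0 };
    nb_post(&r.pending, 2);                      /* network receive (BMI-like) */
    while (r.state != ST_DONE)
        progress(&r);
    assert(r.pending.remaining == 0);
    return r.polls;
}
```

In a real server the scheduler would call progress on many outstanding requests in the same loop, which is what lets a single thread overlap network and disk work across requests.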


The PVFS2 Components

• The four major components of a PVFS system are:
– Metadata server (mgr)
– I/O server (iod)
– PVFS native API (libpvfs)
– PVFS Linux kernel support

• Metadata server (mgr):
– Manages all the file metadata for PVFS files, using a daemon that operates atomically on the file metadata.
– Because metadata updates go through this daemon, PVFS avoids the pitfalls of many storage area network approaches, which have to implement complex locking schemes to keep metadata consistent in the face of multiple accesses.


The PVFS2 Components

• I/O daemon (iod):
– Handles storing and retrieving file data on the local disks connected to its node, using traditional read(), write(), etc. to access these files.

• The PVFS native API provides user-space access to the PVFS servers.

• The library handles the operations necessary to move data between user buffers and PVFS servers.

[Figure: PVFS components, showing the metadata access and data access paths]

Source: http://csi.unmsm.edu.pe/paralelo/pvfs/desc.html


Parallel File Systems Comparison



Comparison of NFS vs. GPFS

File-System Features | NFS | GPFS
Introduced | 1985 | 1998
Original vendor | Sun | IBM
Example at LC | /nfs/tmpn | /p/gx1
Primary role | Share files among machines | Fast parallel I/O for large files
Easy to scale? | No | Yes
Network needed | Any TCP/IP network | Only IBM SP "switch"
Access control method | UNIX permission bits (CHMOD) | UNIX permission bits (CHMOD)
Block size | 256 byte | 512 Kbyte (White)
Stripe width | Depends on RAID | 256 Kbyte
Maximum file size | 2 Gbyte (larger with v3) | 26 Gbyte
File consistency: uses client buffering? | Yes | Yes (see diagram)
File consistency: uses server buffering? | | Yes (see diagram)
File consistency: uses locking? | No | Yes (token passing)
File consistency: lock granularity? | | Byte range
File consistency: lock managed by? | | Requesting compute node
Purged at LC? | Home: No; Tmp: Yes | Yes
Supports file quotas? | Yes | No


MPI-IO Overview

• Initially developed as a research project at the IBM T. J. Watson Research Center in 1994

• Voted by the MPI Forum to be included in the MPI-2 standard (Chapter 9)

• The most widespread open-source implementation is ANL’s ROMIO, written by Rajeev Thakur (http://www-unix.mcs.anl.gov/romio/)

• Integrates file access with the message-passing infrastructure, exploiting the similarities between send/receive and file write/read operations

• Allows MPI datatypes to meaningfully describe data layouts in files, instead of dealing with unorganized streams of bytes

• Provides potential for performance optimizations through the mechanism of “hints”, collective operations on file data, or relaxation of data access atomicity

• Enables better file portability by offering alternative data representations


MPI-IO Features (I)

• Basic file manipulation (open/close, delete, space preallocation, resize, storage synchronization, etc.)

• File views (define what part of a file each process can see and how it is interpreted):
– Processes can view file data independently, with possible overlaps
– Users may define patterns to describe data distributions both in the file and in memory, including non-contiguous layouts
– Views permit skipping over fixed header blocks (“displacements”)
– Views can be changed by tasks at any time

• Data access positioning:
– Explicitly specified offsets (suffix “_at”)
– Independent data access by each task via individual file pointers (no suffix)
– Coordinated access through a shared file pointer (suffix “_shared”)

• Access synchronism:
– Blocking
– Non-blocking (including split-collective operations)


MPI-IO Features (II)

• Access coordination:
– Non-collective (no additional suffix)
– Collective (suffix “_all” for most blocking calls, “_begin” and “_end” for split-collective calls, or “_ordered” for the equivalent of shared-pointer access)

• File interoperability (ensures portability of data representation):
– Native: for purely homogeneous environments
– Internal: heterogeneous environments with implementation-defined data representation (subset of “external32”)
– External32: heterogeneous environments using the data representation defined by the MPI-IO standard

• Optimization hints (the “info” interface):
– Access style (e.g., read_once, write_once, sequential, random, etc.)
– Collective buffering components (buffer and block sizes, number of target nodes)
– Striping unit and factor
– Chunked I/O specification
– Preferred I/O devices

• C, C++, and Fortran bindings


MPI-IO Types

• Etype (elementary datatype): the unit of data access and positioning; all data accesses are performed in etype units and offsets are measured in etypes

• Filetype: basis for partitioning the file among processes: a template for accessing the file; may be identical to or derived from the etype

Source: http://www.mhpcc.edu/training/workshop2/mpi_io/MAIN.html


MPI-IO File Views

• A view defines the current set of data visible and accessible from an open file, as an ordered set of etypes.

• Each process has its own view of the file, defined by a displacement, an etype, and a filetype.

• Displacement: an absolute byte position relative to the beginning of the file; defines where a view begins.
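Together, the displacement, etype, and filetype determine where each view offset lands in the file: the filetype is tiled repeatedly starting at the displacement, and only its non-hole etypes are visible. A minimal sketch of that arithmetic, assuming the simplest non-contiguous filetype, one contiguous run of visible etypes at a fixed position inside each repeating tile (the shape the scatter example later in this section builds with MPI_LB/MPI_UB); the function and parameter names are hypothetical and not part of the MPI API:

```c
#include <assert.h>

/* Absolute byte offset in the file of view offset k (counted in etypes).
   disp:         view displacement in bytes
   etype_size:   size of one etype in bytes
   block_start:  index (in etypes) of the first visible etype in each tile
   block_etypes: number of consecutive visible etypes per tile
   tile_etypes:  extent of the filetype in etypes (visible part plus holes) */
long long view_to_file_offset(long long disp, int etype_size, int block_start,
                              int block_etypes, int tile_etypes, long long k)
{
    assert(etype_size > 0 && block_etypes > 0 && k >= 0);
    assert(block_start >= 0 && block_start + block_etypes <= tile_etypes);
    long long tile = k / block_etypes;    /* which repetition of the filetype */
    long long within = k % block_etypes;  /* position inside the visible run */
    return disp + (tile * tile_etypes + block_start + within) * etype_size;
}
```

With a contiguous filetype (block_start = 0, block_etypes = tile_etypes) this reduces to disp + k × etype_size; in the MPI_File_read_at example below, offset 6 in an MPI_INT view therefore means byte 24 when ints are 4 bytes.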


MPI-IO: File Open

Function: MPI_File_open()

int MPI_File_open(MPI_Comm comm, char *filename, int amode,
                  MPI_Info info, MPI_File *fh);

Description: Opens the file identified by filename on all processes in the comm group, using the access mode specified in amode. The operation is collective; all participating processes must pass identical values for amode and filenames that reference the same file. A successful call returns the open file handle in fh, which can be used to subsequently access the file.

It is possible to open a file independently of other processes by passing MPI_COMM_SELF as the comm argument.

#include <mpi.h>
...
MPI_File fh;
int err;
...
/* create a writable file with default parameters */
err = MPI_File_open(MPI_COMM_WORLD, "/mnt/piofs/testfile",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
if (err != MPI_SUCCESS) {
    /* handle error here */
}
...



MPI-IO: File Close

Function: MPI_File_close()

int MPI_File_close(MPI_File *fh);

Description: Synchronizes file state (equivalent to an implicit invocation of MPI_File_sync), and then closes the file associated with handle fh. The user must ensure that all outstanding non-blocking requests and split-collective operations associated with fh have completed. If the file was opened with access mode MPI_MODE_DELETE_ON_CLOSE, it is deleted from the file system.

#include <mpi.h>
...
MPI_File fh;
int err;
...
/* open a file storing the handle in fh */
/* perform file access */
...
err = MPI_File_close(&fh);
if (err != MPI_SUCCESS) {
    /* handle error here */
}
...



MPI-IO: Set File View

Function: MPI_File_set_view()

int MPI_File_set_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype,
                      MPI_Datatype filetype, char *datarep, MPI_Info info);

Description: Changes the process’s view of the data file, setting the start of the view to disp, the type of file data to etype, the distribution of file data to processes to filetype, and the data representation to datarep. Resets the individual and shared file pointers to zero. The call is collective, requiring the values for datarep and the extents of etype to be identical for all processes. The data representation must be one of “native”, “internal”, or “external32”.

#include <mpi.h>
...
MPI_File fh;
int err;
...
/* open file storing the handle in fh */
...
/* view the file as a stream of integers with no header, using the native data representation */
err = MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
if (err != MPI_SUCCESS) {
    /* handle error */
}
...



MPI-IO: Read File with Explicit Offset

Function: MPI_File_read_at()

int MPI_File_read_at(MPI_File fh, MPI_Offset offs, void *buf, int count,
                     MPI_Datatype type, MPI_Status *status);

Description: Reads count elements of type type from the file represented by fh at offset offs, storing them in the buffer pointed to by buf. Offset offs is expressed in etype units relative to the current view associated with the file handle fh. A successful call returns the amount of data transferred in status.

#include <mpi.h>
...
MPI_File fh;
MPI_Status stat;
int buf[3], err;
...
/* open file storing the handle in fh */
...
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
/* read the third triad of integers (elements 6, 7, 8) from the file */
err = MPI_File_read_at(fh, 6, buf, 3, MPI_INT, &stat);
...



MPI-IO: Write to File with Explicit Offset

Function: MPI_File_write_at()

int MPI_File_write_at(MPI_File fh, MPI_Offset offs, void *buf, int count,
                      MPI_Datatype type, MPI_Status *status);

Description: Writes count elements of type type from buffer buf to the file represented by fh at offset offs. Offset offs is expressed in etype units relative to the current view associated with the file handle fh. A successful call returns the amount of data transferred in status.

#include <mpi.h>
...
MPI_File fh;
MPI_Status stat;
int err;
double dt = 0.0005;
...
/* open file storing the handle in fh */
...
MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
/* store timestep as the first item in file */
err = MPI_File_write_at(fh, 0, &dt, 1, MPI_DOUBLE, &stat);
...



MPI-IO: Read File Collectively with Individual File Pointers

Function: MPI_File_read_all()

int MPI_File_read_all(MPI_File fh, void *buf, int count, MPI_Datatype type,
                      MPI_Status *status);

Description: All processes in the communicator group associated with the file handle fh read their respective count elements of type type from the file, at the offsets determined by the current values of the individual file pointers cached on their file handles, storing them in the buffers pointed to by buf. A successful call returns the amount of data transferred in status.

#include <mpi.h>
...
MPI_File fh;
MPI_Status stat;
int buf[20], err;
...
/* open file storing the handle in fh */
...
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
/* read 20 integers at the current file offset in every process */
err = MPI_File_read_all(fh, buf, 20, MPI_INT, &stat);
...



MPI-IO: Write to File Collectively with Individual File Pointers

Function: MPI_File_write_all()

int MPI_File_write_all(MPI_File fh, void *buf, int count, MPI_Datatype type,
                       MPI_Status *status);

Description: All processes in the communicator group associated with the file handle fh write their respective count elements of type type from buffers buf to the file, at the offsets determined by the current values of the individual file pointers cached on their file handles. A successful call returns the amount of data transferred in status.

#include <mpi.h>
...
MPI_File fh;
MPI_Status stat;
double t;
int err, rank;
...
/* open file storing the handle in fh; compute t */
...
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* interleave time values t from each process at the beginning of file */
MPI_File_set_view(fh, (MPI_Offset)(rank * sizeof(t)), MPI_DOUBLE, MPI_DOUBLE,
                  "native", MPI_INFO_NULL);
err = MPI_File_write_all(fh, &t, 1, MPI_DOUBLE, &stat);
...

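The write_all example above issues one 8-byte write per process. Collective I/O lets the library merge such small, interleaved requests into large contiguous ones before they reach the file system; this is the idea behind the collective-buffering hints listed under MPI-IO Features (II), which ROMIO implements as two-phase I/O. A simplified sketch of only the merging step, assuming requests are described as (offset, length) byte ranges already gathered from all processes; the data exchange and the choice of aggregator nodes are not modeled, and the names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* One process's access, as a byte range in the file. */
typedef struct { long long off; long long len; } io_req;

static int by_offset(const void *a, const void *b)
{
    const io_req *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/* Sort requests by offset and merge touching or overlapping ranges in place.
   Returns the number of contiguous ranges left, i.e. the number of file
   system requests an aggregator would issue instead of n. */
size_t coalesce(io_req *reqs, size_t n)
{
    if (n == 0)
        return 0;
    assert(reqs != NULL);
    qsort(reqs, n, sizeof *reqs, by_offset);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        long long end = reqs[out].off + reqs[out].len;
        if (reqs[i].off <= end) {                 /* touches or overlaps: extend */
            long long iend = reqs[i].off + reqs[i].len;
            if (iend > end)
                reqs[out].len = iend - reqs[out].off;
        } else {
            reqs[++out] = reqs[i];                /* gap: start a new range */
        }
    }
    return out + 1;
}
```

Four processes that each write one double at byte offset rank × 8 collapse into a single 32-byte write, regardless of the order in which their requests arrive.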


MPI-IO: File Seek

Function: MPI_File_seek()

int MPI_File_seek(MPI_File fh, MPI_Offset offs, int whence);

Description: Updates the value of the individual file pointer according to whence, which has the following possible values:

• MPI_SEEK_SET: the pointer is set to offs
• MPI_SEEK_CUR: the pointer is set to the current value plus offs
• MPI_SEEK_END: the pointer is set to the end of file plus offs

#include <mpi.h>
...
MPI_File fh;
MPI_Status stat;
double t;
int rank;
...
/* open file storing the handle in fh; compute t */
...
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* interleave time values t from each process at the beginning of file */
MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE, "native", MPI_INFO_NULL);
/* move this process's individual file pointer to etype (double) number rank */
MPI_File_seek(fh, rank, MPI_SEEK_SET);
MPI_File_write_all(fh, &t, 1, MPI_DOUBLE, &stat);
...


MPI-IO Data Access Classification

Source: http://www.mpi-forum.org/docs/mpi2-report.pdf
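The classification combines the three axes listed under MPI-IO Features (positioning, synchronism, coordination), and the routine names follow mechanically from the suffix rules given there. A minimal sketch that builds the MPI-2 read-routine name for each combination; the enum and function names are hypothetical, and only the generated strings are real MPI routine names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { POS_EXPLICIT, POS_INDIVIDUAL, POS_SHARED } positioning;
typedef enum { SYNC_BLOCKING, SYNC_NONBLOCKING } synchronism;
typedef enum { COORD_NONCOLLECTIVE, COORD_COLLECTIVE } coordination;

/* Returns the MPI-2 read routine for one combination of the three axes.
   In MPI-2, non-blocking collective access means split-collective calls,
   so only the "_begin" half of the pair is named here.
   The result lives in a static buffer that each call overwrites. */
const char *read_routine_name(positioning p, synchronism s, coordination c)
{
    static char name[40];
    int collective = (c == COORD_COLLECTIVE);
    int nonblocking = (s == SYNC_NONBLOCKING);
    const char *pos;

    switch (p) {
    case POS_EXPLICIT:   pos = collective ? "_at_all" : "_at";      break;
    case POS_INDIVIDUAL: pos = collective ? "_all" : "";            break;
    default:             pos = collective ? "_ordered" : "_shared"; break; /* shared pointer */
    }
    snprintf(name, sizeof name, "MPI_File_%sread%s%s",
             (nonblocking && !collective) ? "i" : "",      /* i-prefix: plain non-blocking */
             pos,
             (nonblocking && collective) ? "_begin" : "");  /* split collective */
    assert(strlen(name) < sizeof name - 1);                 /* no truncation */
    return name;
}
```

Replacing read with write gives the corresponding write routines. MPI-3.1 later added true non-blocking collectives such as MPI_File_iread_all, which this MPI-2 model does not produce.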


Example: Scatter to File

Example created by Jean-Pierre Prost from IBM Corp.
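In this example every process fills a 1024-byte buffer with its own rank digit and writes it through a filetype that exposes one 256-byte block at position blocklen × myrank within every blocklen × commsize bytes. The resulting contents of scatter.out can be predicted without running MPI. A minimal sketch, assuming the example's parameters (buf_size = 1024, blocklen = 256) and that each process writes its whole buffer once starting at view offset 0 (the write call itself is not shown in this excerpt); the function names are hypothetical:

```c
#include <assert.h>

/* Character expected at byte f of scatter.out: the file is a round-robin
   sequence of blocklen-byte blocks, block j written by rank j % commsize,
   and each rank's buffer holds the digit '0' + rank. */
char scatter_file_byte(long long f, int blocklen, int commsize)
{
    assert(f >= 0 && blocklen > 0 && commsize > 0 && commsize <= 10);
    return (char)('0' + (f / blocklen) % commsize);
}

/* Expected file size: every rank contributes buf_size bytes. */
long long scatter_file_size(int buf_size, int commsize)
{
    return (long long)buf_size * commsize;
}
```

With 4 processes the file is 4096 bytes: 256 '0's, 256 '1's, 256 '2's, 256 '3's, and then the same pattern three more times, since each rank's 1024 bytes span four repetitions of its filetype.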


Scatter Example Source


#include <stdlib.h>   /* malloc */
#include <string.h>   /* memset */
#include "mpi.h"

static int buf_size = 1024;
static int blocklen = 256;
static char filename[] = "scatter.out";

int main(int argc, char **argv)
{
    char *buf, *p;
    int myrank, commsize;
    MPI_Datatype filetype, buftype;
    int length[3];
    MPI_Aint disp[3];
    MPI_Datatype type[3];
    MPI_File fh;
    int mode, nbytes;
    MPI_Offset offset;
    MPI_Status status;

    /* initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &commsize);

#include "mpi.h"

static int buf_size = 1024;static int blocklen = 256;static char filename[] = "scatter.out";

main(int argc, char **argv){ char *buf, *p; int myrank, commsize; MPI_Datatype filetype, buftype; int length[3]; MPI_Aint disp[3]; MPI_Datatype type[3]; MPI_File fh; int mode, nbytes; MPI_Offset offset; MPI_Status status;

/* initialize MPI */ MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &myrank); MPI_Comm_size(MPI_COMM_WORLD, &commsize);

/* initialize buffer */ buf = (char *) malloc(buf_size); memset(( void *)buf, '0' + myrank, buf_size);

/* create and commit buftype */ MPI_Type_contiguous(buf_size, MPI_CHAR, &buftype); MPI_Type_commit(&buftype);

/* create and commit filetype */ length[0] = 1; length[1] = blocklen; length[2] = 1; disp[0] = 0; disp[1] = blocklen * myrank; disp[2] = blocklen * commsize; type[0] = MPI_LB; type[1] = MPI_CHAR; type[2] = MPI_UB;

MPI_Type_struct(3, length, disp, type, &filetype); MPI_Type_commit(&filetype);

/* open file */ mode = MPI_MODE_CREATE | MPI_MODE_WRONLY;

/* initialize buffer */ buf = (char *) malloc(buf_size); memset(( void *)buf, '0' + myrank, buf_size);

/* create and commit buftype */ MPI_Type_contiguous(buf_size, MPI_CHAR, &buftype); MPI_Type_commit(&buftype);

/* create and commit filetype */ length[0] = 1; length[1] = blocklen; length[2] = 1; disp[0] = 0; disp[1] = blocklen * myrank; disp[2] = blocklen * commsize; type[0] = MPI_LB; type[1] = MPI_CHAR; type[2] = MPI_UB;

MPI_Type_struct(3, length, disp, type, &filetype); MPI_Type_commit(&filetype);

/* open file */ mode = MPI_MODE_CREATE | MPI_MODE_WRONLY;

Page 44

Scatter Example Source (cont.)

    MPI_File_open(MPI_COMM_WORLD, filename, mode, MPI_INFO_NULL, &fh);

    /* set file view */
    offset = 0;
    MPI_File_set_view(fh, offset, MPI_CHAR, filetype, "native", MPI_INFO_NULL);

    /* write buffer to file */
    MPI_File_write_at_all(fh, offset, (void *)buf, 1, buftype, &status);

    /* print out number of bytes written */
    MPI_Get_elements(&status, MPI_CHAR, &nbytes);
    printf("TASK %d ====== number of bytes written = %d ======\n", myrank, nbytes);

    /* close file */
    MPI_File_close(&fh);

    /* free datatypes */
    MPI_Type_free(&buftype);
    MPI_Type_free(&filetype);

    /* free buffer */
    free(buf);

    /* finalize MPI */
    MPI_Finalize();
}
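The layout that the scatter example produces can be checked without running MPI. A minimal sketch, assuming the example's parameters (blocklen = 256, buf_size = 1024): each process's filetype holds one 256-byte block and is tiled with an extent of blocklen × commsize, so block k of the file comes from rank k mod commsize. The helper names are hypothetical.

```c
#include <assert.h>

/* Rank that wrote byte 'pos' of scatter.out: file block k (of blocklen
   bytes) belongs to rank k % commsize. */
int owner_of_byte(long pos, int blocklen, int commsize)
{
    assert(pos >= 0 && blocklen > 0 && commsize > 0);
    return (int)((pos / blocklen) % commsize);
}

/* Expected content of byte 'pos': the owning rank filled its buffer
   with the character '0' + rank. */
char expected_char(long pos, int blocklen, int commsize)
{
    return (char)('0' + owner_of_byte(pos, blocklen, commsize));
}

/* Every rank writes buf_size bytes, so the file holds buf_size * commsize. */
long scatter_file_size(int buf_size, int commsize)
{
    return (long)buf_size * commsize;
}
```

With four processes the file is 4096 bytes long and reads as 256-byte runs of '0', '1', '2', '3', repeated four times.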

Page 45

Data Access Optimizations

Figures: Data Sieving; 2-phase I/O (Collective Read Implementation in ROMIO)

Source: http://www-unix.mcs.anl.gov/~thakur/papers/romio-coll.pdf
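Data sieving replaces many small noncontiguous reads with one large contiguous read followed by in-memory extraction. A minimal sketch of the read side, assuming the file is an in-memory byte array and that the caller supplies a temporary buffer large enough for the covering extent; sieve_read and piece_t are hypothetical names. ROMIO's real implementation additionally bounds the size of the temporary buffer and handles writes with read-modify-write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One piece of a noncontiguous request: file offset and length in bytes. */
typedef struct { size_t off; size_t len; } piece_t;

/* Copy the single extent [lo, hi) that covers every piece into tmp (one
   simulated file-system read), then extract the pieces from tmp into out,
   back to back. tmp must hold at least hi - lo bytes. Returns the number
   of useful bytes delivered; *ncalls is incremented once per file read. */
size_t sieve_read(const unsigned char *file, const piece_t *p, size_t n,
                  unsigned char *out, unsigned char *tmp, int *ncalls)
{
    size_t lo, hi, pos = 0;
    if (n == 0)
        return 0;
    lo = p[0].off;
    hi = p[0].off + p[0].len;
    for (size_t i = 1; i < n; i++) {
        if (p[i].off < lo) lo = p[i].off;
        if (p[i].off + p[i].len > hi) hi = p[i].off + p[i].len;
    }
    assert(hi >= lo);
    memcpy(tmp, file + lo, hi - lo);   /* the one large contiguous read */
    (*ncalls)++;
    for (size_t i = 0; i < n; i++) {   /* extract only the wanted bytes */
        memcpy(out + pos, tmp + (p[i].off - lo), p[i].len);
        pos += p[i].len;
    }
    return pos;
}
```

For pieces at offsets 4, 10 and 20 (lengths 2, 3 and 1), three reads shrink to one 17-byte read of which 6 bytes are useful; trading extra bytes for fewer calls pays off when requests are small and close together.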

Page 46

ROMIO Scaling Examples

• Bandwidths obtained for 512³ arrays (astrophysics benchmark) on Argonne's IBM SP

Write Operations

Processors   Independent I/O   Collective I/O
16           1.26 MB/s         64.8 MB/s
32           1.25 MB/s         69.5 MB/s
48           1.36 MB/s         70.6 MB/s

Read Operations

Processors   Independent I/O   Collective I/O
16           12.8 MB/s         68.5 MB/s
32           6.46 MB/s         82.6 MB/s
48           5.83 MB/s         88.4 MB/s

Source: http://www-unix.mcs.anl.gov/~thakur/sio-demo/astro.html
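Dividing the collective bandwidth by the independent one gives the speedup from collective I/O at each processor count (16, 32, 48), taking the first table as writes and the second as reads, as the slide's labels suggest; values are rounded:

```latex
\begin{aligned}
\text{Write: } & \tfrac{64.8}{1.26} \approx 51.4,\quad \tfrac{69.5}{1.25} = 55.6,\quad \tfrac{70.6}{1.36} \approx 51.9\\
\text{Read: }  & \tfrac{68.5}{12.8} \approx 5.35,\quad \tfrac{82.6}{6.46} \approx 12.8,\quad \tfrac{88.4}{5.83} \approx 15.2
\end{aligned}
```

The read speedup grows with the processor count because independent read bandwidth falls while collective read bandwidth rises.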

Page 47

Independent vs. Collective Access

Figures: Collective I/O on IBM SP; Individual I/O on IBM SP

Source: http://www-unix.mcs.anl.gov/~thakur/sio-demo/upshot.html

Page 48

Topics

• Introduction

• RAID

• Distributed File Systems (NFS)

• Parallel File Systems (PVFS2)

• Parallel I/O Libraries (MPI-IO)

• Parallel File Formats (HDF5)

• Additional Parallel File Systems (GPFS)

• Summary – Materials for Test

Page 49

Introduction to HDF5

• Acronym for Hierarchical Data Format, a portable, freely distributable, and well supported library, file format, and set of utilities to manipulate it

• Explicitly designed for use with scientific data and applications

• Initial HDF version was created at NCSA/University of Illinois at Urbana-Champaign in 1988

• First revision in widespread use was HDF4

• Main HDF features include:

– Versatility: supports different data models and associated metadata

– Self-describing: allows an application to interpret the structure and contents of a file without any extraneous information

– Flexibility: permits mixing and grouping various objects together in one file in a user-defined hierarchy

– Extensibility: accommodates new data models, added both by the users and developers

– Portability: can be shared across different platforms without preprocessing or modifications

• HDF5 is the most recent incarnation of the format: it adds support for new types and data models, parallel I/O, and streaming; removes a number of earlier restrictions (maximum file size, number of objects per file, flexibility of type use, storage management configurability, etc.); and improves performance

Page 50

HDF5 File Layout

• Major object classes: groups and datasets

• Namespace resembles a file system directory hierarchy (groups ≡ directories, datasets ≡ files)

• Alias creation supported through links (both soft and hard)

• Mounting of sub-hierarchies is possible

Figures: User's view; Low-level organization
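The directory analogy can be made concrete with a toy model of the namespace: groups hold named links, and a hard link is simply a second name that leads to the same object. Everything here (struct object, link_obj, resolve) is a hypothetical illustration; it does not reflect how HDF5 stores groups on disk, and soft links and mounting are left out.

```c
#include <assert.h>
#include <string.h>

enum kind { GROUP, DATASET };
#define MAX_LINKS 8
#define MAX_OBJS 16

struct object {
    enum kind kind;
    int nlinks;
    const char *link_name[MAX_LINKS];
    int link_target[MAX_LINKS];
};

/* objs[0] is the root group "/"; zero-initialized entries are empty groups. */
static struct object objs[MAX_OBJS];

/* Add a named link from group 'parent' to object 'target'. */
void link_obj(int parent, const char *name, int target)
{
    struct object *g = &objs[parent];
    assert(g->kind == GROUP && g->nlinks < MAX_LINKS);
    g->link_name[g->nlinks] = name;
    g->link_target[g->nlinks] = target;
    g->nlinks++;
}

/* Resolve an absolute path such as "/run1/temp" to an object index,
   or return -1 if some component is missing. */
int resolve(const char *path)
{
    int cur = 0;
    assert(path[0] == '/');
    const char *p = path + 1;
    while (*p) {
        size_t len = strcspn(p, "/");   /* length of the next component */
        int next = -1;
        for (int i = 0; i < objs[cur].nlinks; i++)
            if (strlen(objs[cur].link_name[i]) == len &&
                strncmp(objs[cur].link_name[i], p, len) == 0)
                next = objs[cur].link_target[i];
        if (next < 0)
            return -1;
        cur = next;
        p += len;
        if (*p == '/')
            p++;
    }
    return cur;
}
```

Linking object 2 under both "/run1/temp" and "/latest_temp" makes both paths resolve to the same dataset, which is what a hard link provides.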

Page 51

HDF5 API & Tools

Library functionality grouped by function name prefix

• H5: general purpose functions

• H5A: attribute interface

• H5D: dataset manipulation

• H5E: error handling

• H5F: file interface

• H5G: group creation and access

• H5I: object identifiers

• H5P: property lists

• H5R: references

• H5S: dataspace definition

• H5T: datatype manipulation

• H5Z: inline data filters and compression

Command-line utilities

• h5cc, h5c++, h5fc: C, C++ and Fortran compiler wrappers

• h5redeploy: updates compiler tools after installation in a new location

• h5ls, h5dump: list the hierarchy and contents of an HDF5 file

• h5diff: compares two HDF5 files

• h5repack, h5repart: rearrange or repartition a file

• h5toh4, h4toh5: convert between HDF5 and HDF4 formats

• h5import: imports data into an HDF5 file

• gif2h5, h52gif: convert image data between GIF and HDF5 formats

Page 52

Basic HDF5 Concepts

• Group

– Structure containing zero or more HDF5 objects (possibly other groups)

– Provides a mechanism for mapping a name (path) to an object

– "Root" group is a logical container of all other objects in a file

• Dataset

– A named array of data elements (possibly multi-dimensional)

– Specifies how the dataset is represented when stored in the HDF5 file, through its associated datatype and dataspace

• Dataspace

– Defines the dimensionality of a dataset (rank and dimension sizes)

– Determines the effective subset of data to be stored or retrieved in subsequent file operations (aka selection)

• Datatype

– Describes an individual (atomically accessed) element of a dataset

– Permits construction of derived (compound) types, such as arrays, records, enumerations

– Influences conversion of numeric values between different platforms or implementations

• Attribute

– A small, user-defined structure attached to a group, dataset or named datatype, providing additional information

Page 53

HDF5 Spatial Subset Examples

Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf
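One of the spatial subsets HDF5 supports is the regular hyperslab, which the C API describes with per-dimension start, stride, count and block arrays (as passed to H5Sselect_hyperslab). The sketch below only reproduces that index arithmetic on a 2-D mask to show which elements a selection picks; hyperslab_mask is a hypothetical helper, not part of HDF5, and it rejects overlapping blocks (stride smaller than block).

```c
#include <assert.h>
#include <string.h>

/* Along dimension d the selected indices are
   start[d] + i*stride[d] + j  for 0 <= i < count[d], 0 <= j < block[d].
   Marks them in a rows x cols row-major mask; returns how many were selected. */
int hyperslab_mask(int rows, int cols, const int start[2], const int stride[2],
                   const int count[2], const int block[2], unsigned char *mask)
{
    int n = 0;
    for (int d = 0; d < 2; d++)  /* blocks must not overlap */
        assert(count[d] <= 1 || stride[d] >= block[d]);
    memset(mask, 0, (size_t)rows * (size_t)cols);
    for (int i0 = 0; i0 < count[0]; i0++)
        for (int b0 = 0; b0 < block[0]; b0++) {
            int r = start[0] + i0 * stride[0] + b0;
            for (int i1 = 0; i1 < count[1]; i1++)
                for (int b1 = 0; b1 < block[1]; b1++) {
                    int c = start[1] + i1 * stride[1] + b1;
                    assert(r < rows && c < cols);  /* selection must fit */
                    mask[r * cols + c] = 1;
                    n++;
                }
        }
    return n;
}
```

With start {1, 0}, stride {3, 2}, count {2, 4} and block {1, 1} on a 6 × 8 dataspace, the selection picks rows 1 and 4 at columns 0, 2, 4 and 6: eight elements.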

Page 54

HDF5 Virtual File Layer

Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf

• Developed to cope with the large number of available storage subsystem variants

• Permits custom file driver implementations and related optimizations
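The idea behind the Virtual File Layer is that the library talks to storage only through a table of driver callbacks, so supporting a new storage system means supplying a new table rather than changing the library. A minimal sketch of that pattern; the struct and function names are invented for illustration, and HDF5's real driver interface (H5FD_class_t) has many more members. The toy driver keeps the file in memory, loosely like HDF5's own "core" driver.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical driver table: the "library" sees only these callbacks. */
typedef struct {
    const char *name;
    int (*write)(void *state, long addr, const void *buf, long size);
    int (*read)(void *state, long addr, void *buf, long size);
} file_driver;

/* Toy in-memory driver backed by a fixed-size byte array. */
typedef struct { unsigned char bytes[256]; } mem_state;

static int mem_write(void *s, long addr, const void *buf, long size)
{
    mem_state *m = s;
    if (addr < 0 || size < 0 || addr + size > (long)sizeof m->bytes) return -1;
    memcpy(m->bytes + addr, buf, (size_t)size);
    return 0;
}

static int mem_read(void *s, long addr, void *buf, long size)
{
    mem_state *m = s;
    if (addr < 0 || size < 0 || addr + size > (long)sizeof m->bytes) return -1;
    memcpy(buf, m->bytes + addr, (size_t)size);
    return 0;
}

const file_driver mem_driver = { "memory", mem_write, mem_read };

/* Library-side code: identical no matter which driver is plugged in. */
int store_then_load(const file_driver *d, void *state, long addr,
                    const int *in, int *out, int n)
{
    assert(d && d->write && d->read);
    if (d->write(state, addr, in, (long)(n * sizeof *in)) != 0) return -1;
    return d->read(state, addr, out, (long)(n * sizeof *out));
}
```

A driver for a parallel file system would fill the same table with functions that issue, for example, MPI-IO calls; store_then_load would not change.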

Page 55

Overview of Data Storage Options

Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf
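Among the storage options, chunked layout splits a dataset into fixed-size chunks that are stored and accessed as units (and compressed individually when a filter is used). A minimal sketch of the bookkeeping, assuming a 2-D dataset; num_chunks and chunk_of are hypothetical helpers, and the row-major chunk numbering is for illustration only (HDF5 locates chunks through an index keyed by chunk coordinates).

```c
#include <assert.h>

/* Number of chunks covering a dims[0] x dims[1] dataset; edge chunks may
   be only partly filled, hence the ceiling division. */
long num_chunks(const long dims[2], const long chunk[2])
{
    assert(chunk[0] > 0 && chunk[1] > 0);
    long n0 = (dims[0] + chunk[0] - 1) / chunk[0];
    long n1 = (dims[1] + chunk[1] - 1) / chunk[1];
    return n0 * n1;
}

/* Row-major number of the chunk holding element (i, j). */
long chunk_of(const long dims[2], const long chunk[2], long i, long j)
{
    assert(i >= 0 && i < dims[0] && j >= 0 && j < dims[1]);
    long n1 = (dims[1] + chunk[1] - 1) / chunk[1];
    return (i / chunk[0]) * n1 + (j / chunk[1]);
}
```

A 100 × 50 dataset with 32 × 32 chunks needs 4 × 2 = 8 chunks, and reading a single element touches only the one chunk that holds it.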

Page 56

Simultaneous Spatial and Type Transformation Example

Source: http://hdf.ncsa.uiuc.edu/HDF5/RD100-2002/All_About_HDF5.pdf
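In this example a selection and a datatype conversion are applied in one transfer. A minimal sketch of the same idea in plain C, assuming the file holds big-endian 32-bit integers and the memory type is the native int32_t: every stride-th element is gathered and converted. gather_be32 is a hypothetical helper; HDF5 performs comparable work internally when the file and memory selections or datatypes differ.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one big-endian 32-bit integer (two's complement assumed for
   values of 2^31 and above). */
static int32_t be32_to_native(const unsigned char *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return (int32_t)u;
}

/* Gather 'count' elements, starting at element 'start' and stepping by
   'stride', from a buffer of nsrc big-endian int32 values into dst. */
int gather_be32(const unsigned char *src, int nsrc, int start, int stride,
                int count, int32_t *dst)
{
    assert(stride > 0 && start >= 0);
    assert(count == 0 || start + (count - 1) * stride < nsrc);
    for (int k = 0; k < count; k++)
        dst[k] = be32_to_native(src + 4 * (start + k * stride));
    return count;
}
```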

Page 57

Simple HDF5 Code Example

/* Writing and reading an existing dataset. The file dset.h5 must already
   contain a 4 x 6 integer dataset named /dset. */
#include "hdf5.h"
#define FILE_NAME "dset.h5"

int main(void)
{
    hid_t file_id, dataset_id; /* identifiers */
    herr_t status;
    int i, j, dset_data[4][6];

    /* Initialize the dataset. */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;

    /* Open an existing file. */
    file_id = H5Fopen(FILE_NAME, H5F_ACC_RDWR, H5P_DEFAULT);

    /* Open an existing dataset (the 3-argument form is the HDF5 1.8+ API). */
    dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);

    /* Write the dataset. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

    /* Read it back into the same buffer. */
    status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

    /* Close the dataset. */
    status = H5Dclose(dataset_id);
    /* Close the file. */
    status = H5Fclose(file_id);
    return 0;
}

Page 58

Parallel HDF5

• Relies on MPI-IO as the file layer driver

• Uses MPI for internal communications

• Most of the functionality is controlled through property lists (requires minimal HDF5 interface changes)

• Supports both individual and collective file access

• Three raw data storage layouts: contiguous, chunked and compact

• Enables additional optimizations through derived MPI datatypes (esp. for regular collective accesses)

• Limitations

– Chunked storage with overlapping chunks (results non-deterministic)

– Compression supported for reads only

– Writes with variable-length datatypes not supported
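Regular collective accesses typically come from a simple domain decomposition. As a minimal sketch, assuming a dataset split into bands of rows, each rank can compute its own first-dimension start and count; these are the values it would place in the start and count arrays of its hyperslab selection before a collective write. row_block is a hypothetical helper.

```c
#include <assert.h>

/* Split nrows rows among nprocs ranks in contiguous bands; the first
   (nrows % nprocs) ranks get one extra row. */
void row_block(long nrows, int nprocs, int rank, long *start, long *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = nrows / nprocs, extra = nrows % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *start = rank * base + (rank < extra ? rank : extra);
}
```

For 10 rows on 4 ranks the bands are rows 0-2, 3-5, 6-7 and 8-9, so the selections tile the dataset without gaps or overlap.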

Page 59

Topics

• Introduction

• RAID

• Distributed File Systems (NFS)

• Parallel File Systems (PVFS2)

• Parallel I/O Libraries (MPI-IO, ROMIO)

• Parallel File Formats (HDF5)

• Additional Parallel File Systems (GPFS)

• Summary – Materials for Test

Page 60

General Parallel File System (GPFS)

• Brief history:

– Based on the Tiger Shark parallel file system developed at the IBM Almaden Research Center in 1993 for AIX

• Originally targeted at dedicated video servers

• The multimedia orientation influenced GPFS command names: they all begin with "mm" (e.g., mmcrfs, mmmount)

– First commercial release was GPFS V1.1 in 1998

– Linux port released in 2001; Linux-AIX interoperability supported since V2.2 in 2004

• Highly scalable

– Distributed metadata management

– Permits incremental scaling

• High performance

– Large block size with wide striping

– Parallel access to files from multiple nodes

– Deep prefetching

– Adaptable mechanism for recognizing access patterns

– Multithreaded daemon

• Highly available and fault tolerant

– Data protection through journaling, replication, mirroring and shadowing

– Ability to recover from multiple disk, node and connectivity failures (heartbeat mechanism)

– Recovery mechanism implemented in all layers
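Wide striping means that consecutive file blocks live on different disks, so a large sequential access engages all of them at once. A simplified sketch, assuming plain round-robin placement starting at disk 'first'; GPFS's real allocation policy is more involved, and stripe_loc and locate are hypothetical names.

```c
#include <assert.h>

typedef struct { long block; int disk; long offset_in_block; } stripe_loc;

/* Map a byte offset in a file to its block, the disk holding that block,
   and the offset inside the block, under round-robin placement. */
stripe_loc locate(long file_offset, long block_size, int ndisks, int first)
{
    assert(file_offset >= 0 && block_size > 0 && ndisks > 0);
    assert(first >= 0 && first < ndisks);
    stripe_loc loc;
    loc.block = file_offset / block_size;
    loc.disk = (int)((first + loc.block) % ndisks);
    loc.offset_in_block = file_offset % block_size;
    return loc;
}
```

With 256 KiB blocks on 4 disks, a 1 MiB sequential read covers four consecutive blocks and therefore touches every disk exactly once.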

Page 61

GPFS Features (I)

Source: http://www-03.ibm.com/systems/clusters/software/gpfs.pdf

Page 62

GPFS Features (II)

Page 63

GPFS Architecture

Source: http://www.redbooks.ibm.com/redbooks/pdfs/sg245610.pdf

Page 64

Components Internal to GPFS Daemon

• Configuration Manager (CfgMgr)

– Selects the node acting as Stripe Group Manager for each file system

– Checks for the quorum of nodes required for file system usage to continue

– Appoints a successor node in case of failure

– Initiates and controls the recovery procedure

• Stripe Group Manager (FSMgr, aka File System Manager)

– Exactly one per GPFS file system

– Maintains availability information for the disks comprising the file system (physical storage)

– Processes modifications (disk removals and additions)

– Repairs the file system and coordinates data migration when required

• Metanode

– Manages metadata (directory block updates)

– Its location may change (e.g. a node obtaining access to the file may become the metanode)

• Token Manager Server

– Synchronizes concurrent access to files and ensures consistency among caches

– Manages tokens, or per-object locks

• Mediates token migration when another node requests a token conflicting with the existing token (token stealing)

– Always located on the same node as the Stripe Group Manager

Page 65

GPFS Management Functions & Their Dependencies

Source: http://www.redbooks.ibm.com/redbooks/pdfs/sg246700.pdf

Page 66

Components External to GPFS Daemon

• Virtual Shared Disk (VSD, aka logical volume)

– Enables nodes in one SP system partition to share disks with the other nodes in the same system partition

– A VSD node can be a client, a server (owning a number of VSDs and performing the data reads and writes requested by client nodes), or both at the same time

• Recoverable Virtual Shared Disk (RVSD)

– Used together with VSD to provide high availability against node failures reported by Group Services

– Runs recovery scripts and notifies client applications

• Switch (interconnect) Subsystem

– Starts the switch daemon, responsible for initializing and monitoring the switch

– Discovers and reacts to topology changes; reports and services status/error packets

• Group Services

– Fault-tolerant, highly available and partition-sensitive service monitoring and coordinating changes related to another subsystem operating in the partition

– Operates on each node within the partition, plus the control workstation for the partition

• System Data Repository (SDR)

– Location where the configuration data are stored

Page 67

Read Operation Flow in GPFS

Page 68

Write Operation Flow in GPFS

Page 69

Token Management in GPFS

• The first lock request for an object requires a message from node N1 to the token manager

• The token server grants the token to N1 (subsequent lock requests can be granted locally)

• Node N2 requests a token for the same file (lock conflict)

• The token server detects the conflicting lock request and revokes the token from N1

• If N1 was writing to the file, the data is flushed to disk before the revocation is complete

• Node N2 gets the token from N1
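The token exchange above can be traced with a small state model. It is a hypothetical sketch, not GPFS code: token_state and acquire are invented names, a single object is tracked, and every request from a different node is treated as a conflict (real GPFS tokens distinguish read from write access and can cover byte ranges).

```c
#include <assert.h>

typedef struct {
    int holder;    /* node id holding the token, or -1 if none */
    int dirty;     /* holder has buffered writes not yet on disk */
    int messages;  /* requests that had to go to the token server */
    int flushes;   /* flushes forced by revocation */
} token_state;

/* Node 'node' asks for the token (write intent if 'writing').
   Returns 1 if the token server was contacted, 0 if granted locally. */
int acquire(token_state *t, int node, int writing)
{
    assert(node >= 0);
    if (t->holder == node) {          /* already held: local grant */
        t->dirty |= writing;
        return 0;
    }
    t->messages++;                    /* request goes to the token server */
    if (t->holder != -1) {            /* conflict: revoke from the holder */
        if (t->dirty)
            t->flushes++;             /* holder flushes before giving it up */
        t->dirty = 0;
    }
    t->holder = node;
    t->dirty = writing;
    return 1;
}
```

Replaying the slide: N1's first write request costs one message, its next request is granted locally, and N2's request forces one flush on N1 before the token moves.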

Page 70

GPFS Write-behind and Prefetch

• As soon as the application's write buffer is copied into the local pagepool, the write operation is complete from the client's perspective

• The GPFS daemon schedules a worker thread to finalize the request by issuing I/O calls to the device driver

• GPFS estimates the number of blocks to read ahead based on disk performance and the rate at which the application is reading the data

• Additional prefetch requests are processed asynchronously with respect to the completion of the current read
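The read-ahead estimate can be illustrated with a simple latency-hiding argument. This is a hypothetical heuristic, not GPFS's actual algorithm: if the application consumes app_rate bytes/s and each disk access takes latency_s seconds, about app_rate × latency_s bytes must already be in flight, i.e. that many bytes rounded up to whole blocks, plus the block currently being consumed. prefetch_depth and max_depth are invented names.

```c
#include <assert.h>

/* Blocks to keep in flight so that reads at app_rate bytes/s never wait
   on an access of latency_s seconds, capped at max_depth. */
long prefetch_depth(double app_rate, double latency_s, long block_size,
                    long max_depth)
{
    assert(app_rate >= 0 && latency_s >= 0 && block_size > 0 && max_depth >= 1);
    double bytes_in_flight = app_rate * latency_s;
    long blocks = (long)(bytes_in_flight / block_size);
    if ((double)blocks * block_size < bytes_in_flight)
        blocks++;                      /* round up to whole blocks */
    long depth = blocks + 1;           /* plus the block being consumed */
    return depth > max_depth ? max_depth : depth;
}
```

At 100 MB/s with 5 ms accesses and 256 KiB blocks, about 500 KB must be in flight, which rounds up to 2 blocks, for a depth of 3.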

Page 71

Some GPFS Cluster Models

Figures: Network Shared Disk (NSD) with dedicated server model; Direct attached model; Mixed (NSD and direct attached) model; Joined (AIX and Linux) model

Page 72

Comparison of GPFS to Other File Systems

Page 73

Topics

• Introduction

• RAID

• Distributed File Systems (NFS)

• Parallel File Systems (PVFS2)

• Parallel I/O Libraries (MPI-IO, ROMIO)

• Parallel File Formats (HDF5)

• Additional Parallel File Systems (GPFS)

• Summary – Materials for Test

Page 74

Summary – Material for the Test

• Need for Parallel I/O (slide 6)

• RAID concepts (slides 8-10)

• Distributed File System Concepts: NFS (slides 12, 13)

• Why NFS is bad for parallel I/O (slide 14)

• Parallel File System Concepts (slides 16-19)

• PVFS (slides 20-24)

• MPI-IO concepts & features (slides 29-32)

• MPI-IO API & functionalities (slides 33-41)
