center for high performance computing introduction to i/o in the hpc environment brian haymore,...

21
CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, [email protected] Sam Liston, [email protected] Center for High Performance Computing Spring 2013

Upload: james-singleton

Post on 13-Dec-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Introduction to I/O in the HPC Environment

Brian Haymore, [email protected]

Sam Liston, [email protected]

Center for High Performance Computing

Spring 2013

Page 2: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Overview

• CHPCFS

Topology

04/18/23 http://www.chpc.utah.edu Slide 2

Page 3: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Overview• Couplet Topology

– Redundency– Performance– Used on:

• Redbutte• Drycreek• Greenriver• Saltflats• IBrix Scratch

– Virt/Horiz Scalable

04/18/23 http://www.chpc.utah.edu Slide 3

Network Clients

10GigE/40GigE10GigE/40GigE

Page 4: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Overview• SAN storage redundancy – RAID• RAID = Redundant Array of Inexpensive Disks

04/18/23 http://www.chpc.utah.edu Slide 4

Page 5: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Overview• Types of storage available at CHPC

– Home Directory (i.e. /uufs/chpc.utah.edu/common/home/uNID)• Per department backed up (except CHPC_HPC file system) • Intended for critical/volatile data• Expected to maintain a high level of responsiveness

– Group Data Space (i.e. /uufs/chpc.utah.edu/common/home/pi_grp)• Optional per department archive• Intended for active projects, persistent data, etc.• Usage expectations to be set by group

– Network Mounted Scratch (i.e. /scratch/serial)• No expectation of data retention (It’s scratch)• Expected to maintain a high level of I/O performance under significant load

– Local Disk (i.e. /tmp)• Most consistent I/O• No expectation of data retention• Unique per machine

04/18/23 http://www.chpc.utah.edu Slide 5

Page 6: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Overview• Good Citizens

– Shared Environment Characteristics• Many to one relationship (over-subscribed)• Globally accessible• Global resource are still finite• Consider your usage impact when choosing a storage location• Be aware of any usage policies• Evaluate different I/O methodologies• Seek additional assistance from CHPC

04/18/23 http://www.chpc.utah.edu Slide 6

Page 7: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Best Practices

04/18/23 http://www.chpc.utah.edu Slide 7

• Data Segregation; Where should files be stored? – Classify your Data

• How important is it?• Can it be recreated?• Is this dataset currently in use?• Will this data be used by others?• Does this data need to be backed up?

– Put your data in the appropriate space• Home Directory• Group Space• Scratch• /tmp

Page 8: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Considerations• Backup Impact

– Performance Characteristics• Time (when, duration)

• Competition/concurrent access

• Capacity of files backed up

• Quantity of files backed up

• Unintended consequences

04/18/23 http://www.chpc.utah.edu Slide 8

Page 9: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Considerations• Network Performance

04/18/23 http://www.chpc.utah.edu Slide 9

Page 10: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Considerations• Data Migration

– What are we moving; What does it look like?– From where to where are we moving the data?– What transfer performance expectation do we have?– What tool will we use to make the transfer?

• SSH/SFTP– Simple– Very portable

• rsync– Restart able– File verification

• tar via SSH– More efficient with many small files

• Compression? • Secure

04/18/23 http://www.chpc.utah.edu Slide 10

Page 11: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

File Operations• Directory Structure

– Poor performance when too many files are in the same directory – Organizing files in a tree avoids this issue– Directory block count significance

• Network vs. Local– IOPS vs. Bandwidth– Network I/O

• Overhead• Limited by network pipe• More efficient for bandwidth vs. IOPS

– Local I/O • Limited size• Not globally accessible• Depending on hardware offers a fair balance between bandwidth and IOPS

04/18/23 http://www.chpc.utah.edu Slide 11

Page 12: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

File Operations• Metadata Operations Performance Considerations

– Create, Destroy, Stat, etc.– IOPS oriented performance

• Application I/O Performance Considerations– How often do we open and close files?– What I/O granularity do our applications write files?– Are we doing anything else silly?

04/18/23 http://www.chpc.utah.edu Slide 12

Page 13: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples• Code (user example)

– Code wrote millions of small files (<10kB) to a single dir – Code wrote thousands of larger files (100s of MB)– Code uses a compression algorithm to speed up I/O of the larger files.

• Observations– Changing default r/w chunk of compression I/O from 16kB to 32kB/64kB

improved performance 10%-20%.– Changing code to write files in a hierarchical directory structure produced a

multiple times speed up.

04/18/23 http://www.chpc.utah.edu Slide 13

Page 14: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Single Directory vs. Hierarchical Directory Structure

04/18/23 http://www.chpc.utah.edu Slide 14

4.3GB file

Page 15: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Processing Many Small Files over Network vs. Local

04/18/23 http://www.chpc.utah.edu Slide 15

4.3GB file

Page 16: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Bonnie Test

04/18/23 http://www.chpc.utah.edu Slide 16

4.3GB file

Page 17: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Tar (Linux Kernel)

04/18/23 http://www.chpc.utah.edu Slide 17

4.3GB file

Page 18: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Compile (Linux Kernel)

04/18/23 http://www.chpc.utah.edu Slide 18

4.3GB file

Page 19: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Fine vs. Coarse I/O (Read)

04/18/23 http://www.chpc.utah.edu Slide 19

4.3GB file

Page 20: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Examples

• Fine vs. Coarse I/O (Writes)

04/18/23 http://www.chpc.utah.edu Slide 20

4.3GB file

Page 21: CENTER FOR HIGH PERFORMANCE COMPUTING Introduction to I/O in the HPC Environment Brian Haymore, brian.haymore@utah.edu brian.haymore@utah.edu Sam Liston,

CENTER FOR HIGH PERFORMANCE COMPUTING

Troubleshooting

04/18/23 http://www.chpc.utah.edu Slide 21

• Diagnosing Slowness

– Open a ticket ([email protected])

– File system

– System load

– Network load

• Future Monitoring

– Ganglia

• Additional Information

– http://www.chpc.utah.edu