Hiding Periodic I/O Costs in Parallel Applications



Hiding Periodic I/O Costs in Parallel Applications

Xiaosong Ma

Department of Computer Science

University of Illinois at Urbana-Champaign

Spring 2003


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
• Ongoing work: hiding recurrent input cost
• Conclusions


Introduction

• Fast-growing technology propels high-performance applications
– Scientific computation
– Parallel data mining
– Web data processing
– Games, movie graphics

• Individual components’ growth is uncoordinated
– Manual performance tuning needed


We Need Adaptive Optimization

• Flexible and automatic performance optimization desired

• Efficient high-level buffering and prefetching for parallel I/O in scientific simulations


Scientific Simulations

• Important
– Detail and flexibility
– Save money and lives

• Challenging
– Multi-disciplinary
– High performance crucial


Parallel I/O in Scientific Simulations

• Write-intensive

• Collective and periodic

• “Poor stepchild”

• Bottleneck-prone

• Existing collective I/O focused on data transfer

[Figure: execution timeline alternating computation phases with periodic I/O phases]


My Contributions

• Idea: I/O optimizations in a larger scope
– Parallelism between I/O and other tasks
– Individual simulation’s I/O needs
– I/O-related self-configuration

• Approach: hide the I/O cost

• Results
– Publications, technology transfer, software


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
• Ongoing work: hiding recurrent input cost
• Conclusions


Latency Hierarchy on Parallel Platforms

• Along the path of data transfer:
– Smaller throughput
– Lower parallelism, less scalable

local memory access → inter-processor communication → disk I/O → wide-area transfer


Basic Idea of Active Buffering

• Purpose: maximize overlap between computation and I/O

• Approach: buffer data as early as possible
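The two bullets above can be sketched in a few lines. The following is a minimal, illustrative buffered writer, not the Panda implementation (class and method names are hypothetical): each output block is copied into an in-memory queue, and a background thread does the slow disk write while the caller's computation continues.

```python
import queue
import threading

class ActiveBuffer:
    """Illustrative sketch of the active-buffering idea: write() only
    pays the cost of a memory copy; a background thread drains the
    buffered blocks to disk, overlapping I/O with computation."""

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "wb")
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, data: bytes):
        # Returns immediately: the memory copy is the only visible cost.
        self._queue.put(bytes(data))

    def _drain(self):
        while True:
            block = self._queue.get()
            if block is None:           # sentinel: no more output
                break
            self._file.write(block)     # slow disk write, hidden from caller

    def close(self):
        self._queue.put(None)
        self._drainer.join()
        self._file.close()
```

Because a single drainer thread consumes the queue, blocks reach the file in the order they were written.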


Challenges

• Accommodate multiple I/O architectures

• No assumption on buffer space

• Adaptive
– Buffer availability
– User request patterns


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
– With client-server I/O architecture [IPDPS ’02]
– With server-less architecture
• Ongoing work: hiding recurrent input cost
• Related work and future work
• Conclusions


Client-Server I/O Architecture

[Figure: compute processors send output to I/O servers, which write to the file system]


Client State Machine

[Figure: client state machine — on entering the collective write routine, the client prepares and buffers data while buffer space is available; when it runs out of buffer space it sends blocks to a server until all data are sent, then exits (with no overflow, it exits once all data are buffered)]
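A minimal sketch of the client-side logic in the figure above, with hypothetical names and simplified transitions (a plain `send` callback stands in for real message passing to the I/O server):

```python
from enum import Enum, auto

class ClientState(Enum):
    PREPARE = auto()
    BUFFER_DATA = auto()
    SEND_BLOCK = auto()
    EXIT = auto()

def collective_write(blocks, buffer_capacity, send):
    """Buffer blocks while space remains; on overflow, send the rest
    to the server immediately. Returns the locally buffered blocks."""
    state = ClientState.PREPARE
    buffered, i = [], 0
    while state is not ClientState.EXIT:
        if state is ClientState.PREPARE:
            state = ClientState.BUFFER_DATA
        elif state is ClientState.BUFFER_DATA:
            if i == len(blocks):                  # all data handled, no overflow
                state = ClientState.EXIT
            elif len(buffered) < buffer_capacity: # buffer space available
                buffered.append(blocks[i])
                i += 1
            else:                                 # out of buffer space
                state = ClientState.SEND_BLOCK
        elif state is ClientState.SEND_BLOCK:
            if i < len(blocks):                   # data left to send
                send(blocks[i])
                i += 1
            else:                                 # all data sent
                state = ClientState.EXIT
    return buffered
```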


Server State Machine

[Figure: server state machine — after initialization the server allocates buffers and alternates between idle-listen and busy-listen; on a write request it receives blocks while enough buffer space remains, fetches blocks from clients when it runs out of buffer space, writes buffered blocks when idle, and exits on an exit message once all writes are done]


Maximize Apparent Throughput

• Ideal apparent throughput per server:

T_ideal = D_total / (D_c-buffered / T_mem-copy + D_c-overflow / T_msg-passing + D_s-overflow / T_write)

• More expensive data transfer only becomes visible when overflow happens

• Efficiently masks the difference in write speeds
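The formula can be evaluated directly. A small helper, assuming data sizes in MB and the three transfer speeds in MB/s (the parameter names are ours, not from the Panda code):

```python
def ideal_apparent_throughput(d_total, d_buffered, d_c_overflow, d_s_overflow,
                              t_memcopy, t_msg, t_write):
    """T_ideal: total output divided by the time each portion of the
    data spends on its (increasingly slow) transfer path."""
    time = (d_buffered / t_memcopy      # copied into client buffers
            + d_c_overflow / t_msg      # client overflow -> message passing
            + d_s_overflow / t_write)   # server overflow -> disk write
    return d_total / time
```

With no overflow, the apparent throughput equals the memory-copy speed; each overflow term pulls it toward the slower path's speed.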


Write Throughput without Overflow

[Charts: throughput per server (MB/s) vs. number of clients (2, 4, 8, 16, 32), for binary write and HDF4 write, comparing local buffering, AB, and MPI]

– Panda Parallel I/O library
– SGI Origin 2000, SHMEM
– Per client: 16MB output data per snapshot, 64MB buffer
– Two servers, each with 256MB buffer


Write Throughput with Overflow

[Charts: throughput per server (MB/s) vs. number of clients (2, 4, 8, 16, 32), for binary write and HDF4 write, comparing ideal, AB, and MPI]

– Panda Parallel I/O library
– SGI Origin 2000, SHMEM, MPI
– Per client: 96MB output data per snapshot, 64MB buffer
– Two servers, each with 256MB buffer


Give Feedback to Application

• “Softer” I/O requirements

• Parallel I/O libraries have been passive

• Active buffering allows I/O libraries to take a more active role
– Find the optimal output frequency automatically
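One way such feedback might look, as a hedged sketch only (the policy and every name here are hypothetical, not the mechanism described in the talk): given the compute time available and the rate at which buffered output drains to disk, find the highest snapshot count whose output never overflows the buffer.

```python
def max_snapshot_frequency(compute_time_per_step, steps, snapshot_mb,
                           drain_mb_per_s, buffer_mb):
    """Return the largest number of evenly spaced snapshots per run
    whose output can be drained in the background without overflow."""
    total_compute = compute_time_per_step * steps
    for n in range(steps, 0, -1):       # try high frequencies first
        interval = total_compute / n    # compute time between snapshots
        backlog = 0.0
        ok = True
        for _ in range(n):
            backlog += snapshot_mb      # snapshot lands in the buffer
            if backlog > buffer_mb:     # overflow: this frequency is too high
                ok = False
                break
            # background drain during the next compute interval
            backlog = max(0.0, backlog - drain_mb_per_s * interval)
        if ok:
            return n
    return 0
```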


Server-side Active Buffering

[Figure: the server state machine repeated from the earlier slide, illustrating server-side active buffering]


Performance with Real Applications

• Application overview
– GENX: large-scale, multi-component, detailed rocket simulation
– Developed at the Center for Simulation of Advanced Rockets (CSAR), UIUC
– Multi-disciplinary, complex, and evolving

• Providing parallel I/O support for GENX
– Identification of parallel I/O requirements [PDSECA ’03]
– Motivation and test case for active buffering


Overall Performance of GEN1

– SDSC IBM SP (Blue Horizon)
– 64 clients, 2 I/O servers with AB
– 160MB output data per snapshot (in HDF4)

[Chart: total time (s), split into computation and I/O, vs. number of snapshots taken in 30 time steps]


Aggregate Write Throughput in GEN2

– LLNL IBM SP (ASCI Frost)
– 1 I/O server per 16-way SMP node
– Writes in HDF4

[Chart: apparent aggregate write throughput (MB/s), native I/O vs. AB, as compute processors scale from 2 (1 SMP node) to 480 (32 SMP nodes)]


Scientific Data Migration

• Output data need to be moved

• Online migration

• Extend active buffering to migration
– Local storage becomes another layer in the buffer hierarchy

[Figure: execution timeline alternating computation and I/O phases, with each I/O phase now followed by Internet transfer]


I/O Architecture with Data Migration

[Figure: compute processors send output to I/O servers, which write to the file system and migrate data over the Internet to a workstation running a visualization tool]


Active Buffering for Data Migration

• Avoid unnecessary local I/O
– Hybrid migration approach: memory-to-memory transfer, with disk staging as the fallback

• Combined with data compression [ICS ’02]

• Self-configuration for online visualization
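A sketch of how a hybrid migration policy might choose between the two paths (the budget-based rule and all names are assumptions for illustration, not the published algorithm): send blocks memory-to-memory while a memory budget lasts, and stage the remainder to local disk for later shipping.

```python
def migrate(blocks, net_send, stage_to_disk, memory_budget_mb, block_mb):
    """Hybrid migration: memory-to-memory transfer up to the budget,
    disk staging only for the overflow. Returns (sent, staged) counts."""
    in_memory = int(memory_budget_mb // block_mb)  # blocks that fit in memory
    for i, block in enumerate(blocks):
        if i < in_memory:
            net_send(block)         # memory-to-memory transfer, no local I/O
        else:
            stage_to_disk(block)    # overflow: stage locally, ship later
    return min(in_memory, len(blocks)), max(0, len(blocks) - in_memory)
```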


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
– With client-server I/O architecture
– With server-less architecture [IPDPS ’03]
• Ongoing work: hiding recurrent input cost
• Conclusions


Server-less I/O Architecture

[Figure: compute processors, each running an I/O thread, write directly to the file system]


Making ABT Transparent and Portable

• Unchanged interfaces
• High-level and file-system independent
• Design and evaluation [IPDPS ’03]
• Ongoing transfer to ROMIO

[Figure: ABT implemented as an ADIO module, alongside file-system modules for NFS, HFS, NTFS, PFS, PVFS, UFS, and XFS]
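The "unchanged interfaces" point can be illustrated with a toy wrapper (this is not ROMIO's ADIO interface; everything below is a hypothetical sketch): callers keep the same write() signature they already use, while an I/O thread performs the actual writes behind it.

```python
import queue
import threading

def with_abt(fileobj):
    """Wrap a writable file object so existing callers keep the
    unchanged write() interface while an I/O thread does the work."""
    q = queue.Queue()

    def drain():
        # I/O thread: perform the real writes in the background.
        while (chunk := q.get()) is not None:
            fileobj.write(chunk)

    t = threading.Thread(target=drain, daemon=True)
    t.start()

    class Proxy:
        def write(self, data):       # same signature as the wrapped object
            q.put(data)
            return len(data)

        def close(self):             # flush: wait for the drainer to finish
            q.put(None)
            t.join()

    return Proxy()
```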


Active Buffering vs. Asynchronous I/O

Active buffering                          | Async I/O
------------------------------------------|--------------------------------------------------
Application level (platform-independent)  | Supported by the file system (platform-dependent)
Transparent to the user                   | Not transparent to the user
Designed for collective I/O               | More difficult to use in collective I/O
Both local and remote I/O                 | Local I/O
Works on top of scientific data formats   | May not be supported by scientific data formats


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
• Ongoing work: hiding recurrent input cost
• Conclusions


I/O in Visualization

• Periodic reads

• Dual modes of operation
– Interactive
– Batch mode

• Harder to overlap reads with computation

[Figure: execution timeline alternating computation phases with periodic read I/O phases]


Efficient I/O Through Data Management

• In-memory database of datasets
– Manage buffers or values

• Hub for I/O optimization
– Prefetching for batch mode
– Caching for interactive mode

• User-supplied read routine
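A compact sketch of such a data-management hub (all names hypothetical): an in-memory table of datasets fed by a user-supplied read routine, with LRU caching for interactive mode and prefetching of known-in-advance datasets for batch mode.

```python
from collections import OrderedDict

class DatasetManager:
    """In-memory dataset table acting as the hub for I/O optimization:
    cache misses call the user-supplied read routine; prefetch() loads
    ahead of need when the batch-mode access order is known."""

    def __init__(self, read_routine, capacity=4):
        self._read = read_routine          # user-supplied read routine
        self._cache = OrderedDict()        # dataset name -> buffer
        self._capacity = capacity

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)  # LRU: mark as recently used
            return self._cache[name]
        data = self._read(name)            # cache miss: do the real read
        self._put(name, data)
        return data

    def prefetch(self, names):
        # Batch mode: access order is known, so load ahead of need.
        for name in names:
            if name not in self._cache:
                self._put(name, self._read(name))

    def _put(self, name, data):
        self._cache[name] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
```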


Related Work

• Overlapping I/O with computation
– Replacing synchronous calls with async calls [Agrawal et al. ICS ’96]
– Threads [Dickens et al. IPPS ’99, More et al. IPPS ’97]

• Automatic performance optimization
– Optimization with performance models [Chen et al. TSE ’00]
– Graybox optimization [Arpaci-Dusseau et al. SOSP ’01]


Roadmap

• Introduction
• Active buffering: hiding recurrent output cost
• Ongoing work: hiding recurrent input cost
• Conclusions


Conclusions

• If we can’t shrink it, hide it!

• Performance optimization can be done
– more actively
– at a higher level
– in a larger scope

• Make I/O part of data management


References

• [IPDPS ’03] Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Improving MPI-IO Output Performance with Active Buffering Plus Threads. 2003 International Parallel and Distributed Processing Symposium (IPDPS).

• [PDSECA ’03] Xiaosong Ma, Xiangmin Jiao, Michael Campbell, and Marianne Winslett. Flexible and Efficient Parallel I/O for Large-Scale Multi-component Simulations. 4th Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications (PDSECA).

• [ICS ’02] Jonghyun Lee, Xiaosong Ma, Marianne Winslett, and Shengke Yu. Active Buffering Plus Compressed Migration: An Integrated Solution to Parallel Simulations’ Data Transport Needs. 16th ACM International Conference on Supercomputing (ICS).

• [IPDPS ’02] Xiaosong Ma, Marianne Winslett, Jonghyun Lee, and Shengke Yu. Faster Collective Output through Active Buffering. 2002 International Parallel and Distributed Processing Symposium (IPDPS).