Petascale Science with GTC/ADIOS

Page 1: Petascale Science with GTC/ADIOS

Petascale Science with GTC/ADIOS

HPC User Forum

9/10/2008

Scott Klasky; S. Ethier, S. Hodson, C. Jin, Z. Lin, J. Lofstead, R. Oldfield, M. Parashar, K. Schwan, A. Shoshani, M. Wolf, Y. Xiao, F. Zheng

Page 2: Petascale Science with GTC/ADIOS

Outline

· GTC
· EFFIS
· ADIOS
· Workflow
· Dashboard
· Conclusions

Page 3: Petascale Science with GTC/ADIOS

Advanced computing at NCCS, 2008-2009

275 TF system
· Compute nodes: 7,832 (35.2 GF/node)
  – 1 socket (AM2/HT1) per node
  – 4 cores per socket (31,328 cores total)
  – Core CPU: 2.2 GHz AMD Opteron
  – Memory per core: 2 GB (DDR2-800)
· 232 service & I/O nodes
· Local storage: ~750 TB, 41 GB/s
· Interconnect: 3D torus, SeaStar 2.1 NIC
· Aggregate memory: 63 TB
· Peak performance: 275 TF

1.0 PF system
· Compute nodes: 13,888 (73.6 GF/node)
  – 2 sockets per node (F/HT1)
  – 4 cores per socket (111,104 cores total)
  – Core CPU: 2.3 GHz AMD Opteron
  – Memory per core: 2 GB (DDR2-800)
· 256 service & I/O nodes
· Local storage: ~10 PB, 200+ GB/s
· Interconnect: 3D torus, SeaStar 2.1 NIC
· Aggregate memory: 222 TB
· Peak performance: 1.0 PF
· 150 cabinets, 3,400 ft²
· 6.5 MW power

Page 4: Petascale Science with GTC/ADIOS

Big Simulations for Early 2008: GTC Science Goals and Impact

Science Goals

· Use GTC (classic) to analyze cascades and propagation in Collisionless Trapped Electron Mode (CTEM) turbulence

– Resolve the critical question of ρ* scaling of confinement in large tokamaks such as ITER; what are the consequences of departure from this scaling?

– Avalanches and turbulence spreading tend to break Gyro-Bohm scaling but zonal flows tend to restore it by shearing apart extended eddies: a competition

· Use GTC-S (shaped) to study electron temperature gradient (ETG) drift turbulence & compare against NSTX experiments

– NSTX has a spherical torus with a very low major to minor radius aspect ratio and a strongly-shaped cross-section

– NSTX experiments have produced very interesting high-frequency, short-wavelength modes; are these kinetic electron modes?

– ETG is a likely candidate, but only a fully nonlinear kinetic simulation with the exact shape & experimental profiles can address this

Science Impact

· Further the understanding of CTEM turbulence by validation against modulated ECH heat pulse propagation studies on the DIII-D, JET & Tore Supra tokamaks

– Is CTEM the key mechanism for electron thermal transport?

– Electron temperature fluctuation measurements will shed light on this

– Understand the role of nonlinear dynamics of precession drift resonance in CTEM turbulence

· First direct comparison between simulation & experiment on ETG drift turbulence

– GTC-S possesses the right geometry and the right nonlinear physics to possibly resolve this

– Help to pinpoint micro-turbulence activities responsible for energy loss through the electron channel in NSTX plasmas

Page 5: Petascale Science with GTC/ADIOS

GTC Early Application: Electron Microturbulence in Fusion Plasma

• “Scientific Discovery” - Transition to favorable scaling of confinement for both ions and electrons now observed in simulations for ITER plasmas

• Electron transport less understood but more important in ITER since fusion products first heat the electrons

• Simulation of electron turbulence is more demanding due to shorter time scales and smaller spatial scales

• Recent GTC simulation of electron turbulence used 28,000 cores for 42 hours in a dedicated run on Jaguar at ORNL, producing 60 TB of data that is currently being analyzed. This run pushed 15 billion particles for 4,800 major time cycles

[Figures: ion transport and electron transport scaling results, each labeled "Good news for ITER!"]

Page 6: Petascale Science with GTC/ADIOS

GTC Electron Microturbulence Structure

· 3D fluid data analysis provides critical information to characterize microturbulence, such as radial eddy size and eddy auto-correlation time

· The flux-surface electrostatic potential demonstrates a ballooning structure

· Radial turbulence eddies have an average size of ~5 ion gyroradii

Page 7: Petascale Science with GTC/ADIOS

EFFIS

· From SDM center*
  – Workflow engine – Kepler
  – Provenance support
  – Wide-area data movement

· From universities
  – Code coupling (Rutgers)
  – Visualization (Rutgers)

· Newly developed technologies
  – Adaptable I/O (ADIOS) (with Georgia Tech)
  – Dashboard (with SDM center)

[Diagram: EFFIS foundation and enabling technologies: Adaptable I/O, Workflow, Dashboard, Provenance and Metadata, Code Coupling, Visualization, and Wide-area Data Movement]

Approach: place highly annotated, fast, easy-to-use I/O methods in the code that can be monitored and controlled; have a workflow engine record all of the information; visualize it on a dashboard; move desired data to the user's site; and have everything reported to a database.

Page 8: Petascale Science with GTC/ADIOS

Outline

· GTC
· EFFIS
· ADIOS
· Conclusions

Page 9: Petascale Science with GTC/ADIOS

ADIOS: Motivation

· “Those fine fort.* files!”

· Multiple HPC architectures
  – BlueGene, Cray, IB-based clusters

· Multiple parallel filesystems
  – Lustre, PVFS2, GPFS, Panasas, pNFS

· Many different APIs
  – MPI-IO, POSIX, HDF5, netCDF
  – GTC (fusion) has changed IO routines 8 times so far based on performance when moving to different platforms

· Different IO patterns
  – Restarts, analysis, diagnostics
  – Different combinations provide different levels of IO performance

· Compensate for inefficiencies in the current IO infrastructures to improve overall performance

Page 10: Petascale Science with GTC/ADIOS

ADIOS Overview

· Allows plug-ins for different I/O implementations
· Abstracts the API from the method used for I/O
· Simple API, almost as easy as an F90 write statement
· Best-practice, optimized IO routines for all supported transports “for free”
· Componentization
· Thin API
· XML file
  – data groupings with annotation
  – IO method selection
  – buffer sizes
· Common tools
  – Buffering
  – Scheduling
· Pluggable IO routines

[Diagram: scientific codes call the thin ADIOS API; external metadata (XML file) selects among pluggable transports such as MPI-IO, collective MPI-IO (MPI-CIO), POSIX IO, LIVE/DataTap, parallel HDF5, pnetCDF, visualization engines, and other plug-ins, with buffering and schedule feedback]

Page 11: Petascale Science with GTC/ADIOS

ADIOS Philosophy (End User)

· Simple API, very similar to standard Fortran or C POSIX IO calls (a minimal Fortran sketch follows this list)
  – As close to identical as possible for the C and Fortran APIs
  – open, read/write, close form the core
  – set_path, end_iteration, begin/end_computation, init/finalize are the auxiliaries

· No changes in the API for different transport methods.

· Metadata and configuration defined in an external XML file parsed once on startup
  – Describe the various IO groupings, including attributes and hierarchical path structures for elements, as an adios-group
  – Define the transport method used for each adios-group and give parameters for communication/writing/reading
  – Change on a per-element basis what is written
  – Change on a per-adios-group basis how the IO is handled
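To make the shape of this API concrete, here is a minimal Fortran write-path sketch built only from the calls named above (init, open, write, close, finalize). The argument lists, the group name 'restart', and the variable names are illustrative assumptions rather than the documented signatures, and real ADIOS releases may require additional calls (for example, declaring the group size) that are omitted here.

```fortran
! Minimal write-path sketch using the ADIOS calls named on this slide.
! Argument lists and names are illustrative assumptions, not exact signatures.
program adios_write_sketch
  implicit none
  include 'mpif.h'
  integer :: rank, ierr
  integer*8 :: handle                  ! opaque ADIOS handle
  real*8, allocatable :: zion(:,:)     ! hypothetical particle array

  call MPI_Init (ierr)
  call MPI_Comm_rank (MPI_COMM_WORLD, rank, ierr)

  call adios_init ('config.xml', ierr)     ! parse the external XML once at startup

  allocate (zion(6, 100000))
  zion = 0.0d0

  ! Open the "restart" adios-group declared in config.xml and write one variable.
  ! Which transport runs behind these calls is decided in the XML, so switching
  ! from synchronous MPI-IO to an asynchronous method needs no code change.
  call adios_open (handle, 'restart', 'restart.bp', 'w', ierr)
  call adios_write (handle, 'zion', zion, ierr)
  call adios_close (handle, ierr)

  call adios_finalize (rank, ierr)
  call MPI_Finalize (ierr)
end program adios_write_sketch
```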

Page 12: Petascale Science with GTC/ADIOS

ADIOS Overview

• ADIOS is an IO componentization, which allows us to
  – Abstract the API from the IO implementation
  – Switch from synchronous to asynchronous IO at runtime
  – Change from real-time visualization to fast IO at runtime

• Combines
  – Fast I/O routines
  – Ease of use
  – Scalable architecture (100s of cores to millions of procs)
  – QoS
  – Metadata-rich output
  – Visualization applied during simulations
  – Analysis and compression techniques applied during simulations
  – Provenance tracking

Page 13: Petascale Science with GTC/ADIOS

Design Goals

· ADIOS Fortran and C based API almost as simple as standard POSIX IO

· External configuration to describe metadata and control IO settings

· Take advantage of existing IO techniques (no new native IO methods)

Fast, simple-to-write, efficient IO for multiple platforms without changing the source code

Page 14: Petascale Science with GTC/ADIOS

Architecture

· Data groupings
  – Logical groups of related items written at the same time
  – Not necessarily one group per writing event

· IO methods (a configuration sketch follows this list)
  – Choose what works best for each grouping
  – Vetted, improved, and/or written by experts for each
    · POSIX (Wei-keng Liao, Northwestern)
    · MPI-IO (Steve Hodson, ORNL)
    · MPI-IO Collective (Wei-keng Liao, Northwestern)
    · NULL (Jay Lofstead, GT)
    · Ga Tech DataTap Asynchronous (Hasan Abbasi, GT)
    · phdf5
    · others (pnetcdf on the way)
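As a sketch of what per-group method selection might look like, here is a hypothetical config.xml in the general style of the ADIOS XML described on these slides. Element names, attribute names, the group and variable names, and the method strings ("MPI", "POSIX") are assumptions for illustration, not the exact schema.

```xml
<?xml version="1.0"?>
<!-- Hypothetical ADIOS configuration sketch: one adios-group per logical
     output, with its transport method chosen here instead of in the code. -->
<adios-config host-language="Fortran">

  <adios-group name="restart">
    <var name="nparticles" type="integer"/>
    <var name="zion" type="double" dimensions="6,nparticles"/>
    <attribute name="description" value="particle restart data"/>
  </adios-group>

  <adios-group name="diagnostics">
    <var name="potential" type="double" dimensions="nx,ny"/>
  </adios-group>

  <!-- Swapping a group from synchronous MPI-IO to, say, an asynchronous
       DataTap-style transport would be a one-attribute change here; the
       simulation source is untouched. Method names are illustrative. -->
  <method group="restart" method="MPI"/>
  <method group="diagnostics" method="POSIX"/>

  <buffer size-MB="100"/>

</adios-config>
```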

Page 15: Petascale Science with GTC/ADIOS

Related Work

· Specialty APIs
  – HDF-5 – complex API
  – Parallel netCDF – no structure

· File-system-aware middleware
  – MPI ADIO layer – file system connection, complex API

· Parallel file systems
  – Lustre – metadata server issues
  – PVFS2 – client complexity
  – LWFS – client complexity
  – GPFS, pNFS, Panasas – may have other issues

Page 16: Petascale Science with GTC/ADIOS

Supported Features

· Platforms tested
  – Cray CNL (ORNL Jaguar)
  – Cray Catamount (SNL Red Storm)
  – Linux InfiniBand/Gigabit (ORNL Ewok)
  – BlueGene/P now being tested/debugged
  – Looking for future OS X support

· Native IO methods
  – MPI-IO independent, MPI-IO collective, POSIX, NULL, Ga Tech DataTap asynchronous, Rutgers DART asynchronous, Posix-NxM, phdf5, pnetcdf, kepler-db

Page 17: Petascale Science with GTC/ADIOS

Initial ADIOS performance.

· MPI-IO method
  – GTC and GTS codes have achieved over 20 GB/sec on the Cray XT at ORNL
  – 30 GB diagnostic files every 3 minutes, 1.2 TB restart files every 30 minutes, 300 MB other diagnostic files every 3 minutes

· DART: <2% overhead for writing 2 TB/hour with the XGC code

· DataTap vs. POSIX
  – 1 file per process (POSIX)
  – 5 secs for GTC computation
  – ~25 seconds for POSIX IO
  – ~4 seconds with DataTap

Page 18: Petascale Science with GTC/ADIOS

Codes & Performance

· June 7, 2008: 24-hour GTC run on Jaguar at ORNL
  – 93% of the machine (28,672 cores)
  – MPI-OpenMP mixed model on quad-core nodes (7,168 MPI procs)
  – Three interruptions total (simple node failures), with two 10+ hour runs
  – Wrote 65 TB of data at >20 GB/sec (25 TB for post analysis)
  – IO overhead ~3% of wall-clock time (a quick consistency check follows this list)
  – Mixed IO methods of synchronous MPI-IO and POSIX IO configured in the XML file
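A back-of-envelope consistency check on these numbers (my arithmetic, not from the slide): writing 65 TB at the quoted 20 GB/s gives

\[
\frac{65\ \text{TB}}{20\ \text{GB/s}} \approx 3.3\times10^{3}\ \text{s} \approx 0.9\ \text{h},
\qquad
\frac{0.9\ \text{h}}{24\ \text{h}} \approx 4\%,
\]

which is in line with the quoted I/O overhead of roughly 3% of wall-clock time if the sustained rate was somewhat above 20 GB/s.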

Page 19: Petascale Science with GTC/ADIOS

Chimera IO Performance (Supernova code)

[Figure: Chimera I/O performance (weak scaling): total I/O time per restart dump (log scale) vs. number of cores (512 to 8192) for the MPI, MPI_CIO, POSIX, and ORIG_H5 methods, with a 2x-scaling reference]

• Plot shows the minimum value from 5 runs with 9 restarts per run
• Error bars show the maximum time for the method

Page 20: Petascale Science with GTC/ADIOS

Chimera Benchmark Results: why is ADIOS better than pHDF5?

ADIOS_MPI_IO vs. pHDF5 with the MPI independent IO driver

ADIOS_MPI_IO
  Function               # of calls    Time
  write                  2560          2218.28
  MPI_File_open          2560          95.80
  MPI_Recv               2555          24.68
  buffer_write           6136320       10.29
  fopen                  512           9.86
  bp_calsize_stringtag   3179520       4.44
  other                  --            ~40

pHDF5
  Function               # of calls    Time
  write                  144065        33109.67
  MPI_Bcast (sync)       314800        12259.30
  MPI_File_open          2560          325.17
  MPI_File_set_size      2560          23.76
  MPI_Comm_dup           5120          16.34
  H5P, H5D, etc.         --            8.71
  other                  --            ~20

Timings use 512 cores and 5 restart dumps.

Conversion time on 1 processor for the 2048-core job = 3.6 s (read) + 5.6 s (write) + 6.9 s (other) = 18.8 s

Numbers above are sums over all PEs (parallelism not shown)

Page 21: Petascale Science with GTC/ADIOS

DataTap

· A research transport to study asynchronous data movement

· Uses server directed I/O to maintain high bandwidth, low overhead for data extraction

· I/O scheduling is performed to limit the perturbation caused by asynchronous I/O

Page 22: Petascale Science with GTC/ADIOS

DataTap scheduler

· Due to perturbations caused by asynchronous I/O, the overall performance of the application may actually get worse

· We schedule the data movement using application state information to prevent asynchronous I/O from interfering with MPI communication

· 800 GB of data
  – Scheduled I/O takes 2x longer to move the data, but the overhead is 2x less

Page 23: Petascale Science with GTC/ADIOS

The flood of data

· Petascale GTC runs will produce 1 PB per simulation

· Couple GTC with an edge code (core-edge coupling)
  – 4 PB of data per run
  – Can't store all of the GTC runs at ORNL unless we go to tape (12 days to grab the data from tape if we get 1 GB/sec)
  – 1.5 FTE looking at the data
  – Need more 'real-time' analysis of data
  – Workflows, data-in-transit (IO graphs), …?

· Can we create a staging area with "fat nodes"?
  – Move data from computational nodes to fat nodes using the network of the HPC resource
  – Reduce data on fat nodes
  – Allow users to "plug in" analysis routines on fat nodes
  – How fat?
    · Shared memory helps (don't have to parallelize all analysis codes)
    · Typical upper bound among the codes we studied: they write 1/20th of memory per core for analysis. We want 1/20th of the resources (5% overhead) and need 2x memory per core for analysis (in data + out data); see the estimate after this list
    · On the Cray at ORNL this means roughly 750 quad-core sockets for the fat-node area, with about 34 GB of shared memory each

· Also useful for codes which require memory but not as many nodes

· Can we have shared memory on this portion?

· What are the other solutions?
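One way to reconstruct the fat-node sizing above, using the 1.0 PF system figures from the NCCS slide (13,888 compute nodes, 8 cores and 16 GB of memory per node) together with the 1/20 and 2x factors stated here; grouping roughly 20 compute nodes per fat socket is my assumption:

\[
N_{\text{fat}} \approx \frac{13\,888\ \text{nodes}}{20} \approx 700\ \text{sockets},
\qquad
M_{\text{fat}} \approx 20 \times \frac{16\ \text{GB}}{20} \times 2 = 32\ \text{GB},
\]

which is on the order of the ~750 quad-core sockets and ~34 GB of shared memory per socket quoted above (the factor of 2 accounts for in data plus out data).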

Page 24: Petascale Science with GTC/ADIOS

Conclusions

· GTC is a code that is scaling to petascale computers (BG/P, Cray XT)
· New changes bring new science and new IO (ADIOS)
· The major challenge in the future is speeding up the data analysis

· ADIOS is an IO componentization
  – ADIOS is being integrated into Kepler
  – Achieved over 50% of peak IO performance for several codes on Jaguar
  – Can change IO implementations at runtime
  – Metadata is contained in the XML file

· Petascale science starts with petascale applications
  – Need enabling technologies to scale
  – Need to rethink ways to do science