Building Coupled Parallel and Distributed Scientific
Simulations with InterComm
Alan Sussman
Department of Computer Science & Institute for Advanced Computer Studies
http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic/
A Simple Example (MxN coupling)
[Figure: a 2-D wave equation solved by two coupled codes. Parallel application LEFTSIDE (Fortran90, MPI-based) runs on M=4 processors (p1-p4); parallel application RIGHTSIDE (C++, PVM-based) runs on N=2 processors (p1-p2). InterComm performs the data exchange at the borders (transfer and control), and results go to a visualization station.]
The Problem
- Coupling codes (components), not models
- Codes written in different languages: Fortran (77, 95), C, C++/P++, …
- Both parallel (shared or distributed memory) and sequential
- Codes may be run on the same, or different, resources
  - One or more parallel machines or clusters (the Grid) – concentrate on this today
Space Weather Prediction
[Figure: the coupled models of the Sun-Earth system – Solar Corona (MAS), Active Regions, SEP, Solar Wind (ENLIL), Inner Magnetosphere, Magnetosphere (LFM), Ring Current, Radiation Belts, Plasmasphere, Geocorona and Exosphere, Ionosphere (T*CGM), and MI Coupling.]
Our driving application:
Production of an ever-improving series of comprehensive scientific models of the Solar Terrestrial environment
Codes model both large-scale and microscale structures and dynamics of the Sun-Earth system
What is InterComm?
A programming environment and runtime library:
- For performing efficient, direct data transfers between data structures (multi-dimensional arrays) in different programs (see the sketch below)
- For controlling when data transfers occur
- For deploying multiple coupled programs in a Grid environment
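To make the export/import pattern concrete, here is a minimal C sketch of an exporting component. The ic_register_region and ic_export names, signatures, and stub bodies are invented stand-ins modeled on the export/import pseudocode later in these slides; they are not the actual InterComm API.

/* Minimal sketch of the export side of an InterComm-style coupling.
 * The ic_* functions are hypothetical stubs standing in for the real
 * library, so that the control flow can be shown self-contained. */
#include <stdio.h>

typedef struct { int lo[2], hi[2]; } region_t;  /* a 2-D array section */

/* Stub: a real library would record the array section for later transfers. */
static region_t ic_register_region(int lo0, int lo1, int hi0, int hi1)
{
    region_t r = { { lo0, lo1 }, { hi0, hi1 } };
    return r;
}

/* Stub: a real library would move the section to the matched importer,
 * if the coordination specification pairs this timestamp with an import. */
static void ic_export(const char *name, region_t r, int t)
{
    printf("export %s [%d..%d, %d..%d] at t=%d\n",
           name, r.lo[0], r.hi[0], r.lo[1], r.hi[1], t);
}

enum { NX = 256, NY = 256 };

int main(void)
{
    static double u[NX][NY];  /* simulation state (unused by the stubs) */
    (void)u;
    region_t border = ic_register_region(0, 0, 0, NY - 1);  /* left column */

    for (int t = 1; t <= 10; t++) {
        /* ... advance the 2-D wave equation one time step ... */
        ic_export("Sr0", border, t);  /* offer the border every step */
    }
    return 0;
}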
Deploying Components
Infrastructure for deploying programs and managing interactions between them
1. Starting each of the models on the desired Grid resources
2. Connecting the models together via the InterComm framework
3. Models communicate via the import and export calls
Motivation
- Developer has to deal with:
  - Multiple logons
  - Manual resource discovery and allocation
  - Application run-time requirements
- The process for launching complex applications with multiple components is:
  - Repetitive
  - Time-consuming
  - Error-prone
Deploying Components – HPCALE
A single environment for running coupled applications in the high-performance, distributed, heterogeneous Grid environment. We must provide:
- Resource discovery: find resources that can run the job, and automate how each model code finds the other model codes it should be coupled to
- Resource allocation: schedule the jobs to run on the resources, without you dealing with each one directly
- Application execution: start every component appropriately and monitor their execution
Built on top of basic Web and Grid services (XML, SOAP, Globus, PBS, LoadLeveler, LSF, etc.)
Resource Discovery
[Figure: resource discovery in HPCALE. Cluster1, Cluster2, and Cluster3 register and update their resource and software information (Register Resource, Register Software, Update Resource) in an information repository behind an SSL-enabled web server; a user's Discover Resource request against the repository yields an XJD job description, with the registration and discovery data exchanged as XML documents (XSD, XRD, XJD).]
Resource Allocation
[Figure: resource allocation in HPCALE. The user submits an XJD to the Allocate Resource server, which (1) assigns a job ID, (2) creates scratch directories on Cluster1, Cluster2, and Cluster3, (3) creates and manages application-specific runtime information, and (4) allocates the requests (with retry), sending a script to each local resource manager and creating lock files; the server reports success or failure.]
Application Launching
[Figure: application launching in HPCALE. Given an XJD, the Launch Application server (1) transfers input files to the resources via the File Transfer Service (put), (2) deals with the runtime environment (e.g., PVM), (3) splits a single resource across components, (4) launches on Cluster1, Cluster2, and Cluster3 using the appropriate launching method, and (5) transfers created files back to the user's machine (get); the server reports success or failure.]
Current HPCALE implementation
- Discovery:
  - Simple resource discovery service implemented
  - Resources register themselves with the service, including what codes they can run
- Resource allocation:
  - Scripts implemented to parse XJD files and start applications on the desired resources
  - Works with local schedulers – e.g., PBS, LoadLeveler
- Job execution:
  - Startup service implemented – for machines without schedulers, and, with schedulers, for PBS (Linux clusters) and LoadLeveler (IBM SP, pSeries)
  - Works via ssh and remote execution – can start up from anywhere, such as your home workstation
  - Need better authentication; looking at GSI certificates
  - Building connections is automated through the startup service, by passing configuration parameters that are interpreted by the InterComm initialization code
Current status
http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic/
- First InterComm release (data movement) in April 2005
  - Source code and documentation included
  - Release includes a test suite and sample applications for multiple platforms (Linux, AIX, Solaris, …)
  - Support for F77, F95, C, C++/P++
- InterComm 1.5 released Summer 2006
  - New version of the library with export/import support, and broadcast of scalars/arrays between programs
- HPCALE adds support for automated launching and resource allocation, plus a tool for building XJD files for application runs
  - An HPCALE release is imminent (documentation still not complete)
The team at Maryland
- Il-Chul Yoon (graduate student): HPCALE
- Norman Lo (staff programmer): InterComm testing, debugging, packaging; ESMF integration
- Shang-Chieh Wu (graduate student): InterComm control design and implementation
End of Slides
Data Transfers in InterComm
- Interact with data parallel (SPMD) code used in separate programs (including MPI)
- Exchange data between separate (sequential or parallel) programs, running on different resources (parallel machines or clusters)
- Manage data transfers between different data structures in the same application
- The CCA Forum refers to this as the MxN problem
InterComm Goals
- One main goal is minimal modification to existing programs
  - In scientific computing there is plenty of legacy code
  - Computational scientists want to solve their problem, not worry about plumbing
- The other main goal is low overhead and efficient data transfers
  - Low overhead in planning the data transfers
  - Efficient data transfers via customized all-to-all message passing between source and destination processes (a sketch of the planning idea follows below)
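The slides do not show how the transfer plan is built, but the core idea of MxN planning can be illustrated with a self-contained C sketch: each of M source processes intersects its block of a 1-D block-decomposed array with every destination block, and each non-empty overlap becomes one message in the all-to-all schedule. This is a simplified illustration under those assumptions, not InterComm's actual code, which handles multi-dimensional arrays and more general decompositions.

/* Sketch of MxN transfer planning for a 1-D block-decomposed array.
 * Each non-empty overlap of a source block with a destination block
 * becomes one message in the all-to-all schedule. Simplified
 * illustration only, not InterComm code. */
#include <stdio.h>

/* Block [lo, hi) owned by process `rank` out of `nprocs` for `n` elements. */
static void block_range(int n, int nprocs, int rank, int *lo, int *hi)
{
    int base = n / nprocs, extra = n % nprocs;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

int main(void)
{
    const int n = 16, M = 4, N = 2;   /* array size, source and dest procs */

    for (int s = 0; s < M; s++) {
        int slo, shi;
        block_range(n, M, s, &slo, &shi);
        for (int d = 0; d < N; d++) {
            int dlo, dhi;
            block_range(n, N, d, &dlo, &dhi);
            int lo = slo > dlo ? slo : dlo;   /* overlap of the two blocks */
            int hi = shi < dhi ? shi : dhi;
            if (lo < hi)
                printf("src %d -> dst %d : elements [%d, %d)\n", s, d, lo, hi);
        }
    }
    return 0;
}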
Coupling OUTSIDE components
- Separate the coupling information from the participating components
  - Maintainability – components can be developed/upgraded individually
  - Flexibility – change participants/components easily
  - Functionality – support variable-sized time interval numerical algorithms or visualizations
- Matching information is specified separately, by the application integrator
- Runtime match via simulation time stamps (a toy sketch follows below)
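As a toy illustration of runtime matching on time stamps, the C sketch below pairs an exporter that offers data every step with an importer that requests every 2nd step, using an exact-match rule. The real InterComm matching policies (e.g., the REG/REGL/REGU entries in the configuration file shown later) are not spelled out in these slides, so the exact-match rule here is an assumption for illustration only.

/* Toy timestamp matching: only exports whose stamps equal an import
 * request result in a transfer. Exact matching is an assumption here;
 * InterComm's actual policies (REG/REGL/REGU) are richer. */
#include <stdio.h>

int main(void)
{
    const int import_every = 2, nsteps = 8;

    for (int t = 1; t <= nsteps; t++) {       /* exporter offers every step */
        if (t % import_every == 0)            /* importer asks every 2nd    */
            printf("t=%d: matched, data moves\n", t);
        else
            printf("t=%d: no matching import, no transfer\n", t);
    }
    return 0;
}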
Controlling Data Transfers
- A flexible method for specifying when data should be moved
  - Based on matching export and import calls in different programs via timestamps
- Transfer decisions take place based on a separate coordination specification
  - The coordination specification can also be used to deploy model codes and grid translation/interpolation routines (how many, and where, to run codes)
Example
Simulation exports every time step, visualization imports every 2nd time step
[Figure: a simulation cluster exports to a visualization station; only every 2nd time step is actually transferred.]
Separate codes from matching
Exporter Ap0:
  define region Sr12
  define region Sr4
  define region Sr5
  ...
  Do t = 1, N, Step0
    ...  // computation jobs
    export(Sr12, t)
    export(Sr4, t)
    export(Sr5, t)
  EndDo

Importer Ap1:
  define region Sr0
  ...
  Do t = 1, M, Step1
    import(Sr0, t)
    ...  // computation jobs
  EndDo

[Figure: exporter Ap0's regions (Sr12, Sr4, Sr5) are wired to importer regions Ap1.Sr0, Ap2.Sr0, and Ap4.Sr0 by the configuration file below.]

Configuration file:
  #
  Ap0 cluster0 /bin/Ap0 2 ...
  Ap1 cluster1 /bin/Ap1 4 ...
  Ap2 cluster2 /bin/Ap2 16 ...
  Ap4 cluster4 /bin/Ap4 4
  #
  Ap0.Sr12 Ap1.Sr0 REGL 0.05
  Ap0.Sr12 Ap2.Sr0 REGU 0.1
  Ap0.Sr4 Ap4.Sr0 REG 1.0
  #
Plumbing
- Bindings for C, C++/P++, Fortran77, Fortran95
- Currently, external message passing and program interconnection are via PVM
- Each model/program can do whatever it wants internally (MPI, pthreads, sockets, …), and can start up by whatever mechanism it wants (see the sketch below)
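As a final hedged sketch tying the plumbing together: a component can be an ordinary MPI program internally and only touch the coupling layer at its borders. The ic_export stub is the same hypothetical stand-in used in the earlier sketches, not a real InterComm call.

/* Sketch: a component that is MPI-parallel internally and uses a
 * (hypothetical) coupling call only at its border. Compile with an
 * MPI compiler, e.g. mpicc. */
#include <mpi.h>
#include <stdio.h>

static void ic_export(const char *name, int t)  /* stand-in stub */
{
    printf("export %s at t=%d\n", name, t);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int t = 1; t <= 4; t++) {
        /* ... internal halo exchange and time stepping with MPI ... */
        MPI_Barrier(MPI_COMM_WORLD);  /* placeholder for internal comm */
        if (rank == 0)
            ic_export("Sr0", t);      /* coupling only at the border */
    }
    MPI_Finalize();
    return 0;
}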