
1

A Common Application Platform (CAP) for SURAgrid

-Mahantesh Halappanavar, John-Paul Robinson, Enis Afgan, Mary Fran Yafchak and Purushotham Bangalore.

SURAgrid All-hands Meeting, 27 September, 2007, Washington D.C.

2

Introduction

Problem Statement:

“How to quickly grid-enable scientific applications to exploit SURAgrid resources?”

Goals for Grid-enabling Applications:

Dynamic resource discovery
Collective resource utilization
Simple job management and accounting
Minimal programming effort

3

Identifying Patterns

Algorithm Structures
Support Structures
Relationships

4

Basic Process

Problems → Algorithms → Programs → Implementation

Algorithm Structures

Supporting Structures

Programming Environments

5

Algorithm Structures

1. Organize by Tasks: Task Parallelism, Divide and Conquer

2. Organize by Data Decomposition: Geometric Decomposition, Recursive Data

3. Organize by Flow of Data: Pipeline, Event-Based Coordination

How to organize? (Linear and Recursive)

6

Support Structures

Program Structures

1. SPMD

2. Master/Worker

3. Loop Parallelism

4. Fork/Join

Data Structures

1. Shared Data

2. Shared Queue

3. Distributed Array


7

Relationships: AS and SS

                    Task         Divide &  Geometric      Recursive  Pipeline  Event-Based
                    Parallelism  Conquer   Decomposition  Data                 Coordination
SPMD                ****         ***       ****           **         ***       **
Loop Parallelism    ****         **        ***            -          -         -
Master/Worker       ****         **        *              *          *         *
Fork/Join           **           ****      **             -          ****      ****

Relationship between Supporting Structures (SS) patterns and Algorithm Structure (AS) patterns.

8

Relationships: SS and PE

                    OpenMP   MPI    Java
SPMD                ***      ****   **
Loop Parallelism    ****     *      ***
Master/Worker       **       ***    ***
Fork/Join           ***      -      ****

Relationship between Supporting Structures (SS) patterns and Programming Environments (PE).
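The strongest OpenMP pairing above is Loop Parallelism. As a rough illustration in C (not from the original slides; the array and loop are invented for the example), a serial loop whose iterations are shared among threads by an OpenMP work-sharing directive:

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;
    int i;

    for (i = 0; i < N; i++)          /* set up some input data */
        b[i] = (double)i;

    /* Loop Parallelism: keep the serial loop structure and let the
       OpenMP runtime divide the iterations among the threads. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 2.0 * b[i];
        sum += a[i];
    }

    printf("threads available: %d, sum = %g\n", omp_get_max_threads(), sum);
    return 0;
}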

9

Important Observation

“This (SPMD) pattern is by far the most commonly used pattern for structuring parallel programs. It is particularly relevant for MPI programmers and problems using the Task Parallelism and Geometric Decomposition patterns. It has also proved effective for problems using the Divide and Conquer, and Recursive Data patterns.”

Single Program, Multiple Data (SPMD). This is the most common way to organize a parallel program, especially on MIMD computers. The idea is that a single program is written and loaded onto each node of a parallel computer. Each copy of the single program runs independently (aside from coordination events), so the instruction streams executed on each node can be completely different. The specific path through the code is in part selected by the node ID.
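As a concrete sketch of SPMD (illustrative only, not part of the original slides): every node loads and runs the same C/MPI program, and the rank returned by MPI_Comm_rank, i.e. the node ID, selects the path each copy takes through the code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    /* The same program is loaded on every node; each copy runs independently. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* node ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* The path through the code is selected by the node ID:
           rank 0 acts as the coordinator here. */
        printf("Coordinator: %d processes in the job\n", size);
    } else {
        /* All other ranks do the worker portion of the computation. */
        printf("Worker %d of %d starting local computation\n", rank, size);
    }

    MPI_Finalize();
    return 0;
}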

10

The Computing Continuum

OpenMP
MPI
Java

11

Scientific Applications

What are they? How are they built?

12

Scientific Applications

Life Sciences: Biology, Chemistry, …

Engineering: Aerospace, Civil, Mechanical, Environmental

Physics: QCD, Black Holes, …

….

The SCaLeS Reports: http://www.pnl.gov/scales/

13

Building Blocks

OpenMP, MPI, BLACS, …
Metis/ParMetis, Zoltan, Chaco, …
BLAS and LAPACK, …
ScaLAPACK, MUMPS, SuperLU, …
PETSc, Aztec, …

“A core requirement of many engineering and scientific applications is the need to solve linear and non-linear systems of equations, eigensystems and other related problems.” – The Trilinos Project

14

ScaLAPACK

ScaLAPACK software hierarchy (diagram):

ScaLAPACK: linear systems, least squares, singular value decomposition, eigenvalues (global)
PBLAS: parallel BLAS (global)
BLACS: communication routines targeting linear algebra operations
LAPACK: linear systems, least squares, singular value decomposition, eigenvalues (local)
BLAS: Level 1/2/3 (local, platform specific)
MPI/PVM/...: communication layer (message passing, platform specific)

http://acts.nersc.gov/scalapack
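A minimal sketch of how an application enters this stack from the bottom up: start MPI, then create a BLACS process grid before calling PBLAS or ScaLAPACK routines. Illustrative only; the Cblacs_* prototypes are declared by hand here because header names and exact signatures vary between ScaLAPACK installations, so treat them as assumptions to check against the local build.

#include <mpi.h>
#include <stdio.h>

/* C interface to BLACS; declared by hand (see note above). */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, const char *layout, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);

int main(int argc, char **argv) {
    int iam, nprocs, ictxt, myrow, mycol;
    int nprow = 2, npcol = 2;              /* assumes at least 4 MPI processes */

    MPI_Init(&argc, &argv);                /* communication layer */
    Cblacs_pinfo(&iam, &nprocs);           /* my rank and the process count */
    Cblacs_get(-1, 0, &ictxt);             /* default system context */
    Cblacs_gridinit(&ictxt, "Row", nprow, npcol);            /* 2 x 2 grid */
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);  /* my position */

    if (myrow >= 0 && mycol >= 0) {        /* processes outside the grid get -1 */
        printf("process %d holds grid position (%d,%d)\n", iam, myrow, mycol);
        /* ... distribute matrices and call PBLAS/ScaLAPACK routines here ... */
        Cblacs_gridexit(ictxt);
    }

    MPI_Finalize();
    return 0;
}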

15

PETSc

PETSc architecture (diagram), from the application layer down:

PETSc PDE Application Codes
ODE Integrators; Visualization Interface
Nonlinear Solvers, Unconstrained Minimization
Linear Solvers: Preconditioners + Krylov Methods
Object-Oriented Matrices, Vectors, Indices; Grid Management
Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK
Profiling Interface

Portable, Extensible Toolkit for Scientific Computation
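As a rough illustration of how these pieces fit together (not taken from the slides; it assumes a recent PETSc release, so calls such as KSPSetOperators may differ slightly from the 2007-era API), a small C program that assembles a distributed tridiagonal matrix and solves Ax = b with a runtime-selectable Krylov method and preconditioner:

#include <petscksp.h>

int main(int argc, char **argv) {
    Mat A;            /* object-oriented matrix */
    Vec x, b;         /* vectors */
    KSP ksp;          /* linear solver: preconditioner + Krylov method */
    PetscInt i, istart, iend, n = 100;
    PetscScalar diag = 2.0, off = -1.0, one = 1.0;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Distributed tridiagonal (1-D Laplacian) matrix */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &istart, &iend);
    for (i = istart; i < iend; i++) {
        MatSetValue(A, i, i, diag, INSERT_VALUES);
        if (i > 0)     MatSetValue(A, i, i - 1, off, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, off, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* Right-hand side b = 1 and solution vector x */
    VecCreate(PETSC_COMM_WORLD, &x);
    VecSetSizes(x, PETSC_DECIDE, n);
    VecSetFromOptions(x);
    VecDuplicate(x, &b);
    VecSet(b, one);

    /* Krylov solver; method and preconditioner chosen at run time,
       e.g. -ksp_type cg -pc_type jacobi */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    VecDestroy(&x);
    VecDestroy(&b);
    MatDestroy(&A);
    PetscFinalize();
    return 0;
}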

16

Observation

“Explicit message passing will remain the dominant programming model for the foreseeable future because of the huge investment in application codes.”

-Jim Tomkins, Bob Ballance, and Sue Kelly, ASC PI Meeting, Nevada, Feb 2007

17

Common Application Platform

Basic Architecture
Building Blocks
Conclusions

18

Basic Architecture

Architecture diagram: a user at Site A reaches SURAgrid either through a Browser and the SURAgrid Portal or through the Command Line (GSISSH, etc.); both access paths (labeled A and B) lead to a MetaScheduler (MS), which routes jobs to Site 1, Site 2, or Site 3.

19

Building Blocks

MPICH-G2

20

Conclusions

Power shift! The onus is now on system administrators.

Load balancing:
Minimize communication costs
Limits: ScaLAPACK (latency < 500 ms)
Dynamic redistribution of work

Heterogeneous environments:
Issues with floating-point operations