
Grid Computing and NetSolve / GridSolve

Jack Dongarra

Between Now and the End

• Today, April 5th: Grids and NetSolve
• April 12th and 19th: Dr. Langou on Iterative Methods in Linear Algebra
• April 26th: Dr. Cronk on Tools for Debugging and Performance Analysis
• May 1st: Project reports are due (email to [email protected])
• May 3rd: Presentation of class projects, starting at 12:30


More on the In-Class Presentations

• Start at 12:30 on Wednesday, 5/3/06
• Roughly 20 minutes each
• Use slides
• Describe your project, perhaps motivate via application
• Describe your method/approach
• Provide comparison and results

See me about your topic

No In-Class Final Exam

Grade will be based on:

• Homework: 30%
• Project presentation: 30%
• Project write-up: 30%
• Class participation: 10%


Distributed Computing

• Concept has been around for two decades
• Basic idea: run a scheduler across systems to run processes on the least-used systems first
  - Maximize utilization
  - Minimize turnaround time
• Have to load executables and input files to the selected resource
  - Shared file system
  - File transfers upon resource selection
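The basic idea above (send each process to the least-used system first) can be sketched in a few lines. This is only an illustration: the `Machine` class, the `schedule` function and the notion of a numeric "load" are assumptions made for the example, not part of any particular scheduler named in these slides.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    load: float = 0.0                          # hypothetical utilization measure
    jobs: list = field(default_factory=list)   # names of jobs placed here

def schedule(jobs, machines):
    """Place each (job_name, cost) pair on the currently least-loaded machine."""
    for job_name, cost in jobs:
        target = min(machines, key=lambda m: m.load)   # least-used system first
        target.jobs.append(job_name)
        target.load += cost                            # the job adds to that machine's load
    return {m.name: m.jobs for m in machines}

machines = [Machine("a", 0.5), Machine("b", 0.0), Machine("c", 0.2)]
placement = schedule([("j1", 0.3), ("j2", 0.3), ("j3", 0.3)], machines)
```

A real scheduler would also stage executables and input files to the chosen machine, as the last bullet notes; that step is left out here.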

Examples of Distributed Computing

• Workstation farms, Condor flocks, etc.
  - Generally share file system
• SETI@home project, United Devices, etc.
  - Only one source code; copies correct binary code and input data to each system
• Napster, Gnutella, BitTorrent: file/data sharing
• NetSolve
  - Runs numerical kernel on any of multiple independent systems, much like a Grid solution


The Grid Problem

• Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
  - From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
• Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
  - central location
  - central control
  - omniscience
  - existing trust relationships

Elements of the Problem

• Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual organisations
  - Community overlays on classic org structures
  - Large or small, static or dynamic


Grid vs. Internet/Web Services?

• We’ve had computers connected by networks for 20 years
• We’ve had web services for 5-10 years
• The Grid combines these things, and brings additional notions
  - Virtual Organizations
  - Infrastructure to enable computation to be carried out across these: authentication, monitoring, information, resource discovery, status, coordination, etc.
• Can I just plug my application into the Grid?
  - No! Much work to do to get there!

What is Grid Computing?

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

[Figure: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, advanced visualization]


The Computational Grid is…

• …a distributed control infrastructure that allows applications to treat compute cycles as commodities.
• Power Grid analogy
  - Power producers: machines, software, networks, storage systems
  - Power consumers: user applications
• Applications draw power from the Grid the way appliances draw electricity from the power utility.
  - Seamless
  - High-performance
  - Ubiquitous
  - Dependable

Introduction

Grid systems can be classified depending on their usage:

[Figure: Grid Systems taxonomy. Labels: Computational Grid, Data Grid, Service Grid, Distributed Supercomputing, High Throughput, On Demand, Collaborative, Multimedia]


The Grid

The Grid Architecture Picture

[Figure: layered architecture diagram]

• User Portals: Problem Solving Environments, Application Science Portals, Grid Access & Info
• Service Layers: Authentication, Co-Scheduling, Naming & Files, Events, Resource Discovery & Allocation, Fault Tolerance
• Resource Layer: Computers, Data bases, Online instruments, Software, High speed networks and routers


Basic Grid Architecture

• Clusters and how grids are different than clusters
• Departmental Grid Model
• Enterprise Grid Model
• Global Grid Model

Examples of Grids

• TeraGrid: NSF funded, linking 5 major research sites at 40 Gb/s (www.teragrid.org)
• European Union Data Grid: grid for applications in high energy physics, environmental science, bioinformatics (www.eu-datagrid.org)
• Access Grid: collaboration systems using commodity technologies (www.accessgrid.org)
• Network for Earthquake Engineering Simulations Grid: grid for earthquake engineering (www.nees.org)


Teragrid Network

...and enabling broad based collaborations

[Figure labels: Gloriad, NLR, UltraScience Net, TeraGrid]


Examples of Grid Programming Technologies

• MPICH-G2: Grid-enabled message passing
• CoG Kits, GridPort: Portal construction, based on N-tier architectures
• GDMP, Data Grid Tools, SRB: replica management, collection management
• Condor-G: simple workflow management
• Legion: object models for Grid computing
• NetSolve/GridSolve: Network enabled solver
• Cactus: Grid-aware numerical solver framework

Note tremendous variety, application focus

Atmospheric Sciences Grid

[Figure: coupled models and data sources. Labels: Real time data, Data Fusion, General Circulation model, Regional weather model, Photo-chemical pollution model, Particle dispersion model, Bushfire model, Topography Database, Vegetation Database, Emissions Inventory]


Standard Implementation

[Figure: the Atmospheric Sciences Grid diagram (Real time data, Data Fusion, General Circulation model, Regional weather model, Photo-chemical pollution model, Particle dispersion model, Bushfire model, Topography Database, Vegetation Database, Emissions Inventory) with the connections annotated by the technology used: MPI, GASS, GASS/GridFTP/GRC. Additional label: Change Models]

MPICH-G2: A Grid-Enabled MPI

• A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments
  - Based on the Argonne MPICH implementation of MPI (Gropp and Lusk)
• Globus services for authentication, resource allocation, executable staging, output, etc.
• Programs run in wide area without change
• See also: MetaMPI, PACX, STAMPI, MAGPIE

www.globus.org/mpi


Useful References

• Book (Morgan Kaufmann)
  - www.mkp.com/grids
• Perspective on Grids
  - “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, IJSA, 2001
  - www.globus.org/research/papers/anatomy.pdf
• All URLs in this section of the presentation, especially:
  - www.gridforum.org, www.grids-center.org, www.globus.org

Globus Grid Services

• The Globus toolkit provides a range of basic Grid services
  - Security, information, fault detection, communication, resource management, ...
• These services are simple and orthogonal
  - Can be used independently, mix and match
  - Programming model independent
• For each there are well-defined APIs
• Standards are used extensively
  - E.g., LDAP, GSS-API, X.509, ...
• You don’t program in Globus, it’s a set of tools like Unix


Distributed and Parallel Systems

[Figure: spectrum of systems between the two extremes described below. Labels: SETI@home (27 Tflop/s), Entropia, Grid based Computing, Network of ws, Beowulf cluster, Clusters w/ special interconnect, Parallel Dist mem, ASCI Tflops (130 Tflop/s)]

Distributed systems (heterogeneous)

• Gather (unused) resources
• Steal cycles
• System SW manages resources
• System SW adds value
• 10% - 20% overhead is OK
• Resources drive applications
• Time to completion is not critical
• Time-shared

Massively parallel systems (homogeneous)

• Bounded set of resources
• Apps grow to consume all cycles
• Application manages resources
• System SW gets in the way
• 5% overhead is maximum
• Apps drive purchase of equipment
• Real-time constraints
• Space-shared

GridSolve: Basic Usage Scenarios

• Grid based numerical library routines
  - User doesn’t have to have software library on their machine: LAPACK, SuperLU, ScaLAPACK, PETSc, AZTEC, ARPACK
• Task farming applications
  - “Pleasantly parallel” execution, e.g. parameter studies
• Remote application execution
  - Complete applications with user specifying input parameters and receiving output

“Blue Collar” Grid Based Computing

• Does not require deep knowledge of network programming
• Level of expressiveness right for many users
• User can set things up, no “su” required
• In use today, up to 200 servers in 9 countries

Can plug into Globus, Condor, NINF, …


NetSolve Network Enabled Server

• NetSolve is an example of a grid based hardware/software server.
• Ease-of-use paramount
• Based on an RPC model but with …
  - resource discovery, dynamic problem solving capabilities, load balancing, fault tolerance, asynchronicity, security, …
• Other examples are NEOS from Argonne and NINF from Japan.
• Use resources for a single application, rather than tie together geographically distributed resources.

NetSolve: The Big Picture

[Figure, built up over four slides. Components: Client (Matlab, Mathematica, C, Fortran, Web); AGENT(s) with Schedule and Database; servers S1, S2, S3, S4; IBP Depot. Labels added in successive builds: “A, B” at the IBP Depot; “Handle back”; then “Request”, “S2 !”, “Op(C, A, B)”, “OP, handle”, and “Answer (C)”.]

No knowledge of the grid required, RPC like.

NetSolve Agent

• Name server for the NetSolve system.
• Information Service
  - Client users and administrators can query the hardware and software services available.
• Resource scheduler
  - Maintains both static and dynamic information regarding the NetSolve server components to use for the allocation of resources


NetSolve Agent

Resource Scheduling (cont’d):

• CPU Performance (LINPACK).
• Network bandwidth, latency.
• Server workload.
• Problem size/algorithm complexity.
• Calculates a “Time to Compute” for each appropriate server.
• Notifies client of most appropriate server.

NetSolve - Load Balancing

NetSolve agent: predicts the execution times and sorts the servers

Prediction for a server based on:

• Its distance over the network
  - Latency and Bandwidth
  - Statistical Averaging
• Its performance (LINPACK benchmark)
• Its workload
• The problem size and the algorithm complexity

Cached data: quick estimate (workload out of date?)
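To make the prediction step concrete, here is a minimal sketch of how an agent might combine the listed factors into a "time to compute" and sort the servers. The slides do not give NetSolve's actual formula, so the model below (transfer time plus compute time on the unloaded fraction of the CPU) and all names (`ServerInfo`, `time_to_compute`, `rank_servers`) are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ServerInfo:
    name: str
    latency_s: float       # network latency between client and server, seconds
    bandwidth_Bps: float   # network bandwidth, bytes per second
    mflops: float          # LINPACK-style performance, Mflop/s
    workload: float        # fraction of the CPU already busy, 0.0 to 1.0

def time_to_compute(server, data_bytes, flops):
    """Assumed model: network transfer time plus compute time on the free CPU share."""
    transfer = server.latency_s + data_bytes / server.bandwidth_Bps
    free_share = max(1e-9, 1.0 - server.workload)     # guard against a fully loaded server
    compute = flops / (server.mflops * 1e6 * free_share)
    return transfer + compute

def rank_servers(servers, data_bytes, flops):
    """Return the servers sorted by predicted time, best candidate first."""
    return sorted(servers, key=lambda s: time_to_compute(s, data_bytes, flops))

far = ServerInfo("far", latency_s=0.1, bandwidth_Bps=1e6, mflops=1000, workload=0.0)
near = ServerInfo("near", latency_s=0.001, bandwidth_Bps=1e8, mflops=100, workload=0.5)
big_job = [s.name for s in rank_servers([near, far], data_bytes=8e6, flops=2e9)]
small_job = [s.name for s in rank_servers([near, far], data_bytes=8e6, flops=1e6)]
```

With these made-up numbers the compute-heavy job prefers the fast but distant server, while the small job prefers the nearby one, which is the trade-off between network distance and performance that the slide describes.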


NetSolve Client

• Function Based Interface.
• Client program embeds call from NetSolve’s API to access additional resources.
• Interface available to C, Fortran, Matlab, Mathematica, and Java.
• Opaque networking interactions.
• NetSolve can be invoked using a variety of methods: blocking, non-blocking, task farms, …

NetSolve Client

• Intuitive and easy to use.
• Matlab matrix multiply, e.g.:

A = matmul(B, C);

A = netsolve('matmul', B, C);

• Possible parallelisms hidden.


NetSolve Client

i. Client makes request to agent.

ii. Agent returns list of servers.

iii. Client tries each one in turn until one executes successfully or list is exhausted.
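The three client steps amount to a failover loop over the agent's server list. A minimal sketch follows; `call_with_failover` and the `execute` callback are hypothetical names standing in for whatever the client library does internally, which the slides do not show.

```python
def call_with_failover(servers, request, execute):
    """Try each server from the agent's list in order until one succeeds.

    `servers` is the ranked list returned by the agent (step ii), and
    `execute(server, request)` is a stand-in for the remote call (step iii).
    """
    errors = []
    for server in servers:
        try:
            return server, execute(server, request)
        except Exception as exc:           # a real client would catch narrower errors
            errors.append((server, exc))   # remember the failure and move on
    raise RuntimeError(f"all {len(servers)} servers failed: {errors}")
```

The caller gets back both the result and the server that produced it, so a failed first choice is invisible apart from the extra delay.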

NetSolve - MATLAB Interface

>> define sparse matrix A
>> define rhs
>> [x, its] = netsolve('itmeth', 'petsc', A, rhs);
…
>> [x, its] = netsolve('itmeth', 'aztec', A, rhs);
>> [x, its] = netsolve('solve', 'superlu', A, rhs);
>> [x, its] = netsolve('solve', 'ma28', A, rhs);

Synchronous Call

Asynchronous Calls also available
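The difference between the synchronous and asynchronous call styles can be illustrated with ordinary Python futures. This is not the NetSolve API: `remote_solve` is a made-up stand-in that solves a scalar equation locally, and the thread pool only imitates a request running elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor

def remote_solve(method, a, rhs):
    """Stand-in for a remote solver call: solves the scalar equation a * x = rhs."""
    return rhs / a

def solve_blocking(method, a, rhs):
    # Synchronous style: the caller waits until the answer is available.
    return remote_solve(method, a, rhs)

def solve_nonblocking(pool, method, a, rhs):
    # Asynchronous style: returns a handle (a Future) immediately.
    return pool.submit(remote_solve, method, a, rhs)

with ThreadPoolExecutor(max_workers=2) as pool:
    handle = solve_nonblocking(pool, "superlu", 4.0, 2.0)
    local_work = sum(range(10))   # the client is free to do other work here
    x = handle.result()           # block only when the answer is actually needed
```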


NetSolve - FORTRAN Interface

parameter( MAX = 100)
double precision A(MAX,MAX), B(MAX)
integer IPIV(MAX), N, INFO, LWORK
integer NSINFO

call DGESV(N,1,A,MAX,IPIV,B,MAX,INFO)

Easy to 'switch' to NetSolve

call NETSL('DGESV()',NSINFO,N,1,A,MAX,IPIV,B,MAX,INFO)

Hiding the Parallel Processing

User may be unaware of parallel processing

NetSolve takes care of starting the message passing system, data distribution, and returning the results.


Problem Description Specification

• Specifies the calling interface between GridSolve and the service routine
• Original NetSolve problem description files
  - Strange notation
  - Difficult for users to understand
  - Previous attempts to simplify involved GUI front-ends
• In GridSolve, the format is totally re-designed
  - Specified in a manner similar to normal function prototypes
  - Similar to Ninf

GridSolve Problem Description (DGESV)

Original Fortran Subroutine:

SUBROUTINE DGESV(N,NRHS,A,LDA,IPIV,B,LDB,INFO)
INTEGER INFO, LDA, LDB, N, NRHS
INTEGER IPIV( * )
DOUBLE PRECISION A( LDA, * ), B( LDB, * )

GridSolve IDL Specification:

SUBROUTINE dgesv(IN int N, IN int NRHS,
    INOUT double A[LDA][N], IN int LDA,
    OUT int IPIV[N], INOUT double B[LDB][NRHS],
    IN int LDB, OUT int INFO)
"This solves Ax=b using LAPACK"
LANGUAGE = "FORTRAN"
LIBS = "$(LAPACK_LIBS) $(BLAS_LIBS)"
COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"
MAJOR="COLUMN"
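The COMPLEXITY attribute in the DGESV description is an expression in the scalar arguments, which is what lets a scheduler estimate run time before shipping any data. The sketch below simply evaluates that same expression in Python and divides by a server speed; the function names and the Mflop/s figure are assumptions for illustration.

```python
def dgesv_complexity(n, nrhs):
    """Mirror of the slide's COMPLEXITY = "2.0*pow(N,3.0)*(double)NRHS"."""
    return 2.0 * pow(n, 3.0) * float(nrhs)

def estimated_seconds(flops, mflops):
    """Rough compute time for `flops` operations on a server rated at `mflops` Mflop/s."""
    return flops / (mflops * 1e6)

flops = dgesv_complexity(100, 1)            # N = 100, one right-hand side
seconds = estimated_seconds(flops, 100.0)   # hypothetical 100 Mflop/s server
```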


GridSolve Interface Definition Language

• Data types: int, char, float, double
• Argument passing modes:
  - IN -- input only; not modified
  - INOUT -- input and output
  - OUT -- output only
  - VAROUT -- variable length output only data
  - WORKSPACE -- server-side allocation of workspace; not passed as part of calling sequence
• Argument size
  - Specified as expression using scalar arguments, e.g. ddot(IN int n, IN double dx[n*incx], IN int incx, …
  - All typical operators supported (+, -, *, /, etc).
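Because argument sizes are expressions over the scalar arguments, the amount of data to transfer can be computed once the scalars are known. The following sketch shows the idea; the `SIZEOF` table (typical C sizes) and the `arg_bytes` helper are assumptions for this example, and `eval` is used only to keep it short.

```python
# Assumed element sizes in bytes (typical C values, not taken from GridSolve).
SIZEOF = {"int": 4, "char": 1, "float": 4, "double": 8}

def arg_bytes(ctype, dims, scalars):
    """Size in bytes of one argument.

    `dims` holds the size expressions from the declaration, e.g. ["n*incx"] for
    dx[n*incx] or ["LDA", "N"] for A[LDA][N]; `scalars` maps scalar argument
    names to their values. An empty `dims` means a scalar argument.
    """
    count = 1
    for expr in dims:
        # eval with builtins disabled: acceptable for a sketch, not for untrusted input
        count *= eval(expr, {"__builtins__": {}}, scalars)
    return count * SIZEOF[ctype]

dx_bytes = arg_bytes("double", ["n*incx"], {"n": 10, "incx": 2})
a_bytes = arg_bytes("double", ["LDA", "N"], {"LDA": 100, "N": 100})
```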

Problem Attributes

• MAJOR
  - Row or column major; depends on the implementation of the service routine
• LANGUAGE
  - Language in which the service routine is implemented; currently C or Fortran
• LIBS
  - Additional libraries to be linked
• COMPLEXITY
  - Theoretical complexity of the service routine, specified in terms of the arguments


Building the Services

cd GridSolve/src/problem
make check

After building a new service, restart the server; later rebuilds of that service do not require a restart.

For more detailed documentation, consult the manual:

cd GridSolve/doc/ug
make
ghostview ug.ps

Task Farming - Multiple Requests To Single Problem

• A Solution: many calls to netslnb( ); /* non-blocking */
• Farming Solution: single call to netsl_farm( );
  - Request iterates over an “array of input parameters.”
  - Adaptive scheduling algorithm.
• Useful for parameter sweeping, and independently parallel applications.
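The farming pattern (one call that iterates over an array of input parameters) can be imitated with a thread pool. This is a local sketch of the pattern only: `farm` is a hypothetical helper, not `netsl_farm`, and it does not reproduce the adaptive scheduling mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(func, param_sets, max_workers=4):
    """Apply `func` to every parameter tuple; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda args: func(*args), param_sets))

def kernel(a, b):
    # Stand-in for an independent computation in a parameter sweep.
    return a * b

sweep = farm(kernel, [(1, 2), (3, 4), (5, 6)])
```

Each request is independent of the others, which is what makes parameter sweeps a natural fit for this style.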


Data Persistence

• Chain together a sequence of NetSolve requests.
• Analyze parameters to determine data dependencies. Essentially a DAG is created where nodes represent computational modules and arcs represent data flow.
• Transmit superset of all input/output parameters and make persistent near server(s) for duration of sequence execution.
• Schedule individual request modules for execution.
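The dependency analysis can be sketched as follows: each request names its inputs and outputs, a request depends on whichever earlier request produced one of its inputs, and a topological sort gives a valid execution order. The request tuples and helper names are invented for the example; the standard-library `graphlib` module (Python 3.9+) does the sorting.

```python
from graphlib import TopologicalSorter

def build_dag(requests):
    """requests: list of (name, inputs, outputs). Returns {name: set of prerequisites}."""
    producer = {}   # parameter name -> request that produced it
    deps = {}
    for name, inputs, outputs in requests:
        deps[name] = {producer[x] for x in inputs if x in producer}
        for out in outputs:
            producer[out] = name
    return deps

def execution_order(requests):
    """One valid order in which the request modules can be scheduled."""
    return list(TopologicalSorter(build_dag(requests)).static_order())

def intermediates(requests):
    """Parameters produced by one request and consumed by a later one.

    These are the candidates for staying near the servers instead of
    travelling back to the client between requests.
    """
    produced, reused = set(), set()
    for _, inputs, outputs in requests:
        reused |= {x for x in inputs if x in produced}
        produced |= set(outputs)
    return reused

chain = [("mul", ["A", "B"], ["C"]),
         ("add", ["C", "D"], ["E"]),
         ("norm", ["E"], ["n"])]
order = execution_order(chain)
kept_remote = intermediates(chain)
```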

Server Software Repository

• Dynamic downloading of new software.
• Enhance servers’ capabilities without shutdown and restart.
• Repository maintained independently of server.

[Figure: NetSolve Server with Hardware and Software components]


NetSolve: A Plug into the Grid

[Figure, built up over three slides. PSE front-ends: SCIRun, Matlab, Mathematica, Custom, C, Fortran. NetSolve (remote procedure call) as Grid middleware providing Resource Discovery, System Management, Resource Scheduling, Fault Tolerance. Proxies: Globus proxy, NetSolve proxy, Ninf proxy, Condor proxy. Grid back-ends: Globus, NetSolve servers, Ninf servers, Condor.]

University of Tennessee Deployment: Scalable Intracampus Research Grid (SInRG)

• Federated Ownership: CS, Chem Eng., Medical School, Computational Ecology, El. Eng.
• Real applications, middleware development, logistical networking

The Knoxville campus has two DS-3 commodity Internet connections and one DS-3 Internet2/Abilene connection. An OC-3 ATM link routes IP traffic between the Knoxville campus, National Transportation Research Center, and Oak Ridge National Laboratory. UT participates in several national networking initiatives including Internet2 (I2), Abilene, the federal Next Generation Internet (NGI) initiative, Southern Universities Research Association (SURA) Regional Information Infrastructure (RII), and Southern Crossroads (SoX).

The UT campus consists of a meshed ATM OC-12 being migrated over to switched Gigabit by early 2002.


NetSolve Monitor

http://anaka.cs.utk.edu:8080/monitor/signed.html


Thanks

Fran Berman, Director, San Diego Supercomputer Center

Jay Boisseau, Director, Texas Advanced Computing Center