
TeraGrid

Simo Niskala, Teemu Pasanen

TeraGrid

• general

• objectives

• resources

• service architecture
  – grid services
  – TeraGrid application services

• Using TeraGrid

General

• An effort to build and deploy the world's largest, fastest, distributed infrastructure for open scientific research
• Extensible Terascale Facility, ETF
• Funded by the National Science Foundation, NSF
  – total of $90 million at the moment
• Partners:
  – Argonne National Laboratory, ANL
  – National Center for Supercomputing Applications, NCSA
  – San Diego Supercomputer Center, SDSC
  – Center for Advanced Computing Research (Caltech), CACR
  – Pittsburgh Supercomputing Center, PSC
  – new partners in September 2003:
    • Oak Ridge National Laboratory, ORNL
    • Purdue University
    • Indiana University
    • Texas Advanced Computing Center, TACC
• Provides terascale computing power by connecting several supercomputers with Grid technologies
• Will offer 20 TFLOPS when ready in 2004
  – first 4 TFLOPS will be available for use around Jan 2004

Objectives

• increase computational capabilities for the research community with geographically distributed resources
• deploy a distributed "system" using Grid technologies rather than a "distributed computer"
• define an open and extensible infrastructure
  – focus on integrating "resources" rather than "sites"
  – adding resources will require significant, but not unreasonable, effort
    • supporting key protocols and specifications (e.g. authorization, accounting)
  – supporting heterogeneity while exploiting homogeneity
    • balancing complexity and uniformity

Resources

• 4 clusters at ANL, Caltech, NCSA and SDSC
  – Itanium 2-based Linux clusters
  – total computing capacity of 15 TFLOPS
• Terascale Computing System, TCS-1, at PSC
  – AlphaServer-based Linux cluster
  – 6 TFLOPS
• HP Marvel system at PSC
  – set of SMP machines
  – 32 × 1.15 GHz Alpha EV67 CPUs and 128 GB of memory per machine
• ~1 petabyte of networked storage
• 40 Gb/s backplane network

Resources

• Backplane network
  – consists of 4 × 10 Gb/s optical fiber channels
  – enables a "machine room" network across sites
  – optimized for peak requirements
  – designed to scale to a much smaller number of sites than a general WAN
  – separate TeraGrid resource
    • only for data transfer needs of TeraGrid resources


Service architecture

• Grid Services (Globus Toolkit)
• TeraGrid Application Services

Grid Services

Service layer / Functionality / TeraGrid implementation

• Advanced Grid Services
  – functionality: super schedulers, resource discovery services, repositories, etc.
  – TeraGrid implementation: SRB, MPICH-G2, distributed accounting, etc.
• Core Grid Services (Collective layer)
  – functionality: TeraGrid information service, advanced data movement, job scheduling, monitoring
  – TeraGrid implementation: GASS, MDS, Condor-G, NWS
• Basic Grid Services (Resource layer)
  – functionality: authentication and access, resource allocation/management, data access/management, resource information service, accounting
  – TeraGrid implementation: GSI-SSH, GRAM, Condor, GridFTP, GRIS

Advanced Grid Services

• on top of Core and Basic Services
• enhancements required for TeraGrid
• for example Storage Resource Broker, SRB
• additional capabilities
• new services possible in the future

Core Grid Services

• built on Basic Grid Services
• focus on the coordination of multiple services
• mostly implementations of Globus services
  – MDS, GASS, etc.
• supported by most TeraGrid resources

Basic Grid Services

• focus on sharing single resources
• implementations of e.g. GSI and GRAM (see the command-line sketch below)
• should be supported by all TeraGrid resources
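As a concrete illustration, the basic services are typically exercised through the Globus Toolkit command-line clients, roughly as sketched below; this is only a sketch, and the login-node hostname is a placeholder, not an actual TeraGrid contact string.

    # Authenticate with GSI by creating a short-lived proxy certificate
    # from the user's Globus certificate:
    grid-proxy-init

    # Run a trivial job through GRAM on a login node
    # (tg-login.example.org is a placeholder hostname):
    globus-job-run tg-login.example.org /bin/hostname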

Grid Services

• provide clear specifications for what a resource must do in order to participate

• only specifications defined, implementations are left open

TeraGrid Application Services

• enable running of applications on a heterogeneous system
• on top of the Basic and Core Grid Services
• under development
• new service specifications to be added by current and new TeraGrid sites

TeraGrid Application Services

Service / Objective

• Basic Batch Runtime – supports running statically linked binaries
• High Throughput Runtime (Condor-G) – supports running naturally distributed applications using Condor-G (see the submit-file sketch below)
• Advanced Batch Runtime – supports running dynamically linked binaries
• Scripted Batch Runtime – supports scripting (including compilation)
• On-Demand / Interactive Runtime – supports interactive applications
• Large-Data – supports very large data sets, data pre-staging, etc.
• File-Based Archive – supports a GridFTP interface to data services
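For the High Throughput Runtime, a Condor-G job of this era is described by a submit file handed to condor_submit. A minimal sketch follows; the jobmanager contact string, executable and file names are placeholders supplied by the individual sites and the user.

    # myjob.submit -- minimal Condor-G submit file sketch
    # (a valid grid proxy from grid-proxy-init is needed for GSI authentication)
    universe        = globus
    globusscheduler = tg-login.example.org/jobmanager-pbs
    executable      = my_app
    output          = my_app.out
    error           = my_app.err
    log             = my_app.log
    queue

The job would then be submitted with condor_submit myjob.submit and monitored with condor_q.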

Using TeraGrid

• Access
  – account
    • account request form
    • Globus certificate for authentication and a Distinguished Name (DN) entry
  – logging in (see the sketch below)
    • single-site access requires SSH
    • multiple-site access requires GSI-enabled SSH
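A minimal login sketch, assuming the Globus user certificate is already installed; the hostname and username are placeholders.

    # Single-site access with ordinary SSH:
    ssh username@tg-login.example.org

    # Multiple-site access with GSI-enabled SSH: create a proxy certificate
    # from the Globus certificate, then log in with gsissh, which
    # authenticates with the proxy instead of a password:
    grid-proxy-init
    gsissh tg-login.example.org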

Using TeraGrid

• Transferring files (see the sketch below)
  – Storage Resource Broker (SRB)
    • data management tool for storing large data sets across distributed, heterogeneous storage
  – High Performance Storage System (HPSS)
    • moving entire directory structures between systems
  – SCP
    • copying users' files to TeraGrid platforms using SCP
  – globus-url-copy
    • transferring files between sites using GridFTP
  – GSINCFTP
    • uses a proxy for authentication
    • additional software to the Globus Toolkit
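For example, a GridFTP transfer with globus-url-copy and an ordinary SCP copy might look roughly as follows; hostnames and paths are placeholders, since the actual GridFTP endpoints are site-specific.

    # A valid proxy certificate is needed for the GSI-based tools:
    grid-proxy-init

    # Transfer a file between two sites over GridFTP
    # (placeholder hostnames and paths):
    globus-url-copy gsiftp://gridftp.site-a.example.org/home/user/input.dat \
                    gsiftp://gridftp.site-b.example.org/scratch/user/input.dat

    # Plain SCP from a local workstation to a TeraGrid platform:
    scp input.dat username@tg-login.example.org:/scratch/user/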

Using TeraGrid

• Programming Environments (see the SoftEnv sketch below)
  – IA-64 clusters (at NCSA, SDSC, Caltech, ANL)
    • Intel (default), GNU; mpich-gm (default MPI compiler)
  – PSC clusters
    • HP (default), GNU
  – SoftEnv software
    • manages users' environments through symbolic keys
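A rough sketch of typical SoftEnv usage; the key name below is illustrative only, since the actual keys available on each TeraGrid system are listed by the softenv command.

    # List the symbolic keys available on this system:
    softenv

    # Make a key part of the persistent environment by appending it to
    # ~/.soft, then re-read the file ("+intel-compilers" is an example key):
    echo "+intel-compilers" >> ~/.soft
    resoft

    # Or add a key only for the current shell session:
    soft add +intel-compilers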

Using TeraGrid

• Running jobs (see the PBS sketch below)
  – Grid tools
    • Condor-G
    • Globus Toolkit
  – PBS (Portable Batch System)
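As an example, a job might be submitted to PBS with a script along the following lines; the resource requests, application name and MPI launch line are placeholders and vary by site and installation.

    #!/bin/sh
    # myjob.pbs -- minimal PBS script sketch (resource limits are placeholders)
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=00:30:00
    #PBS -o myjob.out
    #PBS -e myjob.err

    cd $PBS_O_WORKDIR
    # Launch a (hypothetical) MPI application on the nodes PBS allocated;
    # the exact mpirun invocation depends on the site's MPI installation:
    mpirun -np 8 -machinefile $PBS_NODEFILE ./my_mpi_app

The script is then submitted with qsub myjob.pbs and monitored with qstat.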

TeraGrid

www.teragrid.org