

RI-222919

www.deisa.eu

Extreme Computing on the Distributed European Infrastructure for Supercomputing Applications

Wolfgang Gentzsch

The DEISA Project & Board of Directors of OGF

gentzsch at rzg.mpg.de

OGF 25 Cloud Workshop


OGF25, March 2 - 6, 2009 Wolfgang Gentzsch, DEISA 2


HPC Centers

• HPC centers have been service providers for the past 30 years

• Services include computing, storage, applications, data, and other IT services

• They serve (local) research, education, and industry (e.g. HLRS in Stuttgart serves Bosch, Daimler, Porsche)

• Very professional: to their end-users, they appear almost as a set of Cloud services (AWS definition: easy, secure, flexible, on demand, pay per use, self serve)

• But: no virtualization, semi-automatic, operating in step-function (mostly static) mode (increase of performance…

• That’s where they themselves can become a Cloud customer, adding dynamic scaling to their portfolio and adapting to changing business and user demands


Grids

1998: The Grid: Blueprint for a New Computing Infrastructure:

“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”

2002: The Anatomy of the Grid:

“. . . coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”

2002: Grid Checklist:

1) coordinates resources that are not subject to centralized control …

2) … using standard, open, general-purpose protocols and interfaces

3) … to deliver nontrivial qualities of service.

Quotes: Ian Foster, Carl Kesselman, Steve Tuecke


Clouds

• IT resources provisioned outside of corporate data center

• Resources accessed over the internet

• Variable cost of services

• SaaS, PaaS, IaaS, HaaS

• A virtual computing environment

• Build and deliver always-on, pay-per-use IT services

• Near infinite-scale computing, storage, database, related Web services, AND users

• Scaling resources and services up and down

• Abstraction of the hardware from the service

• No need for on-premises software and servers


The DEISA Ecosystem for HPC Applications

Distributed European Infrastructure for

Supercomputing Applications


DEISA Project & Partners

DEISA1: May 1st, 2004 – April 30th, 2008

DEISA2: May 1st, 2008 – April 30th, 2011


DEISA: Vision - Mission - Strategy

Vision: Establishing a persistent European HPC ecosystem integrating national Tier-1 (Tflop/s) centres and the new European Tier-0 (Pflop/s) centres

Mission: Enhance Europe’s capability in computing and science by integrating the most powerful supercomputers into a European HPC e-infrastructure

Build a European supercomputing service on top of existing national services, based on the deployment and operation of a persistent, production-quality, distributed supercomputing environment with continental scope

Strategy:

• Consolidate the existing DEISA1 HPC infrastructure and services

• Deliver a turnkey operational solution for the future persistent European HPC ecosystem


Categories of DEISA services

[Figure: three service categories, Technologies, Applications, and Operations, linked by arrows labeled "requests support", "offers products", "requests configuration", "offers service", "offers technology", and "requests development".]


DEISA Service Layers

[Figure: services provided across the DEISA sites, grouped into layers.]

Layers: network and AAA layers; data management layer; job management and monitoring layer; presentation layer

Services: unified AAA; network connectivity; data transfer tools; data staging tools; WAN shared file system; job rerouting; single monitor system; co-reservation and co-allocation; workflow management; multiple ways to access; common production environment


DEISA UNICORE Infrastructure

[Figure: each site has a Gateway and an NJS with its IDB and UUDB. Two example jobs are shown, one from a CINECA user and one from an LRZ user.]

Gateways: CSC, ECMWF, FZJ, IDRIS, SARA, LRZ, HPCX, HLRS, BSC, CINECA, RZG

NJS sites and systems:

• CINECA: IBM P5
• FZJ: IBM
• RZG: IBM
• ECMWF: IBM P5
• CSC: Cray XT4/5
• HPCX: Cray XT4
• LRZ: SGI ALTIX
• HLRS: NEC SX8
• SARA: IBM
• BSC: IBM PPC
• IDRIS: IBM P6

Operating system / batch system labels in the figure: AIX / LL-MC; AIX / LL; Linux / PBS Pro; Super-UX / NQS II; Linux / Maui/Slurm; UNICOS/lc / PBS Pro; Linux / LL. A GridFTP label also appears.


DEISA Global File System

Global transparent file system based on the Multi-Cluster General Parallel File System (MC-GPFS of IBM)

[Figure: the systems connected through the global file system.]

Systems: IBM P5; IBM P6 & BlueGene/P (three times); IBM P6; Cray XT4/5; Cray XT4; SGI ALTIX; NEC SX8; IBM P5+ / P6; IBM PPC

Operating system / batch system labels in the figure: AIX / LL-MC; AIX / LL; Linux / PBS Pro; Super-UX / NQS II; UNICOS/lc / PBS Pro; Linux / LL; AIX, Linux / LL-MC; Linux / Maui/Slurm. A GridFTP label also appears.


DEISA Life Sciences Portal

NICE EnginFrame Cluster/Grid/Cloud Portal

Provides remote, interactive, transparent, and secure access to applications and data on your corporate Intranet or Internet, or in the Cloud.

[Figure: elements shown are users and administrators, intranet clients (Win, LX, UX, Mac), standard protocols, the Cloud portal / gateway, interactive applications, batch applications, licenses, virtualized storage, and virtualized data center clusters.]

Users and administrators can access and control computing resources via an intuitive and standard Web interface virtually anywhere using a standard Web browser.


DEISA Extreme Computing Initiative (DECI)

• DECI launched in 2005: complex, demanding, innovative simulations requiring the exceptional capabilities of DEISA

• Multi-national proposals encouraged

• Proposals reviewed by national evaluation committees

• Projects chosen on the basis of innovation potential, scientific excellence, relevance criteria, and national priorities

• Most powerful HPC architectures for most challenging projects

• Most appropriate supercomputer architecture selected


Analyzing the Workload of an HPC/Grid Center

Is your scientific application ready for the Cloud?


A Closer Look at HPC Centers’ Load *

• Single, CPU-intensive, tightly coupled, highly scalable parallel jobs in computational engineering and science

• Single, CPU-intensive, weakly scalable parallel jobs in computational engineering and science

• Capacity computing, throughput, parameter jobs

• Managing massive data sets, possibly geographically distributed

• Analysis and visualization of data sets

* Similar to the analysis of T. Sterling and D. Stark, LSU, in a recent HPCwire article


Analysis 1: tasks supporting HPC

• Supporting heavy compute and data-intensive work, such as…

• … data analysis and visualization, which are suitable for Cloud services

• Especially for SMEs, small groups, and individual researchers who do not have the full set of specific software and hardware

• Cost of ownership (hardware, ISV licenses) may be high

• No need for local expertise (installing, tuning, maintaining software)


Analysis 2: management of data sets

• Data-oriented science: data generation, acquisition, organization, correlation, archiving, mining, presentation

• Tertiary storage is difficult and expensive

• Distributed data sets in particular are a target for Cloud services

• Data integrity is higher with Cloud service providers, which removes the single point of failure (hurricanes, lightning strikes, floods)

• Challenge with mission-critical HPC: data security, national security, intellectual property protection, privacy


Analysis 3: throughput computing

• Arrays of jobs, parameter studies, throughput job streams

• Application loads consisting of many sequential or slightly parallel tasks

• Obviously very promising for Cloud computing

• Cloud services greatly enhance availability of resources and operational flexibility, improving efficiency and reducing the cost of equipment and maintenance personnel

• Lets the HPC center focus its own resources on the HPC applications that Clouds do not serve

• Challenge: workloads that are security or IP sensitive
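A parameter study of the kind listed above can be sketched as a set of fully independent tasks. This is only an illustration: `simulate`, its parameters, and the use of a local thread pool as a stand-in for Cloud instances are all assumptions, not part of the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(viscosity, inlet_speed):
    """Hypothetical stand-in for one sequential simulation run."""
    return round(inlet_speed / viscosity, 3)


def run_parameter_study(viscosities, inlet_speeds, max_workers=4):
    """Run every parameter combination as an independent task."""
    grid = list(product(viscosities, inlet_speeds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Tasks share no state, so they can run in any order on any worker.
        results = list(pool.map(lambda params: simulate(*params), grid))
    return dict(zip(grid, results))


study = run_parameter_study([0.5, 1.0], [2.0, 4.0])
for params, value in study.items():
    print(params, value)
```

Because no task waits on another, adding workers shortens the study without touching the application, which is why this class of load maps well onto elastic resources.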


Analysis 4: weakly scalable single jobs

• Weakly scalable because of fixed problem size (discrete volumes, finite elements, mesh points) or a low degree of parallelization

• Users often demand hands-on access to the specifics of the physical machine

• Virtualization often precludes the architecture-specific performance tuning essential to HPC: user productivity versus optimal performance of long-running jobs on Beowulf-type clusters and MPPs

• I/O bandwidth often needs to be well balanced with application needs (not assured by the abstraction of today’s Clouds)

• Even worse: checkpoint and restart
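One common way to reason about a fixed-size problem with a low degree of parallelization is Amdahl's law. The talk does not name it; the sketch below brings it in as an illustration, and the 90% parallel fraction is an assumed example value.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup of a fixed-size job when only part of it runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 90% of the run time parallelizes perfectly.
for cores in (8, 64, 1024):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With a 10% serial part the speedup can never reach 10, however many cores are added, so renting more Cloud capacity does little for this class of job.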


Analysis 5: capability computing

• Big science, grand challenge applications, running for hours, days or weeks on teraflop or petaflop systems, with potentially 10^6 cores and 10^13 TB main memory

• Highly scalable, massively parallel, tightly coupled, optimally tuned applications

• Resilience through checkpoint / restart

• HPC systems: unique design, limited market: loss of economy of scale
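The checkpoint / restart idea can be sketched in a few lines. This is a toy under stated assumptions: the state is a small dictionary written as JSON to a local file, the "work" is a running sum, and `fail_at` is a hypothetical switch that simulates a node failure. Production HPC codes checkpoint far larger state to parallel file systems.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def run(path, last_step, checkpoint_every=10, fail_at=None):
    """Resume from the last checkpoint and continue up to last_step."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, last_step + 1):
        if step == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += step  # stand-in for one time step of real work
        state["step"] = step
        if step % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

After a failure, calling `run` again with the same path repeats only the steps since the last checkpoint, so the cost of a crash is bounded by the checkpoint interval.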


An HPC Checklist

When is your HPC app ready for the Cloud?

• If there are no issues with licenses, IP, secrecy, sensitive data, privacy, legal or regulatory issues, . . .

• If your app is (almost) architecture independent, not optimized for a specific architecture (i.e. single process, loosely coupled low-level parallel, I/O-robust)

• If it’s just one app and zillions of parameters

• If latency and bandwidth are not an issue

• If time (wait, wall, run) doesn’t really matter

• If your job is low-priority, has simple SLAs, can re-run, . . .

Ideally, your HPC Center’s meta-scheduler knows all this and schedules automatically ☺
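How a meta-scheduler might apply such a checklist can be sketched as a simple rule check. Everything here is hypothetical: the `JobProfile` fields, the function names, and the choice to treat five of the checklist items as hard conditions. The "one app and zillions of parameters" item is read as a favourable hint and is not encoded.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical job description; the field names are illustrative only."""
    legal_or_ip_constraints: bool  # licenses, IP, secrecy, privacy, regulation
    architecture_specific: bool    # tuned for one machine or tightly coupled
    network_sensitive: bool        # latency or bandwidth matter
    time_critical: bool            # wait, wall, or run time matter
    can_rerun: bool                # low priority, simple SLA, safe to repeat


def cloud_blockers(job):
    """Return the checklist items that argue against running in the Cloud."""
    blockers = []
    if job.legal_or_ip_constraints:
        blockers.append("license, IP, privacy, or regulatory constraints")
    if job.architecture_specific:
        blockers.append("optimized for a specific architecture")
    if job.network_sensitive:
        blockers.append("latency or bandwidth sensitive")
    if job.time_critical:
        blockers.append("time to result matters")
    if not job.can_rerun:
        blockers.append("cannot simply be re-run")
    return blockers


def choose_target(job):
    """Send the job to the Cloud only if no checklist item objects."""
    return "cloud" if not cloud_blockers(job) else "hpc-center"


sweep = JobProfile(False, False, False, False, True)
cfd = JobProfile(False, True, True, False, True)
print(choose_target(sweep), choose_target(cfd))
```

Returning the list of blockers rather than a bare yes or no lets the scheduler tell the user why a job stayed on the HPC system.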


Conclusion

The DEISA initiative is successful because:

• It is built on top of the proven, professional infrastructure of HPC centers with expertise in implementation, operation, and services that respect user needs.

• It enhances existing HPC services moderately and in an evolutionary way, from local to global, according to user requirements: revolution by evolution.

• It supports users at the level of user-friendly access to resources AND at the application level, helping them port their apps onto a turnkey architecture.

• Its ecosystem of resources, middleware, and applications respects the administrative, cultural and political autonomy of the partners/centres.

• There is a real chance that the DEISA ecosystem will continue to operate successfully and sustainably after EU funding, in the interest of the ‘global scientist’, (almost) as an HPC Cloud!