Ian Foster on behalf of the Globus Alliance Computation Institute Argonne National Lab & University of Chicago Globus: State of the Union

Ian Foster on behalf of the Globus Alliance

Computation Institute

Argonne National Lab & University of Chicago

Globus: State of the Union

2

What’s New with Globus?

Globus applications are larger-scale and more mission critical

Globus tools are increasingly sophisticated (e.g., GridWay, Introduce, OGSA-DAI, UniCluster, Workspaces)

Globus core software is more robust, functional, performant, and easy to use

Globus community is increasingly diverse and international

3

Why Grid (and Globus)? The Changing Nature of Work

IT must adapt to this new reality

Collaborative and Dynamic

Project focused, globally distributed teams, spanning organizations within and beyond enterprise boundaries

Distributed and Heterogeneous

Each team member/group brings own data, compute, & other resources into the project

Data & Computation Intensive

Access to computing and data resources must be coordinated across the collaboration

Concurrent Innovation Cycles

Resources must be available to projects with strong QoS, & also reflect system-wide priorities

4

Bridging the Application-Resource Gap

Diagram labels: User App; Tool; Workflow; Registry; Credent.; DAIS; GRAM; GridFTP; Host Env; User Svc; uniform interfaces, security mechanisms, Web service transport, monitoring; computers, storage, specialized resources; IBM

5

Genome Analysis & DB Update (GADU)

600-1000+ CPUs

First-Generation Grids

6

Drug Discovery: In Silico Screening

2M+ ligands x protein target(s)

(Mike Kubal, Benoit Roux, and others)

7

Workflow diagram (start, report, end) for one protein target:

Inputs: PDB protein descriptions, 1 protein (1 MB); ZINC 3-D structures, 2M structures (6 GB)

Receptor preparation: manually prep DOCK6 rec file; manually prep FRED rec file; DOCK6 Receptor and FRED Receptor (1 per protein: defines pocket to bind to)

Docking (DOCK6, FRED): ~4M x 60s x 1 cpu, ~60K cpu-hrs; outputs ligands and complexes; select best ~5K

Amber prep: 2. AmberizeReceptor; 4. perl: gen nabscript

BuildNABScript: NAB script template and NAB script parameters (defines flexible residues, # MD steps) produce the NABScript

Amber Score: 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript

Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500

GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs

For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
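
The per-stage estimates on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the task counts, durations, and CPU counts are taken from the slide, while the function name, the dictionary layout, and the 8,760 hours-per-year conversion are assumptions added here.

```python
# Rough check of the CPU-hour estimates quoted on the slide.
# Task counts, durations, and CPU counts come from the slide;
# the names and structure below are illustrative assumptions.

HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, CPU busy all year


def cpu_hours(tasks, hours_per_task, cpus_per_task=1):
    """CPU-hours consumed by `tasks` runs of a given length and width."""
    return tasks * hours_per_task * cpus_per_task


stages = {
    "DOCK6/FRED": cpu_hours(4_000_000, 60 / 3600),   # ~4M x 60 s x 1 cpu
    "Amber": cpu_hours(10_000, 20 / 60),             # ~10K x 20 min x 1 cpu
    "GCMC": cpu_hours(500, 10, cpus_per_task=100),   # ~500 x 10 hr x 100 cpu
}

total = sum(stages.values())

for name, hours in stages.items():
    print(f"{name:>10}: {hours:>10,.0f} cpu-hrs")
print(f"{'total':>10}: {total:>10,.0f} cpu-hrs "
      f"(~{total / HOURS_PER_YEAR:.0f} cpu-years)")
```

The GCMC stage dominates the total, which is why the slide's rounded summary of 500,000 cpu-hrs tracks that stage so closely; the slide's figures are approximations, so the computed values differ slightly from them.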

8

Second-Generation Grids: Service-Oriented Science

People create services (data or functions) …

which I discover (& decide whether to trust) …

& compose to create a new function ...

and then publish as a new service.

I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!

I hope that this “someone else” can manage security, reliability, scalability, …

9

caBIG: sharing of infrastructure, applications, and data.

Data Integration!

NIH’s Cancer Biomedical Informatics Grid (caBIG)

10

caBIG Under the Covers

Diagram labels: Grid Portal; Grid-Enabled Client; Research Center (two); NCICB; Tool 1; Tool 2; Tool 3; Tool 4; caArray; Microarray; Image; Gene Database; Protein Database; Grid Data Service; Analytical Service

Grid Services Infrastructure (Metadata, Registry, Query, Invocation, Security, etc.)

Earth System Grid

Main ESG Portal
198 TB of data at four locations; 1,150 datasets; 1,032,000 files
Includes the past 6 years of joint DOE/NSF climate modeling experiments
8,000 registered users
Downloads to date: 49 TB, 176,000 files

CMIP3 (IPCC AR4) ESG Portal
35 TB of data at one location; 74,700 files
Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
Data from 13 countries, representing 25 models
1,900 registered projects
Downloads to date: 387 TB, 1,300,000 files; 500 GB/day (average)
400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data

ESG usage: over 500 sites worldwide

ESG monthly download volumes

12

Children’s Oncology Grid and MEDICUS

13

MEDICUS Under the Covers

DICOM images: Send (publish); Query/Retrieve (discover)

Grid Archive: Fault tolerant; Bandwidth

Security: Authentication; Authorization; Cryptography

Access: Web portal

Applications: Computing; Data Mining

DICOM Grid Interface Service (DGIS) + Meta Catalog Service (OGSA-DAI)

Data Replication Service (DRS)

Grid Web Portal, OGCE / GridSphere

Globus Toolkit Release 4: GRAM, OGSA-DAI

X.509 Certificates + MyProxy Delegation

14

LIGO Gravitational Wave Observatory

Data Replication Service

Replicating >1 Terabyte/day to 8 sites

770 TB replicated to date: >120 million replicas

MTBF = 1 month

Map labels: Birmingham; Cardiff; AEI/Golm

Ann Chervenak et al., ISI; Scott Koranda et al., LIGO

15

Lag Plot for Data Transfers to Caltech

Credit: Kevin Flasch, LIGO

16

What’s New with Globus?

Globus applications are larger-scale and more mission critical

Globus tools are increasingly sophisticated (e.g., GridWay, Introduce, OGSA-DAI, UniCluster, Workspaces)

Globus core software is more robust, functional, performant, and easy to use

Globus community is increasingly diverse and international

17

Creating Services in 2005

“This full-day tutorial provides an introduction to programming Java services with the latest version of the Globus Toolkit version 4 (GT4). The tutorial teaches how to build a Java Service that makes use of GT4 mechanisms for state management, security, registry and related topics.”

18

Creating Services in 2008: Introduce and gRAVI

Introduce: Define service; Create skeleton; Discover types; Add operations; Configure security

Grid Remote Application Virtualization Infrastructure: Wrap executables

Diagram labels: Introduce; Create; Appln Service; Store; Repository Service; Advertize; Index service; Discover; Invoke, get results; Transfer GAR; Deploy; Container

Ohio State University and Argonne/U.Chicago

19

Creating Services: E.g., Introduce Authoring Tool

Define service; Create skeleton; Discover types; Add operations; Configure security; Modify service

targets GT4

Introduce: Hastings, Saltz, et al., Ohio State University

New GT4 services created in five minutes …

20

Metascheduling in 2005

“Writing software that dispatches jobs to many sites via GRAM interfaces is left as an exercise for the reader.”

21

Metascheduling in 2008: GridWay

Architecture Examples

Diagram labels: Users; Applications; Middleware; Infrastructure; GridWay; Globus; (Virtual) Organization; SGE Cluster; PBS Cluster; LSF Cluster

• Multiple Admin. Domains
• Multiple Organizations
• Multiple metaschedulers
• (V)Organization-wide policies
• DRMAA interface
• Science Gateways

EGEE-II
• gLite-LHC interoperability
• Virtual Organizations

Fusion: Massive Ray Tracing

Biomed: CD-HIT (Workflow)

AstroGrid-D, German Astronomy Community Grid
• Supercomputing resources
• Astronomy-specific resources
• GRAM interface
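
To make the idea of metascheduling concrete, here is a toy Python sketch that places jobs across several sites by always choosing the site with the most free slots. It is not GridWay's algorithm and it does not call GRAM, DRMAA, or any Globus API; the `Site` class, the `dispatch` function, the site names, and the slot counts are all hypothetical, chosen only to illustrate the kind of placement decision a metascheduler makes.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for one cluster behind a common interface."""
    name: str
    slots: int                                   # total job slots (made-up numbers)
    running: list = field(default_factory=list)  # jobs currently placed here

    @property
    def free(self):
        return self.slots - len(self.running)


def dispatch(jobs, sites):
    """Greedy placement: send each job to the site with the most free slots.

    Returns a dict mapping job -> site name, or None when every site is full.
    A real metascheduler would also weigh policy, data location, and failures.
    """
    placement = {}
    for job in jobs:
        best = max(sites, key=lambda s: s.free)  # ties go to the earliest site
        if best.free <= 0:
            placement[job] = None                # no capacity; job stays queued
            continue
        best.running.append(job)
        placement[job] = best.name
    return placement


if __name__ == "__main__":
    # Hypothetical sites echoing the cluster types named on the slide.
    sites = [Site("sge-cluster", 2), Site("pbs-cluster", 1), Site("lsf-cluster", 1)]
    jobs = [f"job{i}" for i in range(5)]
    for job, site in dispatch(jobs, sites).items():
        print(job, "->", site or "queued")
```

Greedy least-loaded placement is used here only because it is short; organization-wide policies of the kind listed on the slide would replace the single `max` call with a richer ranking.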

22

What’s New with Globus?

Globus applications are larger-scale and more mission critical

Globus tools are increasingly sophisticated (e.g., GridWay, Introduce, OGSA-DAI, UniCluster, Workspaces)

Globus core software is more robust, functional, performant, and easy to use

Globus community is increasingly diverse and international

23

Globus Software: dev.globus.org

Diagram groups: Globus Projects; Incubator Projects; Globus Toolkit

Categories: Common Runtime; Security; Execution Mgmt; Data Mgmt; Info Services; Other

Projects shown: MPICH G2; GridWay; Incubation Mgmt; Cog WF; LRMA; GAARDS; OGRO; GDTE; UGP; HOC-SA; PURSE; GridShib; Introduce; Dyn Acct; WEEP; Gavia JSC; Gavia MS; DDM; Virt WkSp; SGGC; Metrics; ServMark; GridFTP; Reliable File Transfer; OGSA-DAI; GRAM; MDS4; CAS; Data Rep; Delegation; Replica Location; Java Runtime; C Runtime; Python Runtime; C Sec; GT4 Docs; MEDICUS; GSI-OpenSSH; MyProxy; Swift; MonMan; NetLogger; GEMLCA; gRAVI

24

Some Recent Globus GridFTP Enhancements

Performance: Dynamic data mover management; Small-files optimization

Ease of use: SSH authentication

Robustness: Connection mgmt; Space reservation

25

TeraGrid’s Information Systems Architecture

Diagram labels: Clients; Cache; WS/REST HTTP GET; WS/SOAP; WS MDS4; Tomcat WebMDS; Apache 2.0; TeraGrid Central Services; TeraGrid Repositories; Partners; Resource Provider Services

26

Information Services Users

Diagram labels: User Documentation; User Portal; Database?; Gateways; Peer Grids; User Applications; Others; info.teragrid.org

27

GRAM Scalability: E.g., AstroGrid-D Performance

#1 as reported on Einstein@home top users http://einstein.phys.uwm.edu/top_users.php

28

What’s New with Globus?

Globus applications are larger-scale and more mission critical

Globus tools are increasingly sophisticated (e.g., GridWay, Introduce, OGSA-DAI, UniCluster, Workspaces)

Globus core software is more robust, functional, performant, and easy to use

Globus community is increasingly diverse and international

29

Globus.Org Visits Jan 1 to May 12, 2008

Not counting VDT, ChinaGrid, gLite, DGrid, UniCluster, …

30

Globus.Org Visits May 12, 2008

Not counting VDT, ChinaGrid, gLite, DGrid, UniCluster, …

31

32

Examples of Globus-Based Production Scientific Grids

APAC (Australia); China Grid; China National Grid; CROWN Grid; DGrid (Germany); EGEE; Open Science Grid; Taiwan Grid; TeraGrid; ThaiGrid; UK Natl Grid Service

33

dev.globus: Community Driven Improvement of Globus Software, NSF OCI

http://dev.globus.org

Guidelines (Apache)

Infrastructure (CVS, email, bugzilla, Wiki)

Projects Include

34

Selected Globus Content: Tuesday

Tuesday morning: GT Java WS Core; Authoring Services Using Introduce; Grid Remote Application Virtualization Interface (gRAVI)

Tuesday afternoon: What's New in the Data Area?; GridFTP and Cluster Meltdown: When No Means 'Maybe Later'; Grid Information Management using MDS

35

Selected Globus Content: Wednesday Morning

GridWay: The Open Source Metascheduling Technology for Grid Computing

Using Taverna to Orchestrate Grid Services in a Workflow

MyProxy-based Short Lived Credential CA Service at NERSC

Configuring and Deploying GridFTP for Managing Data Movement in Grid/HPC Environments

36

Selected Globus Content: Wednesday Afternoon

Globus Execution Services: What's New in 4.0 and 4.2, Future Plans; Virtual Machine Management Services; Experiences with GRAM in the LEAD portal; Swift & Falkon

Globus Security Update and Futures: GAARDS; Attribute-based Authorization with GridShib

Virtualization and Cloud Computing with Globus

37

Selected Globus Content: Thursday

Innovative Grid Applications: Earth System Grid; Southern California Earthquake Center; MEDICUS and Children’s Oncology Grid

Globus Administration Tutorial

Porting Applications with Globus GridWay

Service Oriented Science Tutorial

38

Examples of What Globus Lets You Do

Build secure & stateful Web services: Web Services core, service authoring tools

Configure distributed authorization structures: Powerful standards-based security tools

Deploy services/run jobs on remote systems: GRAM, virtual workspace, dynamic services

Move data fast & reliably among many sites: Globus data services

Discover and monitor services & resources: Globus information service

39

Thanks!

DOE Office of Science

NSF Office of Cyberinfrastructure

Colleagues worldwide

Participants in Globus, CEDPS, ESG, OSG, caBIG, TeraGrid, and other projects