
GCE06, Tampa, FL, November 12-13, 2006

Science Gateways on the TeraGrid

Charlie Catlett, Sebastien Goasguen, Jim Marsteller, Stuart Martin, Don Middleton, Kevin J. Price, Anurag Shankar, Von Welch, Nancy Wilkins-Diehr


Today’s Topics

• TeraGrid Background – 5 min.
• Gateway integration issues – 15 min.
  – Accounting
    • GRAM
  – Security
    • Commsh
    • Attribute-based authentication
  – Metrics
• Future work – 10 min.
  – Gateway primer
  – Best practices


What is the TeraGrid?

• NSF-funded facility that offers high-end compute, data and visualization resources to the nation’s academic researchers


TeraGrid Technology

• Resources: computation, visualization, data, and memory-intensive systems
• 100+ teraflops of computation
• 18.8 petabytes of storage
• 40 gigabit/second cross-country network


Over 100 Tflops in Computing Power

[Figure: bar chart of computing power (Tflops) for each TeraGrid system: TACC Lonestar, IU Big Red, SDSC DataStar, Purdue Radon, NCSA Mercury, NCSA Tungsten, PSC BigBen, Purdue Lear, NCSA Cobalt, PSC Lemieux, NCAR Frost, SDSC BlueGene, SDSC IA64, IU IA-32, NCSA Copper, UC ANL IA64, ORNL IA32, PSC Rachel, IU Tiger]


TeraGrid Resources Available to Academic Researchers at No Cost

• TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems.

• Proposal-based access: researchers can use resources at no cost
  – Collaborative opportunities, but principal investigators must be from the U.S.


Gateways are part of TeraGrid’s 3-pronged strategy to further science

• DEEP Science: Enabling Terascale Science
  – Make science more productive through an integrated set of very high-capability resources
    • Advanced Support for TeraGrid Applications (ASTA) projects
• WIDE Impact: Empowering Communities
  – Bring TeraGrid capabilities to the broad science community
    • Science Gateways
• OPEN Infrastructure, OPEN Partnership
  – Provide a coordinated, general-purpose, reliable set of services and resources
    • Grid interoperability working group


Science Gateways: A New Initiative for the TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  – Resources
  – Users, from expert to K-12
  – Software stacks, policies
• Science Gateways
  – Provide “TeraGrid Inside” capabilities
  – Leverage community investment
• Three common forms:
  – Web-based portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

[Figure: Workflow Composer]


Gateways are growing in numbers

• 10 initial projects as part of the TeraGrid (TG) proposal
• >20 gateway projects today
• No limit on how many gateways can use TG resources
  – Prepare services and documentation so developers can work independently

• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Gridblast Bioinformatics Gateway
• Earth System Grid
• Astrophysical Data Repository (Cornell)

• Many others interested
  – SID Grid
  – HASTAC




What Did We Learn About Common Gateway Requirements?

• Accounting
  – Support for accounts with differing capabilities
  – Ability to associate a compute job with an individual portal user
  – Scheme for portal registration and usage tracking
  – Dynamic accounts
• Security
  – Community account privileges
  – Need to identify the human responsible for a job for incident response
  – Acceptance of other grid certificates
• Web Services
  – Many will build on the Globus Toolkit, but additional interfaces may be needed
  – Web Service security
  – Interfaces to scheduling and account management are common requirements
• Software
  – Interoperability of software stacks between TeraGrid and peer grids
  – Software installations for gateways across all TG sites
  – Community software areas
  – Management (pacman, other options)


In Today’s Talk

• Per-job accounting
• Secured community accounts
• Federated identity management
• Metrics for success
• Futures
  – Primer
  – Gateway best practices


“Per-Job” Accounting Is Key Functionality for Gateways

• Common gateway structure
  – Web front end; users log on to the gateway
  – Jobs run as a single user on TeraGrid
  – Need to tie usage to individual users
• Globus is used by many gateway developers to access TeraGrid resources
• GRAM operates in a fire-and-forget mode
  – When a job finishes, there is no straightforward way to determine how many CPU hours it consumed
  – That information is critical to attributing usage to the individual users of a Science Gateway account on the TeraGrid
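When all jobs run under one community account, the resource side sees only that account. The gateway must therefore keep its own record of which portal user submitted which grid job. The following is a minimal sketch of such a record. The `JobLedger` class and its method names are hypothetical, grid job IDs are treated as opaque strings, and a real gateway would persist the record in a database rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobLedger:
    """Hypothetical gateway-side record of which portal user submitted which grid job."""

    # grid job ID (opaque string returned at submission time) -> portal user name
    _owner_by_job: dict[str, str] = field(default_factory=dict, init=False)

    def record_submission(self, grid_job_id: str, portal_user: str) -> None:
        # A job ID must map to exactly one user, so refuse to overwrite silently.
        if grid_job_id in self._owner_by_job:
            raise ValueError(f"job {grid_job_id} already recorded")
        self._owner_by_job[grid_job_id] = portal_user

    def owner_of(self, grid_job_id: str) -> str | None:
        # None means the gateway never submitted this job.
        return self._owner_by_job.get(grid_job_id)

    def jobs_of(self, portal_user: str) -> list[str]:
        # Jobs are returned in submission order (dicts preserve insertion order).
        return [job for job, user in self._owner_by_job.items() if user == portal_user]
```

The gateway calls `record_submission` immediately after a successful submit. At that point it still knows both the logged-in user and the job ID that came back.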


GRAM Audit Extension Provides Needed Accountability

• GRAM2 and GRAM4 services were enhanced to create audit records that are written to a database local to the GRAM services
• The enhancements provide a persistent link between the grid service’s job ID and the local resource manager’s (LRM) job ID
• Open Grid Services Architecture-Data Access and Integration (OGSA-DAI) provides a service interface for TeraGrid’s audit and accounting information
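The audit records matter because they connect two IDs. One is the grid job ID, which the gateway knows. The other is the LRM job ID, which in this sketch is what the site's accounting records are keyed on. The sketch below joins the two in an in-memory SQLite database. The table and column names (`gram_audit`, `lrm_accounting`, `grid_job_id`, `lrm_job_id`, `cpu_hours`) and the sample rows are hypothetical. They do not reproduce the actual GRAM audit schema or the OGSA-DAI interface.

```python
from __future__ import annotations

import sqlite3


def cpu_hours_by_grid_job(conn: sqlite3.Connection) -> dict[str, float]:
    """Join hypothetical audit and accounting tables on the LRM job ID."""
    rows = conn.execute(
        """
        SELECT a.grid_job_id, SUM(l.cpu_hours)
        FROM gram_audit AS a
        JOIN lrm_accounting AS l ON l.lrm_job_id = a.lrm_job_id
        GROUP BY a.grid_job_id
        """
    ).fetchall()
    return {grid_job_id: hours for grid_job_id, hours in rows}


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical audit table: links the grid job ID to the LRM job ID.
    conn.execute("CREATE TABLE gram_audit (grid_job_id TEXT PRIMARY KEY, lrm_job_id TEXT)")
    # Hypothetical LRM accounting table: knows nothing about grid job IDs.
    conn.execute("CREATE TABLE lrm_accounting (lrm_job_id TEXT, cpu_hours REAL)")
    conn.executemany(
        "INSERT INTO gram_audit VALUES (?, ?)",
        [("grid-A", "pbs.101"), ("grid-B", "pbs.102")],
    )
    # Made-up sample data; "pbs.999" was not submitted through GRAM,
    # so the join leaves it out.
    conn.executemany(
        "INSERT INTO lrm_accounting VALUES (?, ?)",
        [("pbs.101", 12.5), ("pbs.102", 3.0), ("pbs.999", 40.0)],
    )
    return conn
```

The inner join ignores accounting records that have no audit entry. Those records belong to jobs that did not come through the grid service.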


Individual Usage Tracking Now Possible

• Gateways can remotely submit jobs to TeraGrid and account for usage on a per-job basis without needing to understand the details of the various local resource managers chosen by TeraGrid resource providers
• The capability will be very useful for other projects using Globus where per-job usage information is needed
• The enhancements reduce the complexity of interfacing gateways with TeraGrid’s computational resources
  – This allows TeraGrid to simultaneously support an increasing number of gateways
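Suppose a gateway has its own job-to-user record and CPU hours for each grid job. Per-user usage is then a simple aggregation. This is a minimal sketch that assumes both inputs are plain dictionaries keyed by grid job ID; these shapes are hypothetical, not a TeraGrid interface. Jobs with no accounting record yet are reported separately rather than counted as zero.

```python
from __future__ import annotations

from collections import defaultdict


def usage_per_user(
    owner_by_job: dict[str, str], cpu_hours_by_job: dict[str, float]
) -> tuple[dict[str, float], list[str]]:
    """Sum CPU hours per portal user; also return jobs still awaiting accounting."""
    totals: defaultdict[str, float] = defaultdict(float)
    pending: list[str] = []
    for job_id, user in owner_by_job.items():
        if job_id in cpu_hours_by_job:
            totals[user] += cpu_hours_by_job[job_id]
        else:
            # No accounting record has arrived for this job (yet).
            pending.append(job_id)
    return dict(totals), pending
```

Keeping `pending` separate avoids under-reporting a user's usage while accounting data is still in transit.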


Linked Environments for Atmospheric Discovery

• Providing tools that are needed to make accurate predictions of tornadoes and hurricanes
  – Meteorological data
  – Forecast models
  – Analysis and visualization tools
  – Data exploration and Grid workflow


Securing Community Accounts

• Additional risks arise when providing community accounts and web interfaces to high-performance resources.
• The TeraGrid security working group is analyzing risks and developing mitigation approaches.
  – Sites may take independent approaches to risk mitigation
• One approach being developed at NCSA is the Community Shell, or Commsh
• Commsh allows for two methods of account restriction:
  – A configuration file defines which commands (or sets of commands) a given account can execute
    • Commands can be specified using wildcards and regular expressions for flexibility
  – Change-root (chroot) jailing effectively creates a filesystem-based "sandbox" for the account, allowing commands to be executed only from within this sandbox
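A short allow-list check can illustrate the first Commsh method: a per-account list of permitted commands written as wildcard or regular-expression patterns. This is a hypothetical sketch. The `glob:`/`re:` rule syntax and the function names are invented here and are not Commsh's actual configuration format. The sketch does not attempt the chroot method.

```python
from __future__ import annotations

import fnmatch
import re

# Characters that would let a caller chain or redirect commands if the line
# reached a shell; this sketch rejects them before any pattern matching.
_SHELL_METACHARS = set(";|&`$<>\n")


def parse_rules(config_text: str) -> list[tuple[str, str]]:
    """Parse hypothetical 'glob: <pattern>' / 're: <regex>' lines; '#' starts a comment."""
    rules: list[tuple[str, str]] = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, _, pattern = line.partition(":")
        kind, pattern = kind.strip(), pattern.strip()
        if kind not in ("glob", "re"):
            raise ValueError(f"unknown rule type: {kind!r}")
        rules.append((kind, pattern))
    return rules


def is_allowed(command_line: str, rules: list[tuple[str, str]]) -> bool:
    """Return True only if the whole command line matches at least one rule."""
    if any(ch in _SHELL_METACHARS for ch in command_line):
        return False
    for kind, pattern in rules:
        if kind == "glob" and fnmatch.fnmatchcase(command_line, pattern):
            return True
        if kind == "re" and re.fullmatch(pattern, command_line):
            return True
    return False
```

`re.fullmatch` is used instead of `re.search` so that a pattern cannot accidentally allow a command that merely contains a permitted substring.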


Federated Identity Management

• Traditionally, each resource or resource-providing site was responsible for managing its users’ identities
• The science gateway model outsourced identity management from the resource to the gateway
• For maximum scalability, the goal is to shift identity management all the way back to the user’s home institution and leverage the existing identity management infrastructure
• Mechanisms to achieve this based on Shibboleth, GridShib, myVocs, and other technologies are currently being evaluated by TeraGrid


Individual User Environment

[Figure: diagram relating the grant process, project, TGCDB, resource, uid, and grid identity ((G)Id) entities, annotated with order-of-magnitude counts]

Use cases: traditional users, development


Authenticated User Environment

[Figure: diagram relating the grant process, project, TGCDB, resource, uid, and grid identity ((G)Id) entities, annotated with order-of-magnitude counts]

Use cases: grid-savvy user communities, production runs, user-managed services


Gateway Environment

[Figure: diagram relating the gateway, grant process, project, TGCDB, resource, uid, GId, and ComId entities, annotated with order-of-magnitude counts]

Use cases: large communities of users, novice users, public


Community Gateway Accounts

• Shift authentication and authorization from the resource provider (RP) to the Science Gateway
• The whole community then appears as “one” user to the RP in terms of authorization
  – One grid-mapfile and /etc/passwd entry
    • or perhaps (a mapped set of) virtual machine images
  – Except for accounting and troubleshooting: we still need an individual identifier
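In Globus deployments, the grid-mapfile maps a certificate distinguished name (DN) to a local account. Each line holds a quoted DN followed by the account name. With a community account, the gateway's own credential is the one DN the RP maps, so the individual end user has to be carried separately. One option is an attribute that the gateway attaches to each job. This is a minimal sketch. The DNs, account names, and the `end_user` job field are hypothetical, and when a line lists several accounts the sketch simply takes the first one.

```python
from __future__ import annotations

import shlex


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse lines of the form: "<quoted DN>" account[,account...]"""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex honors the double quotes around DNs that contain spaces.
        dn, accounts = shlex.split(line)
        mapping[dn] = accounts.split(",")
    return mapping


def local_account_for(dn: str, mapping: dict[str, list[str]]) -> str | None:
    # This sketch takes the first listed account; None means the DN is unmapped.
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


def job_description_for(portal_user: str, executable: str) -> dict[str, str]:
    # Hypothetical job description: the RP sees only the community account,
    # so the gateway attaches the individual identifier for accounting and
    # incident response.
    return {"executable": executable, "end_user": portal_user}
```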


The Proposal

• Plan for a world where users can be authenticated via their home campus identity management system
• Enable attribute-based authorization of users by the RP site
  – Allow for user authentication with authorization by the community
• Prototype the system in a testbed, with involvement of interested parties to work out issues
• All usage is still billed to an allocation
  – Community or individual
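Under attribute-based authorization, the RP decides based on attributes asserted about the user, for example by the home campus or by the community, rather than on a locally managed account. This is a minimal sketch under assumed names. `eduPersonAffiliation` is a real eduPerson attribute, but `community_groups`, `individual_allocation`, the `Policy` shape, and the allocation names are hypothetical. No Shibboleth or GridShib API is used. As on the slide, every authorized request is billed to either a community or an individual allocation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical RP policy for one community."""

    allowed_affiliations: frozenset[str]
    community_group: str        # group membership the community vouches for
    community_allocation: str   # allocation charged for community members


def authorize(attributes: dict[str, list[str]], policy: Policy) -> str | None:
    """Return the allocation to bill, or None if the request is denied."""
    # Authentication already happened at the home campus; only attributes are checked here.
    affiliations = set(attributes.get("eduPersonAffiliation", []))
    if not affiliations & policy.allowed_affiliations:
        return None
    # Authorization by the community: membership in the community's group.
    if policy.community_group in attributes.get("community_groups", []):
        return policy.community_allocation
    # Otherwise fall back to an individual allocation, if the user has one.
    individual = attributes.get("individual_allocation", [])
    return individual[0] if individual else None
```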


Metrics

• Metrics of success are commonly requested for government-funded programs
• Successful gateway design will allow principal investigators to highlight gateway usage as well as scientific accomplishments enabled by the gateway
  – Some gateways may set up a mechanism for researchers to cite the use of the gateway in publications
• Success both in funding the gateway and in requesting TeraGrid resources can be traced to scientific accomplishments and a history of publications


Sample Metrics Collected by ESG

• The DOE-sponsored Earth System Grid (ESG) project includes a Metrics Service that tracks
  – logins
  – file and aggregation downloads
  – browse and search requests
  – total volume of activity conducted via its portal
• This information is very useful to principal investigators and sponsors in determining the overall impact of the project
• As ESG begins to use TeraGrid resources, it will need to track the computational and data services delivered to it as a Science Gateway
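A gateway can collect the same kinds of counts that ESG tracks by tallying a stream of portal events. This sketch is not the ESG Metrics Service. The `Event` record and the event-kind names are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    kind: str             # hypothetical kinds: "login", "download", "browse", "search"
    bytes_moved: int = 0  # data volume associated with the event, if any


def summarize(events: list[Event]) -> dict[str, int]:
    """Count events by kind and total the data volume moved via the portal."""
    counts = Counter(e.kind for e in events)
    summary = {kind: counts[kind] for kind in ("login", "download", "browse", "search")}
    summary["distinct_users"] = len({e.user for e in events})
    summary["total_bytes"] = sum(e.bytes_moved for e in events)
    return summary
```

A missing key in a `Counter` reads as 0, so kinds with no events still appear in the summary.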


NCAR Earth System Grid

• Science Gateway for climate research
  – Enabling analysis and understanding gained from global Earth System computational models
• ESG was originally a distributed data management/access system, but it has evolved into more
• User registration, authorization controls, and metrics tracking
• CCSM model source, initialization datasets, post-processing codes, and analysis and visualization tools
• Prototypes of model-submission environments
  – Eventually, real-time tracking of model status along with references to available output datasets
• Expect to see more model runs at higher resolution and with greater component scope




Science Gateway Primer

• Primer components
  – TeraGrid resources and services available to Science Gateways
  – Requirements for using TeraGrid resources
  – Best practices when designing a gateway
  – Software contribution area
• Wiki-based
  – Very dynamic development community
  – Counting on them for contributions
  – http://www.teragridforum.org/mediawiki


TeraGrid Resources and Services

• Compute, data and visualization resources
• Software
  – Common TeraGrid Software Stack (CTSS)
  – Third-party applications on a variety of platforms
  – Community Software Area for user-maintained software
  – Software packaging and distribution mechanisms
• Accounting services
  – Developer accounts
  – Community accounts
  – In the future, dynamic accounts
• External relations staff available to help publicize successes


Gateway Requirements

• Additional information to be provided when requesting community accounts
  – IP address of the portal
  – Data and compute expectations
• Recommended audit trails for usage tracking
• Mechanisms to restrict problem jobs
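One way a gateway could restrict problem jobs is to check each submission against per-user limits before anything reaches the community account. This is a minimal sketch. The `SubmissionGate` and `UserLimits` names, the limit values, and the choice of limits (concurrent jobs, cumulative CPU hours, and an administrator block list) are all assumptions, not TeraGrid policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserLimits:
    max_running_jobs: int
    max_cpu_hours: float


@dataclass
class SubmissionGate:
    """Hypothetical gateway-side throttle applied before submitting under the community account."""

    limits: UserLimits
    running: dict[str, int] = field(default_factory=dict)
    cpu_hours_used: dict[str, float] = field(default_factory=dict)
    blocked: set[str] = field(default_factory=set)

    def check(self, user: str) -> tuple[bool, str]:
        # Returns (allowed, reason) so the portal can tell the user why a job was refused.
        if user in self.blocked:
            return False, "user blocked by gateway administrator"
        if self.running.get(user, 0) >= self.limits.max_running_jobs:
            return False, "too many running jobs"
        if self.cpu_hours_used.get(user, 0.0) >= self.limits.max_cpu_hours:
            return False, "CPU-hour quota exhausted"
        return True, "ok"

    def job_started(self, user: str) -> None:
        self.running[user] = self.running.get(user, 0) + 1

    def job_finished(self, user: str, cpu_hours: float) -> None:
        # CPU hours would come from per-job accounting once the job completes.
        self.running[user] = max(0, self.running.get(user, 0) - 1)
        self.cpu_hours_used[user] = self.cpu_hours_used.get(user, 0.0) + cpu_hours
```

Because the check runs on the gateway, a single misbehaving portal user can be stopped without disabling the shared community account for everyone else.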


Lots of Best Practices! Thanks, Anurag

• Planning
  – Assess If Gateway-ing Adds Any Value
  – Create a Precise List of Requirements the GW Must Meet
  – Plan for the Long Term (= GW Lifetime)
• Design
  – Use Formal Design Principles
  – Involve Users in the Design
  – Use Mockups to Perform Usability Testing
  – Design a Focused and Uncluttered UI
• Implementation
  – Choose Technologies Based on Resources/Time
  – Hire/Use Developers with UI Experience
  – Develop in Stages
  – Use Reusable Components
• Operation
  – Monitor Gateway Components 24x7
  – Institute Help Desk Processes
  – Monitor & Implement New Technologies
  – Keep Content Current/Relevant

• Desirable Gateway Characteristics
  – Universal, Secure Access
  – Ability to Personalize
  – Based on Open Standards (JSR 168/286, OGSA, etc.)
  – Use of Modular, Reusable Design (Use Portlets)
  – Use of Technologies With a Rich API/Abstraction Layer
  – Platform Independence (Web, Java, XML, etc.)
  – Rapid Development Capability
  – Ease of Integration into Existing Infrastructure
  – Availability of Workflows
  – Use of Commodity Software
  – Airtight Security
  – Extensibility
  – Maintainability
  – Scalability
  – Extensive Help and Documentation


Would development of a gateway help your research?

• Think about your current bottlenecks
  – What would you like to explore if only you had:
    • Lots of disk
    • Lots of compute resources
    • Powerful analysis capabilities
    • A nice interface to information


Gateways in Tampa

• Gateway BOF 11/14, 5:30pm, rooms 22-23
• Gateway talks at many TG booths
  – http://www.teragrid.org/eot/sc06.html
• www.teragrid.org
• Nancy Wilkins-Diehr, [email protected]