Science Gateways on the TeraGrid

Nancy Wilkins-Diehr

Area Director for Science Gateways
San Diego Supercomputer Center

[email protected]

Today’s Outline

• What are Gateways?
• Why TeraGrid and Gateways?
• Initial Strategy
• Implementation Details
  – Issues to address when using TG
• Future growth

Gateways are part of TG’s 3-pronged strategy to further science

• DEEP Science: Enabling Terascale Science
  – Make science more productive through an integrated set of very-high capability resources
    • ASTA projects
• WIDE Impact: Empowering Communities
  – Bring TeraGrid capabilities to the broad science community
    • Science Gateways
• OPEN Infrastructure, OPEN Partnership
  – Provide a coordinated, general purpose, reliable set of services and resources
    • Grid interoperability working group

Science Gateways
A new initiative for the TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  • Resources
  • Users – from expert to K-12
  • Software stacks, policies
• Science Gateways
  – Provide “TeraGrid Inside” capabilities
  – Leverage community investment
• Three common forms:
  – Web-based Portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids.

Workflow Composer

Initial Focus on 10 Gateways

National Virtual Observatory
Facilitating Scientific Discovery

• Astronomy is increasingly a data-rich science

• New science enabled by enhancing access to data and computing resources

• Ease of use in locating, retrieving, and analyzing data from archives and catalogs worldwide

• NVO is a set of tools used to exploit the data avalanche

nanoHUB Middleware infrastructure

[Diagram: nanoHUB middleware architecture. Labels: Science Gateway, Workspaces, Research apps, Virtual backends, Middleware, nanoHUB VO, Virtual Cluster with VIOLIN, VM, Campus Grids (Purdue, GLOW), Grid, Capability Computing, Capacity Computing]

spruce.teragrid.org
Special Priority and Urgent Computing Environment

Linked Environments for Atmospheric Discovery (LEAD)

• Providing tools that are needed to make accurate predictions of tornados and hurricanes
• Data exploration and Grid workflow

Gateways are growing in numbers

• 10 initial projects as part of TG proposal
• >20 Gateway projects today
• No limit on how many gateways can use TG resources
  – Prepare services and documentation so developers can work independently

• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Cornell
• Many others interested
  – SID Grid
  – HASTAC

NCAR Earth System Grid

• ESG originally a distributed data management/access system but it has evolved into more.

• User registration, authorization controls, and metrics tracking

• CCSM model source, initialization datasets, post-processing codes, and analysis and visualization tools.

• Prototypes of model-submission environments, eventually real-time tracking of model status along with references to available output datasets.

• "science gateway" for climate research.

• Expect to see more model runs at higher resolution and with greater component scope.

So how will we meet all these needs?

• With RATS! (Requirements Analysis Teams)

• Collection, analysis and consolidation of requirements to jump start the work
  – Interviews with 10 Gateways
  – Common user models, accounting needs, scheduling needs
• Summarized requirements for each TeraGrid working group
  – Accounting, Security, Web Services, Software
• Areas for more study identified
• Primer outline for new Gateways in progress

• And milestones

Implications for TeraGrid working groups

• Accounting
  – Support for accounts with differing capabilities
  – Ability to associate a compute job with an individual portal user
  – Scheme for portal registration and usage tracking
  – Support for OSG’s Grid User Management System (GUMS)
  – Dynamic accounts
• Security
  – Community account privileges
  – Need to identify the human responsible for a job for incident response
  – Acceptance of other grid certificates
  – TG-hosted web servers, cgi-bin code
• Web Services
  – Initial analysis completed 12/05
  – Some Gateways (LEAD, Open Life Sciences) have immediate needs
  – Many will build on capabilities offered by GT4, but interoperability could be an issue
  – Web Service security
  – Interfaces to scheduling and account management are common requirements
• Software
  – Interoperability of software stacks between TG and peer grids
  – Software installations for gateways across all TG sites
  – Community software areas
  – Management (pacman, other options)

Gateway Web Services Needs

• Interfaces provided by the TeraGrid
  The list of services that have been identified by the gateway developers includes:
  – Resource Status Service (both polling and pub/sub)
  – Job Submission Interface
    • The gateways expect this to be provided by WS-GRAM
  – Job Tracking Interface (both polling and pub/sub)
  – File/Data Staging Interface
  – Retrieve Usage Information
  – Retrieve Inca Info
  – Advanced Reservation Interface
  – Cross-site Run interface
  – Pushing DN to an RP interface

• Interfaces provided by the Gateways
  The list of services that have been identified by the gateway developers and the TeraGrid Security group includes:
  – Retrieve user information for a job
  – Retrieve accounting information/statistics
  – Provides the necessary means to track down problem job submissions, identify malicious users, and tabulate accounting and logging information for reporting needs by the RPs. It is expected that the information provided for the first interface is simply the (resource, job id) that is known by both parties at job submission time. This interface provides sufficient user information for the RPs to deal with the situation at hand, and possibly identifies another interface that should be provided by the gateways:
  – Don't submit jobs from the user who submitted job (resource, job id), until we say it's OK.
  – The accounting interface requires no information, but returns sufficient accounting information and statistics to report to funding agencies, program managers, etc.
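
The gateway-provided interfaces listed above can be illustrated with a minimal Python sketch. It assumes an in-memory registry kept by the gateway. The class name `GatewayJobRegistry`, its method names, and the sample site and user names are all hypothetical and are not part of any TeraGrid or Globus API. The sketch only shows the idea that a (resource, job id) pair, known to both parties at submission time, is enough for an RP to look up the responsible portal user, ask the gateway to stop accepting that user's jobs, and request usage statistics without supplying any input.

```python
from collections import Counter


class GatewayJobRegistry:
    """Hypothetical gateway-side record of which portal user submitted which job."""

    def __init__(self):
        self._owner = {}       # (resource, job_id) -> portal user name
        self._blocked = set()  # portal users the RPs asked the gateway to hold

    def record_submission(self, resource, job_id, portal_user):
        """Called by the gateway at submission time; refuses blocked users."""
        if portal_user in self._blocked:
            raise PermissionError(f"submissions from {portal_user} are on hold")
        self._owner[(resource, job_id)] = portal_user

    def user_for_job(self, resource, job_id):
        """'Retrieve user information for a job': returns None if the job is unknown."""
        return self._owner.get((resource, job_id))

    def block_submitter_of(self, resource, job_id):
        """'Don't submit jobs from the user who submitted (resource, job id)'."""
        user = self.user_for_job(resource, job_id)
        if user is not None:
            self._blocked.add(user)
        return user

    def unblock(self, portal_user):
        """Lift the hold once the RP says it is OK."""
        self._blocked.discard(portal_user)

    def accounting_summary(self):
        """'Retrieve accounting information': takes no input, returns jobs per user."""
        return dict(Counter(self._owner.values()))


if __name__ == "__main__":
    registry = GatewayJobRegistry()
    registry.record_submission("site-a", "1001", "alice")
    registry.record_submission("site-a", "1002", "bob")
    print(registry.user_for_job("site-a", "1002"))
    print(registry.accounting_summary())
```

A production gateway would presumably persist this mapping and expose it through an authenticated web service; the in-memory dictionary here is only a stand-in for that.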

Gateway primer and “getting started” documentation by end of summer

1. Introduction
2. Science Gateway in Context
   a. Science Gateway (SGW) Definition(s)
   b. Science Gateway user modes
   c. Distinction between SGW and other TeraGrid user modes
3. Components of a Science Gateway
   a. User Model
   b. Gateway targeted community
   c. Gateway Services
   d. Integration with TeraGrid external resources (data collections, services, …)
   e. Organizational and administrative structure
4. TeraGrid services and policies available for Science Gateways
   a. Portal middleware tools (user portal and other portal tools)
   b. Account Management (user models, community accounts)
   c. Security environment (security models)
   d. Web Services
   e. Scheduling services (and meta-scheduling)
   f. Community accounts and allocations
   g. Community Software Areas
   h. All traditional TeraGrid services and resources
   i. Ability to propose additional services and how that would interact with TeraGrid operations
5. Responsibilities and Requirements for Science Gateways
   a. Interaction with and compatibility with TeraGrid communities
   b. Control procedures
      i. Community user identification and tracking (map TeraGrid usage to Portal user)
      ii. Use monitoring and reporting
      iii. Security and trust
      iv. Appropriate use
6. How to get started
   a. Existing resources
      i. Publication references
      ii. Web areas with more details
      iii. Online tutorials
      iv. Upcoming presentations and tutorials
   b. Who to contact for initial discussions
   c. How to propose a new Gateway
   d. How to integrate with TeraGrid Gateways efforts
   e. How to obtain a resource allocation

Want to be involved?

• [email protected] mailing list
  – Email [email protected]
  – <subscribe gateways> in body
• Biweekly telecons to get advice from others. Current focus:
  – Auditing strategy
  – Mini-tutorial at April Lariat workshop, “Accelerating Research Through Grid Computing”
  – Hands-on tutorial at June conference
    • Overview of Gateways
    • In-depth presentations by LEAD, nanoHUB, RENCI, GIScience
  – Transition to GT4
  – Scheduling requirements

• As original gateways move into production, we will be able to provide short term support to new projects that would benefit from utilizing TeraGrid resources

• www.teragrid.org

• Nancy Wilkins-Diehr, [email protected]