
IU TeraGrid Gateway Support

Marlon Pierce

Community Grids Lab

Indiana University

Personnel

• Marlon Pierce: project leader

• Sangmi Lee Pallickara: senior developer
  – Lead on VLAB and File Agent Service development

• Yu “Marie” Ma: senior developer
  – Lead on CIMA support

• Rishi Verma: student intern
  – Software release and testing technician

Team Strategy

• Provide general-purpose gateway support through software delivered by the NSF-funded Open Grid Computing Environments (OGCE) project.

• Provide short-term development support to new TeraGrid gateway projects to help them integrate with TeraGrid resources.
  – IU’s CIMA instrument project (R. McMullen)
  – Minnesota’s VLAB project (R. Wentzcovitch)
  – IU School of Medicine’s Docking and Scoring Portal (S. Meroueh)
  – IU/ECSU Polar Grid project

CIMA Project Overview

• Common Instrument Middleware Architecture project
  – NSF-funded project to develop software and portals for instruments.
  – Dr. Rick McMullen, PI

• Flagship project: crystallography portal and services for collecting and archiving real-time crystallography data.
  – Gateway for data collected at 10 crystallography labs in the US, UK, and Australia.

• Problems:
  – Much of the collected data is private and should only be accessed by the owners.
  – Data must be stored on large, highly available file systems.
  – Services must be highly reliable.

Gateway Team Support for CIMA (Y. Ma)

• The existing CIMA project was converted into a TeraGrid gateway.

• CIMA now provides secure access to CIMA archives
  – Using the IU Data Capacitor for storage
  – Security through GridFTP and a TeraGrid community credential.

• Marie Ma also led the development work for CIMA High Availability testing.
  – SC07 demo

• Future work:
  – Support the follow-on NSF-funded Crystal Grid project.
  – Use CIMA as a test case to explore virtual hosting and other data grid strategies.
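The secure retrieval path above can be sketched in code. The snippet below shows how a gateway service might assemble a `globus-url-copy` GridFTP transfer for one CIMA sample; the host name and directory layout are hypothetical stand-ins, and it assumes a TeraGrid community credential has already been obtained (e.g. via `grid-proxy-init`).

```python
# Sketch: building a GridFTP transfer command for one CIMA sample archive.
# Host and path names are hypothetical; real values would come from the
# gateway's sample metadata.

def gridftp_fetch_command(sample_id, remote_host, local_dir):
    """Build a globus-url-copy command to pull one sample's data.

    Assumes the calling process already holds a TeraGrid community
    credential, so GSI authentication on the GridFTP channel succeeds.
    """
    src = f"gsiftp://{remote_host}/cima/samples/{sample_id}/"
    dst = f"file://{local_dir}/{sample_id}/"
    # -r: recurse into the sample directory; -cd: create dirs at destination
    return ["globus-url-copy", "-r", "-cd", src, dst]

cmd = gridftp_fetch_command("sample-0042", "dc.example.iu.edu", "/tmp/cima")
```

The gateway would run this under the community account after checking (at the portal layer) that the logged-in user actually owns the requested sample.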

Users get gateway credentials with normal login.

Experiments are grouped into samples with private access.

Sample data (images, metadata) are securely retrieved from TeraGrid storage.

High Availability CIMA

• Prototypes fail-over services and portals for TeraGrid gateways.

• Demonstrated resilient services for multiple scenarios:
  – Application (Web service) failures
  – Operating system failures
  – Partial and complete network failures
  – WAN file system failures
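The fail-over behavior demonstrated above can be sketched as a simple client-side pattern: walk an ordered list of replica service endpoints and use the first one that responds. The endpoint URLs and stubbed request function below are hypothetical; the real prototype works against CIMA Web services.

```python
# Sketch of client-side fail-over across replicated service endpoints,
# the pattern the HA CIMA prototype demonstrates. URLs are hypothetical.

def call_with_failover(endpoints, request_fn):
    """Try each replica in turn; raise only if all replicas fail."""
    errors = []
    for url in endpoints:
        try:
            return request_fn(url)   # e.g. an HTTP/SOAP call to the service
        except Exception as exc:     # service, OS, or network failure
            errors.append((url, exc))
    raise RuntimeError(f"all replicas failed: {errors}")

# Demonstration with a stubbed request function: the primary is down,
# so the call transparently succeeds against the secondary.
replicas = ["https://cima1.example.edu/svc", "https://cima2.example.edu/svc"]

def fake_request(url):
    if "cima1" in url:
        raise ConnectionError("primary down")
    return "ok from " + url

result = call_with_failover(replicas, fake_request)
```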

Future CIMA/Crystal Grid Support

• IU is setting up a virtual hosting environment for Gateways and TeraGrid Web services.
  – Dave Hancock will describe this in an upcoming talk.

• We are prototyping this for CIMA.
  – We provide the Gateway perspective.
  – Dave will provide the integrator perspective.

VLAB Project Overview

• U-Minn’s VLAB project is an NSF ITR-funded project for investigating properties of planetary materials under extreme conditions.
  – Prof. Renata Wentzcovitch, PI

• Very computationally intense (determining phase diagrams of materials)
  – Potentially thousands of medium-to-large parallel jobs.

• VLAB also develops services and portals for managing the complicated runs.

• Problem: existing VLAB services for task management needed to be integrated with the TeraGrid.
  – The service also needed to be more easily extensible to many different scheduling/queuing systems.

Gateway Team Support for VLAB (S. Pallickara)

• Modified VLAB’s Task Executor Web Service to work with TeraGrid GRAM servers.
  – New Task Executor code built around Condor-G and the Condor Birdbath Web Service Java clients.
  – Tested with both serial and parallel versions of VLAB’s workhorse code (“PWSCF”) on TACC’s Lonestar, NCSA’s various metal machines, and ORNL’s cluster.

• This code also formed the basis of support for the NASA-funded QuakeSim project and will be packaged and released for general use.

• Next step: integrate the more complicated of VLAB’s major codes (“Phonon”).
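To make the Condor-G layer concrete, the snippet below generates the kind of grid-universe submit description that a Task Executor could hand to Condor-G for a pre-WS GRAM endpoint. The contact string, executable path, and file names are hypothetical examples; in the real service they would come from the TeraGrid Information Service and the VLAB session metadata, and submission goes through the Birdbath SOAP API rather than a file.

```python
# Sketch: generating a Condor-G (grid universe) submit description for a
# TeraGrid GRAM endpoint. Contact string and file names are hypothetical.

def condor_g_submit(gram_contact, executable, inputs, outputs):
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gram_contact}",  # pre-WS GRAM contact string
        f"executable = {executable}",
        "transfer_executable = false",          # PWSCF is installed on the cluster
        f"transfer_input_files = {', '.join(inputs)}",
        f"transfer_output_files = {', '.join(outputs)}",
        "when_to_transfer_output = ON_EXIT",
        "log = task.log",
        "queue",
    ]
    return "\n".join(lines)

desc = condor_g_submit(
    "tg-login.lonestar.tacc.teragrid.org/jobmanager-lsf",
    "/usr/local/bin/pw.x",
    ["scf.in"], ["scf.out"],
)
```

Changing only the `grid_resource` contact string retargets the same job at a different cluster’s GRAM server, which is what makes Condor-G a reasonably unified submission mechanism across sites.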

[Architecture diagrams: the Portal drives Project Executor and Project Interaction components, which feed a Task Dispatcher. The dispatcher hands work to Task Executors, each fronted by a TeraGrid Task Interface built on the Condor Birdbath Web Service API, backed by databases (metadata, session registry, etc.) and auxiliary services (Phonon input prep, high-T post-processing, etc.). Condor-G submits jobs through each site’s GRAM Job Manager to the local LSF or PBS batch system, with resource information from the TeraGrid Information Service. Target systems:

  – Lonestar (TACC): Dell PowerEdge Linux cluster, 5840 CPUs, 62.6 peak TFlops
  – Tungsten (NCSA): Dell Xeon IA-32 Linux cluster, 2560 CPUs, 16.38 peak TFlops
  – Cobalt (NCSA): SGI Altix, 1024 CPUs, 6.55 peak TFlops
  – NSTG (ORNL): IBM IA-32 cluster, 0.34 peak TFlops]

Lessons from the VLAB Job Submission Example

• The VLAB application required multiple input and output files to be transferred between the TeraGrid clusters and the Task Executor service.

• Using Condor-G gave us a reasonably unified mechanism. However, each TeraGrid cluster provides its own batch system, which requires different setups for the executables.

• Some of the system environments were not set up properly.
  – Scripts generated by jobmanager-lsf on Lonestar, for example, override the custom $PATH.

• Tackling each of these problems was not trivial, but we did get enthusiastic support from all the TeraGrid sites that we dealt with.

Scoring and Docking Gateway

• Users develop scoring functions for the ability of drug-like molecules to dock to proteins.

• They then need quantum chemistry techniques to refine the scoring functions.
  – AMBER

• We are adapting our Condor-G based Web services to build an AMBER Grid Service.

Samy Meroueh, IU School of Medicine

General Purpose Gateway Software (S. Pallickara)

• TeraGrid community credentials are used with GridFTP to access community archives.
  – Ex: Data Capacitor, HPSS mass storage

• Problem: we need a way to enforce additional community restrictions on these files.
  – Users should have restricted file spaces.

• Solution: express and enforce access restrictions to community files through the Web gateway.

• Software: File Agent Service and an updated File Manager portlet, developed and released through the OGCE web site.
  – Targeted for the Data Capacitor and HPSS

The portlet (modified from the OGCE code base) provides file system views of the Data Capacitor, HPSS, and other GridFTP-accessible resources.

The portlet enforces additional restrictions on community users to keep their data separate and private from other users.
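The core access rule here can be sketched as a path-sandbox check: a community-account user may only touch paths under their own gateway-assigned subtree of the shared archive. The root path and user names below are hypothetical, and this is a simplified sketch of the idea rather than the File Agent Service’s actual code.

```python
# Sketch of a per-user sandbox check, the kind of restriction the File
# Agent Service enforces on community-credential file access. The root
# path is a hypothetical Data Capacitor location.
import posixpath

COMMUNITY_ROOT = "/N/dc/projects/gateway"

def is_allowed(username, requested_path):
    """True iff requested_path resolves inside the user's sandbox."""
    sandbox = posixpath.join(COMMUNITY_ROOT, username)
    # Normalize first so "../" tricks cannot escape the sandbox.
    resolved = posixpath.normpath(requested_path)
    return resolved == sandbox or resolved.startswith(sandbox + "/")

ok = is_allowed("alice", "/N/dc/projects/gateway/alice/data/run1.out")
bad = is_allowed("alice", "/N/dc/projects/gateway/alice/../bob/secret")
```

Because all GridFTP operations go through the gateway under one community credential, this check is the gateway’s responsibility; the storage system itself only sees the community account.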

PolarGrid: Microformats, KML, and GeoRSS feeds used to deliver SAR data to multiple clients.
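As a small illustration of the feed approach, the snippet below builds a minimal GeoRSS-Simple item of the kind a PolarGrid feed could use to announce a SAR data product. The title, link, and coordinates are hypothetical values, not actual PolarGrid data.

```python
# Sketch: a minimal GeoRSS-Simple <item> for announcing a SAR product in
# an RSS feed. All field values are hypothetical.
from xml.sax.saxutils import escape

def georss_item(title, link, lat, lon):
    return (
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        f"<georss:point>{lat} {lon}</georss:point>"  # "lat lon", space-separated
        "</item>"
    )

item = georss_item("SAR swath 2008-07-01", "http://example.edu/sar/123.tif",
                   -77.85, 166.67)
```

Because GeoRSS rides on ordinary RSS/Atom, the same feed is readable by map clients (Google Maps and Earth via KML conversion) and by plain feed readers, which is what makes it a convenient multi-client delivery format.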

Out of Scope Items

• We do not currently develop, deploy, or maintain general-purpose services for TG resource providers.
  – TG Information Services (J. P. Navarro) and the TG User Portal do this.
  – We do collaborate with these groups through the OGCE project.
  – This could change if we have clear requirements for this.

• We rely on existing resource provider infrastructure such as Globus GRAM and GridFTP.
  – We don’t install or maintain these.

Project Blogs

• Get a snapshot of what we are working on:
  – Sangmi: http://sangpall.blogspot.com/
  – Marie: http://tethealla.blogspot.com/
  – Rishi: http://gridportal-lab.blogspot.com/
  – Marlon: http://communitygrids.blogspot.com/