INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

Grid Applications & Grid Services

C. Loomis (LAL-Orsay)

EMBRACE-3DEM (Madrid)

23 February 2007

Grid Apps. – C. Loomis – 11 November 2006 2


Contents

• Introduction
  – EGEE project history
  – Usage and users
• Grid Application Families
• Grid Software & Services
• Summary


Evolution

• EGEE: Enabling Grids for E-sciencE
  – Two-year project funded by the European Commission.
  – Provides computing infrastructure for e-science.
• Evolution of Project (2001–now):
  – European DataGrid: R&D
  – EGEE: Re-engineering & Infrastructure
  – EGEE-II: Infrastructure & Re-engineering
  – EGEE-III: Same focus, in preparation
• Evolution of Grid Users:
  – Focus: Grid technology → Scientific results
  – Goal: Grid technology → Grid as a tool
  – Experience: IT experts → IT “minimalists”

[Diagram labels: more apps., larger grid]


EGEE/LCG Production Service

> 175 sites
> 30 kCPU
> 13 PB

http://goc03.grid-support.ac.uk/googlemaps/lcg.html


Grid Virtual Organizations

• Routine and large-scale use of EGEE infrastructure.
• Virtual Organizations:
  – 200+ visible on the grid
  – 100+ registered with EGEE
  – App. Deploy. Plan (https://edms.cern.ch/document/722131/2)

http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php


Usage History

[Chart: usage by Virtual Organization, Dec. ’05 to Nov. ’06]

http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php

• Sharing and federation of resources make sense!


Scientific Domains

• Astrophysics
  – Planck, MAGIC
• Computational Chemistry
• Earth Science
  – Hydrology, Pollution, Climate, Geophysics, …
• Fusion
• High-Energy Physics
  – LHC, Tevatron, HERA, …
• Life Sciences
  – Medical Images, Bioinformatics, Drug Discovery
• Related Projects
  – Finance, Digital Libraries, …
• And more…


Grid Benefits

• Science is a balance between competition and cooperation. Grid appeals to both aspects.

• Better use of resources:
  – Sharing: faster turnaround with lower investment.
  – Federation: reach previously unattainable scales.
• Better science:
  – Faster results: get published first!
  – Higher quality: better statistics, more varied data.
• Collaboration
  – Platform to bring different people with different skills together.
  – Mechanism to publish, reuse, and combine previous data.


Job Submission

[Diagram: job submission flow]
• Components: User Interface, Resource Broker, Information System, Replica Catalog, and a Storage Element and Computing Element at each of Site 1 and Site 2.
• Labeled steps: 1. submit, 2. query, 3. query, 4. submit, 5. retrieve, 6. retrieve.
• Additional arrow: publish status.
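The numbered steps in the diagram can be sketched as a toy simulation. This is a minimal, hypothetical sketch and not gLite code: the class names, the `free_cpus` field, the logical file name, and the ranking rule (prefer a site that already holds the input data, then the most free CPUs) are all assumptions made for illustration, as is the mapping of each numbered step to a component.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    input_file: str  # logical file name the job reads (hypothetical)


@dataclass
class ComputingElement:
    site: str
    free_cpus: int

    def run(self, job: Job) -> str:
        # Stand-in for real execution: occupy one CPU, return an output string.
        self.free_cpus -= 1
        return f"{job.name} ran at {self.site}"


class InformationSystem:
    """Stores the status that each site publishes."""

    def __init__(self):
        self._status = {}

    def publish(self, ce: ComputingElement) -> None:
        self._status[ce.site] = ce

    def query(self):
        return list(self._status.values())


class ReplicaCatalog:
    """Maps a logical file name to the set of sites holding a replica."""

    def __init__(self, replicas):
        self._replicas = replicas

    def query(self, logical_name: str):
        return self._replicas.get(logical_name, set())


class ResourceBroker:
    def __init__(self, info: InformationSystem, catalog: ReplicaCatalog):
        self.info = info
        self.catalog = catalog
        self._outputs = {}

    def submit(self, job: Job) -> str:
        # Step 2: ask the information system which sites have free CPUs.
        candidates = [ce for ce in self.info.query() if ce.free_cpus > 0]
        if not candidates:
            raise RuntimeError("no computing element has free CPUs")
        # Step 3: ask the replica catalog where the input data lives.
        data_sites = self.catalog.query(job.input_file)
        # Assumed ranking: data locality first, then number of free CPUs.
        best = max(candidates, key=lambda ce: (ce.site in data_sites, ce.free_cpus))
        # Steps 4 and 5: submit to the chosen site and retrieve its output.
        self._outputs[job.name] = best.run(job)
        return job.name

    def retrieve(self, job_id: str) -> str:
        # Step 6: the user interface retrieves the output from the broker.
        return self._outputs.pop(job_id)


def demo() -> str:
    info = InformationSystem()
    info.publish(ComputingElement("Site 1", free_cpus=10))
    info.publish(ComputingElement("Site 2", free_cpus=2))
    catalog = ReplicaCatalog({"lfn:/grid/demo/input.dat": {"Site 2"}})
    broker = ResourceBroker(info, catalog)
    # Step 1: the user interface submits the job to the broker.
    job_id = broker.submit(Job("analysis", "lfn:/grid/demo/input.dat"))
    return broker.retrieve(job_id)


print(demo())
```

In this sketch the job lands on Site 2 even though Site 1 has more free CPUs, because the assumed ranking rule favors the site that already stores the input file.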


Comp. Serv. (Gatekeepers)

• LCG-CE (production)
  – Modified GT2 gatekeeper with VOMS support.
  – Not ported to SL4/VDT 1.3; supported until gLite CE is certified.
• gLite CE (under test)
  – Only direct interface is Condor-G.
  – Possible to run pre-WS GRAM too; neither certified nor supported.
  – Maybe possible to run WS GRAM.
• CREAM CE (development)
  – Native, proprietary web-service interface.
  – Request to provide WS-GRAM interface in addition.


Comp. Serv. (Resource Brokers)

• LCG-RB (production)
  – Phased out in favor of WMS.
• gLite WMS (test)
  – Will talk to old and new CE interfaces.
  – Provides higher-level services: DAG, parameterized jobs, etc.
  – Version deployed on production service, but not stable.
  – Next version has been extensively tested and is much more robust.
• GridWay (http://www.gridway.org/)
  – Lighter weight, lower latencies than EGEE brokers.
  – Standard DRMAA interface.
  – Federation of EGEE, non-EGEE resources.


Comp. Serv. (Others)

• Workflow
  – TAVERNA, MOTEUR have been used.
  – Need better web-service support for these tools.
• Others
  – GANGA/DIANE (ARDA): job management framework
  – JJS (CC-IN2P3): Java job submission


Storage Services

• Strategy: Follow SRM (Storage Resource Manager).
  – Implementations provide SRMv1+ functionality.
  – SRMv2+ will provide better access control possibilities.
• DPM (CERN)
  – Disk Pool Manager: only supports disk storage.
• dCache (DESY)
  – Supports tape and other backends.
  – Very flexible, but complicated to install and configure.
• Storage Resource Broker (SRB)
  – Used by many disciplines for data and metadata management.
  – Won’t be integrated; can probably be used on EGEE infrastructure.


Data Management Services

• LCG File Catalog (LFC)
  – Actually a general file catalog included as part of gLite.
  – Currently has limited access control features.
• File Transfer Service (FTS)
  – Reliable file transfer service (i.e. batch system for data).
  – Used only by LHC VOs now; could be used by others.
• Hydra
  – Key server for data encryption.
  – Client in gLite; server (?).
• gLite IO, Fireman (deprecated)
  – Provide better ACL management and consistency.
  – Functionality to be incorporated into standard services.


Transparent Data Access

• ELFI
  – Uses FUSE kernel module to expose “grid file system”.
  – Limited to systems where FUSE is available (easier with SL4).
  – Requires that users be allowed to mount the file system.
• Parrot
  – Intercepts system calls to provide grid data access.
  – Resides completely in user space.


Metadata Services

• AMGA
  – Lightweight metadata catalog developed in ARDA.
  – Allows distribution and federation of servers.
  – Clients in gLite; server (?).
• OGSA-DAI
  – Generic, secured interface to databases.
  – Works but has scalability, performance problems.
  – Integration not likely in the near future.
• GDSE (Grid Data Source Engine)
  – Developed by INFN.
  – Generic interface to data sources (DBs included).


Information Systems

• Strategy
  – Keep BDII-based information system for medium-term.
  – Need something faster and more scalable for longer term.
  – GLUE schema will evolve with needs of apps. and projects.
    § Version 2 should be completely (?) service-based.
• BDII (production)
  – LDAP-based information system.
  – Contains all published information.
  – Used for service discovery and service status.
• R-GMA
  – Producer-consumer deployment model.
  – Specialized uses: accounting and some application monitoring.


Security

• Security infrastructure is mature; no significant changes in the short to medium-term.
  – Certificate Authority services
  – VOMS
  – LCAS/LCMAPS
  – Proxy renewal
• Significant work to integrate these with all services!
• Potential new services:
  – Hydra: Data encryption key server
  – G-PBOX: distribution of VO-specific policies


Accounting

• Two competing/cooperating systems for collecting and presenting accounting information.
  – APEL
    § Works only for computing-related usage.
    § Has (partial) usage information since early 2005.
    § Uses R-GMA for collecting the accounting information.
  – DGAS
    § General framework for collecting and metering usage.
    § Probably included in next release of gLite.
• Developers have agreed to use same accounting sensors for collecting information.


Important Core Changes

• Move from SL3 to SL4
  – Change from 2.4 to 2.6-series kernel.
    § Provides better support for new hardware.
    § Better performance on multi-CPU systems.
  – Minor version change of GCC compiler.
• VDT (Virtual Data Toolkit)
  – Change from VDT 1.2 to 1.3
    § Compatibility with latest Globus Toolkit™.
    § Should have web service interfaces available.
• Decision made to stop integration of new developments until August 2007 to refactor code and rationalize dependencies.


Service Integration Policies

• EGEE-II users need third-party products:
  – “Core” only provides low-level services.
  – To better meet the high-level service needs of applications.
  – Allow applications a choice of several high-level services.
• RESPECT: Recommended External Software Packages for EGEE Communities
  – Registry for useful, external software for EGEE scientists.
  – Final stages of approval within EGEE.
  – List will appear on the NA4 web site.
  – Developers must provide support and binary packages.


Application Families

• Simulation
• Bulk Processing
• Responsive Apps.
• Workflow
• Parallel Jobs
• Legacy Applications


Simulation

• Examples
  – LHC Monte Carlo simulation
  – Fusion
  – WISDOM
• Characteristics
  – Jobs are CPU-intensive
  – Large number of independent jobs
  – Run by few (expert) users
  – Small input; large output
• Needs
  – Batch-system services
  – Minimal data management for storage of results

[Images: ATLAS, ITER]
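Because simulation jobs are independent, a campaign largely reduces to splitting the total work into chunks and giving each chunk its own random seed. A minimal sketch, assuming a hypothetical `make_jobs` helper and a plain dictionary as the job description (neither belongs to any EGEE tool):

```python
def make_jobs(n_events, events_per_job, base_seed=1000):
    """Split n_events into independent jobs, each with a distinct seed."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    jobs = []
    for index, start in enumerate(range(0, n_events, events_per_job)):
        # The last job may be smaller than the others.
        count = min(events_per_job, n_events - start)
        jobs.append({"job": index, "seed": base_seed + index, "events": count})
    return jobs


for job in make_jobs(n_events=10, events_per_job=4):
    print(job)
```

Each dictionary could then be turned into one batch submission; distinct seeds keep the jobs statistically independent without any communication between them.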


Virtual Screening Process

• Docking:
  – Predict how small molecules bind to a receptor with known 3D structure.
• Projects:
  – Proteins@Home
  – Rosetta@home
  – Docking@Home
  – AFRICA@home
  – malariacontrol.net
  – WISDOM

[Diagram: starting compound database and starting target structure model → DOCKING → predicted binding models → post-analysis → compounds for assay]


WISDOM

• WISDOM (http://wisdom.healthgrid.org/)
  – Developing new drugs for neglected and emerging diseases with a particular focus on malaria.
  – Reduced R&D costs for neglected diseases
  – Accelerated R&D for emerging diseases
• Three large calculations:
  – WISDOM-I (Summer 2005)
  – Avian Flu (Spring 2006)
  – WISDOM-II (Autumn 2006)
• WISDOM calculations used FlexX from BioSolveIT (3-6k free, floating licenses) in addition to Autodock.


Docking Results

Calculation         Targets                   Compounds  CPU-years  Duration (wk)  Max. CPUs  Size of Results (TB)
WISDOM-I (Q3’05)    PBD                       1M         80         6              1700       1
Avian Flu (Q2’06)   H5N1                      300k       105        6              1700       0.750
WISDOM-II (Q4’06)   GST, DHFR, DHFR, Tubulin  125M       240        8              5000       2
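The CPU-years and duration columns together imply an average number of CPUs kept busy, which can be compared with the quoted peak. The short calculation below is an illustration only: it assumes the CPU-years were spread evenly over the stated duration, which the slides do not claim.

```python
WEEKS_PER_YEAR = 365.25 / 7

# (CPU-years, duration in weeks, max. CPUs), copied from the Docking Results table.
RUNS = {
    "WISDOM-I": (80, 6, 1700),
    "Avian Flu": (105, 6, 1700),
    "WISDOM-II": (240, 8, 5000),
}


def average_cpus(cpu_years, weeks):
    """Average number of CPUs busy if the work is spread evenly over `weeks`."""
    return cpu_years * WEEKS_PER_YEAR / weeks


for name, (cpu_years, weeks, max_cpus) in RUNS.items():
    avg = average_cpus(cpu_years, weeks)
    print(f"{name}: ~{avg:.0f} CPUs on average, {avg / max_cpus:.0%} of the peak")
```

Under that assumption the three calculations averaged roughly 700, 900, and 1,600 CPUs, or about 41%, 54%, and 31% of their respective peaks.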


Benefits from Grid

• Computing Resources
  – Provided a large number of CPUs that would not have been available if they had had to be bought.
• Storage Resources
  – Ability to hook storage for results to grid.
  – Ability to make permanent backups of the data.
• Tools
  – Job management tools to handle millions of jobs.
  – Tools for collecting and storing results from calculations.
  – Data management tools for collating the data and making it available to others.
• Collaboration
  – Platform engendered new human collaboration and provides environment in which to share and analyze data efficiently.


Continued Analysis

• WISDOM-I: Molecular dynamics
  – 5k best plasmepsin docking compounds are being reanalyzed using molecular dynamics codes.
  – Need more “classic” parallel resources, either MPI on EGEE or use of supercomputers through DEISA.
• Avian Flu:
  – Top 5% of compounds will be refined through other methods.
  – From top 5% of compounds:
    § Structure clustering will be done for wet-lab assay.
    § 50+ compounds will be assayed experimentally by GRC (Academia Sinica, Taiwan).
• WISDOM-II:
  – Post-docking filtering and analysis.


Bulk Processing

• Examples
  – HEP processing of raw data, analysis
  – Earth observation data processing
• Characteristics
  – Widely-distributed input data
  – Significant amount of input and output data
• Needs
  – Job management tools (workload management)
  – Meta-data services
  – More sophisticated data management


Responsive Apps. (I)

• Examples
  – Prototyping new applications
  – Monitoring grid operations
  – Direct interactivity
• Characteristics
  – Small amounts of input and output data
  – Not CPU-intensive
  – Short response time (few minutes)
• Needs
  – Configuration which allows “immediate” execution (QoS)
  – Services must treat jobs with minimum latency


Responsive Apps. (II)

• Grid as a backend infrastructure:
  – gPTM3D: interactive analysis of medical images
  – GPS@: bioinformatics via web portal
  – DILIGENT: digital libraries
  – Volcano sonification
• Characteristics
  – Rapid response: a human waiting for the result!
  – Many small but CPU-intensive tasks
  – User is not aware of “grid”!
• Needs
  – Interfacing (data & computing) with non-grid application or portal
  – User and rights management between front-end and grid


gPTM3D

• PTM3D:
  – Interactive analysis of 3D data for surgery planning and volumetric analysis.
  – Requires “guiding” from physician to find initial contours, work around noisy data, …
  – Needs unplanned, interactive access to significant computational resources.


Results

• Speed-up gives response times acceptable to doctors.
• Grid overhead doesn’t dominate for short calculations.
• Requires application modifications to use with grid.

            Dataset (MB)  Input (MB)  Output (MB)  Tasks  1 CPU (s)  EGEE (s)
Sm. body    87            3           6            169    315        37
Med. body   210           9.6         57           378    1980       150
Lg. body    346           15          86           676    1080       123
Lungs       87            0.4         2.3          95     36         24
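The speed-up behind these figures is the single-CPU time divided by the time on EGEE. The snippet below only restates the two timing columns of the table; the `speedup` helper is a name introduced here for illustration.

```python
# (1 CPU time in s, EGEE time in s), copied from the Results table.
TIMINGS = {
    "Sm. body": (315, 37),
    "Med. body": (1980, 150),
    "Lg. body": (1080, 123),
    "Lungs": (36, 24),
}


def speedup(single_cpu_s, grid_s):
    """Ratio of single-CPU wall time to grid wall time."""
    return single_cpu_s / grid_s


for dataset, (single_cpu_s, grid_s) in TIMINGS.items():
    print(f"{dataset}: {speedup(single_cpu_s, grid_s):.1f}x")
```

This gives about 8.5x, 13.2x, 8.8x, and 1.5x respectively; the shortest calculation (Lungs) shows the smallest gain.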


Workflow

• Examples
  – “Bronze Standard”: image registration
  – Flood prediction
• Characteristics
  – Use of grid and non-grid services
  – Complex set of algorithms for the analysis
  – Complex dependencies between individual tasks
• Needs
  – Tools for managing the workflow itself
  – Standard interfaces for services (i.e. web services)
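Managing dependencies between tasks usually comes down to ordering a directed acyclic graph and running together whatever is ready at the same time. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the task names are hypothetical, and this is not how TAVERNA or MOTEUR are implemented.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; tasks in one wave have no mutual dependencies.

    `dependencies` maps each task to the set of tasks it must wait for.
    A circular dependency makes prepare() raise graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves


# Hypothetical image-registration workflow: two registrations, then a comparison.
WORKFLOW = {
    "register_a": {"fetch_images"},
    "register_b": {"fetch_images"},
    "compare": {"register_a", "register_b"},
}

for number, wave in enumerate(execution_waves(WORKFLOW), start=1):
    print(number, wave)
```

Tasks within a wave could be submitted as independent grid jobs, while the wave boundaries mark the points where the workflow has to wait for results.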


Parallel Jobs

• Examples
  – Climate modeling
  – Earthquake analysis
  – Computational chemistry
• Characteristics
  – Many interdependent, communicating tasks
  – Many CPUs needed simultaneously
  – Use of MPI libraries
• Needs
  – Configuration of resources for flexible use of MPI
  – Pre-installation of optimized MPI libraries


Legacy Applications

• Examples
  – Commercial or closed source binaries
  – Geocluster: geophysical analysis software
  – FlexX: molecular docking software
  – Matlab, Mathematica, …
• Characteristics
  – Licenses: control access to software on the grid
  – No recompilation → no direct use of grid APIs!
• Needs
  – License server and grid deployment model
  – Transparent access to data on the grid


Summary & Conclusions

• Observe routine and large-scale use of the EGEE infrastructure by a numerous, diverse set of user communities.
• Present:
  – Grid is a collaborative platform: 10+ domains, 200+ VOs.
  – Grid enables sharing of resources and data for better science.
• Future:
  – Responsiveness: Applications requiring quality-of-service.
  – Workflow: Use of different infrastructures, instruments.
  – Bigger role for third-party software for applications on grid.