Seed Project Working Group Report

07.05.2007
Seed Project Working Group, seed-wg@swiss-grid.org
Wibke Sudholt, University of Zurich (lead), [email protected]


Seed Project Working Group

• Nabil Abdennadher, EIG/HES-SO
• Peter Engel, UniBE
• Derek Feichtinger, PSI
• Dean Flanders, FMI
• Placi Flury, SWITCH
• Pascal Jermini, EPFL
• Sergio Maffioletti, CSCS
• Cesare Pautasso, IBM
• Heinz Stockinger, SIB
• Wibke Sudholt, UZH (lead)
• Michela Thiemard, EPFL
• Nadya Williams, UZH/CSCS
• Christoph Witzig, SWITCH

Goals

• Follow-up Workshop, November 2006
– What is there already? Low-hanging fruit.
– Come up with a plan.
– Interoperability among sites.
– Security infrastructure.
• Report, February 2007
– Identify which resources (people, hardware, middleware, applications, ideas) are readily available and represent strong interest among the current partners of the SGI.
– Based on available resources, propose one or more seed project(s) that will help to initialize, test, and demonstrate the SGI collaboration. The seed project(s) should be realizable in a fast, easy, and inexpensive manner (“low-hanging fruit”).
– Provide help with the coordination and the realization of the defined seed project(s).

Collaboration and Activities

• Founded at the Swiss Grid Initiative Follow-up Workshop in Bern on November 23, 2006
• Wiki page: https://twiki.cscs.ch/bin/view/SwissGridInitiative/SeedProjectWorkingGroup
• Mailing list: seed-wg@swiss-grid.org
• Subversion source repository: http://svn.cscs.ch/SGI-seed/
• In-person meetings: 07.12.2006 in Fribourg (Grid Crunching Day), 07.05.2007 in Bern (Swiss Grid Day)
• Phone conferences: 18.01.2007, 01.02.2007, 20.02.2007, 20.03.2007, 04.04.2007, 24.04.2007 (summaries available)
• Seed Project Survey
• Intermediate report on February 28, 2007
• Work on realization of seed project

Seed Project Survey

• Informal inventory of resources available for seed project
• Form requesting information about

– Member groups

– Available personnel

– Computer hardware

– Lower-level grid middleware

– Higher-level grid middleware

– Scientific application software and data

– Seed project ideas

• Submitted to Swiss Grid Initiative mailing list on December 12, 2006, deadline for responses on December 22, 2006

• Received 12 completed forms and two supportive emails as responses

Survey Results

• There is high interest in the Swiss Grid Initiative and the seed project.
• A lot of grid computing expertise from other grid projects and middleware already exists.
• Some groups are willing to invest more than spare time, but manpower is still a scarce resource.
• No shortage of computer hardware, mostly Linux clusters or desktop PCs. Actual availability for the seed project remains to be seen.
• Different lower-level grid middleware employed or developed at member sites, but enough common ground.
• Higher-level grid middleware and applications are diverse and often coupled. Typical high-performance computing domain areas (biology, chemistry, physics).
• Two main themes in the project ideas
– Specific middleware or application projects
– Work towards grid interoperability
• The two themes relate to the question of whether there should be one seed project or several.

Seed Project Proposal

• Cover the two different aspects from the survey
– Involve as many existing infrastructure sites and suggested scientific applications as possible
– Serve as testbed for grid interoperability
• Tackle the seed project in a two-fold way
– Build a cross-product infrastructure of selected grid middleware and applications by gridifying each application on each middleware pool in a non-intrusive manner
– Record experiences and deduce a list of requirements for selection or creation of a meta-middleware
• Handling of practical aspects
– Seed Project Working Group manages the seed project in collaboration with the other Swiss Grid Initiative members
– Rely on the help of SWITCH for security aspects (authentication and authorization, virtual organization, grid certificates, etc.)
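The "cross-product infrastructure" idea can be pictured as a matrix with one cell per (middleware, application) pair. The following is a minimal sketch of such a status matrix, not a tool used by the working group. The middleware and application names come from the initial-focus lists in this report; the function names and the "done" / "to do" labels are assumptions made for illustration.

```python
from itertools import product

# Names taken from the "initial focus" lists in this report.
MIDDLEWARE = ["gLite", "NorduGrid ARC", "XtremWeb-CH", "Condor"]
APPLICATIONS = ["Cones", "GAMESS", "Huygens", "PHYLIP"]


def build_matrix(middleware, applications, done=()):
    """Return {(middleware, application): status} for every pairing.

    `done` is a hypothetical collection of pairs already gridified;
    every other pair is marked "to do".
    """
    done = set(done)
    return {
        pair: ("done" if pair in done else "to do")
        for pair in product(middleware, applications)
    }


def remaining(matrix):
    """List the pairs that still need to be gridified."""
    return [pair for pair, status in matrix.items() if status != "done"]


# Pairs that this report describes as deployed and tested.
tested = [
    ("NorduGrid ARC", "Cones"),
    ("NorduGrid ARC", "GAMESS"),
    ("XtremWeb-CH", "PHYLIP"),
]
matrix = build_matrix(MIDDLEWARE, APPLICATIONS, done=tested)
print(len(matrix), "pairs,", len(remaining(matrix)), "still to do")
```

With four middleware pools and four applications the matrix has 16 cells, which gives a rough sense of the setup and testing effort the approach implies.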

Seed Project Definition

Selection of Middleware

• Criteria
– Already deployed at partner sites
– Sufficient expertise and manpower
– Representative of existing larger grid efforts
– Not too complex requirements
– Must be diverse and provide sufficient set of capabilities
• Initial focus
– EGEE gLite / Globus Toolkit 2 (deployed at CHIPP, CSCS, SWITCH, UniBas, SIB) - responsible: Heinz Stockinger, SIB
– NorduGrid ARC (deployed at CSCS, SIB/Vital-IT, UniBas, UZH) - responsible: Sergio Maffioletti, CSCS
– XtremWeb-CH (developed and deployed at EIG/HES-SO) - responsible: Nabil Abdennadher, EIG/HES-SO
– Condor (deployed at EPFL) - responsible: Pascal Jermini, EPFL
• Later focus
– United Devices (deployed at UniBas and others)
– Globus Toolkit 4 / WSRF (pre-WS components deployed at CSCS, UZH)
– UNICORE (deployed at UZH and others)

Selection of Applications

• Criteria
– Need for application from the Swiss scientific user community
– Sufficient expertise and manpower
– Not too complex requirements
– Gridification on basis of individual executions or embarrassingly parallel parameter scans, without changing the source code if possible
– Should be diverse and cover sufficient set of requirements
– Reuse of existing grid-enabled applications
• Initial focus
– Cones (mathematical crystallography, individual code) - responsible: Peter Engel, UniBE
– GAMESS (quantum chemistry, standard free open source code) - responsible: Wibke Sudholt, UZH
– Huygens (remote deconvolution for imaging, standard commercial code) - responsible: Dean Flanders, FMI
– PHYLIP (bioinformatics, standard free open source code) - responsible: Nabil Abdennadher, EIG/HES-SO
• Later focus
– Mascot (proteomics analysis, standard commercial code) - responsible: Dean Flanders, FMI
– Monte Carlo simulation (high-energy physics) - responsible: Derek Feichtinger, PSI
– Swiss Bio Grid applications

Work towards Realization

• Wiki pages and documents
– Middleware and application requirements lists - done
– Middleware and application information - in progress
– Project plan and status - in progress
• Building of middleware pools
– Test infrastructure - in progress
– Production infrastructure - to do
• Preparation of applications
– Functional and “real-life” test cases - in progress
– Grid-enabling of codes - in progress
– Small program library for input and output processing - in progress

• Focus on collaboration, infrastructure and knowledge building, not on achieving scientific results

Middleware: EGEE gLite

• Responsible person: Heinz Stockinger, SIB
• The EGEE middleware provides software tools for secure job submission, data management, etc.
– Deployed in most European countries
– Biggest grid infrastructure world-wide
• gLite homepage: http://www.glite.org/
• Deployment status in Switzerland
– SWITCH
  • Resource Broker, VO management services, etc.
  • Locally (behind firewall): Computing Element, Worker Node, gLite clients
– CSCS
  • Computing Element
  • Storage Element
  • gLite client software
– SIB Lausanne
  • Client software (LCG version)

gLite (cont.)

• A Virtual Organisation called SGA has been created
– Needs to be made available at CSCS and SIB
• Currently, the three sites have different versions of middleware due to different activities in EGEE
– Versions will be adapted soon
– SIB is planning to provide an additional gLite client machine
• Job submission to gLite is already possible using existing VOs such as CMS, biomed
– Gives access to ~50-100 sites per VO
– User certificate registration with EGEE is required

Middleware: NorduGrid ARC

• Responsible person: Sergio Maffioletti, CSCS

• Grid middleware development and testbed deployment project in the Nordic countries

• NorduGrid middleware is ARC (Advanced Resource Connector)
– Enables production-quality computational and data grids
– Open source under GPL license
– Uses replacements and extensions of Globus Toolkit pre-WS services

• NorduGrid homepage: http://www.nordugrid.org/

NorduGrid (cont.)

• NorduGrid ARC middleware deployed in Switzerland as part of the Swiss Bio Grid project

• Status of seed project:
– Installed and set up at CSCS and UZH
– Cones and GAMESS applications deployed and tested
• To do:
– Deploy and test other applications
– Integrate other NorduGrid sites

Middleware: XtremWeb-CH

• Responsible person: Nabil Abdennadher, HES-SO
• XtremWeb-CH is a desktop grid middleware
– Public (non-dedicated) platform
– Supports communicating “jobs” and direct communications between “providers” (workers)
– Can fix the “granularity” of the application according to the “state” of the platform
• XtremWeb-CH homepage: http://www.xtremwebch.net/
• XtremWeb-CH Wiki page: http://www.xtremwebch.net/mediawiki/index.php/Main_Page
• Deployed applications
– PHYLIP: PHYLogeny Inference Package
• XtremWeb-CH today
– ~200 workers (mainly Windows platforms)
– 2 sites: EIG (Geneva) and HEIG-VD (Yverdon)

XtremWeb-CH: Architecture

[Figure: XtremWeb-CH architecture diagram. Labels: Web Service, XtremWeb-CH Coordinator, User Application, OS, Binaries, XML file, Work request, Work result, Scheduler, Worker’s manager, Task’s manager, Warehouse, Worker, Work alive, Data, XWCH application structure (tasks A, B, C), Brokers, XWCH DB.]

Middleware: Condor

• Responsible person: Pascal Jermini, EPFL
• Condor homepage: http://www.cs.wisc.edu/condor/
• Greedy@EPFL: http://greedy.epfl.ch/
• Condor provides the infrastructure for desktop grids
– Job queue management
– Resource management
– Data and binaries transfer to the compute nodes
– Promotes fair sharing of computing resources
– Multi-platform (Linux, Windows, Mac OS X, some other UNIX variants)
• Can be interfaced with other middleware such as UNICORE or Globus
• Middleware still in active development (i.e., project not dead)

Condor (cont.)

• Deployment status at EPFL
– In production with approximately 200 desktop CPUs available
  • 60% Windows machines, 40% Linux or OS X machines
  • Computing power available only during the night and weekends. Machine owner has priority over any running job.
  • Size of the pool is still growing
– One submit server and one Central Manager (running Linux)
– All nodes and servers behind EPFL firewall
– Condor managers generally have no access to the compute nodes for third-party software installation (Condor is installed by node owners, not by Grid managers!)
– Smaller grid (4 very old nodes) also available, but only for tests; restricted to Grid managers, but with full access to them.
• Due to the «desktop» nature of the EPFL grid, relatively short jobs are advised (6h max.; not enforced)
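Because third-party software cannot be pre-installed on the EPFL nodes, each job has to carry its executable and input data with it. As a rough illustration, the sketch below generates a Condor submit description that queues one vanilla-universe job per input file. It is not the EPFL configuration: the helper name, the file names, and the choice of one job per input file are assumptions, and only standard submit-file commands are used.

```python
def condor_submit_description(executable, input_files, log="scan.log"):
    """Build a Condor submit description with one queued job per input file.

    The executable and file names are placeholders chosen for illustration.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Ship files to the node, since nothing is installed there.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"log = {log}",
        "",
    ]
    for index, name in enumerate(input_files):
        lines += [
            f"arguments = {name}",
            f"transfer_input_files = {name}",
            f"output = job_{index}.out",
            f"error = job_{index}.err",
            "queue",
            "",
        ]
    return "\n".join(lines)


text = condor_submit_description("run_app.sh", ["part_0000.txt", "part_0001.txt"])
print(text)
```

The resulting text would be saved to a file and passed to `condor_submit`. Keeping each input small is one way to stay within the advised 6h job length.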

Application: Cones

• Responsible person: Peter Engel, UniBE (with help of Nadya Williams, UZH/CSCS)

• Cones is a crystallography program. For a given representative quadratic form it calculates its subcone of equivalent combinatorial parallelohedra.

• For dimension d = 6, the number is expected to be greater than 100’000’000

• Cones Wiki page: https://twiki.cscs.ch/bin/view/SwissGridInitiative/CONES

Cones (cont.)

• C program developed by an individual

• Several text input files, one execution command, several text output files

• Can be executed in parallel in two ways
– Running of several jobs off the same input file (best suited for cluster infrastructure, less than 24 nodes)
– Cutting of input file into pieces (best suited for grid infrastructure, 500-2000 cones per input)
• Status of Cones deployment and testing
– Wiki page - in progress
– Adaptation and generalization of source code - done
– Configuration and makefile creation - done
– Test installation at CSCS and UZH - done
– Test runs and comparison with known input - done
– Remote job submission using NorduGrid testbed at CSCS and UZH - done
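The input-splitting option above amounts to cutting a long list of records into pieces of bounded size, each of which becomes an independent grid job. Here is a minimal sketch of that step. The Cones input format is not described in this report, so the example treats the input as a plain list of records, and the function name and dummy data are assumptions.

```python
def split_records(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        records[start:start + chunk_size]
        for start in range(0, len(records), chunk_size)
    ]


# Stand-in data: 4500 dummy records instead of real Cones input.
records = [f"record {n}" for n in range(4500)]
chunks = split_records(records, 2000)
print([len(chunk) for chunk in chunks])  # [2000, 2000, 500]
```

A chunk size of 2000 is the upper end of the range quoted on the slide; smaller chunks give more, shorter jobs.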

Application: GAMESS

• Responsible person: Wibke Sudholt, UZH (with help of Nadya Williams, UZH/CSCS)

• General Atomic and Molecular Electronic Structure System

• Program package for ab initio molecular quantum chemistry

• Standard free open source code developed and used by many groups

• Available for large variety of operating systems and hardware

• Mainly Fortran 77 and C code and shell scripts

• GAMESS homepage: http://www.msg.ameslab.gov/GAMESS/

• GAMESS Wiki page: https://twiki.cscs.ch/bin/view/SwissGridInitiative/GAMESS

• Usually one keyword-driven text input file, one execution command, several text output files

• Well parallelized by its own implementation, called Distributed Data Interface (DDI)

• Comes with lots of functional test cases

GAMESS (cont.)

• Two possible layers of grid distribution
– External: Embarrassingly parallel parameter scans in input file (already previously implemented with Nimrod and BOINC)
– Internal: Component distribution based on current DDI parallelization implementation (probably needs considerable programming efforts)
• Status of seed project
– Wiki page - ongoing
– Configuration and makefile creation - in progress
– Test installation at CSCS and UZH - done
– Test runs and comparison with known input - done
– Remote job submission using NorduGrid testbed at CSCS and UZH - done
– Collection of “real life” scientific test cases - ongoing (available on SVN)
– Development of small Java program and library for creating parameter scan input files and commands - ongoing (available on SVN)
– Integration with XtremWeb-CH - starting
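The "external" layer can be illustrated by generating one input file per combination of scanned parameter values, so that each file becomes an independent job. The sketch below shows the idea in Python. It is not the working group's Java library, and the template is only a placeholder modeled on GAMESS's keyword-group style that has not been validated as real input. The function name, the parameter names, and the scanned values are all assumptions.

```python
from itertools import product

# Placeholder template in a keyword-driven style; not a validated input.
TEMPLATE = """\
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY COORD=CART $END
 $BASIS GBASIS=STO NGAUSS={ngauss} $END
 $DATA
H2 scan, r = {distance:.2f} Angstrom
C1
H 1.0 0.0 0.0 0.0
H 1.0 0.0 0.0 {distance:.2f}
 $END
"""


def scan_inputs(template, **axes):
    """Yield (job_name, input_text) for every combination of parameter values."""
    names = sorted(axes)
    for values in product(*(axes[name] for name in names)):
        params = dict(zip(names, values))
        job_name = "_".join(f"{name}{params[name]}" for name in names)
        yield job_name, template.format(**params)


jobs = dict(scan_inputs(TEMPLATE, distance=[0.6, 0.7, 0.8], ngauss=[3, 6]))
print(len(jobs))        # 6 independent jobs
print(sorted(jobs)[0])  # distance0.6_ngauss3
```

Since the jobs share no state, they can be submitted to any of the middleware pools without changing the application's source code, which is the property the seed project relies on.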

Application: Huygens

• Responsible: Dean Flanders, FMI
• Huygens is an image deconvolution software developed and distributed by Scientific Volume Imaging
• It can be used for the restoration, visualization, and analysis of microscopy images
• Standard commercial software, but parts available as freeware
• Scientific Volume Imaging homepage: http://www.svi.nl/
• Five node-locked licenses available at Friedrich Miescher Institute, not always fully used
• Status of seed project
– No known progress up to now
• To do
– Probably agreement with Scientific Volume Imaging needed for grid use, but company usually very collaborative

Application: PHYLIP

• Responsible person: Nabil Abdennadher, HES-SO

• PHYLogeny Inference Package
• Used to generate “life” trees (evolutionary trees)
• The most widely-distributed phylogeny package
• In distribution since 1980, 15’000 users
• PHYLIP homepage: http://evolution.genetics.washington.edu/phylip.html

• PHYLIP Wiki page: https://twiki.cscs.ch/bin/view/SwissGridInitiative/PHYLIP

[Figure: “Life” tree]

PHYLIP (cont.)

• A package of programs (~34)
– Source code (C) and executables (Win, Mac OS, Linux) are available
– Input data are read into the program from a text file
– Output data are written onto text files
• Data types
– DNA sequences
– Protein sequences
– Etc.
• Methods available
– Parsimony
– Distance matrix
– Likelihood methods
– Bootstrapping and consensus trees

• Already deployed on XtremWeb-CH

Some Lessons Learned

• There is considerable interest and expertise in grid collaboration in Switzerland
• Consequences of diverse seed project middleware and application selection
– Setup and testing more complex and heterogeneous
– More knowledge gain, participation, and collaboration of people
– Requirements, procedures, and results can be better abstracted and generalized
• Middleware architecture and security differ considerably between computational grid tools (gLite, NorduGrid) and desktop grid tools (XtremWeb-CH, Condor)
• Applications can be distributed onto a grid infrastructure at two different levels
– External: Parameter scans, input splitting or other “wrapper” tasks, usually embarrassingly parallel, often corresponding to how users apply a code (focus of seed project)
– Internal: Directed towards tightly-coupled parallel computer systems, requiring implementation on the source code level and balance of parallel tasks, often performing inter-process communication, usually transparent to the application user
• Application gridification usually needs direct cooperation between scientific developers and grid experts
• Suggestions for grid project management
– Dedicated partners, regular team meetings, reaching of consensus, and conscious project steering are important
– Selecting a responsible person for each middleware and application tool is a good idea to bundle knowledge and ease communication
– Considerable investment of people's time, and thus money, is expected to reach production state

Further Plans

• Finishing of the seed project
– Continue with current approach, potentially after revising middleware and application lists
– Each application should run on each middleware pool
– Each middleware pool should consist of at least two partner sites
– Completion planned by summer 2007
• Documentation and communication
– Regular meetings and reports about status and results
– Documentation about middleware and application setup
– Recording of the lessons learned
– Requirements lists and recommendations for meta-middleware and production infrastructure
– Publication at a conference and/or in an article
• Continuation of work
– Transfer into sustainable production infrastructure for grid computing in Switzerland
– Extension to further middleware and applications (e.g., UNICORE)
– Inclusion of data grid features
– Selection or development of meta-middleware to integrate different middleware pools (e.g., ISS)
– Collaboration with other national and international grid projects
– Transfer of Seed Project Working Group into other working groups

Questions to the Audience

• Have we taken the right approach?
• Do you agree with the seed project scope?
• Is there other grid middleware we should consider?
• Are there other scientific applications we should consider?
• Would anybody else like to participate and contribute?
• How should we document and communicate our status and results?
• Do you agree with the seed project timeline?
• What should happen after the end of the seed project?
• How to transfer the seed project into a sustainable production infrastructure?
• How to fund these efforts?
• What are the requirements for data grid features?
• How to select or develop a meta-middleware?
• How to transfer the Seed Project Working Group into other working groups?
• What should happen with the survey data?
• Do you have any other ideas?

Thanks

Discussion

Tentative Agenda for Working Group Session

• 15:20-16:00: Individual and informal 5 min presentations about each of the middleware and application tools by the corresponding responsible people. Mainly so that we all better understand each other's tools
– gLite, NorduGrid, XtremWeb-CH, Condor
– Cones, GAMESS, Huygens, PHYLIP
• 16:00-17:00: Time for informal discussion. Some ideas
– Responses to feedback received from the audience in the early afternoon
– Technical discussions within the middleware pools and with the application drivers about setup and testing plans and problems
– Potential changes/additions/deletions on the middleware and application lists
– Potential setup of an additional UNICORE/ISS-based middleware pool
– Further timeline of the Seed Project
– Documentation and publication of results
– Anything else you would like to discuss in person