HYDRA: Using Windows Desktop Systems in Distributed
Parallel Computing
Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve
Simms, Adam Sweeny, Peng Wang
University Information Technology Services, Indiana University
SC'05, Seattle, WA
Problem Description
Turn Windows desktop systems at IUB student labs into a scientific resource.
  2300 systems, 3-year replacement cycle
  1.5 Teraflops
  Fast Ethernet or better
Harvest idle cycles.
Constraints
Systems dedicated to students using desktop office applications, not parallel scientific computing
Microsoft Windows environment
Daily software rebuild
What could these systems be used for?
Many small computations and a few small messages
  Master-worker
  Parameter studies
  Monte Carlo
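As an illustration of this workload class, here is a minimal Monte Carlo sketch in Python. It is not taken from Hydra. The names `count_hits` and `estimate_pi` are hypothetical, and a local thread pool stands in for the desktop workers. Each task is a small independent computation, and the only message it sends back is a single integer.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task):
    """Worker side: one small, independent computation."""
    seed, samples = task
    rng = random.Random(seed)  # private generator, so tasks do not interact
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits  # the "small message" returned to the master


def estimate_pi(num_tasks=20, samples_per_task=5000, workers=4):
    """Master side: hand out tasks, then combine the small results."""
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, tasks))
    return 4.0 * hits / (num_tasks * samples_per_task)
```

Because every task carries its own seed, the combined estimate does not depend on how many workers happen to be available, which is the property that makes this kind of job a good fit for idle desktops.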
Assembling small ephemeral resources
Different parallel libraries have constraints of one form or another
MPI is not designed to handle ephemeral resources
Solution
Simple Message Brokering Library (SMBL)
  Limited replacement for MPI
Process and Port Manager (PPM)
… Plus …
Condor NT
  Job management
Web portal
  Job submission
The Big Picture
We’ll discuss each part in more detail next…
The shaded box indicates components hosted on multiple desktop computers
Portal
Creates and submits Condor files, handles data files
Apache-based PHP web interface
http://hydra.indiana.edu (IU users)
http://hydra.iu.teragrid.org (Teragrid users)
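The slides do not show the files the portal generates, so the following is only a sketch of what creating a Condor submit description might look like. The portal itself is PHP; Python is used here for illustration. The function name `make_submit_file` and all file names are assumptions, while the submit commands (`universe`, `executable`, `transfer_input_files`, `queue`, and so on) are standard Condor syntax.

```python
def make_submit_file(executable, arguments, input_files, num_jobs):
    """Build the text of a Condor submit description for a batch of worker jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Desktop workers share no file system, so Condor ships the files itself.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        # $(Process) expands to 0, 1, 2, ... so each job gets its own output files.
        "output = worker.$(Process).out",
        "error = worker.$(Process).err",
        "log = session.log",
        f"queue {num_jobs}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and handed to `condor_submit`; how the real portal stages user data files is not described in the slides.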
Condor
Condor for Windows NT/2000/XP
“Vanilla universe”: no support for checkpointing or parallelism
Provides:
  Security
  Match-making
  Fair sharing
  File transfer
  Job submission, suspension, preemption, restart
SMBL
In charge of message delivery for each parallel session
Client library implements selected MPI-like calls
Both server and client library based on TCP socket abstraction
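To make the brokering idea concrete, here is a small Python sketch of message relaying over stream sockets. It is not SMBL's wire format or API. The header layout (source rank, destination rank, tag, payload length) and the names `send_msg`, `recv_msg`, and `relay_one` are assumptions chosen to resemble MPI-style point-to-point calls. `socket.socketpair()` stands in for real TCP connections so the example runs in one process.

```python
import socket
import struct

# Assumed header: source rank, destination rank, tag, payload length.
HEADER = struct.Struct("!iiiI")


def _recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return fewer per call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock, source, dest, tag, payload):
    sock.sendall(HEADER.pack(source, dest, tag, len(payload)) + payload)


def recv_msg(sock):
    source, dest, tag, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return source, dest, tag, _recv_exact(sock, length)


def relay_one(inbound, outbound_by_rank):
    """Broker step: read one message and forward it to its destination rank."""
    source, dest, tag, payload = recv_msg(inbound)
    send_msg(outbound_by_rank[dest], source, dest, tag, payload)
    return dest


def demo():
    # Two clients (ranks 0 and 1), each with its own connection to the broker.
    c0, b0 = socket.socketpair()
    c1, b1 = socket.socketpair()
    try:
        send_msg(c0, 0, 1, 7, b"work unit 42")  # rank 0 sends to rank 1
        relay_one(b0, {0: b0, 1: b1})           # broker forwards it
        return recv_msg(c1)                     # rank 1 receives it
    finally:
        for s in (c0, b0, c1, b1):
            s.close()
```

Clients in this sketch never connect to each other, only to the broker, so a client that disappears affects one connection rather than the whole session.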
SMBL (Contd…): Managing Temporary Workers
SMBL server maintains a dynamic pool of client process connections
Worker job manager hides details of ephemeral workers at the application level
Porting from MPI is fairly straightforward
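One way to picture a dynamic pool of ephemeral workers is the bookkeeping sketch below. It is a hypothetical Python illustration, not SMBL's implementation: the class `EphemeralPool` and its methods are invented names, and real connections are replaced by plain worker identifiers.

```python
from collections import deque


class EphemeralPool:
    """Tracks workers that may join or vanish at any time (hypothetical sketch)."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # tasks not yet handed out
        self.in_flight = {}          # worker id -> task currently assigned
        self.results = {}            # task -> result

    def join(self, worker):
        """A worker connected: hand it a task if any remain."""
        if self.pending:
            self.in_flight[worker] = self.pending.popleft()
            return self.in_flight[worker]
        return None

    def leave(self, worker):
        """A worker vanished: requeue whatever it was working on."""
        task = self.in_flight.pop(worker, None)
        if task is not None:
            self.pending.appendleft(task)

    def complete(self, worker, result):
        """A worker returned a result: record it and hand out the next task."""
        task = self.in_flight.pop(worker)
        self.results[task] = result
        return self.join(worker)

    def done(self):
        return not self.pending and not self.in_flight
```

The application only sees tasks and results; which machine ran a task, or how many times it had to be reissued, stays inside the pool.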
Process and Port Manager (PPM)
Assigns port/host to each parallel session
Starts the SMBL server and application processes on demand
Directs workers to their servers
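The port-assignment role can be sketched as a small registry. This is an assumed design for illustration only: the class `PortManager`, its methods, and the host name and port range in the usage note are hypothetical. Starting the SMBL server process is left out.

```python
class PortManager:
    """Maps each parallel session to one (host, port) endpoint (hypothetical sketch)."""

    def __init__(self, host, first_port, last_port):
        self.host = host
        self.free = list(range(first_port, last_port + 1))
        self.sessions = {}  # session id -> (host, port)

    def open_session(self, session_id):
        """Assign an endpoint; repeated calls for one session return the same one."""
        if session_id in self.sessions:
            return self.sessions[session_id]
        if not self.free:
            raise RuntimeError("no free ports")
        endpoint = (self.host, self.free.pop(0))
        self.sessions[session_id] = endpoint
        return endpoint

    def lookup(self, session_id):
        """What a newly started worker asks: where is my session's server?"""
        return self.sessions[session_id]

    def close_session(self, session_id):
        """Return the port to the free list when the session ends."""
        _, port = self.sessions.pop(session_id)
        self.free.append(port)
```

For example, `PortManager("broker.example.org", 5000, 5009).open_session("job-1")` would return `("broker.example.org", 5000)`, and every worker started for `job-1` would get the same answer from `lookup`.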
Once again … the big picture
The shaded box indicates components hosted on multiple desktop computers
System Layout
PPM, SMBL server and web portal running on a Linux server: dual Intel Xeon 1.7 GHz, 2 GB memory and GigE interconnect
STC Windows worker machines: a mix of operating systems (Windows 2000/XP) and network interconnect speeds (GigE/100 Mbps/10 Mbps)
Applications
FastDNAml-p
  Parallel application, master-worker model, small granularity of work
  Provides generic interface for parallel communication library (MPI, PVM, SMBL)
  Reliability built in: Foreman process copes with delayed or lost workers
Blast
Meme
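The slides do not say how the foreman detects delayed or lost workers. One common approach, shown here purely as an assumption, is a per-task deadline: a task with no answer after `timeout` time units is handed out again, and the first answer to arrive wins. The class `Foreman` and its methods are hypothetical names, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class Foreman:
    """Reissues tasks whose workers are late or gone (hypothetical sketch)."""

    def __init__(self, tasks, timeout):
        self.timeout = timeout
        self.todo = list(tasks)  # tasks waiting to be dispatched
        self.outstanding = {}    # task -> time it was dispatched
        self.results = {}        # task -> first result received

    def next_task(self, now):
        """Return a task for an idle worker, reclaiming overdue ones first."""
        for task, sent in list(self.outstanding.items()):
            if now - sent > self.timeout:
                del self.outstanding[task]
                self.todo.append(task)
        if not self.todo:
            return None
        task = self.todo.pop(0)
        self.outstanding[task] = now
        return task

    def report(self, task, result):
        """Record a result; late duplicates of a finished task are ignored."""
        if task not in self.results:
            self.results[task] = result
        self.outstanding.pop(task, None)
        if task in self.todo:
            self.todo.remove(task)
```

A slow worker costs some duplicated effort under this scheme but never blocks the run, which suits tasks of small granularity.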
Portal
Applications – FastDNAml
FastDNAml-p Performance
[Figure: logarithmic vertical axis (1 to 1000000) plotted against Number of Processors (0 to 300); legend: Research SP, Condor, Cluster]
Other Applications – Parallel MEME
Other Applications – BLAST
Utilization of Idle Cycles
Red: total owner
Blue: total idle
Green: total Condor
Recent Development
Hydra cluster Teragrid’ized! (Oct 2005)
  Allow TG users to use resource
  Virtual Host based solution: two different URLs for IU and Teragrid users
  Teragrid users authenticate against different Kerberos server (PSC)
Still to-do
  Usage accounting
Work in Progress/Future Direction
Once again … Teragrid’ization of Hydra cluster
  Usage accounting: report usage by Teragrid users
New portal: JSR 168 compliant, certificate-based authentication capability
Range of applications: virtual machines, and so forth
Summary
Large parallel computing facility created at very low cost
SMBL: parallel message passing library that can deal with ephemeral resources
PPM: port broker that can handle multiple parallel sessions
SMBL (Open Source) home: http://smbl.sourceforge.net
Links and References
Hydra Portal
  http://hydra.indiana.edu (IU users)
  http://hydra.iu.teragrid.org (Teragrid users)
SMBL home page: http://smbl.sourceforge.net
Condor home page: http://www.cs.wisc.edu/condor/
IU Teragrid home page: http://iu.teragrid.org
Parallel FastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
Blast: http://www.ncbi.nlm.nih.gov/BLAST
Meme: http://meme.sdsc.edu/meme/intro.html