MyGrid: A User-Centric Approach for Grid Computing
Walfredo Cirne
Universidade Federal da Paraíba
High-Performance Computing
• High-Performance Computing means running faster than the typical machine du jour
• Unbeatable price/performance of microprocessors has killed specialized high-performance machines
• Therefore, parallelism is currently the way to do High-Performance Computing
  – Parallel supercomputers
Solving a Real Problem
• I had hundreds of thousands of independent simulations to run
• Parallel supercomputers are typically
  – hard to get access to
  – slow (too much time in the queue)
• Since my simulations were independent, I had the perfect application for the Computational Grid
Grid Computing
• Grid Computing aims to enable the execution of parallel applications over processors that are:
  – Geographically distributed
  – Under multiple administrative domains
  – Not dedicated
• The potential for resource gathering is enormous
  – “Let's run over the Internet”
Grid Applications
• Not all applications can benefit from the Grid
• Loosely coupled applications match the Grid characteristics much better than tightly coupled applications
State of the Art in Grid Computing
• Most services are provided by the Grid Infrastructure
  – Naming, remote execution/task control, security, etc.
• Scheduling is done at the application level
• Globus
• “Virtual Organizations”
Back to the Real Problem
• I had hundreds of thousands of independent simulations to run
• I was working in a top research lab in Grid Computing
• I could not manage to use the Grid
• It is hard to get the Grid Infrastructure Software installed everywhere
The Motivation for MyGrid
• Users of loosely coupled applications could benefit from the Grid now
• However, they don't run on the Grid today because the Grid Infrastructure is not widely deployed
• What if we build a solution at the user level? That is, a solution that does not depend upon installed infrastructure?
MyGrid
• MyGrid is a framework to build infrastructure-independent grid applications
• The user provides:
  – A description of her Grid
  – A way to do remote execution and file transfer
  – “The application”
• MyGrid provides:
  – Grid abstractions
  – Scheduling
MyGrid Goals
• open = does not require a particular infrastructure
• self-installable = does not require manual installation on a given machine
• extensible = simple to add refinements
• complete = cover the whole production cycle
MyGrid Concepts
• Job = set of independent tasks
  – Tasks have three pieces: init, remote and final
• Home machine vs. Grid machine
• Grid abstractions
  – remote execution
  – file transfer
  – playpen
  – mirroring
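The job and task concepts above map naturally onto a small data structure. The following is a minimal Python sketch, not MyGrid's own code: the class and field names are hypothetical, and the comments on where each piece runs are an assumption based on the home machine / grid machine distinction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One task with its three pieces (hypothetical representation)."""
    init: str    # assumed to run on the home machine, e.g. to stage files in
    remote: str  # assumed to run on the grid machine
    final: str   # assumed to run on the home machine, e.g. to collect results


@dataclass
class Job:
    """A job is a set of independent tasks, so order does not matter."""
    tasks: List[Task] = field(default_factory=list)


job = Job()
job.tasks.append(Task(init="init", remote="remote1", final="collect"))
job.tasks.append(Task(init="init", remote="remote2", final="collect"))
print(len(job.tasks), "independent tasks")
```

Because the tasks are independent, a scheduler is free to run them in any order and on any machine.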
Defining My Personal Grid
bagre.dsc.ufpb.br
  dsc, linux
  ssh %machine %command
  scp %localdir/%file %machine:%remotedir
  scp %machine:%remotedir/%file %localdir
traira.dsc.ufpb.br
  dsc, linux
  ssh %machine %command
  scp %localdir/%file %machine:%remotedir
  scp %machine:%remotedir/%file %localdir
quidam.ucsd.edu
  cse, linux
  ssh %machine %command
  scp %localdir/%file %machine:%remotedir
  scp %machine:%remotedir/%file %localdir
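The ssh and scp entries in the grid description read as command templates with %-placeholders. A minimal Python sketch of how such templates could be filled in, assuming plain textual substitution; the `expand` helper and the example values (command, file, and directory names) are hypothetical and not part of MyGrid:

```python
def expand(template: str, **values: str) -> str:
    """Replace each %name placeholder with its value (hypothetical helper)."""
    # Substitute longer names first so that a placeholder which is a
    # prefix of another one cannot clobber it.
    for name in sorted(values, key=len, reverse=True):
        template = template.replace("%" + name, values[name])
    return template


# Templates copied from the grid description; the values are made up.
REMOTE_EXEC = "ssh %machine %command"
PUT_FILE = "scp %localdir/%file %machine:%remotedir"
GET_FILE = "scp %machine:%remotedir/%file %localdir"

exec_cmd = expand(REMOTE_EXEC, machine="bagre.dsc.ufpb.br", command="hostname")
put_cmd = expand(PUT_FILE, machine="bagre.dsc.ufpb.br",
                 localdir=".", file="Fat.class", remotedir="/tmp")
print(exec_cmd)
print(put_cmd)
```

Keeping the commands as user-supplied templates is what lets the same framework work with any remote execution and file transfer mechanism the user already has.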
Factoring with MyGrid
• Fatora n generates tasks, init, remotei (remote1, remote2, …), and collect
• User runs mygrid.ui.AddTask < tasks
• tasks
  task:
  init= init
  remote= remote1
  final= collect
  processor= linux
  playpensize= 0
  cost = 1
  task:
  init= init
  remote= remote2
  …
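A task description like the one above is easy to generate programmatically. The sketch below is an illustration only: it assumes one attribute per line and reuses the attribute names and values from the slide, while the `render_task` and `render_job` helpers are hypothetical and are not the actual Fatora program.

```python
def render_task(remote: str, init: str = "init", final: str = "collect",
                processor: str = "linux", playpensize: int = 0,
                cost: int = 1) -> str:
    """Render one task entry, one attribute per line (assumed layout)."""
    lines = [
        "task:",
        f"init= {init}",
        f"remote= {remote}",
        f"final= {final}",
        f"processor= {processor}",
        f"playpensize= {playpensize}",
        f"cost = {cost}",
    ]
    return "\n".join(lines) + "\n"


def render_job(num_tasks: int) -> str:
    """All tasks share init and collect; only the remote piece differs."""
    return "".join(render_task(f"remote{i}") for i in range(1, num_tasks + 1))


description = render_job(2)
print(description, end="")
```

The resulting text could then be saved to a file and fed to the AddTask command on its standard input.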
Factoring with MyGrid
• init
  java mygrid.ui.MyGridUI p $PROC ./Fat.class $PLAYPEN
• remote1
  java Fat 3 18655 34789789798 output-$TASK
• remote2
  java Fat 18655 37307 34789789798 output-$TASK
• collect
  java mygrid.ui.MyGridUI g $PROC "" $PLAYPEN output-$TASK .
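The remote1 and remote2 commands each search a slice of candidate divisors. How Fatora chooses the slices is not shown, so the Python sketch below is a guess: it assumes 10 tasks, candidates starting at 3, and an upper bound at the integer square root of the number. Those assumptions were picked because they reproduce the two ranges on the slide; the `split_divisor_range` helper is hypothetical.

```python
import math


def split_divisor_range(n: int, num_tasks: int, first: int = 3):
    """Split candidate divisors [first, isqrt(n)] into equal chunks."""
    limit = math.isqrt(n)
    # Integer ceiling division, so the last chunk reaches past the limit.
    chunk = -(-(limit - first) // num_tasks)
    return [(first + i * chunk, first + (i + 1) * chunk)
            for i in range(num_tasks)]


N = 34789789798
ranges = split_divisor_range(N, num_tasks=10)  # 10 tasks is an assumption
for task_id, (lo, hi) in enumerate(ranges, start=1):
    print(f"remote{task_id}: java Fat {lo} {hi} {N} output-$TASK")
```

Under these assumptions the first two ranges come out as 3 to 18655 and 18655 to 37307, matching remote1 and remote2. Each range is independent of the others, which is what makes the problem a bag of tasks.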
Running a MyGrid Task
[Figure: a Home Machine (Task Manager, User Agent Server, home tasks) and a Grid Machine (User Agent Daemon, grid task). Labeled steps: add-task (1); (2); remote exec (3); playpen, file xfer, and remote exec (3a); (3b); (3c); task-done (4).]
User Agent
• User Agent provides the grid abstractions
• User Agent Daemon runs on grid machines
• User Agent Server runs on home machines
• The Daemon and the Server rely upon public-key cryptography to authenticate each other
Self-Installation
• We are working on having MyGrid install and start up User Agents everywhere
• The user provides a way to do remote execution and file transfer to make that possible
Scheduling in MyGrid
• Grid scheduling is application dependent and effort intensive
• Most people don't want to spend months writing good schedulers for their applications
• MyGrid provides a sensible default scheduler
  – The user can of course replace the default scheduler
Default Scheduler
• How can we provide good performance with no knowledge about the application or the current state of the Grid?
  – The key is to avoid having the job wait for a task that runs on a slow/loaded machine
• Task replication is our answer to this problem
  – Task replication is only done when the job has no other tasks waiting to run
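The effect of replicating tasks only once nothing is left to start can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not MyGrid's scheduler: every task costs the same, each machine has a fixed time per task, the least-replicated unfinished task is chosen for replication, and replicas of finished tasks are not cancelled. The `simulate` function and its parameters are hypothetical.

```python
import heapq


def simulate(num_tasks, task_times, replicate=True):
    """Return the time at which the last task of the job completes.

    task_times[m] is how long machine m needs to run any one task.
    """
    pending = list(range(num_tasks))  # tasks that have not been started
    done = set()
    replicas = {}                     # task -> number of copies started
    events = []                       # heap of (finish_time, machine, task)

    def assign(machine, now):
        if pending:
            task = pending.pop(0)
        elif replicate:
            unfinished = [t for t in replicas if t not in done]
            if not unfinished:
                return
            # Replicate the unfinished task with the fewest copies so far.
            task = min(unfinished, key=lambda t: replicas[t])
        else:
            return                    # machine stays idle
        replicas[task] = replicas.get(task, 0) + 1
        heapq.heappush(events, (now + task_times[machine], machine, task))

    for machine in range(len(task_times)):
        assign(machine, 0.0)
    now = 0.0
    while len(done) < num_tasks:
        now, machine, task = heapq.heappop(events)
        done.add(task)                # the first copy to finish wins
        assign(machine, now)
    return now


times = [1.0, 1.0, 1.0, 10.0]         # the last machine is slow or loaded
with_replication = simulate(4, times, replicate=True)
without_replication = simulate(4, times, replicate=False)
print(with_replication, without_replication)
```

In this example the job no longer waits for the slow machine: a fast machine that has run out of new work re-runs the straggling task and finishes it first. The cost is some wasted cycles on copies whose results are never used.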
Preliminary Results
• During a 40-day period, we ran 600,000 simulations using 178 processors located in 6 different administrative domains widely spread in the USA
• MyGrid took 16.7 days to run the simulations
• My desktop machine would have taken 5.3 years to do so
• Speed-up is 115.8 for 178 processors
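The speed-up figure follows directly from the two running times. The short Python check below reproduces it, assuming a 365-day year; the efficiency value it prints is derived here and is not stated on the slide.

```python
DAYS_PER_YEAR = 365                 # assumption used for the conversion

desktop_days = 5.3 * DAYS_PER_YEAR  # estimated time on one desktop machine
grid_days = 16.7                    # time MyGrid took
processors = 178

speedup = desktop_days / grid_days
efficiency = speedup / processors   # fraction of ideal linear speed-up

print(f"speed-up   = {speedup:.1f}")
print(f"efficiency = {efficiency:.0%}")
```

A speed-up of about 115.8 on 178 processors corresponds to roughly 65% of ideal linear speed-up, which is plausible given that the processors were not dedicated.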
Conclusions
• Running Grid Applications at the user level is a viable strategy
• Bag-of-tasks parallel applications can currently benefit from the Grid
• Is “upperware” the way to go for new middleware development?