![Page 1: Distributed computing at the Facility level: applications and attitudes Tom Griffin STFC ISIS Facility tom.griffin@stfc.ac.uk NOBUGS 2008, Sydney](https://reader030.vdocuments.mx/reader030/viewer/2022020117/56649cf85503460f949c9808/html5/thumbnails/1.jpg)
Distributed computing at the Facility level: applications and attitudes
Tom Griffin
STFC ISIS Facility
NOBUGS 2008, Sydney
Spare cycles
• Typical PC CPU usage is about 10%
• Usage minimal 5pm – 8am
• Most desktop PCs are really fast
• Waste of energy
• How can we use (“steal?”) unused CPU cycles to solve computational problems?
Types of Application
• CPU intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?
Distributed computing solutions
Lots of choice: CONDOR, GridEngine, GridMP…
• Grid MP server hardware
  • Two dual-Xeon 2.8 GHz servers, RAID 10
• Software
  • Servers run Red Hat Enterprise Linux Server / DB2
  • Unlimited Windows (and other) clients
• Programming
  • Web Services interface – XML, SOAP
  • Accessed with C++, Java, C#
• Management console
  • Web browser based
  • Can manage services, jobs, devices etc.
• Large industrial user base
  • GSK, J&J, Novartis etc.
Installing and Running Grid MP
Server installation
• 2 hours
Client installation
• Create MSI and RPM using ‘setmsiprop’
• 30 seconds
Manual install
• Better security on Linux and Macs
Adapting a program for GridMP
1) Think about how to split your data
2) Wrap your executable
3) Write the application service
• Pre and post processing
• Fairly easy to write
• Interface to grid via Web Services
• C++, Java, C#
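Step 1 above (splitting the data) can be sketched as follows. This is a minimal illustration, assuming the job consists of independent runs that can be grouped into fixed-size workunits; the class name `WorkSplitSketch`, the method `Split` and its parameters are hypothetical and are not part of the Grid MP API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Grid MP: groups independent runs into workunits.
public static class WorkSplitSketch
{
    // Each workunit covers the runs [FirstRun, FirstRun + Count).
    public static List<(int FirstRun, int Count)> Split(int totalRuns, int runsPerWorkunit)
    {
        if (totalRuns < 0)
            throw new ArgumentOutOfRangeException(nameof(totalRuns));
        if (runsPerWorkunit <= 0)
            throw new ArgumentOutOfRangeException(nameof(runsPerWorkunit));

        var units = new List<(int FirstRun, int Count)>();
        for (int first = 0; first < totalRuns; first += runsPerWorkunit)
        {
            // The last workunit may be smaller than the others.
            int count = Math.Min(runsPerWorkunit, totalRuns - first);
            units.Add((first, count));
        }
        return units;
    }
}
```

For example, `WorkSplitSketch.Split(64, 4)` gives 16 workunits of 4 runs each. Smaller workunits spread better across many PCs, at the cost of more upload and scheduling overhead per run.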
Package your executable
Program / Module / Executable
Uploaded to, and resident on, the server
• Executable
• DLLs
• Standard data files
• Environment variables
Compress? Encrypt?
Create / run a job
[Diagram: packages Pkg1, Pkg2, Pkg3 and Pkg4 in the Molecules and Proteins datasets; client side and server side linked over https://]
• Create job, generate cross product
• Datasets become workunits
• Start job
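The cross product step can be illustrated with a short sketch: every item in one dataset is paired with every item in the other, and each pair becomes one workunit. The names below (`CrossProductSketch`, `Pair`) are hypothetical, and which packages belong to which dataset in the usage note is an assumption, since the diagram does not survive in the transcript. This is not the Grid MP implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of a dataset cross product; not Grid MP code.
public static class CrossProductSketch
{
    // Pairs every molecule package with every protein package.
    // Each resulting pair stands for one workunit.
    public static List<(string Molecule, string Protein)> Pair(
        IEnumerable<string> molecules, IEnumerable<string> proteins)
    {
        return (from m in molecules
                from p in proteins
                select (Molecule: m, Protein: p)).ToList();
    }
}
```

With two packages in each dataset, for instance `Pair(new[] { "Pkg1", "Pkg2" }, new[] { "Pkg3", "Pkg4" })`, the result is four workunits. The workunit count is the product of the dataset sizes, so it grows quickly as datasets get larger.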
Code examples
// Create a job for the chosen application and register it with the server
Mgsi.Job job = new Mgsi.Job();
job.application_gid = app.application_gid;
job.description = txtJobName.Text.Trim();
job.state_id = 1;
job.job_gid = ud.createJob(auth, job);

// Describe a job step that runs the packaged program
Mgsi.JobStep js = new Mgsi.JobStep();
js.job_gid = job.job_gid;
js.state_id = 1;
js.max_concurrent = 1;
js.max_errors = 20;
js.num_results = 1;
js.program_gid = prog.program_gid;
Code examples
// Create a data set belonging to the job
Mgsi.DataSet ds = new Mgsi.DataSet();
ds.job_gid = job.job_gid;
ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
ds.data_set_gid = ud.createDataSet(auth, ds);

// Upload the input file and record one data item per workunit
// (the declaration of the datas array is not shown on the slide)
for (int i = 1; i <= numWorkunits.Value; i++)
{
    FileTransfer.UploadData uploadD = ft.uploadFile(auth, Application.StartupPath + "\\testdata.tar");
    Mgsi.Data data = new Mgsi.Data();
    data.data_set_gid = ds.data_set_gid;
    data.index = i;
    data.file_hash = uploadD.hash;
    data.file_size = long.Parse(uploadD.size);
    datas[i - 1] = data;
}
ud.createDatas(auth, datas);

// Generate the workunits for the job step from the data set
ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid, new string[] { ds.data_set_gid }, options);
Performance
Famotidine form B
• 13 degrees of freedom
• P2₁/c, V = 1421
• Synchrotron data to 1.64 Å
• 1 × 10⁷ moves per run, 64 runs
Standard DASH
• 2.4 GHz Core2 Quad, using a single core
Job complete = 9 hrs
Gdash submit to test grid of 5 in-use PCs
• 4 × 2.4 GHz Core2 Quad
• 1 × 2.8 GHz Core2 Quad
Job complete = 24 minutes
Speedup = 22.5 x
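The quoted speedup follows directly from the two elapsed times on this slide (9 hours on a single core, 24 minutes on the test grid):

```latex
\[
\text{speedup}
  = \frac{t_{\text{single core}}}{t_{\text{grid}}}
  = \frac{9\ \text{h} \times 60\ \text{min/h}}{24\ \text{min}}
  = \frac{540}{24}
  = 22.5
\]
```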
Performance – 999 SA runs, full grid
[Plot: workunits and time]
317 cores from 163 devices
• 42 Athlons: 1.6–2.2 GHz
• 168 Core 2 Duos: 1.8–3 GHz
• 36 Core 2 Quads: 2.4–2.8 GHz
• 1 Duron @ 1.2 GHz
• 42 P4s: 2.4–3.6 GHz
• 27 Xeons: 2.5–3.6 GHz
4 days 18 hours CPU in ~40 minutes elapsed time
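As a rough check on these figures, the ratio of total CPU time to elapsed time can be worked out as below. The elapsed time is quoted as approximately 40 minutes and the CPU time is summed over cores of very different speeds, so the result is only indicative.

```latex
\[
4\ \text{d}\ 18\ \text{h} = 114\ \text{h} = 6840\ \text{min},
\qquad
\frac{6840\ \text{min}}{40\ \text{min}} \approx 171,
\qquad
\frac{171}{317\ \text{cores}} \approx 0.54
\]
```

On these numbers the grid delivered roughly 171 core-equivalents of work, a little over half of the 317 cores recruited.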
A Particular Success - McStas
HRPD supermirror guide design
Complex design
Meaningful simulations take a long time
Want to try lots of ideas
Many runs of >200 CPU days
Simpler model was best value
Massive improvement in flux
Significant cost savings
Problems
McStas
Interactions in the wild
Symantec Anti-Virus
Did not show up in testing
McStas restricted to night running only
User Attitudes
A range
Theft
“I’m not having that on my machine”
First thing to get blamed
Gaining more trust
Evangelism by users
Flexibility with virtualisation
Request to run ‘GARefl’ code
ISIS is Windows based
Few Linux PCs
VMware Server is freeware
8 Hosts gave 26 cores
More cores = more demand
56 real cores recruited from servers, 64-core Beowulf
10 Mac cores
Run Linux as a job
Flexibility with virtualisation
The Future
Grid growing in power every day
• New machines added, old ones still left on
Electricity
• Energy saving drive at STFC – switch machines off
• Wake-on-LAN ‘Magic Packets’ + remote hibernate
Laptops
• Good or bad?
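The Wake-on-LAN idea mentioned above relies on a fixed packet layout: six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually sent as a UDP broadcast. The sketch below builds and sends such a packet. The class and method names are hypothetical, port 9 is a common convention rather than something stated in the talk, and this is not the mechanism actually deployed at ISIS.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical sketch of a Wake-on-LAN sender; names are illustrative only.
public static class WakeOnLanSketch
{
    // Builds the 102-byte magic packet: 6 x 0xFF, then the MAC address 16 times.
    public static byte[] BuildMagicPacket(string mac)
    {
        string[] parts = mac.Split(':', '-');
        if (parts.Length != 6)
            throw new ArgumentException("MAC address must have six octets", nameof(mac));

        byte[] macBytes = new byte[6];
        for (int i = 0; i < 6; i++)
            macBytes[i] = Convert.ToByte(parts[i], 16);

        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++)
            packet[i] = 0xFF;
        for (int rep = 0; rep < 16; rep++)
            Buffer.BlockCopy(macBytes, 0, packet, 6 + rep * 6, 6);
        return packet;
    }

    // Broadcasts the packet over UDP (port 9 assumed by convention).
    public static void Send(string mac, int port = 9)
    {
        byte[] packet = BuildMagicPacket(mac);
        using (var udp = new UdpClient())
        {
            udp.EnableBroadcast = true;
            udp.Send(packet, packet.Length, new IPEndPoint(IPAddress.Broadcast, port));
        }
    }
}
```

A scheduler could call `WakeOnLanSketch.Send("00-11-22-33-44-55")` (a placeholder address) before dispatching work to a sleeping PC, provided Wake-on-LAN is enabled in that machine's firmware and network card.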
Summary
Distributed computing: perfect for coarse-grained, CPU-intensive, ‘disk-lite’ applications.
Resources: use existing resources. Power increases with time, no need to write off assets. Scalable.
Not just faster: allows one to try different scenarios.
Virtualisation: Linux under Windows, Windows under Linux.
Green credentials: PCs are running anyway, better to utilise them. Can be powered down and up.
Acknowledgements
ISIS Data Analysis Group: Kenneth Shankland, Damian Flannery
STFC FBU IT Service Desk and ISIS Computing Group
Key users: Richard Ibberson (HRPD), Stephen Holt (GARefl)
Questions?