The eMinerals minigrid and the national grid service:
A user’s perspective
NGS169(A. Marmier)
Objectives
1. User Profile
2. Two real resources:
   - eMinerals Minigrid
   - National Grid Service
3. Practical Difficulties
4. Amateurish rambling (discussion/suggestions)
User Profile 1
Atomistic modelling community
Chemistry/physics/materials science
Potentially big users of eScience (CPU intensive, NOT data)
VASP, SIESTA, DL_POLY, CASTEP …
Want to run parallel codes
User Profile 2
Relative proficiency with Unix, mainframes, etc …
Scripting, parallel programming
Note of caution: speaker might be biased
Want to run parallel codes
eMinerals
Virtual Organisation, NERC
The eMinerals project brings together simulation scientists, applications developers and computer scientists to develop UK eScience/grid capabilities for molecular simulations of environmental issues
Grid prototype: the minigrid
eMinerals: Minigrid
3 clusters of 16 Pentiums
UCL Condor pool
Earth Science Cambridge Condor pool
SRB vaults
SRB manager at Daresbury
eMinerals: Minigrid philosophy
Globus 2
• No login possible (except one debug/compile cluster)
• No easy file transfer (have to use SRB, see later)
• Feels very ‘gridy’, but not painless
• Promotes Condor-G and home-made wrappers
eMinerals: Minigrid example
Universe = globus
Globusscheduler = lake.bath.ac.uk/jobmanager-pbs
Executable = /home/arnaud/bin/vasp-lam-intel
Notification = NEVER
transfer_executable = true
Environment = LAMRSH=ssh -x
GlobusRSL = (job_type=mpi)(queue=workq)(count=4)(mpi_type=lam-intel)
Sdir = /home/amr.eminerals/run/TST.VASP3
Sget = INCAR,POTCAR,POSCAR,KPOINTS
Sget = OUTCAR,CONTCAR
SRBHome = /home/srbusr/SRB3_3_1/utilities/bin
log = vasp.log
error = vasp.err
output = vasp.out
Queue
My_condor_submit script example
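The script itself is not reproduced in the transcript, but a minimal sketch can be pieced together from the SRB-aware directives shown on the previous slide (Sdir, Sget, SRBHome); the exact My_condor_submit keyword set and the paths below are assumptions, not the original example.

# minimal My_condor_submit-style input, assumed from the directives above
Universe        = globus
Globusscheduler = lake.bath.ac.uk/jobmanager-pbs
Executable      = /home/arnaud/bin/vasp-lam-intel
GlobusRSL       = (job_type=mpi)(queue=workq)(count=4)
Sdir            = /home/amr.eminerals/run/TST.VASP3
Sget            = INCAR,POTCAR,POSCAR,KPOINTS
SRBHome         = /home/srbusr/SRB3_3_1/utilities/bin
output          = vasp.out
Queue

The wrapper is presumably expected to stage the named files between the SRB vault and the execution host around the Condor-G submission, so the user never drives SRB by hand.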
NGS: What ?
VERY NICE PEOPLE who offer access to LOVELY clusters
Real GRID approximation
NGS: Resources
“Data” clusters: 20 compute nodes with dual Intel Xeon 3.06 GHz CPUs, 4 GB RAM
grid-data.rl.ac.uk - RAL
grid-data.man.ac.uk - Manchester
“Compute” clusters: 64 compute nodes with dual Intel Xeon 3.06 GHz CPUs, 2 GB RAM
grid-compute.leeds.ac.uk - WRG Leeds
grid-compute.oesc.ox.ac.uk - Oxford
Plus other nodes: HPCx, Cardiff, Bristol …
NGS: Setup
grid-proxy-init
gsi-ssh …
Then, a “normal” machine:
Permanent fixed account (NGS169)
Unix queuing system
With gsi-ftp for file transfer
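Putting the above together, a typical session might look roughly like this; the remote paths are illustrative, and globus-url-copy is used here as one common GridFTP client (the slides only say gsi-ftp):

# create a short-lived proxy from the UK e-Science certificate
grid-proxy-init

# log in to an NGS head node with GSI authentication
gsissh grid-compute.oesc.ox.ac.uk

# copy an input file over GridFTP (illustrative local and remote paths)
globus-url-copy file:///home/arnaud/MgO.inp \
    gsiftp://grid-compute.oesc.ox.ac.uk/home/ngs169/MgO.inp

Once logged in, the machine behaves like any cluster with a batch queue, which is the point of the slide.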
NGS: example
globus-job-run grid-compute.oesc.ox.ac.uk/jobmanager-fork /bin/ls
globusrun -b grid-compute.oesc.ox.ac.uk/jobmanager-pbs example1.rsl
EXAMPLE1.RSL:
& (executable=DLPOLY.Y) (jobType=mpi) (count=4)
  (environment=(NGSMODULES intel-math:gm:dl_poly))
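globusrun -b submits in batch mode and prints a job contact string; as a rough sketch (not shown on the slides), the standard GRAM client commands can then poll or cancel that job, with the contact left here as a placeholder:

globus-job-status  <job-contact-returned-by-globusrun>
globus-job-cancel  <job-contact-returned-by-globusrun>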
Interlude
[Chart: sample results for series S1, S4, S7 and S10 over points 1-11; vertical axis approximately -2.3 to -1.7]
Difficulty 1: access
Well-known problem:
- Certificate
- Globus-enabled machine
- SRB account (2.0)
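For the certificate step, the usual route is to export the UK e-Science certificate from the browser as a PKCS#12 bundle and convert it into the files Globus expects; a rough sketch, with filenames assumed:

# split the exported bundle into certificate and private key (filenames assumed)
openssl pkcs12 -in cert.p12 -clcerts -nokeys -out $HOME/.globus/usercert.pem
openssl pkcs12 -in cert.p12 -nocerts -out $HOME/.globus/userkey.pem
chmod 400 $HOME/.globus/userkey.pem    # grid-proxy-init refuses a readable key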
Difficulty 2: Usability
How do I submit a job?
• Directly (gsi-ssh …)
• Remotely (globus, Condor-G)
Direct: login, check the queue, submit, (kill), logout
Different batch queuing systems (PBS, Condor, LoadLeveler …)
Usability 2
Usually requires a “script”
Almost nobody writes their own scripts
Works by inheritance and adaptation
At the moment eScience forces the user to learn the syntax of the B.Q.S.
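As an illustration of that point, a direct PBS submission for the DL_POLY run shown earlier might look roughly like this; the queue name is borrowed from the Minigrid example, while the node syntax, walltime and module handling are assumptions:

#!/bin/bash
#PBS -N dlpoly_test            # job name
#PBS -l nodes=2:ppn=2          # 4 CPUs on dual-CPU nodes (assumed node syntax)
#PBS -l walltime=12:00:00      # assumed limit
#PBS -q workq                  # queue name as in the Minigrid example

cd $PBS_O_WORKDIR              # run in the directory the job was submitted from
mpirun -np 4 ./DLPOLY.Y

Submitted with qsub and monitored with qstat; LoadLeveler and Condor each want a completely different file for the same job, which is exactly the burden being described.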
Usability 3
Remote
EXAMPLE1.RSL:
& (executable=DLPOLY.Y) (jobType=mpi) (count=4) (environment=(NGSMODULES intel-math:gm:dl_poly))
Ignores file transfer
Ignores more complex submit structures
Usability 4
Ignores more complex submit structures
abinit < inp.txt
cpmd.x MgO.inp
=> User has to learn globus syntax :o/
(environment and RSL)
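For instance, expressing the two jobs above means discovering the stdin and arguments RSL attributes (standard GT2 RSL, one specification per job; the executables and module list below are assumptions taken from the slides, not working configurations):

& (executable=abinit) (stdin=inp.txt)
  (jobType=mpi) (count=4)
  (environment=(NGSMODULES intel-math:gm))

& (executable=cpmd.x) (arguments=MgO.inp)
  (jobType=mpi) (count=4)
  (environment=(NGSMODULES intel-math:gm))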
Finally
At the moment no real incentives to submit remotely
Mechanism to reward the early adopters
Access to special queues
• Longer walltime?
• More CPUs?
CONCLUSION
Submission scripts are very important and useful pieces of information
Easily accessible examples would save a lot of time
Mechanism to encourage remote submission (access to better queues)