Intro to Monsoon and Slurm
1/18/2017

Introductions
• Introduce yourself
  – Name
  – Department/Group
  – What project(s) do you plan to use Monsoon for?
  – Linux or Unix experience
  – Previous cluster experience?

List of Topics
• Cluster education
  – What is a cluster, exactly?
  – Queues, scheduling and resource management
• Cluster orientation
  – Monsoon cluster specifics
  – How do I use this cluster?
  – Group resource limits
  – Exercises
  – Question and answer

What is a cluster?
• A computer cluster is many individual computer systems (nodes) networked together locally to serve as a single resource
• It provides the ability to solve problems at a scale not feasible on a single computer

Inside a Node
[Figure: diagram of a node showing Socket 0, Socket 1, Socket 2 and Socket 3]

What is a queue?
• Normally thought of as a line, FIFO
• Queues on a cluster can be as basic as a FIFO, or far more advanced, with dynamic priorities taking many factors into consideration

What is scheduling?
• "A plan or procedure with a goal of completing some objective within some time frame"
• Scheduling for a cluster at the basic level is much the same: assigning work to computers to complete objectives within the time available
• It is not exactly that easy, though. Many factors come into play when scheduling work on a cluster.

Scheduling
• A scheduler needs to know what resources are available on the cluster
• Assignment of work on a cluster is carried out most efficiently with scheduling and resource management working together

Resource Management
• Monitoring resource availability and health
• Allocation of resources
• Execution of jobs on allocated resources
• Accounting of resources

Cluster scheduling goals
• Optimize quantity of work
• Optimize usage of resources
• Service all users and projects justly
• Make scheduling decisions transparent

Cluster Resources
• Nodes
• Memory
• CPUs
• GPUs
• Licenses

Many scheduling methods
• FIFO
  – Simply first in, first out
• Backfill
  – Runs smaller jobs with lower resource requirements while larger jobs wait for higher resource requirements to be available
• Fairshare
  – Prioritizes jobs based on each user's recent resource consumption
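
The difference between FIFO and backfill can be sketched with a toy example. This is only an illustration under simplifying assumptions, not how Slurm implements either method: the job names, cpu counts, and the functions `fifo_start` and `backfill_start` are all made up, and a real backfill scheduler also uses job time limits to make sure the waiting job is not delayed.

```shell
#!/bin/bash
# Toy comparison of strict FIFO and a simplified backfill pass.
# Each queue entry is "name:cpus_needed"; all names and numbers are made up.

# Strict FIFO: start jobs in order and stop at the first one that does not fit.
fifo_start() {
    local free="$1"; shift
    local job name need started=""
    for job in "$@"; do
        name="${job%%:*}"
        need="${job##*:}"
        if [ "$need" -le "$free" ]; then
            started+="${name} "
            free=$(( free - need ))
        else
            break
        fi
    done
    echo "${started% }"
}

# Simplified backfill: skip jobs that do not fit and keep looking for ones that do.
# (A real backfill scheduler also checks that this does not delay the waiting job.)
backfill_start() {
    local free="$1"; shift
    local job name need started=""
    for job in "$@"; do
        name="${job%%:*}"
        need="${job##*:}"
        if [ "$need" -le "$free" ]; then
            started+="${name} "
            free=$(( free - need ))
        fi
    done
    echo "${started% }"
}

queue=("A:8" "B:2" "C:2")   # A is first in line but needs 8 cpus
free_cpus=4                 # only 4 cpus are idle right now

echo "FIFO starts:     [$(fifo_start "$free_cpus" "${queue[@]}")]"
echo "Backfill starts: [$(backfill_start "$free_cpus" "${queue[@]}")]"
```

With strict FIFO nothing starts, because job A blocks the line; with the simplified backfill pass, jobs B and C use the idle cpus while A keeps waiting.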

Monsoon
• The Monsoon cluster is a resource available to the NAU research enterprise
• 32 systems (nodes) - cn[1-32]
• 884 cores
• 12 GPUs, NVIDIA Tesla K80
• Red Hat Enterprise Linux 6.7
• 12 TB memory - 128 GB/node min, 1.5 TB max
• 170 TB high-speed scratch storage
• 500 TB long-term storage
• High-speed interconnect: FDR InfiniBand

Monsoon scheduling
• Slurm (Simple Linux Utility for Resource Management)
• Excellent resource manager and scheduler
• Precise control over resource requests
• Developed at LLNL, continued by SchedMD
• Used everywhere from small clusters to the largest clusters:
  – Sunway TaihuLight (#1), 10.6M cores, 93 PF, 15 MW - China
  – Titan (#3), 561K cores, 17.6 PF, 8 MW - United States

Small Cluster!
Dual core?

Largest Cluster!
10,649,600 cores

Monsoon scheduling
• Combination of scheduling methods
• Currently configured to utilize backfill along with a multifactor priority system to prioritize jobs

Factors contributing to priority
• Fairshare (predominant factor)
  – Priority points determined by each user's recent resource usage
  – Decay half-life of 1 day
• QOS (Quality of Service)
  – Some QOS have higher priority than others, for instance: debug
• Age - how long the job has sat pending
• Job size - the number of nodes/cpus a job is requesting
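
To make the decay half-life concrete, here is a minimal sketch of how much a past burst of usage would still count after a few days, assuming plain exponential decay with the 1-day half-life listed above. It is an illustration only, not Slurm's actual fairshare code; the function name `decayed_usage` and the sample figure of 1000 cpu minutes are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: how much past usage still counts after exponential decay.
# Assumes usage halves every HALF_LIFE_DAYS; illustration only.
HALF_LIFE_DAYS=1

decayed_usage() {
    local cpu_minutes="$1"   # cpu minutes consumed in the past
    local days_ago="$2"      # how long ago the usage happened
    awk -v u="$cpu_minutes" -v d="$days_ago" -v h="$HALF_LIFE_DAYS" \
        'BEGIN { printf "%.0f\n", u * (0.5 ^ (d / h)) }'
}

# 1000 cpu minutes used 0, 1, 2 and 3 days ago
for days in 0 1 2 3; do
    echo "${days} day(s) ago: $(decayed_usage 1000 "$days") cpu minutes still counted"
done
```

Under this assumption, heavy usage fades quickly: after three days only one eighth of it still weighs against the user's priority.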

Storage
• /home
  – 10 GB quota
  – Keep your scripts and executables here
  – Snapshotted twice a day: /home/.snapshot
  – Please do not write job output (logs, results) here!!
• /scratch
  – 30 day retention
  – Very fast storage, capable of 11 GB/sec
  – Checkpoints, logs
  – Keep all temp/intermediate data here
  – Should be your default location to perform input/output

Storage
• /projects
  – Long-term storage project shares
  – Space is assigned to a faculty member for the group to share
  – Snapshots available
  – No backups today
• /common
  – Cluster support share
  – Contrib: place to put scripts/libs/confs/dbs for others' use

Data Flow
1. Keep scripts and executables in /home
2. Write temp/intermediate data to /scratch
3. Copy data to /projects/<group_project>, for group storage and reference in other projects
4. Clean up /scratch
** Remember, /scratch is a scratch filesystem, used for high-speed temporary and intermediate data
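
The data flow steps can be sketched as a short script. This is a hedged illustration, not a site-provided tool: temporary directories stand in for the real /scratch and /projects/<group_project> locations so it can be run anywhere, and the file names are made up.

```shell
#!/bin/bash
# Sketch of the data flow. Temporary directories stand in for the real
# /scratch and /projects locations so the example is safe to run anywhere.

scratch_dir="$(mktemp -d)"    # stand-in for a directory under /scratch (hypothetical)
projects_dir="$(mktemp -d)"   # stand-in for /projects/<group_project> (hypothetical)

# Step 2: the job writes temp/intermediate data to scratch
echo "intermediate data" > "${scratch_dir}/step1.tmp"
echo "final result"      > "${scratch_dir}/result.txt"

# Step 3: copy only the data worth keeping to the group project share
cp -a "${scratch_dir}/result.txt" "${projects_dir}/"

# Step 4: clean up scratch once the copy is confirmed
if [ -f "${projects_dir}/result.txt" ]; then
    rm -rf "${scratch_dir}"
fi

echo "kept: $(ls "${projects_dir}")"
```

Checking that the copy exists before deleting anything is a design choice worth keeping in real workflows, since /scratch data is not retained long term.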

Remote storage access
• scp
  – scp [email protected]:/scratch/nauid
  – WinSCP
• samba/cifs
  – \\nau.froot.nau.edu\cirrus (windows)
  – smb://nau.froot.nau.edu/cirrus (mac)

Groups
• NAU has a resource called Enterprise Groups
• They are available to you on the cluster if you'd like to manage access to data
• Manage access to your files
• https://my.nau.edu
  – "Go to Enterprise Groups"
  – Take a look at our FAQ: nau.edu/hpc/faq
• If they are not working for you, contact the ITS help desk

Software
• ENVI/IDL
• Matlab
• Intel Compilers, and MKL
• R
• Qiime
• Anaconda Python
• OpenFOAM
• SOWFA
• Lots of bioinformatics programs
• Request additional software to be installed!

Modules
• Software environment management is handled by the modules package management system
• module avail - what modules are available
• module list - modules currently loaded
• module load <modulename> - load a package module
• module display <modulename> - detailed information, including the environment variables affected

MPI
• Quick note on MPI
• Message Passing Interface for parallel computing
• OpenMPI set as default MPI
• Mvapich2 also available
  – module unload openmpi
  – module load mvapich2
• Example MPI job script:
  – /common/contrib/examples/job_scripts/mpijob.sh

Interacting with Slurm
• Decide what you need to accomplish
• What resources are needed?
  – 2 cpus, 12 GB memory, for 2 hours?
• What steps are required?
  – Run prog1, then prog2 … etc
  – Are the steps dependent on one another?
• Can your work or project be broken up into smaller pieces? Smaller pieces can make the workload more agile.
• How long should your job run for?
• Is your software multithreaded, or does it use OpenMP or MPI?

Example Jobscript

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=/scratch/nauid/output.txt
    #SBATCH --time=20:00        # shorter time = sooner start
    #SBATCH --workdir=/scratch/nauid

    # replace this module with software required in your workload
    module load python/3.3.4

    # example job commands
    # each srun command is a job step, so this job will have 2 steps
    srun sleep 300
    srun python -V

Interactive/Debug Work
• Run your compiles and testing on the cluster nodes by:
  – srun -p all gcc hello.c -o a.out
  – srun -p all -c12 make -j12
  – srun --qos=debug -c12 make -j12
  – srun Rscript analysis.r
  – srun python analysis.py
  – srun cp -av /scratch/NAUID/lots_o_files /scratch-lt/NAUID/destination

Long Interactive work
• salloc
  – Obtain a SLURM job allocation that you can work with interactively for an extended amount of time. This is useful for longer testing/debugging sessions.

    [cbc@wind ~]$ salloc -c8 --time=2-00:00:00
    salloc: Granted job allocation 33442
    [cbc@wind ~]$ srun python analysis.py
    [cbc@wind ~]$ exit
    salloc: Relinquishing job allocation 33442
    [cbc@wind ~]$ salloc -N 2
    salloc: Granted job allocation 33443
    [cbc@wind ~]$ srun hostname
    cn3.nauhpc
    cn2.nauhpc
    [cbc@wind ~]$ exit
    salloc: Relinquishing job allocation 33443

Job Parameters

| You want | Switches needed |
| --- | --- |
| More than one cpu for the job | --cpus-per-task=2, or -c 2 |
| To specify an ordering of your jobs | --dependency=afterok:job_id, or -d afterok:job_id |
| Split up the output and errors | --output=result.txt --error=error.txt |
| To run your job at a particular time/day | --begin=16:00, --begin=now+1hour, --begin=2010-01-20T12:34:00 |
| Add MPI tasks/ranks to your job | --ntasks=2, or -n 2 |
| To control job failure options | --no-requeue, --requeue |
| To receive status email | --mail-type=ALL |
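
Several of these switches can be combined in one job script header. The following is a minimal sketch, assuming a hypothetical job name and output file names; because `#SBATCH` lines are ordinary comments to bash, the script also runs outside of Slurm, where the cpu count falls back to 1.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=params_demo
# More than one cpu for the job
#SBATCH --cpus-per-task=2
# Split up the output and errors
#SBATCH --output=result.txt
#SBATCH --error=error.txt
# Do not start before one hour from submission
#SBATCH --begin=now+1hour
# Receive status email
#SBATCH --mail-type=ALL

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 1 so the script also runs outside of Slurm.
cpus="${SLURM_CPUS_PER_TASK:-1}"

echo "Running with ${cpus} cpu(s) per task"
```

Saved as, say, params_demo.sh (a made-up file name), it would be submitted with `sbatch params_demo.sh`, and a follow-up job could be ordered after it with `--dependency=afterok:<job_id>`.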

Constraints and Resources

| You want | Switches needed |
| --- | --- |
| To choose a specific node feature (e.g. avx2) | --constraint=avx2 |
| To use a generic resource (e.g. a gpu) | --gres=gpu:tesla:1 |
| To reserve a whole node for yourself | --exclusive |
| To choose a partition | --partition |

Submit the script

    [cbc@wind ~]$ sbatch jobscript.sh
    Submitted batch job 85223

  – Slurm returns a job id for your job that you can use to monitor or modify constraints

Monitoring your job
• squeue
  – view information about jobs located in the SLURM scheduling queue.
• squeue --start
• squeue -u login
• squeue -o "%j %u …"
• squeue -p partitionname
• squeue -S sortfield
• squeue -t <state> (PD or R)

Cluster info
• sinfo
  – view information about SLURM nodes and partitions.
• sinfo -N -l
• sinfo -R
  – List reasons for downed nodes and partitions

Monitoring your job
• sprio
  – view the factors that comprise a job's scheduling priority
• sprio -l - list priority of users' jobs in pending state
• sprio -o "%j %u …"
• sprio -w

Monitoring your job
• sstat
  – Display various status information of a running job/step.
• sstat -j jobid
• sstat -o AveCPU,AveRSS
• Only works with jobs containing steps

Controlling your job
• scancel
  – Used to signal jobs or job steps that are under the control of Slurm.
• scancel jobid
• scancel -n jobname
• scancel -u mylogin
• scancel -t pending (only yours)

Controlling your job
• scontrol
  – Used to view and modify Slurm configuration and state.
• scontrol show job 85224

Job Accounting
• sacct
  – displays accounting data for all of your jobs and job steps in the SLURM job accounting log or SLURM database
• sacct -j jobid -o jobid,elapsed,maxrss
• sacct -N nodelist
• sacct -u mylogin
• Try our alias "jobstats"
  – jobstats
  – jobstats -j <jobid>

Job Accounting
• sshare
  – Tool for listing the shares of associations to a cluster.
• sshare -l : view and compare your fairshare with other accounts
• sshare -a : view all users' fairshare
• sshare -a -A <account> : view all members in your account (group)

Account hierarchy
• Your user account belongs to a parent faculty account (group)
• Your user account shares resources that are provided for your group
• Example:
  – coffey
    • cbc
    • mkg52
• View the account structure you belong to with: "sshare -a -A <account>"
• Example:
  – sshare -a -A coffey

Limits on the account (group)
• Limits are in place to prevent intentional or unintentional misuse of resources, and to ensure quick and fair turnaround times on jobs for everyone.
• Groups are limited to a total number of cpu minutes in use at one time: 700,000
• This cpu resource limit mechanism is referred to as: "TRESRunMins"

TRESRunMins Limit
• What the heck is that!?
• A number which limits the total number of remaining cpu minutes which your running jobs can occupy.
• Enables flexible resource limiting
• Staggers jobs
• Increases cluster utilization
• Leads to more accurate resource requests
• Sum over running jobs of (cpus * time limit remaining)

Examples
• 14400 = 10 jobs, 1 cpu, 1 day in length
• 144000 = 10 jobs, 10 cpu, 1 day in length
• 576000 = 10 jobs, 10 cpu, 4 days in length
• 648000 = 100 jobs, 1 cpu, 4½ days in length
Questions?
• Check your group's cpu minute usage:
  – sshare -l
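
The TRESRunMins arithmetic is easy to check by hand or in a few lines of shell. The sketch below assumes identical jobs and uses a hypothetical helper named `tres_run_mins`; the 700,000 figure is the group limit stated earlier.

```shell
#!/bin/bash
# Hypothetical helper: cpu minutes that a set of identical running jobs
# counts against the group's TRESRunMins limit.
GROUP_LIMIT=700000   # group limit stated in the slides

tres_run_mins() {
    local jobs="$1" cpus="$2" minutes_remaining="$3"
    echo $(( jobs * cpus * minutes_remaining ))
}

MINUTES_PER_DAY=$(( 24 * 60 ))

# 10 jobs, 10 cpus each, 1 day of time limit remaining
usage=$(tres_run_mins 10 10 "$MINUTES_PER_DAY")
echo "usage: ${usage} of ${GROUP_LIMIT} cpu minutes"

if [ "$usage" -le "$GROUP_LIMIT" ]; then
    echo "within the group limit"
else
    echo "over the group limit"
fi
```

Because the sum uses the time limit remaining rather than the time actually needed, requesting a shorter, more accurate time limit directly lowers the amount charged against the group.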

Exercise 1
Get to know Monsoon and Slurm, on your own.
1. How many nodes make up Monsoon?
  – Hint: use "sinfo"
2. How many nodes are in the all partition?
3. How many jobs are currently in the running state?
  – Hint: use "squeue -t R"
4. How many jobs are currently in the pending state? Why?
  – Hint: use "squeue -t PD"

Exercise 2
• Create a simple job in your home directory
• Example here: /common/contrib/examples/job_scripts/simplejob.sh (copy it if you like :) )
• Name your job: "exercise"
• Name your job's output: "exercise.out"
• Output should go to /scratch/<user>/exercise.out
• Load the module "workshop"
• Run the "date" command
• And additionally, the "secret" command
• Submit your job with sbatch, i.e. "sbatch simplejob.sh"

Exercise 3
• Make your job sleep for 5 minutes (sleep 300)
  – Sleep is a command that creates a process that … sleeps
• Monitor your job
  – squeue -u your_nauid
  – squeue -t R
  – scontrol show job jobnum
  – sacct -j jobnum
• Inspect the steps
• Cancel your job
  – scancel jobnum

Exercise 4
• Copy job script and edit:
  – /common/contrib/examples/job_scripts/lazyjob.sh
• Submit the job; it will take 65 sec to complete
• Use sstat and monitor the job
  – sstat -j <jobid>
• Review the resources that the job used
  – jobstats -j <jobid>
• We are looking for "MaxRSS"; MaxRSS is the max amount of memory used
• Edit the job script, reduce the memory being requested in MB and resubmit; edit "--mem=", e.g. --mem=600
• Review the resources that the optimized job utilized once again
  – jobstats -j <jobid>
• Ok, memory looks good, but notice that the user cpu is the same as the elapsed time
  User cpu = num utilized cpus * elapsed time
• This is because the application we were running only used 1 of the 4 cpus that we requested
• Edit the lazy job script: comment out the first srun command, and uncomment the second srun command.
• Resubmit
• Rerun jobstats -j <jobid>; notice that user cpu is now a multiple of the elapsed time, in this case 4 times, because we were allocated 4 cpus and used 4 cpus.
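
The relationship "User cpu = num utilized cpus * elapsed time" can be turned into a quick efficiency check. This is a sketch with idealized numbers, assuming the 65 second run time and 4 requested cpus from the exercise; the helper name `cpu_efficiency` is hypothetical, and real jobstats values will not be this exact.

```shell
#!/bin/bash
# Hypothetical helper: percentage of the allocated cpu time that was actually used.
cpu_efficiency() {
    local user_cpu_sec="$1" elapsed_sec="$2" allocated_cpus="$3"
    echo $(( 100 * user_cpu_sec / (elapsed_sec * allocated_cpus) ))
}

# First run: 4 cpus allocated, 65 s elapsed, only 1 cpu busy (65 s of user cpu)
echo "lazy run:  $(cpu_efficiency 65 65 4)%"

# Second run: all 4 cpus busy (4 * 65 = 260 s of user cpu)
echo "fixed run: $(cpu_efficiency 260 65 4)%"
```

A result far below 100% suggests the job requested more cpus than the application can use.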

Slurm Arrays!

Slurm Arrays Exercise
• From your scratch directory: "/scratch/nauid"
• tar xvf /common/contrib/examples/bigdata_example.tar
• cd bigdata
• Edit the file "job_array.sh" so that it works with your nauid, replacing all NAUID with yours
• Submit the script: "sbatch job_array.sh"
• Run "squeue"; notice there are 5 jobs running. How did that happen!
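
One way a single submission can turn into 5 jobs is a job array. The sketch below shows the general idea only; the contents of the actual job_array.sh are not reproduced here and may differ, and the job name and the `data_<N>.txt` input naming are assumptions. Since `#SBATCH` lines are comments to bash, the script can also be tried outside of Slurm, where the task id falls back to 1.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=array_demo
# Ask Slurm for 5 array tasks, numbered 1 through 5
#SBATCH --array=1-5
# %A is the array's job id, %a is the task index
#SBATCH --output=array_demo_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be tried outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical input naming scheme: one file per array task
input_file="data_${task_id}.txt"

echo "task ${task_id} would process ${input_file}"
```

Each array task runs the same script with a different task id, so one `sbatch` call fans out into several independent jobs in `squeue`.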

MPI Example
• Refer to the MPI example here:
  – /common/contrib/examples/job_scripts/mpijob.sh
• Edit it for your work areas, then experiment:
  – Change number of tasks, nodes … etc
• You can also run the example like this:
  – srun --qos=debug -n4 /common/contrib/examples/mpi/hellompi

Keep these tips in mind
• Know the software you are running
• Request resources accurately
• Supply an accurate time limit for your job
• Don't be lazy; it will affect you and your group negatively

Question and Answer
• More info here: http://nau.edu/hpc
• Linux shell help here:
  – http://linuxcommand.org/tlcl.php
  – Free book download
• And on the nauhpc listserv
  – [email protected]