Job Scheduler Simulator Extension for Evaluating Queue Mapping to Computing Node
Susumu Date, Yuki Matsui, Yasuhiro Watashiba, Takashi Yoshikawa, Shinji Shimojo
Cybermedia Center, Osaka University
[Background] From the supercomputing center viewpoint
• High-performance computing (HPC) environments have been playing an increasingly important role.
• Efficient, high job throughput is an important mission for supercomputing centers. Waiting time is also important.
(Figure: users submit jobs to the supercomputing center and receive computing results.)
Computing clusters are a major architecture in HPC.
[Background] From the system administration and management viewpoint
For high-throughput operation, administrators must configure:
・Utilization restrictions on computational resources for each scheduler queue
・Per-user job priority
・The scheduling algorithm
Due to the scale-out and architectural heterogeneity of systems, a large number of configuration parameters must be considered.
Administrators' workload is becoming heavier.
A mechanism that reduces administrators' workload is needed.
[Configuration example] Queue mapping configuration
• In most cases, the job scheduler has multiple queues configured.
• A single queue might be good in theory, but multiple queues are configured in practice, for example for:
 • Hybrid systems of different architectures
 • Prioritization of jobs
 • etc.
Mapping between queues and computing nodes becomes a hard configuration problem.
(Figure: eight computing nodes A–H; nodes A–F are available from Queue 1 and nodes D–H are available from Queue 2.)
Queue 1 mapping: A–F
Queue 2 mapping: D–H
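One way to make the mapping concrete is to treat it as a table from each queue to the set of nodes it may use. The Python sketch below encodes the Queue 1/Queue 2 example (A–F and D–H) and derives the overlap where jobs from both queues compete. The names `queue_mapping`, `shared_nodes`, and `queues_of` are illustrative only. They are not taken from any scheduler's configuration format.

```python
# Queue-to-node mapping from the configuration example:
# Queue 1 may use nodes A-F, Queue 2 may use nodes D-H.
queue_mapping: dict[str, set[str]] = {
    "Queue1": set("ABCDEF"),
    "Queue2": set("DEFGH"),
}


def shared_nodes(mapping: dict[str, set[str]], q1: str, q2: str) -> set[str]:
    """Nodes reachable from both queues, i.e. where their jobs compete."""
    return mapping[q1] & mapping[q2]


def queues_of(mapping: dict[str, set[str]], node: str) -> list[str]:
    """Queues that a given node is linked to, sorted by name."""
    return sorted(q for q, nodes in mapping.items() if node in nodes)


print(sorted(shared_nodes(queue_mapping, "Queue1", "Queue2")))  # ['D', 'E', 'F']
print(queues_of(queue_mapping, "E"))  # ['Queue1', 'Queue2']
print(queues_of(queue_mapping, "H"))  # ['Queue2']
```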
Example case (1/2)
Initial state: a job requesting 4 nodes on Queue 2 cannot be assigned.
Queue 1 mapping: A–F
Queue 2 mapping: D–H
(Figure: nodes A–H, with a job requesting 4 nodes waiting on Queue 2.)
Example case (2/2)
(Figure: two alternative configurations of nodes A–H, labeled "Add C" and "E, F not linked".)
If C is mapped to Queue 2, the waiting job requesting 4 nodes can be executed on C, F, G and H.
If E and F are not linked to Queue 1, the job running on D and E could be executed, and the waiting job requesting 4 nodes can run on E, F, G and H.
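The difference between the "Add C" configuration and the original one can be checked mechanically. This is a minimal sketch built on two assumptions that the example suggests but does not state. First, a job is currently running on D and E, and all other nodes are free. Second, nodes are picked first-fit in alphabetical order. `try_assign` is a hypothetical helper and is not part of any scheduler.

```python
from typing import Optional


def try_assign(mapping: dict[str, set[str]], busy: set[str],
               queue: str, n_nodes: int) -> Optional[list[str]]:
    """Return n_nodes free nodes mapped to `queue` (first-fit in alphabetical
    order), or None if fewer than n_nodes of the queue's nodes are free."""
    free = sorted(mapping[queue] - busy)
    if len(free) < n_nodes:
        return None
    return free[:n_nodes]


# Assumption: a job is running on D and E; every other node is free.
busy = {"D", "E"}

original = {"Queue1": set("ABCDEF"), "Queue2": set("DEFGH")}
print(try_assign(original, busy, "Queue2", 4))  # None (only F, G, H are free)

# "Add C": node C is additionally mapped to Queue 2.
with_c = {"Queue1": set("ABCDEF"), "Queue2": set("CDEFGH")}
print(try_assign(with_c, busy, "Queue2", 4))    # ['C', 'F', 'G', 'H']
```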
Actual example: queue mapping on OCTOPUS (Osaka university Cybermedia cenTer Over-Petascale Universal Supercomputer)
(Figure: the CPU queue is mapped to the CPU nodes and the GPU queue to the GPU nodes; part of the GPU nodes are also reachable from the CPU queue.)
Some of the GPU nodes can also be used as CPU nodes.
How many GPU nodes can be used this way for our workload?
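The OCTOPUS question can be phrased as a one-parameter family of mappings, in which the CPU queue gets the CPU nodes plus k of the GPU nodes. The sketch below enumerates that family for a hypothetical small system with 3 CPU nodes and 4 GPU nodes. It assumes GPU nodes are interchangeable, so only k matters and not which particular nodes are shared. The node names and counts are made up for illustration.

```python
from collections.abc import Iterator


def octopus_like_mappings(cpu_nodes: list[str], gpu_nodes: list[str]
                          ) -> Iterator[tuple[int, dict[str, set[str]]]]:
    """Yield (k, mapping) for k = 0..len(gpu_nodes). The GPU queue always keeps
    every GPU node; the CPU queue gets all CPU nodes plus the first k GPU nodes
    (GPU nodes are assumed interchangeable, so which k does not matter)."""
    for k in range(len(gpu_nodes) + 1):
        yield k, {
            "CPU": set(cpu_nodes) | set(gpu_nodes[:k]),
            "GPU": set(gpu_nodes),
        }


# Hypothetical small system: 3 CPU nodes and 4 GPU nodes.
cpu = ["c1", "c2", "c3"]
gpu = ["g1", "g2", "g3", "g4"]
for k, m in octopus_like_mappings(cpu, gpu):
    print(f"k={k}: CPU queue -> {len(m['CPU'])} nodes, GPU queue -> {len(m['GPU'])} nodes")
```

Each value of k is one mapping pattern that would have to be simulated against the workload.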
[Motivation] Difficulties in configuring queue mapping
Administrators have no objective criteria for configuration and rely on know-how and experience.
A large number of combinations of configuration parameters have to be considered by trial and error.
An administrator cannot tell whether a given configuration achieves high throughput or not.
Finding a proper mapping configuration therefore needs tool support.
[Approach] Use of job scheduler simulation for analyzing the behavior of job assignments in the system
• A number of job scheduler simulators are available: GridSim, Slurm simulator, gem5-gpu simulator, MERPSYS, ALEA…
• However, most job scheduler simulators do not help us explore the analysis space of the queue mapping problem.
 • Many job scheduler simulators focus on research and development of scheduling algorithms, and support only a single queue.
 • Many of them do not allow us to configure the mapping relationship between computing nodes and queues.
By modifying and extending an existing scheduler simulator, we build a tool that helps system administrators learn which queue mapping is better for a given series of job requests.
Our approach: develop such a tool based on ALEA [Klusáček et al. 2010].
ALEA is a GridSim-based simulator; however, it lacks at least three functionalities needed to achieve this goal.
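At its core, the tool described above is a search loop. It simulates each candidate mapping against the same job sequence and keeps the best one. The sketch below assumes throughput (or any single score) is the criterion and takes the simulator as a callable. `best_mapping` and `toy_simulate` are hypothetical. `toy_simulate` returns made-up numbers that stand in for an actual ALEA-based run.

```python
from typing import Callable, Iterable, TypeVar

M = TypeVar("M")


def best_mapping(candidates: Iterable[M],
                 simulate: Callable[[M], float]) -> tuple[M, float]:
    """Simulate every candidate mapping once and return (best mapping, score).
    Ties keep the earliest candidate."""
    results = [(m, simulate(m)) for m in candidates]
    if not results:
        raise ValueError("no candidate mappings to evaluate")
    return max(results, key=lambda r: r[1])


# Hypothetical stand-in for one simulator run over a fixed job sequence.
# It scores a (single-node queue, 4-node queue) split of 16 nodes with
# made-up numbers; a real tool would run the extended simulator here.
def toy_simulate(split: tuple[int, int]) -> float:
    n1, n4 = split
    return min(n1, 6) + min(n4, 12) / 2


splits = [(n1, 16 - n1) for n1 in range(17)]
print(best_mapping(splits, toy_simulate))  # ((6, 10), 11.0)
```

Keeping the simulator behind a callable separates the search strategy from the simulation itself. An exhaustive list of candidates could later be replaced by a smarter generator without touching the simulator.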
ALEA architecture and lacked functionalities
(Figure: ALEA's job scheduler takes a job from the queue (job selection), performs 2. submission judgement (e.g. for a job requesting 3 nodes), and 3. assigns computing nodes in the cluster to the job.)
1. Mapping between queues and computing nodes cannot be configured.
2. ALEA does not allow the simulation to change the status of individual computing nodes; it just counts the number of busy/free nodes.
3. ALEA does not distinguish (identify) computing nodes, so the simulation cannot choose a specific computing node, even though the internal data structure identifies each node.
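Limitations 2 and 3 matter because of what a count-only model can and cannot answer. Consider a cluster model that only counts busy nodes. The sketch below is a simplified illustration of that limitation, not ALEA's actual classes. Its submission judgement can only compare a request with the global free count, so it cannot answer queue-specific questions such as the one in the example case. The scenario in the usage lines assumes that only D and E are busy.

```python
class CountingCluster:
    """Simplified counting-only cluster model (illustrative, not ALEA code):
    it knows how many nodes are busy, but not which ones."""

    def __init__(self, total_nodes: int) -> None:
        self.total = total_nodes
        self.busy = 0

    def can_submit(self, requested: int) -> bool:
        # Submission judgement can only compare against the global free count;
        # there is no way to restrict it to the nodes mapped to one queue.
        return requested <= self.total - self.busy

    def assign(self, requested: int) -> None:
        # "Assignment" only updates a counter; no specific node is chosen.
        if not self.can_submit(requested):
            raise RuntimeError("not enough free nodes")
        self.busy += requested


cluster = CountingCluster(total_nodes=8)  # nodes A-H
cluster.assign(2)  # e.g. the job running on D and E; which nodes is not recorded
# True: 6 nodes are free overall, yet the model cannot tell that only
# F, G and H among Queue 2's nodes (D-H) would be free in the example case.
print(cluster.can_submit(4))
```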
Solution: Computing node management module as an extension to ALEA
The computing node management module consists of three functions:
・Computing node status management function: allows the simulation to assign a specific computing node by identifying each node and tracking its status.
・Mapping management/administration function: enables fine-grained mapping configuration.
・Assignment determination function: enables the simulation to determine which computing node becomes the assignment destination.
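The three functions could fit together as in the minimal sketch below. It assumes that node status is a simple free/busy flag and that assignment is first-fit by node name. The class and method names (`NodeManager`, `link`, `unlink`, `choose`, `allocate`, `release`) are hypothetical and do not reflect the authors' actual extension.

```python
from collections.abc import Iterable, Mapping
from typing import Optional


class NodeManager:
    """Hypothetical sketch of a computing node management module: per-node
    status (status management), a per-queue node mapping (mapping
    management/administration), and concrete node choice (assignment
    determination)."""

    def __init__(self, nodes: Iterable[str],
                 mapping: Mapping[str, Iterable[str]]) -> None:
        # Status management: every node is identified and starts out free.
        self.status: dict[str, str] = {n: "free" for n in nodes}
        # Mapping management: which nodes each queue may use.
        self.mapping: dict[str, set[str]] = {q: set(ns) for q, ns in mapping.items()}

    def link(self, queue: str, node: str) -> None:
        """Map an additional node to a queue."""
        self.mapping[queue].add(node)

    def unlink(self, queue: str, node: str) -> None:
        """Remove a node from a queue's mapping (no-op if it is not mapped)."""
        self.mapping[queue].discard(node)

    def choose(self, queue: str, n: int) -> Optional[list[str]]:
        """Assignment determination: return n specific free nodes mapped to
        the queue (first-fit by name), or None if too few are free."""
        free = sorted(x for x in self.mapping[queue] if self.status[x] == "free")
        return free[:n] if len(free) >= n else None

    def allocate(self, nodes: Iterable[str]) -> None:
        for x in nodes:
            self.status[x] = "busy"

    def release(self, nodes: Iterable[str]) -> None:
        for x in nodes:
            self.status[x] = "free"


mgr = NodeManager("ABCDEFGH", {"Queue1": "ABCDEF", "Queue2": "DEFGH"})
job = mgr.choose("Queue1", 2)   # ['A', 'B']
mgr.allocate(job)
print(mgr.choose("Queue2", 4))  # ['D', 'E', 'F', 'G']
mgr.unlink("Queue2", "D")
print(mgr.choose("Queue2", 4))  # ['E', 'F', 'G', 'H']
```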
Evaluation
1. Comparison with the real environment: to verify whether the simulator reproduces the behavior of jobs on an actual cluster system.
2. Time to search the whole analysis space: to know how long it takes to derive a proper mapping configuration.
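Experiment conditions
(Figure: 50 jobs are submitted to the job scheduler, which has a high-priority queue for 4-node parallel jobs and a queue for single-node jobs; jobs are taken in FIFO order and assigned to a 16-node cluster system.)
Job scheduler:
・Two queues, one for 1-node jobs and one for 4-node jobs; the queue for 4-node jobs has higher priority
・Jobs are selected in FCFS order
Cluster system:
・16 nodes, allocated exclusively to jobs (node dedication)
・No difference in node performance
・Non-overlapped nodes (mapped to only one queue) are used preferentially
Jobs:
・50 jobs, FCFS (First-Come, First-Served)
・Arrival: within 60 seconds after the previous job
・Execution time: 1–120 seconds
・Number of nodes: 1 or 4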
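A workload with the same shape can be generated for experimentation. The conditions above do not specify the distributions, so the sketch below assumes uniform distributions for the inter-arrival time (0–60 s), the execution time (1–120 s), and the node count (1 or 4). `generate_workload` and `selection_order` are hypothetical helpers. `selection_order` ranks waiting jobs the way the conditions describe: jobs from the 4-node queue come first, and jobs within a queue are taken FCFS.

```python
import random
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    submit_time: int  # seconds since the start of the run
    runtime: int      # execution time in seconds
    nodes: int        # 1 (single-node queue) or 4 (4-node queue)


def generate_workload(n_jobs: int = 50, seed: int = 0) -> list[Job]:
    """Generate jobs matching the experiment conditions. The uniform
    distributions below are assumptions, not the authors' generator."""
    rng = random.Random(seed)
    jobs: list[Job] = []
    t = 0
    for i in range(n_jobs):
        t += rng.randint(0, 60)  # arrives within 60 s of the previous job
        jobs.append(Job(i, t, rng.randint(1, 120), rng.choice((1, 4))))
    return jobs


def selection_order(waiting: list[Job]) -> list[Job]:
    """Order waiting jobs for selection: the 4-node queue has higher
    priority, and jobs within a queue are taken FCFS."""
    return sorted(waiting, key=lambda j: (j.nodes != 4, j.submit_time, j.job_id))


jobs = generate_workload()
print(len(jobs), sum(j.nodes == 4 for j in jobs))
```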
Experiment 1: Comparison
Simulator vs. real environment
• A cluster system with Slurm deployed is simulated.
(Figure: throughput of the simulator and of the real environment, each plotted against the number of computing nodes mapped to the single-node queue and the number mapped to the 4-node queue.)
The simulator behaves in almost the same way as the real environment in terms of throughput.
Experiment 2: Execution time to search the analysis space
(Figure: execution time to search the analysis space vs. number of computing nodes.)
Execution time increases dramatically with the number of computing nodes.
Experiment 2: Execution time to simulate a mapping pattern
(Figure: execution time to simulate one mapping pattern vs. number of computing nodes.)
Execution time does not depend on the number of computing nodes.
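The two results fit together. If simulating one pattern takes roughly constant time, the total search time is roughly the number of patterns times that constant, so it grows with the size of the analysis space. How fast that space grows depends on how mappings are enumerated, and the slides do not state this. The sketch below therefore compares two hypothetical enumerations for two queues. The first only counts how many non-overlapping nodes each queue gets. The second links each individual node to a non-empty subset of the queues. Neither is claimed to be the authors' actual search space.

```python
def count_based_patterns(n: int) -> int:
    """Patterns (a, b): a nodes for the single-node queue and b nodes for the
    4-node queue, non-overlapping, with a + b <= n. Equals (n+1)(n+2)/2."""
    return (n + 1) * (n + 2) // 2


def node_level_patterns(n: int, queues: int = 2) -> int:
    """Patterns where each identified node is linked to a non-empty subset
    of the queues: (2**queues - 1) ** n."""
    return (2 ** queues - 1) ** n


def estimated_search_time(patterns: int, seconds_per_pattern: float) -> float:
    # Per-pattern simulation time is treated as constant (independent of the
    # number of nodes), so total time scales linearly with the pattern count.
    return patterns * seconds_per_pattern


for n in (4, 8, 16):
    print(n, count_based_patterns(n), node_level_patterns(n))
# 4 15 81
# 8 45 6561
# 16 153 43046721
```

The count-based space grows quadratically, while the node-level space grows exponentially (3^n for two queues). Exponential growth of this kind would explain a dramatic increase in total search time even when each individual simulation stays cheap.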
Summary
Problem: Currently, administrators rely on their own experience and know-how when configuring the scheduler's queue mapping.
Goal: Build a job scheduling simulator that allows administrators to investigate the behavior of jobs under different queue-to-computing-node mappings.
Evaluation: So far, we have confirmed that the extended simulator produces simulation results close to those of the real environment.
Future issues
• Reducing the execution time to search the analysis space.