
Page 1: Scheduling Generic Parallel Applications – classification, Meta-scheduling

Sathish Vadhiyar

Sources/Credits/Taken from: Papers listed in the “References” slide

Page 2: Scheduling Architectures

Centralized schedulers
- Single-site scheduling – a job does not span across sites
- Multi-site scheduling – a job may span across sites

Hierarchical structures – a central scheduler (metascheduler) for global scheduling, with local scheduling on the individual sites

Decentralized scheduling – distributed schedulers interact, exchange information and submit jobs to remote systems
- Direct communication – a local scheduler directly contacts remote schedulers and transfers some of its jobs
- Communication via a central job pool – jobs that cannot be executed immediately are pushed to a central pool, and other local schedulers pull jobs out of the pool

Page 3: Various Scheduling Architectures

Page 4: Various Scheduling Architectures

Page 5: Metascheduler across MPPs

Types:

Centralized
- A metascheduler and local dispatchers
- Jobs are submitted to the metascheduler

Hierarchical
- Combination of central and local schedulers
- Jobs are submitted to the metascheduler
- The metascheduler sends each job to the site with the earliest expected start time
- Local schedulers can follow their own policies

Distributed
- Each site has a metascheduler and a local scheduler
- Jobs are submitted to the local metascheduler
- Jobs can be transferred to the site with the lowest load

Page 6: Evaluation of schemes

Centralized
1. Global knowledge of all resources – hence optimized schedules
2. Can become a bottleneck for a large number of resources and jobs
3. May take time to transfer jobs from the metascheduler to the local schedulers – needs strategic placement of the metascheduler

Hierarchical
1. Medium-level overhead
2. Sub-optimal schedules
3. Still needs strategic placement of the central scheduler

Distributed
1. No bottleneck – workload evenly distributed
2. Needs all-to-all connections between MPPs

Page 7: Evaluation of Various Scheduling Architectures

Experiments to evaluate slowdowns in the 3 schemes:
- Based on an actual trace from a supercomputer centre – a 5000-job set
- 4 sites were simulated – 2 with the same load as the trace, the other 2 with run times multiplied by 1.7
- FCFS with EASY backfilling was used
- slowdown = (wait_time + run_time) / run_time
- 2 more schemes were compared:
  - Independent – local schedulers act independently, i.e. the sites are not connected
  - United – the resources of all sites are combined to form a single site
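Since slowdown is the central metric in these experiments, here is a minimal Python sketch (toy numbers, not the trace data) of how per-job slowdown and its average would be computed:

def slowdown(wait_time, run_time):
    # slowdown = (wait_time + run_time) / run_time, as defined on the slide
    return (wait_time + run_time) / run_time

# hypothetical (wait_time, run_time) pairs in seconds
jobs = [(120, 600), (0, 3600), (900, 300)]
per_job = [slowdown(w, r) for w, r in jobs]
mean_slowdown = sum(per_job) / len(per_job)
print(per_job, mean_slowdown)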

Page 8: Results

Page 9: Observations

1. Centralized and hierarchical performed slightly better than united
   a. Compared to hierarchical, in the united scheme scheduling decisions have to be made for all jobs and all resources – the overhead, and hence the wait time, is higher
   b. Comparing united and centralized:
      i. 4 categories of jobs, corresponding to the 4 combinations of 2 parameters – execution time (short, long) and number of resources requested (narrow, wide)
      ii. There are usually many more long narrow jobs than short wide jobs
      iii. Why are centralized and hierarchical better than united?
2. Distributed performed poorly
   a. Short narrow jobs incurred more slowdown
   b. Short narrow jobs are large in number and the best candidates for backfilling
   c. Backfilling dynamics are complex
   d. A site with a light average load may not always be the best choice – short narrow jobs may find the earliest holes in a heavily loaded site

Page 10: Newly Proposed Models

K-distributed model
- Distributed scheme in which the local metascheduler distributes each job to the k least loaded sites
- When the job starts on one site, a notification is sent to the local metascheduler, which in turn asks the other k-1 schedulers to dequeue the job

K-Dual queue model
- 2 queues are maintained at each site – one for local jobs and the other for remote jobs
- Remote jobs are executed only when they don't affect the start times of the local jobs
- Local jobs are given priority during backfilling
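A minimal Python sketch of the K-distributed submission protocol just described; the Site class and its methods are hypothetical stand-ins for the real site schedulers:

class Site:
    """Hypothetical stand-in for a site scheduler and its queue."""
    def __init__(self, name):
        self.name = name
        self.queue = []

    def load(self):
        return len(self.queue)      # queue length as a simple proxy for load

    def enqueue(self, job):
        self.queue.append(job)

    def dequeue(self, job):
        if job in self.queue:
            self.queue.remove(job)

def k_distribute(job, sites, k):
    """Submit the job to the k least loaded sites."""
    targets = sorted(sites, key=lambda s: s.load())[:k]
    for site in targets:
        site.enqueue(job)
    return targets

def on_job_start(job, started_site, targets):
    """When one replica starts, dequeue the job at the other k-1 sites."""
    for site in targets:
        if site is not started_site:
            site.dequeue(job)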

Page 11: Results – Benefits of new schemes

45% improvement 15% improvement

Page 12: Results – Usefulness of K-Dual scheme

Grouping jobs submitted at lightly loaded sites and heavily loaded sites

Page 13: Metascheduler with AppLeS Local Schedulers

Page 14: Goals

The aim was to overcome deficiencies of using plain AppLeS agents, and also to have global policies:
- Resolving the claims of different applications
- Improving the response times of individual applications
- Taking care of load dynamics

Work done as part of the GrADS project
- GrADS: Grid Application Development Software
- A collaboration between different universities

Page 15: Initial GrADS Architecture

[Diagram: the User passes the matrix size and block size to the Grid Routine / Application Manager; the Resource Selector obtains resource characteristics from MDS and NWS; the Performance Modeler takes the resource and problem characteristics and produces the final schedule – a subset of resources.]

Page 16: Performance Modeler

[Diagram: the Grid Routine / Application Manager passes all resources and the problem parameters to the Performance Modeler and receives the final schedule – a subset of resources. Inside the Performance Modeler, the Scheduling Heuristic sends candidate resources to the Simulation Model, gets back the execution cost, and returns the final schedule.]

The scheduling heuristic passes to the simulation model only those candidate schedules that have "sufficient" memory; this is determined by calling a function in the simulation model.
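A minimal sketch of such a heuristic, with hypothetical method names (has_sufficient_memory, get_exec_time_cost) standing in for the simulation model's interface: filter the candidate schedules by the memory check, then keep the one with the lowest simulated execution cost.

def select_schedule(candidates, matrix_size, block_size, sim_model):
    # keep only candidate schedules with "sufficient" memory
    feasible = [c for c in candidates
                if sim_model.has_sufficient_memory(c, matrix_size, block_size)]
    if not feasible:
        return None
    # pick the schedule with the smallest simulated execution time
    return min(feasible,
               key=lambda c: sim_model.get_exec_time_cost(matrix_size, block_size, c))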

Page 17: Simulation Model

Simulation of the ScaLAPACK right-looking LU factorization

More about the application:
- Iterative – each iteration corresponds to a block
- A parallel application in which the columns are block-cyclically distributed
- Right-looking LU – based on Gaussian elimination

Page 18: Gaussian Elimination – Review

For each column i, zero it out below the diagonal by adding multiples of row i to the later rows:

for i = 1 to n-1
    for j = i+1 to n             /* for each row j below row i */
        A(j, i) = A(j, i) / A(i, i)
        for k = i+1 to n
            A(j, k) = A(j, k) - A(j, i) * A(i, k)

[Diagram: at step i, column i is zeroed below the diagonal; A(j,i) becomes a multiplier and A(j,k) is updated using A(i,k). The trailing submatrix A(i+1:n, i+1:n) is updated from the column A(i+1:n, i) and the row A(i, i+1:n); the earlier columns hold the finished multipliers.]
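For concreteness, a small runnable NumPy version of the same unblocked right-looking elimination (no pivoting, illustration only; not taken from the slides):

import numpy as np

def lu_unblocked(A):
    """Right-looking LU without pivoting: multipliers overwrite the strictly
    lower triangle, U overwrites the upper triangle."""
    A = A.astype(float)
    n = A.shape[0]
    for i in range(n - 1):
        A[i+1:, i] /= A[i, i]                               # column of multipliers
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])   # rank-1 trailing update
    return A

# quick check on a 2x2 example
A = np.array([[4.0, 3.0], [6.0, 3.0]])
LU = lu_unblocked(A)
L = np.tril(LU, -1) + np.eye(2)
U = np.triu(LU)
assert np.allclose(L @ U, A)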

Page 19: Need for blocking – BLAS

BLAS: Basic Linear Algebra Subroutines

The memory hierarchy is exploited efficiently by the higher-level BLAS. There are 3 levels:

- Level-1 (vector), e.g. y = y + ax and z = y.x: 3n memory refs, 2n flops, flops/memory refs = 2/3
- Level-2 (matrix-vector), e.g. y = y + Ax and A = A + (alpha) x y^T: n^2 memory refs, 2n^2 flops, flops/memory refs = 2
- Level-3 (matrix-matrix), e.g. C = C + AB: 4n^2 memory refs, 2n^3 flops, flops/memory refs = n/2

Page 20: Converting BLAS2 to BLAS3

- Use blocking to obtain optimized matrix-matrix multiplies (BLAS3)
- Matrix multiplies via delayed updates: save several updates to the trailing matrix and apply them together as a single matrix multiply

Page 21: Modified GE using BLAS3 (Courtesy: Dr. Jack Dongarra)

for ib = 1 to n-1 step b        /* process the matrix b columns at a time */
    end = ib + b - 1
    /* Apply the BLAS-2 version of GE to get A(ib:n, ib:end) factored.
       Let LL denote the strictly lower triangular portion of A(ib:end, ib:end) plus the identity. */
    A(ib:end, end+1:n) = LL^(-1) A(ib:end, end+1:n)       /* update the next b rows of U */
    A(end+1:n, end+1:n) = A(end+1:n, end+1:n) - A(end+1:n, ib:end) * A(ib:end, end+1:n)
                                                           /* apply the delayed updates with a single matrix multiply */

[Diagram: block layout at step ib – the completed parts of L and U, the current panel A(ib:end, ib:end) and A(end+1:n, ib:end) below it, and the trailing matrix A(end+1:n, end+1:n), with block width b.]
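A runnable NumPy sketch of this blocked factorization (no pivoting, illustration only): the panel is factored with the unblocked BLAS-2 loop, and the trailing matrix is updated with one matrix multiply, which is the BLAS-3 step.

import numpy as np

def lu_blocked(A, b):
    """Blocked right-looking LU without pivoting."""
    A = A.astype(float)
    n = A.shape[0]
    for ib in range(0, n, b):
        end = min(ib + b, n)
        # BLAS-2 panel factorization of A[ib:n, ib:end]
        for i in range(ib, end):
            A[i+1:, i] /= A[i, i]
            A[i+1:, i+1:end] -= np.outer(A[i+1:, i], A[i, i+1:end])
        if end < n:
            # update the next b rows of U: solve with the unit lower triangular LL
            LL = np.tril(A[ib:end, ib:end], -1) + np.eye(end - ib)
            A[ib:end, end:] = np.linalg.solve(LL, A[ib:end, end:])
            # delayed updates applied as a single matrix multiply (BLAS-3)
            A[end:, end:] -= A[end:, ib:end] @ A[ib:end, end:]
    return A

# quick check on a diagonally dominant matrix (so no pivoting is needed)
rng = np.random.default_rng(0)
A = rng.random((8, 8)) + 8 * np.eye(8)
LU = lu_blocked(A, b=3)
L = np.tril(LU, -1) + np.eye(8)
U = np.triu(LU)
assert np.allclose(L @ U, A)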

Page 22: Operations

So, in each iteration the LU application involves:
- Block factorization – floating point operations on the current panel (ib:n, ib:end)
- Broadcast for the multiply – the message size is approximately n * block_size
- Each process does its own multiply – the remaining columns are divided among the processes

Page 23: Back to the simulation model

double getExecTimeCost(int matrix_size, int block_size, candidate_schedule)
{
    for (i = 0; i < number_of_blocks; i++) {
        /* Find the proc. belonging to this column; note its speed and its
           connections to the other procs. */

        tfact += ...      /* simulate the block factorization – depends on {processor
                             speed, machine load, flop count of the factorization} */

        tbcast += max(bcast times for each proc.)
                          /* ScaLAPACK follows a split-ring broadcast; simulate the broadcast
                             algorithm for each proc. – depends on {elements of the matrix to
                             be broadcast, connection bandwidth and latency} */

        tupdate += max(matrix multiplies across all procs.)
                          /* depends on {flop count of the matrix multiply, processor speed, load} */
    }
    return (tfact + tbcast + tupdate);
}
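The pseudocode above leaves the per-term formulas open; below is an illustrative Python stand-in (not the actual GrADS simulation model) using simple cost formulas: flops divided by load-adjusted speed for computation, and latency plus message size over bandwidth for the broadcast. The dictionary keys 'flops' and 'load' and the uniform bandwidth/latency are assumptions made for this sketch.

def get_exec_time_cost(matrix_size, block_size, procs, bandwidth, latency):
    """procs: list of dicts with hypothetical keys 'flops' (peak rate, flop/s)
    and 'load' (current CPU load fraction); bandwidth in bytes/s, latency in s."""
    tfact = tbcast = tupdate = 0.0
    nblocks = matrix_size // block_size
    for j in range(nblocks):
        owner = procs[j % len(procs)]               # block-cyclic column distribution
        cols_left = matrix_size - j * block_size
        # panel factorization on the owning process
        tfact += cols_left * block_size ** 2 / (owner["flops"] * (1 - owner["load"]))
        # panel broadcast: roughly cols_left * block_size doubles (8 bytes each)
        tbcast += latency + 8.0 * cols_left * block_size / bandwidth
        # trailing-matrix update, bounded by the slowest process
        update_flops = 2.0 * cols_left ** 2 * block_size / len(procs)
        tupdate += max(update_flops / (p["flops"] * (1 - p["load"])) for p in procs)
    return tfact + tbcast + tupdate

# example: two machines, 1 Gflop/s and 800 Mflop/s, 10 MB/s link, 1 ms latency
procs = [{"flops": 1e9, "load": 0.1}, {"flops": 8e8, "load": 0.3}]
print(get_exec_time_cost(4000, 64, procs, bandwidth=1e7, latency=1e-3))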

Page 24: Initial GrADS Architecture

[Diagram: as on the previous architecture slide, the User passes the matrix size and block size to the Grid Routine / Application Manager, the Resource Selector obtains resource characteristics from MDS and NWS, and the Performance Modeler produces the final schedule from the resource and problem characteristics; in addition, the Application Launcher starts the Application with the problem parameters, the application location and the final schedule, and a Contract Monitor observes the running application.]

Page 25: Contract Monitor Architecture

[Diagram: the Application is forked with Autopilot sensors; the sensors register with the Autopilot Manager, and the Contract Monitor obtains sensor information (e.g. information about a variable x) through the manager.]

Page 26: Performance Model Evaluation

Page 27: GrADS Limitations

Hence a metascheduler that has global knowledge of all applications is needed.

Page 28: Metascheduler

- To ensure that applications are scheduled based on correct resource information
- To accommodate as many new applications as possible
- To improve the performance contracts of new applications
- To minimize the impact of new applications on executing applications
- To employ policies to migrate executing applications

Page 29: Modified GrADS Architecture

[Diagram: the initial architecture (User, Grid Routine / Application Manager, Resource Selector with MDS and NWS, Performance Modeler, Contract Developer, App Launcher, Contract Monitor, Application) extended with the metascheduler components – Permission Service, Contract Negotiator, Rescheduler and Database Manager – and the RSS.]

Page 30: Database Manager

- A persistent service listening for requests from clients
- Maintains a global clock
- Has event notification capabilities – clients can express their interest in various events
- Stores various information:
  - Applications' states
  - Initial machines
  - Resource information
  - Final schedules
  - Locations of the various daemons
  - Average number of contract violations

Page 31: Database Manager (contd.)

When an application stops or completes, the database manager calculates the percentage completion time of the application from:
- time_diff: (current_time – time when the application instance started)
- avg_ratio: average of (actual costs / predicted costs)

Page 32: Permission Service

- After collecting resource information from NWS, a GrADS application contacts the Permission Service (PS)
- The PS makes its decision based on the problem requirements and the resource characteristics
- If the resources have enough capacity, permission is given
- If not, the permission service either
  - waits for resource-consuming applications that will end soon, or
  - preempts resource-consuming applications to accommodate short applications
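A minimal Python sketch of this decision logic; the application attributes and helper methods (required_capacity, capacity_used, remaining_time, preempt) are hypothetical stand-ins, not the actual GrADS interfaces:

def request_permission(new_app, running_apps, free_capacity, soon_threshold=300.0):
    """Grant if free resources suffice; otherwise wait for a resource-consuming
    application that ends soon, or preempt one to fit a short new application."""
    if new_app.required_capacity <= free_capacity:
        return "GRANT"
    if not running_apps:
        return "DENY"                     # nothing can be reclaimed
    heavy = max(running_apps, key=lambda a: a.capacity_used)
    if heavy.remaining_time() <= soon_threshold:
        return "WAIT"                     # the resource-consuming app ends soon
    if new_app.predicted_time < heavy.remaining_time():
        heavy.preempt()                   # stop the big application; it can be continued later
        return "GRANT"
    return "WAIT"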

Page 33: Permission Service (pseudocode)

Page 34: Permission Service (pseudocode)

Page 35: Permission Service – determining resource-consuming applications

For each currently executing GrADS application i:
- contact the Database Manager and obtain the NWS resource information
- determine the change in the resources caused by application i
- add the change to the current resource characteristics to obtain the resource parameters in the absence of application i

Page 36: Determining remaining execution time

- Whenever a metascheduler component wants to determine the remaining execution time of an application, it contacts the application's contract monitor
- It retrieves the average of the ratios between the actual times and the predicted times
- It uses {average ratio, predicted time, percentage completion time} to determine the remaining execution time (r.e.t.)
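The slide names the three inputs but not the formula; the sketch below is one plausible combination (an assumption, not the published algorithm): rescale the predicted total time by the observed actual/predicted ratio and keep only the uncompleted fraction.

def remaining_execution_time(predicted_total, avg_ratio, percent_completed):
    # Assumed formula: adjust the prediction by how the application has actually
    # been running, then subtract the portion already completed.
    adjusted_total = avg_ratio * predicted_total
    return adjusted_total * (1.0 - percent_completed / 100.0)

# e.g. predicted 1000 s, running 20% slower than predicted, 40% completed:
print(remaining_execution_time(1000.0, 1.2, 40.0))   # 720.0 s under these assumptions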

Page 37: Determining r.e.t. (pseudocode)

Page 38: Determining r.e.t. (pseudocode)

Page 39: Contract Negotiator

Main functionalities:
- Ensure that applications have made their scheduling decisions based on updated resource information
- Improve the performance of current applications, possibly by stopping and later continuing executing big applications
- Reduce the impact caused by current applications on executing applications

When a contract is approved, the application starts using the resources.
When a contract is rejected, the application goes back to obtain new resource characteristics and generates a new schedule.

Enforces an ordering of applications whose application-level schedules use the same resources:
- approves the contract of one application
- waits for that application to start using the resources
- rejects the contract of the other

Page 40: Contract Negotiator (pseudocode) – ensuring the application has made its scheduling decision based on correct resource information

Page 41: Contract Negotiator (pseudocode) – improving the performance of the current application by preempting an executing large application

Page 42: Contract Negotiator – 3 scenarios

- t1: average completion time of the current app. and the big app. when the big app. is preempted, the current app. is accommodated, and the big app. is then continued
- t2: average completion time of the current app. and the big app. when the big app. is allowed to complete and the current app. is accommodated afterwards
- t3: average completion time of the current app. and the big app. when both applications are executed simultaneously

if (t1 < 25% of min(t2, t3))   case 1
else if (t3 > 1.2 * t2)        case 2
else                           case 3
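The same rule as a small function; the thresholds (25% and 1.2) are taken directly from the slide, and the case-to-scenario mapping follows the definitions of t1, t2 and t3 above:

def choose_scenario(t1, t2, t3):
    if t1 < 0.25 * min(t2, t3):
        return 1        # preempt the big app., run the current app., then continue the big app.
    if t3 > 1.2 * t2:
        return 2        # let the big app. complete before accommodating the current app.
    return 3            # run both applications simultaneously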

Page 43: Contract Negotiator (pseudocode) – improving the performance of the current application by preempting an executing large application (continued)

Page 44: Contract Negotiator (pseudocode) – reducing the impact of the current application on an executing application by modifying the schedule

Page 45: Contract Negotiator (pseudocode) – reducing the impact of the current application on an executing application by modifying the schedule (continued)

Page 46: Application and Metascheduler Interactions

[Flowchart: the User supplies the problem parameters and an initial list of machines. The application performs resource selection and requests permission from the Permission Service; if permission is denied it aborts and exits, otherwise it performs application-specific scheduling and contract development and submits the contract to the Contract Negotiator. If the contract is not approved, the application gets new resource information and repeats the application-specific scheduling; if it is approved, the application is launched with the problem parameters and the final schedule. On completion the application exits; if it was stopped instead, it waits for a restart signal and then gets new resource information and starts over.]

Page 47: Experiments and Results – Demonstration of the Permission Service

Page 48: References

- T. L. Casavant and J. G. Kuhl. "A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems." IEEE Transactions on Software Engineering, Volume 14, Issue 2, February 1988, pp. 141-154.
- V. Hamscher, U. Schwiegelshohn, A. Streit and R. Yahyapour. "Evaluation of Job-Scheduling Strategies for Grid Computing." Proceedings of the First IEEE/ACM International Workshop on Grid Computing, Lecture Notes in Computer Science, pp. 191-202, 2000. ISBN 3-540-41403-7.
- V. Subramani, R. Kettimuthu, S. Srinivasan and P. Sadayappan. "Distributed Job Scheduling on Computational Grids using Multiple Simultaneous Requests." Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC 2002), July 2002.

Page 49: References

- S. Vadhiyar, J. Dongarra and A. Yarkhan. "GrADSolve – RPC for High Performance Computing on the Grid." Euro-Par 2003, 9th International Euro-Par Conference, Proceedings, Springer, LNCS 2790, pp. 394-403, August 26-29, 2003.
- S. Vadhiyar and J. Dongarra. "Metascheduler for the Grid." Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing, pp. 343-351, July 2002, Edinburgh, Scotland.
- S. Vadhiyar and J. Dongarra. "GrADSolve – A Grid-based RPC system for Parallel Computing with Application-level Scheduling." Journal of Parallel and Distributed Computing, Volume 64, pp. 774-783, 2004.
- A. Petitet, S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche and S. Vadhiyar. "Numerical Libraries and The Grid: The GrADS Experiments with ScaLAPACK." Journal of High Performance Applications and Supercomputing, Vol. 15, No. 4 (Winter 2001), pp. 359-374.

Page 50: Metascheduler Components

[Diagram: applications interact with the metascheduler, which consists of the Permission Service, Contract Negotiator, Database Manager and Rescheduler.]

- Permission Service – receives requests from applications for permission to execute on the Grid; decisions are based on resource capacities; can stop an executing resource-consuming application
- Contract Negotiator – receives application-level schedules from the applications; can accept or reject contracts; acts as a queue manager; ensures scheduling based on correct information; improves performance contracts; minimizes impact
- Database Manager – storing and retrieval of the states of the applications
- Rescheduler – receives requests for migration; reschedules executing applications to escape from heavy load or to use free resources

Page 51: Taxonomy of scheduling for distributed heterogeneous systems – Casavant and Kuhl (1988)

Page 52: Taxonomy

Local vs. global
- Local – scheduling processes to time slices on a single processor
- Global – deciding which processor a job should go to

Approximate vs. heuristic
- Approximate – stop when a "good" solution is found; uses the same formal computational model. The ability to succeed depends on:
  - the availability of a function to evaluate a solution
  - the time required to evaluate a solution
  - the ability to judge solutions according to some metric value
  - a mechanism to intelligently prune the solution space
- Heuristic – works on assumptions about the impact of "important" parameters; the assumptions and the amount of impact cannot always be quantified

Page 53: Also…

Flat characteristics:
- Adaptive vs. non-adaptive
- Load balancing
- Bidding – e.g. Condor
- Probabilistic – random searches
- One-time assignment vs. dynamic reassignment

Page 54: Evaluation – Subramani et al.

Page 55: Results – Usefulness of K-Dual scheme

Grouping jobs submitted at lightly loaded sites and heavily loaded sites

Page 56: Experiments and Results – Practical Experiments

- 5 applications were integrated into GrADS – ScaLAPACK LU, QR and eigensolver, PETSc CG and a heat equation solver
- Integration involved developing performance models and instrumenting the applications with SRS
- 50 problems with different arrival rates:
  - Poisson distribution with different mean arrival rates for job submission
  - uniform distributions for problem types and problem sizes
- Different statistics were collected, with the metascheduler enabled or disabled
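A short Python sketch of this kind of workload generation (the parameter values are placeholders, not those used in the experiments): exponential inter-arrival times give Poisson job submission, while problem type and size are drawn uniformly.

import random

def generate_workload(n_jobs=50, mean_interarrival=60.0,
                      problem_types=("LU", "QR", "Eigen", "CG", "Heat"),
                      size_range=(1000, 12000)):
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        t += random.expovariate(1.0 / mean_interarrival)    # Poisson arrival process
        jobs.append({
            "submit_time": t,
            "type": random.choice(problem_types),            # uniform over problem types
            "size": random.randint(*size_range),              # uniform over problem sizes
        })
    return jobs

print(generate_workload(n_jobs=3))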

Page 57: Experiments and Results – Practical Experiments: Total Throughput Comparison

Page 58: Experiments and Results – Practical Experiments: Performance Contract Violations

[Plot labels: measured time / expected time, and the maximum allowed measured time / expected time.]

Contract violation: (measured time / expected time) > maximum allowed (measured time / expected time)
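A direct rendering of this definition (hypothetical function name), as a contract monitor might apply it when recording violations:

def contract_violated(measured_time, expected_time, max_allowed_ratio):
    # violation when measured/expected exceeds the maximum allowed ratio
    return measured_time / expected_time > max_allowed_ratio

print(contract_violated(130.0, 100.0, 1.2))   # True: ratio 1.3 exceeds 1.2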