
Page 1: Scheduling Generic Parallel Applications – classification, Meta-scheduling

Sathish Vadhiyar

Sources/Credits/Taken from: Papers listed in the “References” slide

Page 2: Scheduling Architectures

Centralized schedulers
- Single-site scheduling – a job does not span across sites
- Multi-site scheduling – a job may span across sites

Hierarchical structures – a central scheduler (metascheduler) for global scheduling, with local scheduling on the individual sites

Decentralized scheduling – distributed schedulers interact, exchange information and submit jobs to remote systems
- Direct communication – a local scheduler directly contacts remote schedulers and transfers some of its jobs
- Communication via a central job pool – jobs that cannot be executed immediately are pushed to a central pool, and other local schedulers pull jobs out of the pool

Page 3: Various Scheduling Architectures

Page 4: Various Scheduling Architectures

Page 5: Metascheduler across MPPs

Types:

Centralized
- A metascheduler and local dispatchers
- Jobs are submitted to the metascheduler

Hierarchical
- Combination of central and local schedulers
- Jobs are submitted to the metascheduler
- The metascheduler sends each job to the site with the earliest expected start time
- Local schedulers can follow their own policies

Distributed
- Each site has a metascheduler and a local scheduler
- Jobs are submitted to the local metascheduler
- Jobs can be transferred to the site with the lowest load

Page 6: Evaluation of schemes

Centralized
1. Global knowledge of all resources – hence optimized schedules
2. Can become a bottleneck for a large number of resources and jobs
3. May take time to transfer jobs from the metascheduler to the local schedulers – needs strategic placement of the metascheduler

Hierarchical
1. Medium-level overhead
2. Sub-optimal schedules
3. Still needs strategic placement of the central scheduler

Distributed
1. No bottleneck – workload evenly distributed
2. Needs all-to-all connections between MPPs

Page 7: Evaluation of Various Scheduling Architectures

Experiments to evaluate slowdowns in the 3 schemes:
- Based on an actual trace from a supercomputer centre – a 5000-job set
- 4 sites were simulated – 2 with the same load as the trace, the other 2 with run times multiplied by 1.7
- FCFS with EASY backfilling was used
- slowdown = (wait_time + run_time) / run_time
- 2 more schemes were compared:
  - Independent – local schedulers act independently, i.e. the sites are not connected
  - United – the resources of all sites are combined to form a single site
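Since slowdown is the central metric in these experiments, here is a minimal Python sketch (toy numbers, not the trace data) of how per-job slowdown and its average would be computed:

def slowdown(wait_time, run_time):
    # slowdown = (wait_time + run_time) / run_time, as defined on the slide
    return (wait_time + run_time) / run_time

# hypothetical (wait_time, run_time) pairs in seconds
jobs = [(120, 600), (0, 3600), (900, 300)]
per_job = [slowdown(w, r) for w, r in jobs]
mean_slowdown = sum(per_job) / len(per_job)
print(per_job, mean_slowdown)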

Page 8: Results

Page 9: Observations

1. Centralized and hierarchical performed slightly better than united
   a. Compared to hierarchical, in the united scheme scheduling decisions have to be made for all jobs and all resources – the overhead, and hence the wait time, is higher
   b. Comparing united and centralized:
      i. 4 categories of jobs, corresponding to the 4 combinations of 2 parameters – execution time (short, long) and number of resources requested (narrow, wide)
      ii. There are usually many more long narrow jobs than short wide jobs
      iii. Why are centralized and hierarchical better than united?
2. Distributed performed poorly
   a. Short narrow jobs incurred more slowdown
   b. Short narrow jobs are large in number and the best candidates for backfilling
   c. Backfilling dynamics are complex
   d. A site with a light average load may not always be the best choice – short narrow jobs may find the earliest holes in a heavily loaded site

Page 10: Newly Proposed Models

K-distributed model
- Distributed scheme in which the local metascheduler distributes each job to the k least loaded sites
- When the job starts on one site, a notification is sent to the local metascheduler, which in turn asks the other k-1 schedulers to dequeue the job

K-Dual queue model
- 2 queues are maintained at each site – one for local jobs and the other for remote jobs
- Remote jobs are executed only when they don't affect the start times of the local jobs
- Local jobs are given priority during backfilling
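A minimal Python sketch of the K-distributed submission protocol just described; the Site class and its methods are hypothetical stand-ins for the real site schedulers:

class Site:
    """Hypothetical stand-in for a site scheduler and its queue."""
    def __init__(self, name):
        self.name = name
        self.queue = []

    def load(self):
        return len(self.queue)      # queue length as a simple proxy for load

    def enqueue(self, job):
        self.queue.append(job)

    def dequeue(self, job):
        if job in self.queue:
            self.queue.remove(job)

def k_distribute(job, sites, k):
    """Submit the job to the k least loaded sites."""
    targets = sorted(sites, key=lambda s: s.load())[:k]
    for site in targets:
        site.enqueue(job)
    return targets

def on_job_start(job, started_site, targets):
    """When one replica starts, dequeue the job at the other k-1 sites."""
    for site in targets:
        if site is not started_site:
            site.dequeue(job)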

Page 11: Results – Benefits of new schemes

45% improvement 15% improvement

Page 12: Results – Usefulness of K-Dual scheme

Grouping jobs submitted at lightly loaded sites and heavily loaded sites

Page 13: Metascheduler with AppLeS Local Schedulers

Page 14: Goals

The aim was to overcome deficiencies of using plain AppLeS agents, and also to have global policies:
- Resolving the claims of different applications
- Improving the response times of individual applications
- Taking care of load dynamics

Work done as part of the GrADS project
- GrADS: Grid Application Development Software
- A collaboration between different universities

Page 15: Initial GrADS Architecture

[Diagram: the User passes the matrix size and block size to the Grid Routine / Application Manager; the Resource Selector obtains resource characteristics from MDS and NWS; the Performance Modeler takes the resource and problem characteristics and produces the final schedule – a subset of resources.]

Page 16: Performance Modeler

[Diagram: the Grid Routine / Application Manager passes all resources and the problem parameters to the Performance Modeler and receives the final schedule – a subset of resources. Inside the Performance Modeler, the Scheduling Heuristic sends candidate resources to the Simulation Model, gets back the execution cost, and returns the final schedule.]

The scheduling heuristic passes to the simulation model only those candidate schedules that have "sufficient" memory; this is determined by calling a function in the simulation model.
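A minimal sketch of such a heuristic, with hypothetical method names (has_sufficient_memory, get_exec_time_cost) standing in for the simulation model's interface: filter the candidate schedules by the memory check, then keep the one with the lowest simulated execution cost.

def select_schedule(candidates, matrix_size, block_size, sim_model):
    # keep only candidate schedules with "sufficient" memory
    feasible = [c for c in candidates
                if sim_model.has_sufficient_memory(c, matrix_size, block_size)]
    if not feasible:
        return None
    # pick the schedule with the smallest simulated execution time
    return min(feasible,
               key=lambda c: sim_model.get_exec_time_cost(matrix_size, block_size, c))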

Page 17: Simulation Model

Simulation of the ScaLAPACK right-looking LU factorization

More about the application:
- Iterative – each iteration corresponds to a block
- A parallel application in which the columns are block-cyclically distributed
- Right-looking LU – based on Gaussian elimination

Page 18: Gaussian Elimination – Review

For each column i, zero it out below the diagonal by adding multiples of row i to the later rows:

for i = 1 to n-1
    for j = i+1 to n             /* for each row j below row i */
        A(j, i) = A(j, i) / A(i, i)
        for k = i+1 to n
            A(j, k) = A(j, k) - A(j, i) * A(i, k)

[Diagram: at step i, column i is zeroed below the diagonal; A(j,i) becomes a multiplier and A(j,k) is updated using A(i,k). The trailing submatrix A(i+1:n, i+1:n) is updated from the column A(i+1:n, i) and the row A(i, i+1:n); the earlier columns hold the finished multipliers.]
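For concreteness, a small runnable NumPy version of the same unblocked right-looking elimination (no pivoting, illustration only; not taken from the slides):

import numpy as np

def lu_unblocked(A):
    """Right-looking LU without pivoting: multipliers overwrite the strictly
    lower triangle, U overwrites the upper triangle."""
    A = A.astype(float)
    n = A.shape[0]
    for i in range(n - 1):
        A[i+1:, i] /= A[i, i]                               # column of multipliers
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])   # rank-1 trailing update
    return A

# quick check on a 2x2 example
A = np.array([[4.0, 3.0], [6.0, 3.0]])
LU = lu_unblocked(A)
L = np.tril(LU, -1) + np.eye(2)
U = np.triu(LU)
assert np.allclose(L @ U, A)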

Page 19: Need for blocking – BLAS

BLAS: Basic Linear Algebra Subroutines

The memory hierarchy is exploited efficiently by the higher-level BLAS. There are 3 levels:

- Level-1 (vector), e.g. y = y + ax and z = y.x: 3n memory refs, 2n flops, flops/memory refs = 2/3
- Level-2 (matrix-vector), e.g. y = y + Ax and A = A + (alpha) x y^T: n^2 memory refs, 2n^2 flops, flops/memory refs = 2
- Level-3 (matrix-matrix), e.g. C = C + AB: 4n^2 memory refs, 2n^3 flops, flops/memory refs = n/2

Page 20: Converting BLAS2 to BLAS3

- Use blocking to obtain optimized matrix-matrix multiplies (BLAS3)
- Matrix multiplies via delayed updates: save several updates to the trailing matrix and apply them together as a single matrix multiply

Page 21: Modified GE using BLAS3 (Courtesy: Dr. Jack Dongarra)

for ib = 1 to n-1 step b        /* process the matrix b columns at a time */
    end = ib + b - 1
    /* Apply the BLAS-2 version of GE to get A(ib:n, ib:end) factored.
       Let LL denote the strictly lower triangular portion of A(ib:end, ib:end) plus the identity. */
    A(ib:end, end+1:n) = LL^(-1) A(ib:end, end+1:n)       /* update the next b rows of U */
    A(end+1:n, end+1:n) = A(end+1:n, end+1:n) - A(end+1:n, ib:end) * A(ib:end, end+1:n)
                                                           /* apply the delayed updates with a single matrix multiply */

[Diagram: block layout at step ib – the completed parts of L and U, the current panel A(ib:end, ib:end) and A(end+1:n, ib:end) below it, and the trailing matrix A(end+1:n, end+1:n), with block width b.]
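A runnable NumPy sketch of this blocked factorization (no pivoting, illustration only): the panel is factored with the unblocked BLAS-2 loop, and the trailing matrix is updated with one matrix multiply, which is the BLAS-3 step.

import numpy as np

def lu_blocked(A, b):
    """Blocked right-looking LU without pivoting."""
    A = A.astype(float)
    n = A.shape[0]
    for ib in range(0, n, b):
        end = min(ib + b, n)
        # BLAS-2 panel factorization of A[ib:n, ib:end]
        for i in range(ib, end):
            A[i+1:, i] /= A[i, i]
            A[i+1:, i+1:end] -= np.outer(A[i+1:, i], A[i, i+1:end])
        if end < n:
            # update the next b rows of U: solve with the unit lower triangular LL
            LL = np.tril(A[ib:end, ib:end], -1) + np.eye(end - ib)
            A[ib:end, end:] = np.linalg.solve(LL, A[ib:end, end:])
            # delayed updates applied as a single matrix multiply (BLAS-3)
            A[end:, end:] -= A[end:, ib:end] @ A[ib:end, end:]
    return A

# quick check on a diagonally dominant matrix (so no pivoting is needed)
rng = np.random.default_rng(0)
A = rng.random((8, 8)) + 8 * np.eye(8)
LU = lu_blocked(A, b=3)
L = np.tril(LU, -1) + np.eye(8)
U = np.triu(LU)
assert np.allclose(L @ U, A)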

Page 22: Operations

So, in each iteration the LU application involves:
- Block factorization – floating point operations on the current panel (ib:n, ib:end)
- Broadcast for the multiply – the message size is approximately n * block_size
- Each process does its own multiply – the remaining columns are divided among the processes

Page 23: Back to the simulation model

double getExecTimeCost(int matrix_size, int block_size, candidate_schedule)
{
    for (i = 0; i < number_of_blocks; i++) {
        /* Find the proc. belonging to this column; note its speed and its
           connections to the other procs. */

        tfact += ...      /* simulate the block factorization – depends on {processor
                             speed, machine load, flop count of the factorization} */

        tbcast += max(bcast times for each proc.)
                          /* ScaLAPACK follows a split-ring broadcast; simulate the broadcast
                             algorithm for each proc. – depends on {elements of the matrix to
                             be broadcast, connection bandwidth and latency} */

        tupdate += max(matrix multiplies across all procs.)
                          /* depends on {flop count of the matrix multiply, processor speed, load} */
    }
    return (tfact + tbcast + tupdate);
}
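The pseudocode above leaves the per-term formulas open; below is an illustrative Python stand-in (not the actual GrADS simulation model) using simple cost formulas: flops divided by load-adjusted speed for computation, and latency plus message size over bandwidth for the broadcast. The dictionary keys 'flops' and 'load' and the uniform bandwidth/latency are assumptions made for this sketch.

def get_exec_time_cost(matrix_size, block_size, procs, bandwidth, latency):
    """procs: list of dicts with hypothetical keys 'flops' (peak rate, flop/s)
    and 'load' (current CPU load fraction); bandwidth in bytes/s, latency in s."""
    tfact = tbcast = tupdate = 0.0
    nblocks = matrix_size // block_size
    for j in range(nblocks):
        owner = procs[j % len(procs)]               # block-cyclic column distribution
        cols_left = matrix_size - j * block_size
        # panel factorization on the owning process
        tfact += cols_left * block_size ** 2 / (owner["flops"] * (1 - owner["load"]))
        # panel broadcast: roughly cols_left * block_size doubles (8 bytes each)
        tbcast += latency + 8.0 * cols_left * block_size / bandwidth
        # trailing-matrix update, bounded by the slowest process
        update_flops = 2.0 * cols_left ** 2 * block_size / len(procs)
        tupdate += max(update_flops / (p["flops"] * (1 - p["load"])) for p in procs)
    return tfact + tbcast + tupdate

# example: two machines, 1 Gflop/s and 800 Mflop/s, 10 MB/s link, 1 ms latency
procs = [{"flops": 1e9, "load": 0.1}, {"flops": 8e8, "load": 0.3}]
print(get_exec_time_cost(4000, 64, procs, bandwidth=1e7, latency=1e-3))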

Page 24: Initial GrADS Architecture

[Diagram: as on the previous architecture slide, the User passes the matrix size and block size to the Grid Routine / Application Manager, the Resource Selector obtains resource characteristics from MDS and NWS, and the Performance Modeler produces the final schedule from the resource and problem characteristics; in addition, the Application Launcher starts the Application with the problem parameters, the application location and the final schedule, and a Contract Monitor observes the running application.]

Page 25: Contract Monitor Architecture

[Diagram: the Application is forked with Autopilot sensors; the sensors register with the Autopilot Manager, and the Contract Monitor obtains sensor information (e.g. information about a variable x) through the manager.]

Page 26: Performance Model Evaluation

Page 27: GrADS Limitations

Hence a metascheduler that has global knowledge of all applications is needed.

Page 28: Metascheduler

- To ensure that applications are scheduled based on correct resource information
- To accommodate as many new applications as possible
- To improve the performance contracts of new applications
- To minimize the impact of new applications on executing applications
- To employ policies to migrate executing applications

Page 29: Modified GrADS Architecture

[Diagram: the initial architecture (User, Grid Routine / Application Manager, Resource Selector with MDS and NWS, Performance Modeler, Contract Developer, App Launcher, Contract Monitor, Application) extended with the metascheduler components – Permission Service, Contract Negotiator, Rescheduler and Database Manager – and the RSS.]

Page 30: Database Manager

- A persistent service listening for requests from clients
- Maintains a global clock
- Has event notification capabilities – clients can express their interest in various events
- Stores various information:
  - Applications' states
  - Initial machines
  - Resource information
  - Final schedules
  - Locations of the various daemons
  - Average number of contract violations

Page 31: Database Manager (contd.)

When an application stops or completes, the database manager calculates the percentage completion time of the application from:
- time_diff: (current_time – time when the application instance started)
- avg_ratio: average of (actual costs / predicted costs)

Page 32: Permission Service

- After collecting resource information from NWS, a GrADS application contacts the Permission Service (PS)
- The PS makes its decision based on the problem requirements and the resource characteristics
- If the resources have enough capacity, permission is given
- If not, the permission service either
  - waits for resource-consuming applications that will end soon, or
  - preempts resource-consuming applications to accommodate short applications
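A minimal Python sketch of this decision logic; the application attributes and helper methods (required_capacity, capacity_used, remaining_time, preempt) are hypothetical stand-ins, not the actual GrADS interfaces:

def request_permission(new_app, running_apps, free_capacity, soon_threshold=300.0):
    """Grant if free resources suffice; otherwise wait for a resource-consuming
    application that ends soon, or preempt one to fit a short new application."""
    if new_app.required_capacity <= free_capacity:
        return "GRANT"
    if not running_apps:
        return "DENY"                     # nothing can be reclaimed
    heavy = max(running_apps, key=lambda a: a.capacity_used)
    if heavy.remaining_time() <= soon_threshold:
        return "WAIT"                     # the resource-consuming app ends soon
    if new_app.predicted_time < heavy.remaining_time():
        heavy.preempt()                   # stop the big application; it can be continued later
        return "GRANT"
    return "WAIT"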

Page 33: Permission Service (pseudocode)

Page 34: Permission Service (pseudocode)

Page 35: Permission Service – determining resource-consuming applications

For each currently executing GrADS application i:
- contact the Database Manager and obtain the NWS resource information
- determine the change in the resources caused by application i
- add the change to the current resource characteristics to obtain the resource parameters in the absence of application i

Page 36: Determining remaining execution time

- Whenever a metascheduler component wants to determine the remaining execution time of an application, it contacts the application's contract monitor
- It retrieves the average of the ratios between the actual times and the predicted times
- It uses {average ratio, predicted time, percentage completion time} to determine the remaining execution time (r.e.t.)
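The slide names the three inputs but not the formula; the sketch below is one plausible combination (an assumption, not the published algorithm): rescale the predicted total time by the observed actual/predicted ratio and keep only the uncompleted fraction.

def remaining_execution_time(predicted_total, avg_ratio, percent_completed):
    # Assumed formula: adjust the prediction by how the application has actually
    # been running, then subtract the portion already completed.
    adjusted_total = avg_ratio * predicted_total
    return adjusted_total * (1.0 - percent_completed / 100.0)

# e.g. predicted 1000 s, running 20% slower than predicted, 40% completed:
print(remaining_execution_time(1000.0, 1.2, 40.0))   # 720.0 s under these assumptions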

Page 37: Determining r.e.t. (pseudocode)

Page 38: Determining r.e.t. (pseudocode)

Page 39: Contract Negotiator

Main functionalities:
- Ensure that applications have made their scheduling decisions based on updated resource information
- Improve the performance of current applications, possibly by stopping and later continuing executing big applications
- Reduce the impact caused by current applications on executing applications

When a contract is approved, the application starts using the resources.
When a contract is rejected, the application goes back to obtain new resource characteristics and generates a new schedule.

Enforces an ordering of applications whose application-level schedules use the same resources:
- approves the contract of one application
- waits for that application to start using the resources
- rejects the contract of the other

Page 40: Contract Negotiator (pseudocode) – ensuring the application has made its scheduling decision based on correct resource information

Page 41: Contract Negotiator (pseudocode) – improving the performance of the current application by preempting an executing large application

Page 42: Contract Negotiator – 3 scenarios

- t1: average completion time of the current app. and the big app. when the big app. is preempted, the current app. is accommodated, and the big app. is then continued
- t2: average completion time of the current app. and the big app. when the big app. is allowed to complete and the current app. is accommodated afterwards
- t3: average completion time of the current app. and the big app. when both applications are executed simultaneously

if (t1 < 25% of min(t2, t3))   case 1
else if (t3 > 1.2 * t2)        case 2
else                           case 3
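The same rule as a small function; the thresholds (25% and 1.2) are taken directly from the slide, and the case-to-scenario mapping follows the definitions of t1, t2 and t3 above:

def choose_scenario(t1, t2, t3):
    if t1 < 0.25 * min(t2, t3):
        return 1        # preempt the big app., run the current app., then continue the big app.
    if t3 > 1.2 * t2:
        return 2        # let the big app. complete before accommodating the current app.
    return 3            # run both applications simultaneously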

Page 43: Contract Negotiator (pseudocode) – improving the performance of the current application by preempting an executing large application (continued)

Page 44: Contract Negotiator (pseudocode) – reducing the impact of the current application on an executing application by modifying the schedule

Page 45: Contract Negotiator (pseudocode) – reducing the impact of the current application on an executing application by modifying the schedule (continued)

Page 46: Application and Metascheduler Interactions

[Flowchart: the User supplies the problem parameters and an initial list of machines. The application performs resource selection and requests permission from the Permission Service; if permission is denied it aborts and exits, otherwise it performs application-specific scheduling and contract development and submits the contract to the Contract Negotiator. If the contract is not approved, the application gets new resource information and repeats the application-specific scheduling; if it is approved, the application is launched with the problem parameters and the final schedule. On completion the application exits; if it was stopped instead, it waits for a restart signal and then gets new resource information and starts over.]

Page 47: Experiments and Results – Demonstration of the Permission Service

Page 48: References

- T. L. Casavant and J. G. Kuhl. "A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems." IEEE Transactions on Software Engineering, Volume 14, Issue 2, February 1988, pp. 141-154.
- V. Hamscher, U. Schwiegelshohn, A. Streit and R. Yahyapour. "Evaluation of Job-Scheduling Strategies for Grid Computing." Proceedings of the First IEEE/ACM International Workshop on Grid Computing, Lecture Notes in Computer Science, pp. 191-202, 2000. ISBN 3-540-41403-7.
- V. Subramani, R. Kettimuthu, S. Srinivasan and P. Sadayappan. "Distributed Job Scheduling on Computational Grids using Multiple Simultaneous Requests." Proceedings of the 11th IEEE Symposium on High Performance Distributed Computing (HPDC 2002), July 2002.

Page 49: References

- S. Vadhiyar, J. Dongarra and A. Yarkhan. "GrADSolve – RPC for High Performance Computing on the Grid." Euro-Par 2003, 9th International Euro-Par Conference, Proceedings, Springer, LNCS 2790, pp. 394-403, August 26-29, 2003.
- S. Vadhiyar and J. Dongarra. "Metascheduler for the Grid." Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing, pp. 343-351, July 2002, Edinburgh, Scotland.
- S. Vadhiyar and J. Dongarra. "GrADSolve – A Grid-based RPC system for Parallel Computing with Application-level Scheduling." Journal of Parallel and Distributed Computing, Volume 64, pp. 774-783, 2004.
- A. Petitet, S. Blackford, J. Dongarra, B. Ellis, G. Fagg, K. Roche and S. Vadhiyar. "Numerical Libraries and The Grid: The GrADS Experiments with ScaLAPACK." Journal of High Performance Applications and Supercomputing, Vol. 15, No. 4 (Winter 2001), pp. 359-374.

Page 50: Metascheduler Components

[Diagram: applications interact with the metascheduler, which consists of the Permission Service, Contract Negotiator, Database Manager and Rescheduler.]

- Permission Service – receives requests from applications for permission to execute on the Grid; decisions are based on resource capacities; can stop an executing resource-consuming application
- Contract Negotiator – receives application-level schedules from the applications; can accept or reject contracts; acts as a queue manager; ensures scheduling based on correct information; improves performance contracts; minimizes impact
- Database Manager – storing and retrieval of the states of the applications
- Rescheduler – receives requests for migration; reschedules executing applications to escape from heavy load or to use free resources

Page 51: Taxonomy of scheduling for distributed heterogeneous systems – Casavant and Kuhl (1988)

Page 52: Taxonomy

Local vs. global
- Local – scheduling processes to time slices on a single processor
- Global – deciding which processor a job should go to

Approximate vs. heuristic
- Approximate – stop when a "good" solution is found; uses the same formal computational model. The ability to succeed depends on:
  - the availability of a function to evaluate a solution
  - the time required to evaluate a solution
  - the ability to judge solutions according to some metric value
  - a mechanism to intelligently prune the solution space
- Heuristic – works on assumptions about the impact of "important" parameters; the assumptions and the amount of impact cannot always be quantified

Page 53: Also…

Flat characteristics:
- Adaptive vs. non-adaptive
- Load balancing
- Bidding – e.g. Condor
- Probabilistic – random searches
- One-time assignment vs. dynamic reassignment

Page 54: Evaluation – Subramani et al.

Page 55: Results – Usefulness of K-Dual scheme

Grouping jobs submitted at lightly loaded sites and heavily loaded sites

Page 56: Experiments and Results – Practical Experiments

- 5 applications were integrated into GrADS – ScaLAPACK LU, QR and eigensolver, PETSc CG and a heat equation solver
- Integration involved developing performance models and instrumenting the applications with SRS
- 50 problems with different arrival rates:
  - Poisson distribution with different mean arrival rates for job submission
  - uniform distributions for problem types and problem sizes
- Different statistics were collected, with the metascheduler enabled or disabled
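A short Python sketch of this kind of workload generation (the parameter values are placeholders, not those used in the experiments): exponential inter-arrival times give Poisson job submission, while problem type and size are drawn uniformly.

import random

def generate_workload(n_jobs=50, mean_interarrival=60.0,
                      problem_types=("LU", "QR", "Eigen", "CG", "Heat"),
                      size_range=(1000, 12000)):
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        t += random.expovariate(1.0 / mean_interarrival)    # Poisson arrival process
        jobs.append({
            "submit_time": t,
            "type": random.choice(problem_types),            # uniform over problem types
            "size": random.randint(*size_range),              # uniform over problem sizes
        })
    return jobs

print(generate_workload(n_jobs=3))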

Page 57: Experiments and Results – Practical Experiments: Total Throughput Comparison

Page 58: Experiments and Results – Practical Experiments: Performance Contract Violations

[Plot labels: measured time / expected time, and the maximum allowed measured time / expected time.]

Contract violation: (measured time / expected time) > maximum allowed (measured time / expected time)
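A direct rendering of this definition (hypothetical function name), as a contract monitor might apply it when recording violations:

def contract_violated(measured_time, expected_time, max_allowed_ratio):
    # violation when measured/expected exceeds the maximum allowed ratio
    return measured_time / expected_time > max_allowed_ratio

print(contract_violated(130.0, 100.0, 1.2))   # True: ratio 1.3 exceeds 1.2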