An Evaluation of a Framework for the Dynamic Load Balancing of Highly Adaptive and Irregular Parallel Applications
Kevin J. Barker, Nikos P. Chrisochoides
Proceedings of the ACM/IEEE SC2003 Conference
© 2003 ACM
Presented by 張肇烜
Outline
- Introduction
- Load Balancing State-of-the-Art
- Representative Load Balancing Systems
- PREMA
- Performance Evaluation
- Conclusions
Introduction
Asynchronous and highly adaptive applications are defined by several characteristics:
- No global synchronization points are inherent to the application.
- The computational weights associated with individual work units may vary drastically throughout the execution of the application.
- The progress of the computation is impossible to predict.
Introduction (cont.)
Existing load balancing methods found in the literature and in publicly available software are not suitable for asynchronous and highly adaptive applications, for three reasons:
- Large penalty for global synchronization.
- Difficulty in predicting future workloads.
- Heavy workloads may delay message processing.
Load Balancing State-of-the-Art
This can be done by dividing the load balancing process into its three primary steps:
- Information gathering and dissemination.
- Decision making.
- Data or computation migration.
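The three steps above can be sketched in a few lines. This is a deliberately minimal illustration, not the paper's implementation: the function names, the dictionary-of-queues data model, and the "move one unit from the heaviest to the lightest processor" policy are all assumptions made for the example.

```python
def gather_loads(work_queues):
    """Step 1: information gathering -- total weight currently on each processor."""
    return {proc: sum(units) for proc, units in work_queues.items()}

def decide_transfer(loads):
    """Step 2: decision making -- pick a donor/receiver pair (here, a
    deliberately simple most-loaded-to-least-loaded policy)."""
    donor = max(loads, key=loads.get)
    receiver = min(loads, key=loads.get)
    return (donor, receiver) if loads[donor] > loads[receiver] else None

def migrate(work_queues, donor, receiver):
    """Step 3: data/computation migration -- move one work unit."""
    unit = work_queues[donor].pop()
    work_queues[receiver].append(unit)

# One balancing round over three hypothetical processors.
queues = {0: [5, 5, 5], 1: [1], 2: [2, 2]}
decision = decide_transfer(gather_loads(queues))
if decision:
    migrate(queues, *decision)
```

After one round, a unit of weight 5 has moved from processor 0 to processor 1. Real systems differ mainly in how step 1 is scoped (global vs. neighborhood) and when the round is triggered, which is exactly what the following slides compare.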
Load Balancing State-of-the-Art (cont.)
(Loosely) Synchronous vs. Asynchronous
- Synchronous load balancing methods and tools must gather load information from all processors in order to reconstruct the global system state.
- Asynchronous methods require communication with only a small, fixed-size 'neighborhood' of processors.
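The neighborhood-only style can be illustrated with a diffusion-like relaxation, a common asynchronous-flavored scheme. Everything here is an assumption for illustration (the ring topology, the `alpha` damping factor, the sequential update order); the point is only that each processor consults its fixed neighbor set and never reconstructs global state.

```python
def diffuse(loads, neighbors, alpha=0.5):
    """One relaxation step: each processor compares its load only with its
    fixed neighborhood and sheds a fraction of any surplus toward lighter
    neighbors. No global gather is performed."""
    new = dict(loads)
    for p, nbrs in neighbors.items():
        for q in nbrs:
            surplus = new[p] - new[q]
            if surplus > 0:
                shift = alpha * surplus / len(nbrs)
                new[p] -= shift
                new[q] += shift
    return new

# Ring of 4 processors; each sees only its two neighbors.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
loads = {0: 16.0, 1: 0.0, 2: 0.0, 3: 0.0}
for _ in range(20):
    loads = diffuse(loads, neighbors)
```

Total load is conserved (every shift is subtracted from one side and added to the other), and repeated local exchanges drive the ring toward balance without any global synchronization point.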
Load Balancing State-of-the-Art (cont.)
Programmer-supplied Hints vs. Runtime Instrumentation
- The first method is for the programmer to provide hints about the weight of pending computation.
- The second method is to assume that future performance will be related to what has been seen in the past.
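The second approach can be sketched as a small estimator that smooths observed execution times, on the assumption that future cost resembles past cost. The class name, the per-kind keying, and the exponential-average policy are illustrative choices, not anything prescribed by the paper.

```python
class WeightEstimator:
    """Runtime-instrumentation sketch: keep an exponentially weighted
    average of observed execution times per work-unit kind, and use it
    to predict the weight of pending work of the same kind."""
    def __init__(self, decay=0.5):
        self.decay = decay
        self.estimates = {}

    def record(self, kind, elapsed):
        # The first observation seeds the estimate; later ones are blended in.
        old = self.estimates.get(kind, elapsed)
        self.estimates[kind] = self.decay * elapsed + (1 - self.decay) * old

    def predict(self, kind, default=1.0):
        # Unseen kinds fall back to a neutral default weight.
        return self.estimates.get(kind, default)

est = WeightEstimator()
est.record("refine", 2.0)
est.record("refine", 4.0)
```

Programmer hints would replace `record` with an explicit weight supplied by the application; the instrumentation approach trades that burden for the risk of mispredicting highly adaptive workloads.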
Load Balancing State-of-the-Art (cont.)
Explicitly Initiated Load Balancing vs. Preemptive Load Balancing
- Explicit load balancing has the advantage that well-tuned application routines will not be interrupted.
- Implicit load balancing will periodically check for pending balancer messages.
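The explicit style can be sketched as a poll point the application calls between work units; balancer messages queue up until then, so tuned inner loops are never interrupted. The inbox, `poll`, and `handle` names are hypothetical, not the interface of any of the systems discussed.

```python
from collections import deque

balancer_inbox = deque()          # messages arriving from the load balancer

handled_log = []
def handle(message):
    """Stand-in for acting on a balancer message (query, migrate request, ...)."""
    handled_log.append(message)

def poll():
    """Explicitly initiated balancing: drain pending balancer messages,
    but only when the application chooses to hand over control."""
    handled = 0
    while balancer_inbox:
        handle(balancer_inbox.popleft())
        handled += 1
    return handled

# Application loop: messages queued mid-computation wait for the next poll.
balancer_inbox.extend(["load_query", "migrate_request"])
for _ in range(3):
    pass                          # ... well-tuned computation kernel ...
    poll()                        # balancer messages processed only here
```

The trade-off the slides describe follows directly: with polling, a long-running kernel delays message processing until the next `poll()`, whereas a preemptive scheme would service the inbox without waiting.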
Representative Load Balancing Systems
ParMETIS
- ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs.
- This type of explicit repartitioning suffers from the global synchronization and inaccurate workload prediction problems.
Representative Load Balancing Systems (cont.)
Charm++
- Charm++ is a parallel object-oriented programming language based on C++.
- Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares.
- The load balancing methods are implemented using a global barrier.
- Load balancing is achieved by mapping and re-mapping chares to available processors.
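Mapping migratable objects to processors is, at its core, an assignment problem. A minimal sketch of one common simple strategy, greedy heaviest-first assignment, is shown below; this is an illustration of the general idea, not Charm++ code or the specific strategies the paper evaluates.

```python
def greedy_map(object_weights, num_procs):
    """Assign objects (e.g. chares) to processors, heaviest first, each
    going to the currently least-loaded processor."""
    loads = [0.0] * num_procs
    mapping = {}
    for obj, w in sorted(object_weights.items(), key=lambda kv: -kv[1]):
        target = min(range(num_procs), key=loads.__getitem__)
        mapping[obj] = target
        loads[target] += w
    return mapping, loads

# Five hypothetical objects with uneven weights, mapped onto 2 processors.
mapping, loads = greedy_map({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, 2)
```

A centralized strategy like this needs all object weights at once, which is why barrier-based schemes fit it naturally; re-mapping amounts to rerunning the assignment with updated weights and migrating objects whose processor changed.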
PREMA
PREMA is a runtime library based on a design philosophy which includes:
- Single-sided communication.
- A global namespace.
- A framework which allows implementation of customized dynamic load balancing algorithms.
- A suite of commonly used dynamic load balancing strategies.
PREMA (cont.)
- The application's data is first decomposed into some number of subdomains.
- Each subdomain is then registered with the PREMA system as a mobile object and assigned a unique mobile pointer.
- The PREMA library allows load balancing to be initiated either explicitly or implicitly.
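The mobile-object idea can be sketched as a lookup table behind a global namespace: a mobile pointer is a location-independent handle, and the runtime tracks which processor currently holds the object so messages follow it after migration. The class and method names below are illustrative, not PREMA's actual interface.

```python
class MobilePointerTable:
    """Sketch of a global namespace over mobile objects: registering a
    subdomain yields a mobile pointer, and the runtime records which
    processor currently holds each object."""
    def __init__(self):
        self._next_id = 0
        self._location = {}

    def register(self, processor):
        """Register a subdomain as a mobile object; return its mobile pointer."""
        mp = self._next_id
        self._next_id += 1
        self._location[mp] = processor
        return mp

    def migrate(self, mp, new_processor):
        """Record that the object moved; its mobile pointer is unchanged."""
        self._location[mp] = new_processor

    def locate(self, mp):
        """Messages are addressed to the mobile pointer, not a fixed rank."""
        return self._location[mp]

table = MobilePointerTable()
mp = table.register(processor=0)
table.migrate(mp, new_processor=3)
```

Because senders address `mp` rather than a processor rank, the balancer can migrate the subdomain without the application ever updating its references.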
PREMA (cont.)
Explicit Load Balancing
- Explicit load balancing requires the application program to explicitly hand control to the load balancing algorithm.
- This is done with the polling operation.
- Load balancing information and request messages often suffer delays between polling points.
PREMA (cont.)
Implicit Load Balancing
- Load balancing messages that are processed preemptively in no way affect the execution of the application.
- Load balancing messages can be guaranteed to be received in a timely manner.
- The number of wasted processor cycles is minimized.
Performance Evaluation
- The benchmark program allows us to compare the performance of the three load balancers.
- Command-line parameters are parsed to determine the number of work units.
- The work units are created and distributed to the available processors.
- Computation is assigned to each work unit.
Performance Evaluation (cont.)
- Control is handed to the runtime system and the load balancer.
- There is no communication between work units, and work units are able to execute in any order.
- We vary two parameters: the initial imbalance percentage and the difference in computational weights.
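A benchmark setup of this shape can be sketched as follows. The exact semantics of the two parameters in the paper are not spelled out in the transcript, so the interpretation here (a percentage of units concentrated on one processor, and a maximum-to-minimum weight ratio) is an assumption, as are all names.

```python
import random

def make_work_units(n_units, n_procs, imbalance_pct, weight_ratio, seed=0):
    """Sketch of a benchmark generator: distribute n_units across n_procs,
    concentrating `imbalance_pct` percent of them on processor 0, with
    per-unit weights varying between 1 and `weight_ratio`."""
    rng = random.Random(seed)
    extra = int(n_units * imbalance_pct / 100)
    queues = {p: [] for p in range(n_procs)}
    for i in range(n_units):
        # The first `extra` units all land on processor 0; the rest spread out.
        proc = 0 if i < extra else rng.randrange(n_procs)
        weight = rng.uniform(1.0, weight_ratio)
        queues[proc].append(weight)
    return queues

queues = make_work_units(1000, 8, imbalance_pct=30, weight_ratio=10)
total_units = sum(len(q) for q in queues.values())
```

Since the units are independent and order-free, any of the three balancers can redistribute them freely; sweeping the two parameters then isolates how each balancer copes with initial skew versus weight variance.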
Performance Evaluation (cont.)
[Several slides of performance figures comparing the three load balancers; no transcript text is available for them.]
Conclusions
- We have presented a runtime software system for implementing asynchronous, highly adaptive, and irregular applications on distributed memory platforms.
- Our approach is effective in terms of minimizing idle cycles due to workload imbalances, and efficient in terms of the overhead introduced during workload balancing for asynchronous and highly adaptive applications.