
A Survey of Techniques for Mapping and Scheduling Applications to Network on Chip Systems

Ruxandra Pop and Shashi Kumar

ISSN 1404 – 0018 Research Report 04:4


Embedded Systems Group Department of Electronics and Computer Engineering

School of Engineering, Jönköping University Jönköping, SWEDEN


Abstract

Network on Chip (NoC) architectures with regular topology provide scalable platforms for designing Systems on Chip (SoC) with a large number of cores. Developing products and applications on a NoC architecture offers many challenges and opportunities. Many tools will be required to develop a NoC architecture for a specific application, and a tool that can map an application or a set of applications onto a given NoC architecture will be essential. In this report we survey techniques for mapping and scheduling concurrent applications onto NoC platforms. NoC platforms are essentially a special class of Multiprocessor Embedded Systems (MPES). Conventional MPES architectures are mostly bus-based systems, which suffer from problems of scalability and reusability. There has been a lot of research on MPES development, including research on mapping and scheduling applications onto them, and NoC researchers can readily adapt many of its results, techniques and ideas for NoC tool development. We therefore also survey techniques for mapping and scheduling applications onto bus-based MPES.

A key parameter affecting the performance of an application running on a NoC platform is the communication delay for moving data from one IP core to another. Among other factors, this delay depends on the relative positions of the IP cores on the NoC platform; the step in NoC development that decides these positions is generally referred to as Network Assignment. We first classify the mapping and scheduling approaches for NoC available in the literature according to whether or not they combine the mapping and scheduling of computations with communication mapping and scheduling. Similarly, we classify mapping and scheduling approaches for bus-based MPES into three categories according to their focus: energy minimization, handling of soft real-time constraints, or memory awareness.

NoC and MPES share the same design goals and constraints, mainly reducing energy consumption, meeting real-time constraints and minimizing on-chip memory. Off-line mapping and scheduling is usually recommended for embedded systems; therefore, all but one of the NoC and bus-based MPES approaches surveyed in this report are performed off-line. The surveyed methodologies apply non-pre-emptive scheduling with dynamic or static priorities, and all but one of the surveyed Voltage Selection approaches apply Dynamic Voltage Selection.


Mapping and scheduling are known to be computationally hard problems, and different groups have proposed a wide range of exact and approximate optimization algorithms for solving them. These include Branch-and-Bound (BB), constructive and transformative heuristics such as List Scheduling (LS) and Genetic Algorithms (GA), and various types of Mathematical Programming algorithms.

We have also compared the presented approaches based on how well they achieve their design goals. The communication issue is highlighted for NoC approaches that verify hard deadlines, while Voltage Selection is stressed for approaches that minimize energy consumption. The comparison lists the strong and weak points of the various approaches and identifies the best approach for a particular design objective. The survey also points to many problems for future research. Every approach leaves scope for improving the model used to estimate communication delay. Our study shows that no research group has considered dynamic mapping and scheduling techniques for NoC platforms, and that none has yet reported mapping and scheduling techniques for a NoC platform with a topology other than a two-dimensional mesh.

Keywords Network on Chip, Multiprocessor Embedded Systems, Platform Based Design, Mapping, Scheduling, Network Assignment, Dynamic Voltage Selection, Power Management, Task Graph, Heuristics, Genetic Algorithms, List Scheduling, Mathematical Programming, Hard Deadline, Soft Deadline, Low Power.


Table of Contents

1 INTRODUCTION
1.1 NoC Design Flow
1.2 NoC Design Goals
1.3 Tools for NoC Design
1.4 About this report

2 MAPPING AND SCHEDULING RELATED ISSUES
2.1 Architectural Model and Application Model
2.2 Issues and Objective
2.2.1 Constraints and Objective Functions
2.2.2 Separated or Integrated
2.2.3 Static vs. Dynamic Mapping and Scheduling
2.2.4 Mapping and Scheduling Algorithms

3 SURVEYED APPROACHES
3.1 Approaches for NoC
3.1.1 Approaches without Network Assignment
3.1.2 Approaches with Network Assignment
3.1.3 Summary
3.2 Approaches for Multiprocessor Embedded Systems
3.2.1 Energy-aware Approaches
3.2.2 Approaches for Soft Real Time Constraints
3.2.3 Memory-aware Approaches
3.2.4 Summary

4 COMPARISONS OF MAPPING AND SCHEDULING TECHNIQUES
4.1 Verifying Real Time Constraints
4.1.1 Hard Real Time Constraints in NoC
4.1.2 Hard Real Time Constraints in Multiprocessor Embedded Systems
4.1.3 Soft Real Time Constraints in Multiprocessor Embedded Systems
4.1.4 Conclusions
4.2 Minimizing Energy Consumption
4.2.1 Energy-Aware Task Mapping
4.2.2 Voltage Selection
4.2.3 Experimental Results
4.2.4 Conclusions
4.3 Verifying Area and Memory Constraints
4.3.1 Comparison

5 CONCLUSIONS



1 Introduction

Network on Chip (NoC) is a new design paradigm and on-chip architecture that aims to overcome the design problems and performance limitations of current bus-based System on Chip (SoC) methodologies [1]. A NoC-based system uses a network of packet-switched routers with a regular topology and a layered protocol for on-chip communication among IP cores, whereas current SoC design methodologies use a variety of bus types with different protocols for sharing communication resources. An IP core is connected to the communication backbone through a network interface in a NoC, or through an adapter in a bus-based SoC. Both NoC and SoC are essentially Multiprocessor Embedded Systems (MPES). Conventional MPES use bus-based communication, so they have many similarities with bus-based SoC.

NoC provides increased parallelism thanks to the theoretically unlimited scalability of its regular network topology. The greater computational power and internal communication bandwidth of a NoC can give embedded applications better timing performance than the heterogeneous, shared communication medium of current SoC architectures. NoC-based systems are therefore especially suitable for implementing data-intensive applications with high computational requirements.

1.1 NoC Design Flow

NoC design methodology is a platform-based design approach that relies on IP core reuse and on specialization of the communication structure. Figure 1 shows the steps required for an application-specific NoC design. The starting point is a generic NoC description that specifies the general features of the architecture, such as topology, constraints on IP core sizes and communication principles. The next design step specializes the generic architecture for an application or a class of applications. This specialization involves deciding the size of the NoC, selecting the IP cores and finalizing the design of the switches (routing algorithm) and the protocol format. Easy plug-and-play of IP cores into the slots of the architecture, together with unlimited network scalability, allows the NoC architecture to be refined quickly for a given application and set of design goals by heavily reusing previous designs for similar applications. This is quite different from SoC, where IP core reuse raises problems even within the same design and scalability is limited to a maximum number of buses and IP cores. The NoC Specialization step delivers a specialized NoC, with or without the IP cores already placed into NoC tiles. When an Unplaced NoC is provided, the allocation of NoC tiles to IP cores is delayed until the Network Assignment step.

The Mapping and Scheduling steps follow NoC Specialization. Their role is to implement the given application on the selected architecture, which mainly means assigning the tasks and communications of the application to the resources of the architecture, and ordering them, so that the design goals are optimized. Communication Mapping (CM) and Communication Scheduling (CS) raise more problems in NoC than in SoC, because a minimal routing path must be allocated for each NoC communication, exclusive use of the communication resources must be ensured without an arbiter, and deadlock and congestion must be prevented. Moreover, in a large NoC the communication distance has a big impact on communication delay.

Because of on-chip distance issues in NoC, the allocation of tiles to IP cores is better postponed until after task mapping instead of being finalized during IP core selection. Tile Allocation (TA) together with Routing Path Allocation (RPA), that is, communication mapping, is known in the literature as Network Assignment (NA). NA is usually performed after the Task Mapping (TM) step and aims to minimize on-chip communication distance.
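Tile allocation is often scored by the total communication volume weighted by hop distance. As a minimal sketch, assuming a 2D mesh, minimal routing (so that hop count equals Manhattan distance) and a hypothetical table of core-to-core volumes, the following Python code allocates tiles by exhaustive search. This is only practical for very small meshes, and none of the names come from the surveyed tools.

```python
from itertools import permutations


def manhattan(a, b):
    """Hop count between tiles a=(x, y) and b=(x, y) under minimal mesh routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def comm_cost(placement, volumes):
    """Sum over all communicating core pairs of volume x hop distance."""
    return sum(v * manhattan(placement[src], placement[dst])
               for (src, dst), v in volumes.items())


def allocate_tiles(cores, width, height, volumes):
    """Try every injective core-to-tile allocation; return (placement, cost)."""
    tiles = [(x, y) for y in range(height) for x in range(width)]
    best, best_cost = None, float("inf")
    for chosen in permutations(tiles, len(cores)):
        placement = dict(zip(cores, chosen))
        cost = comm_cost(placement, volumes)
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical communication volumes (e.g. bytes per period) between IP cores.
    volumes = {("cpu", "dsp"): 100, ("dsp", "mem"): 80, ("cpu", "mem"): 10}
    placement, cost = allocate_tiles(["cpu", "dsp", "mem"], 2, 2, volumes)
    print(placement, cost)
```

Because any three tiles of a 2×2 mesh include one pair that is two hops apart, an optimal allocation puts the lowest-volume pair (`cpu`, `mem`) at that distance.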

Figure 1 – NoC Design Flow

The design space exploration steps are iterated, separately or together, until a practical optimum is reached. After the NoC Specialization and Mapping steps, Estimation is performed to predict the feasibility and quality of the final solution using average or statistical measurements. After Scheduling, the feasibility and quality of the Mapped and Scheduled Model are checked through simulation or formal analysis, so that infeasible solutions can be replaced or new architectures tried.

1.2 NoC Design Goals

NoC design methodologies share many design goals with SoC design methodologies, namely reducing energy consumption, minimizing chip area and maximizing timing performance. Energy saving is very important, especially in the design of portable embedded systems, where extended and correct operation depends on battery lifetime and circuit heating. Chip area is related to switch design and to on-chip memories, whose layouts can occupy up to 80% of the total area. Reducing the functional complexity of switches (routing algorithm) and maximizing memory utilization can reduce chip area. Timing constraints refer to hard and soft deadlines, whose misses can lead to failure or to degraded quality of results. These goals conflict, because minimizing power consumption implies slowing down computations, which affects system performance.

Achieving a good trade-off between design goals while embedding an application into a selected architecture is the main issue of NoC design. Task Mapping and NoC Specialization can target any design goal, such as energy consumption, chip area or timing performance, or a combination of them. During NoC Specialization, for example, the number and types of IP cores are selected according to their cost and performance, so that the design goals of the embedded application are met and chip area is minimized. During Task Mapping, the tasks that consume the most energy can be assigned to the IP cores that consume the least, so that overall energy is minimized. Mapping is responsible for exploiting the concurrency of the application according to the parallelism of the architecture, which links it to timing-performance and chip-area goals and constraints. Scheduling aims to verify timing constraints, but Voltage Scheduling can also be performed to reduce energy consumption, by executing some of the tasks assigned to voltage-scalable resources at lower voltage levels and operating frequencies.

1.3 Tools for NoC Design

It is clear from the NoC design flow that a large number of tools will be required to aid the designer of an application-specific NoC system. The most important tools are:

• IP core selection for specific computations
• IP core evaluation tools
• NoC architectural simulators
• NoC power consumption estimators
• Application mapping and scheduling tool

Many existing tools for hardware and software synthesis also need to be integrated in order to implement an application-specific tool. We believe that a tool which helps designers map applications onto multiple computing resources will be key to the success of the NoC paradigm. For this, we can learn a great deal from the experience of researchers in the area of multiprocessor systems.

Mapping and scheduling are among the most difficult tasks in any design context, and good mapping and scheduling algorithms are crucial for obtaining maximum performance from an application on a given computing platform. NoC systems are essentially loosely coupled heterogeneous multiprocessor systems that use a packet-switched network for inter-processor communication. One important difference between a NoC and a general multiprocessor system is that a NoC may contain cores that are not suitable for general-purpose processing but are dedicated to specialized functions. This difference must be kept in mind when developing algorithms and tools for NoC-based design.



1.4 About this report

The aim of this report is to study and compare various techniques for mapping and scheduling applications onto NoC architectures. Because the NoC concept is new, only a few mapping and scheduling techniques have been proposed for it, and we therefore extend our search to bus-based MPES.

The report is organized in five sections. This section has introduced the NoC design flow and design goals. The next section defines the mapping and scheduling problem for NoC by means of an example. The third section surveys the approaches for NoC and bus-based MPES, describing their methodologies, design goals and implementation algorithms. The fourth section compares the presented approaches based on how well they achieve their design goals; there, the communication issue is highlighted for NoC approaches that verify hard deadlines, while Voltage Selection is stressed for approaches that minimize energy consumption. Finally, the fifth section draws conclusions.

2 Mapping and Scheduling Related Issues

Mapping and scheduling are the steps of the NoC design flow that deal with implementing the application on a specialized architecture. The inputs to the mapping and scheduling problem are:

• Model of application(s)
• Model of target architecture
• Performance and power constraints
• Objective functions to be optimized

The expected output of this step is a partitioning of the application(s) among the computing resources of the platform and a schedule for the execution of the various computational tasks on these resources.

2.1 Architectural Model and Application Model

The architecture model is generally specified as a directed graph with two types of nodes, representing the processing elements (PE) and the switches, interconnected by edges representing the communication links (CL) of the platform. Since a NoC is essentially a heterogeneous multiprocessor system, PE can be general-purpose or special-purpose processors, ASIC, FPGA or memories of various types. CL can be point-to-point connections or buses. PE and CL can be state-of-the-art voltage-scalable components capable of dynamically switching among the available supply and threshold voltage levels. Power management techniques can be employed to shut down, totally or partially, unutilized or idle PE or CL. The representation of the platform architecture comprises the following parameters: the number, type and position of PE; the speed (frequency) and voltage levels of PE and CL; the memory size and area constraints of PE; and the interconnection topology of PE.



The application is generally given as a task graph (TG) with nodes representing tasks and edges representing communications. Several concurrent applications can be represented as a set of TGs to be executed in parallel on the computing platform. The representation comprises a few parameters: task execution times (ET) on each PE, hard and soft deadlines of tasks, TG period, communication volume of each edge, energy consumption per processing cycle at nominal supply voltage, energy consumption per communication unit (bit/byte) ignoring communication distance, and memory requirements of tasks.
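The application and architecture parameters listed above can be captured in a few plain data types. The following Python sketch is one possible encoding, not the representation used by any surveyed approach; field names such as `exec_time` and `voltage_levels` are hypothetical. The topological order it computes is what most schedulers need, since a task can only start after all of its predecessors.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    exec_time: dict[str, float]          # ET on each PE, keyed by PE name
    deadline: float | None = None        # hard deadline, if any
    memory: int = 0                      # memory requirement (e.g. bytes)


@dataclass
class Edge:
    src: str
    dst: str
    volume: int                          # communication volume (e.g. bytes)


@dataclass
class TaskGraph:
    tasks: dict[str, Task]
    edges: list[Edge]
    period: float

    def successors(self, name: str) -> list[str]:
        return [e.dst for e in self.edges if e.src == name]

    def topological_order(self) -> list[str]:
        """Kahn's algorithm: every task appears after all of its predecessors."""
        indeg = {n: 0 for n in self.tasks}
        for e in self.edges:
            indeg[e.dst] += 1
        ready = [n for n, d in indeg.items() if d == 0]
        order = []
        while ready:
            n = ready.pop(0)
            order.append(n)
            for s in self.successors(n):
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        if len(order) != len(self.tasks):
            raise ValueError("task graph contains a cycle")
        return order


@dataclass
class PE:
    name: str
    pe_type: str                         # e.g. "DSP", "ASIC", "FPGA"
    tile: tuple[int, int]                # (x, y) position in a 2D mesh
    voltage_levels: list[float] = field(default_factory=lambda: [1.0])
```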

Figure 2 – Illustration of Mapping and Scheduling Problem

Figure 2 shows an example of the mapping and scheduling problem for a 2D-mesh NoC architecture with 3×3 resources. In the example, there are two concurrent applications represented by two TGs. Mapping is defined as the assignment of tasks (communications) to processing elements (communication routes). Mapping for NoC can also include the assignment of IP cores to NoC tiles, which together with routing path allocation (communication mapping) is referred to in the literature as Network Assignment. Network Assignment is usually performed after Task Mapping and aims to reduce the on-chip communication distance.
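On a 2D mesh, one common way to obtain a minimal routing path for communication mapping is dimension-ordered (XY) routing: a packet first travels along the X dimension and then along Y, which is deterministic and deadlock-free on a mesh. The sketch below, which assumes tiles addressed by (x, y) coordinates and is not taken from any surveyed approach, lists the directed links an edge occupies; the hop count equals the Manhattan distance that several approaches use to estimate communication delay.

```python
def xy_route(src, dst):
    """Return the directed links ((x1, y1), (x2, y2)) traversed by XY routing."""
    x, y = src
    links = []
    while x != dst[0]:                        # route along X first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:                        # then along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


if __name__ == "__main__":
    # On a 3x3 mesh, an edge from tile (0, 0) to tile (2, 1) occupies 3 links.
    for link in xy_route((0, 0), (2, 1)):
        print(link)
```

Each returned link is a resource that communication scheduling must keep free of conflicting transfers.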

Scheduling is the time ordering of tasks and communications on their assigned resources, which ensures mutual exclusion between the executions of tasks on the same resource. Figure 3 shows a feasible schedule for the example of Figure 2. Gantt diagrams are often used to represent the output of mapping and scheduling graphically.



Figure 3 – Scheduling of TGs on NoC platform

The output is the mapped and scheduled model, which specifies the start times and durations of task executions on the assigned resources. Timing constraints can easily be verified by comparing the schedule length with the given deadlines. A Gantt diagram can also show communication mapping and scheduling in a similar manner; the only difference is that a NoC edge appears on every communication link belonging to its routing path.
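Checking a mapped and scheduled model amounts to two tests: tasks sharing a resource must not overlap in time (mutual exclusion), and every task must finish by its deadline. A minimal sketch, assuming the schedule is given as hypothetical (task, resource, start, duration) tuples, i.e. the rows of a Gantt diagram; communication entries on links could be checked the same way by treating each link as a resource.

```python
def check_schedule(entries, deadlines):
    """entries: list of (task, resource, start, duration); deadlines: task -> time.
    Returns a list of violations; an empty list means the schedule is feasible."""
    violations = []
    by_resource = {}
    for task, res, start, dur in entries:
        by_resource.setdefault(res, []).append((start, start + dur, task))
        if task in deadlines and start + dur > deadlines[task]:
            violations.append(f"{task} misses its deadline")
    for res, slots in by_resource.items():
        # After sorting by start time, any overlap shows up between some adjacent pair.
        slots.sort()
        for (s1, e1, t1), (s2, e2, t2) in zip(slots, slots[1:]):
            if s2 < e1:
                violations.append(f"{t1} and {t2} overlap on {res}")
    return violations
```

For instance, `check_schedule([("t1", "PE1", 0, 4), ("t2", "PE1", 4, 3), ("t3", "PE2", 2, 5)], {"t2": 8, "t3": 6})` reports only that `t3` misses its deadline, since `t2` starts exactly when `t1` ends.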

Scheduling for on-chip systems could be followed by Voltage Scheduling and Power Management techniques to minimize energy consumption.

Voltage Scheduling implies Voltage Selection (VS) and aims to minimize the dynamic energy consumption of tasks or communications mapped onto voltage-scalable resources. Voltage Selection assigns lower supply voltages to some tasks mapped onto voltage-scalable resources, slowing them down to exploit the slack available after time scheduling. Single or multiple voltage levels, from a continuous or discrete domain, can be assigned to individual tasks or to voltage-scalable resources. Because the voltage switching overhead is not always negligible, Voltage Scheduling is applied: it orders the assigned voltage levels so as to reduce the number of voltage-level switches.
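To illustrate how slack is turned into energy savings, the sketch below uses a common first-order model, assumed here rather than taken from any surveyed approach: dynamic energy per cycle scales with V², and clock frequency scales roughly linearly with supply voltage. For one task, it picks the lowest discrete voltage level whose slowed-down execution still fits within the time available to the task (its execution time plus its slack). Voltage-switching overhead is ignored, and the function name and parameters are hypothetical.

```python
def select_voltage(cycles, f_nominal, v_nominal, levels, available_time):
    """Return (voltage, exec_time, relative_energy) for the lowest level in
    `levels` that completes `cycles` within `available_time`, or None.
    First-order model (assumption): f ~ V, dynamic energy per cycle ~ V**2."""
    for v in sorted(levels):                  # try the lowest voltage first
        f = f_nominal * v / v_nominal         # slower clock at lower voltage
        exec_time = cycles / f
        if exec_time <= available_time:
            relative_energy = (v / v_nominal) ** 2   # compared with running at v_nominal
            return v, exec_time, relative_energy
    return None
```

With 10^6 cycles at 100 MHz and 1.2 V nominally (10 ms), levels of 0.8, 1.0 and 1.2 V, and 14 ms available, 0.8 V would need 15 ms, so 1.0 V is chosen: it takes 12 ms and uses about 69% of the nominal dynamic energy.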

Adaptive Body Biasing (ABB) is a special case of Voltage Scheduling that minimizes static (leakage) energy consumption by slowing down some tasks and executing them at higher threshold voltage levels.

Power Management (PM) techniques aim to reduce static power consumption by shutting down totally or partially unutilized or idle resources. Power Management could be applied to both processing elements and communication links.



2.2 Issues and Objective

2.2.1 Constraints and Objective Functions

A feasible mapping and scheduling is one that meets the design constraints, such as timing constraints, data precedence constraints and memory size constraints. A good-quality mapping and scheduling is one that, besides meeting the design constraints, also optimizes design goals such as minimizing energy consumption, maximizing timing performance or balancing memory utilization. Mapping and scheduling must sometimes make trade-offs between conflicting design goals and constraints, such as reducing energy consumption versus verifying hard real-time constraints, minimizing communication volume versus maximizing computation energy savings, or maximizing memory utilization versus verifying hard real-time constraints.

2.2.2 Separated or Integrated

The mapping and scheduling problems can be handled as separate, independent problems or as one integrated problem. An optimal solution of the integrated problem can provide the best results, but since both mapping and scheduling are computationally hard, solving them together is more difficult than solving them one at a time. In this report, the integrated problem is referred to as simultaneous mapping and scheduling.

2.2.3 Static vs. Dynamic Mapping and Scheduling

Mapping and scheduling can be performed on-line or off-line. Off-line, or static, mapping and scheduling is performed before application run-time. A table recording the start times of task executions on the assigned resources, as well as the voltage levels and switching times for tasks assigned to voltage-scalable resources, is provided before execution. Since static mapping and scheduling are executed only once, at compile time, they do not affect application performance at run-time.

On-line, or dynamic, mapping and scheduling assigns and orders tasks during the execution of the application. This can lead to better solutions, but the computational overhead of the mapping and scheduling algorithms can increase the delay and energy consumption of the application at run-time. Moreover, dynamic mapping and scheduling is difficult to test.

As a consequence, static mapping and scheduling is preferred for embedded systems and is especially recommended for NoC, where communication routing overhead could impose significant delays if performed at run-time. This survey therefore covers mostly static mapping and scheduling algorithms for NoC and bus-based MPES. Among the research groups surveyed, only one has proposed a quasi-static solution, for scheduling on bus-based MPES: it builds a tree of schedules off-line and chooses the best scheduling path on-line. Good-quality on-line mapping and scheduling algorithms for NoC therefore remain to be found.



When applied to Voltage Selection, the terms static and dynamic do not mean off-line and on-line, respectively. Static Voltage Selection (SVS) refers to allocating single or multiple voltage levels to a given voltage-scalable resource, regardless of how that resource is utilized during application run-time. SVS is usually performed off-line, or when time scheduling of tasks on the voltage-scalable resource is omitted. Switching overhead can be ignored in SVS.

Dynamic Voltage Selection (DVS) refers to allocating single or multiple voltage levels to tasks running on voltage-scalable resources. The multiple voltage levels are usually discrete values, and the time and energy overhead of switching between them is not always negligible. DVS can be performed off-line or on-line and is always applied after time scheduling of tasks on the voltage-scalable resources.

The same remark about the terms static and dynamic applies to Power Management: the decision to switch off unutilized resources can be taken off-line or on-line, but it is whether PM is applied to architectural resources or to application tasks that determines whether it is called static or dynamic.

Scheduling algorithms can be pre-emptive or non-pre-emptive and can use dynamic or static priorities.

Task priorities give the selection order of unscheduled tasks and can be assigned randomly or correlated with one or several design goals. Task priorities can be determined statically or dynamically, depending on the information available for computing their values. Dynamic priorities correspond to algorithms with alternating goals or with priority values that depend on the partial schedule; in both cases, task priorities must be updated after each task is scheduled. Static priorities correspond to random assignment or to values independent of the partial schedule; they are assigned once at the beginning of scheduling and need no later updating. In pre-emptive scheduling, tasks with lower priorities can be suspended by tasks with higher priorities, whereas in non-pre-emptive scheduling tasks execute without interruption until completion. Pre-emptive scheduling is usually associated with on-line scheduling, and non-pre-emptive scheduling with off-line scheduling. This is not a rule, however: in principle, any type of scheduling can be obtained by combining on-line or off-line operation with static or dynamic priorities and pre-emptive or non-pre-emptive tasks. This report covers off-line scheduling of non-pre-emptive tasks with dynamic or static priorities.
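A typical example of a static priority is the bottom level of a task: the length of the longest path from the task to an exit task, counting execution times and communication delays. It depends only on the task graph, so it can be computed once before scheduling starts; a dynamic priority, by contrast, would be recomputed from the partial schedule after every scheduling decision. The sketch below assumes a single (e.g. average) execution time per task and a fixed delay per edge; both inputs are hypothetical.

```python
from functools import lru_cache


def bottom_levels(exec_time, edges):
    """exec_time: task -> time; edges: dict (src, dst) -> communication delay.
    Returns task -> bottom level (longest path to an exit task, inclusive)."""
    succ = {t: [] for t in exec_time}
    for (src, dst), delay in edges.items():
        succ[src].append((dst, delay))

    @lru_cache(maxsize=None)
    def blevel(t):
        # Exit tasks have no successors, so their tail contributes 0.
        tail = max((delay + blevel(s) for s, delay in succ[t]), default=0)
        return exec_time[t] + tail

    return {t: blevel(t) for t in exec_time}
```

For `exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}` and edges a→b (1), a→c (4), b→d (1), c→d (1), the bottom levels are a = 10, b = 6, c = 4, d = 2, so a list scheduler using this static priority considers a first.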

2.2.4 Mapping and Scheduling Algorithms

Mapping and scheduling are NP-hard problems, so problems of practical size can only be solved with constructive or transformative heuristics. Smaller problems can be handled by methods that deterministically provide the optimal solution. Such deterministic methods search the solution space exhaustively, either explicitly or implicitly by pruning regions that provably cannot contain a better solution, and return the theoretical optimum. Branch-and-Bound (BB) is an example of a deterministic method.
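To make Branch-and-Bound concrete, here is a minimal sketch for one simplified instance of the mapping problem, assuming independent tasks (no precedence or communication) on heterogeneous PEs, with makespan as the objective. This is an illustration, not a method from the surveyed approaches. Tasks are assigned one at a time; a partial assignment is pruned when its current maximum PE load already reaches the best complete makespan found so far, which is a valid bound because loads can only grow.

```python
def bb_map(exec_time, pes):
    """exec_time: list of dicts, exec_time[i][pe] = time of task i on PE pe.
    Returns (best_makespan, assignment), where assignment[i] is the PE of task i."""
    n = len(exec_time)
    best = [float("inf"), None]               # [makespan, assignment]

    def branch(i, loads, assignment):
        if max(loads.values()) >= best[0]:
            return                            # bound: this branch cannot improve
        if i == n:                            # complete and strictly better
            best[0], best[1] = max(loads.values()), assignment[:]
            return
        for pe in pes:                        # branch: try task i on every PE
            loads[pe] += exec_time[i][pe]
            assignment.append(pe)
            branch(i + 1, loads, assignment)
            assignment.pop()
            loads[pe] -= exec_time[i][pe]

    branch(0, {pe: 0 for pe in pes}, [])
    return best[0], best[1]
```

For three tasks with times `[{"P1": 4, "P2": 6}, {"P1": 3, "P2": 2}, {"P1": 5, "P2": 5}]`, the optimum makespan is 7; the search still explores an exponential number of nodes in the worst case, which is why BB is limited to small problems.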

Heuristics are search and optimization techniques, often pseudo-random, that explore and exploit the solution space based on learned experience. They are used when exhaustive search and deterministic methods are too hard or impossible to apply, and when search time grows exponentially with the size of the problem. Heuristics provide a solution of reasonable quality in relatively short time. They can be problem-specific or general-purpose, and constructive or based on iterative improvement. Constructive heuristics extend partial valid solutions until a complete solution is reached. Transformative heuristics alter existing solutions in an attempt to explore a larger part of the solution space. List Scheduling (LS) and Genetic Algorithms (GA) are examples of general-purpose heuristics from the constructive and transformative classes, respectively.
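A constructive List Scheduling heuristic can be sketched as follows. The sketch assumes per-PE execution times, a fixed delay for any edge between different PEs (zero on the same PE) and a static priority given as a number per task; all names are hypothetical. At each step, the ready task with the highest priority is placed, without pre-emption, on the PE where it finishes earliest. This is a simplified earliest-finish-time variant, not the exact algorithm of any surveyed approach.

```python
def list_schedule(exec_time, preds, priority, comm_delay):
    """exec_time: task -> {pe: time}; preds: task -> list of predecessors;
    priority: task -> number (higher first); comm_delay: inter-PE edge delay.
    Returns task -> (pe, start, finish)."""
    pes = list(next(iter(exec_time.values())).keys())
    pe_free = {pe: 0 for pe in pes}           # time at which each PE becomes idle
    schedule = {}
    while len(schedule) < len(exec_time):
        # A task is ready once all of its predecessors are scheduled.
        ready = [t for t in exec_time if t not in schedule
                 and all(p in schedule for p in preds.get(t, []))]
        task = max(ready, key=lambda t: priority[t])
        best = None
        for pe in pes:
            # Data from a predecessor on another PE arrives comm_delay later.
            data_ready = max((schedule[p][2] + (0 if schedule[p][0] == pe else comm_delay)
                              for p in preds.get(task, [])), default=0)
            start = max(pe_free[pe], data_ready)
            finish = start + exec_time[task][pe]
            if best is None or finish < best[2]:
                best = (pe, start, finish)
        schedule[task] = best
        pe_free[best[0]] = best[2]
    return schedule
```

With tasks a (P1: 2, P2: 3), b (3, 3) and c (4, 2), where b and c depend on a, priorities a > b > c and an inter-PE delay of 1, the result is a on P1 in [0, 2], b on P1 in [2, 5] and c on P2 in [3, 5]. Replacing the fixed `priority` with a value recomputed inside the loop would turn this into a dynamic-priority scheduler.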

Mathematical Programming provides methods for minimizing or maximizing objective functions subject to constraints on the variables of the problem. Integer Linear Programming (ILP), Non-Linear Programming (NLP) and Mixed Integer Linear Programming (MILP) are examples of Mathematical Programming. Constraint Linear Programming (CLP) is a more general class that includes ILP and MILP. ILP and MILP are problems in which the objective and constraint functions are linear and, respectively, all or only some of the variables are integers. NLP problems are those in which the objective or constraint functions are non-linear. ILP and MILP are NP-hard in general, and so are general NLP problems; polynomial-time algorithms exist only for special cases, such as Linear Programming without integer variables [13].
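As an illustration of how task mapping can be cast as an ILP, consider the following simplified formulation, an assumption for exposition rather than one taken from the surveyed papers. The binary variable $x_{ij}$ equals 1 if task $i$ is mapped to PE $j$; $E_{ij}$ is the energy of task $i$ on PE $j$, $t_{ij}$ its execution time, and $T$ a bound on the load of each PE (for example, the TG period).

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i}\sum_{j} E_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j} x_{ij} = 1 && \forall i \quad \text{(each task mapped exactly once)}\\
& \sum_{i} t_{ij}\,x_{ij} \le T && \forall j \quad \text{(load bound on each PE)}\\
& x_{ij} \in \{0,1\} && \forall i, j
\end{aligned}
```

This is a form of the generalized assignment problem. Relaxing $x_{ij} \in \{0,1\}$ to $0 \le x_{ij} \le 1$ yields a Linear Program solvable in polynomial time; the integrality constraints are what make the problem hard. Scheduling variables (start times) and precedence constraints can be added in the same style, typically turning the model into a MILP.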

3 Surveyed Approaches

This section describes representative mapping and scheduling methodologies for NoC and bus-based MPES.

The main issues addressed for each methodology are its design steps, design goals and implementation algorithms. A diagram accompanies each description, to clarify the design flow of the methodology and to make it easier to identify which steps of the NoC design flow it covers.

Two subsections group the approaches by the two classes of architectures, and at the end of each subsection a set of tables summarizes information about the presented methodologies.

3.1 Approaches for NoC

NoC is a novel research area, and therefore only a few mapping and scheduling methodologies have been developed for it. Moreover, these do not completely cover NoC communication issues: few approaches perform network assignment, and even fewer perform communication scheduling. The minimal NoC path is usually allocated during communication mapping.

This section presents four approaches for NoC, grouped by whether they perform network assignment. Two of them, G. Varatkar et al. [2] and T. Lei et al. [4], treat the communication issue poorly, leaving out communication mapping and scheduling. The approach of J. Hu et al. [6] is the only one of the four that performs communication scheduling and provides exact measures of communication overhead. D. Shin et al. [7] focus on the NoC communication issue, performing network assignment and link speed allocation, but do not schedule communication.



The next two subsections describe these approaches for NoC without and with network assignment, respectively. At the end of this section, two tables capture the design steps and design goals of the methodologies, and some conclusions are drawn.

3.1.1 Approaches without Network Assignment

G. Varatkar et al. [2] have developed a two-step methodology for minimizing overall energy consumption in NoC (Figure 4). The communication-aware step performs simultaneous mapping and scheduling of tasks, aiming to reduce communication energy by minimizing inter-processor communication volume. It also facilitates task energy minimization in the voltage selection step by maximizing slack. The two optimization goals are alternated according to a communication criterion, whose role is to keep the local inter-processor communication volume of incoming edges under a global limit that depends on the application communication volume and a factor K (0 ≤ K ≤ 10). The factor K is tuned in the outer loop of the design methodology until an optimal trade-off between the design goals is reached. The second step performs DVS for tasks, exploiting the non-uniform distribution of slack and taking into account the time and energy overhead of voltage switching [3]. The methodology does not carry out communication mapping and scheduling, so communication distance is only roughly approximated. Hard deadlines are not guaranteed, because communication delay is ignored when verifying them.

The communication-aware step is implemented by a critical-path LS algorithm with dynamic priorities and adaptive goals. The most urgent task is assigned either to the available processor closest in time or to the processor that hosts the parent with which it communicates most, provided hard deadlines are still met; which of the two is chosen depends on whether the communication criterion is satisfied. The communication criterion requires that the average inter-processor communication volume of incoming edges does not exceed the average communication volume of all application edges multiplied by the factor K (NoC incoming communication volume < application communication volume × K).
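The processor choice driven by the communication criterion can be sketched as below. This is an illustrative reading, not the authors' code: it assumes the criterion is evaluated per task on the incoming edges that would cross processors, and that the closest-in-time processor is kept when the criterion holds. The function names are hypothetical, and the hard-deadline check is omitted.

```python
from statistics import mean

def inter_processor_avg(parent_volumes, placement, candidate):
    """Average volume of the task's incoming edges that would cross processors
    if the task were placed on `candidate` (0.0 when no edge crosses)."""
    crossing = [vol for parent, vol in parent_volumes.items()
                if placement[parent] != candidate]
    return mean(crossing) if crossing else 0.0

def choose_processor(parent_volumes, placement, closest_in_time, all_edge_volumes, k):
    """Keep the closest-in-time processor if the communication criterion holds
    there; otherwise co-locate with the heaviest-communicating parent.
    Hard-deadline checks of the original step are omitted in this sketch."""
    if not parent_volumes:
        return closest_in_time
    limit = mean(all_edge_volumes) * k
    if inter_processor_avg(parent_volumes, placement, closest_in_time) < limit:
        return closest_in_time
    heaviest_parent = max(parent_volumes, key=parent_volumes.get)
    return placement[heaviest_parent]

# Hypothetical example: parent "a" (volume 40) sits on PE 0, parent "b" (volume 10) on PE 1.
parents = {"a": 40, "b": 10}
placement = {"a": 0, "b": 1}
edge_volumes = [40, 10, 20, 30]          # all application edges, mean = 25
print(choose_processor(parents, placement, 1, edge_volumes, k=1.0))  # 0
print(choose_processor(parents, placement, 1, edge_volumes, k=2.0))  # 1
```

A small K forces heavy edges to stay local, while a large K lets the scheduler favour timing, which is the trade-off the outer loop explores.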

DVS is implemented with ILP.

Figure 4 – G. Varatkar et al. Methodology. Blocks: Application TG & NoC Architecture; Task Mapping & Scheduling (LS); Task Voltage Selection (ILP); Mapped and Scheduled Model; factor K tuned in an outer loop.

T. Lei et al. [4], [5] use a two-step GA for task mapping and then apply ASAP/ALAP techniques for task scheduling (Figure 5). The mapping goal is to maximize timing performance, while scheduling is used to check hard deadlines. Communication is not mapped and scheduled; its delay is estimated using the average distance in the NoC or the Manhattan distance between processors. The methodology does not guarantee hard deadlines because, although worst-case execution times (WCET) are used for tasks, only average values are used for communication delays on the critical path. In this approach, a set of TGs with different periods and individual deadlines is mapped to the NoC architecture.

The methodology has three steps. In the partitioning phase, tasks are assigned to classes of IP cores. In the embedding phase, each task is assigned to the highest-performance IP core within its class. ASAP/ALAP scheduling is then applied to check the feasibility of the assignment. The partitioning and embedding phases can be viewed as a pseudo network assignment, because the choice of a particular IP core within a group depends mostly on communication distance.
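The ASAP/ALAP feasibility check can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of [4]: edge delays are given as fixed estimates (for instance Manhattan distance times a hypothetical per-hop delay), each TG has a single deadline, and processor sharing is ignored, as in plain ASAP/ALAP analysis.

```python
from graphlib import TopologicalSorter

def asap_alap(exec_time, comm_delay, deadline):
    """ASAP and ALAP start times for a task graph with estimated edge delays.

    exec_time:  {task: execution time}
    comm_delay: {(src, dst): estimated communication delay}
    Processor sharing is ignored, as in plain ASAP/ALAP analysis.
    """
    preds = {t: [] for t in exec_time}
    succs = {t: [] for t in exec_time}
    for (u, v), d in comm_delay.items():
        preds[v].append((u, d))
        succs[u].append((v, d))
    graph = {t: [u for u, _ in preds[t]] for t in exec_time}
    order = list(TopologicalSorter(graph).static_order())

    asap = {}
    for t in order:  # forward pass: earliest start after all predecessors
        asap[t] = max((asap[u] + exec_time[u] + d for u, d in preds[t]), default=0)
    alap = {}
    for t in reversed(order):  # backward pass from the deadline
        latest_finish = min((alap[v] - d for v, d in succs[t]), default=deadline)
        alap[t] = latest_finish - exec_time[t]
    feasible = all(asap[t] <= alap[t] for t in exec_time)
    return asap, alap, feasible

# Hypothetical chain a -> b -> c; delays could come from Manhattan distance x hop delay.
et = {"a": 2, "b": 3, "c": 1}
delays = {("a", "b"): 1, ("b", "c"): 2}
print(asap_alap(et, delays, deadline=10))
print(asap_alap(et, delays, deadline=8)[2])   # False
```

A mapping is rejected when some task must start later than its ALAP time allows, i.e. when the estimated critical path exceeds the deadline.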

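Figure 5 – T. Lei et al. Methodology. Blocks: Parallel Applications as Set of TG & NoC Architecture; Partitioning phase – IP group allocation (GA); Embedding phase – Task Mapping (GA); Task Scheduling (ASAP/ALAP).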

J. Hu et al. [6] have developed an energy-aware methodology for NoC (Figure 6) that maps and schedules communication onto the minimum available path and uses the overall energy-consumption gap to choose among possible task mappings and schedules when building the initial solution. The methodology does not perform voltage selection or network assignment, but it exploits slack non-uniformly by distributing it in proportion to the timing and energy profiles of tasks. It provides an accurate measure of communication delay and energy consumption and aims to meet hard deadlines. The methodology has two phases, Energy Aware Scheduling (EAS) and Search and Repair (SaR). During EAS, the initial solution is built using level-based LS for simultaneous task mapping and scheduling, and a deterministic method for simultaneous communication mapping and scheduling. As a preliminary step of EAS, slack is distributed and budgeted deadlines (BD) are computed for all tasks. Greedy iterations are then used in SaR to eliminate BD violations from the initial solution.


The level-based LS with dynamic priorities always takes a task from the ready list and assigns it to its best processor: if BD violations exist, the most time-critical task goes to its fastest processor; otherwise, the task with the largest energy-consumption gap goes to its lowest-energy processor. The deterministic method performs communication scheduling for all combinations of ready tasks and available processors in order to find the earliest finish time (EFT) of every ready task. The BD of each task is computed once, at the beginning of EAS, from the slacks and the average execution times of its predecessors on the critical path, ignoring communication delays. Time-critical tasks are those whose EFT exceeds their BD.

Two improvement moves are iterated in SaR to eliminate BD violations at the cost of higher energy consumption: Local Task Swapping (LTS), which exchanges the execution order of a critical task and a non-critical task on the same processor, and Global Task Migration (GTM), which moves critical tasks to another processor with the aim of reducing total delay while keeping energy consumption low.

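Figure 6 – J. Hu et al. Methodology. Blocks: Task Mapping & Scheduling (LS); Communication Mapping & Scheduling (deterministic); Iterative Improvements (greedy).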

3.1.2 Approaches with Network Assignment

D. Shin et al. [7] have proposed a methodology (Figure 7) with network assignment and link speed allocation for reducing communication energy in NoCs with voltage-scalable links, while meeting hard real-time constraints. Task mapping and network assignment target the minimization of inter-processor communication volume and distance, while static voltage selection and static power management aim to reduce the communication energy of links. Task mapping also checks area constraints. Hard deadlines are guaranteed because, even though communication scheduling is not performed, the worst-case communication delay of links is used. Links can be shared by several edges, with no constraint on link communication volume. Because communication scheduling is left out, link voltage selection and power management are performed statically.

The methodology uses GA for task mapping and network assignment and LS for task scheduling and link speed assignment.

The GA for task mapping uses a mapping chromosome, two-point crossover and random mutation. The GA for tile allocation uses permutations of tiles as chromosomes, cycle crossover to generate only legal solutions, and random exchanges as mutation. The GA for routing path allocation uses a binary chromosome that encodes moves along the X and Y directions, a coordinate crossover with the crossover point at an intersection of paths, and a random mutation operator that swaps the positions of moves in opposite directions. Routing path allocation affects the communication volume of each link and thus link delay.

LS uses task mobility (|ASAPstart – ALAPend|) as a static priority; mobility is based on a pessimistic estimate of communication delay.

The generalized PV-DVS of M. T. Schmitz et al. [8] is adapted for static link speed assignment. Slack is distributed incrementally to the communication links, based on their pessimistic energy consumption. Static voltage selection (SVS) for links selects a single voltage in the discrete domain and ignores switching overhead.

Figure 7 – D. Shin et al. Methodology. Blocks: Tile Assignment (GA); Task Mapping (GA); Routing Path Allocation (GA); Task Scheduling (LS); Link Speed Allocation (LS).

3.1.3 Summary

Table 1 and Table 2 summarize the design steps and design goals of the NoC approaches described above.

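Table 1 – Methodologies for NoC

| Ref. | Task Mapping | Network Assignment (Tile) | Comm. Mapping | Task Scheduling | Comm. Scheduling | Task Voltage Selection | Comm. Voltage Selection | Task Power Management | Comm. Power Management |
|---|---|---|---|---|---|---|---|---|---|
| G. Varatkar et al. [2]+[3] | LS | - | - | LS | - | ILP | - | - | - |
| T. Lei et al. [4] | 2-GA | - | - | ASAP/ALAP | - | - | - | - | - |
| J. Hu et al. [6] | LS | - | Determ. | LS | Determ. | - | - | - | - |
| D. Shin et al. [7] | GA | GA | GA | LS | - | - | LS | - | LS |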

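Table 2 – Design Goals for NoC

| Ref. | Dynamic Energy: Task | Dynamic Energy: Comm. | Static Energy: Task | Static Energy: Comm. | Timing: Hard | Timing: Soft | Memory: Task | Memory: Comm. | Area |
|---|---|---|---|---|---|---|---|---|---|
| G. Varatkar et al. [2]+[3] | DVS | Y | - | - | Y | - | - | - | - |
| T. Lei et al. [4] | - | - | - | - | Y | - | - | - | - |
| J. Hu et al. [6] | Y | Y | - | - | Y | - | - | - | - |
| D. Shin et al. [7] | - | SVS | - | ABB, SPM | Y | - | - | - | Y |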

As can be seen, the NoC methodologies do not fully cover the design space exploration steps, especially those concerning communication. Only one approach performs network assignment [7], and only one performs communication scheduling [6]. However, two approaches [2], [7] minimize inter-processor communication volume during task mapping.

Energy consumption is the preferred optimization goal. Three methodologies [2], [6], [7] minimize different components of the energy consumption of tasks and communication: G. Varatkar et al. [2] use dynamic voltage selection for tasks, D. Shin et al. [7] use static link speed assignment and static power management for links, and J. Hu et al. [6] use the energy-consumption gap and a non-uniform distribution of slack during task mapping and scheduling.



All approaches aim to meet hard deadlines, but only two of them guarantee them [6], [7]. The methodology of T. Lei et al. [4] maximizes timing performance for a set of applications. None of the approaches deals with soft deadlines.

Area constraints are checked only by D. Shin et al. [7].

Constructive (LS) and transformative (GA, greedy) algorithms are sometimes combined to perform the mapping and scheduling steps efficiently. T. Lei et al. [4] and D. Shin et al. [7] combine GA with LS to perform task mapping and scheduling separately, while J. Hu et al. [6] use greedy iterations to improve an initial solution of simultaneous task mapping and scheduling constructed with LS.

Simultaneous mapping and scheduling is sometimes performed to obtain a higher-quality solution. Two approaches [2], [6] use LS for simultaneous task mapping and scheduling, and one of them [6] also uses a deterministic method for simultaneous communication mapping and scheduling.

Various LS algorithms (critical path, mobility, level-based, ASAP, ALAP) are used for scheduling [6], [7], voltage selection [7] and simultaneous mapping and scheduling [2], [6], while GAs with standard and specialized genetic operators are used for the task mapping and network assignment steps.

3.2 Approaches for Multiprocessor Embedded Systems

Multiprocessor embedded systems with bus-based communication raise fewer mapping and scheduling issues than NoC. This is because buses are treated like processors, so communications and tasks are handled in the same way. A variety of approaches have been developed for bus-based MPES, targeting different optimization goals and focusing on particular design steps. Energy minimization with DVS is targeted by M. T. Schmitz et al. [8]–[11], A. Andrei et al. [13] and F. Gruian et al. [14]; scheduling for soft deadlines is addressed by L. A. Cortes et al. [15]–[18]; and mapping and scheduling for balanced memory utilization is the focus of R. Szymanek et al. [22].

The next three subsections present these approaches for bus-based MPES in more detail, grouped by optimization goal.

3.2.1 Energy-aware Approaches

M. T. Schmitz et al. [8]–[12] developed a Low Power Co-Synthesis (LOPOCOS) methodology for heterogeneous distributed embedded systems that contain state-of-the-art dynamic voltage scalable processors and use gate-level power reduction techniques (such as gated clocks) to switch off unused PEs. LOPOCOS aims to reduce, off-line, the overall energy consumption of multi-rate multimedia and telecommunication systems while guaranteeing hard real-time and chip-area constraints. The methodology has two nested loops: the outer loop performs core allocation and task mapping, and the inner loop performs scheduling, communication mapping and voltage selection (Figure 8).



Task mapping is treated separately from communication mapping and is implemented with a GA. The Energy Efficient Genetic Mapping Algorithm (EE-GMA) is energy-aware in the sense that the mapped and scheduled solutions with the lowest energy consumption survive and mate, but its main contribution is guaranteeing chip-area constraints, for which a penalty factor is used.

Communication mapping is performed together with scheduling so that their common goals can be optimized better. Scheduling is implemented with a combination of LS and GA, in which the dynamic priorities of the tasks in the ready list are obtained via the GA. The Energy Efficient Genetic List Scheduling Algorithm (EE-GLSA) targets guaranteeing hard deadlines and is energy-aware in the sense that the scheduled solutions with the lowest energy consumption survive and mate. A penalty factor is used for timing constraints.

Both GAs use integer chromosomes (priority/mapping strings), a multiobjective fitness, single- or two-point crossover, random mutation with a dynamic (exponentially decreasing) probability, ranking selection for parents and offspring, a random initial population, and population diversity as the termination criterion.
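The genetic operators listed above can be sketched on integer strings as below. This is a hedged illustration, not LOPOCOS code: the decay parameters `p0` and `decay`, the linear ranking weights and the toy cost are hypothetical choices, and the multiobjective fitness and diversity-based termination are left out.

```python
import math
import random

def two_point_crossover(p1, p2, rng):
    """Exchange the genes between two random cut points (strings of length >= 3)."""
    i, j = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutation_probability(generation, p0=0.2, decay=0.05):
    """Mutation probability decreasing exponentially with the generation number."""
    return p0 * math.exp(-decay * generation)

def mutate(chromosome, n_values, generation, rng):
    """Randomly reset genes, e.g. the PE index of a task in a mapping string."""
    p = mutation_probability(generation)
    return [rng.randrange(n_values) if rng.random() < p else g for g in chromosome]

def rank_select(population, cost, rng):
    """Linear ranking selection: the cheapest individual gets the largest weight."""
    ranked = sorted(population, key=cost)
    weights = [len(ranked) - r for r in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(1)
parent_a, parent_b = [0, 0, 0, 0, 0, 0], [2, 2, 2, 2, 2, 2]  # task -> PE index
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
child_a = mutate(child_a, n_values=3, generation=40, rng=rng)
print(child_a, child_b)
print(rank_select([parent_a, parent_b, child_b], sum, rng))  # toy cost: sum of genes
```

Ranking selection depends only on the order of costs, not their magnitudes, which keeps selection pressure stable when energy values differ by orders of magnitude.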

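Figure 8 – M. T. Schmitz et al. Methodology. Blocks: Application TG & Architecture Platform; Component Allocation; Task Mapping (GA); Scheduling & Communication Mapping (GA+LS); Task Voltage Selection (LS).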

Voltage selection is applied to tasks mapped and scheduled on state-of-the-art DVS processors, using the generalized Power Variation Dynamic Voltage Scaling (PV-DVS) algorithm. PV-DVS exploits the available slack non-uniformly, taking the power profiles of the tasks into account. It assigns each task a single voltage in the continuous domain or two voltages in the discrete domain, ignoring switching overhead. PV-DVS is an LS with dynamic priorities that incrementally distributes the available slack to the tasks with the highest energy consumption.
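The incremental idea behind this slack distribution can be illustrated on a simplified case: tasks executed one after another under one deadline, with a continuous energy model E_i(t) = e_nom_i · (t_nom_i / t)² that follows from assuming voltage proportional to frequency. This is a sketch under those assumptions, not the generalized PV-DVS, which handles full task graphs with per-task slack.

```python
def distribute_slack(t_nom, e_nom, deadline, dt=0.1):
    """Greedy incremental slack distribution for tasks executed one after another.

    t_nom, e_nom: execution times and energies at the highest voltage.
    Assumed energy model: E_i(t) = e_nom[i] * (t_nom[i] / t) ** 2 (continuous voltages).
    Returns the extended execution times and the resulting total energy.
    """
    def energy(i, t):
        return e_nom[i] * (t_nom[i] / t) ** 2

    times = list(t_nom)
    slack = deadline - sum(times)
    while slack >= dt - 1e-12:
        # Give the next slack increment to the task whose energy drops the most.
        gains = [energy(i, t) - energy(i, t + dt) for i, t in enumerate(times)]
        best = max(range(len(times)), key=gains.__getitem__)
        times[best] += dt
        slack -= dt
    return times, sum(energy(i, t) for i, t in enumerate(times))

times, total = distribute_slack([1.0, 1.0], [4.0, 1.0], deadline=3.0)
print([round(t, 1) for t in times], round(total, 2))
```

The more power-hungry first task receives most of the slack, which is the non-uniform behaviour that distinguishes this approach from stretching all tasks equally.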


A. Andrei et al. [13] combined ABB with DVS to reduce the leakage power of tasks along with their dynamic power in multiprocessor systems. The approach considers the time and energy overheads of switching between voltage levels and allows both voltages to be switched simultaneously via ABB and DVS. The switching overhead must be outweighed by the dynamic and leakage power savings for the computation energy to decrease. Mathematical programming is applied to continuous and discrete versions of the problem, with and without switching overheads. NLP returns a single voltage pair (Vdd, Vbs) in the continuous domain for each task, while MILP returns the time each task spends at each voltage pair (Vdd, Vbs) of the discrete domain.

F. Gruian et al. [14] developed simultaneous scheduling and task voltage selection for bus-based multiprocessor embedded systems with a given mapping (Figure 9). The methodology aims to minimize computation energy while meeting hard real-time constraints. DVS with a uniform distribution of slack is applied in the continuous and discrete domains, for single and two voltage levels, ignoring switching overhead.
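A worked illustration of why uniform slack distribution saves energy, under assumptions that are not stated in the survey: dynamic energy E = C·N·V² for N cycles, supply voltage roughly proportional to clock frequency, a schedule of length L at the highest frequency f_max, and a deadline D ≥ L. Stretching every task by the same factor s gives:

```latex
s = \frac{D}{L}, \qquad
f_i = \frac{f_{\max}}{s}, \qquad
V_i \approx \frac{V_{\max}}{s}, \qquad
E_i = C_i N_i V_i^2 \approx \frac{C_i N_i V_{\max}^2}{s^2} = \frac{E_i^{\max}}{s^2}
```

For example, L = 8 and D = 10 give s = 1.25 and E_i ≈ 0.64 E_i^max, i.e. about 36% less dynamic energy per task under this simplified model.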

The methodology is implemented with a level-based LS with dynamic priorities. The LS gives priority to the ready task that yields the largest energy reduction if delayed by one time unit, or that is currently reaching its latest start time (LST). The priority of ready tasks also includes the urgency of the task, weighted by a coefficient α, which is tuned in the outer loop to find the best trade-off between delay and energy minimization.

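Figure 9 – F. Gruian et al. Methodology. Blocks: Mapped Model; Scheduling & Task Voltage Selection (LS); Mapped and Scheduled Model; coefficient α tuned in an outer loop.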

3.2.2 Approaches for Soft Real Time Constraints

L. A. Cortes et al. [15]–[18] proposed quasi-static scheduling for multiprocessor systems with soft and hard real-time constraints (Figure 10). Off-line, the scheduler builds a tree of schedules and switching points based on the expected and maximum execution times of tasks and on the utility functions associated with soft tasks. On-line, it follows the maximum-utility path in the scheduling tree according to the actual execution times of tasks and the switching points that are activated. The static schedules in the tree guarantee hard deadlines and maximize the total utility of soft tasks by scheduling the most critical soft tasks first. The approach also assumes that tasks are already mapped onto processors. Communication and computation tasks are treated the same way, because buses are treated like processors.

Both exact (BB) and heuristic (LS) methods were developed to derive schedules for soft deadlines from a given prefix and to delimit the time intervals of switching points in the tree of schedules.

The total utility is the sum of the individual utility functions of soft tasks and expresses the degradation of system quality when soft deadlines are missed. The utility functions associated with soft tasks are non-increasing functions of the completion times of the soft tasks and capture their importance and criticality.
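The total utility can be sketched as follows. The shape of `linear_decay_utility` (full value up to a soft deadline, then linear decay to zero) is a hypothetical example of a non-increasing utility function; the approach in [15]–[18] admits any non-increasing function of completion time.

```python
def total_utility(completion_times, utility_fns):
    """Total utility = sum of each soft task's utility at its completion time."""
    return sum(u(completion_times[task]) for task, u in utility_fns.items())

def linear_decay_utility(max_utility, soft_deadline, zero_at):
    """Hypothetical non-increasing utility: full value up to the soft deadline,
    then a linear decrease to zero at time `zero_at`."""
    def u(t):
        if t <= soft_deadline:
            return max_utility
        if t >= zero_at:
            return 0.0
        return max_utility * (zero_at - t) / (zero_at - soft_deadline)
    return u

utilities = {"s1": linear_decay_utility(10, 5, 15), "s2": linear_decay_utility(4, 8, 12)}
print(total_utility({"s1": 10, "s2": 7}, utilities))   # 9.0  (s2 completes first)
print(total_utility({"s1": 5, "s2": 12}, utilities))   # 10.0 (s1 completes first)
```

In this toy case, finishing the more important task s1 first gives the higher total utility, which mirrors the idea of scheduling the most critical soft tasks first.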

Task execution times are variable and uniformly distributed over an interval. Instead of the classical WCET, this approach uses the expected (mean) execution time, the maximum execution time and the actual execution time.

Figure 10 – L. A. Cortes et al. Methodology. Blocks: Mapped Model; Quasi Static Scheduling (BB, LS); Mapped & Scheduled Model.

3.2.3 Memory-aware Approaches

R. Szymanek et al. [19]–[22] developed simultaneous mapping and scheduling that balances memory utilization among processors while meeting hard deadlines (Figure 11). The methodology performs data and code memory placement and ensures that communicated data is present in the producer and consumer data memories during communication.

The methodology is implemented with an LS with an adaptive objective and dynamic priorities, which helps a CLP tool explore the solution space efficiently. The most urgent or most data-consuming task (depending on the current objective) is always assigned to the cheapest processor in terms of data and code memory utilization. The objectives are switched when data constraints are not met.


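Figure 11 – R. Szymanek et al. Methodology. Blocks: Application Model & Architecture Model; Mapping & Scheduling (LS+CLP); Mapped & Scheduled Model.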

3.2.4 Summary

Table 3 and Table 4 summarize the methodologies and goals for mapping and scheduling in bus-based MPES.

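Table 3 – Methodologies for bus-based MPES

| Ref. | Task Mapping | Comm. Mapping | Task Scheduling | Comm. Scheduling | Task Voltage Selection | Comm. Voltage Selection | Task Power Management | Comm. Power Management |
|---|---|---|---|---|---|---|---|---|
| M. T. Schmitz et al. [8]-[11] | GA | GA | GA+LS | GA+LS | LS | - | LS | - |
| A. Andrei et al. [13] | - | - | - | - | NLP, MILP | - | - | - |
| F. Gruian et al. [14] | - | - | LS | LS | LS | - | - | - |
| R. Szymanek et al. [22] | LS+CLP | LS+CLP | LS+CLP | LS+CLP | - | - | - | - |
| L. A. Cortes et al. [15]-[18] | - | - | BB, LS | BB, LS | - | - | - | - |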

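Table 4 – Goals for bus-based MPES

| Ref. | Dynamic Energy: Task | Dynamic Energy: Comm. | Static Energy: Task | Static Energy: Comm. | Timing: Hard | Timing: Soft | Memory: Code | Memory: Data | Area |
|---|---|---|---|---|---|---|---|---|---|
| M. T. Schmitz et al. [8]-[11] | DVS | - | - | - | Y | - | Y | - | Y |
| A. Andrei et al. [13] | DVS | - | ABB | - | Y | - | - | - | - |
| F. Gruian et al. [14] | DVS | - | - | - | Y | - | - | - | - |
| R. Szymanek et al. [22] | - | - | - | - | Y | - | Y | Y | - |
| L. A. Cortes et al. [15]-[18] | - | - | - | - | Y | Y | - | - | - |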


As can be seen, the approaches are specialized for particular design space exploration steps and goals. M. T. Schmitz et al. [8]–[11] have developed a comprehensive methodology for energy minimization, while A. Andrei et al. [13] and F. Gruian et al. [14] have focused on DVS. L. A. Cortes et al. [15]–[18] have developed scheduling for soft deadlines, while R. Szymanek et al. [22] have proposed memory-aware mapping and scheduling. All approaches guarantee hard deadlines.

Constructive (LS) and transformative (GA) heuristics, as well as mathematical programming (NLP, MILP, CLP) and deterministic methods, have been used to perform the design steps simultaneously or separately. Notable examples are the simultaneous scheduling and DVS of F. Gruian et al. [14], performed with LS, and the simultaneous scheduling and communication mapping of M. T. Schmitz et al. [8]–[11], performed with GA. M. T. Schmitz et al. [8]–[11] also use an interesting combination of LS and GA, in which list scheduling is driven by priorities evolved by the GA.

4 Comparisons of Mapping and Scheduling Techniques

The mapping and scheduling techniques presented above are compared here using three criteria: timing constraints, energy savings and area constraints.

4.1 Verifying Real Time Constraints

Timing constraints refer to meeting hard and soft deadlines. Scheduling always aims to satisfy timing constraints, but task mapping can also target minimization of overall delay. The following three subsections discuss how the previously introduced NoC and bus-based MPES approaches handle hard real-time constraints, and how the bus-based MPES approaches handle soft real-time constraints. Two issues are used to compare the methodologies: how accurately the total delay is estimated, and how aware mapping and scheduling are of timing constraints. At the end of each subsection, a set of tables summarizes the comparison.

4.1.1 Hard Real Time Constraints in NoC

The NoC approaches target only hard real-time constraints. Communication in NoC raises more problems than bus-based communication in MPES, so special attention is paid to communication delay and its NoC-dependent components.



G. Varatkar et al. [2] do not guarantee hard deadlines. Communication is not mapped and scheduled onto the NoC, and its delay is ignored when verifying hard deadlines. Moreover, average task execution times (ET) are used instead of WCET.

During simultaneous mapping and scheduling, the most critical task is assigned either to the available processor closest in time (slightly earlier or later than the task's ready time) or to the processor hosting its most heavily communicating parent, provided hard deadlines are still met. The most critical task is the one with the minimum sum of earliest start time (EST) and latest finish time (LFT), i.e. min(EST + LFT). EST is the moment when a task is ready and a processor is available for it. LFT is the minimum of the deadline and the latest start times of the task's successors.
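The EST + LFT criticality measure can be computed as sketched below, assuming a single application deadline and ignoring communication delays in the LFT computation; the function names and the example graph are hypothetical.

```python
def latest_finish_times(exec_time, succs, deadline, topo_order):
    """LFT(t) = min(deadline, LST of each successor), with LST(s) = LFT(s) - ET(s).
    Communication delays are ignored in this sketch."""
    lft = {}
    for t in reversed(topo_order):
        successor_lsts = [lft[s] - exec_time[s] for s in succs.get(t, [])]
        lft[t] = min([deadline] + successor_lsts)
    return lft

def most_critical(ready, est, lft):
    """Ready task with the smallest EST + LFT."""
    return min(ready, key=lambda t: est[t] + lft[t])

et = {"a": 2, "b": 3, "c": 4, "d": 1}
succs = {"a": ["c"], "b": ["d"]}
lft = latest_finish_times(et, succs, deadline=10, topo_order=["a", "b", "c", "d"])
print(lft)                                               # {'d': 10, 'c': 10, 'b': 9, 'a': 6}
print(most_critical(["a", "b"], {"a": 0, "b": 0}, lft))  # a
```

Task "a" wins because its long successor "c" leaves it the tightest finish window, even though both tasks can start at time 0.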

T. Lei et al. [4] do not guarantee hard deadlines. WCET for tasks and average communication delays are used when verifying hard deadlines in each TG of the TG set. Communication mapping and scheduling are not performed, so the minimal NoC path assumed for each edge is only a rough estimate of the minimal available NoC path that would ensure exclusive use of links by edges and prevent deadlock. Edge delay depends on communication volume, minimum communication distance, link delay, and network interface delay for packetization/depacketization of messages.

During IP group allocation and task mapping, the overall delay on the critical path is minimized for each TG in the TG set. The objective function is the maximum, over the TG set, of the ratio between the critical-path delay and the deadline of each TG, normalized to the maximum deadline of the TG set. Hard deadlines are met on the critical paths when the objective function is less than or equal to the maximum deadline of the TG set.
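One reading of this objective that is consistent with the description, with notation introduced here: CP_i is the critical-path delay of graph i, D_i its deadline and D_max the largest deadline in the set. Scaling the worst ratio by D_max makes the test against D_max equivalent to checking every graph:

```latex
F = D_{\max} \cdot \max_{i} \frac{CP_i}{D_i},
\qquad
F \le D_{\max} \;\Longleftrightarrow\; CP_i \le D_i \quad \forall i
```

For instance, two graphs with CP_1 = 8, D_1 = 10 and CP_2 = 18, D_2 = 20 give F = 20 · max(0.8, 0.9) = 18 ≤ 20, so both critical paths meet their deadlines.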

D. Shin et al. [7] guarantee hard deadlines. WCET of tasks and pessimistic values of communication delays are used when verifying hard deadlines. Communication is mapped onto the minimal NoC path, whose length is minimized during tile allocation. Since several edges are allowed to share the same link, routing is deadlock-free and communication scheduling is no longer required. This also makes communication delay strongly dependent on link communication volume and link speed (frequency). Inter-processor communication volume is minimized during task mapping, while link communication delay is known only after communication mapping. The maximum link speed is used when verifying hard deadlines during task scheduling, but link speed allocation can then obtain a lower speed for links with high communication volume, without exceeding hard deadlines. Edge delay is the sum of link delays along the routing path. Link delay is pessimistically evaluated as the total communication volume of the edges sharing the link relative to the link bandwidth and link speed (frequency). Switching time overhead is ignored.
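The pessimistic link and edge delays described above can be sketched as follows. The units are assumptions (volume in bits per period, bandwidth in bits per cycle, frequency normalized), and the route representation and function names are hypothetical.

```python
from collections import defaultdict

def link_delays(routes, volumes, bandwidth, frequency):
    """Pessimistic delay of every link: total volume of all edges routed over
    the link, divided by (bandwidth per cycle * link frequency)."""
    load = defaultdict(float)
    for edge, path in routes.items():
        for link in path:
            load[link] += volumes[edge]
    return {link: vol / (bandwidth * frequency[link]) for link, vol in load.items()}

def edge_delay(path, delays):
    """Edge delay = sum of the link delays along its routing path."""
    return sum(delays[link] for link in path)

# Hypothetical 1x3 mesh: tiles 0-1-2; e1 is routed 0->1->2, e2 is routed 1->2.
routes = {"e1": [(0, 1), (1, 2)], "e2": [(1, 2)]}
volumes = {"e1": 64, "e2": 32}           # bits per period
freq = {(0, 1): 1.0, (1, 2): 1.0}        # normalized link frequency
delays = link_delays(routes, volumes, bandwidth=32, frequency=freq)
print(delays)                                                              # {(0, 1): 2.0, (1, 2): 3.0}
print(edge_delay(routes["e1"], delays), edge_delay(routes["e2"], delays))  # 5.0 3.0
```

Lowering the frequency of link (1, 2) to 0.5 doubles its delay to 6.0; this is the trade-off that link speed allocation explores against the hard deadlines.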

During task scheduling, the task with the least mobility has the highest priority. Task mobility is given by |ASAPstart − ALAPend| and is based on a pessimistic evaluation of communication delays.



J. Hu et al. [6] guarantee hard deadlines. WCET of tasks and communications are used when verifying hard deadlines. Communication is mapped and scheduled on the minimum available NoC path and is deadlock-free; exclusive use of links is enforced. The communication delay model is not stated explicitly, but it should depend on wormhole routing.

During simultaneous mapping and scheduling, the most time-critical task (EFT ≥ BD) in the ready list is assigned to its fastest processor. The most time-critical task is the one that exceeds its BD by the largest difference EFT − BD. When the ready list contains no critical tasks (EFT < BD), the task with the highest energy-consumption gap is assigned to one of its minimum-energy processors, while its BD is still checked.
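The selection rules can be sketched as two small functions. This is a hedged illustration, not the code of [6]: the EFT of each task and its finish time and energy on each processor are taken as given inputs (in [6] they come from the deterministic communication scheduling step), and the energy-consumption gap is treated as a precomputed per-task number.

```python
def pick_task(ready, eft, bd, energy_gap):
    """Return (task, is_critical): the task exceeding its budgeted deadline by
    the most if any task has EFT >= BD, else the largest energy-consumption gap."""
    critical = [t for t in ready if eft[t] >= bd[t]]
    if critical:
        return max(critical, key=lambda t: eft[t] - bd[t]), True
    return max(ready, key=lambda t: energy_gap[t]), False

def pick_processor(task, is_critical, finish_on, energy_on, bd):
    """finish_on / energy_on: {PE: finish time / energy of `task` on that PE}.
    Critical tasks go to their fastest PE; others to the lowest-energy PE that
    still finishes before the task's BD."""
    if is_critical:
        return min(finish_on, key=finish_on.get)
    meets_bd = [pe for pe in finish_on if finish_on[pe] < bd[task]]
    return min(meets_bd or list(finish_on), key=energy_on.get)

bd = {"a": 10, "b": 10}
print(pick_task(["a", "b"], {"a": 12, "b": 9}, bd, {"a": 1, "b": 3}))          # ('a', True)
print(pick_task(["a", "b"], {"a": 5, "b": 6}, bd, {"a": 1, "b": 3}))           # ('b', False)
print(pick_processor("b", False, {"p0": 9, "p1": 11}, {"p0": 5, "p1": 2}, bd))  # p0
```

In the last call, the cheaper PE p1 is rejected because it would finish after the budgeted deadline, so the BD check overrides pure energy minimization.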

During the greedy iterations, BD violations are searched for and repaired either by rescheduling critical tasks on the same processor (LTS) or by reassigning them to other processors (GTM). The incoming communication of a critical task needs to be remapped only for GTM, because LTS exchanges a critical task with a non-critical task on which it has no dependency.

EFT includes the delay of incoming communications and the WCET of tasks, while BD includes the average ET of tasks and their slack, ignoring communication. Slack is distributed non-uniformly among tasks based on their energy and delay profiles (statistical variances).

4.1.1.1 Comparison

Table 5 and Table 6 capture the main comparison criteria for the NoC approaches that aim to meet hard real-time constraints. Table 5 shows how communication delay is minimized in each approach and how hard deadlines are guaranteed. Table 6 shows how timing performance is maximized during task mapping and scheduling.

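Table 5 – Communication delay in NoC

| Ref. | Min. Comm. Volume | Min. Comm. Distance: Step | Min. Comm. Distance: Path | Link Speed | Task ET | Comm. Delay | Guarantee Hard Deadlines |
|---|---|---|---|---|---|---|---|
| G. Varatkar et al. [2] | TM | - | - | - | WCET | - | N |
| T. Lei et al. [4] | - | - | Manhattan | const. | WCET | AVG | N |
| D. Shin et al. [7] | TM | NA | Manhattan | SVS | WCET | WCRT | Y |
| J. Hu et al. [6] | - | CM + CS | Min. available | const. | WCET | WCRT | Y |

(TM: task mapping; NA: network assignment; CM: communication mapping; CS: communication scheduling.)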

Table 6 – Computation Delay in NoC

| Ref. | Task Mapping: Goal | Task Mapping: PE Assignment | Task Scheduling: Priority |
|---|---|---|---|
| G. Varatkar et al. [2] | - | Closest available | Time critical, min (EST + LFT) |
| T. Lei et al. [4] | Min. delay on critical path | - | ASAP, ALAP |
| D. Shin et al. [7] | - | - | Time critical, min abs(ASAPstart − ALAPend) |
| J. Hu et al. [6] | Repair BD violations (EFT < BD) | Fastest; greedy reassignments | Time critical, max (EFT − BD), EFT ≥ BD |

4.1.1.1.1 Communication delay

Communication delay in NoC has a large impact on overall delay. Edge delay depends on the message size, the communication bandwidth and the transmission delay per communication unit. Message size divided by communication bandwidth gives the number of communication units to be transmitted; these two quantities are usually the communication volume of the edge and the link bandwidth, respectively. Transmission delay per communication unit is influenced by the routing algorithm and depends on the communication distance and on the delay of NoC communication components such as links, switches and network interfaces. Link delay depends on link speed, link communication volume and link bandwidth. When exclusive use of links is enforced, link communication volume equals link bandwidth, so link delay is determined by link speed alone.

Apart from communication volume and communication distance, all other components of edge delay are usually given as constants, one per type of NoC communication component. Therefore, minimizing communication volume and distance has a large impact on communication delay. Inter-processor communication volume can be minimized during task mapping, while link communication volume can be limited during communication mapping. Communication distance is minimized during tile allocation, while the minimum routing path for each edge is allocated during communication mapping.

Deadlock and congestion in NoC can be prevented by performing communication scheduling.

4.1.1.1.2 Guaranteeing hard deadlines

Hard deadlines are guaranteed if and only if worst-case ET of tasks and worst-case communication delays are used. Task ET are usually given as WCET. Pessimistic estimates of communication delay require knowing the worst conditions under which communication takes place, for example the total communication volume of edges sharing a link [7] or the minimal available path when exclusive use of links is enforced [6].



4.1.1.1.3 Maximizing Timing Performances

The priority used during task scheduling or processor selection, and the goal used during task mapping, can increase the chances of meeting hard deadlines by minimizing overall delay or just computation delay.

If task priority includes information about incoming communication and available slack, a better schedule is obtained. If the most time-critical task is assigned to its fastest processor [6], or if the mapping goal is to minimize overall delay [4], scheduling is more likely to meet hard deadlines.

Based on these criteria, the methodologies that yield the most accurate communication delay, minimize communication in NoC and guarantee hard deadlines are those of D. Shin et al. [7] and J. Hu et al. [6].

4.1.2 Hard Real Time Constraints in Multiprocessor Embedded Systems

Multiprocessor embedded systems have fewer problems with communication delay, because buses can be treated like processors, so no distinction is made between tasks and communications. As a consequence, all approaches guarantee hard deadlines.

F. Gruian et al. [14] guarantee hard deadlines. Both tasks and communications are scheduled, and their WCET are considered when verifying hard deadlines. The ET of tasks assigned to voltage-scalable processors depends on supply voltage and clock frequency. Switching time overhead is ignored when verifying hard deadlines after DVS.

During simultaneous scheduling and DVS, the most urgent task gets the highest priority if α is large and the task's slack becomes too small, or if the task is reaching its LFT.

M. T. Schmitz et al. [8]–[11] guarantee hard deadlines. Both tasks and communications are mapped and scheduled, and their WCET are considered when verifying hard deadlines. The ET of tasks assigned to voltage-scalable processors depends on supply voltage and clock frequency. Switching time overhead is ignored when verifying hard deadlines during two-voltage DVS.

A time penalty based on deadline violations (DV) is used in the objective function of scheduling and DVS. The DV of a task is the difference between its finish time and its deadline when this difference is positive, and zero otherwise. DVs are searched on all paths. The time penalty is the ratio between the sum of squared DVs and the squared TG hyperperiod; squaring gives a higher penalty to larger DVs. Feasible solutions have no DVs and hence no time penalty. Among infeasible solutions, those with the smallest time penalties survive and mate.
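The time penalty can be written as a short function. As a simplifying assumption, deadlines are attached to individual tasks here instead of being searched on all paths, and the function name is hypothetical.

```python
def time_penalty(finish, deadline, hyperperiod):
    """Sum of squared deadline violations divided by the squared hyperperiod.
    A violation is finish - deadline when positive and zero otherwise."""
    violations = [max(0.0, finish[t] - deadline[t]) for t in deadline]
    return sum(v * v for v in violations) / hyperperiod ** 2

deadlines = {"a": 10, "b": 15, "c": 8}
print(time_penalty({"a": 12, "b": 18, "c": 5}, deadlines, hyperperiod=20))  # 0.0325
print(time_penalty({"a": 9, "b": 15, "c": 8}, deadlines, hyperperiod=20))   # 0.0
```

Dividing by the squared hyperperiod makes penalties comparable across task graphs of different time scales, while squaring makes one violation of 3 time units cost more than two violations of 1.5.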



R. Szymanek et al. [22] guarantee hard deadlines. Both tasks and communications are mapped and scheduled, and their WCET are considered when verifying hard deadlines.

During simultaneous mapping and scheduling, the most urgent task is assigned to the cheapest processor in terms of time and memory loading. The most urgent task is the one with the minimum LST. Processor time loading is the ratio between the task's ET plus the (average) delay of its incoming communication and the remaining time until the application deadline. The solution balances processor load and thus increases the chances of meeting hard deadlines.
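The time-loading part of this processor choice can be sketched as below. Memory loading is omitted, and the remaining time is assumed to be measured from the moment each processor becomes free; both the names and this interpretation are illustrative assumptions.

```python
def time_loading(exec_time, avg_incoming_delay, pe_free_at, app_deadline):
    """(ET + average incoming communication delay) / remaining time to the deadline."""
    remaining = app_deadline - pe_free_at
    if remaining <= 0:
        return float("inf")
    return (exec_time + avg_incoming_delay) / remaining

def least_time_loaded(task_et, avg_incoming_delay, pe_free_at, app_deadline):
    """task_et: {PE: ET of the task on that PE}; pe_free_at: {PE: time the PE becomes free}."""
    return min(pe_free_at, key=lambda pe: time_loading(
        task_et[pe], avg_incoming_delay, pe_free_at[pe], app_deadline))

print(least_time_loaded({"p0": 4, "p1": 2}, 1, {"p0": 0, "p1": 6}, app_deadline=10))  # p0
```

Here p1 executes the task faster, but it is already busy until time 6, so its loading (3/4) exceeds that of p0 (5/10) and the task goes to p0.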

A. Andrei et al. [13] guarantee hard deadlines. They perform DVS for a given schedule and consider switching overhead when verifying hard deadlines.

4.1.2.1 Comparison

All approaches guarantee hard deadlines after scheduling. Switching time overhead is ignored when verifying hard deadlines after DVS only in the approaches that allow a single voltage or two voltage levels per task (F. Gruian et al. [14], M. T. Schmitz et al. [8]-[11]).

Table 7 captures the main steps for maximizing timing performance during mapping and scheduling. PE assignment and the optimization goal during task mapping, as well as task priority during task scheduling, affect whether hard real-time constraints are met. Assigning time-critical tasks to the least loaded processors in constructive methodologies (R. Szymanek et al. [22]) or minimizing a deadline violation penalty in transformative methodologies (M. T. Schmitz et al. [8]-[11]) can increase the chances of meeting hard deadlines.

Table 7 – Hard Real Time Constraints in bus-based MPES

| Ref. | Mapping: Goal | Mapping: PE assignment | Scheduling: Priority |
|---|---|---|---|
| | - | - | Time critical (α/slack) |
| M. T. Schmitz et al. [8]-[11] | Deadline violation penalty | - | - |
| R. Szymanek et al. [22] | - | Least time loaded | Time critical (min LST) |



4.1.3 Soft Real Time Constraints in Multiprocessor Embedded Systems

There is a single approach for meeting soft deadlines in bus-based MPES. It also guarantees hard deadlines.

L. A. Cortes et al. [15]–[18] guarantee hard deadlines and maximize the total utility of soft tasks. Execution times of tasks on each processor are variable and uniformly distributed over a time interval. Therefore three execution-time metrics are used in different contexts: maximum execution times for verifying hard deadlines, actual execution times for computing the utility function of prefix tasks, and expected (mean) execution times for estimating the utility function of unscheduled tasks. The same holds for communication tasks.

Both an exact method and a heuristic were developed to perform quasi-static scheduling of the assigned tasks and communications for mono-processor and multi-processor embedded systems. Both start from a basis schedule and generate a tree of schedules, with switching points for each prefix provided by the basis schedule. The exact method is exhaustive, whereas the heuristic avoids generating all permutations of unscheduled tasks for a given prefix, ignores duplicate schedules and limits the size of the tree by giving priority to certain branches. The basis schedule is built with expected execution times, guaranteeing hard deadlines and maximizing the total utility of soft tasks. The schedules in the branches use actual execution times for prefix tasks and expected execution times for unscheduled tasks. The switching points for prefix tasks are determined using auxiliary total utility functions computed with maximum execution times.
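The run-time side of this quasi-static scheme reduces to a table lookup. The sketch below assumes the tree has already been built off-line; the task names, switching points and task orders are made up for illustration, and the off-line computation of switching points from the auxiliary utility functions is not shown.

```python
import bisect

# Hypothetical off-line result for one prefix: after task "T1" completes, the
# remaining tasks are run in the order attached to the interval that contains
# T1's actual completion time. The switching points must be sorted ascending.
SCHEDULE_TREE = {
    ("T1",): ([10.0, 14.0],
              [["T3", "T2", "T4"],    # T1 finished before 10
               ["T2", "T3", "T4"],    # 10 <= finish < 14
               ["T2", "T4", "T3"]]),  # finish >= 14
}


def next_schedule(prefix, finish_time):
    """On-line part of quasi-static scheduling: a lookup, no re-optimization."""
    points, schedules = SCHEDULE_TREE[tuple(prefix)]
    # bisect_right counts the switching points <= finish_time, which is the
    # index of the interval that contains finish_time.
    return schedules[bisect.bisect_right(points, finish_time)]


print(next_schedule(["T1"], 12.5))  # ['T2', 'T3', 'T4']
```

Keeping the on-line work to a binary search is what makes the approach attractive compared with re-scheduling at run time.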

There are two algorithms, an exact one and a heuristic, for building the basis schedule off-line [16]. Both give priority to tasks leading to the next hard or soft task. They differ in how priorities are generated for soft tasks: the exact method generates all permutations of soft tasks, while the heuristic generates the priorities of soft tasks dynamically, based on their utility functions evaluated with expected execution times. Both return the schedule with the maximum total utility found that guarantees hard deadlines. Maximum execution times are used when verifying deadlines, while expected execution times give the total utility.

Table 8 summarizes the information presented for L. A. Cortes et al. [15]–[18].

Table 8 – Soft Real Time Constraints in bus-based MPES

| Ref. | Scheduling: Priority | Goal: Soft deadlines | Goal: Hard deadlines |
|---|---|---|---|
| L. A. Cortes et al. [15]–[18] | Utility (expected ET) | Max. total utility (expected ET) | Max. ET |



4.1.4 Conclusions

Verifying real time constraints is the main concern of time scheduling and VS. Mapping can only help scheduling to meet deadlines by maximizing timing performance. Real time constraints can be hard or soft. All approaches aim to meet hard deadlines, while only a single approach targets soft deadlines. NoC approaches have difficulty guaranteeing hard deadlines because communication delay is harder to estimate. Reducing communication volume and distance in NoC can improve timing performance.

4.2 Minimizing Energy Consumption

Energy minimization refers to the reduction of dynamic and static energy consumption of tasks and communication during task mapping and voltage selection. The majority of approaches introduced in section 3 target dynamic energy minimization for tasks. Some of them perform both energy-aware task mapping and DVS for tasks. A single approach performs SVS for links, while NoC approaches aim to minimize communication energy by minimizing NoC communication volume and distance. Static energy is reduced by two approaches, one applying ABB and the other SPM.

The next two subsections give the details of the energy-aware task mapping and voltage selection approaches, and a third subsection provides experimental results for both groups. A table summarizes the information at the end of each subsection.

4.2.1 Energy-Aware Task Mapping

Five approaches that perform task mapping for energy minimization are surveyed: three for NoC and two for bus-based MPES. For each approach we address how task priority, PE assignment and mapping goals are affected by energy minimization, and how accurate the energy model is.

4.2.1.1 Energy-Aware Task Mapping in NoC

G. Varatkar et al. [2] aim to minimize communication energy by minimizing inter-processor communication volume. The approach also enforces that the local inter-processor communication volume of incoming edges does not exceed a globally imposed limit (communication criterion). During simultaneous task mapping and scheduling, the most urgent task is assigned to the processor where its heaviest communicating parent is assigned, provided the hard deadlines are still verified. However, this happens only when the communication criterion is not verified.

Communication energy depends on the inter-processor communication volume and on the average NoC communication distance. Communication mapping and scheduling are not performed, so the communication distance and communication volume per link are not known.

The communication criterion requires the local average communication volume per edge (over the incoming edges in the TG) to be less than the global average communication volume (over all edges in the TG) multiplied by a parameter K (0≤K≤10), tuned by the user. When K is 1, a sort of task clustering happens.
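The communication criterion can be checked as in the sketch below. It assumes the criterion is evaluated for the task being mapped and that, when it is violated, the task becomes a candidate for the processor of its heaviest communicating parent; the deadline check that [2] also requires is not shown, and the function names are hypothetical.

```python
def comm_criterion_violated(task, edges, k):
    """True if the average volume of the task's incoming edges exceeds k times the
    average volume over all edges of the task graph.

    edges maps (source_task, destination_task) -> communication volume.
    """
    global_avg = sum(edges.values()) / len(edges)
    incoming = [vol for (_, dst), vol in edges.items() if dst == task]
    if not incoming:
        return False  # a source task has no incoming communication to limit
    return sum(incoming) / len(incoming) > k * global_avg


def heaviest_parent(task, edges):
    """Parent that sends the largest volume to task (candidate for co-location)."""
    parents = {src: vol for (src, dst), vol in edges.items() if dst == task}
    return max(parents, key=parents.get)


edges = {("a", "c"): 80, ("b", "c"): 20, ("a", "b"): 20}
print(comm_criterion_violated("c", edges, k=1.0), heaviest_parent("c", edges))  # True a
```

With K = 1 the local average (50) exceeds the global average (40), so task c would be placed with its parent a; raising K to 2 relaxes the criterion and lets c be mapped elsewhere.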

D. Shin et al. [7] aims to minimize communication energy by minimizing inter-processor communication volume and NoC communication distance during task mapping and network assignment.

Communication energy is the sum of dynamic and static energy over all links. The dynamic energy of a link depends on the total communication volume of the edges sharing that link, on the link speed (frequency) and on the switching capacitance of the link. The communication volume of each link is known after communication mapping.
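Once communication is mapped to routes, the per-link volume that drives the dynamic link energy is a simple aggregation. The sketch assumes a 2D mesh with deterministic XY routing (an assumption for illustration; [7] performs its own route assignment) and represents each directed link as a pair of tile coordinates. The link energy formula itself is not reproduced.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Directed links (tile, next_tile) traversed by XY routing: X first, then Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def link_loads(edges, placement):
    """Total communication volume carried by each directed link.

    edges: (src_task, dst_task) -> volume; placement: task -> (x, y) tile.
    """
    load = defaultdict(float)
    for (s, d), vol in edges.items():
        for link in xy_route(placement[s], placement[d]):
            load[link] += vol
    return dict(load)


placement = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}
edges = {("a", "c"): 10, ("b", "c"): 5}
print(link_loads(edges, placement))
# {((0, 0), (1, 0)): 10.0, ((1, 0), (1, 1)): 15.0}
```

Links that do not appear in the result carry no traffic at all; these are the ones a static power management scheme could switch off.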

J. Hu et al. [6] aim to minimize overall energy consumption. During simultaneous task mapping and scheduling, the non-critical tasks with the biggest energy-consumption-gap are assigned to the processors with the least energy consumption that are fast enough to verify hard deadlines. The energy-consumption-gap of a task is the difference between the two smallest values of the overall energy consumption among all partial schedules including that task. Non-critical tasks are those with EFT < BD. Communication is mapped and scheduled on a minimal available NoC path and is deadlock free.
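The selection rule can be sketched as follows, under stated assumptions: for each ready task, the overall energy and the EFT of placing it on each processor are precomputed; a placement counts as non-critical when EFT < BD; a task with a single non-critical option gets an infinite gap so that it is scheduled before that option disappears (our assumption, not stated in [6]); critical tasks are left to a separate rule that is not shown. Names are hypothetical.

```python
def energy_gap(energies):
    """Difference between the two smallest energies; infinite if there is only one option."""
    values = sorted(energies.values())
    return values[1] - values[0] if len(values) > 1 else float("inf")


def select_non_critical(options, budget_deadline):
    """options: task -> {proc: (overall_energy, eft)}; budget_deadline: task -> BD.

    Returns (task, proc) for the task with the largest energy-consumption-gap,
    placed on its least-energy processor among those with EFT < BD, or None.
    """
    best = None
    for task, per_proc in options.items():
        ok = {p: e for p, (e, eft) in per_proc.items() if eft < budget_deadline[task]}
        if not ok:
            continue  # critical task: handled by a separate rule, not shown here
        gap = energy_gap(ok)
        if best is None or gap > best[0]:
            best = (gap, task, min(ok, key=ok.get))
    return None if best is None else (best[1], best[2])


options = {"t1": {"p0": (10.0, 5.0), "p1": (12.0, 4.0)},
           "t2": {"p0": (8.0, 6.0), "p1": (20.0, 3.0)}}
print(select_non_critical(options, {"t1": 10.0, "t2": 10.0}))  # ('t2', 'p0')
```

The intuition is that t2 would lose 12 energy units if it missed its best processor, while t1 would lose only 2, so t2 is placed first.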

Overall energy consumption is the sum of computation energy of all tasks and communication energy of all edges. Communication energy of an edge depends on the communication volume of that edge (in bits) and on the energy per bit transfer. Communication energy per bit has a wormhole routing-like formula depending on the routing path length (number of hops), link energy and switch energy.
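A minimal sketch of this energy model follows. It assumes the commonly used bit-energy formulation for tile-based meshes, in which a bit crossing n links on a minimal path passes through n + 1 switches, so that E_bit = (n + 1)·E_Sbit + n·E_Lbit; the exact expression in [6] may differ in detail. Communication between tasks on the same tile is assumed to cost nothing, and all numbers are illustrative.

```python
def bit_energy(src, dst, e_switch, e_link):
    """Energy to move one bit between tiles src=(x, y) and dst on a minimal mesh path."""
    if src == dst:
        return 0.0  # assumption: communication within a tile does not use the NoC
    links = abs(src[0] - dst[0]) + abs(src[1] - dst[1])  # Manhattan distance = hops
    return (links + 1) * e_switch + links * e_link


def overall_energy(comp_energy, edges, placement, e_switch, e_link):
    """Computation energy of all tasks plus volume-weighted communication energy of all edges."""
    comm = sum(bits * bit_energy(placement[s], placement[d], e_switch, e_link)
               for (s, d), bits in edges.items())
    return sum(comp_energy.values()) + comm


placement = {"a": (0, 0), "b": (1, 0)}
print(overall_energy({"a": 10.0, "b": 20.0}, {("a", "b"): 100}, placement,
                     e_switch=1.0, e_link=0.5))  # 30 + 100 * 2.5 = 280.0
```

Because the bit energy grows linearly with the hop count, placing heavily communicating tasks on neighbouring tiles directly lowers the communication term.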

4.2.1.2 Energy-Aware Task Mapping in MPES

M. T. Schmitz et al. [8]–[11] minimize overall energy consumption during task mapping in the sense that, in their GA, the solutions with minimum energy consumption survive and mate.

Overall energy is the sum of computation and communication energy over all tasks and edges. The energy consumption of non-voltage-scalable tasks and communications depends on the power consumption and execution time at nominal supply voltage. The energy consumption of voltage-scalable tasks additionally depends on the supply voltage and the nominal supply voltage.

R. Szymanek et al. [22] target a balanced solution for the time and memory loading of processors. Simultaneous mapping and scheduling assigns the most data consuming tasks to the least loaded processors such that the overall produced data is kept under a certain limit. Data consuming tasks are the tasks which produce less data than they consume. The least loaded processors are those with the minimum ratio between required and available data and code memory. Keeping the produced data under a certain limit amounts to keeping the overall energy of the system constant. Thus, only a certain inter-processor communication volume and a certain number of future running tasks are allowed in the system at any moment.



4.2.1.3 Comparison

Table 9 captures the main criteria of comparison for the presented approaches.

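Table 9 – Energy Aware Task Mapping

| Ref. | Task mapping: Priority | Task mapping: PE assignment | Task mapping: Goal |
|---|---|---|---|
| G. Varatkar et al. [2] | - | Most comm. vol. | Limit comm. vol. |
| D. Shin et al. [7] | - | - | Min. comm. vol. |
| J. Hu et al. [6] | Energy diff. | Least energy | - |
| M. T. Schmitz et al. [8]–[11] | - | - | Min. energy |
| R. Szymanek et al. [22] | Data mem. diff. | Least loaded | Limit mem. |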

Task priority at scheduling, as well as PE assignment and the optimization goal at task mapping, can increase savings of dynamic energy consumption. Thus, assigning the task with the highest energy-consumption-gap to its minimum energy consumption processor during constructive task mapping and scheduling (J. Hu et al. [6]), or including energy as an optimization goal during transformative task mapping (M. T. Schmitz et al. [8]-[11]), can raise energy savings. In NoC approaches, communication energy consumption can be minimized by reducing inter-processor communication volume at task mapping and by minimizing NoC communication distance at network assignment.

The communication energy model depends on the communication volume and on the transmission energy per communication unit. J. Hu et al. [6] include routing information in the transmission energy per bit, while D. Shin et al. [7] include information on link speed assignment.

According to these considerations, J. Hu et al. [6] is the best energy-aware approach for NoC, since it minimizes overall energy consumption and schedules communication on a minimal available path while avoiding deadlock. The second best is that of D. Shin et al. [7], which minimizes communication energy and communication distance. G. Varatkar et al. [2] minimize only inter-processor communication volume, without performing communication mapping.

The experimental results in Table 11 yield the same ranking: 44% overall energy savings are obtained with the task mapping of J. Hu et al. [6] and 35% communication energy savings with the network assignment of D. Shin et al. [7].



4.2.2 Voltage Selection

Voltage selection assigns lower voltage levels and operating frequencies to high energy consuming tasks in order to reduce dynamic energy consumption, exploiting only the available slack. DVS or SVS, in the discrete or continuous domain, for single or multiple voltage levels, applied to tasks or communications, are different versions of the VS problem. ABB is a variant of VS which minimizes leakage power by scaling down the clock frequency and increasing the threshold voltage.

The main issues related to VS are slack distribution, switching overhead and the energy model. Uniform or non-uniform distribution of slack, the number of voltage switches per task, and ignoring or considering the switching overhead can heavily influence energy savings.

Five of the approaches introduced in section 3 perform VS in order to minimize dynamic energy consumption. Two of them also apply ABB and PM, respectively, in order to reduce static energy consumption. The main VS issues are presented for each approach in order to carry out a fair comparison. The approaches implemented with LS are presented first, followed by those implemented with mathematical programming. A table summarizes the information.

4.2.2.1 DVS with LS

The DVS of F. Gruian et al. [14] aims to minimize computation energy. It distributes the slack uniformly by using an LS which gives priority to the tasks causing the largest energy improvement when delayed by one time unit. Tasks are scheduled at a single voltage in the continuous domain. Two voltage levels in the discrete domain can be found to approximate the continuous voltage, in which case the time spent at each voltage level is of interest.

Task energy is an average computed as an integral over the interval [EST, LFT]. Task intervals are initially delimited by applying ASAP and ALAP at the nominal (maximum) supply voltage. During DVS these intervals are narrowed as EST shifts because the task's predecessors exploit slack. Task energy depends on the number of clock cycles (N), the switched capacitance (Cp) and the supply voltage (Vdd).

The PV-DVS of M. T. Schmitz et al. [8]–[11] aims to minimize computation energy. It incrementally distributes the slack to the tasks that bring the highest energy savings when delayed by a time quantum. The time quantum is the minimum slack available at each iteration divided by the number of expandable tasks, and it has a lower bound to prevent insignificant task extensions. Tasks stop being expandable when their slack becomes lower than this lower bound. Every task scheduled with PV-DVS runs at a single voltage in the continuous domain.
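The incremental slack distribution can be sketched as below. This is a simplified illustration rather than the original PV-DVS: it assumes a single chain of tasks sharing one deadline, so all tasks see the same slack, and it replaces the real power model with E(d) = c / d², which follows from E = Cp·N·Vdd² if the supply voltage is taken as proportional to the clock frequency. The coefficients and names are hypothetical.

```python
def pv_dvs_chain(durations, coeff, deadline, min_quantum):
    """Distribute the slack of a task chain, one time quantum at a time.

    durations: task -> duration at nominal voltage
    coeff:     task -> c in the assumed energy model E(d) = c / d**2
    Returns the stretched duration of every task.
    """
    if min_quantum <= 0:
        raise ValueError("min_quantum must be positive")
    d = dict(durations)
    tasks = sorted(d)  # fixed order so that ties are broken deterministically

    def saving(t, q):  # energy saved by stretching task t by q time units
        return coeff[t] / d[t] ** 2 - coeff[t] / (d[t] + q) ** 2

    slack = deadline - sum(d.values())
    while slack >= min_quantum:
        # Quantum = available slack / number of expandable tasks, bounded below.
        quantum = max(slack / len(tasks), min_quantum)
        best = max(tasks, key=lambda t: saving(t, quantum))
        d[best] += quantum
        slack -= quantum
    return d


result = pv_dvs_chain({"a": 1.0, "b": 1.0}, {"a": 100.0, "b": 1.0},
                      deadline=5.0, min_quantum=0.1)
print(result)  # the energy-hungry task "a" absorbs almost all of the slack
```

Because the quantum shrinks as slack is consumed and the most profitable task is re-selected each time, the slack ends up distributed non-uniformly, in favour of high-energy tasks.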

Generalized PV-DVS for the discrete domain returns the time spent by each task at the two discrete voltage levels around the continuous voltage. Switching overhead is ignored. PV-DVS can easily be extended to multiple voltages while considering switching overhead.
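Approximating the continuous voltage with the two neighbouring discrete levels amounts to splitting the task's cycles so that it still finishes exactly at its stretched end time. The sketch assumes that the two frequency levels bracket the ideal frequency N/T and uses the dynamic energy Cp·N·Vdd² per level; the voltage, frequency and capacitance values in the example are hypothetical.

```python
def split_cycles(n_cycles, time_budget, f_lo, f_hi):
    """Cycles to run at f_lo and at f_hi so that n_cycles take exactly time_budget.

    Solves n_lo + n_hi = n_cycles and n_lo / f_lo + n_hi / f_hi = time_budget.
    """
    if not (f_lo < f_hi and f_lo <= n_cycles / time_budget <= f_hi):
        raise ValueError("need f_lo < f_hi bracketing the ideal frequency")
    n_lo = (time_budget - n_cycles / f_hi) / (1.0 / f_lo - 1.0 / f_hi)
    return n_lo, n_cycles - n_lo


def two_level_energy(n_lo, n_hi, v_lo, v_hi, cap):
    """Dynamic energy cap * N * Vdd**2 summed over the two voltage levels."""
    return cap * (n_lo * v_lo ** 2 + n_hi * v_hi ** 2)


# 300 cycles in 2 time units: ideal frequency 150, available levels 100 and 200.
n_lo, n_hi = split_cycles(300, 2.0, f_lo=100.0, f_hi=200.0)
print(round(n_lo), round(n_hi))  # 100 200
print(two_level_energy(n_lo, n_hi, v_lo=1.0, v_hi=1.8, cap=1e-9))
```

The task then needs a single voltage switch, which is why ignoring the switching overhead is a common simplification at this step.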

Task energy depends on the number of switching events (N01), the load capacitance (Cp) and the supply voltage (Vdd).



D. Shin et al. [7] adapted the generalized PV-DVS of M. T. Schmitz et al. [8]–[11] for static link speed allocation and applied SPM to links in order to reduce communication energy. SVS for links incrementally distributes the slack to the links which yield the highest energy savings when slowed down by a certain time quantum. The slack of a link is the minimum among the slacks of the edges assigned to (sharing) that link. The time quantum is the minimum slack among links divided by the number of expandable links. SVS is a single-voltage scheduling in the discrete domain which ignores switching overhead, but it could easily be extended to multi-voltage scheduling in the discrete domain, with or without switching overhead.

SPM shuts down unutilized links for entire duration of application.

Communication energy is the sum of dynamic and static energy of links. Dynamic energy of a link depends on the total communication load of edges sharing that link, on the link speed (frequency) (f) and on the switching capacitance (Cp) of the link. Static energy of a link is its leakage power (PLK).

4.2.2.2 DVS with Mathematical Programming

A. Andrei et al. [13] combined ABB with DVS to reduce the static and dynamic energy consumption of tasks, with or without considering switching overhead (see section 3.2.1 for more details). A pair of voltages (Vdd, Vbs) characterizes each task at any moment. NLP finds single-voltage pairs in the continuous domain for each task, while MILP returns the number of execution cycles for each task and each multi-voltage pair in the discrete domain.

Computation energy is the sum of dynamic energy, static energy and switching energy overhead over all tasks. Dynamic task energy depends on the number of cycles (N), the switched capacitance (Cp) and the supply voltage (Vdd). Static task energy depends on the threshold voltage (Vth), supply voltage (Vdd), body bias voltage (Vbs), body junction leakage current (ILK), number of gates (G) and technology-dependent constants of nano-scale CMOS. The switching energy overhead is a function of the supply voltage (Vdd), the body bias voltage (Vbs), the power rail capacitance and the total substrate and well capacitance (C).

The DVS for tasks of G. Varatkar et al. [2]+[3] is implemented with ILP and aims to minimize computation energy. It exploits a non-uniform distribution of slack. It handles single and multiple voltage levels and considers the time and energy overhead of voltage switching.

Computation energy is the sum of dynamic energy and switching energy overhead over all tasks. Task energy depends on the supply voltage (Vdd), the cycle time (C = 1/f, the inverse of the clock frequency) and the number of cycles executed at each supply voltage (N). Unlike in the other approaches, it is the cycle time rather than the number of cycles that varies with the supply voltage and clock frequency. Switching overhead is a constant.



4.2.2.3 Comparison

Table 10 captures the main features of the VS approaches introduced in section 3 and detailed here.

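Table 10 – Voltage Selection

| Ref. | VS (supply voltage): Task | VS (supply voltage): Comm. | ABB / PM (threshold voltage): Task | ABB / PM (threshold voltage): Comm. | Switching overhead | Slack distrib. | LS priority / Algorithm |
|---|---|---|---|---|---|---|---|
| F. Gruian et al. [14] | D | - | - | - | N | Uniform | Energy diff. / 1 time unit |
| M. T. Schmitz et al. [8]–[11] | D | - | - | - | N | Non-uniform | Energy diff. / time quantum |
| D. Shin et al. [7] | - | S | - | S | N | Non-uniform | Energy diff. / time quantum |
| A. Andrei et al. [13] | D | - | D | - | Y | Non-uniform | NLP, MILP |
| G. Varatkar et al. [2]+[3] | D | - | - | - | Y | Non-uniform | ILP |

D = dynamic, S = static; switching overhead: Y = considered, N = ignored.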

As can be seen, the majority of approaches perform DVS for tasks. The only exception is D. Shin et al. [7], which uses SVS for communication links. Two approaches, [13] and [7], apply ABB and PM respectively in order to minimize static energy consumption.

The energy reduction achieved by VS depends strongly on slack distribution and switching overhead. If the tasks with the highest energy consumption get the most slack, they can be slowed down for a longer interval of time and thus run at lower voltage levels. A reduced number of switches also reduces switching overhead.

4.2.2.3.1 Slack Distribution

The majority of approaches perform non-uniform distribution of slack; the only exception is F. Gruian et al. [14]. All LS approaches distribute the slack to the expandable tasks that bring the highest energy gain when delayed by a certain time quantum. Depending on how the list of expandable tasks is maintained and how the time quantum is computed, the slack is distributed uniformly or non-uniformly. If the time quantum is constant and each expandable task is scheduled once, the slack is distributed uniformly. This is the case of F. Gruian et al. [14], which uses a level-based LS and a fixed time quantum of 1 time unit. The other LS approaches keep expandable tasks in the list as long as they have non-zero slack, which increases the chances that tasks with high energy consumption are rescheduled. Moreover, the time quantum is updated at each iteration.



4.2.2.3.2 Switching overhead

Switching overhead depends on the number of switches, and implicitly on the maximum number of voltage levels assigned per task, and on the voltage execution order (voltage scheduling). DVS with multiple voltages in the discrete domain has the highest switching overhead, while SVS with a single voltage in the continuous domain has the lowest. Considering switching overhead during VS gives a more accurate energy model and thus the possibility to reduce energy further.

Among the surveyed approaches, only DVS implemented with mathematical programming considers switching overhead. This is because these approaches perform DVS in the discrete domain with multiple voltages, which has the highest switching overhead. They are able to reduce the number of switches by performing both voltage selection and voltage scheduling. On the other hand, LS approaches for DVS ignore switching overhead, because they first perform DVS for a single voltage in the continuous domain and then approximate the continuous value with one or two voltages in the discrete domain. SVS for a single voltage in the discrete domain [7] is the only approach without switching overhead, but it also has the smallest energy savings. A. Andrei et al. [13] have the most accurate switching overhead model; their experiments show that, when switching overhead is considered, energy can be reduced by 6%.

4.2.2.3.3 Energy Model

The dynamic energy model for tasks is similar for all DVS approaches: it is a function of supply voltage, switched capacitance and number of clock cycles. The only exception is [2]+[3], which varies the cycle time, instead of the number of cycles, with the supply voltage and clock frequency. A dynamic energy model for communication is provided only by D. Shin et al. [7]. The static energy model for tasks and communication is based on leakage power. Switching overhead is a constant in [2]+[3] and a function of supply voltage and power rail capacitance in [13]. The task energy model in [13] also depends on the body bias voltage, due to ABB. The most detailed energy model for tasks, including dynamic and static components as well as switching overhead, is provided by A. Andrei et al. [13].
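Written out, the common dynamic energy model described above, combined with a constant switching overhead as in [2]+[3], takes roughly the following form. This is a generic sketch rather than the exact formulation of any one surveyed paper: l ranges over the voltage levels used by a task, N_l is the number of cycles executed at level l, f_l the corresponding clock frequency, n_sw the number of voltage switches and E_sw the constant energy per switch. In the cycle-time formulation of [2]+[3], N_l / f_l is written as N_l times the cycle time 1/f_l.

```latex
E_{\mathrm{dyn}} = C_p \sum_{l} N_l \, V_{dd,l}^{2}, \qquad
t_{\mathrm{exec}} = \sum_{l} \frac{N_l}{f_l}, \qquad
E_{\mathrm{total}} = \sum_{\text{tasks}} E_{\mathrm{dyn}} + n_{\mathrm{sw}} \, E_{\mathrm{sw}}
```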

According to these considerations, the approaches of A. Andrei et al. [13] and M. T. Schmitz et al. [8]–[11] are the best, which is confirmed by the experimental results in Table 11: 10.8% static energy savings are obtained with the ABB of A. Andrei et al. [13] and 44% dynamic energy savings with the DVS of M. T. Schmitz et al. [8]–[11]. Energy savings for DVS with uniform distribution of slack [14] are unfortunately very small (<4%).

4.2.3 Experimental Results

Table 11 shows the experimental results on real-life applications for the most representative approaches. The first three rows correspond to VS, while the last two correspond to energy-aware task mapping. All energy savings in the second column are expressed as overall energy and result from a comparison with the versions stated in the last column.



Table 11 – Experiments on real-life applications

| Ref. | Energy savings | Real-life application | Compared with |
|---|---|---|---|
| F. Gruian et al. [14] | < 4% | OFD (Optical Flow Detection) | LS crit. path + VS |
| M. T. Schmitz et al. [8]–[11] | 44% | OFD | Version without DVS |
| A. Andrei et al. [13] | 10.8% | GSM video codec | Version without ABB |
| D. Shin et al. [7] | 35% | Multimedia (H.263/MP3 encoder/decoder) | Version with BB-TM and balanced RPA + SPM for links |
| J. Hu et al. [6] | 44% | MSB (Multimedia System Benchmark: MP3/H263 audio/video encoder) | EDF |

4.2.4 Conclusions

Energy minimization refers to reducing dynamic and static energy consumption for tasks and communication.

Dynamic energy consumption is usually minimized by applying DVS for tasks, but energy-aware task mapping can also be employed to reduce the dynamic energy of tasks and communications. Three of the approaches introduced in section 3 apply both: G. Varatkar et al. [2]+[3], D. Shin et al. [7] and M. T. Schmitz et al. [8]–[11]. The last two yield good results. Energy-aware task mapping is applied successfully by J. Hu et al. [6], without the help of DVS but by exploiting the available slack non-uniformly.

Communication energy in NoC approaches can be reduced by minimizing communication volume and NoC distance at task mapping and network assignment; J. Hu et al. [6] and D. Shin et al. [7] handle this best. Static energy consumption is minimized with ABB or PM. Only two of the approaches introduced in section 3 minimize static energy: A. Andrei et al. [13] and D. Shin et al. [7]. Both yield good results.

4.3 Verifying Area and Memory Constraints

Area constraints refer to memory size or to the maximum number of tasks running on a processing element. Three of the approaches introduced in section 3 deal with area constraints. Two issues are addressed in order to compare them: what the area represents and how task mapping is made aware of area constraints. A table summarizes the information at the end of the section.

D. Shin et al. [7] check the area constraints during task mapping. Area constraints impose that each processing element must fit into the NoC tile area. This limits the number of tasks assigned to each processing element so that the memory size and gate count do not exceed the NoC tile area.



M. T. Schmitz et al. [8]–[11] check the area constraints at mapping. Area refers to memory size or to silicon area depending on whether the implementation of task is done in software (GP, DSP) or in hardware (ASIC, FPGA). Mapping objective function contains a penalty factor for area violations. Area penalty captures the size of used area relative to maximum available area, weighted by an aggressiveness factor (k=0.02).
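The area penalty factor can be sketched as follows. It assumes that the violation of each PE is the amount by which its used area exceeds its maximum area, and that used and maximum area share one unit per PE (memory for software PEs, silicon area for hardware PEs); how the factor is combined with the energy term of the fitness is not shown. The dictionary keys are hypothetical, and k = 0.02 is the aggressiveness factor quoted above.

```python
from math import prod


def area_penalty(used: dict, available: dict, k: float = 0.02) -> float:
    """Product over PEs of (k * area violation / max area + 1).

    The area violation of a PE is how much its used area exceeds the maximum;
    a PE within its limit contributes a factor of 1.
    """
    return prod(k * max(0.0, used[pe] - available[pe]) / available[pe] + 1.0
                for pe in available)


print(area_penalty({"gpp": 80.0, "asic": 150.0}, {"gpp": 100.0, "asic": 100.0}))  # about 1.01
```

Since each factor is at least 1, a solution with no violations is never penalized, and a small k keeps moderately violating solutions in the GA population instead of discarding them outright.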

R. Szymanek et al. [22] check area constraints by performing the actual placement of the data and code of tasks into the corresponding memories of the processors and by anticipating the future amount of data to be placed in the data memory of the processors.

During simultaneous mapping and scheduling, the most data consuming task is assigned to the processor with the least loaded code and data memory, such that future data memory requirements are fulfilled. The most data consuming task requires the highest amount of data to be placed into the available data memory of a processor (highest difference between data produced and data consumed). The least loaded processor provides the highest amount of available data and code memory relative to the task requirements. An assignment is acceptable if the data requirements of the remaining tasks, estimated in the worst case, can be fulfilled with the current configuration of available data memories on the different processors.
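The greedy assignment above can be sketched as follows. The data need of a task follows the definition given in this paragraph (data produced minus data consumed), and the worst-case acceptability test is replaced by a deliberately simple stand-in: the pending needs must fit in the total free data memory, and the largest single need must fit in the largest free block. The `Task` fields and all function names are hypothetical; [22] implements the check differently.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    code: int      # code memory the task needs
    produced: int  # data the task produces
    consumed: int  # data the task consumes


def data_need(t: Task) -> int:
    """Net data to place in data memory: data produced minus data consumed."""
    return max(0, t.produced - t.consumed)


def future_ok(free_data: dict, pending: list) -> bool:
    """Simplified worst-case test: pending data fits in total free data memory,
    and the largest single need fits in the largest free block."""
    needs = [data_need(t) for t in pending]
    return (sum(needs) <= sum(free_data.values())
            and max(needs, default=0) <= max(free_data.values()))


def map_tasks(tasks: list, free_code: dict, free_data: dict) -> dict:
    free_code, free_data = dict(free_code), dict(free_data)
    order = sorted(tasks, key=data_need, reverse=True)  # most data consuming first
    mapping = {}
    for i, task in enumerate(order):
        def margin(pe):  # memory left on pe after placing task (tighter of code/data)
            return min(free_code[pe] - task.code, free_data[pe] - data_need(task))

        for pe in sorted(free_code, key=margin, reverse=True):  # least loaded first
            if margin(pe) < 0:
                break  # no remaining processor has room for the task
            trial = {**free_data, pe: free_data[pe] - data_need(task)}
            if future_ok(trial, order[i + 1:]):
                mapping[task.name] = pe
                free_code[pe] -= task.code
                free_data = trial
                break
        if task.name not in mapping:
            raise RuntimeError(f"no acceptable processor for task {task.name}")
    return mapping


tasks = [Task("A", 10, 50, 10), Task("B", 10, 20, 10), Task("C", 5, 5, 20)]
print(map_tasks(tasks, {"p0": 100, "p1": 100}, {"p0": 60, "p1": 30}))
# {'A': 'p0', 'B': 'p1', 'C': 'p0'}
```

Task B goes to p1 rather than p0 because, after A is placed, p1 has more free data memory left, which is the balancing effect the approach aims for.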

4.3.1 Comparison

Table 12 summarizes the comparison criteria related to area constraints.

Table 12 – Area Constraints

| Ref. | Mapping: Goal | Mapping: PE assignment | Scheduling: Priority |
|---|---|---|---|
| D. Shin et al. [7] | Area constraints (PE area ≤ NoC tile area) | - | - |
| M. T. Schmitz et al. [8]–[11] | Area penalty ∏ (k × area violation / max area + 1) | - | - |
| R. Szymanek et al. [22] | Future data requirements | Least data and code memory loaded | Most data consuming |

Most of the approaches refer to memory size. R. Szymanek et al. [22] also perform placement of data and code in the memory of the PEs. The approaches have different goals at task mapping related to chip area: D. Shin et al. [7] solely verify area constraints, M. T. Schmitz et al. [8]–[11] aim to minimize chip area, while R. Szymanek et al. [22] target a balanced utilization of the chip's distributed memory. The approach of R. Szymanek et al. [22] is considered the best, because it anticipates future data requirements and uses memory-aware task priority and PE assignment.



5 Conclusions

NoC research is still in the early stages, but the number of research groups is growing very fast. In this report we have surveyed the early approaches for mapping and scheduling to NoC. These perform static mapping and scheduling and target NoC architectures with 2D mesh of switches topology and wormhole routing. We have left out NoC approaches which treat solely network assignment, which make use of adaptive routing or which target other NoC topologies, because they are infrequent cases and cannot be fairly compared with the majority. We have focused instead on mapping and scheduling techniques for bus-based MPES, trying to find the common issues with NoC and the specific ones for NoC. The report gives the survey and the comparison of nine methodologies, four targeted to NoC and five for bus-based MPES.

The main conclusion that can be drawn is that task mapping and scheduling are very similar in NoC and bus-based MPES, while communication raises special issues in NoC. Whereas in bus-based MPES there is no difference between tasks and communication, because buses can be modelled as processors, in NoC a routing path must be allocated for each communication such that communication delay is minimized and network deadlocks and congestion are prevented. On-chip distance has a big impact on communication overhead, especially in a large NoC, and therefore its minimization may be a central goal in NoC design. For this reason the final design step of NoC Specialization, called Tile Allocation, can be delayed until after Task Mapping and performed along with Communication Mapping in the Network Assignment step, in order to obtain the highest reduction of communication distance for a particular application. Inter-processor communication volume also influences communication overhead, so its minimization can be targeted at Task Mapping and Communication Mapping.

Optimization goals and their achievement are similar for NoC and bus-based MPES.

All surveyed approaches aim to meet hard deadlines, but not all guarantee them, depending on whether the communication issue is tackled or not. Except for two NoC approaches which ignore the communication issue, all surveyed approaches guarantee hard deadlines at scheduling and after voltage selection. A single approach [15]–[18] targets soft deadlines, and it assumes variable ET for tasks.

Energy minimization is a preferred optimization goal for NoC approaches. Communication energy is usually minimized by reducing inter-processor communication volume and distance at task mapping and network assignment. Dynamic energy consumption is minimized at task mapping and voltage selection, while static energy consumption is reduced by ABB and PM techniques. DVS for tasks is applied quite often, while SVS for links, ABB and PM are applied only in isolated cases. Slack distribution and switching overhead have a big impact on energy savings at VS. Energy difference is used as task priority for both task mapping and VS. The energy savings brought by task mapping and by DVS for tasks, taken separately, are comparable and close to 44%. ABB yields about 10% improvement, while network assignment with SVS for links gives 35% energy savings.

Area constraints refer to the maximum number of tasks running on a PE or to on-chip memory utilization. Only three approaches, one for NoC and two for bus-based MPES, verify area constraints at task mapping. One of those for bus-based MPES performs memory placement and aims to balance memory utilization [22].

The limitations of the presented techniques concern the handling of communication in NoC and how often certain design steps are applied to optimize certain design goals. For example, a single NoC approach [6] performs communication scheduling, thus assuring exclusive usage of communication links and preventing deadlock and congestion in the NoC. Another NoC approach [7] performs network assignment to minimize on-chip communication distance, but it avoids communication scheduling by assuming that communication links can be shared by several edges. No approach performs network assignment and communication scheduling while assuring exclusive usage of communication links, or at least imposing an upper bound on the communication volume per link.

Regarding design goals, only a single approach [22] aims to balance the utilization of on-chip resources so that the targeted goal is equally optimized in all parts of the chip. Balancing resource utilization is a desirable optimization goal in on-chip architectures, because overloading is usually associated with heating, which can lead to incorrect functionality or to a shorter lifetime for circuits. Regarding how often design steps are used to achieve certain design goals, DVS for tasks is the most common technique for minimizing dynamic energy consumption. Only two approaches aim to minimize the dynamic energy consumption of tasks at task mapping, and a single approach applies VS to communication links.

We believe that communication will be the focus of future research on mapping and scheduling algorithms for NoC, including for NoC architectures that use topologies and routing algorithms other than a 2D mesh of switches with wormhole routing. Performing network assignment and communication scheduling while assuring exclusive link usage, deadlock prevention and congestion avoidance is not a trivial task, even though NoC topologies and routing algorithms do not strongly influence the mapping and scheduling algorithms themselves. However, topology and routing algorithm have a big impact on the optimization goal and on communication overhead. To our knowledge, mapping and scheduling algorithms have so far been developed only for NoCs with 2D mesh topology and wormhole routing.

Developing dynamic or quasi-static mapping and scheduling algorithms for NoC could be another direction to explore. On-line decisions can, in principle, better exploit the actual configuration and thus might improve the quality of mapping and scheduling. However, the cost of deciding on-line must be justified by the quality improvement.

Acknowledgements

This survey was carried out under the project “Specialization and Evaluation of Network on Chip Architectures for Multi-Media applications”, funded by the Swedish KK Foundation. We would like to thank Prof. Petru Eles, IDA, Linköping University, for many useful suggestions and discussions.



References

[1] S. Kumar, A. Jantsch, J.P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä, A. Hemani, A Network on Chip Architecture and Design Methodology, IEEE Computer Society Annual Symposium on VLSI 2002.

[2] G. Varatkar, R. Marculescu, Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization, in Proc. IEEE/ACM Intl. Conf. on Computer Aided Design, San Jose, CA, Nov. 2003.

[3] Y. Zhang, X. Hu, and D. Chen, Task Scheduling and Voltage Selection for Energy Minimization, DAC 2002.

[4] T. Lei, S. Kumar, A Two-step Genetic Algorithm for Mapping Task Graphs to a NoC Architecture, DSD 2003.

[5] T. Lei, S. Kumar, Algorithms and Tools for NoC Based System Design, SBCCI 2003.

[6] J. Hu, R. Marculescu, Energy-Aware Communication and Task Scheduling for Network-on-Chip Architectures under Real-Time Constraints, in Proc. Design, Automation and Test in Europe Conf., Paris, France, Feb. 2004.

[7] D. Shin, J. Kim, Power-Aware Communication Optimization for Networks-on-Chips with Voltage Scalable Links, in Proc. CODES+ISSS'04, pp. 170-175, Sep. 2004.

[8] M. T. Schmitz, B. M. Al-Hashimi, P. Eles, Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems, in Proc. Design, Automation and Test in Europe Conference (DATE 2002), pp. 514-521, Paris, 2002.

[9] M. T. Schmitz, B. M. Al-Hashimi, Considering Power Variations of DVS Processing Elements for Energy Minimization in Distributed Systems, in Proc. Intl. Symposium on System Synthesis (ISSS 2001), pp. 250-255, Montreal, 2001.

[10] M. T. Schmitz, B. M. Al-Hashimi, P. Eles, Synthesizing Energy-Efficient Embedded Systems with LOPOCOS, Design Automation for Embedded Systems, Vol. 6, 401-424, 2002.

[11] M. T. Schmitz, B. M. Al-Hashimi, P. Eles, Iterative Schedule Optimization for Voltage Scalable Distributed Embedded Systems, ACM Transactions on Embedded Computing Systems, Vol. 3, No. 1, 182-217, Feb. 2004.

[12] M. T. Schmitz, B. M. Al-Hashimi, P. Eles, A Co-Design Methodology for Energy-Efficient Multi-Mode Embedded Systems with Consideration of Mode Execution Probabilities, in Proc. Design, Automation and Test in Europe Conference (DATE 2003), pp. 960-965, Munich, 2003.

[13] Andrei, M. T. Schmitz, P. Eles, Z. Peng, B. M. Al-Hashimi, Overhead-Conscious Voltage Selection for Dynamic and Leakage Power Reduction of Time-Constraint Systems, in Proc. Design, Automation and Test in Europe Conference (DATE 2004), accepted for publication, Paris, 2004.



[14] F. Gruian, K. Kuchcinski, LenS: Task Scheduling for Low-energy Systems Using Variable Supply Voltage Processors, ASP-DAC 2001.

[15] L. A. Cortes, P. Eles, Z. Peng, Static Scheduling of Monoprocessor Real-Time Systems composed of Hard and Soft Tasks, The IEEE International Workshop on Electronic Design, Test and Applications (DELTA 2004), Perth, Australia, January 28-30, 2004, pp. 115-120.

[16] L. A. Cortes, P. Eles, Z. Peng, Quasi-Static Scheduling for Real-Time Systems with Hard and Soft Tasks, Technical Report, Embedded Systems Lab, Dept. of Computer and Information Science, Linköping University, September 2003.

[17] L. A. Cortes, P. Eles, Z. Peng, Quasi-Static Scheduling for Real-Time Systems with Hard and Soft Tasks, Design, Automation and Test in Europe (DATE 2004), Paris, France, February 16-20, 2004, pp. 1176-1181.

[18] L. A. Cortes, P. Eles, Z. Peng, Quasi-Static Scheduling for Multiprocessor Real-Time Systems with Hard and Soft Tasks, Technical Report, Embedded Systems Lab, Dept. of Computer and Information Science, Linköping University, December 2003.

[19] R. Szymanek, Memory Aware Task Assignment and Scheduling for Multiprocessor Embedded Systems, Licentiate Thesis No. 13, ISSN 1404-1219, Lund, June 2001.

[20] R. Szymanek, K. Kuchcinski, A Constructive Algorithm for Memory-Aware Task Assignment and Scheduling, Proc. of the Ninth International Symposium on Hardware/Software Codesign, Copenhagen, Denmark, April 25-27, 2001.

[21] R. Szymanek, K. Kuchcinski, Task Assignment and Scheduling under Memory Constraints, Euromicro 2000.

[22] R. Szymanek, K. Kuchcinski, Design Space Exploration in System Level Synthesis under Memory Constraints, Euromicro 25, Milan, September 8-10, 1999.
