an efficient list scheduling algorithm for time placement problem


Article in Computers & Electrical Engineering · July 2007

Impact Factor: 0.82 · DOI: 10.1016/j.compeleceng.2007.02.005 · Source: DBLP


Computers and Electrical Engineering 33 (2007) 285–298

www.elsevier.com/locate/compeleceng

An efficient list scheduling algorithm for time placement problem

Abdellatif Mtibaa a,b,*, Bouraoui Ouni a, Mohamed Abid c

a Electronics and Microelectronics Laboratory (ElE), Faculty of Science of Monastir, Environment Street, 5019 Monastir, Tunisia

b National Engineering School of Monastir (ENIM), Ibn ElJazzar Street, 5019 Monastir, Tunisia

c CES Laboratory, National Engineering School of Sfax, (ENIS), B.P.W. 3038, Sfax, Tunisia

Available online 23 March 2007

Abstract

Partially reconfigurable FPGAs allow an overlap between the execution and the reconfiguration of tasks. The partial approach can be used to fit a large application into the FPGA device by partitioning the application over time. The executions are partitioned over time and the tasks are configured so that the imposed constraints are satisfied. The main aim of this work is to answer the question: when will a task be mapped in the FPGA? A time placement algorithm based on the list scheduling technique is developed to answer this question efficiently. We use the list scheduling algorithm because of its fast run time. Compared with the run time of other algorithms used in this field, such as the spectral and ILP algorithms, the list scheduling algorithm remains a good temporal placement candidate, especially for graphs with many nodes. Part of this paper is also devoted to the study and implementation of the DCT task graph. This graph is the most computationally intensive part of the Color Layout Descriptor algorithm, a low-level visual descriptor of MPEG-7. The case study shows that the partial approach gives a lower latency for the whole application than the full one. © 2007 Elsevier Ltd. All rights reserved.

Keywords: Run time reconfiguration; Partial reconfiguration; Time placement; List scheduling

1. Introduction

Today, most common instruments are based on digital circuits, which are designed under increasingly tight time and area constraints. One solution adopted by designers is to transform a high-level specification into ASICs [1,2]. Despite its performance advantage, the ASIC approach is inefficient for many applications that are heterogeneous in nature and composed of several sub-tasks with different characteristics. For instance, a multimedia application may include a data-parallel task, a bit-level task, irregular computation, high-precision word operations and real-time components. For such applications the ASIC approach would lead to an uneconomical size and a large number of separate chips. Moreover, if one chip does not operate correctly in the system, a new design must be developed. These problems lead designers to use Dynamically Reconfigurable Field Programmable Gate Arrays (DRFPGAs), which offer a better compromise between cost, flexibility and performance.

With the introduction of dynamically reconfigurable architectures, several research efforts have addressed this field. Some of them target problems that appear during the design process for fully reconfigurable systems, such as time partitioning. Several methods have been developed in the literature to solve the time partitioning problem, such as the ILP technique [3,4], a heuristic technique [5], a dynamic algorithm [6], the list scheduling technique [7,8] and the network flow technique [9]. However, few works address the design flow for partially reconfigurable systems. In order to reduce the latency of the graph, the authors of [10,11] use the ILP approach to overlap the reconfiguration of partition (n + 1) with the execution of partition (n). Other groups have recently tackled the time placement problem, such as the "Hardware-Software-Co-Design" group of the University of Erlangen-Nuremberg, Germany. In [14,16,18] the authors propose a dynamically reconfigurable model built from a scheduler, an online placer and the reconfigurable device. The scheduler manages the tasks and decides when and on which device (hardware or software) a task should be executed. If the placer fails to find free space on the hardware, it notifies the scheduler, which then decides either to run the task in software and retry it later on hardware, or to reject it. If the placement succeeds, the placer configures the reconfigurable system and notifies the scheduler. Executing a task on the reconfigurable device leads to the online placement problem, for which a method based on free-rectangle management and fitting heuristics was proposed in [17] and improved in [13,19].

In [12] the author models time placement as a three-dimensional problem. A task is represented by a cube in which the X and Y coordinates represent the width and the height of the task in the given FPGA, and the Z coordinate represents the time at which the task will be mapped in the FPGA. According to the author, the time placement problem is formulated as follows: given an input graph, the objective is either to find the minimal execution time of the input graph on an FPGA of fixed size, or to find the minimal FPGA area needed to accomplish the tasks within a fixed time limit. The author uses the spectral method to solve the time placement problem. This technique has a high execution time and struggles to produce solutions for graphs with many nodes.

0045-7906/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compeleceng.2007.02.005

* Corresponding author. Address: Electronics and Microelectronics Laboratory (ElE), Faculty of Science of Monastir, Environment Street, 5019 Monastir, Tunisia.

E-mail addresses: [email protected] (A. Mtibaa), [email protected] (B. Ouni), [email protected] (M. Abid).

This paper is organized as follows. The next section presents the design flow, the target architecture, and definitions drawn from the literature. Section 3 introduces the technique used to solve our problem and the proposed framework. The performance of the algorithm is evaluated in Section 4. Finally, Section 5 concludes the paper.

2. Background and definitions

2.1. Design flow and target architecture

Fig. 1 presents the design flow and the target architecture. The scheduler calculates the time at which a task should be executed, and the placer then places the selected task into the available reconfigurable hardware resources. The latter step is a problem when the reconfigurable FPGA is resource-constrained. This resource constraint can arise from the application itself (e.g. when it requires more area than is available) or when extensions are required to an existing application.

The hardware platform for experiments and evaluation is based on Xilinx's Virtex-II FPGA (Celoxica RC203 board) and the Embedded Development Kit (EDK) [20,21]. EDK includes a 32-bit soft processor core called MicroBlaze and several other hardware IPs. The MicroBlaze processor can be used as a global controller. It interacts with the reconfigurable device and the memory to load and execute tasks, with the placer to place each task at its appropriate location in the FPGA, and with the scheduler to load each task at its appropriate time.

2.2. Definitions

Before explaining the proposed technique, we give definitions and assumptions drawn from related work in this area.


[Figure: block diagram with the labels Task Request, Database (modules M1 to M4), Scheduler (marked "Our contribution"), Placer, "Violation of Area constraint?" (Yes/No), HW Process, Available reconfigurable hardware resources, and a "New FPGA family (SOPC Architecture)" block on the Celoxica RC203 board containing CPU, MEM, In/Out and the reconfigurable hardware region.]

Fig. 1. Design flow and target architecture.


Definition 1 (Task graph). A task graph is a directed acyclic graph $G = (V, E)$, where $V = \{T_1, T_2, \ldots, T_n\}$ is the set of tasks (nodes) and $E$ is the set of arcs. An arc $e(T_i, T_j) \in E$ represents a data dependence between task $T_i$ and task $T_j$.

Definition 2 (Partition). A partition $P$ of the graph $G(V, E)$ is its division into disjoint subsets $P_1, P_2, \ldots, P_n$ such that

$$\bigcup_{k=1}^{n} P_k = V$$

and, for every $P_k$,

$$\sum_{T_i \in P_k} A_i \le A_k,$$

where $A_i$ is the area of task $T_i$ and $A_k$ is the area constraint.

Definition 3 (Time placement). Given a graph $G(V, E)$ and a device $D$, the time placement is a three-dimensional vector $W = \{X, Y, Z\}$, where $X$ and $Y$ represent the FPGA zone on which the task is mapped and $Z$ represents the time at which it will be mapped.

Definition 4 (Connectivity). The connectivity $\mathrm{Con}(G)$ of a graph $G(V, E)$ is the ratio of the number of edges in $G(V, E)$ to the number of all edges that can be built with the nodes (tasks) of $G(V, E)$:

$$\mathrm{Con}(G) = \frac{2E}{V(V - 1)}$$

where $E$ is the number of edges and $V$ is the number of nodes (tasks) in $G(V, E)$.

Definition 5 (Quality). Given a graph $G(V, E)$ and a set of time partitions $\{P_1, \ldots, P_n\}$, the quality $Q(G)$ of the graph $G(V, E)$ is calculated as follows:

$$Q(G) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Con}(P_i)$$
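As a minimal sketch of Definitions 4 and 5, the Python fragment below computes the connectivity of a graph and the quality of a partitioning. The five-task graph, its partitions and the function names are hypothetical and chosen only for illustration. The sketch also assumes that only edges with both endpoints inside a partition count towards the connectivity of that partition, and that a single-task partition has connectivity 0; neither point is spelled out in the definitions.

```python
def connectivity(num_nodes, edges):
    """Con(G) = 2E / (V (V - 1)) for a graph with V nodes and E edges."""
    if num_nodes < 2:
        return 0.0  # assumption: a one-node graph has no possible edge
    return 2 * len(edges) / (num_nodes * (num_nodes - 1))


def quality(partitions, edges):
    """Q(G): mean connectivity of the partitions P_1..P_n."""
    total = 0.0
    for part in partitions:
        members = set(part)
        # assumption: only edges internal to the partition are counted
        inner = [(u, v) for (u, v) in edges if u in members and v in members]
        total += connectivity(len(members), inner)
    return total / len(partitions)


# Hypothetical task graph, not taken from the paper
edges = [("T1", "T2"), ("T1", "T3"), ("T2", "T4"), ("T3", "T4"), ("T4", "T5")]
partitions = [["T1", "T2", "T3"], ["T4", "T5"]]

con = connectivity(5, edges)    # 2*5 / (5*4) = 0.5
q = quality(partitions, edges)  # (2*2/(3*2) + 2*1/(2*1)) / 2 = 5/6
```

A quality close to 1 means that the tasks grouped in the same partition are densely connected to each other.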

Definition 6 (Configuration). Given a reconfigurable device $D$ and a set of tasks, the configuration $C_i$ of $D$ at time $t_i$ is the set of tasks mapped in $D$ at $t_i$.

Definition 7 (Off-line/on-line placement). When the sequence of tasks to be performed by the chip is known in advance, the designer can optimize the use of resources off-line and design an appropriate static controller. However, when the sequence is not predictable, or the task designs are not fixed, the controller needs to make allocation decisions on-line.

3. The proposed algorithm and framework

3.1. Scheduling process

In this section we develop a list scheduling algorithm to solve the following problem efficiently: given an area constraint on the available reconfigurable part and a task graph to be mapped onto it, where each task has area and time constraints, find a schedule of the input task graph and determine the start time and the reconfiguration time of each task. Furthermore, the algorithm calculates the available area in the reconfigurable part and the latency of each reconfiguration. The steps of the algorithm are illustrated on the task graph of Fig. 2.

3.1.1. Step 1: Generation of ETG

In our approach, the task graph that models the target application is transformed into a new model called the Extended Task Graph (ETG) (Fig. 3). In this model, for each task in the original task graph we add a new task that represents its configuration. In the ETG, TR denotes the added task and TE denotes the original task. The added task is characterized by the pair (area (CLB), reconfiguration time), and the original task by the pair (area (CLB), computation time). Clearly, the added task and its corresponding original task have the same area. A directed edge from task Ti to task Tj represents a dependency between those tasks.
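The ETG construction described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function name, the `TR_`/`TE_` naming scheme and the task values are hypothetical. The sketch also assumes that each configuration node TR precedes its execution node TE, and that the data dependencies of the original graph connect the execution nodes.

```python
def build_etg(tasks, edges):
    """Build an extended task graph.

    tasks: {name: (area_clb, computation_time, reconfiguration_time)}
    edges: list of (src, dst) data dependencies of the original graph
    Returns (nodes, etg_edges), where nodes maps a node name to
    (area_clb, duration).
    """
    nodes = {}
    etg_edges = []
    for name, (area, comp_time, reconf_time) in tasks.items():
        # The added task TR and the original task TE share the same area.
        nodes["TR_" + name] = (area, reconf_time)
        nodes["TE_" + name] = (area, comp_time)
        # Assumption: a task must be configured before it can execute.
        etg_edges.append(("TR_" + name, "TE_" + name))
    for src, dst in edges:
        # Assumption: data dependencies link the execution nodes.
        etg_edges.append(("TE_" + src, "TE_" + dst))
    return nodes, etg_edges


# Hypothetical two-task graph (values are not from the paper)
tasks = {"T1": (120, 40, 90), "T2": (80, 25, 60)}
nodes, etg_edges = build_etg(tasks, [("T1", "T2")])
```

With n original tasks the ETG has 2n nodes, so the later scheduling steps treat reconfiguration and execution uniformly.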

Fig. 2. Task graph.

Fig. 3. Example of extended task graph model.

[Figure: ASAP and ALAP schedules of tasks T1 to T10 over the control steps Step-1, Step-2 and Step-3.]

Fig. 4. ASAP and ALAP scheduling.


3.1.2. Step 2: Calculation of ASAP and ALAP scheduling

In this step (Fig. 4), the algorithm calculates the As Soon As Possible (ASAP) and As Late As Possible (ALAP) schedules of the tasks of the input task graph.
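A minimal sketch of ASAP and ALAP scheduling is given below, assuming unit-length control steps numbered from 1 and an acyclic task graph. The four-node graph and the function name are hypothetical; the paper does not give its own pseudocode for this step.

```python
def asap_alap(nodes, edges):
    """Return (asap, alap, n_steps) with control steps numbered from 1."""
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)

    # Topological order (Kahn's algorithm)
    indeg = {n: len(preds[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(order) != len(nodes):
        raise ValueError("the task graph contains a cycle")

    # ASAP: one step after the latest predecessor
    asap = {}
    for n in order:
        asap[n] = 1 + max((asap[p] for p in preds[n]), default=0)
    n_steps = max(asap.values())

    # ALAP: one step before the earliest successor; sinks go in the last step
    alap = {}
    for n in reversed(order):
        alap[n] = min((alap[s] for s in succs[n]), default=n_steps + 1) - 1
    return asap, alap, n_steps


# Hypothetical graph: n1 -> n2 -> n4 and n1 -> n3
nodes = ["n1", "n2", "n3", "n4"]
edges = [("n1", "n2"), ("n1", "n3"), ("n2", "n4")]
asap, alap, n_steps = asap_alap(nodes, edges)
# asap: n1=1, n2=2, n3=2, n4=3; alap: n1=1, n2=2, n3=3, n4=3
```

Only n3 has different ASAP and ALAP steps here, so it is the only node that can be moved without lengthening the schedule.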

3.1.3. Step 3: Calculation of mobility

After computing the ASAP and ALAP schedules, the algorithm calculates the mobility of each task (Table 1). The mobility is calculated as follows:

$$\mathrm{Mobility}(node(n_i)) = \mathrm{ALAP}(node(n_i)) - \mathrm{ASAP}(node(n_i))$$

where ALAP(node($n_i$)) is the control step number of node $n_i$ in the ALAP schedule, and ASAP(node($n_i$)) is the control step number of node $n_i$ in the ASAP schedule.

Table 1
Mobility

Node      n1  n2  n3  n4  n5  n6  n7  n8  n9  n10
Mobility  0   0   0   0   0   0   0   0   1   2

3.1.4. Step 4: Building the list

In this step, the algorithm calculates the priority on the urgency ($P_{Ur}(n_i)$) and the priority on the mobility ($P_{Mo}(n_i)$) of each node. Then, it calculates the priority of each node in the input graph. Finally, the algorithm places each node in the list according to its priority. The node with the highest priority is placed first in the list.

The priority of each node is calculated as follows:

$$Pr(n_i) = P_{Ur}(n_i) + P_{Mo}(n_i)$$

The priority on the urgency of each node is calculated as follows:

$$P_{Ur}(n_i) = N_{Cstep} - \mathrm{ALAP}(n_i)$$

where $N_{Cstep}$ is the number of control steps. The priority on the mobility of each node is calculated as follows:

$$P_{Mo}(n_i) = \frac{1}{\mathrm{Mobility}(node(n_i)) + 1}$$

Note that, given two nodes $n_i$ and $n_j$, if $P_{Ur}(n_i)$ is greater than $P_{Ur}(n_j)$ then $Pr(n_i)$ is greater than $Pr(n_j)$ regardless of the priority on the mobility $P_{Mo}$ of each node, because $P_{Ur}$ takes integer values whereas $P_{Mo}$ always lies in the interval (0, 1]. Thus, the dependency constraint is always satisfied. The dependency constraint states that a task Ti on which another task Tj depends cannot be placed in a later partition than the one in which Tj is placed. The priority on the mobility is added so that nodes with low mobility (especially nodes with no mobility) are put on the list as soon as all their predecessors have been placed on the list. We illustrate this idea with the example of Fig. 5.

Assume that node (n1) has been placed on the list; the question is which of nodes (n2) and (n3) should be placed next. Because of its higher mobility, node (n3) is more likely than node (n2) to be placed in another configuration without violating the dependency constraint. Thus node (n2) should be placed first on the list.
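The priority computation of Steps 3 and 4 can be sketched as below. The ASAP and ALAP values are hypothetical (a four-node graph n1 -> n2 -> n4, n1 -> n3 with three control steps), and the function name is illustrative. Ties are broken by the input order, which is an assumption since the text does not specify a tie-breaking rule.

```python
def build_priority_list(asap, alap, n_steps):
    """Order nodes by Pr = P_Ur + P_Mo, highest priority first."""
    priority = {}
    for n in asap:
        mobility = alap[n] - asap[n]   # Mobility = ALAP - ASAP
        p_ur = n_steps - alap[n]       # priority on the urgency
        p_mo = 1 / (mobility + 1)      # priority on the mobility, in (0, 1]
        priority[n] = p_ur + p_mo
    # sorted() is stable, so equal priorities keep their input order
    ordered = sorted(priority, key=lambda n: priority[n], reverse=True)
    return ordered, priority


# Hypothetical schedules, not taken from the paper
asap = {"n1": 1, "n2": 2, "n3": 2, "n4": 3}
alap = {"n1": 1, "n2": 2, "n3": 3, "n4": 3}
ordered, priority = build_priority_list(asap, alap, n_steps=3)
# priority: n1 = 3.0, n2 = 2.0, n4 = 1.0, n3 = 0.5
```

In this example n4 is listed before n3 although both sit in the last ALAP step: n3 has mobility 1, so its mobility priority is 0.5 instead of 1.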

3.1.5. Step 5: Building configurations

The algorithm puts the first node of the list into the first configuration, and then moves nodes from the list to this configuration in priority order until the size of the FPGA is reached. The configuration stays active until the remaining area $R_{area}$ (see the equation below) allows other tasks still on the list to be mapped in the FPGA. Next, the algorithm builds a new configuration by fetching the first node still on the list and moving nodes from the list to the new configuration until the size of the FPGA is reached. Again, this configuration stays active until the remaining area $R_{area}$ allows other tasks still on the list to be mapped. This process is repeated until all nodes have been placed in a configuration.

$$R_{area}(i) = A_{cons} + \sum_{L < i} A(C^{E}_{L}) - \sum_{j \le i} A(C^{R}_{j})$$

where $A(C^{E}_{i}) = \sum_{j} U_{ij} \cdot A(T_j)$ and $A(C^{R}_{i}) = \sum_{j} W_{ij} \cdot A(T_j)$. $R_{area}(i)$ is the remaining area at configuration $i$, with $i$ ranging from 1 to the number of configurations, and $A_{cons}$ is the area constraint. $U_{ij} = 1$ if task $T_j$ is executed in configuration $C_i$, otherwise $U_{ij} = 0$. $W_{ij} = 1$ if task $T_j$ is reconfigured in configuration $i$, otherwise $W_{ij} = 0$. $A(T_j)$ is the area of task $T_j$.
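As a hedged illustration of the remaining-area equation, the sketch below evaluates it for the eight configurations listed in Table 2. Several inputs are assumptions rather than data given in the text: the task areas are back-calculated from the used-area column of Table 2, each task is counted as reconfigured only in the configuration where its reconfiguration starts, and each task frees its area after the configuration in which its execution ends.

```python
A_CONS = 1000  # area constraint in CLBs, as stated for the example of Fig. 2

# Assumption: task areas (CLBs) inferred from the used-area column of Table 2
AREA = {"T1": 200, "T2": 356, "T3": 286, "T4": 194, "T5": 174}

# Assumption: configuration in which each reconfiguration starts (W_ij = 1)
RECONF_START = {"T1": 1, "T2": 1, "T3": 1, "T4": 5, "T5": 6}

# Assumption: configuration in which each execution ends (U_ij = 1)
EXEC_END = {"T1": 4, "T2": 5, "T3": 6, "T4": 7, "T5": 8}


def remaining_area(i):
    """R_area(i) = A_cons + sum_{L<i} A(C^E_L) - sum_{j<=i} A(C^R_j)."""
    freed = sum(AREA[t] for t, c in EXEC_END.items() if c < i)
    loaded = sum(AREA[t] for t, c in RECONF_START.items() if c <= i)
    return A_CONS + freed - loaded


# Used area per configuration, in CLBs
used = [A_CONS - remaining_area(i) for i in range(1, 9)]
# used == [842, 842, 842, 842, 836, 654, 368, 174]
```

Dividing these values by the 1000 CLB constraint gives 84.2%, 84.2%, 84.2%, 84.2%, 83.6%, 65.4%, 36.8% and 17.4%, which matches the used-area column of Table 2 under the stated assumptions.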

3.1.6. Results

For the example of Fig. 2, with an area constraint of 1000 CLBs, our algorithm divides the task graph into eight configurations. In the first configuration, tasks T1, T2 and T3 are configured. Since there are no dependency constraints between these tasks, each task can start its execution immediately after its reconfiguration. Because of the area constraint, the reconfiguration of task T4 is possible only after the execution of task T1; likewise, the reconfiguration of task T5 is possible only after the execution of task T2. Since there is a dependency constraint between tasks T3, T4 and T5, the execution of task T5 does not start immediately after the end of its reconfiguration, but must wait until its inputs are available after the execution of task T4. Results show that during the implementation of the target task graph we exploit an average of 88.55% of the available reconfigurable area, so the wasted resources represent an average of 11.45%. The evaluation of the algorithm's performance on the example of Fig. 2 is given in Table 2 and Fig. 6.

[Figure: nodes T1, T2 and T3, annotated with PMo(n2) = 1, PUr(n2) = 3 and PMo(n3) = 0.5, PUr(n3) = 3.]

Fig. 5. Priority on the mobility.

Table 2
Design results

Configuration  Tasks in each configuration  Used area (%)  Latency (ns)
1              TR2, TR3, TR1                84.2           220
2              TR2, TR3, TE1                84.2           30
3              TR2, TE3, TE1                84.2           70
4              TE2, TE3, TE1                84.2           275
5              TE2, TE3, TR4                83.6           225
6              TE3, TR4, TR5                65.4           55
7              TR5, TE4                     36.8           670
8              TE5                          17.4           0

Whole latency: 2390 ns
Quality: 0.333

[Figure: timeline (axis from 0 to 3000) showing the reconfiguration time (Rec_time) and computation time (Com_time) of Task_1 to Task_5.]

Fig. 6. Design results.

3.2. "DRESSY" framework

In order to provide CAD tools that lead designers to correct implementations on fully and partially reconfigurable devices, we are developing a CAD tool called "DRESSY" (DYnamically REconfigurable Smart System). In its academic version, this tool solves the time partitioning and time placement problems efficiently. To solve the time partitioning problem, the user can choose between the list scheduling and the ILP algorithms; only list scheduling is available for the time placement problem. In the first step, the user enters the task graph of the target application (Fig. 7), the task parameters in terms of area, latency and used memory (Fig. 8), and the target architecture parameters (memory constraint, area constraint) (Fig. 9). In the second step, the user chooses between time partitioning and time placement (Fig. 9). For time placement, the tool automatically produces the extended task graph (Fig. 10), the optimal solution in terms of the whole latency of the application (Fig. 11), and the sequence of reconfigurations to be mapped into the FPGA. For time partitioning, the tool automatically gives the tasks in each partition, the latency of each partition, and the whole latency (Fig. 12).

Fig. 7. Task graph.

Fig. 8. Task parameters.

Fig. 9. Architecture parameters and the choice (time partitioning or time placement).

Fig. 10. Extended task graph.

Fig. 11. Solution of the time placement.

Fig. 12. Solution of the time partitioning.

4. Case study

4.1. The Color Layout Descriptor (CLD)

The "CLD" is a low-level visual descriptor that can be extracted from images or video frames. The CLD extraction process consists of four stages (Fig. 13): image partitioning, selection of a single representative color for each block, DCT transformation, and non-linear quantization with zig-zag scanning [22].

[Figure: processing chain Partitioning, Dominant color selection, DCT, Quantization and Zig-Zag scanning, applied to the Y, Cb and Cr components and producing the coefficient sets {DY_i} (1 ≤ i ≤ 6), {DCr_j} and {DCb_j} (1 ≤ j ≤ 3), which form a 63-bit binary descriptor.]

Fig. 13. Block diagram of the CLD extraction.


Since the DCT is the most computationally intensive part of the CLD algorithm, it was chosen for hardware implementation, and the remaining subtasks (partitioning, color selection, quantization, zig-zag scanning and Huffman encoding) were chosen for software implementation.

The two-dimensional Discrete Cosine Transform (DCT) is defined as follows:

$$c(i,j) = \frac{1}{4}\, k(i)\, k(j) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)\, i \pi}{16}\right] \cos\left[\frac{(2y+1)\, j \pi}{16}\right]$$

where $c(i,j)$ is the DCT coefficient, $f(x,y)$ is the original pixel value, $i, j = 0, 1, \ldots, 7$ and

$$k(i) = \begin{cases} 1/\sqrt{2} & i = 0 \\ 1 & \text{otherwise} \end{cases}$$

In matrix notation, we can write $C = T \cdot F \cdot T^{t}$, where the 8 × 8 matrices $C = [c(i,j)]$ and $F = [f(i,j)]$ hold the 64 DCT coefficients and the original pixel values respectively, and $T^{t}$ denotes the transpose of $T$, the DCT matrix with entries $t(i,j)$ given by:

$$t(i,j) = \frac{1}{2}\, k(i) \cos\left[\frac{(2j+1)\, i \pi}{16}\right]$$

Among the 64 DCT coefficients, $c(0,0)$ is known as the DC term, which is related to the pixel values by:

$$c(0,0) = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)$$

The other 63 DCT coefficients are called AC coefficients.
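The relation between the double-sum definition and the matrix form can be checked numerically with the short sketch below. It takes the DCT matrix entries as t(i, j) = (1/2) k(i) cos((2j + 1) i π / 16), the convention under which C = T · F · T^t reproduces the double sum. The test block of pixel values and the helper names are hypothetical.

```python
import math


def k(i):
    return 1 / math.sqrt(2) if i == 0 else 1.0


def dct_direct(f):
    """8x8 DCT computed directly from the double-sum definition."""
    c = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (f[x][y]
                          * math.cos((2 * x + 1) * i * math.pi / 16)
                          * math.cos((2 * y + 1) * j * math.pi / 16))
            c[i][j] = 0.25 * k(i) * k(j) * s
    return c


def matmul(a, b):
    return [[sum(a[r][m] * b[m][c] for m in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# DCT matrix T (row index = frequency, column index = sample position)
T = [[0.5 * k(i) * math.cos((2 * j + 1) * i * math.pi / 16)
      for j in range(8)] for i in range(8)]

# Hypothetical 8x8 block of pixel values
f = [[(3 * x + 5 * y) % 17 for y in range(8)] for x in range(8)]

c_direct = dct_direct(f)
c_matrix = matmul(matmul(T, f), transpose(T))
max_err = max(abs(c_direct[i][j] - c_matrix[i][j])
              for i in range(8) for j in range(8))
dc_term = sum(sum(row) for row in f) / 8  # c(0,0) from the DC formula
```

Both forms agree up to floating-point rounding, and `c_direct[0][0]` equals `dc_term`, as the DC-term formula states.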

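The DCT can be viewed as four times the sum of two consecutive 4 × 4 matrix multiplications, as shown in the following equation:

$$\begin{bmatrix} A_{i,j} & A_{i,j+4} \\ A_{i+4,j} & A_{i+4,j+4} \end{bmatrix} \times \begin{bmatrix} B_{i,j} & B_{i,j+4} \\ B_{i+4,j} & B_{i+4,j+4} \end{bmatrix} = \begin{bmatrix} A_{i,j} \times B_{i,j} + A_{i,j+4} \times B_{i+4,j} & A_{i,j} \times B_{i,j+4} + A_{i,j+4} \times B_{i+4,j+4} \\ A_{i+4,j} \times B_{i,j} + A_{i+4,j+4} \times B_{i+4,j} & A_{i+4,j} \times B_{i,j+4} + A_{i+4,j+4} \times B_{i+4,j+4} \end{bmatrix}$$

$A_{i,j}$ and $B_{i,j}$ are 4 × 4 matrices and $i, j = 0, 1, \ldots, 3$. The multiplication of two 4 × 4 matrices gives the following equations:

$$[A_{i,j}] \times [B_{i,j}] = [P_{i,j}] = \begin{cases} p_{1,1} = a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31} + a_{14} b_{41} \\ p_{1,2} = a_{11} b_{12} + a_{12} b_{22} + a_{13} b_{32} + a_{14} b_{42} \\ \quad \vdots \\ p_{4,4} = a_{41} b_{14} + a_{42} b_{24} + a_{43} b_{34} + a_{44} b_{44} \end{cases}$$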
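The block decomposition above can be verified with a few lines of Python: each of the four 4 × 4 output blocks is the sum of two 4 × 4 products, and each entry of a 4 × 4 product is a four-term vector product. The matrices and helper names below are hypothetical and serve only to check the identity on integer data.

```python
def matmul(a, b):
    # Each output entry is one vector (dot) product of a row and a column.
    return [[sum(a[r][m] * b[m][c] for m in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def block(m, r, c):
    """Extract the 4x4 block whose top-left corner is (r, c)."""
    return [row[c:c + 4] for row in m[r:r + 4]]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_product(a, b):
    """8x8 product assembled from eight 4x4 products (two per output block)."""
    out = [[0] * 8 for _ in range(8)]
    for r in (0, 4):
        for c in (0, 4):
            blk = add(matmul(block(a, r, 0), block(b, 0, c)),
                      matmul(block(a, r, 4), block(b, 4, c)))
            for i in range(4):
                out[r + i][c:c + 4] = blk[i]
    return out


# Hypothetical integer matrices so that the comparison is exact
A = [[(i * 8 + j) % 7 for j in range(8)] for i in range(8)]
B = [[(i + 2 * j) % 5 for j in range(8)] for i in range(8)]

same = block_product(A, B) == matmul(A, B)  # True
```

Because the eight 4 × 4 products are independent of each other, they are natural candidates for separate hardware tasks.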

The model proposed in [15] is based on 16 vector products. Thus, the entire DCT is a collection of 16 tasks, where each task is a vector product as presented in Fig. 14. There are two kinds of tasks in the task graph proposed in [15], T1 and T2, whose structure is similar to a vector product but whose bit widths differ. A collection of 8 tasks forms a row of the 4 × 4 output matrix, as shown in Fig. 15.

[Figure: vector product task built from multiplications by constants (*, Const.) whose results are summed by a tree of adders (+).]

Fig. 14. Vector product model.

[Figure: task graph composed of tasks of types T1 and T2.]

Fig. 15. Task graph for DCT (16 Tasks).

4.2. Experimental results

The development of the scheduler is the main scope of this work. It decides when each task should be executed by the FPGA and when each task should be mapped to it. In the remainder of this section we illustrate the efficiency of the algorithm on the DCT task graph. We assume that the DCT task graph has been chosen for implementation on the available partially reconfigurable part. The user provides the area constraint and, for each task, the computation time (Com_time), the reconfiguration time (Rec_time) and the area (to calculate these parameters the user can rely on the methods proposed in [23,12]). The whole latency of the application, the start execution time (St_ex_time) and the start reconfiguration time (St_re_time) of each task are calculated automatically. In addition, our algorithm makes it possible to: (i) identify the sequence of reconfigurations, (ii) calculate the available area in the FPGA at each time, and (iii) obtain a good estimate of the latency of each reconfiguration.

The evaluation of the algorithm's performance on the DCT task graph is shown in Figs. 16–18. Fig. 16 gives the start times of both the reconfiguration and the execution of each task. For some tasks (e.g. tasks T16, T15 and T13) the execution does not start immediately after the end of the corresponding reconfiguration, but after a given delay. This is due to the dependency constraint: any task must wait until its inputs are available from previous tasks. Let τ(Ti) be the delay between the end of the reconfiguration and the start of the execution of task Ti. In the ideal case, τ(Ti) is equal to zero. In the present example there are three positive delays (τ(Ti) > 0), which have no effect on the whole latency of the target application. Fig. 17 shows the latency of each reconfiguration, which allows the generation of the FSM controller. Fig. 17 also shows that reconfiguration eight has the highest latency (330 ns) and reconfiguration nine has the lowest latency (3 ns). Fig. 18 shows that during the implementation of the DCT task graph we exploit an average of 95% of the available reconfigurable area, so the wasted resources represent an average of 5%, which is a significant result.

[Figure: timeline (axis from 0 to 2000) showing the reconfiguration time (Rec_time) and computation time (Com_time) of tasks T1 to T16.]

Fig. 16. The start reconfiguration and execution times of each task.

[Figure: latency per reconfiguration, vertical axis from 0 to 350, horizontal axis labelled recon_1, recon_7, recon_13, recon_19.]

Fig. 17. The latency of each reconfiguration (ns).

[Figure: used area per reconfiguration, vertical axis from 86 to 100, horizontal axis labelled recon_1, recon_5, recon_9, recon_13, recon_17.]

Fig. 18. Used Area%.
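The delay between the end of a task's reconfiguration and the start of its execution follows directly from the three quantities named in the text (St_re_time, Rec_time and St_ex_time). The sketch below computes it for a few tasks; the task names and timing values are hypothetical and are not read from Fig. 16.

```python
def start_delay(st_re_time, rec_time, st_ex_time):
    """Delay between the end of reconfiguration and the start of execution."""
    return st_ex_time - (st_re_time + rec_time)


# Hypothetical schedule in ns: (St_re_time, Rec_time, St_ex_time)
schedule = {
    "Ta": (0, 100, 100),    # starts executing as soon as it is configured
    "Tb": (100, 80, 250),   # waits for inputs from a predecessor
    "Tc": (180, 60, 240),
}

delays = {name: start_delay(*times) for name, times in schedule.items()}
# delays == {"Ta": 0, "Tb": 70, "Tc": 0}
waiting = [name for name, d in delays.items() if d > 0]  # ["Tb"]
```

A positive delay means the task occupies area while idle, so the tasks in `waiting` are the first place to look when trying to raise area utilization.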


5. Conclusion

In this paper we have introduced a time placement algorithm based on list scheduling. It divides the input task graph into a set of time configurations in order to reduce the latency of the target application. In addition, this paper shows that, although it is a classical technique, the list scheduling algorithm remains a good candidate even for solving the time placement problem.

To better illustrate the efficiency of our algorithm, we devoted part of this paper to implementing a practical example using both the time partitioning and the time placement approaches. The case study shows that the partial approach is more efficient in terms of latency than the full one.

The proposed approach provides significant results with a classic algorithm. These results can help the user to automatically generate the FSM controller, which is not usually the case in other approaches such as [14]. In addition, the other techniques are based on algorithms with high execution time (such as ILP and the spectral method). In general, to compare two approaches, the authors should provide enough information about their target applications. This is not always the case: for example, [3,13,14] do not provide the latency of tasks, the area of the target architecture or other important information. Therefore, a comparison is not always possible.

References

[1] Gasjki D, Aggarwal G, Chang ES, Doner R, Ishii T, Kleinsmith J, Zhus J. Methodology for design of embedded systems, Technicalreport UCI-ICS-98-07, University of California, March 1998.

[2] Gupta R, Micheli GD. Hardware Software Co synthesis for digital systems. IEEE Design Test 1993.

[3] Kaul K, Vermuri R, Govindarajan S, Ouaiss I. An automated temporal partitioning tool for a class of DSP application. Workshop and reconfigurable computing in international conference on parallel architecture and compilation technique PACT, 1998. p. 22–7.

[4] Ouni B, Mtibaa A, Abid M. Time partitioning framework for fully reconfigurable systems. In: The 16th international IEEE conference on microelectronics, ICM’04, December 6–8, Gammart, Tunisia, 2004. p. 742–5.

[5] Ouni B, Mtibaa A, Abid M. Approach of heuristic algorithm for time partitioning problem. Dedicated systems magazine – 2005 Q2. <http://www.Dedicated-systems.com>.

[6] Ouni B, Mtibaa A, Abid M. Synthesis and time partitioning for reconfigurable systems. J Des Autom Embedded Syst 2004;9(3):77–191.

[7] Cardoso JMP, Neto HC. An enhanced static-list scheduling algorithm for temporal partitioning onto RPUs. In: IFIP TC10 WG10.5 10th international conference on very large scale integration (VLSI’99), Portugal, 1999. p. 485–96.

[8] Chang D, Marek-Sadowska M. Partitioning sequential circuits on dynamically reconfigurable FPGAs. In: International symposium on field programmable gate arrays (FPGA 98), Monterey, California, 1998. p. 161–7.

[9] Liu H, Wong DF. Network flow-based circuit partitioning for time-multiplexed FPGAs. In: IEEE/ACM international conference on computer-aided design, 1998. p. 497–504.

[10] Hadely JD, Hutching Bl. Design methodologies for partially reconfigurable systems. In: Proceedings of the IEEE workshop FPGAs for custom computing machines, 1995. p. 78–84.

[11] Jeong B. Hardware software partitioning for reconfigurable architectures. MS thesis, School of Electrical Engineering, Seoul National University, 1999.

[12] Bobda C. Synthesis of dataflow graphs for reconfigurable systems using temporal partitioning and temporal placement. Thesis, 2003, Faculty of Computer Science, Electrical Engineering and Mathematics of the University of Paderborn, Germany.

[13] Ahmadinia A, Teich J. Speeding up online placement for XILINX FPGAs by reducing configuration overhead. In: Proceedings of the IFIP international conference on VLSI-SOC, Darmstadt, Germany, 2003. p. 118–22.

[14] Ahmadinia A, Bobda C, Koch D, Majer M, Teich J. Task scheduling for heterogeneous reconfigurable computers. In: Proceedings of the 17th symposium on integrated circuits and systems design (SBCCI), September 7–11. Pernambuco, Brazil: ACM Press; 2004. p. 22–7.

[15] Kaul K, Vermuri R. Integrate Block processing and design space exploration in temporal partitioning for RTR architecture. In: Reconfigurable architecture workshop, RAW’99. Springer Publication, p. 606–15.

[16] Ahmadinia A, Bobda C, Teich J. A new approach for on-line placement on reconfigurable devices. In: Proceedings of the international parallel and distributed processing symposium (IPDPS-2004). Reconfigurable architectures workshop (RAW-2004), April 26–27. Santa Fe, NM, USA: IEEE-CS Press; 2004.

[17] Bazargan K, Kastner R, Sarrafzadeh M. Fast template placement for reconfigurable computing systems. In: IEEE design and test, special issue on reconfigurable computing, 2000. p. 68–83.

[18] Steiger C, Walder H, Platzner M. Heuristics for online scheduling real-time tasks to partially reconfigurable devices. In: Proceedings of the 13th international conference on field programmable logic and application (FPL’03). Springer; 2003. p. 575–84.

[19] Walder H, Steiger C, Platzner M. Fast online task placement on FPGAs: free space partitioning and 2D-hashing. In: Proceedings of the 17th international parallel and distributed processing symposium (IPDPS)/reconfigurable architectures workshop (RAW). IEEE Computer Society; 2003. p. 178.

[20] Celoxica, Platform Developer’s Kit RC200/203 hardware and PSL Reference Manual. <http://www.celoxica.com/>.

[21] <http://www.xilinx.com>.

[22] MPEG-7, Visual part of experimentation Model V 5.0. ISO/IEC JTC1/SC29/WG11, Doc N3321b, March 2000.

[23] Camel Tanougast, Methodologie de partitionnement applicable aux systemes sur puce a base de FPGA, pour l’implantation en reconfiguration dynamique d’algorithmes flot de donnees, These de l’universite de Henri Poincare, Nancy I, Octobre 2001, France.

Abdellatif Mtibaa received his PhD degree in Electrical Engineering at the National School of Engineering of Tunis. Since 1990 he has been an Assistant Professor in Micro-Electronics and Hardware Design with the Electrical Department at the National School of Engineering of Monastir. His research interests include high level synthesis, rapid prototyping and reconfigurable architecture for real-time multimedia applications.

Bouraoui Ouni received his Licence and Master's degrees in Microelectronics from the Faculty of Science of Monastir in 1999 and 2001, respectively. Since 2002 he has been an Assistant Professor in Digital Electronics at the High Institute of Informatics and Telecommunication of Hamman Sousse and the National School of Engineering of Sousse. His research interests include high level synthesis and methodology development for reconfigurable architectures.

Mohamed Abid is currently Professor at Sfax University in Tunisia. He received a Diploma in Electrical Engineering in 1986 from the University of Sfax in Tunisia and his PhD degree in Computer Engineering in 1989 from the University of Toulouse in France. His current research interests include Hardware-Software System on Chip co-design, reconfigurable FPGA, real time systems and embedded systems. Dr. Abid has authored/co-authored over 100 papers in international journals and conferences. He has served on the technical program committees of several international conferences and as a co-organizer of several international conferences.