
2013 Sixth International Conference on Advanced Computational Intelligence October 19-21, 2013, Hangzhou, China

Solving NoC Mapping Problem with Improved Particle Swarm

Algorithm

Zhengxue Li, Yang Liu and Mingsong Cheng

Abstract-The network-on-chip (NoC) mapping problem belongs to the class of quadratic assignment problems, a classical type of combinatorial optimization problem that has been proved NP-complete. In this paper the widely used 2D-mesh NoC topology is studied, and the important factors influencing system performance, namely power consumption and delay, are modeled. An approximate optimal solution is then found using an improved discrete particle swarm optimization algorithm. Simulation experiments show that very good optimization results are obtained.

I. INTRODUCTION

Owing to scientific progress and increasing market demands, integrated circuit designers can integrate more and more functions on a single chip, giving rise to the System on Chip (SoC). Meanwhile, the traditional bus structure faces many problems such as poor extensibility and low efficiency of average communication time, so the NoC concept, with its complete system architecture, was formally proposed in 2000 [1]. This technique draws on design ideas from parallel computing and computer networks, and involves many key techniques including network topology, routing algorithms, switching technology, QoS (Quality of Service), flow control, resource network interfaces, performance evaluation and mapping. In this paper we mainly concern ourselves with the NoC mapping problem.

II. NoC

A. NoC mapping problem

The NoC mapping problem means that, based on a given task graph, design constraints (power consumption and/or delay, etc.) and an IP (Intellectual Property) database, every task is assigned to an appropriate IP kernel and arranged in a suitable execution sequence, and then the location of every IP kernel in the NoC architecture is determined. So far the problem has typically been solved by intelligent algorithms [2].

In order to discuss the problem clearly, we assume that there is a one-to-one correspondence between the tasks and the IP kernels; in other words, every task is allocated to one IP kernel and every kernel accomplishes only one task. In this paper we use a two-dimensional mesh topology (see figure 1).

Zhengxue Li and Mingsong Cheng are with the School of Mathematical Sciences, Dalian University of Technology, No.2 Linggong Road, Ganjingzi District, Dalian, China (email: {Iizx.mscheng}@dlut.edu.cn). Yang Liu is with the Huawei Technologies Co. Ltd, Longgang district, Shenzhen, China (email: [email protected]).

This work was supported by the National Natural Science Foundation of China (No. 11171367) and by the Fundamental Research Funds for the Central Universities (No. DUT12JS04 and No. DUT13LK04).


Fig. 1. IP Kernel Task Graph and NoC Topology: (a) TG(V, E), (b) PG(U, F).

Definition 1. Assume the directed graph $TG(V, E)$ is the communication task graph; then every node $v_i \in V$ denotes an IP kernel, $e_{ij} \in E$ denotes data communication from kernel $v_i$ to kernel $v_j$, and the weight of $e_{ij}$ denotes the communication bandwidth from $v_i$ to $v_j$.

Definition 2. Assume the directed graph $PG(U, F)$ is the NoC architecture graph; then every node $u_i \in U$ denotes a position waiting for a kernel, $f_{ij} \in F$ denotes the communication path from node $u_i$ to node $u_j$, and the weight of $f_{ij}$ denotes the maximum bandwidth from $u_i$ to $u_j$.

Definition 3. Assume the mapping from $TG(V, E)$ to $PG(U, F)$ is one-to-one, that is, $map: TG \to PG$, $map(v_i) = u_j$, $\forall v_i \in V$, $\exists u_j \in U$, and $|V| \le |U|$.


B. Model of Communication Power Consumption

A model of power consumption is given in [3]. It is used to estimate the average energy of 1-bit data transferred from node $u_i$ to a neighboring node $u_j$:

$$E^{bit}_{u_i,u_j} = n_{hops} \times E_{S_{bit}} + (n_{hops} - 1) \times E_{L_{bit}} \qquad (1)$$

where $E_{S_{bit}}$ and $E_{L_{bit}}$ are the energy consumed on the exchanging switch and on the link between tiles, respectively, and $n_{hops}$ is the number of routers the bit passes on its way. Equation (1) is the data communication consumption model of the 2D-mesh NoC architecture. It is easy to see from (1) that the data communication consumption is proportional to the number of exchanging switches. Assuming the minimal routing algorithm is used, the consumption is proportional to the Manhattan distance between the starting point and the end point, and (1) can be revised to

$$E^{bit}_{u_i,u_j} = (d_{u_i,u_j} + 1) \times E_{S_{bit}} + d_{u_i,u_j} \times E_{L_{bit}} \qquad (2)$$

where $d_{u_i,u_j}$ is the Manhattan distance between nodes $u_i$ and $u_j$, which satisfies $n_{hops} = d_{u_i,u_j} + 1$.

If IP kernels $v_i$ and $v_j$ are mapped to the position nodes $map(v_i)$ and $map(v_j)$, respectively, and the bandwidth of the topology structure is sufficient for the demand, then the data communication between IP kernels $v_i$ and $v_j$ is the sum $N_{ij}$ (or $N_{ji}$) of the weights of the directed edges $e_{ij}$ and $e_{ji}$. Together with (2), the power consumption of data communication between IP kernels $v_i$ and $v_j$ is

$$E_{u_i,u_j} = \left[(d_{u_i,u_j} + 1) \times E_{S_{bit}} + d_{u_i,u_j} \times E_{L_{bit}}\right] \times k \times N_{ij}$$

for a constant $k$.

For convenience of study, the above equation can be revised to

$$E_{u_i,u_j} = d_{u_i,u_j} \times (E_{S_{bit}} + E_{L_{bit}}) \times N_{ij} = d_{u_i,u_j} \times E_{bit} \times N_{ij} = d(map(v_i), map(v_j)) \times e(v_i, v_j) \qquad (3)$$

where $E_{bit} = E_{S_{bit}} + E_{L_{bit}}$ is the energy for one bit of data transferred between two adjacent switches, $d(map(v_i), map(v_j))$ is the Manhattan distance between $map(v_i)$ and $map(v_j)$, and $e(v_i, v_j)$ is the power consumption of data communication per unit distance between IP kernels $v_i$ and $v_j$. If there is no data communication between $v_i$ and $v_j$, then $e(v_i, v_j) = 0$.

From the above analysis we can deduce that the total power consumption of data communication is

$$E(C) = \sum_{\forall v_i, v_j \in V} d(map(v_i), map(v_j)) \times e(v_i, v_j) \qquad (4)$$

The objective of the optimization model of data communication power consumption is to find a mapping from $TG(V, E)$ to $PG(U, F)$ that minimizes $E(C)$.
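To make the objective concrete, here is a minimal Python sketch of equation (4): it computes $E(C)$ for a given mapping on a 2D mesh using the Manhattan distance of equation (3). The 4x4 mesh size, the toy task graph and the function names are illustrative assumptions, not data from the paper.

```python
def manhattan(p, q, cols=4):
    """Manhattan distance between mesh positions p and q (0-based row-major indices)."""
    return abs(p // cols - q // cols) + abs(p % cols - q % cols)

def total_energy(task_graph, mapping, e_bit=1.0, cols=4):
    """E(C) = sum over kernel pairs of d(map(vi), map(vj)) * e(vi, vj),
    with e(vi, vj) = E_bit * N_ij (power consumption per unit distance)."""
    total = 0.0
    for (vi, vj), n_ij in task_graph.items():
        total += manhattan(mapping[vi], mapping[vj], cols) * e_bit * n_ij
    return total

# Toy task graph: (vi, vj) -> communication volume N_ij, and a mapping kernel -> mesh position.
task_graph = {(0, 1): 30, (1, 2): 20, (0, 3): 10}
mapping = {0: 0, 1: 1, 2: 5, 3: 4}
print(total_energy(task_graph, mapping))  # smaller values mean less communication energy
```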


C. Model of Communication Delay

NoC data communication is dynamic. The communication delay is small when a link is free of congestion and large when it is congested, so it is very hard to predict the communication delay precisely. In this paper we pursue the objective of optimizing delay by balancing the link load, following [4] but with a different definition. In [4], the variance of the link load is defined as

$$VAR(L) = \sum_{i=1}^{M} \left(Load(l_i) - Load(l)_{avg}\right)^2 / M \qquad (5)$$

where $l_i$ is the $i$th link, $M$ is the total number of links, $Load(l_i)$ is the load of $l_i$, and $Load(l)_{avg}$ is the average link load. The variance represents the degree of dispersion of a distribution: the bigger the variance, the more non-uniform the distribution. So the optimization objective of communication delay is transformed into minimizing the variance of the link load.

The X-Y deterministic routing algorithm is used in this paper. A sketch of the NoC architecture is given in figure 2, where the number on each node is its position number. We explain the terms of equation (5) with reference to figure 2.

Fig. 2. Link Load Diagram: (a)-(c) three kernel-to-kernel communications, (d) the overall communication task.

In figure 2, (a), (b) and (c) are three kernel-to-kernel communications. For example, the route in 2(a) represents the communication route from IP kernel 2 to IP kernel 11. In this route, 2 -> 3, 3 -> 7 and 7 -> 11 are three links, and the number on each link is its weight, which represents the number of communications passing through it. In the X-Y routing algorithm these numbers are all 1. If the overall communication task is the sum of


these three communications, then 2(d) represents the overall communication task. The number on a link $l_i$ in the figure denotes the number of communications passing over it after all the communications have ended, that is, the load $Load(l_i)$ of this link. The total load is defined as the sum of all the weights; in figure 2 it is 9 and the number of links is 6, so $Load(l)_{avg} = 9/6 = 1.5$.
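To illustrate how the link loads in figure 2 and the variance of equation (5) can be obtained, the following Python sketch accumulates loads under X-Y routing on a 2D mesh. The mesh size and the three example communications are illustrative placeholders, not an exact reproduction of figure 2.

```python
from collections import defaultdict

def xy_route(src, dst, cols=4):
    """Links traversed by X-Y routing: move along X (columns) first, then along Y (rows)."""
    links, cur = [], src
    while cur % cols != dst % cols:                      # X direction
        nxt = cur + (1 if dst % cols > cur % cols else -1)
        links.append((cur, nxt)); cur = nxt
    while cur // cols != dst // cols:                    # Y direction
        nxt = cur + (cols if dst // cols > cur // cols else -cols)
        links.append((cur, nxt)); cur = nxt
    return links

def link_load_variance(communications, cols=4):
    """Accumulate the load of every used link and return VAR(L) as in equation (5)."""
    load = defaultdict(int)
    for src, dst in communications:
        for link in xy_route(src, dst, cols):
            load[link] += 1
    loads = list(load.values())
    avg = sum(loads) / len(loads)                        # Load(l)_avg
    return sum((l - avg) ** 2 for l in loads) / len(loads)

# Three illustrative kernel-to-kernel communications (0-based mesh positions).
print(link_load_variance([(2, 11), (2, 7), (6, 11)]))
```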

III. PARTICLE SWARM ALGORITHM

A. Basic Particle Swarm Algorithm

The particle swarm optimization (PSO) algorithm is a stochastic optimization algorithm based on swarm intelligence. It imitates the foraging behavior of birds: the search space is likened to the birds' flying space, and each bird is regarded as a particle without mass or volume which represents a candidate solution. Searching for the optimal solution of the optimization problem is analogous to searching for the location of food. The PSO algorithm prescribes a simple behavior rule for each particle, so the motion of the whole particle swarm exhibits characteristics analogous to the foraging behavior of birds. It can be used to solve complex optimization problems [5-8].

Based on this search mechanism, PSO first initializes the particle swarm randomly in the feasible solution space and velocity space, which determines the initial position and velocity of the particles. For example, the position and velocity of the $i$th particle in a $d$-dimensional search space can be denoted by $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$, respectively. By evaluating the objective function of every particle, the best position $x_{ibest} = (x_{ibest,1}, x_{ibest,2}, \ldots, x_{ibest,d})$ of the $i$th particle and the best position $x_{gbest} = (x_{gbest,1}, x_{gbest,2}, \ldots, x_{gbest,d})$ of the whole swarm at time $t$ are determined, and the position and velocity of a particle are then revised using the following equations:

$$v_{i,j}(t+1) = w\,v_{i,j}(t) + c_1 r_1 \left[x_{ibest,j} - x_{i,j}(t)\right] + c_2 r_2 \left[x_{gbest,j} - x_{i,j}(t)\right] \qquad (6)$$

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1), \quad j = 1, 2, \ldots, d \qquad (7)$$

where $w$ is the inertia factor, $c_1$ and $c_2$ are positive acceleration constants, and $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$. In addition, the particle movement can be restricted appropriately by setting a velocity interval $[v_{min}, v_{max}]$ and a position interval $[x_{min}, x_{max}]$.
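For reference, a minimal Python sketch of the continuous update rules (6) and (7) for one particle; the parameter values and bounds are illustrative.

```python
import random

def pso_step(x, v, x_ibest, x_gbest, w=0.7, c1=1.5, c2=1.5,
             v_bounds=(-1.0, 1.0), x_bounds=(-10.0, 10.0)):
    """One velocity/position update per equations (6) and (7) for a single particle."""
    new_v, new_x = [], []
    for j in range(len(x)):
        r1, r2 = random.random(), random.random()
        vj = w * v[j] + c1 * r1 * (x_ibest[j] - x[j]) + c2 * r2 * (x_gbest[j] - x[j])
        vj = max(v_bounds[0], min(v_bounds[1], vj))          # clamp velocity
        xj = max(x_bounds[0], min(x_bounds[1], x[j] + vj))   # clamp position
        new_v.append(vj); new_x.append(xj)
    return new_x, new_v

x, v = pso_step([0.0, 2.0], [0.1, -0.3], [1.0, 1.0], [3.0, -1.0])
```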

There are global and local versions of the basic particle swarm algorithm. In the global version, the two extreme values being tracked are the best position of the particle itself and the best global position of the swarm. In the local version, besides its own best position, each particle tracks the best position among its neighbors instead of the global best position.

B. Improved Particle Swarm Algorithm

Research has so far focused mainly on the continuous PSO algorithm, while studies of discrete particle swarm optimization (DPSO) are relatively scarce. In [5] James Kennedy and Russell Eberhart proposed a binary DPSO in which the position is represented by 1s and 0s while the velocity remains continuous. In [6] PSO is used to solve the vehicle routing problem with time windows, in which the position and the velocity are integers; the computational rules, however, are continuous, and the results are rounded, taking the boundary value when they exceed the allowed range. Unfortunately, these methods do not take the features of discrete combinatorial optimization problems into consideration, so they inevitably suffer from deficiencies such as redundant representation or low search efficiency.

The key to DPSO is the representation of position and velocity and the corresponding computational rules. In the following we briefly introduce the algorithm proposed in [7].

a) Position, Velocity of Particle and Computational Rules

1) Position of Particle

Denote the position of a particle by $x = (x_1, x_2, \ldots, x_d)$, where $x_1, x_2, \ldots, x_d$ is a permutation of the natural numbers $1, 2, \ldots, d$ and $d$ is the number of IP kernels. The component $x_i$ indicates that the $x_i$th IP kernel is located on the $i$th position.

2) Velocity of Particle

Denote the velocity of a particle by $v = (v_1, v_2, \ldots, v_d)$, $0 \le v_i \le d$, $i = 1, 2, \ldots, d$. From (7) we know that the velocity changes the position of the particle. Unlike $x$, every component $v_i$ has two meanings: $v_i = 0$ denotes a null operation, that is, applying the velocity to a position $x$ does not affect the corresponding component $x_i$; if $v_i$ is not zero, it changes the corresponding component $x_i$ to $v_i$. To preserve the feasibility of the solution, so that $x$ remains a permutation of the $d$ positions after any operation $v$, this is actually a swap which exchanges the component $x_i$ of $x$ with the value $v_i$.

3) Addition of position and velocity

The addition of position and velocity moves a particle to a new position, denoted by $x = x + v$. If we denote swapping the component $x_i$ of $x$ with the value $v_i$ by $swap(x_i, v_i)$, then every component of the new position is given by

$$x_i = \begin{cases} \Phi, & \text{if } v_i = 0 \\ swap(x_i, v_i), & \text{otherwise} \end{cases} \qquad (8)$$

where $\Phi$ denotes the null operation.

4) Subtraction of positions

The result of subtracting two positions $x_2$ and $x_1$ is a velocity, denoted by $v = x_2 - x_1$. It is computed as follows: compare every component $x_{1,i}$ of $x_1$ with the component $x_{2,i}$ of $x_2$; if they are equal, set the corresponding velocity $v_i = 0$; otherwise substitute $x_{1,i}$ with $x_{2,i}$ in $x_1$ and set $v_i = x_{2,i}$. For example, if $x_1 = (2, 5, 3, 1, 4)$ and $x_2 = (2, 5, 4, 3, 1)$, then $v = (0, 0, 4, 3, 1)$. From the definition we know that the subtraction and the addition of position and velocity are not inverse operations in general.
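The two operators just described can be sketched in Python as follows (the function names are mine; the example reuses the values above).

```python
def subtract_positions(x2, x1):
    """v = x2 - x1: a component is 0 where the positions agree, otherwise the component of x2."""
    return [0 if a == b else a for a, b in zip(x2, x1)]

def add_position_velocity(x, v):
    """x + v per equation (8): v_i == 0 is a null operation, otherwise swap so that
    position i receives the value v_i (this keeps x a permutation)."""
    x = list(x)
    for i, vi in enumerate(v):
        if vi != 0 and x[i] != vi:
            j = x.index(vi)        # current location of value v_i
            x[i], x[j] = x[j], x[i]
    return x

x1, x2 = [2, 5, 3, 1, 4], [2, 5, 4, 3, 1]
v = subtract_positions(x2, x1)       # -> [0, 0, 4, 3, 1], as in the example above
print(add_position_velocity(x1, v))  # reapplying v to x1 happens to recover x2 here
```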

5) Scalar-multiplication of velocity

Scalar-multiplication of a velocity is denoted by $v_2 = c \cdot v_1$, where $c \in [0, 1]$ is a constant. In the actual computation, for every component $v_{1,i}$ of $v_1$, generate a random number $rand \in [0, 1]$. If $rand < c$, then set the component $v_{2,i}$ of $v_2$ equal to $v_{1,i}$; otherwise $v_{2,i} = 0$. That is, $v_{2,i}$ is computed by

$$v_{2,i} = \begin{cases} v_{1,i}, & \text{if } rand < c \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

6) Addition of velocities

The sum of two velocities is a new velocity $v = v_1 + v_2$, where every component $v_i$ of $v$ is defined by

$$v_i = \begin{cases} v_{1,i}, & \text{if } (v_{1,i} \neq 0 \wedge v_{2,i} = 0) \vee (v_{1,i} \neq 0 \wedge v_{2,i} \neq 0 \wedge rand < 0.5) \\ v_{2,i}, & \text{otherwise} \end{cases} \qquad (10)$$

Equation (10) shows that for any $i$, if only one of $v_{1,i}$ and $v_{2,i}$ is zero, then $v_i$ takes the nonzero value; if both are nonzero, then one of them is chosen randomly with equal probability. The random number introduced in the addition of velocities helps to maintain the diversity of the particle swarm. Note that the addition of velocities is not commutative.
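A Python sketch of the scalar-multiplication rule (9) and the velocity-addition rule (10); again, the function names are mine.

```python
import random

def scale_velocity(c, v1):
    """v2 = c * v1 per equation (9): keep each component with probability c, else 0."""
    return [v1i if random.random() < c else 0 for v1i in v1]

def add_velocities(v1, v2):
    """v = v1 + v2 per equation (10): take the nonzero component when only one is
    nonzero; choose either with equal probability when both are nonzero."""
    out = []
    for a, b in zip(v1, v2):
        if a != 0 and (b == 0 or random.random() < 0.5):
            out.append(a)
        else:
            out.append(b)
    return out

print(scale_velocity(0.5, [0, 3, 1, 0, 2]))
print(add_velocities([0, 3, 1, 0], [2, 0, 4, 0]))
```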

7) Motion Equation of Particle

The motion equation of the particle must be modified to fit the above definitions. First, the inertia term, that is, the first term in (6), is eliminated, since by the definition of the velocity a particle moves to its new position in a single step. Moreover, the coefficients of the second and the third terms in (6) are both reduced to the random numbers generated in the definition of scalar-multiplication of velocity. Meanwhile, to integrate the local and global versions of PSO and take the information of other particles into account, the third term of (6) is modified: if a random number $rand > t/itmax$ (where $t$ is the current iteration and $itmax$ the maximum number of iterations), then a particle $x_j$ other than the best particle is chosen at random, $x_{gbest}$ is replaced by the best position $x_{jbest}$ of $x_j$, and the velocity of the current particle is updated; otherwise it is updated with $x_{gbest}$ [8]. Specifically, the motion equations of the particle are

$$v_i(t+1) = c_1 \cdot \left[x_{ibest} - x_i(t)\right] + c_2 \cdot \begin{cases} \left[x_{gbest} - x_i(t)\right], & \text{if } rand < t/itmax \\ \left[x_{jbest} - x_i(t)\right], & \text{otherwise} \end{cases} \qquad (11)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (12)$$

where the subtraction, scalar-multiplication and addition are the operations defined above.

The role of the inertia term is to produce disturbance and maintain the diversity of the particle swarm; without it the algorithm matures prematurely. To compensate, we later define a repulsion operator to maintain the diversity of the particle swarm and keep the algorithm evolving in its later stages.
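Putting the operators together, the particle update of equations (11) and (12) could look like the following sketch. It assumes the helper functions subtract_positions, add_position_velocity, scale_velocity and add_velocities from the earlier sketches are in scope; c1 and c2 are illustrative values.

```python
import random

def move_particle(x, x_ibest, x_gbest, other_ibests, t, itmax, c1=0.8, c2=0.8):
    """One DPSO move per equations (11) and (12).

    x             current position (a permutation)
    x_ibest       this particle's historical best position
    x_gbest       global best position of the swarm
    other_ibests  personal bests of the other particles (used when rand >= t/itmax)
    """
    # Cognitive part: c1 * (x_ibest - x), using the discrete operators defined above.
    v = scale_velocity(c1, subtract_positions(x_ibest, x))
    # Social part: early on (small t/itmax) a random particle's best is used more often,
    # and the global best is used more often as the iteration count grows.
    target = x_gbest if random.random() < t / itmax else random.choice(other_ibests)
    v = add_velocities(v, scale_velocity(c2, subtract_positions(target, x)))
    # Equation (12): apply the combined velocity to the position.
    return add_position_velocity(x, v)
```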

b) Optimization Operator

The PSO algorithm is a random search process. During the search, PSO not only tries to cover the solution space globally, but also exploits the known solutions to approach the current local optimum. Although PSO is a general-purpose algorithm, it neglects the auxiliary role of problem-specific information, which makes it weak at local refinement. Another widely discussed problem of PSO is premature convergence. Studies show that premature convergence is closely related to the homogenization of the particle swarm, that is, the rapid decrease of swarm diversity. To balance global search and local refinement in the modified algorithm, we define a repulsion operator to maintain the diversity of the particle swarm and a local searching operator to improve local refinement.

1) Basic Concepts

• Similarity $s_{i,j}$ of particle positions: the degree of similarity of any two positions $x_i$ and $x_j$, defined as

$$s_{i,j} = \frac{1}{d} \sum_{k=1}^{d} sim(x_{i,k}, x_{j,k}) \qquad (13)$$

where $d$ is the dimension of the position and $sim(a, b) = 1$ if its two arguments are equal and $0$ otherwise.

• Diversity $d_i$ of a particle: the dissimilarity between $x_i$, its historical best position and the global best position, defined as

$$d_i = 1 - \frac{1}{3}\left(s_{i,ibest} + s_{i,gbest} + s_{ibest,gbest}\right) \qquad (14)$$

• Average diversity $\bar{d}$ of the particle swarm: the average value of all particle diversities, defined as

$$\bar{d} = \frac{1}{size} \sum_{i=1}^{size} d_i \qquad (15)$$

2) Repulsion Operator

The diversity measure shows that the particles rapidly tend toward a local optimum as the iteration proceeds. Therefore, when the diversity of an individual particle decreases to a certain degree, it must be reinforced by some operator in order to maintain the evolutionary ability. For this reason we define the repulsion operator: for every component of the position, if it equals the historical best value, then with a certain probability a new velocity $v$ is generated randomly and applied to the position as in (12).
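A Python sketch of the diversity measures (13)-(15) and one possible reading of the repulsion operator; the repulsion probability p is an illustrative parameter, not a value given in the paper.

```python
import random

def similarity(xi, xj):
    """s_{i,j} of equation (13): fraction of components on which two positions agree."""
    return sum(a == b for a, b in zip(xi, xj)) / len(xi)

def diversity(x, x_ibest, x_gbest):
    """d_i of equation (14)."""
    return 1 - (similarity(x, x_ibest) + similarity(x, x_gbest)
                + similarity(x_ibest, x_gbest)) / 3

def average_diversity(positions, ibests, x_gbest):
    """d-bar of equation (15), averaged over the whole swarm."""
    return sum(diversity(x, xb, x_gbest) for x, xb in zip(positions, ibests)) / len(positions)

def repel(x, x_ibest, p=0.3):
    """Repulsion operator: where a component equals its historical best value, with
    probability p generate a random velocity component and apply it as a swap (eq. (12))."""
    x, d = list(x), len(x)
    for i in range(d):
        if x[i] == x_ibest[i] and random.random() < p:
            vi = random.randint(1, d)          # random nonzero velocity component
            if x[i] != vi:
                j = x.index(vi)
                x[i], x[j] = x[j], x[i]
    return x
```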

3) Local searching operator

Population-based intelligent optimization generally has good global search ability but poor local refinement, so we define a local searching operator. A neighborhood $N(x)$ of a position $x$ is defined as

$$N(x) = \{x' \mid \exists\, r, s,\ r \neq s,\ \text{s.t. } x'_r = x_s,\ x'_s = x_r,\ \text{and } \forall l \neq r,\ l \neq s,\ x'_l = x_l\}$$

that is, the set of positions obtained from $x$ by swapping two of its components. The local searching operator searches $N(x)$ for a solution better than $x$.
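A Python sketch of the local searching operator over the swap neighborhood $N(x)$; the fitness function is passed in (for instance, $E(C)$ of equation (4)), and the toy fitness below is only for illustration.

```python
from itertools import combinations

def local_search(x, fitness):
    """Search N(x) (all single swaps of two components) for a position with
    better (smaller) fitness; return the best improving neighbor, or x itself."""
    best_x, best_f = list(x), fitness(x)
    for r, s in combinations(range(len(x)), 2):
        y = list(x)
        y[r], y[s] = y[s], y[r]
        f = fitness(y)
        if f < best_f:
            best_x, best_f = y, f
    return best_x

# Illustrative use with a toy fitness (distance from the identity permutation).
toy_fitness = lambda p: sum(abs(v - (i + 1)) for i, v in enumerate(p))
print(local_search([3, 1, 2, 4], toy_fitness))
```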

IV. ALGORITHM

1) Set the parameters and initialize the particle swarm.
2) In the $i$th iteration, if $i > itmax$, go to step 6); otherwise continue with step 3).
3) Compute (11) and (12) according to the operation rules and generate a new position for every particle.
4) Update the new position of every particle with the repulsion operator and the local searching operator, and compute the fitness of the updated position.
5) If the updated fitness at this iteration is better than the historical best fitness, revise the historical best fitness and record the corresponding position; then return to step 2).
6) End the iteration process, output the historical best fitness and the corresponding position, and plot the experimental results.
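For orientation, a Python skeleton of steps 1)-6). It assumes the move_particle, repel and local_search helpers sketched earlier are in scope and that the fitness is to be minimized; the swarm size and iteration limit are illustrative.

```python
import random

def dpso(fitness, d, swarm_size=30, itmax=30):
    """Skeleton of steps 1)-6): initialize, iterate moves, apply the repulsion and
    local searching operators, and track personal and global bests."""
    # 1) initialize positions as random permutations of 1..d
    positions = [random.sample(range(1, d + 1), d) for _ in range(swarm_size)]
    ibests = [list(p) for p in positions]
    ibest_f = [fitness(p) for p in positions]
    g = min(range(swarm_size), key=lambda i: ibest_f[i])
    gbest, gbest_f = list(ibests[g]), ibest_f[g]

    for t in range(1, itmax + 1):                        # 2) iteration limit
        for i in range(swarm_size):
            others = [ibests[j] for j in range(swarm_size) if j != i]
            x = move_particle(positions[i], ibests[i], gbest, others, t, itmax)  # 3)
            x = local_search(repel(x, ibests[i]), fitness)                       # 4)
            f = fitness(x)
            positions[i] = x
            if f < ibest_f[i]:                           # 5) update historical bests
                ibests[i], ibest_f[i] = list(x), f
                if f < gbest_f:
                    gbest, gbest_f = list(x), f
    return gbest, gbest_f                                # 6) output the best found
```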

V. SIMULATION EXPERIMENTS

The IP kernel task graph is shown in figure 1(a). The experimental results for communication power consumption and communication delay are shown in figures 3(a) and 3(b), respectively.

Fig. 3. Experimental Results: (a) Communication Power Consumption, (b) Communication Delay.

The numerical results for communication power consumption and communication delay are shown in Tables I and II, respectively. In the tables the best fitness is the fitness of the optimal solution found by the corresponding algorithm, and the average fitness is the average of 20 experimental runs. We have not found published communication delay results for the task graph of figure 1(a), so for delay we compare only with random mapping. The best mapping found by the algorithm of this paper is shown in Table III, and the corresponding fitness is 4028.


TABLE I
COMPARISON OF COMMUNICATION POWER CONSUMPTION

fitness            Ref [9]   Ref [10]   Random Mapping   New
Average fitness    4089.6    4347       9848.6           4051.2
Best fitness       4060      4060       6843             4028

TABLE II
COMPARISON OF COMMUNICATION DELAY

fitness            Random mapping   New
Average fitness    1.5716           0.1439
Best fitness       0.5806           0.1275

TABLE III
THE OPTIMAL MAPPING SOLUTION

10  8  6  4  9  7  2  5
11 12  3  1 13 14 16 15

VI. CONCLUSION

In this paper we have studied solving the NoC mapping problem with an improved DPSO, taking communication power consumption and communication delay as the optimization objectives of the model, respectively. From the simulation results we draw the following conclusions:

(1) Taking communication power consumption as the optimization objective, the algorithm in this paper obtains better results than those in [9] and [10].

(2) Taking communication delay as the optimization objective, the algorithm in this paper obtains significantly better results than random mapping.

REFERENCES

[1] M. Gao and G. Du, "NoC: next generation mainstream architecture for integrated circuits," Microelectronics, vol. 36, no. 4, pp. 461-466, 2006.

[2] R. Sun and Z. Lin, "NoC process elements mapping using genetic algorithm," Computer Science, vol. 35, no. 4, pp. 51-53, 2008.

[3] J. Hu and R. Marculescu, "Energy-aware communication and task scheduling for network-on-chip architectures under real-time constraints," Design, Automation and Test in Europe Conference and Exhibition, vol. 1, pp. 234-239, 2004.

[4] S. Yang, L. Li, M. Gao and Y. Zhang, "An energy- and delay-aware mapping method of NoC," Acta Electronica Sinica, vol. 36, no. 5, pp. 937-942, 2008.

[5] J. Kennedy and R. Eberhart, "Discrete binary version of the particle swarm algorithm," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 5, pp. 4104-4108, 1997.

[6] N. Li, T. Zhou and D. Sun, "Particle swarm optimization for vehicle routing problem with time windows," Systems Engineering-Theory and Practice, vol. 24, no. 4, pp. 130-135, 2004.

[7] Y. Zhong and R. Cai, "Discrete particle swarm optimization algorithm for QAP," Acta Automatica Sinica, vol. 33, no. 8, pp. 871-874, 2007.

[8] C. Wang, J. Zhang, Y. Wang and J. Heng, "Modified particle swarm optimization algorithm for traveling salesman problem," Journal of North China Electric Power University, vol. 32, no. 6, pp. 47-51, 2005.

[9] H. Lin, L. Zhang, D. Tong, X. Li and X. Cheng, "A fast hierarchical multi-objective mapping approach for mesh-based networks-on-chip," Acta Scientiarum Naturalium Universitatis Pekinensis, vol. 44, no. 5, pp. 711-720, 2008.

[10] J. Qi, H. Zhao, J. Wang and Z. Li, "A new hierarchical genetic algorithm for low-power network on chip design," International Conference on Intelligent Control and Information Processing, vol. 2, pp. 159-162, 2010.