
Computers and Electrical Engineering 38 (2012) 963–974

Contents lists available at SciVerse ScienceDirect

Computers and Electrical Engineering

journal homepage: www.elsevier.com/locate/compeleceng

HACS: A novel cost aware paradigm promising fault tolerance on mesh-based network on chip architecture ☆

Melika Tinati a,*, Ahmad Khademzadeh b, Ali Afzali-Kusha c, Majid Janidarmian a

a CE Department, Science and Research Branch, Islamic Azad University, P.O. Box 14515-775, Tehran, Iran
b Iran Telecommunication Research Center, P.O. Box 14155-3961, Tehran, Iran
c Department of Electrical and Computer Engineering, Faculty of Engineering, University of Tehran, P.O. Box 14395/515, Tehran, Iran

Article info

Article history:
Received 27 January 2011
Received in revised form 30 January 2012
Accepted 1 February 2012
Available online 12 March 2012

Abstract

As transistor integration in today's embedded systems scales up and chip sizes shrink, on-chip communication has become a challenging issue in VLSI design. Networks-on-chip (NoCs) have emerged as a promising technology to tackle on-chip communication constraints. At the same time, reliability has become a salient problem: since failed on-chip elements cannot be physically accessed for repair, some level of embedded fault tolerance is required. In this paper, a new technique is presented that provides fault tolerance in on-chip networks against single and multiple permanent switch failures. The experimental results obtained by system simulation in the SystemC TLM environment are validated against the mathematical reliability analysis that we extend in this paper, and both demonstrate the substantial reliability improvement of this paradigm. In addition to the system improvement, the silicon area overhead is calculated using low-level VHDL simulation and Orion synthesis.

© 2012 Elsevier Ltd. All rights reserved.

☆ Reviews processed and approved for publication by Editor-in-Chief Dr. Manu Malek.
* Corresponding author. Tel.: +98 9123752148.
E-mail addresses: [email protected] (M. Tinati), [email protected] (A. Khademzadeh), [email protected] (A. Afzali-Kusha), [email protected] (M. Janidarmian).

0045-7906/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compeleceng.2012.02.004

1. Introduction

Despite rapid developments in silicon technology, bus-based on-chip interconnects have become a bottleneck, as they are unable to cope with the growing number of cores on a chip [1]. Since semiconductor technology continues its phenomenal growth and follows Moore's Law, the amount of computation power and storage that can be integrated on a chip keeps increasing. However, as the computation logic grows, the performance of on-chip interconnections does not scale as well. These issues call for a well-structured design approach, a modular design methodology, a clear programming model and predictable system behavior [2]. The increase in design complexity, on the other hand, has made the reuse of embedded cores imperative in order to narrow the productivity gap.

Reduced feature sizes in the nanoscale regime, along with increasing transistor densities, have made the on-chip interconnect a deciding factor in meeting the performance and power consumption budgets of a design [3]. Networks-on-chip (NoCs) have emerged as a state-of-the-art solution to overcome the inherently stringent resource constraints of system-on-chip (SoC) technology and to handle the growing number of communicating components on a single chip. NoC is a promising paradigm owing to advantages such as greater reusability, scalability and predictability of electrical parameters, as well as its suitability for applications that require guaranteed bandwidth. NoC has also moved traditional computation and sequential algorithms to a new paradigm in which concurrency, communication and interaction pervade every aspect of design; it has shifted the focus from computation to communication [4]. Packet-based interconnection networks, known as NoC architectures, are thus increasingly adopted in today's SoC designs, which integrate numerous homogeneous and heterogeneous functional modules. An NoC integrates heterogeneous resources (for example, CPUs, ASICs and DSPs) in a homogeneous on-chip network environment [5]. Embedded cores, as the heterogeneous resources, are predesigned and preverified intellectual property (IP) blocks that may be soft, firm or hard, and are typically heterogeneous with respect to functionality, size and communication bandwidth [6]. A switch is typically embedded in each regular tile of the array-form NoC architecture; thus, instead of routing design-specific global on-chip wires, inter-tile communication is achieved by routing packets [7].

A direct adaptation of network protocols to NoCs is impossible because of different communication requirements, cost considerations and architectural constraints [8]. Two main types of faults can upset NoC systems: permanent faults and transient faults. Permanent faults, as the name suggests, cause permanent damage to the circuit; they are physical changes in the circuit whose behavior does not change with time. Electromigration of a conductor, broken wires and dielectric breakdown are a few examples of permanent failures on a chip. At one level, permanent faults are modeled as stuck-at faults, or with the fail-stop model, in which a complete module malfunctions and informs its neighbors that it is out of order. Transient failures may also occur on a chip for many reasons: alpha particles emitted by trace uranium and thorium impurities in packages, and high-energy neutrons from cosmic radiation, can cause soft errors in semiconductor devices. Similarly, low-energy cosmic neutrons interacting with the isotope boron-10 can cause soft errors. These events, generally called single-event upsets, can affect the storage elements of a chip such as latches, memories and registers [9–12].

Aggressive technology scaling has accentuated the reliability issue because of the rapidly growing prominence of permanent faults, which are mostly caused by accelerated aging effects such as electromigration, and by manufacturing and testing challenges. Furthermore, soft upsets caused by crosstalk, coupling noise and transient faults are also a concern for system reliability. The growing concern about reliability has prompted extensive research in this area; nevertheless, a comprehensive approach encompassing all issues pertaining to NoC reliability has yet to evolve [4]. Although existing standard diagnosis and fault tolerance tests may be applied to NoCs, they do not exploit particular network properties, such as packets being forwarded over the network, or links and routers failing. Furthermore, relaxing the requirement of 100% correctness in the operation of the various components and on-chip channels profoundly reduces the manufacturing cost as well as the cost incurred by test and verification [10]. This argument strengthens the notion that chips need to be designed with some level of built-in fault tolerance [9].

NoCs are designed around three key considerations, each evaluated from different aspects. Topology, mapping algorithm and routing strategy are the basic scopes within which an NoC system is designed to minimize system requirements, handle the critical parameters and achieve an optimal design. The aim of NoC synthesis is to find a suitable network infrastructure at minimum cost. The topology represents the network interconnection and indicates how the network nodes communicate over the physical layout. The mapping algorithm determines how the IP cores are mapped onto the network tiles; in other words, it maps the core graph produced by the system task manager onto the topology graph determined by the topology algorithm for the physical interconnection. The routing strategy, on the other hand, determines the mechanism by which communication packets traverse the network from sources to destinations. These three fundamental aspects of NoC design raise reliability issues at different granularities, and various fault tolerant mechanisms have been proposed across this wide range of NoC design considerations. Every such method adds overhead to the system in exchange for improved reliability, so system designers have to strike a balanced trade-off between the overhead imposed by different fault tolerant techniques and the overall cost of system implementation. Depending on what the chosen fault tolerant mechanism targets, this overhead takes the form of hardware, software or time redundancy.

In this paper, we present a new fault tolerant mesh-based NoC architecture that improves system reliability, evaluated with a new analytical technique. This hierarchical architecture tolerates multiple permanent switch failures, whatever their cause. The hardware overhead is also calculated analytically, and the results are compared with the previous fault tolerant mechanisms presented in the literature [13]. The results of the system simulation at the transaction level in SystemC TLM 2.0 have been validated against analytical results obtained with a general Mathematica code. The results indicate a substantial improvement over previous methods in system response time and reliability. The proposed architecture builds on the spare switch idea to present a new technique.

The paper is organized as follows. Section 2 reviews related work on fault tolerant NoC systems. Section 3 discusses the preliminaries, followed by the problem definition. The proposed fault tolerant architecture is then explained and the simulation results are presented in Section 4. Finally, Section 5 concludes the paper and reviews its overall contribution.

2. Related work

Power consumption is one of the most challenging issues in NoC architecture, given the constraints of VLSI and ULSI technologies. The fault tolerant mechanisms used in computer networks are therefore not suitable for NoC systems and VLSI circuits, since in computer networks the extensive hardware redundancy these techniques require is not a critical challenge for power resources. Hence, both the research community and industrial manufacturers study new fault tolerant strategies suited to on-chip interconnections. Meanwhile, the wide thematic scope of NoC design offers academic researchers many aspects from which to evaluate fault tolerance in these on-chip networks, among them on-chip routing algorithms, NoC hardware (which consists of various subsections), and encryption and coding of the data transmitted over the network. Each fault tolerant technique imposes redundancy on the system, and the system designer must strike a trade-off between system reliability and implementation cost. In other words, along with improving reliability, a fault tolerant mechanism sometimes induces so much overhead and redundancy that it is not efficient to implement in the system.

Since SoCs consist of similar, repeated components, it is common to use a simple deactivation mechanism when such a component fails. However, this simple deactivation technique is not appropriate for overcoming router failures in NoCs. To save silicon area and to minimize network latency, most NoCs use a dedicated routing algorithm that exploits the regularity of the micro-network. When a router fails in such a regular network, deactivation alone does not solve the problem, since the network is no longer regular; if the routing algorithm follows its ordinary rules, the network will be blocked. Thus, the routing algorithm must take the fault into account and adapt to the failure.
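To illustrate why deactivation alone is not enough, the Python sketch below uses XY (dimension-order) routing, a common deterministic routing scheme for regular meshes. The paper does not name a specific routing algorithm at this point, so XY routing and the (x, y) coordinate convention are assumptions of the sketch. Every packet between a given pair of routers follows the same fixed path, so a single dead router on that path blocks the flow unless the routing adapts.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed convention)

def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-order (XY) route: move along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path

def blocked_by(path: List[Node], faulty: Set[Node]) -> bool:
    """A fixed route fails if any router on it (endpoints included) is faulty."""
    return any(node in faulty for node in path)

route = xy_route((0, 0), (3, 2))
print(route)
print(blocked_by(route, {(2, 0)}))  # router (2, 0) lies on the route
print(blocked_by(route, {(1, 1)}))  # router (1, 1) does not
```

Because the route is a pure function of source and destination, the only ways to survive the fault are to change the routing rules or to bypass the failed router, which is the design space the following paragraphs survey.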

In [15], the authors use the virtual channel concept to deal with faulty routers in the network: a physical channel is shared among multiple transmissions in a time-division multiplexing fashion. In this algorithm, packets are routed around a faulty router through designated virtual channels, while they can use any of the three virtual channels when they are not near failed routers. Since the hardware cost of routers in NoCs must be as low as possible, algorithms that use virtual channels face cost restrictions in NoCs because of their high complexity and area occupation. In [16], on the other hand, the authors use three adaptive routing algorithms to route around a faulty region or router without additional virtual channels. In this algorithm, deadlock cycles are broken with turn-based rules derived from the channel dependency graph. Meanwhile, [17] presents a dynamic routing algorithm with lower complexity and source dependency, in which the predesigned routing tables used in common dynamic routing algorithms are hard-coded. When a component fails, it informs its neighbors, which spread the news with a special control packet, and finally all routers in the network update their routing tables. The routers then transmit data packets along the shortest path determined by a particular algorithm. In this way, communication service continues while the service level degrades gracefully.
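The table update described for [17] ends with routers sending packets along shortest paths that avoid the failed component. The path algorithm of [17] is not described here, so the Python sketch below uses plain breadth-first search over the surviving routers of an n × n mesh as a stand-in; the mesh coordinates and the neighbour order are assumptions of the sketch.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) router position in an n x n mesh (assumed convention)

def neighbors(node: Node, n: int) -> List[Node]:
    """East, west, north and south neighbours that exist inside the mesh."""
    x, y = node
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]

def shortest_path(src: Node, dst: Node, n: int, faulty: Set[Node]) -> Optional[List[Node]]:
    """Breadth-first search over healthy routers; None if dst is unreachable."""
    if src in faulty or dst in faulty:
        return None
    parent: Dict[Node, Optional[Node]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source, then reverse.
            path: List[Node] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nb in neighbors(node, n):
            if nb not in parent and nb not in faulty:
                parent[nb] = node
                queue.append(nb)
    return None

# Router (1, 0) has failed, so traffic from (0, 0) to (2, 0) detours around it.
print(shortest_path((0, 0), (2, 0), 3, {(1, 0)}))
```

BFS returns a minimum-hop path in an unweighted graph, which is why it is a reasonable stand-in for "shortest path" here; it also makes the graceful degradation visible, since the detour is longer than the fault-free route.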

Applying fault tolerance techniques to inter-router faults is also feasible, as surveyed in [3]. That work identifies four router components that make up a four-stage pipelined router architecture and establishes a fault tolerant mechanism for the various fault scenarios of each component. These mechanisms range from hardware components added to the system to data coding techniques and software redundancy. For hardware redundancy, the authors minimize complexity and power consumption by adding simple combinational logic gates; for software and data redundancy, they use optimized algorithms and techniques to moderate the overhead. Fault tolerance can also be addressed at the level of network topology and architecture. A mesh-based fault tolerant NoC architecture has been proposed in [18] and [19]. In these two papers, the authors use two different algorithms to select a spare switch for each router in case it becomes faulty, and an extra link is added to the system for each spare switch. Although the Exhaustive algorithm is guaranteed to find the best solution for the whole network in terms of system response time, it is practically infeasible for NoCs with even a moderate number of cores because of its extremely high run time. To reduce the time complexity of the Exhaustive algorithm, the Greedy algorithm performs fewer iterations and selects the spare switches greedily. Hence, the Greedy algorithm does not guarantee the best network configuration, but its run time is much better than that of Exhaustive. In the FERNA algorithm [13], spare selection is substantially improved: it not only finds a solution as good as that of the Exhaustive algorithm, but its run time also scales linearly with the number of network cores, which is far lower than that of the previous algorithms. FERNA finds the best solution for the whole network using a hierarchical method based on completing rings of cores, and expresses the result in terms of system response time and extra communication cost. To the best of our knowledge, this algorithm obtains the best result. In these three algorithms, only one switch failure is assumed at any given moment; in other words, the NoC architecture tolerates one fault at a time. In the current work, we present an NoC architecture that tolerates multiple switch faults, a situation that can occur in real chips.
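To see why an exhaustive search over spare assignments becomes infeasible while a greedy pass stays cheap, the Python sketch below counts candidate configurations under a simplified, hypothetical model: each switch's candidate spares are its mesh neighbours, and exactly one spare is chosen per switch. The model is only for counting; it is not the exact candidate set or cost evaluation of [18] and [19], and FERNA's ring-based procedure is not modelled.

```python
from math import prod
from typing import List

def candidate_counts(n: int) -> List[int]:
    """Neighbour count of each switch in an n x n mesh (2 at corners, 3 on edges, 4 inside).
    Hypothetical model: these neighbours are the switch's candidate spares."""
    counts = []
    for x in range(n):
        for y in range(n):
            deg = sum(1 for a, b in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                      if 0 <= a < n and 0 <= b < n)
            counts.append(deg)
    return counts

def exhaustive_configs(n: int) -> int:
    """An exhaustive search must evaluate every combination of one spare per switch."""
    return prod(candidate_counts(n))

def greedy_evaluations(n: int) -> int:
    """A greedy pass evaluates each switch's candidates once and keeps the best."""
    return sum(candidate_counts(n))

for n in (3, 4, 5):
    print(n, exhaustive_configs(n), greedy_evaluations(n))
```

Under this model a 4 × 4 mesh already has 2^4 · 3^8 · 4^4 = 26,873,856 complete assignments, against 48 candidate evaluations for a single greedy pass; the product grows exponentially in the number of switches while the sum grows only linearly.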

3. Preliminaries and problem definition

In the following, some necessary NoC definitions are reviewed. First, the core graph is a directed graph that shows the network cores and how they communicate with each other. Its nodes represent the NoC cores, and its weighted edges give the bandwidth with which each pair of nodes communicates. The topology graph is also a directed weighted graph; it shows the layout and the physical communication media. Once the core graph is implemented on a chip, the topology graph shows where the cores are placed on the chip and how they communicate through the physical media. A mapping algorithm is the function by which the nodes of the core graph are mapped onto the nodes of the topology graph. Fig. 1 illustrates the core graph of a real application and its topology graph on a grid layout. Let $d_k$ be an edge of the core graph, let $vl(d_k)$ denote the bandwidth a core needs to communicate with another core, and let $D$ be the set of edges of the core graph for a specific application. Then $D$ can be defined as follows:

$$D = \left\{ d_k \;:\; vl(d_k) = comm_{i,j},\; k = 1, 2, \ldots, |E|,\; \forall e_{i,j} \in E,\ \text{with } source(d_k) = map(v_i),\ dest(d_k) = map(v_j) \right\} \tag{1}$$

Fig. 1. Core graph of the MPEG-4 application mapped on a 4 × 4 mesh to form the topology graph.

where the weight of edge $e_{i,j}$ is given by $comm_{i,j}$, the communication volume transferred from source $v_i$ to destination $v_j$. Note that the best alternative switch to act as a spare can be found, and the spare selection algorithm applied, only after the application cores have been mapped onto the chip according to the mapping criteria.
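The definition of $D$ in Eq. (1) can be made concrete with a short Python sketch. It assumes the core graph is given as a dictionary from core pairs $(v_i, v_j)$ to $comm_{i,j}$ and the mapping as a dictionary from cores to mesh tiles; these data structures, the core names and the traffic volumes are hypothetical illustrations, not data from the paper.

```python
from typing import Dict, List, NamedTuple, Tuple

Core = str                # core-graph node v_i (hypothetical names)
Tile = Tuple[int, int]    # topology-graph node: (x, y) mesh tile

class Demand(NamedTuple):
    vl: float     # vl(d_k) = comm_{i,j}
    source: Tile  # source(d_k) = map(v_i)
    dest: Tile    # dest(d_k) = map(v_j)

def build_demands(comm: Dict[Tuple[Core, Core], float],
                  mapping: Dict[Core, Tile]) -> List[Demand]:
    """Eq. (1): one demand d_k per core-graph edge e_{i,j} in E,
    with its endpoints moved from cores to the tiles they are mapped to."""
    return [Demand(volume, mapping[vi], mapping[vj])
            for (vi, vj), volume in comm.items()]

# Hypothetical three-core example (volumes are made up, not MPEG-4 or VOPD data).
comm = {("A", "B"): 100.0, ("B", "C"): 50.0}
mapping = {"A": (0, 0), "B": (1, 0), "C": (1, 1)}
for d in build_demands(comm, mapping):
    print(d)
```

Keeping the demands separate from the mapping makes it easy to re-evaluate the same core graph under a different placement, which is what spare selection effectively does when a core is moved to a spare switch.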

To find the best spare switch for each faulty router of the network, one feasible evaluation compares the system response time simulated with the SystemC TLM 2.0 libraries, as in [13]. The worst-case average response time of the whole system is then used as the evaluation parameter. The average response time of the system is calculated from the maximum delay of the last bit of a packet received at the destination core, which is also estimated by high-level SystemC TLM simulation. For this computation, each switch in turn is assumed faulty in one simulation iteration, and the worst-case delay is calculated assuming FIFO queues in the network. The average response time of the system for each faulty switch, with each of its neighboring switches used as the spare, is fed to the spare selection algorithm. The other possible evaluation is based on the extra communication cost that a spare switch selection imposes on the system. In this evaluation, we try to minimize the extra communication bandwidth that the system needs while a switch is faulty and a spare selection algorithm is applied. The communication cost of the bandwidth requirements from sources $v_i$ to destinations $v_j$ is calculated by Eq. (2):

$$\text{comm\_cost} = \sum_{k=1}^{|E|} \big( vl(d_k) \times dist(source(d_k), dest(d_k)) \big) \tag{2}$$

where $dist(i, j)$ gives the number of hops from node $i$ to node $j$. The bandwidth required on each communication link from node $a$ to node $b$ is evaluated in Eq. (3):

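$$\forall\, l_{a,b}:\quad BW(l_{a,b}) = \sum_{k=1}^{|E|} vl(d_k) \times f\big(l_{a,b},\, r_{source(d_k),\,dest(d_k)}\big), \qquad f(l_{a,b}, r_{i,j}) = \begin{cases} 1, & \text{if } l_{a,b} \in L(r_{i,j}) \\ 0, & \text{otherwise} \end{cases} \tag{3}$$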
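Eqs. (2) and (3) can be evaluated together once a route is fixed for every demand. The Python sketch below assumes minimal XY routing on mesh coordinates, so that $dist$ is the Manhattan distance and $L(r_{i,j})$ is the list of directed links of the XY route; the routing choice and the example demands are assumptions, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Tile = Tuple[int, int]
Link = Tuple[Tile, Tile]  # directed link l_{a,b} from tile a to tile b

class Demand(NamedTuple):
    vl: float     # bandwidth requirement vl(d_k)
    source: Tile  # source(d_k)
    dest: Tile    # dest(d_k)

def dist(a: Tile, b: Tile) -> int:
    """Hop count between two tiles under minimal routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def xy_links(src: Tile, dst: Tile) -> List[Link]:
    """L(r_{i,j}): links of the XY route from src to dst (X first, then Y)."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def comm_cost(demands: List[Demand]) -> float:
    """Eq. (2): sum of vl(d_k) * dist(source(d_k), dest(d_k))."""
    return sum(d.vl * dist(d.source, d.dest) for d in demands)

def link_bandwidth(demands: List[Demand]) -> Dict[Link, float]:
    """Eq. (3): BW(l_{a,b}) adds vl(d_k) for every demand whose route contains l_{a,b};
    the indicator f is 1 exactly for the links returned by xy_links."""
    bw: Dict[Link, float] = defaultdict(float)
    for d in demands:
        for link in xy_links(d.source, d.dest):
            bw[link] += d.vl
    return dict(bw)

# Hypothetical demands: both routes share the link (1, 0) -> (2, 0).
demands = [Demand(100.0, (0, 0), (2, 0)), Demand(50.0, (1, 0), (2, 1))]
print(comm_cost(demands))
print(link_bandwidth(demands))
```

With a minimal route, the number of links returned by `xy_links` equals `dist`, so summing `link_bandwidth` over all links gives back `comm_cost`; this is a quick consistency check between the two equations under the stated routing assumption.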

When there is a faulty switch on a path and packets are consequently rerouted along an alternative path, some links on the former path may no longer be used and can be reused. This fraction of the bandwidth calculated above is denoted $Av_{i,s}(BW(l_{u,v}))$, the available bandwidth on the rerouted paths. If $TERB_{i,s}(l_{a,b})$ denotes the total extra bandwidth required on a link, including the rerouting paths, then the extra cost of the link is calculated with respect to the free bandwidth, if any is available, as in Eq. (4):

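$$ECL_{i,s}(l_{a,b}) = \text{Extra Cost of Link}_{a,b} = \begin{cases} 0, & \text{if } Av_{i,s}(BW(l_{a,b})) \ge TERB_{i,s}(l_{a,b});\ \text{then } Av_{i,s}(BW(l_{a,b})) \leftarrow Av_{i,s}(BW(l_{a,b})) - TERB_{i,s}(l_{a,b}) \\ TERB_{i,s}(l_{a,b}) - Av_{i,s}(BW(l_{a,b})), & \text{otherwise};\ \text{then } Av_{i,s}(BW(l_{a,b})) \leftarrow 0 \end{cases} \tag{4}$$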
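Eq. (4) is a small stateful rule: free bandwidth absorbs the extra requirement first, and only the shortfall is charged. The Python sketch below implements it for one link, with a dictionary standing in for $Av_{i,s}(BW(l_{a,b}))$; the dictionary and the link names are assumptions of the sketch.

```python
from typing import Dict, Hashable

def extra_cost_of_link(available: Dict[Hashable, float], link: Hashable,
                       terb: float) -> float:
    """Eq. (4): ECL_{i,s}(l_{a,b}) for one link.

    available[link] plays the role of Av_{i,s}(BW(l_{a,b})) and is updated in place:
    - if the free bandwidth covers TERB, the cost is 0 and the free bandwidth
      shrinks by TERB;
    - otherwise the uncovered part TERB - Av is the cost and the free bandwidth drops to 0.
    A link missing from `available` is treated as having no free bandwidth.
    """
    av = available.get(link, 0.0)
    if av >= terb:
        available[link] = av - terb
        return 0.0
    available[link] = 0.0
    return terb - av

# Hypothetical link "l1" with 80 units of freed bandwidth, hit twice by 50 units.
available = {"l1": 80.0}
print(extra_cost_of_link(available, "l1", 50.0), available)
print(extra_cost_of_link(available, "l1", 50.0), available)
```

The in-place update matters: a second rerouted flow on the same link sees only what the first one left, so the order of evaluation can change individual ECL values but not the total shortfall on that link.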

When a switch fails, the total extra communication cost of each possible spare selection is calculated. The total extra costs of the lines are then computed and fed to the spare selection algorithm as inputs:

$$TECL_{i,s} = \text{Total Extra Cost of Line} = \sum_{\forall\, l_{a,b} \in L(\text{Rerouting.Path.of.}\,r_{r_i}(D))} ECL_{i,s}(l_{a,b}) \tag{5}$$
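Eq. (5) accumulates Eq. (4) over the links of the rerouting paths for one faulty switch $i$ and one candidate spare $s$. The Python sketch below assumes those links arrive as (link, TERB) pairs and that free bandwidth is kept in a dictionary; both representations are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

def total_extra_cost(available: Dict[Hashable, float],
                     rerouted: Iterable[Tuple[Hashable, float]]) -> float:
    """Eq. (5): TECL_{i,s} = sum of ECL_{i,s}(l_{a,b}) over the rerouting-path links.

    `rerouted` yields (link, TERB) pairs; `available` holds the free bandwidth Av
    per link and is updated in place exactly as in Eq. (4).
    """
    total = 0.0
    for link, terb in rerouted:
        av = available.get(link, 0.0)
        if av >= terb:              # Eq. (4), first case: absorbed by free bandwidth
            available[link] = av - terb
        else:                       # Eq. (4), second case: pay for the shortfall
            available[link] = 0.0
            total += terb - av
    return total

# Hypothetical candidate spare s for faulty switch i: two rerouted links.
av = {"a": 40.0, "b": 0.0}
print(total_extra_cost(av, [("a", 30.0), ("b", 25.0)]))
```

Running this once per candidate spare gives the per-candidate totals that the spare selection algorithm compares.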

Different spare selection algorithms try to select the best spare switches according to the following equations:

$$\text{Final }ECL(l_{a,b}) = \max\big(ECL_{i,\text{candidate}}(l_{a,b})\big), \quad i = 1, 2, \ldots, n^2 \tag{6}$$

$$\text{Extra comm cost} = \sum_{\forall\, l_{a,b}} \text{Final }ECL(l_{a,b}) \tag{7}$$
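Eqs. (6) and (7) take the worst case per link over the $n^2$ single-switch fault scenarios and then sum over links. In the Python sketch below, each scenario $i$ (already evaluated with its candidate spare) is given as a dictionary from link to ECL, and a link absent from a scenario is assumed to cost 0 there; these representations are assumptions of the sketch.

```python
from typing import Dict, Hashable, List

def final_ecl(per_fault_ecl: List[Dict[Hashable, float]]) -> Dict[Hashable, float]:
    """Eq. (6): for every link, the maximum ECL over the fault scenarios i.
    Links missing from a scenario are treated as having ECL 0 in it."""
    final: Dict[Hashable, float] = {}
    for scenario in per_fault_ecl:
        for link, cost in scenario.items():
            final[link] = max(final.get(link, 0.0), cost)
    return final

def extra_comm_cost(per_fault_ecl: List[Dict[Hashable, float]]) -> float:
    """Eq. (7): sum of Final ECL over all links."""
    return sum(final_ecl(per_fault_ecl).values())

# Hypothetical ECL values for three fault scenarios.
scenarios = [{"l1": 10.0, "l2": 0.0}, {"l1": 4.0, "l2": 7.0}, {"l3": 2.0}]
print(final_ecl(scenarios))
print(extra_comm_cost(scenarios))
```

Taking the maximum rather than the sum reflects the single-fault assumption of these algorithms: only one scenario is active at a time, so each link must be provisioned for its worst scenario, not for all of them at once.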

In the FERNA algorithm [13,14], the best spare selection is obtained by minimizing the extra communication cost of the system while achieving the system response time of the best possible selection found by the Exhaustive algorithm [19]. Owing to its heuristic ring-based technique, FERNA has a much lower time complexity than the Exhaustive algorithm. In these recent fault tolerant mechanisms, only one spare switch is selected for each router, to avoid complexity in system implementation and wiring. Hence, in some situations the spare switch is necessarily located far from cores that communicate with it at low bandwidth. Although FERNA achieves the best spare selection for the whole system and is an optimized algorithm, a more comprehensive solution would minimize such low-bandwidth communication and at the same time improve the system response time.

For more clarity, consider the configuration of Fig. 2: the real Video Object Plane Decoder (VOPD) application with 16 cores mapped on a mesh topology, with the FERNA algorithm applied to make the system tolerant to single switch failures. In this figure, cores and switches are shown as circles and squares, respectively. A colored connection between a core and a switch indicates that the switch is the core's local switch in the mesh topology. The pale connections, on the other hand, depict the spare switch selected by FERNA for each core. The arrows show the communication flows and their paths, while the numbers denote the bandwidth requirement of each communication. For instance, if the spare switch of core 8 were switch 9 instead of switch 7, the paths with switch 7 at one end would need more hops, but the paths with switch 9 at one end would improve substantially. As another example, if the spare switch of core 6 were switch 12 instead of switch 5, the two-hop data transmission from core 12 to core 6 would cost no link bandwidth, since the communication would take place directly between the cores. The purpose of this paper is to eliminate or reduce the bandwidth requirements of the communication lines and to shorten data path lengths. The overall idea is to provide two spare switches for each core that work exclusively in a hot-and-cold fashion.

4. HACS fault tolerant architecture

The proposed architecture aims to improve system reliability against multiple switch failures while minimizing the communication cost and data flow path lengths. This NoC architecture requires little hardware redundancy and substantially improves system reliability. It also reduces the system response time and sharply reduces the extra communication cost of the system. To build such a configuration on a mesh-based layout, we first apply a fault tolerant spare selection algorithm such as FERNA to determine the spare switches. We classify these spares as the hot spare class, which are kept permanently active. After the hot spares are selected for the IP cores, the next step is to select the cold spare class of switches for the cores. It is important to note that although the cores are not yet fixed in the network layout, the hot spare selection has already eliminated half of the possible locations of each core between the switches. The still-uncertain location of a core is arranged so that it can be altered for the cold spare selection if necessary. The name of the architecture, HACS, is derived from these definitions of hot and cold spares.

Fig. 2. Topology graph of the VOPD application with FERNA applied for spare switch selection.

When FERNA is applied to the mesh-based architecture of an application, it produces a file listing, for each core, its neighboring switches ordered by their suitability as a spare with respect to system response time and performance. This list file is used as input to the next step of the HACS simulation. In this step, with the hot spares already selected, the next suitable switch for each core is selected as its cold spare, according to the list file and the locations still available for the core. A cold spare switch is a spare switch in cold-standby status. When a packet is sent from a source core whose local switch is faulty, a logic circuit is activated to compare the distances from the hot spare and from the cold spare to the destination core. If the cold spare switch is closer, the hot spare switch is put into standby and the packet is sent through the cold spare switch; otherwise, the cold spare switch remains in standby. The core at which cold spare selection starts, and the order in which the simulation proceeds through the cores, are determined by communication cost using the ranking equation below:

$$Ranking(C_i) = \sum_{\forall j = 1, 2, \ldots, |v|,\; i \ne j} \big(comm_{i,j} + comm_{j,i}\big) \tag{8}$$

where $comm_{i,j}$ and $comm_{j,i}$ denote the bandwidth requirements for communication from core $v_i$ to $v_j$ and from $v_j$ to $v_i$, respectively. In other words, the core with the highest communication volume is served first by the cold spare selection algorithm. A hot spare switch in the HACS architecture behaves the same as a spare switch in a FERNA-based architecture. The only difference is that in HACS, the packet header containing the destination address is evaluated by simple AND-OR logic at the source core to determine which of the hot and cold spares is closer to the destination core. The hardware overhead of this simple logic circuit is negligible, and the energy it consumes is compensated by the shortened data flow path; overall, HACS also reduces the power consumption of the whole system.
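The ranking of Eq. (8) and the resulting service order for cold spare selection can be sketched in Python as below. The core graph is assumed to be a dictionary from ordered core pairs to bandwidth, and ties in the ranking are broken by core name; both are assumptions, since the paper does not specify a tie-breaking rule.

```python
from typing import Dict, List, Tuple

def ranking(core: str, comm: Dict[Tuple[str, str], float]) -> float:
    """Eq. (8): sum of comm_{i,j} + comm_{j,i} over all j != i.
    An edge counts if the core is exactly one of its endpoints (self-loops excluded)."""
    return sum(vol for (src, dst), vol in comm.items()
               if (src == core) != (dst == core))

def cold_spare_order(comm: Dict[Tuple[str, str], float]) -> List[str]:
    """Cores by descending ranking: the busiest core is served first.
    Ties are broken by core name (an assumption of this sketch)."""
    cores = {c for edge in comm for c in edge}
    return sorted(cores, key=lambda c: (-ranking(c, comm), c))

# Hypothetical three-core traffic pattern.
comm = {("A", "B"): 100, ("B", "C"): 50, ("C", "A"): 20}
print({c: ranking(c, comm) for c in "ABC"})
print(cold_spare_order(comm))
```

Serving heavy communicators first gives them the first pick among the remaining locations, which is where a good cold spare saves the most bandwidth.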

Another motivation for the HACS concept is to improve NoC reliability against multiple switch faults in the network. Previous fault tolerant architectures that use one spare switch per core assume that only one fault occurs at a time. In those architectures, if both the local switch and the spare switch of a core failed, the network would fail as if no fault tolerant technique had been applied: the core would be isolated and inaccessible from the other cores, and the network would be blocked. We designed HACS to tolerate the failure of both of these switches of a core, since two nearby switches commonly fail together because of the physical effects behind permanent faults, such as crash and impact. HACS therefore substantially improves system reliability in such cases and, more generally, under multiple switch failures.
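The switch-selection behaviour described for HACS can be sketched in Python: the local switch is used while it works; on a local fault, the working hot or cold spare closer to the destination is used, with the hot spare kept on a tie, as stated above; exception cores have no cold spare. The paper implements the comparison with AND-OR logic on the destination address; measuring closeness as Manhattan hop count on (x, y) coordinates is an assumption of this sketch.

```python
from typing import Optional, Set, Tuple

Tile = Tuple[int, int]  # (x, y) switch position (assumed convention)

def hops(a: Tile, b: Tile) -> int:
    """Manhattan distance between two switches of the mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def pick_switch(local: Tile, hot: Tile, cold: Optional[Tile],
                dest: Tile, faulty: Set[Tile]) -> Optional[Tile]:
    """Choose the switch a core injects a packet into.

    - local switch while it works;
    - otherwise the working hot or cold spare closer to the destination
      (the hot spare wins ties, so the cold spare stays in standby);
    - exception cores on the mesh border pass cold=None;
    - None when every switch of the core has failed.
    """
    if local not in faulty:
        return local
    spares = [s for s in (hot, cold) if s is not None and s not in faulty]
    if not spares:
        return None
    return min(spares, key=lambda s: hops(s, dest))  # min keeps the first (hot) on ties

# Core with local (1, 1), hot spare (1, 2), cold spare (2, 1), sending to (3, 1).
print(pick_switch((1, 1), (1, 2), (2, 1), (3, 1), {(1, 1)}))          # cold spare is closer
print(pick_switch((1, 1), (1, 2), (2, 1), (3, 1), {(1, 1), (1, 2)}))  # two faults tolerated
```

The second call shows the multiple-fault case from the text: with both the local switch and the hot spare dead, a single-spare architecture would isolate the core, while the cold spare still connects it.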

4.1. Exceptions in HACS configuration

While applying the HACS procedure to a mapped application, we find that, due to layout constraints, some cores are necessarily located on the outer border of the mesh. These locations, highlighted in color in Fig. 3, are called the exceptions of the HACS architecture. Because of their position, the cores located there cannot have cold spare switches and connect only to their local and hot spare switches. Assuming an n × n NoC, the number of inner locations is (n − 1) × (n − 1), so the number of cores located on the outer border of the mesh is:

$$N_{exceptions} = n^2 - (n-1)^2 = 2n - 1 \qquad (9)$$

Fig. 3. HACS configuration for VOPD application.

M. Tinati et al. / Computers and Electrical Engineering 38 (2012) 963–974 969

Hence, one of the constraints in equipping an NoC with the HACS architecture is to place the cores inside the borders of the mesh as far as possible. This constraint, however, also leads to efficient use of the topology layout, which in turn helps optimize the chip area of the NoC. Fig. 3 depicts the HACS architecture applied to the VOPD application mapped on a 4 × 4 regular mesh. The local, hot spare, and cold spare connections are shown as colored, pale, and dashed lines, respectively. As mentioned previously, the exception cores, which do not have cold spare switches, are highlighted in color in this figure.
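As a quick check of Eq. (9), the following Python sketch counts the exception cores of an n × n mesh and derives the total number of core network interfaces, assuming (as described in Section 4.3) two NIs per exception core and three NIs per remaining core. For n = 4 it reproduces the (7 × 2-NI) + (9 × 3-NI) entry of Table 2. The function names are illustrative only.

```python
def exception_count(n):
    """Eq. (9): cores of an n x n mesh that cannot be given a cold spare."""
    return n * n - (n - 1) * (n - 1)


def core_ni_count(n):
    """Total core network interfaces in an n x n HACS NoC, assuming two NIs
    per exception core and three NIs per remaining core."""
    exceptions = exception_count(n)
    return 2 * exceptions + 3 * (n * n - exceptions)


# The closed form of Eq. (9) holds for every mesh size checked here.
for n in range(2, 9):
    assert exception_count(n) == 2 * n - 1

print(exception_count(4), core_ni_count(4))  # 7 41
```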

4.2. Reliability in HACS architecture

For reliability evaluation, we build on the mathematical analysis used in [18], which is based on the reliability evaluation of series and parallel networks. In NoC systems, if we assume that faults affect only the switches, and that the switches are atomic elements whose faults follow the fail-stop model, then the network operates correctly if and only if all switches are fault free. The NoC is thus modeled as a series network of switches. If the system is equipped with a fault-tolerance technique, however, the system reliability is the product of the reliabilities of all switches (the case in which all of them are fault free), plus the reliability of the system when some of them are faulty. The reliability of a path when some of the switches are faulty is calculated as follows:

$$R_{i,j} = \prod_{k=1}^{S} R_k + \sum_{k=1}^{S} \left(1 - R_k\right) \cdot \left(R_{i,j} \mid SW_k \text{ fails}\right) \qquad (10)$$

where $R_k$ is the reliability of switch $SW_k$, $(1 - R_k)$ is the probability that this switch fails, $S$ is the number of switches on the path, and $(R_{i,j} \mid SW_k \text{ fails})$ is the reliability of the path given that $SW_k$ has failed. The main difference between the technique proposed in this paper and the one presented in [18] concerns exponents: according to results obtained with the SHARPE reliability software, if an element's reliability appears with an exponent greater than one, that element has a greater effect on the overall system reliability, so the exponent must be retained; in [18], by contrast, switch reliability exponents greater than one are replaced by one. To assess the multiple fault tolerance of the HACS architecture, the failure of both the local and hot spare switches, or of both the local and cold spare switches, of each core is also considered when calculating the system reliability. Fig. 4 depicts an example of local and hot spare switch failures on a data path of Fig. 3. Circles and squares represent IP cores and switches, respectively; colored squares are faulty switches, while pale ones are healthy. The figure shows the communication path between cores 12 and 13. The first row shows the case in which all switches work correctly, while the second row shows the case in which switch 12 fails. This is the local switch of core 12, so the communication is forwarded through the hot spare switch of core 12, which is switch 11. In the third row, when both the local and hot spare switches of core 12 are faulty, the HACS configuration provides a third path for the packets via the cold spare switch 7. The reliability calculation of the path from core 12 to 13 is given in this figure.
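The following Python sketch evaluates Eq. (10) numerically. The conditional terms $(R_{i,j} \mid SW_k \text{ fails})$ must be supplied by the caller, for example from a path analysis like the one in Fig. 4; the function name and all numbers below are hypothetical and are not taken from Fig. 4.

```python
from math import prod


def path_reliability(r, r_given_fail):
    """Eq. (10): R_ij = prod_k R_k + sum_k (1 - R_k) * (R_ij | SW_k fails).

    r            -- reliabilities R_k of the S switches on the path
    r_given_fail -- reliability of the path given that SW_k has failed
                    (0.0 when no alternative route exists for SW_k)
    """
    if len(r) != len(r_given_fail):
        raise ValueError("need one conditional term per switch")
    return prod(r) + sum((1 - rk) * rc for rk, rc in zip(r, r_given_fail))


# Hypothetical 3-switch path with R_k = 0.9 for every switch.
r = [0.9, 0.9, 0.9]
no_spare = path_reliability(r, [0.0, 0.0, 0.0])
# Assume only the source's local switch (k = 1) has a detour through a spare,
# itself a 3-switch path of reliability 0.9 ** 3.
with_spare = path_reliability(r, [0.9 ** 3, 0.0, 0.0])
print(round(no_spare, 4), round(with_spare, 4))  # 0.729 0.8019
```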

The reliability improvement of the HACS architecture compared with the FERNA and Greedy spare switch selection algorithms is shown in Figs. 5 and 6. In Fig. 5 the VOPD application is mapped on a 4 × 4 mesh with the Greedy mapping function, while in Fig. 6 it is mapped using the Onyx [20] mapping algorithm. The results are obtained from high-level simulation in Mathematica 7.0. In the legends of the figures, the first word indicates the mapping algorithm of the application and the remaining expression denotes the fault-tolerance mechanism. The label "HACS and FERNA" means that FERNA has been applied to the system and HACS is then performed; likewise, "HACS and Greedy" means that the Greedy spare selection algorithm is applied to the system first. It is worth mentioning that the system reliability is calculated assuming one or two faults in the system, including the case in which the two faulty switches are the local and hot spare, or the local and cold spare, switches of the same core. The HACS configuration can potentially tolerate more than two failures, provided that the three switches connected to any single core are not all faulty.
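As a rough illustration of why the third switch helps, the following Python sketch uses a deliberately simplified model rather than the path-based analysis of Eq. (10): switches fail independently with reliability r, and a core stays attached to the network as long as at least one of its k switches is healthy. Here k = 2 stands for a core with one spare (FERNA or Greedy, and also HACS exception cores) and k = 3 for a HACS core with both hot and cold spares. The independence assumption and the function name are illustrative only.

```python
def core_attach_reliability(r, k):
    """Probability that at least one of the k switches a core can use is
    healthy, assuming independent switch failures with reliability r."""
    return 1 - (1 - r) ** k


r = 0.9
single_spare = core_attach_reliability(r, 2)  # local + one spare switch
hacs = core_attach_reliability(r, 3)          # local + hot + cold spare
print(round(single_spare, 3), round(hacs, 3))  # 0.99 0.999
```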

Fig. 4. An example of reliability calculation for a path in HACS configuration.

Fig. 5. System reliability calculation for VOPD application core graph mapped with Greedy algorithm.

Fig. 6. System reliability calculation for VOPD application core graph mapped with Onyx algorithm.


4.3. Component modifications for HACS configuration

In addition to the extra links that connect the hot and cold spare classes to the network in the HACS architecture, some modifications are applied to NoC components such as IP cores, switches, and transmitted packets. Some of these modifications are similar to those described in [13] and [18], while others are specific to the HACS architecture:

• IP cores: a simple logic is added to the network interface of each IP core to identify its local, hot spare, and cold spare switches and to store their addresses. A simple combinational comparator circuit is also added to each IP core to determine whether the destination core is closer to the hot spare or to the cold spare when the local switch is faulty. An IP core is assumed to use three ports to connect to its local, hot spare, and cold spare switches, and its network interfaces are expanded accordingly. As mentioned in Section 4.1, however, 2n − 1 cores of an n × n NoC are inevitably located on the outer border of the mesh. These cores are therefore implemented with two ports and two network interfaces, connecting to their local and hot spare switches, which also reduces the silicon area at the chip borders.
• Switches: built-in self-test logic is needed in each switch to inform its neighbors of its faulty condition, along with a simple logic that changes the routing algorithm to avoid live-lock in the NoC, as explained in detail in Section 4.4. The informing mechanism uses a control field of the packet containing a 1-bit FS (Faulty Status) flag and two address fields identifying the alternative hot and cold spare switches of the faulty switch. Some switches in the HACS configuration are implemented with 7 ports: one port connects the switch to the IP core for which it is the local switch, 4 ports form the mesh topology, and the other 2 ports link the switch as a member of the hot and cold spare classes. Just as there are exceptional IP cores, there are exceptional switches that no core selects as its cold spare. These switches, as many as the exceptional cores, are implemented with 6 ports and are also located on the border of the mesh, which reduces the area at the chip borders.
• Packets: three control fields are needed in each packet: two n-bit fields for an n × n NoC, named FSN (Faulty Switches' Numbers), which store the addresses and locations of faulty switches, and a 1-bit flag named CR (Change Routing), which indicates whether the routing strategy must change from XY to YX routing to bypass the faulty routers or region.
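The packet control fields listed above can be sketched as a single packed integer. The bit layout below (FSN fields first, CR flag in the least significant bit) is an assumption made for illustration, since the field order is not specified; the function names are hypothetical, and switch numbers are assumed to fit in n bits.

```python
def pack_control(fsn_a, fsn_b, cr, n):
    """Pack the two n-bit FSN fields and the 1-bit CR flag into one integer.

    Assumed layout, most significant bits first:
    [fsn_a: n bits][fsn_b: n bits][cr: 1 bit]
    """
    limit = 1 << n
    if not (0 <= fsn_a < limit and 0 <= fsn_b < limit):
        raise ValueError("FSN value does not fit in n bits")
    return (fsn_a << (n + 1)) | (fsn_b << 1) | int(bool(cr))


def unpack_control(word, n):
    """Inverse of pack_control: returns (fsn_a, fsn_b, cr)."""
    mask = (1 << n) - 1
    return (word >> (n + 1)) & mask, (word >> 1) & mask, bool(word & 1)


# 4 x 4 NoC (n = 4): faulty switches 12 and 7, routing must change to YX.
word = pack_control(12, 7, True, 4)
print(word, unpack_control(word, 4))  # 399 (12, 7, True)
```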

4.4. Routing in HACS architecture

The default routing strategy in the HACS architecture, as in previous fault-tolerant architectures [13,18], is the XY routing algorithm. As in [13] and [18], the routing strategy occasionally changes to the YX routing algorithm to avoid live-lock of packets in the network. A live-lock is a condition in which a packet is misrouted around a loop that never ends; the packet is trapped in the loop and never reaches its destination. A dead-lock, on the other hand, is a situation in which a cycle of resource requests forms, so that no progress can be made in the network. Both live-lock and dead-lock must be considered and resolved for a routing algorithm to succeed. In the HACS architecture, the adjusted routing algorithm is free of both live-lock and dead-lock. Moreover, the proposed strategy always selects the shortest path, so network performance is not degraded by extra hops and longer paths.

In the HACS architecture, live-lock situations arise when a packet is traversing toward its destination in XY fashion: the path has just completed its X segment and is about to turn into its Y segment, but there is a faulty switch ahead, so the path is again led along the X dimension to bypass the faulty region. In this scenario, if the routing strategy remains XY, it fails because of the forward-and-backward loop imposed by the deterministic nature of the XY algorithm. Therefore, the CR flag is set to indicate that the routing algorithm must change, and the routing algorithm is switched to YX to overcome the live-lock. To resolve dead-lock situations in the HACS architecture, we adopt the idea of [21], in which two classes of virtual channels are presented. This choice is motivated by the similar combined use of the XY and YX algorithms in the adjusted routing algorithm of HACS and in the Sorena topology [21]. Each class of virtual channels on the physical channels is dedicated to one routing strategy: by default, class 1 is used for XY routing and class 2 for YX routing. The two separate classes of virtual channels isolate the resource demands of each routing strategy and thus prevent dead-lock cycles.
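The following Python sketch shows only the two dimension orders and the mapping of each order to its virtual-channel class as described above (class 1 for XY, class 2 for YX). The fault-detection logic that decides when to set CR, and the detour around a faulty region, are not modeled; all names are illustrative.

```python
def _step(a, b):
    """Move coordinate a one hop toward b (only called when a != b)."""
    return a + 1 if b > a else a - 1


def next_hop(cur, dst, cr):
    """One dimension-order routing step on a 2-D mesh.

    cur, dst -- (x, y) switch coordinates
    cr       -- CR flag: False routes XY (X first), True routes YX (Y first)
    Returns (next_switch, vc_class): class 1 for XY, class 2 for YX traffic.
    """
    (x, y), (dx, dy) = cur, dst
    vc_class = 2 if cr else 1
    if cur == dst:
        return cur, vc_class
    if not cr:
        nxt = (_step(x, dx), y) if x != dx else (x, _step(y, dy))
    else:
        nxt = (x, _step(y, dy)) if y != dy else (_step(x, dx), y)
    return nxt, vc_class


def route(src, dst, cr=False):
    """Full minimal path from src to dst under the selected dimension order."""
    path, cur = [src], src
    while cur != dst:
        cur, _ = next_hop(cur, dst, cr)
        path.append(cur)
    return path


print(route((0, 0), (2, 1)))           # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(route((0, 0), (2, 1), cr=True))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
```

Both orders produce minimal paths of the same length, which is consistent with the statement that switching from XY to YX does not add hops.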

4.5. Criteria Improvements and overheads of the HACS architecture

The area and hardware overhead of the HACS configuration comes from the extra router ports, the extra network interfaces provided for routers and cores, and the extra links that connect the cold spares to the IP cores. The router areas for different numbers of ports, obtained by low-level VHDL simulation and Orion synthesis of the network, are listed in Table 1. Considering two 4 × 4 networks for the VOPD application, one with FERNA or Greedy spare selection and one with the HACS architecture, the configurations of the two systems are given in Table 2. Consequently, the overall overhead of the HACS architecture compared with the system implemented with FERNA or Greedy spare selection is about 29.33% in area, calculated with respect to the extra interconnection network imposed on the system. Note that this area comparison covers only the two systems' interconnection networks, comprising the routers, the network interfaces, and the links connecting the IP cores to their switches; it does not refer to the whole NoC area, which also includes the IP cores.

Table 1. Router area comparison.

| Number of router ports | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|
| Router area (µm²) | 82,944 | 147,456 | 230,400 | 338,688 | 484,324 |
| Ratio to 3-port router | 1 | 1.78 | 2.78 | 4.08 | 5.84 |

Table 2. System comparison of configurations in a 4 × 4 NoC for the VOPD application.

| System | Routers | IP core network interfaces (NIs) |
|---|---|---|
| FERNA or Greedy spare selection algorithm | (4 × 4-port) + (8 × 5-port) + (4 × 6-port) | All 16 IP cores implemented with 2 NIs |
| HACS architecture | (4 × 4-port) + (3 × 5-port) + (5 × 6-port) + (4 × 7-port) | (7 × 2-NI) + (9 × 3-NI) |
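Using only the router rows of Table 2 and the areas of Table 1, the following Python sketch totals the router area of each system. It covers routers only; the 29.33% figure above also accounts for network interfaces and links, which are not modeled here, so the router-only ratio printed by the sketch (about 29.7%) is not expected to match it exactly. Variable names are illustrative.

```python
# Router areas from Table 1 in square micrometres, keyed by port count.
ROUTER_AREA = {3: 82_944, 4: 147_456, 5: 230_400, 6: 338_688, 7: 484_324}


def total_router_area(config):
    """Sum of router areas for a {port_count: router_count} configuration."""
    return sum(ROUTER_AREA[ports] * count for ports, count in config.items())


ferna = {4: 4, 5: 8, 6: 4}       # FERNA or Greedy spare selection (Table 2)
hacs = {4: 4, 5: 3, 6: 5, 7: 4}  # HACS architecture (Table 2)

a_ferna = total_router_area(ferna)
a_hacs = total_router_area(hacs)
print(a_ferna, a_hacs, f"{a_hacs / a_ferna - 1:.2%}")  # 3787776 4911760 29.67%
```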


Assuming, for instance, a switch reliability of 90%, the HACS architecture improves system reliability by about 20% compared with a system using the FERNA algorithm, and by about 60% compared with a system without any fault-tolerance technique. With a switch reliability of 95%, the improvement over a system without fault tolerance reaches up to 80%. Moreover, the FERNA algorithm covers only single switch failures, while the HACS configuration handles single, double, and multiple failures of nearby switches. This property distinguishes the HACS configuration from previously proposed configurations. The reliability improvement is substantial relative to the hardware redundancy that the HACS configuration incurs.

Comparing the extra communication cost of a system made fault tolerant with the FERNA algorithm with that of a HACS-configured architecture, we find that HACS reduces the cost by 23.6% for the VOPD application. These results are obtained from high-level simulation with the SystemC TLM 2.0 libraries. Such an improvement has a direct effect on system response time and power consumption. A system implemented with the FERNA algorithm also achieves improvements of 27% and 17% for the VOPD application compared with the Greedy spare selection algorithm, for the Onyx and Greedy mapping algorithms, respectively. Thus, the improvement of the HACS approach over the Greedy spare selection algorithm reaches 50.6% (23.6% + 27%) and 40.6% (23.6% + 17%) for the Onyx and Greedy mappings, respectively. The C pseudo code of the HACS configuration and of the comparison logic circuit within each switch is given in detail in Appendix A.

5. Conclusion

Because faults on chips are inaccessible, in this paper we focused on fault-tolerance mechanisms, with special attention to permanent faults of NoC switches. We presented a novel fault-tolerant NoC architecture that tackles possible double and multiple faults of nearby switches in a mesh-based architecture. In this approach, two classes of spare switches, hot and cold, were selected by different priority algorithms and connected to the network cores; the architecture is therefore named HACS, derived from hot and cold spares. The hot spare of a core behaved like a spare switch in the FERNA algorithm, while the cold spare switch was kept in cold standby unless either the local switch of the core had failed and the cold spare was closer to the destination core, or both the local and hot spare switches of the core were faulty. We used the task graph of the real VOPD application with 16 cores as a case study on a 4 × 4 mesh. The area overhead with respect to the extra interconnection network imposed on the system was 29.33% compared with the FERNA algorithm, due to extra ports in some routers, extra network interfaces, and the extra links of the HACS configuration. At the same time, however, the HACS configuration increased system reliability by up to 80% for a switch reliability of about 95%, compared with a system without any fault-tolerance mechanism, and reduced the extra communication cost by 23.6% by minimizing data flow through the network. While the HACS configuration handles single, double, and multiple failures of nearby switches, FERNA covers only single switch failures; this property distinguishes HACS from other configurations. The area overhead was calculated by low-level VHDL simulation and Orion synthesis of the network, while the extra communication cost results were obtained from high-level simulation with the SystemC TLM 2.0 libraries.
It would be desirable to continue the study of fault-tolerant routing techniques in NoCs in order to develop a comprehensive solution to permanent faults in three-dimensional on-chip networks. It would also be interesting to investigate the effect of 3-D faults on NoC routers.

Appendix A

C Pseudo Code of HACS Configuration

1. Apply the FERNA algorithm to the desired network-on-chip application mapped on the mesh topology;
2. Let Comm_Flow[i] be the total communication cost of PE[i], i.e., Ranking(C_i) in Eq. (8);
3. // Sort the cores in descending order of Comm_Flow (bubble sort)
   for (pass = 0 to Number_of_Cores - 2, step 1) do
       for (i = 0 to Number_of_Cores - 2 - pass, step 1) do
           if (Comm_Flow[i+1] > Comm_Flow[i]) then
               temp = Comm_Flow[i]; Comm_Flow[i] = Comm_Flow[i+1]; Comm_Flow[i+1] = temp;
               temp = Core[i]; Core[i] = Core[i+1]; Core[i+1] = temp;
           end if;
       end for;
   end for;
4. for (i = 0 to Number_of_Cores - 1, step 1) do   // cores in the sorted order of step 3
       Hot_Spare[i] = Spare[i];                     // spare switch selected by FERNA
       Cold_Spare[i] = highest-ranked neighbor router of Core[i] other than Spare[i],
                       according to the bandwidth utilized between them;
       if (there is no possible cold spare for Core[i]) then
           Cold_Spare[i] = 0;                       // exception core: no cold spare
       end if;
   end for;
5. for (path = 0 to Number_of_Paths - 1, step 1) do
       if (Destination[path] is a faulty router) then
           if (Cold_Spare[Destination[path]] != 0 and Cold_Spare[Destination[path]] is nearer to Source[path]) then
               Route the packets along the path from Source[path] to Cold_Spare[Destination[path]];
           else
               Route the packets along the path from Source[path] to Hot_Spare[Destination[path]];
           end if;
       else if (Source[path] is a faulty router) then
           if (Cold_Spare[Source[path]] != 0 and Cold_Spare[Source[path]] is nearer to Destination[path]) then
               Route the packets along the path from Cold_Spare[Source[path]] to Destination[path];
           else
               Route the packets along the path from Hot_Spare[Source[path]] to Destination[path];
           end if;
       end if;
   end for;
6. end
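A runnable Python sketch of steps 3 and 4 follows. It assumes that each switch has a single cold-spare port and can therefore be the cold spare of at most one core, which is one reading of "the locations left for the core" in Section 4; the data layout, function name, and example values are hypothetical.

```python
def select_cold_spares(ranks, candidates, local, hot):
    """Steps 3-4: serve cores in descending rank and pick each core's cold spare.

    ranks      -- core -> Ranking(C_i) (total communication cost, Eq. (8))
    candidates -- core -> neighbour switches in priority order (FERNA list file)
    local, hot -- core -> its local switch / its hot spare switch
    Returns core -> cold spare switch, or None for an exception core.
    Assumption: a switch can be the cold spare of at most one core.
    """
    taken = set()
    cold = {}
    for core in sorted(ranks, key=ranks.get, reverse=True):
        choice = next((s for s in candidates.get(core, [])
                       if s not in (local[core], hot[core]) and s not in taken),
                      None)
        if choice is not None:
            taken.add(choice)
        cold[core] = choice
    return cold


# Hypothetical example: four cores on a larger mesh, switch ids are tile ids.
ranks = {0: 80, 1: 432, 2: 372, 3: 50}
candidates = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [2]}
local = {0: 0, 1: 1, 2: 2, 3: 3}
hot = {0: 1, 1: 0, 2: 5, 3: 6}
result = select_cold_spares(ranks, candidates, local, hot)
print(result)  # {1: 2, 2: 1, 0: 3, 3: None}
```

Core 3 ends up as an exception core because its only candidate, switch 2, was already taken by a higher-ranked core.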

C Pseudo Code of the Comparison Logic Circuit on HACS Configuration

1. for (path = 0 to Number_of_Paths - 1, step 1) do
       if (Destination[path] is a faulty router) then
           if (Cold_Spare[Destination[path]] != 0 and Cold_Spare[Destination[path]] is nearer to Source[path]) then
               Route the packets along the path from Source[path] to Cold_Spare[Destination[path]];
           else
               Route the packets along the path from Source[path] to Hot_Spare[Destination[path]];
           end if;
       else if (Source[path] is a faulty router) then
           if (Cold_Spare[Source[path]] != 0 and Cold_Spare[Source[path]] is nearer to Destination[path]) then
               Route the packets along the path from Cold_Spare[Source[path]] to Destination[path];
           else
               Route the packets along the path from Hot_Spare[Source[path]] to Destination[path];
           end if;
       end if;
   end for;
2. end
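The comparison logic above can be sketched in Python as an endpoint substitution. Two choices here are assumptions: "nearer" is measured as Manhattan (hop) distance on the mesh, which equals the hop count under minimal XY or YX routing, and ties go to the hot spare so that the cold spare stays in standby. Like the pseudo code, the sketch does not handle the case in which the hot spare is also faulty; names and coordinates are hypothetical.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) switches under minimal mesh routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reroute_endpoints(src, dst, faulty, hot, cold):
    """Endpoint substitution performed by the comparison logic.

    src, dst -- (x, y) of the local switches of the source and destination cores
    faulty   -- set of faulty switch coordinates
    hot      -- local switch -> hot spare switch
    cold     -- local switch -> cold spare switch (missing for exception cores)
    Returns the (entry, exit) switches between which the packet is routed.
    """
    def nearer_spare(sw, other):
        c = cold.get(sw)
        # Ties go to the hot spare, leaving the cold spare in standby (assumed).
        if c is not None and manhattan(c, other) < manhattan(hot[sw], other):
            return c
        return hot[sw]

    if dst in faulty:
        return src, nearer_spare(dst, src)
    if src in faulty:
        return nearer_spare(src, dst), dst
    return src, dst


hot = {(2, 2): (3, 2), (1, 1): (2, 1)}
cold = {(2, 2): (1, 2), (1, 1): (0, 1)}
print(reroute_endpoints((0, 0), (2, 2), {(2, 2)}, hot, cold))  # ((0, 0), (1, 2))
print(reroute_endpoints((1, 1), (3, 1), {(1, 1)}, hot, cold))  # ((2, 1), (3, 1))
```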

References

[1] Benini L, De Micheli G. Networks on chips: a new SoC paradigm. In: Proceedings of Design, Automation and Test in Europe, Jan. 2002, p. 70–8.

[2] Hu WH, Lee SE, Bagherzadeh N. DMesh: a diagonally-linked mesh network-on-chip architecture. In: Proceedings of NoCArc, First International Workshop on Network on Chip Architectures, to be held in conjunction with MICRO-41, 2008.

[3] Park D, Nicopoulos C, Kim J, Vijaykrishnan N, Das CR. Exploring fault-tolerant network-on-chip architectures. In: Proceedings of DSN'06, International Conference on Dependable Systems and Networks, Jun. 2006, p. 93–104.

[4] Yalamanchili K, Pasalapudi A, Majeti D, Sunitha V. Design of optimal architectures using homogeneous routers for application specific network on chip. In: Proceedings of ICETET '08, First International Conference on Emerging Trends in Engineering and Technology, Jul. 2008, p. 873–7.

[5] Wenbiao Z, Zhang Y, Mao Z. Link-load balance aware mapping and routing for NoC. WSEAS Trans Circuits Syst 2007;6(11):583–91.

[6] Harmanani HM, Farah R. A method for efficient mapping and reliable routing for NoC architectures with minimum bandwidth and area. In: Proceedings of Circuits and Systems and TAISA Conference, Jun. 2008, p. 29–32.

[7] Li G, Wu J, Ma G. Mapping of irregular IP onto NoC architecture with optimal energy consumption. Tsinghua Sci Technol 2007;12:146–9.

[8] Cidon I, Keidar I. Zooming in on network-on-chip architectures. Technical Report CCIT 565. Technion Department of Electrical Engineering; 2005.

[9] Dumitras T, Marculescu R. On-chip stochastic communication. In: Proceedings of Design, Automation and Test in Europe, Mar. 2003, p. 790–5.

[10] Ali M, Welzl M, Hessler S, Hellebrand S. A fault tolerant mechanism for handling permanent and transient failures in a network on chip. In: Proceedings of Fourth International Conference on Information Technology, Apr. 2007, p. 1027–32.

[11] Bushnell ML, Agrawal VD. Essentials of electronic testing for digital, memory and mixed-signal circuits. Frontiers in Electronic Testing, vol. 17. Kluwer Academic Publishers; 2000.

[12] Dally WJ, Towles B. Route packets, not wires: on-chip interconnection networks. In: Proceedings of DAC '01, Design Automation Conference, 2001, p. 684–9.

[13] Janidarmian M, Khademzadeh A, Tinati M, Ghavibazoo M, Roshanfekr A. FERNA: a performance/cost aware spare switch selection algorithm for fault tolerant NoC architecture. In: Proceedings of The World Congress on Engineering and Computer Science, Oct. 2009, p. 152–7.

[14] Janidarmian M, Tinati M, Khademzadeh A, Ghavibazoo M, Roshanfekr A. Special issue on a fault tolerant network on chip architecture. IAENG Trans Eng Technol 2010;4:191–204 [Special Edition of the World Congress on Engineering and Computer Science 2009].

[15] Rezazadeh A, Fathy M, Hassanzadeh A. If-Cube3: an improved fault tolerant routing algorithm to achieve less latency in NoCs. In: Proceedings of IEEE International Advance Computing Conference (IACC 2009), 2009, p. 278–83.

[16] Zhang Z, Greiner A, Taktak S. A reconfigurable routing algorithm for a fault-tolerant 2D-mesh network-on-chip. In: Proceedings of DAC '08, 2008, p. 441–6.

[17] Ali M, Welzl M, Hessler S. A fault tolerant mechanism for handling permanent and transient failures in a network on chip. In: Proceedings of Fourth International Conference on Information Technology, Apr. 2007, p. 1027–32.

[18] Refan F, Alemzadeh H, Safari S, Prinetto P, Navabi Z. Reliability in application specific mesh-based NoC architectures. In: Proceedings of 14th IEEE International On-Line Testing Symposium (IOLTS '08), 7–9 Jul. 2008, p. 207–12.

[19] Renan F, Alemzadeh H, Kabiri P, Prinetto P, Navabi Z. Application specific configuration of a fault-tolerant NoC architecture. In: Proceedings of 11th International Biennial Baltic Electronics Conference (BEC 2008), Oct. 2008, p. 179–82.

[20] Janidarmian M, Khademzadeh A, Tavanpour M. Onyx: a new heuristic bandwidth-constrained mapping of cores onto tile-based network on chip. IEICE Electron Express Jan. 2009;6(1):1–7.

[21] Janidarmian M, Bokharaie VS, Khademzadeh A, Tavanpour M. Sorena: new on-chip network topology featuring efficient mapping and simple deadlock free routing algorithm. In: Proceedings of 10th IEEE International Conference on Computer and Information Technology, Jun.–Jul. 2010, p. 2290–9.

Melika Tinati received her B.Sc. degree in Computer Engineering from Islamic Azad University, Central Tehran Branch, Tehran, Iran, in 2007, and her M.Sc. degree in Computer Architecture Engineering from Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2010. Her research interests include the design and analysis of reliable and fault-tolerant networks-on-chip, and wireless and three-dimensional network-on-chip interconnections as a novel solution for future systems-on-chip.

Ahmad Khadem Zadeh received the B.Sc. degree in applied physics from Ferdowsi University, Mashhad, Iran, and the M.Sc. and Ph.D. degrees in digital communication and in information theory and error control coding, respectively, from the University of Kent, UK. He is currently the Head of the Education, National & International Scientific Cooperation Department at the Iran Telecom Research Center (ITRC), a Lecturer at Tehran University, and a committee member of the Iranian Computer Society and of the Iranian Electrical Engineering Conference Permanent Committee. He has been selected as the national outstanding researcher of the Iran Ministry of Information and Communication Technology.

Ali Afzali-Kusha obtained the B.Sc., M.Sc., and Ph.D. degrees from the Sharif University of Technology, Tehran, Iran, the University of Pittsburgh, USA, and the University of Michigan, Ann Arbor, MI, USA, respectively, all in electrical engineering. Since 1995, he has been with the Department of Electrical and Computer Engineering, University of Tehran, Iran, where he is currently an Associate Professor. He was a research fellow at the University of Michigan, USA, from 1994 to 1995, at the University of Toronto, Ont., Canada, in 1998, and at the University of Waterloo, Ont., Canada, in 1999.

Majid Janidarmian received his M.Sc. degree in computer architecture from the Science and Research Branch of IAU in 2009. He directs research on Network-on-Chip at SRBIAU, and his recent research interests include modeling and design of Networks-on-Chip with particular emphasis on application-specific architectures and low-power chip multiprocessor design, especially fault-tolerant router design for Network-on-Chip.