
Computer Engineering
Mekelweg 4,
2628 CD Delft
The Netherlands

http://ce.et.tudelft.nl/

2007

MSc THESIS

Communication-centric Debugging of Systems on Chip using Networks on Chip

Siddharth Umrani

Abstract

Faculty of Electrical Engineering, Mathematics and Computer Science

CE-MS-2007-11

Rapid technology scaling, i.e. shrinking feature size, means that a large number of components can be integrated on a single Integrated Circuit (IC). This increased complexity translates into an increase in design effort and potentially more design errors. Changes are therefore required in system-on-chip development that reduce both design effort and design errors. To reduce design effort, a modular design methodology is used that promotes reuse of already designed IP cores rather than the design of the IP cores themselves. The complexity of such a chip thus resides in the communication between these cores rather than in the computation taking place in them. The shrinking feature size also introduces Deep Sub-Micron (DSM) effects in on-chip interconnect wires. Networks on chip have since evolved as a promising new type of interconnect with the potential to alleviate these shortcomings.

Effective debug aids fast and accurate detection of the majority of the errors that may be present in the design, thus reducing the number of iterations in the design cycle (and effectively the time to market). Traditional debug is core-based: each of the IP cores in a SoC is the locus of debug actions. Communication-centric debug has been proposed as a complementary debug solution that uses the interconnect to debug the chip. A combination of these debug strategies might help speed up accurate error localization during debug, and thus make significant gains possible in reducing time to market. This thesis report presents a debug infrastructure that facilitates Communication-Centric Debug of a System on Chip using a Network on Chip.


A Debug Infrastructure

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

Siddharth Umrani
born in Thane, INDIA

Computer Engineering
Department of Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology



Laboratory : Computer Engineering
Codenumber : CE-MS-2007-11

Committee Members :

Advisor: Kees Goossens, CE, TU Delft and NXP Semiconductors

Advisor: Georgi Gaydadjiev, CE, TU Delft

Member: Zaid Al-Ars, CE, TU Delft

Member: Rene van Leuken, CAS, TU Delft


Dedicated to my parents and my brother Aditya


Contents

List of Figures

Acknowledgements

1 Introduction
1.1 Motivation
1.2 Goals
1.3 Previous Work
1.4 Organization of Report

2 Network-on-chip (NoC)
2.1 Introduction
2.2 Interconnect Terminology
2.3 Timeline of Interactions
2.4 Æthereal NoC
2.5 Network Interface

3 Debug
3.1 Introduction
3.2 Debug Flow
3.3 Debug Granularity

4 Communication Centric Debug
4.1 Introduction
4.2 Design choices
4.3 Debug Strategy for SoCs
4.4 Locus of communication-centric debug control
4.5 DTL Protocol
4.6 Debug Control Actions
4.7 Example

5 Debug Hardware Infrastructure
5.1 Overview
5.2 Monitors
5.3 Event Distribution Interconnect (EDI)
5.4 Test Point Registers (TPRs)
5.5 Network Interface Shell (NI Shell)
5.6 Test Access Port (TAP)
5.7 Debug Flow Automation

6 Debug Software Infrastructure
6.1 User programming via the TAP
6.2 Use of Debug Infrastructure
6.3 Debug Flow

7 Results
7.1 Programming the TPRs
7.2 EDI stop pulse distribution
7.3 Debug Control Actions in the shells
7.4 Area Cost and Speed

8 Conclusions
8.1 Conclusions
8.2 Future Work

Bibliography

A Constraints on External Stop Pulse

B List of Acronyms


List of Figures

2.1 IP and its port
2.2 Transactions (Read and Write)
2.3 Messages and Elements
2.4 Signal Handshake
2.5 Signal Groups and Signals
2.6 Connection and Channels
2.7 Communication - (a) Narrowcast (b) Multi-initiator
2.8 Hierarchies
2.9 Master and Slave IPs communicating with NoC as interconnect.
2.10 Timeline of Interactions (MNI - Master Network Interface, SNI - Slave Network Interface)
2.11 Æthereal NoC.
2.12 Æthereal connection.
2.13 Æthereal Network Interface.
2.14 Connections / channels in Æthereal.
2.15 Visible granularities of Interactions

3.1 Digital design flow (Source: [29]).
3.2 Real-time debug approach. In this scenario, internal signals are observed in real-time via external on-chip pins.
3.3 Scan-based debug approach. In this scenario, every time the chip reaches a quiescent state, the functional clocks can be stopped and the internal state read out.
3.4 Traditional scan-based debug flow (Source: [29]).
3.5 Proposed scan-based debug flow
3.6 (a) Granularity of internal NoC control. (b) Granularity of control between IP and NoC.

4.1 (a) Computation-centric debug (b) Communication-centric debug (Source: [37]).
4.2 Debug flow using Communication-centric debug
4.3 Locus of communication-centric debug control.
4.4 Debug control action interfaces (MNI-Master Network Interface, SNI-Slave Network Interface).
4.5 DTL Signals (Source: [30]).
4.6 Timeline for a Stop (MNI - Master Network Interface, SNI - Slave Network Interface)
4.7 Timeline for a Continue (MNI - Master Network Interface, SNI - Slave Network Interface)
4.8 Example illustrating the various debug actions over an IP-NoC interface
4.9 Example SoC showing connections setup

5.1 The Debug Infrastructure
5.2 Monitor Interface, where the monitor stop is connected to the EDI, link data to the router link which is to be monitored and monitor config to the monitorconfig TPR which specifies the breakpoint condition.
5.3 Breakpoint Generation logic inside a Monitor
5.4 Monitor gate-level waveforms for breakpoint hit
5.5 Standing wave creation in the EDI
5.6 Sub-sampling of a breakpoint hit pulse
5.7 Stop Module Interfaces, where N is the number of neighboring devices (other Stop Modules and NIs)
5.8 Stop Module FSM, where stop in is the logical OR of all N neighbouring input stop signals and stop out the output signal to all N neighbouring devices.
5.9 Stop Module waveforms for monitor stop
5.10 Stop Module waveforms for external user stop through TAP
5.11 Programming of the Monitor Config TPR
5.12 The internal structure of the NI-Shell TPR, which is imperative to know during programming in order to be able to programme the right value for the desired control.
5.13 Explains the function of Stop Enable field in the NI-Shell TPR
5.14 Behaviour when Stop Condition field is de-asserted in the NI-Shell TPR
5.15 Behaviour when Stop Condition field is asserted in the NI-Shell TPR
5.16 Behaviour when Stop Granularity field is de-asserted in the NI-Shell TPR
5.17 Behaviour when Stop Granularity field is asserted in the NI-Shell TPR
5.18 Continue operation
5.19 Explains the function of Continue field in the NI-Shell TPR
5.20 NI Shell FSM (Mirror State transitions)
5.21 Narrowcast Shell (in the FIFO shown the channel IDs of unfinished read requests are buffered)
5.22 Narrowcast Shell FSM (Request channels) - ’FSM 1’ in Figure 5.21
5.23 Narrowcast Shell FSM (Response channels) - ’FSM 2’ in Figure 5.21
5.24 Multiconnection Shell
5.25 Multiconnection Shell FSM - ’FSM’ in Figure 5.24
5.26 TAP and its associated infrastructure

6.1 Setup for performing control actions via the IEEE 1149.1 TAP
6.2 Interesting SoC debug points (MNI-Master Network Interface, SNI-Slave Network Interface).

7.1 Programming of the Monitor Config TPR
7.2 Programming of the NI Shell TPR
7.3 Stop Module gate-level waveforms for monitor stop
7.4 Stop Module gate-level waveforms for external user stop through TAP
7.5 Waveform for debug flow in a MNI
7.6 Request Stop in a MNI
7.7 Request Stop / Single-step / Continue in a MNI
7.8 Response channel stop in a MNI
7.9 Response Stop / Single-step / Continue in a MNI
7.10 Example SoC used during simulation and synthesis

8.1 Example registers that can be polled to decide on NoC quiescent state. [32]
8.2 Shows the scan-chain concatenation order for a stop-module network.
8.3 High-level back annotation from statedumps.

A.1 Timing diagrams showing minimum duration of external stop pulse


Acknowledgements

This report concludes my thesis project as part of my Master’s degree education in Computer Engineering at the Faculty of Electrical Engineering, Mathematics and Computer Science, Technical University of Delft, The Netherlands. The project, titled ”Communication-Centric Debugging of Systems on Chip using Networks on Chip: A Debug Infrastructure”, was carried out from October 2006 till July 2007 at the SoC Architectures and Infrastructures department of NXP Semiconductors, Eindhoven, The Netherlands.

I would like to thank Egbert Bol and Georgi Gaydadjiev, who were instrumental in arranging the financial assistance that allowed me to pursue this study and devote all my energies to it. I am grateful to my supervisors:

• Kees Goossens (NXP Semiconductors, SoC Architectures and Infrastructure)

• Bart Vermeulen (NXP Semiconductors, SoC Architectures and Infrastructure) and

• Georgi Gaydadjiev (Technical University of Delft, Computer Engineering Laboratory)

for providing me with the opportunity of working on this project. Their continued guidance and support throughout the project duration has contributed to the success of this project. The meetings at NXP Semiconductors with Bart and Kees always provided me with ever-broadening horizons in the quest for the problem solution. I would like to especially thank them both, as this valuable experience has been highly rewarding for me personally. Besides, I would also like to thank Andreas Hanson and Martijn Coenen for helping me with their expertise in the Æthereal network-on-chip and its automated design flow.

Heartfelt thanks also to all my colleagues at NXP Semiconductors. I would like to thank all the colleagues and friends that I have made during these two years: Shiva Krishna, Andres Garcia, Benny Fallica, Patrick van Wijnen, Mitas Nikos, Ali Karimi, Catalin Ciobanu, Bogdan Spinean, Arnoud van der Heijden among others. All those enjoyable / stressful moments we shared, whether studying late at night, drinking out in the bars, the barbecues or playing near the faculty parking, will stay with me throughout my life.

Last but not least, my sincere gratitude to my family, without whose continued support and encouragement this project and my master study would not have existed. I can never thank you enough for this. This work is dedicated to you, my small way of thanking you all.

Siddharth Umrani
Delft, The Netherlands
August 25, 2007


1 Introduction

1.1 Motivation

Rapid technology scaling, i.e. shrinking feature size, means that a large number of components can be integrated on a single Integrated Circuit (IC). This increased complexity translates into an increase in design effort and potentially more design errors. Changes are therefore required in system-on-chip development that reduce both design effort and design errors. To reduce design effort, a modular design methodology is used that promotes reuse of already designed IP cores rather than the design of the IP cores themselves. The complexity of such a chip thus resides in the communication between these cores rather than in the computation taking place in them. The shrinking feature size also introduces Deep Sub-Micron (DSM) effects in on-chip interconnect wires. Networks-on-chip have since evolved as a promising new type of interconnect with the potential to alleviate these shortcomings [12, 20, 40].

Effective debug aids fast and accurate detection of the majority of the errors that may be present in the design, thus reducing the number of iterations in the design cycle (and effectively the time to market). Traditional debug is core-based: each of the IP cores in a SoC is the locus of debug actions. Communication-centric debug [17, 37] has been proposed as a complementary debug solution that uses the interconnect to debug the chip. A combination of these debug strategies might help speed up accurate error localization during debug, and thus make significant gains possible in reducing time to market.

1.2 Goals

The principal objective of this project is to implement a debug infrastructure that will facilitate Communication-centric debug. Philips’ network-on-chip solution Æthereal was chosen as the interconnect on which the debug infrastructure is based.

Goals of the project:

• Define how Communication-centric debug is performed.

• Implement a debug infrastructure in order to achieve it.

• Integrate this infrastructure with the Æthereal design flow.

• Demonstrate the results, i.e. the implementation of the infrastructure, by simulation.

1.3 Previous Work

Monitoring services for networks-on-chip have already been proposed in [8, 9, 10]. A good overview of SoC debug is found in [22]. With regard to debug, a scan-based approach is used in [21], as is also done in our strategy. Present solutions for system-on-chip debug are core-based, e.g. ARM’s CoreSight [27], DAFCA’s Flexible Silicon Debug Infrastructure [11] and Philips’ Core-based Scan Architecture for Silicon Debug [29]. Ours, in contrast, is a communication-centric debug approach.

1.4 Organization of Report

This thesis report is organized as follows. In Chapter 2 we define the terminology used to describe communication taking place over the interconnect. We then describe the Network-on-Chip (NoC) and some important components of the Æthereal NoC. Chapter 3 motivates the need for debug in the SoC design flow and describes the various debug flows that are currently used. The concept of Communication-centric debug is detailed in Chapter 4, where we present a debug strategy for the SoC along with a proposal of how various debug actions will be performed in communication-centric debug. Our implemented debug infrastructure is explained in Chapters 5 (hardware) and 6 (software). Experimental results are given in Chapter 7. Finally, the report ends with conclusions and directions for future work (Chapter 8).

2 Network-on-chip (NoC)

2.1 Introduction

The shrinking feature size means that a larger number of components can be integrated onto a single chip. This translates into the integration of a greater number of IP cores on a single chip. The present-day design methodology for the increasingly complex System-on-Chip (SoC) is a modular one, which promotes reuse of already designed IP cores rather than the design of the IP cores themselves. The complexity of such a chip thus resides in the communication between these cores rather than in the computation taking place in them. The shrinking feature size also allows the on-chip interconnect wires to be routed even closer to each other. But this causes two wires routed in parallel to form a capacitive element, introducing crosstalk, interference, etc., otherwise known as DSM effects. Networks on chip have since evolved as a promising new type of interconnect with the potential to alleviate these shortcomings [12, 20, 40].

From a functional point of view, traditional interconnects have been serial arbitration-based [34], but with the evolution of SoCs with multiple IP cores and the ever-increasing demand for more on-chip communication bandwidth, parallel arbitration-based interconnects [2, 30] were developed. These interconnects, however, did not scale well enough to keep up with the exponential rise in the demand for on-chip communication bandwidth. Further research and development therefore led to the design of concurrent interconnects such as the multi-layer bus [3] and the Network-on-Chip [19, 12]. These interconnects allow concurrent communication between various IP cores in the SoC, yet are scalable. They are the most complex of interconnects, both in terms of control (as there is no single point of control for the communication over the interconnect) and complexity (since the number of elements involved in the interconnect itself is quite large). NXP Semiconductors has developed its own Network-on-Chip solution, Æthereal [16]. In Section 2.4 we detail some of the important architectural components of the Æthereal NoC and their functionality, but before that we define the terminology used to describe interaction over an interconnect.

2.2 Interconnect Terminology

In this section we define certain terms which are key to understanding the communication over an interconnect.

IP and its ports

Of two communicating IP blocks, the IP initiating the communication is known as the Master IP, while the other, responding IP is the Slave IP. As shown in Figure 2.1, every IP involved in communication does so via its port, known as the IP port. For ease of illustration, we do not explicitly show the ports in further diagrams, but they are assumed to be present.

[Figure 2.1: a Master IP Core and a Slave IP Core, each with an IP Port, with the communication taking place between the two ports.]

Figure 2.1: IP and its port

Transaction

A Master IP core communicates with other IP cores in a SoC by way of read and write operations. We define that an external read or write operation executed in an IP processor core takes place as a transaction over the interconnect. As shown in Figure 2.2, a write transaction is composed of a write request followed by the write data (and an optional write acknowledgement). A read transaction consists of the read request that is sent to the Slave IP and the read data sent in response by the Slave IP core back to the Master IP.



[Figure 2.2: a write transaction (Write Request and Data, followed by a Write Acknowledgement if required) and a read transaction (Read Request, followed by Read Data).]

Figure 2.2: Transactions (Read and Write)


Message

We define a message as a uni-directional communication within a transaction. From Figure 2.3 we can see that every transaction consists of one or more messages. In a write transaction without write acknowledgement, the entire transaction (request and data) is a single request message; if write acknowledgements are used, the response (write acknowledgement) is the second message. A read transaction is composed of two messages: the request (from the Master IP to the Slave IP) and the response, i.e. the read data (from the Slave IP to the Master IP).

[Figure 2.3: a write transaction and a read transaction broken down into messages and elements. Labels: Command, Data 1, Data 2, Data 3, Ack. Legend: Message, Element.]

Figure 2.3: Messages and Elements

Protocol, Signal groups and Signals

Direct communication between two IPs takes place in a language that is understood by both of them. This is known as the communication protocol. In the majority of protocols for on-chip communication, interactions are initiated and responded to via handshaking between the communicating IPs. Figure 2.4 shows such a handshake between two IPs. The IP that wants to initiate communication signals this by asserting the valid signal. The receiving IP acknowledges its acceptance by asserting the accept signal, which means that the target is also ready for communication. Only when both the valid and the accept signals are high are the two IPs considered to be communicating with each other.

[Figure 2.4: the valid, accept and data signals between an Initiator IP Core and a Target IP Core along a time axis, with t1 < t2 <= t3 < t4:

t1 - Initiator IP asserts the valid signal (means the Initiator wants to start communication)
t2 - Target IP sees that the valid signal is asserted (understands that the Initiator wants to start communication)
t3 - Target IP asserts the accept signal (means data is transferred)
t4 - Initiator IP sees the accept signal (understands that the Target has received the data it sent)
@ t4 + 1 the Initiator can deassert the value on data.]

Figure 2.4: Signal Handshake
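The handshake rule above can be illustrated with a small cycle-based sketch. It is only a model under stated assumptions: the signals are sampled once per clock cycle, and the function name `transferred_elements` and the example traces are hypothetical, not taken from any particular protocol.

```python
from typing import List, Sequence


def transferred_elements(valid: Sequence[int],
                         accept: Sequence[int],
                         data: Sequence[str]) -> List[str]:
    """Return the data values seen in cycles where valid and accept are both high.

    Assumption: all three traces are sampled once per clock cycle and have
    equal length; this is an illustrative model, not a specific protocol.
    """
    if not (len(valid) == len(accept) == len(data)):
        raise ValueError("traces must have the same length")
    return [d for v, a, d in zip(valid, accept, data) if v and a]


# Hypothetical trace: the initiator raises valid in cycle 1, the target
# accepts in cycles 2 and 3, so two values are transferred.
valid = [0, 1, 1, 1, 0]
accept = [0, 0, 1, 1, 0]
data = ["-", "addr", "addr", "wr1", "-"]
print(transferred_elements(valid, accept, data))  # prints ['addr', 'wr1']
```

In cycle 1 the initiator already drives valid and the address, but nothing is transferred because accept is still low; the value must be held until the cycle in which both signals are high.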

The valid and accept signals, together with certain other signals, perform a specific function and are known as a signal group. For example, the command signal group is used by the master IP to initiate a new transaction with a slave IP in the chip. It consists of valid, accept and data signals along with some other protocol-specific ones. All the signals in the command signal group together signal the transaction initiation. Only when both the valid and accept signals are asserted is the value on the data lines considered valid and taken as the slave’s address. Subsequently a different signal group is used for the actual transfer of data. In the case of a write transaction, the data is sent from the initiator (master IP) of the transaction to the target (slave IP) over a signal group known as write. For a read transaction, the data is sent as a response from the target of the transaction (slave IP) to the initiator (master IP) over a signal group known as read. A signal group used for data transfer consists mainly of valid, accept and data signals, among others. Figure 2.5 shows a few signal groups and the signals they contain. The exact names of the signals and the signal groups are protocol-specific and may differ between protocols, as may the number of signal groups and the exact signals that form each group.

[Figure 2.5: an Initiator IP Core and a Target IP Core connected by the command, write and read signal groups; a signal group is made up of signals such as valid, data and accept.]

Figure 2.5: Signal Groups and Signals

Element

In addition to the granularities described above, there is one other granularity that is observable independent of the underlying interconnect or the communication protocol used: the element. An element is a single valid-accept handshake. In the write transaction shown in Figure 2.3, a message from the Master to the Slave IP consists of multiple elements (viz. a command element and multiple data elements). Like a message, every element transfer is a uni-directional communication.
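The relation between transactions, messages and elements can be sketched as a small data model. This is a minimal illustration under assumptions: the class `Message`, the helper functions and the element labels ("command", "ack") are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """A uni-directional part of a transaction, made of one or more elements."""
    direction: str        # "request" (master to slave) or "response" (slave to master)
    elements: List[str]   # each entry stands for one valid-accept handshake


def write_transaction(data: List[str], acknowledged: bool = False) -> List[Message]:
    """Split a write transaction into its messages.

    The request message carries the command element followed by the data
    elements; an acknowledged write adds a response message.
    """
    messages = [Message("request", ["command"] + list(data))]
    if acknowledged:
        messages.append(Message("response", ["ack"]))
    return messages


def read_transaction(data: List[str]) -> List[Message]:
    """Split a read transaction into a request and a response message."""
    return [
        Message("request", ["command"]),
        Message("response", list(data)),
    ]


# Usage: a write of three data words without acknowledgement is one message
# with four elements; a read always consists of two messages.
print(write_transaction(["data1", "data2", "data3"]))
print(read_transaction(["data1", "data2", "data3"]))
```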

Connection and Channels

In an interconnect, the interactions for a read / write transaction take place either via a connection that is set up between the Master IP core and the Slave IP core, as shown in Figure 2.6 (a connection-oriented interconnect), or without one (a connection-less interconnect). In a connection-oriented interconnect (e.g. NoCs like Mango [6], Nostrum [26], Æthereal [16], FAUST [4]), the ordering of all communication entering the interconnect is preserved when it leaves the interconnect. For a connection-less interconnect (e.g. NoCs like [18, 5, 7]) this may not be the case. In short, providing QoS guarantees and ordering is easier in connection-oriented interconnects than in connection-less ones. As shown in Figure 2.6, a simple connection consists of two channels, viz. Request and Response, and every channel is uni-directional.



[Figure 2.6: a connection consisting of a Request Channel and a Response Channel.]

Figure 2.6: Connection and Channels

SoCs today have multiple IP cores, and each IP core may be required to communicate with multiple other cores. As depicted in Figure 2.7(a), a single Master IP may communicate with multiple Slave IPs. In a connection-oriented interconnect this is done by setting up a pair of channels for each master-slave pair and is known as a narrowcast connection [32]. Conversely, multiple Master IP cores may communicate with a single Slave IP core. This involves multiple (simple) connections being set up by multiple masters to the same slave (Figure 2.7(b)).

Terminology Hierarchies

[Figure 2.7: (a) a narrowcast connection from one master M to the slaves S1, S2 and S3; (b) simple connections from the masters M1, M2 and M3 to a single slave S. Legend: Connection, Channel, IP Core, M - Master, S - Slave.]

Figure 2.7: Communication - (a) Narrowcast (b) Multi-initiator

[Figure 2.8: hierarchy of the terms IP, Port, Connection, Channel, Transaction, Message, Element, Protocol, Signal Group and Signal.]

Figure 2.8: Hierarchies

Figure 2.8 shows the compositional hierarchy of the terms defined in this section. An IP communicates via its port and can have multiple ports. A port of an IP can have one or more connections established through it. For every connection, the master IP can initiate multiple transactions. In the case of a simple connection all transactions are with the same slave IP, whereas for a narrowcast connection they may be with different IPs. Every transaction is composed of one or more messages, and a message in turn consists of one or more elements. A simple connection is made up of two channels, viz. request and response, while a narrowcast connection has 2 channels (1 request and 1 response) per master-slave pair (2N channels in total, where N is the number of master-slave pairs).

An IP port has a protocol associated with it, using which it can communicate with other IPs that understand the same protocol. Protocols are composed of signal groups. These signal groups implement a handshake using valid / accept signals. A single valid / accept handshake corresponds to an element transfer. Hence an element could be a command, which is sent to initiate a transfer, or a data value.
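The channel count of a connection (two channels per master-slave pair) can be made concrete with a short sketch. The function name `channels_for_connection` and the tuple layout are assumptions made for illustration; a simple connection is treated here as the special case with a single slave.

```python
from typing import List, Tuple

# A channel is modelled as (kind, source, destination); this layout is an
# assumption for the example, not a structure defined in the thesis.
Channel = Tuple[str, str, str]


def channels_for_connection(master: str, slaves: List[str]) -> List[Channel]:
    """List the uni-directional channels of a connection.

    Each master-slave pair gets one request channel (master to slave) and
    one response channel (slave to master), giving 2N channels for N pairs.
    """
    channels: List[Channel] = []
    for slave in slaves:
        channels.append(("request", master, slave))
        channels.append(("response", slave, master))
    return channels


# Usage: a simple connection has 2 channels, a narrowcast connection to
# three slaves has 6.
print(len(channels_for_connection("M", ["S"])))
print(len(channels_for_connection("M", ["S1", "S2", "S3"])))
```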

2.3 Timeline of Interactions

Figure 2.9: Master and Slave IPs communicating with NoC as interconnect.

Figure 2.9 shows a master and a slave IP communicating, with the NoC as interconnect. The master IP communicates with the master network interface (MNI). The network then routes the data to the slave network interface (SNI) which in turn communicates with the slave IP. Figure 2.10 shows the timeline of a write transaction (and its messages / elements) and how valid and accept signals accomplish the completion of the transaction for the topology of Figure 2.9. The first two sets of traces are the handshakes that take place over the request channel (REQ (1)) from master IP - MNI and the remaining two correspond to handshakes over the request channel (REQ (2)) from SNI - slave IP. Every element transfer is essentially a valid-accept handshake. Only when both are asserted is an element transfer said to be complete. A message transfer on an interface is complete when all elements constituting that message have been transferred. Hence elements and messages are defined on each of the four interfaces (1–4) of Figure 2.9. On the other hand a transaction is defined end-to-end.
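The valid-accept rule described above can be illustrated with a minimal sketch. It assumes a simplified cycle-based model in which each cycle where both valid and accept are asserted transfers exactly one element; the `Cycle` type, the function names and the example traces are hypothetical and are not taken from the Æthereal or DTL specifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cycle:
    valid: bool   # driven by the sender of the signal group
    accept: bool  # driven by the receiver of the signal group
    data: str     # value on the data lines in this cycle


def completed_elements(trace: List[Cycle]) -> List[str]:
    """Return the data of every element whose handshake completed."""
    return [c.data for c in trace if c.valid and c.accept]


def message_complete(trace: List[Cycle], expected: List[str]) -> bool:
    """A message is complete once all of its elements have been transferred."""
    return completed_elements(trace) == expected


# Hypothetical command-group trace: the receiver accepts in the third cycle.
cmd_trace = [
    Cycle(valid=True, accept=False, data="addr"),
    Cycle(valid=True, accept=False, data="addr"),
    Cycle(valid=True, accept=True, data="addr"),
]

# Hypothetical write-group trace carrying three data elements.
wr_trace = [
    Cycle(valid=True, accept=False, data="wr1"),
    Cycle(valid=True, accept=True, data="wr1"),
    Cycle(valid=True, accept=True, data="wr2"),
    Cycle(valid=True, accept=True, data="wr3"),
]

cmd_done = message_complete(cmd_trace, ["addr"])
wr_done = message_complete(wr_trace, ["wr1", "wr2", "wr3"])
```

In this model an element that is offered (valid asserted) but not yet accepted does not count as transferred, which is why the first two command cycles contribute nothing.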

Shown in Figure 2.10 is a write transaction and how the command and write data are transferred from the master to the slave IP. At 1, the master IP signals to the MNI that it wants to initiate a transaction by asserting the cmd_valid signal and puts the address of the target on its cmd_data lines. This is the start of the transaction, request

[Figure 2.10 timing diagram: traces of the command signal group (cmd_valid, cmd_accept, cmd_data) and the write signal group (wr_valid, wr_accept, wr_data) at the Master IP, MNI, SNI and Slave IP, carrying addr, wr1, wr2 and wr3 over time points 1 to 14. A handshake is complete when v = a = 1. Annotations mark the command element, the write element, the request message (Master - MNI), the request message (SNI - Slave) and the write transaction.]

Figure 2.10: Timeline of Interactions (MNI - Master Network Interface, SNI - Slave Network Interface)

message and the command element transfer. The MNI sees this, and when ready signals its acceptance of the command by asserting the cmd_accept signal (at 2). So when both master IP and MNI see the cmd_valid and cmd_accept signals as being asserted, the command message transfer is said to be complete (at 3 in the timeline). This is one signal group (command) of the communication protocol. The data signal group is used for the transfer of data elements. In our example, the master IP has asserted the wr_valid signal of the write signal group even before the command message transfer is complete. But the data transfer starts only after the MNI asserts the wr_accept signal. This takes place as follows. At 4 the MNI signals the acceptance of data by asserting the wr_accept signal of the write signal group. Only then is the transfer of the first data element complete. The master IP then puts the second data element (if any) on the wr_data lines at 5. The data elements are transferred between 4 and 7. The first data element transfer starts at 1 and completes at 5, whereas the second data element transfer starts at 5 and completes at 6. Each of these intervals is the life-cycle of the respective element. The point 7 signals the end of transfer of all data elements and of the request message over the master IP - MNI interface. Then at 8 we see that the slave network interface (SNI) starts the transfer of the command message to the slave IP (cmd_valid goes high). The slave IP, when ready, accepts this command by asserting the cmd_accept signal (point 9). Between 11 and 14 the transfer of the write data takes place between the SNI and the slave IP. Hence on this SNI - slave IP interface, 8 – 10 is when the command element transfer takes place and the request message is transferred between 8 and 14. The point 14 also signals the completion of the write transaction between the master and the slave IP.

2.4 Æthereal NoC

Æthereal [33, 16, 14] is a connection-oriented NoC wherein the connections can be classified depending on the services they provide. Resources are reserved for Guaranteed Services (GSs) which include real-time and streaming traffic. Thus Æthereal can provide guarantees on throughput, latency, jitter for such Guaranteed Throughput (GT) traffic [15, 14]. To prevent resource under-utilisation and to maximize resource usage, Best Effort Services (BESs) are also provided by means of Best-Effort connections. In these connections, data is sent whenever there are free resources available since slots are not reserved [15].

The basic infrastructure of the Æthereal NoC (Figure 2.11) consists of Routers and Network Interfaces. The network interface is the component of the network which communicates directly with the IP cores. Different IPs may have different communication protocols. On the other hand, internally the NoC routes the data in the form of flits (for the format refer to [14]). Hence the network interface is where the conversion between these two formats takes place. The routers only perform the function of forwarding the data through the network from the source to the destination.

A connection (Figure 2.12) as defined for Æthereal is set up between ports of two or more Network Interfaces (NIs). The communication is initiated by the Master Network Interface Port (MNIP) and the receiving end is called the Slave Network Interface Port (SNIP). Further each connection consists of two channels viz. request and response channels. The communication from the MNIP to the SNIP takes place over the request channel and that back from the SNIP takes place over the response channel. Hence a

[Figure 2.11 diagram: a chip with Master IP Cores 1 and 2 and Slave IP Cores 1 and 2, each attached through an IP port to an NI port of a network interface; inside the network the network interfaces are connected to routers.]

Figure 2.11: Æthereal NoC.

transaction is on a connection whereas a message is sent over a channel. All connections and channels are virtual and configured over physical links connecting the various internal components (routers, NIs) of the NoC. Multiple connections can be set up between a master-slave IP pair with a single port at each end. These connections could e.g. provide different types of service (GT or BE).

2.5 Network Interface

Figure 2.13 shows a Network Interface (NI) of the NoC. On the one hand the network interface communicates with the IP Core and on the other with the Router. The communication with the IP Core takes place between the IP Port and the Network Interface Port (NI Port) in the IP protocol format. The network interface is composed of two major modules viz. the Network interface Shell (NiS) and the Network interface Kernel (NiK). A network interface has one network interface shell (NiS) per network interface port (NI Port) and only one network interface kernel (NiK). The communication between the NiS and the NiK takes place by way of messages. In the NiS it is the protocol adapters that


[Figure 2.12 diagram: a Master IP Core attached through its IP port to the MNIP of the Master NI, and a Slave NI whose SNIP attaches to an IP port; across the NoC, the request channel carries REQUEST from the MNIP to the SNIP and the response channel carries RESPONSE back. Legend: MNIP - Master Network Interface Port, SNIP - Slave Network Interface Port, NI - Network Interface.]

Figure 2.12: Æthereal connection.

perform the conversion between the IP protocol signals and this message format. The NiK then does the conversion between these messages and the Æthereal packet format. The Æthereal packets are then sent to the connected router. Every NiK has one port (Router port) over which it sends the Æthereal packets to the router and one or more NI Kernel ports which are used for communication with the network interface shells.

As previously explained in Section 2.4, a simple Æthereal connection is set up between two network interface ports. In case of a narrowcast connection, a narrowcast adapter is used. Consider Figure 2.14, where IP Core 1 communicates with IP Cores 3 and 4. This results in a narrowcast connection being set up in the NoC as shown (Connection 1). A narrowcast adapter is used in NiS 1 for this narrowcast connection. The narrowcast adapter converts the IP protocol signals into messages which are then routed to the correct destination depending on the target address. On the other hand, both IP Core 1 and 2 communicate with IP Core 4. Thus two connections (Connection 1 and Connection 2) are set up in the NoC, one each corresponding to a master-slave IP pair. This necessitates the use of a multi-initiator adapter in the network interface shell connected to IP Core 4. The messages over the two connections are converted into the IP protocol format and sent to IP Core 4. The multi-initiator adapter serializes the transactions sent to IP Core 4. In our example, every NI port has only one connection set up from / to it, but theoretically multiple connections can be set up.

In the NiK, there is the notion of channels. For every master-slave IP pair whose communication takes place through a particular kernel, there are two channels (request and response). In Figure 2.14 the NiK in network interface 1 has 6 channels, 4 for the narrowcast connection (connection 1) and 2 for the simple connection (connection 2). Similarly the NiK in network interface 2 also has 6 channels.
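The channel counts quoted for Figure 2.14 follow from the rule of two channels per master-slave pair. The sketch below recomputes them; the placement of IP Cores 1 and 2 on network interface 1 and of IP Cores 3 and 4 on network interface 2 is an assumption read from the description of the figure, and all names in the code are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CHANNELS_PER_PAIR = 2  # one request and one response channel

# (master IP, slave IP, NI of the master, NI of the slave); assumed placement.
PAIRS: List[Tuple[str, str, str, str]] = [
    ("IP Core 1", "IP Core 3", "NI 1", "NI 2"),  # Connection 1 (narrowcast)
    ("IP Core 1", "IP Core 4", "NI 1", "NI 2"),  # Connection 1 (narrowcast)
    ("IP Core 2", "IP Core 4", "NI 1", "NI 2"),  # Connection 2 (simple)
]


def channels_per_kernel(pairs: List[Tuple[str, str, str, str]]) -> Dict[str, int]:
    """Count the channels each NI kernel handles: two per master-slave pair."""
    counts: Dict[str, int] = defaultdict(int)
    for _master, _slave, master_ni, slave_ni in pairs:
        counts[master_ni] += CHANNELS_PER_PAIR
        counts[slave_ni] += CHANNELS_PER_PAIR
    return dict(counts)


kernel_channels = channels_per_kernel(PAIRS)
```

With these three pairs both kernels end up with 6 channels, matching the numbers given for network interfaces 1 and 2.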

Finally, in Figure 2.15 we show what granularities of an interaction are visible at various components / interfaces. This is vital to understanding our debug infrastructure and how the various debug actions are performed.

[Figure 2.13 diagram: a Network Interface between two IP Cores and a Router. The IP ports connect to the NI ports of the NI Shells (containing the protocol adapters) using IP protocol signals; the NI Shells connect to the NI Kernel ports using messages; the Router port of the NI Kernel connects to the Router using the Æthereal packet format.]

Figure 2.13: Æthereal Network Interface.

[Figure 2.14 diagram: IP Cores 1 and 2 attached to the NI Shells of Network Interface 1 (with a narrowcast adapter), IP Cores 3 and 4 attached to the NI Shells of Network Interface 2 (with a multi-initiator adapter), and the two NI Kernels connected through a Router. Connection 1 is a narrowcast connection and Connection 2 is a simple connection; each consists of request and response channels.]

Figure 2.14: Connections / channels in Æthereal.


[Figure 2.15 diagram: IP Cores 1 and 2 exchange valid, accept and data signals (command, write and read groups) with the NI Shells of a Network Interface; the NI Shells exchange messages and channel ids with the NI Kernel; the NI Kernel exchanges packets with the Router. Labels for the visible granularities of interactions: IP protocol, signal group, signal, transaction, message, element; message; packet, connection, channel.]

Figure 2.15: Visible granularities of Interactions

3 Debug

3.1 Introduction

With the increasing complexity of present-day Integrated Circuits (ICs), errors in design stages are unavoidable. Building an error-free design may thus require multiple iterations. This adversely affects the total design time for an IC. Also, decreasing product life-cycles make it imperative to minimize time-to-market. Figure 3.1 shows the possible errors (left column) in the different design phases (middle column) and which verification techniques (right column) are used to locate them. Despite all the verification at the design and manufacturing stages, some of the errors remain undetected. Debug is then used for localization of these errors. Effective debug can thus help detect the majority of the errors that may be present quickly and accurately.

[Figure 3.1 diagram: design errors in the high-level source are addressed by simulation and formal methods; synthesis errors (e.g. timing, logic) in the gate-level netlist by simulation, formal methods and timing verification; design rule violations in the layout by DRC (Design Rule Checker) and LVS (Layout vs. Schematic); manufacturing errors by manufacturing test; undetected design and manufacturing errors by debug.]

Figure 3.1: Digital design flow (Source: [29]).

Despite all the existing pre-silicon verification and test methods, more than 60% of the designs contain errors in their first-silicon prototype [29]. This high percentage highlights the fact that existing methodologies aren’t efficient enough to locate design and manufacturing errors in the prototype. The following reasons are cited in [29]:

• The pre-silicon verification methods are applied to a model of the IC. This model may not completely / accurately represent its actual physical behavior.

• If an accurate model is indeed made, then the computational costs involved hinder the exhaustive verification using the available methods.

Hence in order to minimize the time-to-market, the location of these undetected design and manufacturing errors in first-silicon becomes important. Design-for-Debug (DfD) has been proposed as an effective means to achieve this [13, 36]. The debugging of a chip can be compared to manufacturing tests, but there are some major differences, as outlined below, which further emphasize the importance of debug. In the testing environment, the test engineer would apply pre-defined test patterns through an Automated Test Equipment (ATE). The advantage with this methodology is that it is a lot easier to create a deterministic behavior than on an application board. Simulating the chip behavior and recording the responses is easier, but the responses are obtained when the chip is in test-mode and not in functional-mode. This isn’t quite the best scenario for finding / reproducing errors because some functional errors may not be visible in the test mode. In contrast, debugging of the chip is done in functional mode when it is part of the application board, in its operating environment where the probability of occurrence of errors is highest.

Three IC requirements are listed in [36] for an efficient, structured debug methodology, viz.

1. Access to the functional pins of the chips.

2. Access to the internal signals and memories of the chip.

3. Controlled execution of the chip.

For effective debugging, controllability and internal observability are vital. DfD modules as part of the structured debug methodology [36] provide the on-chip debug infrastructure for these. The observability could be real-time (by way of on-chip pins) or scan-based (state of the internal registers, flip-flops, etc. is scanned out).

• In real-time observability (Figure 3.2), internal signals are captured through external pins or in an on-chip memory trace. Examples are Philips’ SPY Method [38] and DAFCA’s Logic Debug Module [11]. Although this methodology gives the most accurate and up-to-date view of the chip state, it suffers from poor scalability. We propose to observe the network behavior, which may involve complex interactions, so the number of observable signals may be quite large. This will require either a large number of chip pins which is costly in terms of silicon area (multiplexers and trace memories) or a significant effort in selecting the appropriate signals which best represent the internal state of the chip.

• On the other hand, a scan-based approach (Figure 3.3) provides more internal observability as well as allows the debug engineer to control the functional behavior of the chip. This gives him greater flexibility which can speed up error location. The downside is that each time the state is scanned out only a snapshot of the state is obtained. Hence multiple snapshots are required in order to understand the functional behavior of the chip. This could be time-consuming, and it may be difficult to recreate / read out the state at the exact moment of sampling.

Figure 3.2: Real-time debug approach. In this scenario, internal signals are observed in real-time via external on-chip pins.

Considering the pros and cons of both real-time and scan-based debug, the greater scalability and control coupled with the re-use factor (the manufacturing test scan-chains can be re-used for debug scan of internal state) that scan-based debug offers makes it more attractive. Hence we have chosen to follow a scan-based debug strategy in our proposed debug infrastructure. This satisfies the second requirement in an IC for an efficient and structured debug methodology.

The third IC requirement for an efficient, structured debug methodology is the functionality for the debug engineer to control the execution of the chip. In debug this is done by way of debug control actions like stop, single-step and continue. Traditional core-based debug does this for instructions being executed on the IP cores. For our Communication-centric debug strategy, we implement these debug control actions for the communication taking place over the interconnect (explained later in Section 4.6).

3.2 Debug Flow

A scan-based debug flow is shown in Figure 3.4. Note that the resetting of the chip is only a functional reset. The breakpoint is programmed and then the user waits until a breakpoint hit takes place. Then he can read out the internal state like flip-flop values and memory content.

Figure 3.3: Scan-based debug approach. In this scenario, every time the chip reaches a quiescent state, the functional clocks can be stopped and the internal state read out.

Now we present our proposed debug flow as shown in Figure 3.5, which is a modified version of the scan-based debug flow. Instead of only programming the breakpoint (as in the scan-based debug flow), in our proposed debug flow the user also programs the debug control actions. Through these debug actions a controlled execution of the chip is possible. Before reading out any of the internal component values, it has to be made sure that the chip is in a quiescent state (i.e. there are no more ongoing interactions in the chip). Only after this can the functional clocks be safely stopped (without affecting / altering any functional behavior) and debug clocks switched on in order to scan out the internal state. The internal state like flip-flop values and memory content is read out. Then the user can program more debug actions (if he wants to debug further) and repeat the cycle; otherwise debugging is complete.
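The proposed flow can be summarised as a loop. The following is only a sketch under stated assumptions: `FakeChip` is a hypothetical stand-in for the chip under debug, and none of its method names correspond to an actual tool or to the infrastructure presented later; the optional functional reset between rounds is omitted.

```python
from typing import Dict, List


class FakeChip:
    """Hypothetical stand-in for the chip; it only records the steps taken."""

    def __init__(self, busy_polls: int = 2) -> None:
        self.log: List[str] = []
        self._busy_polls = busy_polls
        self._remaining = 0

    def functional_reset(self) -> None:
        self.log.append("functional reset")

    def program(self, actions: str) -> None:
        # Program the breakpoint and the debug control actions.
        self.log.append(f"program: {actions}")
        self._remaining = self._busy_polls

    def is_quiescent(self) -> bool:
        # Pretend interactions are still ongoing for a few polls.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def select_clock(self, name: str) -> None:
        self.log.append(f"clock: {name}")

    def scan_out(self) -> Dict[str, int]:
        self.log.append("scan out")
        return {"example_register": 0}


def debug_session(chip: FakeChip, rounds: List[str]) -> List[Dict[str, int]]:
    """Run one scan-based debug session; each entry of `rounds` is one cycle."""
    snapshots = []
    chip.functional_reset()
    for actions in rounds:
        chip.program(actions)
        while not chip.is_quiescent():   # wait until no interactions are ongoing
            pass
        chip.select_clock("debug")       # functional clocks stopped, debug clock on
        snapshots.append(chip.scan_out())
        chip.select_clock("functional")  # switch back before the next round
    return snapshots


chip = FakeChip()
snapshots = debug_session(chip, ["stop on one interface", "continue"])
```

The ordering is the point of the sketch: the state is scanned out only after the quiescence check succeeds and the clocks have been switched.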

3.3 Debug Granularity

In this section we explore the various visible granularity levels of communication, and their usefulness in the bigger picture of SoC debug.

For communication-centric debug, the debugging of the SoC is done by controlling the interactions between the various IP cores. Interactions are visible at different

[Figure 3.4 flowchart: functional reset; program breakpoints; wait until breakpoint hit; read out internal state; functional reset (optional); done.]

Figure 3.4: Traditional scan-based debug flow (Source: [29]).

[Figure 3.5 flowchart: functional reset; program breakpoint and debug control actions; wait until quiescent state of chip; switch from functional to debug clock; read out internal state; switch back to functional clock; program breakpoint and debug control actions; functional reset (optional); done.]

Figure 3.5: Proposed scan-based debug flow

granularities at various interfaces. [17] gives a detailed description of these. At the interface between the NoC and the IP cores the interaction can be viewed at the following granularities: cycle, instruction, element, message, transaction. From the viewpoint of the IP core a cycle or instruction level granularity is most relevant for useful debug. On the other hand, at the network side of the interface, interactions at clock, element, message, transaction and other levels can be observed. Figure 3.6(b) shows these various granularities of control between the IP and the NoC. Within the network itself, the interactions can be observed at various granularities shown in Figure 3.6(a) which are visible at different components of the network.

Figure 3.6: (a) Granularity of internal NoC control. (b) Granularity of control between IP and NoC.

Further in Section 4.6 we will explain which of the granularities are useful with respect to the locus of our debug control.

4 Communication Centric Debug

4.1 Introduction

With the increasing complexity of present-day System-on-Chip and the drive to integrate even more components to keep up with Moore’s law, building first-time error-free designs is difficult. As already explained in Section 3.1, the need for debug of silicon has been necessitated in order to accurately identify these undetected design and manufacturing errors. Furthermore the early detection of design errors reduces the number of re-spins required and hence reduces the time-to-market. The increased number of IP cores on a chip means that complexity of such a chip is resident in communication between these cores rather than in the computation taking place in them. Thus in order to debug such systems more effectively and help quick localization of errors, a Communication-centric debug strategy has been proposed [17, 37] which complements the traditional core-based (Computation-centric) strategy [22, 23, 25] that monitors and debugs the multiple IPs.

Figure 4.1(a) shows the Computation-centric debug strategy. The monitors are attached to the IP cores. Breakpoints are programmed in these monitors which generate an event on a breakpoint condition hit. The debug control then exercises control over the functional execution of the IP blocks. The IP blocks are stopped, their internal state is inspected and then the execution continued. This process is repeated until errors have been located.

Figure 4.1(b) illustrates the complementary Communication-centric debug strategy. Here the interconnect is the debug focus. The monitors trigger and generate events on breakpoint condition hit. The debug control will then control the functional behaviour of the interconnect. The interconnect can be stopped, its state observed and then the interactions continued. Thus instead of observing independent behaviour of the various IP blocks in the SoC, communication-centric debug allows the user to observe the different states of the IPs together in one place by way of the interactions between the IP blocks. The interconnect is the locus of these interactions and this is where we enforce debug control. Furthermore, as shown in Figure 4.1(b) the IP cores can still be monitored and debug control enforced if required.

4.2 Design choices

In this section we will elaborate on why certain design choices with respect to debug technique and the interconnect were made. In the proposed Communication-centric debug strategy, the interconnect is at the heart of the debug actions performed. We choose to use Network-on-chip as our interconnect for the following reasons:

• NoCs are commonly considered to be a promising new type of interconnect. They are a scalable solution both for issues related to SoC interconnect for deep sub-micron technologies and for concurrency in interactions between SoC IP blocks.

• A NoC-based solution for effective and efficient debug communication control can be more readily ported to a single or multi-layered bus system, than the other way around.

• A NoC poses the maximum complexity with respect to parallelism, latency and scheduling. Hence choosing a NoC helps magnify any problems related to actual debug control mechanisms for the interconnect.

Figure 4.1: (a) Computation-centric debug (b) Communication-centric debug (Source: [37]).

NXP Semiconductors has developed a Network-on-Chip called Æthereal which was taken as the interconnect on which we developed a debug infrastructure. A NoC is a complex interconnect with a lot of internal registers. To observe its internal state, the amount of internal state data would be quite large. In real-time debug, the number of observable signals would be too large and hence not scalable. Scan-based debug on the other hand offers scalability and re-use of manufacturing test scan-chains. The only worry is the time taken to scan out the internal state using the IEEE 1149.1 TAP. The test clock runs at 10 MHz, hence scanning out around 38000 registers in an example NoC would take approximately 4 milliseconds. This is an acceptable speed considering it is the user who is going to observe this and then take an action depending on his diagnosis, which will take much longer comparatively. Considering the above arguments, a scan-based debug was chosen.
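The scan-out estimate can be checked with a short calculation. This sketch assumes that each of the roughly 38000 registers contributes one scan bit, that one bit is shifted out per test clock cycle, and that any TAP controller overhead is negligible; the function name is hypothetical.

```python
REGISTERS = 38_000       # scan bits in the example NoC (figure from the text)
TCK_HZ = 10_000_000      # 10 MHz test clock (figure from the text)


def scan_out_seconds(bits: int, clock_hz: float) -> float:
    """Time to shift `bits` bits out serially at one bit per clock cycle."""
    return bits / clock_hz


scan_time = scan_out_seconds(REGISTERS, TCK_HZ)
print(f"{scan_time * 1e3:.1f} ms")  # prints "3.8 ms"
```

Under these assumptions the result is 3.8 ms, which is consistent with the figure of approximately 4 milliseconds.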

4.3 Debug Strategy for SoCs

In this section we present a debug strategy for localization of errors using Communication-centric debug. Figure 4.2 shows a flow diagram of this. The user starts by observing the interactions over the interconnect (in our case the NoC). With the debug architecture we have developed and also the various granularities (Section 2.2) of observability, he can localize the cause of the error as either the NoC itself or one or more IPs. From here on, he can go on and debug the NoC itself or the IP using their built-in debug infrastructure (traditional core-based debug). The big advantage with using the NoC as the starting point for debug is that the level of examination can be raised from bits and cycles to elements / messages / transactions. This makes it easier for the user to interpret what is going on inside the IC and correlate the states of the various IPs through their interactions. The higher abstraction levels also allow for comparison of the simulation results at a level (transaction) that is consistent for both hardware and software.

Figure 4.2: Debug flow using Communication-centric debug

4.4 Locus of communication-centric debug control

The interconnect is at the heart of the communication-centric strategy’s debug control actions. The errors are located by observing and controlling the interactions between the various IPs over the interconnect. As shown in Figure 4.3 the communication between the IPs and the interconnect occurs between the IP port and the network interface port. The network interface port is connected to the Network Interface Shell. Hence it is in the network interface shell that we implement our debug control intelligence. Furthermore the communication between the IPs and the network interface shell takes place in the IP protocol suite. Our debug control is implemented by gating the appropriate control signals of these protocol suites which as a result enforces the required debug control over the interactions. IPs use various protocol suites like DTL [30], AXI [24] and OCP [28]. For our simulations and results we have used IPs that communicate using the DTL protocol. In the following section we will give a brief introduction of the protocol communication and the various signal groups.

[Figure 4.3 diagram: a chip in which Master IP Core 1 and Slave IP Core 2 attach through their IP ports to the NI ports of network interfaces; each NI port leads to a Network Interface Shell (NIS) and the network interfaces connect to a router inside the network. Legend: NIS - Network Interface Shell (this is where the debug control intelligence is implemented); marked interfaces are those over which we enforce debug control actions (the communication over these interfaces takes place in the IP protocol suite, the control signals of which are gated while enforcing debug control).]

Figure 4.3: Locus of communication-centric debug control.

Since we implement the debug control functionality at the network boundaries (in the network interface shells), we implement this control at the granularities visible in the network interface shell. The network interface shell is the network’s window to its connected IP cores and vice-versa. A globally consistent view of the SoC is obtained at transaction level [17], hence control at a transaction level between a master-slave IP pair follows naturally. For a transaction, there are four interfaces over which debug control actions can be performed using the interconnect. These are shown in Figure 4.4 by numbers 1 – 4. At each of these interfaces, control at a message or an element-level is possible. In Section 5.4 we interpret these granularities in terms of programming of the proposed debug infrastructure and show the implementation results in Section 7.

Figure 4.4: Debug control action interfaces (MNI - Master Network Interface, SNI - Slave Network Interface).

4.5 DTL Protocol

DTL is an on-chip communication protocol developed by Philips. In our setup the IP cores communicate with the NoC using the DTL protocol and hence DTL Protocol Adapters are used in the network interface shells. In DTL, communication is always initiated by a DTL Initiator (which is the master) with a DTL Target. As shown in Figure 4.5 there are various signal groups. Important among these are the command, write and read groups. The command group is used to initiate a communication while the write and read groups are used to transfer the write and read data respectively. DTL is a handshake-based protocol and each of the signal groups has its own independent handshake signals. The initiator, when wanting to initiate a communication, uses the valid signal of the command signal group and the target responds with an accept when it is ready. Only then are the values of the various other signals assumed to be valid. In the setup in Figure 4.4, the Master IP and the network interface communicating with the Slave IP (SNI) would act as DTL Initiators whereas the Slave IP and the network interface connected to the Master IP (MNI) would be DTL Targets. [30] gives a detailed description of the various data transfer modes and the timing diagrams involving DTL communication.

4.6 Debug Control Actions

With reference to the proposed novel Communication-centric debug strategy [17, 37], we provide the following debug control actions to the user:

• Stop

• Continue

• Single-step

• Scan in/out internal data


Figure 4.5: DTL Signals (Source: [30]).

The NoC is the interconnect in our chip and during communication-centric SoC debug the debug control actions are performed on communication that takes place between the NoC and the IP cores. In Section 6.2 we show how the various debug actions can be programmed in our debug infrastructure with the available options and that they suffice to enable effective SoC debug (Section 7).


Stop

For a stop on an IP-NoC interface, contrary to stopping as implemented in traditional IP core debug where the functional clocks are gated, we gate the valid-accept handshake of the protocol. Figure 4.6 shows how a stop on the REQ (2) interface is completed with respect to the gating of a valid-accept handshake for the topology of Figure 4.4. On the first interface between the master and MNI (REQ (1)), there is no stop and as shown in Figure 4.6, both command and write data elements are transferred. But on the interface between the SNI and the slave IP (REQ (2)) a stop is obtained. This is done by gating the cmd_valid signal from the SNI to the slave IP as shown in Figure 4.6. In this case the slave IP is ready to accept (cmd_accept is asserted) but since there is no cmd_valid from the SNI no command transfer takes place. Since we want to stop the interaction on the interface REQ (2), the wr_valid signal is also gated. This ensures that on the REQ (2) interface no elements will be sent and a message-level stop is achieved.

On every IP-NoC interface such a valid-accept gating can be done per message or element level. Also a stop on only interface REQ (1) in Figure 4.4 imposes a transaction-level stopping. To sum up, stop can be achieved on each of the four interfaces (1–4) in Figure 4.4 allowing for a transaction / message / element-level stop.
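The gating described above can be modelled in a few lines. This is a behavioural sketch only, assuming that gating simply forces the valid signal seen by the receiver low while a stop is active; the function names `gate` and `req2_transfers` are hypothetical and do not reflect the actual hardware implementation.

```python
from typing import Dict


def gate(valid: bool, stop: bool) -> bool:
    """Valid signal as seen by the receiver: forced low while stop is active."""
    return valid and not stop


def req2_transfers(cmd_valid: bool, cmd_accept: bool,
                   wr_valid: bool, wr_accept: bool,
                   stop: bool) -> Dict[str, bool]:
    """Which signal groups complete a handshake on the SNI - slave IP interface."""
    return {
        "command": gate(cmd_valid, stop) and cmd_accept,
        "write": gate(wr_valid, stop) and wr_accept,
    }


# The slave is ready (both accepts asserted) and the SNI has data to send.
running = req2_transfers(True, True, True, True, stop=False)
stopped = req2_transfers(True, True, True, True, stop=True)
```

With `stop` set, neither the command nor the write group completes a handshake even though the slave IP is ready, which corresponds to the message-level stop on REQ (2).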

Continue

Another important debug functionality is the ability to continue a stopped SoC. This is complementary to the stop functionality, and together they give the debugger control over the functional execution of the chip. The chip is functionally continued by undoing the gating of the control signals (which was done to stop the chip); this is achievable at each IP-NoC interface. In Figure 4.7 we show the timeline for a continue after a stop has been achieved. When the SNI asserts the cmd_valid signal to the slave IP on the REQ (2) interface in Figure 4.4, a continue is achieved. The write data is also transferred (wr_valid is asserted). In this way the entire message is transferred from the SNI to the slave IP. At this point the write transaction that was started by the master IP is complete.

Single-Step

Single-stepping can be viewed as the combination of the above two functionalities: a single-step operation is equivalent to issuing a continue with an implicit stop. Traditional single-stepping is at clock-cycle granularity. Our debug infrastructure allows single-stepping at message or element-level granularity on each of the interfaces between the NoC and the IP cores ((1) – (4) in Figure 4.4). A transaction-level single-step can be achieved at interface (1) in Figure 4.4. Single-stepping is achieved by undoing the gating of the valid-accept handshake and then gating it again. A single-step can be achieved independently for each interface between the NoC and the IP cores.
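The relation between stop, continue and single-step can be sketched at message level as follows. This is a minimal Python model under stated assumptions, not the NI shell logic itself: the class `GatedInterface`, its method names and the message labels are hypothetical, and one call to `tick` stands for one opportunity to transfer a message over the interface.

```python
from collections import deque


class GatedInterface:
    """Message-level model of one IP-NoC interface with a stop gate (hypothetical)."""

    def __init__(self, messages):
        self.pending = deque(messages)  # messages still waiting in the sender
        self.delivered = []             # messages that crossed the interface
        self.stopped = False
        self.step_budget = 0            # messages allowed through while stopped

    def stop(self):
        self.stopped = True
        self.step_budget = 0

    def cont(self):
        self.stopped = False

    def single_step(self):
        # A continue with an implicit stop: exactly one message may pass.
        self.stopped = True
        self.step_budget = 1

    def tick(self):
        """Try to transfer one message; return it, or None if gated or idle."""
        if not self.pending:
            return None
        if self.stopped:
            if self.step_budget == 0:
                return None
            self.step_budget -= 1
        msg = self.pending.popleft()
        self.delivered.append(msg)
        return msg


iface = GatedInterface(["M1", "M2", "M3"])
print(iface.tick())   # M1 passes while the interface runs freely
iface.stop()
print(iface.tick())   # None: the handshake is gated
iface.single_step()
print(iface.tick())   # M2 passes
print(iface.tick())   # None: the implicit stop is in force again
iface.cont()
print(iface.tick())   # M3 passes
```

Modelling the single-step as a one-message budget on top of the stop keeps the three actions consistent with the description above: a single-step never leaves the interface running.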

[Figure 4.6 (timing diagram, time axis t): signal groups cmd_valid, cmd_accept, cmd_data, wr_valid, wr_accept and wr_data on the interfaces between Master IP, MNI, SNI and Slave IP, carrying addr, wr1, wr2 and wr3; v=a=1 marks a completed handshake. Annotations: no cmd_valid signal is sent by the SNI to the slave IP even though the slave is ready (cmd_accept = 1); no wr_valid signal is sent by the SNI to the slave IP even though the slave is ready (wr_accept = 1); thus a stop is achieved.]

Figure 4.6: Timeline for a Stop (MNI - Master Network Interface, SNI - Slave Network Interface)

[Figure 4.7 (timing diagram, time axis t): signal groups cmd_valid, cmd_accept, cmd_data, wr_valid, wr_accept and wr_data on the interfaces between Master IP, MNI, SNI and Slave IP, carrying addr, wr1, wr2 and wr3; v=a=1 marks a completed handshake. Labels on the timeline: Write transaction started, Stop, No interactions, Continue, Write transaction complete.]

Figure 4.7: Timeline for a Continue (MNI - Master Network Interface, SNI - Slave Network Interface)

Scan out internal data

With any debug methodology, lack of internal observability is a key issue. In the scan-based methodology, the internal state of all the registers, flip-flops and memories is dumped into either an on-chip memory or an external memory, from where it can be accessed by the debug software and the bits reconstructed back to the desired level of abstraction (register-level, transaction-level, application-level, processor instruction-level, etc.). In order to scan out the data a dedicated interconnect can be used. Alternatively, the NoC itself can be re-used, or the manufacturing-test scan chains, which are accessible via the IEEE 1149.1 TAP [35]. Scan chains are already used during testing for manufacturing defects [1]. Hence the IEEE 1149.1 scan-based manufacturing test infrastructure (which consists of the scan chains, the TAP and its controller) is re-used during debug for scanning out the internal state of the NoC, instead of re-using the NoC or a dedicated interconnect [21, 31, 39, 37].

[Figure 4.8 (diagram): a chip in which Master IP Core 1 sends messages M1 to M9 through an NI port into the network (NoC) of routers and NI ports; the panels are labelled After Stop, Single Step, Continue and Scan out.]

Figure 4.8: Example illustrating the various debug actions over an IP-NoC interface

In Figure 4.8 we show the various debug actions at message-level granularity over an IP-NoC interface. On the occurrence of a stop at the interface, the messages that are still to be transferred to the network remain in the IP and those in transit end up in the network. In the example shown, message M6 was in transit between the IP core and the network edge when a stop signal was distributed and received by all edges of the network. Hence the following message, M7, stays in the IP core as illustrated. On a message-level single-step action the next message, here M7, is sent from the IP to the network before another stop is enforced on the interface. A later continue action restarts the normal functional behaviour of the chip. In between these actions, when the chip is in a quiescent state, i.e. there is no more communication taking place over and in the interconnect (NoC), the internal state of the NoC can be scanned out via the TAP. In this way various debug actions can be performed on the communication infrastructure at various granularities. In Section 7 we show how these actions are enforced at some of the useful granularity levels during debug using our debug infrastructure.

4.7 Example

In this section we illustrate what the debug actions translate to in terms of interaction on the interface between the IP and the NoC, and the various granularities of debug control for the SoC.

In a NoC, the communication between connected master and slave IPs can be described by the following scenarios:

1. Scenario 1: A simple connection (1 Master – 1 Slave)

2. Scenario 2: A narrowcast connection (1 Master – ≥ 1 Slave)

3. Scenario 3: Multi-initiator communication (≥ 1 Master – 1 Slave)

Any other scenario is a combination of the above scenarios. For every connection present in the above communication scenarios, a transaction / message / element-level granularity of debug control can be enforced. In the topology shown in Figure 4.9, two masters communicate with two slaves by setting up connections through the NoC as shown. Connection 1 is a narrowcast connection, while connection 3 is an example of a simple connection. Master IPs 1 and 2 communicate with slave IP 2 by setting up connections 1 and 2 respectively to network interface 2 (NI 2). This is the scenario of multi-initiator communication.

Scenario 1

[Figure 4.9 (diagram): a chip whose network connects the IP cores through network interfaces NI 1 to NI 4; the IP-NoC interfaces are labelled Request (1), Request (2), Response (3), Response (4), Request (5), Request (6), Response (7) and Response (8).]

Figure 4.9: Example SoC showing connections setup

For a simple connection (connection 3), the user can exercise control over interfaces 5, 6, 7 and 8. For transaction-level debug the valid-accept handshakes on the request interface (5) are controlled. Consider a stop: no further transactions from master IP 2 are accepted by the network interface (NI 4) (done by gating the accept signals to the IP), and all unfinished transactions (those already accepted by NI 4) are allowed to complete, i.e. all write transactions are completed and for all read transactions the read data returns to master IP 2. Only then do we say that the stop is complete. On the same interface (5), a message / element-level stop can also be achieved. During such a stop, an ongoing message / element transfer on that interface is completed and the next message / element is stopped from entering the network. This is done by gating the accept signal of the appropriate signal groups. Further, on each of the remaining interfaces (6 – 8) a message / element-level stop can also be achieved. This is done by gating the accept signal of the appropriate signal groups from the NIs on interface 7, and the valid signal of the appropriate signal groups from the NIs on interfaces 6 and 8. From a SoC point of view:

• A transaction-level stop is possible only on one specific interface (the request interface from the Master IP).

• Any combination (as thought useful by the user) of message / element-level stops for each of the four interfaces can be achieved independently.

From the NoC point of view:

• For a simple connection - a transaction-level stop can be obtained.

• For each channel of a simple connection - a message / element-level stop is possible on each end of the channel.

Additionally, continue and single-step debug actions are also possible at the desired granularity.

Scenario 2

This scenario shows a narrowcast connection (connection 1) between a single master (master IP 1) and two slaves (slave IPs 1 and 2). In our debug infrastructure, a transaction-level debug control action can be achieved for every master-slave IP pair in a narrowcast connection. A transaction-level stop is done by gating the accept signals from the network interface (NI 1) to master IP 1 (over interface (1) in Figure 4.9). For example, the user can stop transactions only for slave IP 2. This is done by gating the accept signals from the network interface (NI 1) to master IP 1 when a transaction with slave IP 2 as its destination address arrives. But since the transactions over interface (1) are serialized, a stop on the first transaction for slave IP 2 also blocks all transactions after it (even if they are for slave IP 1). All unfinished transactions (those already accepted by NI 1) for both slave IPs (1 and 2) are allowed to complete. Furthermore, a message / element-level stop is possible on each of the interfaces 1, 2, 3, 4, 6, 7 independently (again by gating the appropriate valid / accept signal). From a SoC point of view:

• A transaction-level stop is possible in a narrowcast connection for every master-slave IP pair. But this stop is not independent, i.e. a stop for one pair implies a stop for all other pairs as well, since all pairs share one specific interface (the request interface from the Master IP) on which this stop is achieved.

• Any combination (as thought useful by the user) of message / element-level stops for each of the six interfaces can be achieved independently.

From the NoC point of view:

• For a narrowcast connection - a transaction-level stop is possible for each master-slave IP pair of that connection.

• For each channel of a narrowcast connection - a message / element-level stop is possible on each end of the channel.

Additionally, continue and single-step debug actions are also possible at the desired granularity.
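The effect of serialization on a per-slave transaction-level stop in a narrowcast connection can be made concrete with a short Python sketch. The function `accepted_before_stop` and the representation of a transaction by its destination slave number are assumptions made for illustration only.

```python
def accepted_before_stop(transactions, stopped_slave):
    """Return the transactions NI 1 accepts before the stop takes effect.

    transactions: destination slave numbers, in the order the master issues them.
    stopped_slave: destination whose transactions are to be stopped.
    """
    accepted = []
    for dest in transactions:
        if dest == stopped_slave:
            # Accept is gated here; everything queued behind this transaction waits too.
            break
        accepted.append(dest)
    return accepted


# Master IP 1 issues transactions for slave 1, 1, 2, 1, 2; the user stops slave 2.
print(accepted_before_stop([1, 1, 2, 1, 2], stopped_slave=2))  # [1, 1]
```

The fourth transaction is addressed to slave IP 1 but is still blocked, because it sits behind the stopped transaction for slave IP 2 on the shared request interface.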

Scenario 3

This scenario shows multiple master IPs communicating with a single slave IP. Each master IP sets up a simple connection to the slave (connections 1 and 2). In our debug infrastructure, transaction-level debug control is possible for each of these connections. For a transaction-level stop, no transactions from master IP 1 to network interface NI 1 (over interface (1) in Figure 4.9) and from master IP 2 to network interface NI 4 (over interface (5) in Figure 4.9) are accepted by the respective network interfaces. This is done by gating the accept signals from the network interfaces. All unfinished transactions (those already accepted by NI 1 and NI 4) are completed. This is like stopping two separate connections at the same time. But the two connections involved in the above stop can also be stopped independently of each other, i.e. a transaction-level stop on only one connection while the other continues normally. A message / element-level stop is possible on each of the interfaces 1, 2, 3, 4, 5, 8 independently (again by gating the appropriate valid / accept signal). Suppose a message / element-level stop is achieved over interface (3) for those messages meant for master IP 1. As the messages are serialized, a stop on the first message / element for master IP 1 also blocks all further messages for master IP 2 from being accepted by the network interface (NI 2). From a SoC point of view:


• A transaction-level stop is possible for every master-slave IP pair. This is independent, i.e. a stop for one pair does not imply a stop for the other pairs.

• Any combination (as thought useful by the user) of message / element-level stops for the six interfaces can be achieved independently.

From the NoC point of view:

• For a multi-initiator communication - a transaction-level stop is possible for each master-slave IP pair independently of the others. This is because each pair has a separate connection.

• For each channel in a multi-initiator communication - a message / element-level stop is possible on each end of the channel.

Additionally, continue and single-step debug actions are also possible at the desired granularity. This scenario is a case of multiple simple connections with the same destination.

5 Debug Hardware Infrastructure

5.1 Overview

In order to implement the ideas of the previous chapters we introduce the following debug infrastructure:

• Monitors with their Breakpoint generators

• Event Distribution Interconnect (EDI)

• Test Point Registers (TPRs)

• Network Interface Shell (NI Shell)

• Debug Control Interconnect (DCI) and

• Debug Data Interconnect (DDI)

Figure 5.1 shows a block-level view of our debug infrastructure with the designed debug components (the dotted modules) and their location and interaction with the other components of the SoC. The proposed debug methodology is communication-centric, hence most of the debug components are located in the communication infrastructure (in our case the NoC). The TPRs can all be programmed via the IEEE 1149.1 TAP. A Monitor Config TPR is instantiated for every Monitor present, which allows the user to program a different breakpoint condition in each of the Monitors. The Breakpoint Generator (BP Gen) is the actual hardware inside each Monitor that generates a breakpoint hit pulse, which is fed to its attached Stop Module. The Stop Modules are instantiated per Router in the network and follow the topology of the routers. The stop modules with their distribution network form the Event Distribution Interconnect (EDI), which distributes the generated breakpoint pulse to all the network components (routers, kernels and shells). It is in the network interface shells (specifically the finite state machines (FSMs)) where the debug action decision is made. Further, the test infrastructure, which consists of the Test Access Port (TAP) and its controller, is used to program the TPRs or to give an external stop; it is also known as the Debug Control Interconnect (DCI). The Debug Data Interconnect (DDI), which again consists of the TAP, its controller and the inserted scan chains, is used to scan out the internal state of the NoC. In the following sections the architecture, functionality and properties of each of the debug components are detailed.

5.2 Monitors

[Figure 5.1 (block diagram): a chip with boundary-scan hardware, the TAP controller and the Test Access Port (TAP); the network with Routers, NI Kernels, NI Shells (each with an FSM and an NI Shell TPR) and NI ports; and a Monitor containing the BP Gen, with its Monitor Config TPR and Stop Module; the TPRs connect to the TPR chain.]

Figure 5.1: The Debug Infrastructure

In [10] a method for the automatic insertion of monitors into the Æthereal design flow has been proposed. This gives 100% channel observability and can monitor each of the router links at four levels of abstraction, viz. physical raw, logical connection-based, transaction-based and transaction event-based. In our infrastructure we use a very simplified version of the monitors in [8], which are automatically generated per router in the Æthereal design flow. They can be attached to any one of the router's links and monitor the raw data over that link. When the breakpoint condition (which is programmed in the associated Monitor Config Test Point Register (TPR)) is met, the monitor generates an active-high pulse which stays high as long as the breakpoint condition remains true.

Figure 5.2 shows the interface of the monitor used in our debug infrastructure. The monitor has clk and rst_n as its standard inputs. The link_data input is connected to the router link which is to be monitored. This, together with the data on the monitor_config input, which is connected to the Monitor Config TPR, is used to determine a breakpoint hit and produce a pulse on the output pin monitor_stop. This output is connected to the EDI, which distributes the event.

[Figure 5.2 (block diagram): Monitor with inputs clk, rst_n, monitor_config[32:0] and link_data[33:0], and output monitor_stop.]

Figure 5.2: Monitor Interface, where monitor_stop is connected to the EDI, link_data to the router link which is to be monitored and monitor_config to the Monitor Config TPR which specifies the breakpoint condition.

The internal structure of the monitor can be visualized as similar to the one shown in Figure 5.3. A comparator compares the monitor_config and link_data values, and outputs a '1' on monitor_stop as long as the condition remains true.

[Figure 5.3 (block diagram): a clocked Comparator (clk) with inputs monitor_config[32:0] and link_data[32:0] and output monitor_stop.]

Figure 5.3: Breakpoint Generation logic inside a Monitor

Figure 5.4 shows gate-level traces for a monitor. Here, on a breakpoint condition (when link_data is equal to the specified monitor_config value), a pulse is generated on monitor_stop, which in this case is one clock cycle wide. The pulse is generated one clock cycle after the specified data is seen on the router link, since the output signal is generated from the internally registered value. In our example, a value of 'h10000015B has been programmed as the breakpoint condition (the value on monitor_config). When the link_data value matches the programmed one, a breakpoint hit pulse is generated on monitor_stop.
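The one-cycle delay between the matching word on the link and the breakpoint pulse can be reproduced with a cycle-level Python sketch. It is a behavioural model under stated assumptions, not the generated monitor hardware: the function `simulate_monitor` is hypothetical, the internal register is assumed to reset to zero, and the link words are the values visible in the trace of Figure 5.4.

```python
def simulate_monitor(monitor_config, link_words):
    """Return (cycle, link_data, monitor_stop) tuples for a registered comparator."""
    link_data_r = 0  # internal register, assumed to be cleared at reset
    trace = []
    for cycle, link_data in enumerate(link_words):
        # The comparison uses the registered word, so the output lags the link by one cycle.
        monitor_stop = int(link_data_r == monitor_config)
        trace.append((cycle, link_data, monitor_stop))
        link_data_r = link_data  # register update at the end of the cycle
    return trace


words = [0x100800001, 0x20000015A, 0x10000015B, 0x000000106, 0x100800001, 0x20000015C]
for cycle, link_data, stop in simulate_monitor(0x10000015B, words):
    print(f"cycle {cycle}: link_data={link_data:09X} monitor_stop={stop}")
```

The matching word appears on the link in cycle 2 and monitor_stop is high in cycle 3 only, giving the one-clock-cycle-wide pulse described above.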

[Figure 5.4 (SimVision waveform, 45.34 us to 45.39 us): signals clk, link_data, link_data_r, monitor_config, monitor_stop and rst_n; link_data carries 100800001, 20000015A, 10000015B, 000000106, 100800001, 20000015C, link_data_r carries the same words one clock cycle later, and monitor_config holds 'h10000015B.]

Figure 5.4: Monitor gate-level waveforms for breakpoint hit

5.3 Event Distribution Interconnect (EDI)

The Event Distribution Interconnect (EDI) is used to distribute events from the event generators (e.g. monitors, TAP controller) to the various components of the SoC which need to respond to such events. The distribution of an event should take place as fast as possible (ideally with a single-cycle delay), so that the responding components react as close in time as possible to the event that triggers them. But such an implementation does not scale well. Our implemented EDI broadcasts events through stop modules, which are present per monitor. The EDI distributes events at the functional frequency of the interconnect, and the worst-case delay (in number of cycles) is equal to the maximum depth of the stop module network (a delay of 1 cycle per stop module). The stop module network has the same topology as the interconnect elements which are monitored (in our case the routers of the NoC). This is required to preserve scalability and to avoid complex layout and routing constraints in silicon.
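The worst-case distribution delay can be estimated from the stop module topology with a breadth-first search. The Python sketch below is illustrative only: the function `worst_case_delay`, the module names and the 2x2 mesh are hypothetical, and it assumes that every stop module on the path, including the one attached to the triggering monitor, contributes one cycle.

```python
from collections import deque


def worst_case_delay(neighbours, source):
    """Cycles until the farthest stop module has forwarded an event raised at `source`.

    neighbours: dict mapping each stop module to the stop modules it is wired to.
    """
    depth = {source: 1}  # the source stop module itself costs one cycle
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return max(depth.values())


# Hypothetical 2x2 mesh of stop modules following a 2x2 router mesh.
mesh = {
    "SM00": ["SM01", "SM10"],
    "SM01": ["SM00", "SM11"],
    "SM10": ["SM00", "SM11"],
    "SM11": ["SM01", "SM10"],
}
print(worst_case_delay(mesh, "SM00"))  # 3
```

Because the stop module network follows the router topology, this depth grows with the network diameter rather than with the total number of routers.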

We now explain in detail our EDI implementation and its properties, with the Æthereal NoC as the interconnect for the SoC. In our debug infrastructure we use the EDI to distribute the stop signals to all the Network Interface Shells (NiSs) and to stop them functionally. This signal is interpreted locally (at every NiS) as a stop at message or element level, or is simply ignored. The stop signal is an active-high pulse which is generated by one of the monitors on detection of a breakpoint hit, or when an external pulse is given through the TAP. It is then broadcast by the network of stop modules to ensure quick distribution to the network edges. A Finite State Machine (FSM) in the stop modules ensures that the stop signal wave travels in only one direction and that the occurrence of multiple concurrent events does not create a standing wave, as explained later in this section.

EDI Properties:

• Each stop module is connected to the output links of all of its neighbours. On detecting an incoming pulse, each stop module sends out a pulse on each of its outgoing interfaces one clock cycle later. It is, however, required that the stop modules ignore some of the incoming pulses in order to prevent a standing wave. Consider the topology shown in Figure 5.5. On a breakpoint hit (cycle 1), the attached stop module (SM 1) broadcasts this on all its outgoing interfaces in the next cycle (2). The connected stop module (SM 2) sees this incoming pulse and in the next cycle (3) broadcasts a pulse on all its outgoing interfaces. In cycle 4, SM 1 responds to the incoming pulse (broadcast in cycle 3 by SM 2) by broadcasting a pulse in cycle 5. Thus the two stop modules keep feeding each other in a loop, even though the actual breakpoint hit condition may have gone. This is the creation of a standing wave. To prevent this, the stop modules are designed to cancel out this wave: after responding to an incoming pulse, each stop module ignores any pulse received in the following cycle (as explained below with the stop module FSM).

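[Figure 5.5 (diagram): stop modules SM 1 and SM 2 connected to each other and to network interfaces NI 1 to NI 4; a breakpoint arrives at SM 1 in cycle 1 and the links are annotated with the cycle numbers 2 to 5 in which pulses travel over them. NI - Network Interface, SM - Stop Module.]

Figure 5.5: Standing wave creation in the EDI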

• If the condition for a breakpoint hit stays true for multiple clock cycles then a train of pulses is generated, though this train represents one event (Figure 5.6). The generation of a train is a consequence of the way the stop modules are designed in order to prevent the creation of a standing wave. For every 3 clock cycles that the breakpoint hit remains high (the 3 clock cycles are due to the stop module FSM design and implementation), a stop module generates one active-high pulse which is one clock cycle long. Hence a breakpoint hit pulse which remains high for more than 3 clock cycles is sub-sampled, resulting in a train of pulses. These multiple pulses nevertheless correspond to a single event.
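The standing wave between two stop modules, and its cancellation, can be illustrated with a small cycle-level simulation in Python. This is a simplified sketch, not the stop module design: the function `simulate_wave` and its `holdoff` parameter are hypothetical, and a hold-off of two cycles (the cycle in which a module emits and the cycle after it) is assumed as the window in which inputs are ignored.

```python
def simulate_wave(cycles, holdoff):
    """Two stop modules wired back to back; return the cycles in which each one emits.

    holdoff: number of cycles after a detection during which inputs are ignored.
    A breakpoint hit reaches SM1 in cycle 1.
    """
    peer = {"SM1": "SM2", "SM2": "SM1"}
    emitting = {"SM1": False, "SM2": False}  # pulses on the wires in the current cycle
    blind = {"SM1": 0, "SM2": 0}             # remaining cycles in which inputs are ignored
    emits = {"SM1": [], "SM2": []}
    for cycle in range(1, cycles + 1):
        nxt = {}
        for sm in peer:
            breakpoint_hit = sm == "SM1" and cycle == 1
            seen = emitting[peer[sm]] or breakpoint_hit
            if blind[sm] > 0:
                blind[sm] -= 1
                nxt[sm] = False
            elif seen:
                nxt[sm] = True                # emit one cycle after detection
                blind[sm] = holdoff
                emits[sm].append(cycle + 1)
            else:
                nxt[sm] = False
        emitting = nxt
    return emits


print(simulate_wave(6, holdoff=0))  # {'SM1': [2, 4, 6], 'SM2': [3, 5, 7]}
print(simulate_wave(6, holdoff=2))  # {'SM1': [2], 'SM2': [3]}
```

Without a hold-off the two modules keep re-triggering each other; with it, SM1 is deaf when the echo from SM2 arrives and the wave travels outwards only once.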

[Figure 5.6 (timing diagram): stop module functional clock, breakpoint hit and the sub-sampled EDI pulse.]

Figure 5.6: Sub-sampling of a breakpoint hit pulse

Figure 5.7 shows a generic stop module. Besides the clk and rst_n inputs, each stop module is connected to its monitor via a monitor_stop input, which the monitor uses to signal the occurrence of an event. Every stop module receives an incoming pulse either via monitor_stop or from one of its neighbouring stop modules via the input stop signals (stop_in_0 ... stop_in_N). The stop module then broadcasts the occurrence of this stop event to all its neighbouring modules (other stop modules and NIs) via the output stop signals (stop_out_0 ... stop_out_N). A stop module can also receive a user stop signal, given via the TAP, through the jtag_stop signal. Only one stop module is connected to the TAP, however; the remaining stop modules have their jtag_stop signal tied low.

[Figure 5.7 (block diagram): Stop Module with inputs clk, rst_n, monitor_stop, jtag_stop and stop_in_0 ... stop_in_N, and outputs stop_out_0 ... stop_out_N.]

Figure 5.7: Stop Module Interfaces, where N is the number of neighbouring devices (other Stop Modules and NIs)

The stop module FSM ensures that:

1. The distribution of the stop signal behaves like a wave which travels in only one direction.

2. Multiple breakpoint hits in time and / or place (if separated by three clock cycles) are distributed as separate pulses. The debug component in the NiSs (an FSM) has the intelligence to interpret each of them as a stop signal or to ignore it, depending on whether a previous stop and / or continue signal has arrived.

Figure 5.8 shows the stop module FSM. The stop module responds to an incoming stop signal only in states '00' and '11'. State '00' is also the reset state of the stop module. After reset the stop module is in state '00', where it waits for an incoming stop signal. When it detects one, it transitions to state '01' on the next clock cycle and sends a signal to all its neighbours, signalling the detected stop signal. In the next clock cycle it transitions to state '10' unconditionally. State '10' ensures that the stop wave distribution continues in only one direction and cancels out any response wave due to the broadcast. The stop module FSM then transitions to state '11' in the next clock cycle, where it can again respond to an incoming stop signal; if none is present it returns to state '00'.
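The state sequence of the stop module can be reproduced with a short behavioural model in Python. This is a sketch under stated assumptions rather than the implemented RTL: the class `StopModuleFSM` and the helper `run` are hypothetical, all stop inputs are merged into a single `stop_in` value, and the output pulse is assumed to be driven while the FSM is in state '01'.

```python
class StopModuleFSM:
    """Cycle-level sketch of the stop module FSM (states '00', '01', '10', '11')."""

    def __init__(self):
        self.state = "00"  # reset state: waiting for a stop signal

    @property
    def stop_out(self) -> int:
        # Assumption: the pulse is driven while the FSM is in state '01'.
        return int(self.state == "01")

    def clock(self, stop_in: int) -> None:
        """Advance one clock edge; stop_in = monitor_stop OR jtag_stop OR any stop_in_i."""
        if self.state in ("00", "11"):
            self.state = "01" if stop_in else "00"
        elif self.state == "01":
            self.state = "10"
        else:  # state '10': the input is ignored
            self.state = "11"


def run(inputs):
    """Return the state and stop_out observed in each cycle for a list of inputs."""
    fsm = StopModuleFSM()
    states, pulses = [], []
    for stop_in in inputs:
        states.append(fsm.state)
        pulses.append(fsm.stop_out)
        fsm.clock(stop_in)
    return states, pulses


print(run([1, 0, 0, 0, 0]))        # single-cycle pulse on the input
print(run([1] * 10 + [0] * 4)[0])  # input held high for ten cycles
```

A single-cycle input gives the sequence 00 01 10 11 00 with one output pulse, while an input held high for ten cycles cycles through 01 10 11 repeatedly and produces one pulse every three clock cycles, i.e. the sub-sampled pulse train.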

[Figure 5.8 (state diagram): states '00', '01', '10' and '11' with transitions a to f; !reset leads to state '00'.

Transitions (condition / output):
a - !(monitor_stop OR jtag_stop OR stop_in) / stop_out <= '0'
b - (monitor_stop OR jtag_stop OR stop_in) / stop_out <= '0'
c - next clock cycle / stop_out <= '1'
d - next clock cycle / stop_out <= '0'
e - (monitor_stop OR jtag_stop OR stop_in) / stop_out <= '0'
f - !(monitor_stop OR jtag_stop OR stop_in) / stop_out <= '0'

States:
00 - Waiting for stop signal
01 - Send out stop signal
10 - Do nothing
11 - Detect multiple cycle / event breakpoints]

Figure 5.8: Stop Module FSM, where stop_in is the logical OR of all N neighbouring input stop signals and stop_out the output signal to all N neighbouring devices.

Figure 5.9 shows a trace for the stop module shown in Figure 5.1. The stop module receives a stop signal (a pulse on monitor_stop) from its associated monitor, which generates the signal on a breakpoint hit. The stop signal stays high for one clock cycle and generates a pulse on each of the stop module's outputs (a pulse on each stop_out), which is distributed by the EDI. The connected components then respond depending on the state they are in. A connected stop module either responds by further broadcasting a stop pulse or ignores the incoming pulse. The network interface shells receiving the stop pulse have the intelligence to interpret it in the right way.

Figure 5.10 shows the stop module behaviour when a stop signal is given by the user through the TAP. Since the pulse is given externally by the user, the incoming stop signal (jtag_stop) may stay high for multiple clock cycles. In this case a train of stop pulses is generated and distributed to the network interface shells. The train is due to the fact that the FSM of the stop module generates a pulse every time it detects an active-high input (either on monitor_stop or on jtag_stop), with successive detections separated by three clock cycles. In this case there are not several separate breakpoint hits but a single one which remains high for multiple clock cycles. The minimum duration for which the user has to assert the external stop is two functional clock cycles; there is no constraint on the maximum time. In Appendix A we explain how these constraints have been calculated. As in the previous case, the connected network interface shells / stop modules have the intelligence to interpret the pulses in the right way.

[Figure 5.9 (SimVision waveform, 45.34 us to 45.39 us): signals clk, jtag_stop, monitor_stop, rst_n, state_r, stop_out0, stop_out1 and stop_out2; state_r steps through 00, 01, 10, 11, 00.]

Figure 5.9: Stop Module waveforms for monitor stop

[Figure 5.10 (SimVision waveform, 40.4 us to 40.48 us): signals clk, jtag_stop, monitor_stop, rst_n, state_r, stop_out0, stop_out1 and stop_out2; state_r steps through 00, 01, 10, 11, 01, 10, 11, 01, 10, 11, 01, 10, 11, 00.]

Figure 5.10: Stop Module waveforms for external user stop through TAP

5.4 Test Point Registers (TPRs)

The Test Point Registers (TPRs) provide the programmability of the debug infrastructure. By programming the TPRs, the user (the debugger) can program various breakpoints and also control the debug environment (stopping, single-stepping and continuing) during communication-centric debug. TPR programming is a potent tool: it controls the underlying debug architecture, which in turn controls the functional behaviour of the SoC in its target environment. In our debug infrastructure two TPR types are present, viz.

1. Monitor Config TPR

2. Network Interface Shell TPR (NI-Shell TPR)

Now we will delve into the exact structure of these TPRs and their functions.


Monitor Config TPR

The Monitor Config TPR is used to program the breakpoint generation hardware (the monitor) with the breakpoint condition. The programming is done via the IEEE 1149.1 test access port (TAP), which is already present and used for manufacturing tests. The monitor_config port (the output port) of the Monitor Config TPR is connected to the Monitor, which uses it as explained in Section 5.2. Figure 5.11 shows the programming of the Monitor Config TPR through the IEEE 1149.1 TAP. When tpr_hold goes low, the value on tdi of the TAP starts shifting into the TPR (tpr_enable is high) via tpr_tdi. This is the start of the TPR programming (Marker 1 in the figure). The shifting of the value takes place synchronously to the debug clock (tck). As soon as the shifting phase is complete, tpr_hold goes high, which indicates that the shifting phase is over. More importantly, the value remains stable as long as tpr_hold is high. The value is then programmed when both tpr_hold and tpr_update are high (Marker 2 in the figure); this is the update phase of the programming. It is reflected by the change in the value of monitor_config at precisely this point in time. The value is then seen by the monitor, which runs on the NoC functional clock. The separation between the shifting and the update phases allows a safe crossover between the clock domains, and means that the Monitor Config TPR can be programmed while the NoC is functionally running without causing glitches or false breakpoint triggers. A more detailed description of programming via the IEEE 1149.1 TAP can be found in [35]. This is how the actual programming takes place in hardware. In Section 6.1 we explain how the user programs this TPR.
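The separation of the shift and update phases can be sketched with a two-stage register model in Python. The model is illustrative only: the class `MonitorConfigTPR` and the helper `program` are hypothetical, the bit order (most significant bit shifted in first) is an assumption, and the 33-bit width follows the monitor_config[32:0] port; the programmed value 'h100000357 is the one visible in Figure 5.11.

```python
class MonitorConfigTPR:
    """Sketch of a test point register with separate shift and update stages."""

    def __init__(self, width: int = 33):
        self.width = width
        self.shift_reg = 0       # clocked by tck while tpr_hold is low
        self.monitor_config = 0  # output seen by the monitor; changes only on update

    def tck(self, tpr_tdi: int, tpr_hold: int, tpr_update: int, tpr_enable: int = 1) -> None:
        """One rising edge of the debug clock."""
        if not tpr_enable:
            return
        if not tpr_hold:
            # Shift phase. Assumption: bits enter at the LSB, so the MSB is sent first.
            mask = (1 << self.width) - 1
            self.shift_reg = ((self.shift_reg << 1) | (tpr_tdi & 1)) & mask
        elif tpr_update:
            # Update phase: copy the stable shifted value to the output.
            self.monitor_config = self.shift_reg


def program(tpr: MonitorConfigTPR, value: int) -> None:
    """Shift `value` in bit by bit, then apply it with a single update cycle."""
    for i in reversed(range(tpr.width)):
        tpr.tck(tpr_tdi=(value >> i) & 1, tpr_hold=0, tpr_update=0)
    tpr.tck(tpr_tdi=0, tpr_hold=1, tpr_update=1)


tpr = MonitorConfigTPR()
program(tpr, 0x100000357)
print(f"{tpr.monitor_config:09X}")
```

During the 33 shift cycles the output keeps its old value, so the monitor never observes a partially shifted breakpoint condition; only the single update cycle changes monitor_config.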

[Waveform printed from Cadence SimVision, titled "Programming the MonitorConfig TPR". Signal groups shown: JTAG Port (tck, tdi, tdo, tms, trstn) and Monitor Config TPR (monitor_config, tpr_se, si, so, tpr_bypass, tpr_config, tpr_enable, tpr_hold, tpr_update, tpr_tdi, tpr_tdo, tpr_tck), followed by NI-Shell TPR and monitor signals. Time axis from 26 us to 38 us; Marker 1 = 26.183299695 us, Marker 2 = 37.021587186 us.]

Figure 5.11: Programming of the Monitor Config TPR
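The separation of the shifting and update phases can be illustrated with a small behavioural model. The following Python sketch is not the hardware description of the TPR; the class `TprModel`, the helper `program` and the bit ordering inside the shift register are assumptions made for illustration. Only the signal names (tpr_tdi, tpr_enable, tpr_hold, tpr_update) are taken from the text above.

```python
class TprModel:
    """Behavioural sketch of a TPR with separate shift and update stages."""

    def __init__(self, width):
        self.shift_reg = [0] * width  # written on tck during the shifting phase
        self.output = [0] * width     # value seen by the functional logic

    def tck_edge(self, tpr_tdi, tpr_enable, tpr_hold, tpr_update):
        """Apply one tck edge with the given control values."""
        if not tpr_enable:
            return
        if not tpr_hold:
            # Shifting phase: tpr_tdi enters at index 0, older bits move up
            # (this ordering is an assumption of the sketch).
            self.shift_reg = [tpr_tdi] + self.shift_reg[:-1]
        elif tpr_update:
            # Update phase: the stable shifted value becomes visible.
            self.output = list(self.shift_reg)


def program(tpr, bits):
    """Shift `bits` in, one per tck edge, then hold and update."""
    for bit in bits:
        tpr.tck_edge(bit, tpr_enable=1, tpr_hold=0, tpr_update=0)
    tpr.tck_edge(0, tpr_enable=1, tpr_hold=1, tpr_update=0)  # value held stable
    tpr.tck_edge(0, tpr_enable=1, tpr_hold=1, tpr_update=1)  # update phase


tpr = TprModel(width=4)
program(tpr, [1, 0, 1, 1])
print(tpr.output)
```

In this model the `output` list never changes while bits are being shifted, which mirrors the reason given above for why the monitor does not see intermediate values.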


NI-Shell TPR

The NI-Shell TPR is a data register which provides the user with all the debug control over the interconnect interactions (in our case those of the NoC). Every network interface shell has an NI-Shell TPR associated with it. In each of these NI-Shell TPRs, every channel of the network interface has one bit associated with it in every per-channel field of the TPR (Figure 5.12). By programming the various NI-Shell TPRs the user can achieve transaction-, message- and element-level debugging, using operations such as stop, single-step and continue at per-channel granularity. Although the decisions for the debug actions are taken in the NI Shell FSM (explained in Section 5.5), the values programmed in the NI-Shell TPR dictate them. The NI-Shell TPRs are programmed in the same way as the Monitor Config TPR, i.e. through the IEEE 1149.1 test access port (TAP). Section 6.1 explains how the user programs this TPR.

[Register layout: four per-channel fields, stop_enable[1:2N], stop_condition[1:2N], stop_granularity[1:2N] and continue[1:2N], each covering the request and response channels (N = number of request channels = number of response channels), and a single ip_stop bit (WIDTH = 1). The bit ranges marked are 1:N, N+1:2N, 2N+1:3N, 3N+1:4N, 4N+1:5N, 5N+1:6N, 6N+1:7N, 7N+1:8N and 8N+1.]

Figure 5.12: The internal structure of the NI-Shell TPR, which must be known in order to program the right value for the desired control.

The structure of an NI-Shell TPR can be visualized as shown in Figure 5.12. It consists of 5 main fields:

1. Stop Enable: This field dictates whether or not interactions / data on a particular channel can be stopped. The field is as wide as the total number (request + response) of channels in the associated NI Shell; one bit controls the stop behavior of each channel.

• A '0' means that the communication on the channel does not stop even when a stop hit occurs for that channel, i.e. when a stop pulse is received from the EDI (stop_r) or a software stop has been programmed (stop_condition[i], explained under 'Stop Condition' below). This corresponds to scenarios 'A' and 'C' in Figure 5.13. In scenario 'A' stop is not enabled for the channel (stop_enable[i] is low) and there is no stop hit (stop is low), hence communication is not stopped (accept is still high). In scenario 'C', even though there is a stop hit (stop is high), communication continues (accept is still high) because stop is not enabled (stop_enable[i] is low).

• A '1' enables stop for the channel that the bit corresponds to, i.e. the interactions / data on that particular channel can be stopped. This is depicted by scenarios 'B' and 'D' in Figure 5.13. In scenario 'B' stop has been enabled (stop_enable[i] is high), but a stop does not occur (accept is still high) because no stop hit has occurred (stop is low). In scenario 'D' a stop occurs (accept goes low) since stop has been enabled (stop_enable[i] is high) and a stop hit has occurred (stop is high).

[Timing diagram "STOP ENABLE" over clock cycles 01 to 10: signals clock, valid, accept, data, stop and stop_enable[i], where stop = stop_r OR stop_condition[i]. Reference points: A - no stop occurs; B - no stop occurs even though stop enable is asserted, because no stop hit has occurred; C - no stop occurs even though a stop hit has occurred, because stop enable is not asserted; D - a stop occurs only when stop enable is high and a stop hit has occurred.]

Figure 5.13: Explains the function of Stop Enable field in the NI-Shell TPR

2. Stop Condition: Provided that stop has been enabled for the channel (stop_enable[i] is high), the channel stops either in response to a stop pulse from the EDI (stop_r) or even in the absence of such a pulse, depending on the value programmed in this field (stop_condition[i]). Like the Stop Enable field, this field is as wide as the number of channels in the associated shell and reserves one bit for each channel.

• A '0' means that the channel stops only after a pulse from the EDI has been received (scenarios 'A' and 'B' in Figure 5.14). In scenario 'A' no stop occurs (accept is still high) because no pulse is received from the EDI (stop_r is low). In scenario 'B' a stop takes place (accept goes low) because a stop pulse is received from the EDI (stop_r is high).

• A '1' means that the channel is stopped unconditionally. The first element to occur on that channel after the Stop Condition field has been programmed (stop_condition[i] is high) is stopped irrespective of whether a stop pulse has arrived from the EDI (stop_r), provided only that stop is enabled for that channel (stop_enable[i] is high). This is scenario 'A' in Figure 5.15: a stop occurs (accept is low) since an unconditional stop has been programmed (stop_condition[i] is high), despite the absence of a stop pulse from the EDI (stop_r is low).

This field gives the user the flexibility either to wait for a stop pulse from the EDI (i.e. a breakpoint hit or an external stop) before the stop happens, or to program a channel to be stopped unconditionally (a software-programmed stop). There are two reasons for providing this field:

• In case of a really long transaction, the user can stop the NoC by programming this field without waiting for the transaction to complete.

• A single-step consists of a continue followed by an implicit unconditional stop. This field is used to achieve the implicit stop, as explained in Section 6.2.

[Timing diagram "STOP CONDITION" over clock cycles 01 to 07: signals clock, valid, accept, data, stop_r, stop, stop_enable[i] and stop_condition[i]. Reference points: A - no stop occurs when there is no EDI pulse (stop_r is low); B - a stop occurs only after a pulse from the EDI is received (stop_r is high). The two reference points illustrate that when the Stop Condition field is de-asserted, a stop occurs only after a pulse is received from the EDI.]

Figure 5.14: Behaviour when Stop Condition field is de-asserted in the NI-Shell TPR


[Timing diagram "STOP CONDITION": signals clock, valid, accept, data, stop_r, stop, stop_enable[i] and stop_condition[i]. Reference point A illustrates that when the Stop Condition field is asserted, a stop can occur even when there is no pulse from the EDI.]

Figure 5.15: Behaviour when Stop Condition field is asserted in the NI-Shell TPR

3. Stop Granularity: In addition to stopping channels, our debug infrastructure lets the user program the granularity of the stop, i.e. the granularity at which the ongoing interaction is interrupted and stopped.

• A '0' allows the ongoing interaction to complete at the message level: the entire ongoing message is accepted before a stop occurs (Figure 5.16). A stop hit occurs at 'A' (stop goes high), but the stop itself occurs (accept goes low) only at 'B', after the ongoing message transfer is complete; the next message is not accepted. This is because the stop granularity is message-level (stop_granularity[i] is low).

• A '1' can be programmed for a more urgent stop, which occurs at a finer granularity (element level), as shown in Figure 5.17. A stop hit occurs at 'A' (stop goes high), and since the stop granularity is element-level (stop_granularity[i] is high) the stop occurs immediately (accept goes low) at 'B'.


[Timing diagram "STOP GRANULARITY" over clock cycles 01 to 06: signals clock, valid, accept, data, data_last, stop, stop_enable[i] and stop_granularity[i]. Reference points: A - stop hit occurs; B - stop occurs only after the ongoing message transfer is complete.]

Figure 5.16: Behaviour when Stop Granularity field is de-asserted in the NI-Shell TPR

[Timing diagram "STOP GRANULARITY" over clock cycles 01 to 06: signals clock, valid, accept, data, data_last, stop, stop_enable[i] and stop_granularity[i]. Reference points: A - stop hit occurs; B - stop occurs after the current element completes, without the ongoing message transfer completing.]

Figure 5.17: Behaviour when Stop Granularity field is asserted in the NI-Shell TPR

4. Continue: Besides the ability to stop interactions, the complementary ability to continue stopped interactions is equally important in a debug scenario. Together they give the user the power to observe the functional behavior of the SoC in a controlled fashion during debug. The Continue field also has one bit reserved per channel, but it is interpreted differently from the three fields before. In the previous cases a '0' or a '1' written in the TPR is treated as the value itself and is registered as the same value inside the shell (specifically in the FSM). In the case of continue, writing a '1' in the TPR causes an active-high signal (continue_ni[i]) to be fed to the shell. On continuing, the shell resets this signal through the set-reset logic. The resulting high pulse on continue_ni[i] is interpreted as a single continue pulse for that channel.

With reference to Figure 5.18, when a '1' is programmed in the continue field for a particular channel (continue[i]), the set-reset logic (Set-Reset Logic (1)) outputs a '1' to the NI Shell (signal continue_ni[i] is high). The shell FSM responds to this and continues the channel when all functional conditions for a continue are true. It also sends an active-high reset pulse (signal continue_reset[i]) to the set-reset logic. This pulse is one clock cycle wide and resets the output of the set-reset logic (signal continue_ni[i]) to '0'. As a result, every time the user wants to continue a particular channel, a '1' has to be programmed at the appropriate bit in the continue field. Also, when a continue takes place, the registered value which indicates that a stop hit has occurred (signal stop) is reset. This ensures that the continued channel stops again only if either another pulse is received from the EDI (stop_r) or an unconditional stop has been programmed (stop_condition[i] is asserted).

Scenarios 'A' and 'B' in Figure 5.19 explain the continue behaviour. At 'A' no continue happens (accept is still low), as no continue pulse has been received (continue_ni[i] is low). As soon as a continue pulse is received (continue_ni[i] is high) and the functional conditions are valid (valid is high and data is available on data), a continue takes place (accept goes high) and a reset pulse is sent out (continue_reset[i] goes high), as depicted in scenario 'B'. This pulse (continue_reset[i]) also resets the continue signal (continue_ni[i] goes low).

[Block diagram of the NI Shell TPR, two set-reset logic blocks and the NI Shell FSM, with the NI Shell connected to the network. Set-Reset Logic (1) has a set input, a reset input and an output; it is driven from continue[i] and feeds continue_ni[i] to the FSM, which returns continue_reset[i]. Set-Reset Logic (2) is driven by stop_r OR stop_condition[i] and outputs stop. Signals shown: continue[i], continue_tpr[i], continue_ni[i], continue_reset[i], stop, clk.]

Figure 5.18: Continue operation

5. IP Stop: Every NI-Shell TPR also has one IP Stop bit, which enables the NI Shell to gate the clock domains of the connected IP core. This functionally stops the IP core as well, and allows all the components of the SoC (the interconnect and the IPs) to be stopped in states which are much closer in time to each other. Stopping only the interconnect, without the IPs, would mean that the state of the IPs has advanced far ahead, because they have continued their internal operations.

• A '1' allows the IP core connected to the NI Shell of this NI-Shell TPR to be stopped functionally.

• A '0' does not stop the clock domains of the connected IP core. Programming a '0' can also serve as the continue action for a core that was previously stopped.
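The interplay of the Stop Enable, Stop Condition, Stop Granularity and Continue fields for a single channel can be summarised in a behavioural sketch. The Python model below is only an illustration under simplifying assumptions: the class `ChannelDebugModel`, its method names and the one-call-per-element abstraction are hypothetical, and the IP Stop bit is not modelled. The relation stop = stop_r OR stop_condition[i], the gating of accept and the reset of the continue and stop signals follow the description above.

```python
from dataclasses import dataclass


@dataclass
class ChannelDebugModel:
    """Hypothetical per-channel model of the NI-Shell TPR fields."""
    stop_enable: int = 0
    stop_condition: int = 0
    stop_granularity: int = 0   # 0 = message level, 1 = element level
    stop: bool = False          # registered stop hit
    continue_ni: bool = False   # pending continue pulse
    stopped: bool = False

    def program_continue(self):
        """Writing '1' to continue[i] sets continue_ni[i]."""
        self.continue_ni = True

    def clock(self, stop_r=0, valid=1, first_of_message=True):
        """Advance one cycle and return the accept value (1 or 0)."""
        if self.stopped:
            if self.continue_ni and valid:
                self.stopped = False
                self.continue_ni = False  # effect of the continue_reset[i] pulse
                self.stop = False         # registered stop hit is cleared
                return 1
            return 0
        if stop_r or self.stop_condition:
            self.stop = True              # stop = stop_r OR stop_condition[i]
        may_stop_here = self.stop_granularity == 1 or first_of_message
        if self.stop_enable and self.stop and may_stop_here:
            self.stopped = True
            return 0
        return 1 if valid else 0


# Single-step: unconditional stop, one continue, implicit stop again.
ch = ChannelDebugModel(stop_enable=1, stop_condition=1, stop_granularity=1)
trace = [ch.clock()]
ch.program_continue()
trace += [ch.clock(), ch.clock()]
print(trace)
```

With stop_condition[i] left asserted, one continue lets exactly one element through before the channel stops again, which is the single-step behaviour mentioned under the Stop Condition field.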

5.5 Network Interface Shell (NI Shell)

The Network Interface Shell (NI Shell) is one of the most important components of the debug infrastructure. It is the NI Shell which actually implements the debug control actions programmed by the user. As seen in Figure 5.1, both the NI Shell TPR and the EDI (the stop module) connect to the NI Shell. The stop module feeds into the network interface shell and signals every pulse on the EDI to it. In our debug infrastructure, a pulse on the EDI means a stop (either caused by a breakpoint hit or given by the user via the TAP). However, because of the broadcast nature of the EDI (it only broadcasts every pulse it receives and does not take a decision), the use of the EDI can be extended in future. The NI Shell TPRs (as explained in Section 5.4) are programmed by the user with the desired debug control action. The network interface shell makes a decision depending on the values fed by these two components and implements the debug control.

[Timing diagram "CONTINUE": signals clock, valid, accept, data, stop, stop_enable[i], continue_ni[i] and continue_reset[i]. Reference points: A - no continue without a continue pulse; B - continue occurs only after a continue pulse has been received and the functional conditions are true. The two reference points illustrate that a continue can take place only when a continue pulse is given.]

Figure 5.19: Explains the function of Continue field in the NI-Shell TPR

This decision is made in the Finite State Machine (FSM) of the shell. The FSM cyclically steps through various functional states. For some of the decision-making functional states there is a corresponding mirror state (Figure 5.20). When a stop decision is made, the FSM steps into the mirror state (which is a pause state); this is the transition from state A to A'. During this transition the control signals on the NI-IP interface are gated, and hence no further communication can occur with the IP as long as the FSM stays in the mirror state (A'). For the FSM to come out of its pause state the user has to program a continue action (detailed in Section 5.4); this is the transition from state A' to B. A continue action forces the FSM to transition back into a functional state so that the communication on the NI-IP interface can proceed as in normal functional mode. Besides a continue action being programmed by the user, the behavioral conditions for a transition to the functional state must also be satisfied (i.e. the conditions for the transition on edge 1).

[State diagram: A, B, C - functional states of the FSM; A' - mirror state. Edges 1, 2, 3 - conditions for functional transitions; s - condition for transition to the mirror state (stop action programmed and stop conditions are satisfied); 1' - condition for transition out of the mirror state (continue action programmed and conditions for transition on edge 1 are satisfied).]

Figure 5.20: NI Shell FSM (Mirror State transitions)
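The mirror-state mechanism of Figure 5.20 can be sketched as a small transition function. The Python code below is a minimal illustration, not the shell implementation. It assumes that edge 1 goes from A to B, edge 2 from B to C and edge 3 from C back to A, and that a stop decision (edge s) takes priority over edge 1; the function names are hypothetical.

```python
# Functional edges of the example FSM (assumed topology: A -> B -> C -> A).
FUNCTIONAL_EDGES = {("A", "1"): "B", ("B", "2"): "C", ("C", "3"): "A"}


def step(state, conditions, stop=False, cont=False):
    """Return the next state.

    conditions: set of functional conditions ("1", "2", "3") true this cycle
    stop:       condition s (stop programmed and stop conditions satisfied)
    cont:       a continue action has been programmed
    """
    if state == "A'":
        # Edge 1': continue programmed AND the condition of edge 1 holds.
        return "B" if cont and "1" in conditions else "A'"
    if state == "A" and stop:
        return "A'"  # edge s: enter the mirror (pause) state
    for (source, condition), target in FUNCTIONAL_EDGES.items():
        if source == state and condition in conditions:
            return target
    return state


def ni_ip_interface_gated(state):
    """Control signals on the NI-IP interface are gated in a mirror state."""
    return state.endswith("'")


state = step("A", {"1"}, stop=True)   # stop decision instead of edge 1
print(state, ni_ip_interface_gated(state))
state = step(state, {"1"}, cont=True)
print(state, ni_ip_interface_gated(state))
```

The sketch makes the two requirements for leaving the pause state explicit: a continue on its own, or the functional condition on its own, leaves the FSM in A'.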

Below we detail the FSMs in a Narrowcast Shell and a Multiconnection Shell. A narrowcast shell is the most generic case of a shell in a master network interface (MNI), while a multiconnection shell is the corresponding shell in a slave network interface (SNI).

Narrowcast Shell

Figure 5.21 shows a block diagram view of the master network interface of Figure 4.4. On the request channel, there is a finite state machine (FSM 1) which decides whether to accept a command / write data element offered by the master IP, depending on the values programmed (for request channels) in its associated NI Shell TPR and the presence / absence of a pulse on the EDI. Another finite state machine (FSM 2) is present on the response channel. It decides whether to send a read data element to the master IP which requested it, depending on the values programmed (for response channels) in its associated NI Shell TPR and the presence / absence of a pulse on the EDI. The FIFO in the shell buffers the channel IDs of those requests sent by the master IP for which a response is expected. This ensures that the responses are sent back to the master IP in the same order in which the corresponding requests were sent.

Figure 5.21: Narrowcast Shell (in the FIFO shown the channel IDs of unfinished read requests are buffered)

The FSM for request channels of a narrowcast shell is shown in Figure 5.22. It consists of some functional states and some mirror states.

The following are the states corresponding to those in Figure 5.22:
A: waiting for command message from master IP
B: received command valid from master IP
C: command accept is given to the master IP
D: request command message is a read (read data is expected as response from the slave)
E: request command message is a write (hence write data must follow). If there are multiple data elements to be transferred, the FSM stays in this state until the last data element transfer is complete.
B': Stop mirror state (for a command element)
E': Stop mirror state (for a write data element)

The transitions:
f1 - new command element, i.e. cmd_valid is high
f2 - new command element transfer is complete, i.e. cmd_accept is high
f3 - new command is a read request
f4 - message transfer complete (command element only, since it is a read command)
f5 - new command is a write request
f6 - write data element transfer complete (wr_valid and wr_accept are high) but request message has multiple write data elements
f7 - write data element transfer complete (wr_valid and wr_accept are high) and was the last data element of request message
s1 - new command element for channel to be stopped AND stop has arrived, i.e. (cmd_valid is high, stop_enable[i] = 1) AND (stop = 1), where stop = (stop_r OR stop_condition[i])
c1 - continue pulse is sent for the stopped channel AND f2, i.e. (continue_ni[i] = 1) AND f2
s2 - new write data element for channel to be stopped AND stop granularity = element AND stop has arrived, i.e. (data_valid is high, stop_enable[i] = 1) AND (stop_granularity = 1) AND (stop = 1)
c2 - continue pulse is sent for the stopped channel AND f6, i.e. (continue_ni[i] = 1) AND f6
c3 - continue pulse is sent for the stopped channel AND f7, i.e. (continue_ni[i] = 1) AND f7

Figure 5.22: Narrowcast Shell FSM (Request channels) - ’FSM 1’ in Figure 5.21
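The state and transition lists above can be written down as a transition table. The Python sketch below is one possible reading of the request-channel FSM: the conditions are those listed above, but the source and target state of each edge are inferred from the state descriptions and from the mirror-state pattern of Figure 5.20, so they are assumptions rather than a transcription of Figure 5.22. The helper names are hypothetical.

```python
# (state, transition) -> next state; edge endpoints are inferred, see lead-in.
TRANSITIONS = {
    ("A", "f1"): "B",    # new command element offered
    ("B", "f2"): "C",    # command element accepted
    ("C", "f3"): "D",    # command is a read request
    ("D", "f4"): "A",    # read request message complete
    ("C", "f5"): "E",    # command is a write request
    ("E", "f6"): "E",    # more write data elements follow
    ("E", "f7"): "A",    # last write data element transferred
    ("B", "s1"): "B'",   # stop on a command element
    ("B'", "c1"): "C",   # continue AND f2
    ("E", "s2"): "E'",   # element-level stop on a write data element
    ("E'", "c2"): "E",   # continue AND f6
    ("E'", "c3"): "A",   # continue AND f7
}


def s1(cmd_valid, stop_enable, stop_r, stop_condition):
    """Condition s1, with stop = stop_r OR stop_condition[i]."""
    return bool(cmd_valid and stop_enable and (stop_r or stop_condition))


def s2(data_valid, stop_enable, stop_granularity, stop_r, stop_condition):
    """Condition s2: only taken at element-level granularity."""
    stop = stop_r or stop_condition
    return bool(data_valid and stop_enable and stop_granularity and stop)


def run(transitions, state="A"):
    """Follow a sequence of transition names and return the visited states."""
    trace = [state]
    for name in transitions:
        state = TRANSITIONS[(state, name)]
        trace.append(state)
    return trace


# A write request that is stopped once on the command and once on the data.
print(run(["f1", "s1", "c1", "f5", "s2", "c2", "f7"]))
```

A table of this form also makes it easy to check that every mirror state has at least one continue edge leading back to a functional state.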

The FSM for response channels of a narrowcast shell is shown in Figure 5.23. It consists of some functional states and some mirror states.

The transitions:
f1 - next channel ID has been read from the FIFO, i.e. the valid-accept handshake with the FIFO is complete
f2 - read data element transfer complete (rd_valid and rd_accept are high) but response message has multiple read data elements
f3 - read data element transfer complete (rd_valid and rd_accept are high) and element was the last data element of response message


Figure 5.23: Narrowcast Shell FSM (Response channels) - ’FSM 2’ in Figure 5.21

s1 - new read data element for channel to be stopped AND stop has arrived AND (stop granularity = element OR first element of message), i.e. (stop_enable[i] = 1) AND (stop = 1) AND (stop_granularity = 1 OR blk_size = max_blk_size), where stop = (stop_r OR stop_condition[i]), blk_size = number of elements of the message still to be transferred, and max_blk_size = total number of elements in the message
c1 - continue pulse is sent for the stopped channel AND f2, i.e. (continue_ni[i] = 1) AND f2
c2 - continue pulse is sent for the stopped channel AND f3, i.e. (continue_ni[i] = 1) AND f3
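The stop condition s1 of the response channel combines the granularity check with the position inside the message. As a minimal sketch, the predicate can be written as follows in Python; the function name `response_stop` and the example message length are hypothetical, while the signal names and the comparison blk_size = max_blk_size are taken from the definition of s1 above.

```python
def response_stop(stop_enable, stop_r, stop_condition,
                  stop_granularity, blk_size, max_blk_size):
    """Condition s1 of the response-channel FSM.

    blk_size:     number of elements of the message still to be transferred
    max_blk_size: total number of elements in the message
    """
    stop = stop_r or stop_condition
    first_element = blk_size == max_blk_size
    return bool(stop_enable and stop and (stop_granularity == 1 or first_element))


# For a 4-element response message with a pending stop hit, list for each
# remaining blk_size whether the element would be stopped.
message_level = [response_stop(1, 1, 0, 0, blk, 4) for blk in (4, 3, 2, 1)]
element_level = [response_stop(1, 1, 0, 1, blk, 4) for blk in (4, 3, 2, 1)]
print(message_level)
print(element_level)
```

At message-level granularity only the first element of a message can be stopped, so a message that is already in flight always completes; at element-level granularity every element is a possible stop point.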

Multiconnection Shell

Figure 5.24 shows a block diagram view of the slave network interface of Figure 4.4. There is just one finite state machine (FSM) in this shell. On an incoming message from the NiK, the FSM decides whether to offer it as a command / write data element to the slave IP, depending on the values programmed (for request channels) in its associated NI Shell TPR and the presence / absence of a pulse on the EDI. If the command message is a read request, the FSM waits until it receives the read data elements from the slave IP. It then decides whether to accept them, again depending on the values programmed (for response channels) in its associated NI Shell TPR and the presence / absence of a pulse on the EDI. As there is only one FSM, the requests which will initiate a response from the slave IP cannot be buffered.

Figure 5.24: Multiconnection Shell

The FSM of a multiconnection shell is shown in Figure 5.25. It consists of some functional states and some mirror states.

The following are the states corresponding to those in Figure 5.25:
A: waiting for message from NiK to send to slave IP
B: send command valid to the slave IP
C: received command accept from slave IP and request is a write. Since the request command message is a write, write data must follow. If there are multiple data elements to be transferred, the FSM stays in this state until the last data element transfer is complete.
D: received command accept from slave IP and request is a read. Since the request command message is a read, the FSM waits for read data from the slave IP. If there are multiple data elements to be transferred, the FSM stays in this state until the last data element transfer is complete.
B': Stop mirror state (for a command element)
C': Stop mirror state (for a write data element)
D': Stop mirror state (for a read data element)

The transitions:
f1 - new message received from the NiK and new command element sent to slave IP (cmd_valid is high)
f2 - IP responds by accepting it (cmd_accept is high) and the command sent is a write request
f3 - IP responds by accepting it (cmd_accept is high) and the command sent is a read request
f4 - write data element transfer complete (wr_valid and wr_accept are high) but request message has multiple write data elements
f5 - write data element transfer complete (wr_valid and wr_accept are high) and was the last data element of request message
f6 - read data element transfer complete (rd_valid and rd_accept are high) but response message has multiple read data elements
f7 - read data element transfer complete (rd_valid and rd_accept are high) and was the last data element of response message
s1 - new command element for channel to be stopped AND stop has arrived, i.e. (cmd_valid is high, stop_enable[i] = 1) AND (stop = 1), where stop = (stop_r OR stop_condition[i])
s2 - new write data element for channel to be stopped AND stop granularity = element AND stop has arrived, i.e. (data_valid is high, stop_enable[i] = 1) AND (stop_granularity = 1) AND (stop = 1)
s3 - new read data element for channel to be stopped AND stop granularity = element AND stop has arrived, i.e. (data_valid is high, stop_enable[i] = 1) AND (stop_granularity = 1) AND (stop = 1)
c1 - continue pulse is sent for the stopped channel AND f2, i.e. (continue_ni[i] = 1) AND f2
c2 - continue pulse is sent for the stopped channel AND f3, i.e. (continue_ni[i] = 1) AND f3
c3 - continue pulse is sent for the stopped channel AND f4, i.e. (continue_ni[i] = 1) AND f4
c4 - continue pulse is sent for the stopped channel AND f5, i.e. (continue_ni[i] = 1) AND f5
c5 - continue pulse is sent for the stopped channel AND f6, i.e. (continue_ni[i] = 1) AND f6
c6 - continue pulse is sent for the stopped channel AND f7, i.e. (continue_ni[i] = 1) AND f7

Figure 5.25: Multiconnection Shell FSM - 'FSM' in Figure 5.24

5.6 Test Access Port (TAP)

The IEEE 1149.1 TAP is not a part of the debug infrastructure that has been designed, but it nevertheless plays a vital role in allowing the user to fully exploit the debug control options provided. The TAP, along with the TAP controller (Figure 5.26), is used as the Debug Control Interconnect (DCI). The Debug Data Interconnect (DDI), on the other hand, consists of the TAP and its controller along with the manufacturing scan chains. The DCI is used for programming the TPRs and for providing an external stop pulse via the attached stop module. When the internal state of the interconnect is being read out, the TAP along with the scan chains acts as the DDI. The TAP is the only window through which the user can observe / program the various internal components of the SoC. Moreover, the reuse of the TAP and the manufacturing scan chains means that the actual area cost of the debug architecture is limited to components such as the Monitors, TPRs, Stop Modules and the additional logic in the Network Interface shells.

We now illustrate in further detail the connectivity of the TAP to the various debug components in the SoC (Figure 5.26). The TAP controller is essentially the component which converts between the TAP signals (tck, trst_n, tms, tdi, tdo) and the appropriate chip-internal signals (tck, tdi, jtag_stop, tpr_tdo, tcb_tdo, dbg_so, etc.) so that the correct debug actions are performed according to the TAP instruction given by the user. How the user programs the actions described below by means of TAP instructions is explained in Section 6.1.

Figure 5.26: TAP and its associated infrastructure

• Programming a TPR: All the TPRs in our debug infrastructure form a single chain which starts at the TAP controller (signal tdi) and also ends at the TAP controller (signal tpr_tdo), as shown in Figure 5.26. When programming a particular TPR in the chain, the value on the tdi signal of the TAP is shifted through until it is in the correct TPR (the shifting phase). The value is then actually programmed in the update phase, as explained in Section 5.4. With the clock domain crossing taken care of (also explained in Section 5.4), the TPRs can be programmed both when the NoC is running in functional mode and when it is in debug mode. If a Monitor Config TPR is programmed, the monitor will then generate an event trigger when the programmed breakpoint condition is met. In the case of an NI Shell TPR, the various fields are interpreted as explained in Section 5.4 and the appropriate action is taken in the NI Shell FSMs.

• Giving an external stop pulse: An external pulse given by the user is fed by the TAP controller, over the jtag_stop signal, to the attached stop module, which distributes it through the EDI. This external stop pulse is given when the NoC is in functional mode. There are certain constraints on the duration of this external stop pulse, as stated in Section 5.3.

• Switching from / to debug and functional clocks: The TAP controller is also connected to a Test Control Block (TCB) and the Clock Control Slice (CCS), as shown in Figure 5.26. The CCS is the module that provides the clock for the NoC. It takes as input both the clock from the clock generator and the debug clock (tck). Depending on the value programmed in the TCB, one of these clocks is fed to the NoC. For detailed information on the programmable values of the TCB, how they are programmed and the resulting behaviour, please refer to Section 7 in [29].

• Scanning out internal state of NoC: The DDI is used to scan out the internal state of the NoC. This action takes place in debug mode, i.e. the NoC is fed the debug clock (tck) by the CCS instead of its functional clock. In this mode, all the internal scan chains are concatenated into one long shift register. Since the NoC is in debug mode, the TAP's tck signal is applied to the functional flip-flops in the NoC. This causes the shift register to shift out its state on the tdo output pin of the TAP on subsequent tck clock cycles.
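The clock selection and the scan-out of the concatenated scan chains can be illustrated with a short model. The Python sketch below is an abstraction under stated assumptions: the function names are hypothetical, the clock selection is reduced to a single boolean instead of the actual TCB value, and the last bit of the concatenated register is assumed to be the one nearest to tdo.

```python
def ccs_select(functional_clk, tck, debug_mode):
    """Clock fed to the NoC by the CCS (debug_mode abstracts the TCB value)."""
    return tck if debug_mode else functional_clk


def scan_out(scan_chains, tdi=0):
    """Concatenate the scan chains and shift the state out on tdo.

    Returns the bits observed on tdo, one per tck cycle. The last bit of the
    concatenated register is assumed to be nearest to tdo.
    """
    register = [bit for chain in scan_chains for bit in chain]
    tdo_bits = []
    for _ in range(len(register)):
        tdo_bits.append(register[-1])        # bit presented on tdo
        register = [tdi] + register[:-1]     # shift by one position
    return tdo_bits


print(ccs_select("clk", "tck", debug_mode=True))
print(scan_out([[1, 0], [1, 1, 0]]))
```

After as many tck cycles as there are flip-flops in the concatenated register, the complete state has appeared on tdo and the register holds only the values shifted in on tdi.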

5.7 Debug Flow Automation

The generation, instantiation and connection of all the debug components with the other SoC components have been integrated with the automated Æthereal design flow. The NI-Shell TPRs are instantiated with a width that depends on the number of channels in each NI shell. A Stop Module is generated for every router that is instantiated in the network, and one of the Stop Modules is connected to the TAP. Monitor Config TPRs are also generated for every monitor that is specified in the network.

6 Debug Software Infrastructure

6.1 User programming via the TAP

Figure 6.1: Setup for performing control actions via the IEEE 1149.1 TAP

In our setup, we use the Philips tool Incide to perform all our actions via the TAP. Figure 6.1 shows this setup. Incide runs on a PC which is hooked up to the TAP of the SoC. The user specifies the desired actions in a Tcl script (in the form of TAP instructions), which is invoked by Incide and as a result programs the appropriate hardware inside the SoC. Numerous instructions are available, through which the user can specify the control actions described below. The actions that the user can perform via the IEEE 1149.1 TAP are:

• Reset: In order to repeat a debug session, the user should be able to control the reset of the chip via the debugger software. The DBG_RESET instruction achieves this. When this instruction is given, the TAP generates a reset signal which is combined with the functional reset of the chip through some additional logic, thus giving the debug user control over the functional reset of the chip.

• Programming the breakpoint: Programming of the breakpoint is done by program-ming the Monitor Config TPR. The user specifies the breakpoint condition (inour case the link value) which is written to this register. This is achieved throughthe PROGRAM TPR instruction. Here the user has to specify the TPR name and thevalue to be programmed as arguments with this instruction.

• Programming the debug control actions: The various debug control actions (stop,single-step and continue) are programmed by programming the NI-Shell TPR. ThePROGRAM TPR instruction achieves this. Like in the previous case, the user has tospecify the NI-Shell TPR name and the value to be programmed in it.

• Giving an external stop pulse: The user can also give an external stop pulse viathe TAP. This gives the user the flexibility to send a stop pulse externally via theTAP. He can then stop the SoC without programming a breakpoint and waitingfor a breakpoint hit to happen before a stop is initiated in the NoC. Using theJTAG STOP instruction, the user can give a stop pulse which is fed by the TAPcontroller to the Stop Module connected to it. This stop pulse will be one debugclock (tck) cycle wide. The EDI then distributes this stop pulse to all connectedcomponents of the NoC.

• Switching between the debug and functional clocks: The TAP controller is also connected to the Test Control Block (TCB) and the Clock Control Slice (CCS). The CCS feeds the clock to the various NoC functional components. Through proper programming of the TCB, the user can choose which clock is fed to the NoC (the functional clock, which comes from the clock generators, or the debug clock, which comes from the TAP). [39] gives a detailed description of how the TCB is programmed. The PROGRAM TCB instruction is used to program the TCB, with the value to be programmed as the argument.

• Scanning out internal state of NoC: The scanning out of the internal state of theNoC (flip-flop and memory content) is done by re-using the scan chains that areinserted during manufacturing tests. In order for this, first the functional clocks areswitched off. This is necessary so that any functional communication is not upset.Then the internal state is scanned out from the scan chains via the TAP. Activatingthe scan chains when the functional clock is running may cause glitches in the NoCdata which will corrupt the system state. After switching off the functional clockthe debug clock (tck) is fed to the NoC by the CCS. The DBG SCAN instruction isthen used which results in scanning out the internal data.
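As a rough illustration of how the actions listed above combine, the following Python sketch builds the ordered list of TAP instructions for one state-dump session. It is not Incide's Tcl interface: the `TapInstruction` container, the TPR instance names and all argument values are hypothetical placeholders, and only the instruction mnemonics are taken from the list above. The check that the NoC is quiescent before switching clocks is deliberately left out here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TapInstruction:
    """One TAP instruction with its arguments (hypothetical container)."""
    name: str
    args: Tuple[str, ...] = ()


def state_dump_session(monitor_tpr: str, breakpoint_value: str,
                       ni_shell_tpr: str, ni_shell_value: str,
                       debug_clock_tcb_value: str) -> List[TapInstruction]:
    """Order used in this sketch: program, reset, stop, switch clock, scan."""
    return [
        TapInstruction("PROGRAM TPR", (monitor_tpr, breakpoint_value)),
        TapInstruction("PROGRAM TPR", (ni_shell_tpr, ni_shell_value)),
        TapInstruction("DBG RESET"),                              # repeat the run
        TapInstruction("JTAG STOP"),                              # external stop pulse
        TapInstruction("PROGRAM TCB", (debug_clock_tcb_value,)),  # select debug clock
        TapInstruction("DBG SCAN"),                               # scan out the state
    ]


# All names and values below are made up for illustration.
session = state_dump_session("monitor_config_0", "0x100000357",
                             "ni_shell_0", "0x1", "0x1")
for instr in session:
    print(instr.name, *instr.args)
```

The point of the ordering is that the clock switch (PROGRAM TCB) always precedes DBG SCAN, as required by the last bullet.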

6.2 Use of Debug Infrastructure

We control the SoC functional behavior by controlling the interaction of the IP cores with the communication infrastructure; hence only the communication at the boundary between the network and the IP cores needs to be controlled. The Network Interfaces (more specifically the Network Interface shells) form this interface between the network and the IP cores. The NI Shells are built such that they have the necessary intelligence to control the interaction in the presence or absence of an EDI pulse (stop_r) according to the actions (stop, single-step, continue) which have been programmed by the user. The user can program these actions via the NI-Shell TPR, which is instantiated for every NI Shell. The NI-Shell TPR is programmed by the user via the DCI using the IEEE 1149.1 TAP. We now detail the exact nature of this programming and the granularity of control it provides.

• Program a breakpoint: This is done in the Monitor Config TPR. The breakpoint condition (monitor_config) is programmed by the user via the DCI using the IEEE 1149.1 TAP.

• Stop: For a stop to occur, the Stop Enable field in the NI-Shell TPR must first be set (stop_enable[i] = '1') for that particular channel. Enabling a stop is not sufficient by itself, however. Depending on the Stop Condition field (stop_condition[i]), a stop occurs either after a stop pulse is received from the EDI or unconditionally for that channel. If the Stop Condition is set (stop_condition[i] = '1'), a stop for the channel occurs even in the absence of an EDI stop pulse; otherwise (stop_condition[i] = '0') a stop takes place only after a stop pulse is received from the EDI. The urgency, or granularity, of the stop depends on the Stop Granularity field (stop_granularity[i]). If this is not set (= '0'), a message-level stop occurs, i.e. the ongoing message transfer, if any, is allowed to complete. If it is set (= '1'), an element-level stop occurs, i.e. the ongoing element transfer, if any, is allowed to complete before the stop.

• Continue: A channel is continued (i.e. transitioned out of its stop (mirror) state into the functional state) by setting the corresponding Continue field (continue = '1'). When the NI Shell detects that a continue is programmed, it transitions the FSM out of the stop state into functional mode.

• Single-step: As explained previously in Section 4.6, a single-step is a continue action followed by an implicit stop. This is achieved as follows. A continue happens on setting the appropriate bit of the Continue field (continue[i] = '1'). If at that moment the Stop Condition field is also set (stop_condition[i] = '1'), an unconditional stop follows this continue action, which in effect is a single-step. As in the stop scenario, the granularity of a single-step depends on the Stop Granularity field (stop_granularity[i]). For an element-level single-step (i.e. allow one more element to be transferred before stopping again) this field is set (stop_granularity[i] = '1'); otherwise a message-level single-step (i.e. allow one more message to be transferred before stopping again) occurs.

• Scanning out internal state of the NoC: Since we use a scan-based debug approach, the internal flip-flop and memory content of the NoC is scanned out in order to observe the state of the NoC. This is done using the DDI. After the NoC is in a quiescent state (there is no more communication taking place inside the NoC), the functional clocks are switched off by programming the TCB through the TAP. Then the internal state of the NoC is scanned out via the DDI, which runs at the debug clock frequency.
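The per-channel rules in the list above can be condensed into a small behavioural sketch. The field names follow the text, but the class, the helper functions and the idea of holding the fields as booleans are illustrative assumptions; the real NI-Shell TPR bit layout is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class ChannelFields:
    """Per-channel NI-Shell TPR fields as named in the text (bit layout not modelled)."""
    stop_enable: bool = False
    stop_condition: bool = False
    stop_granularity: bool = False  # False: message-level, True: element-level
    cont: bool = False              # the Continue field ("continue" is a Python keyword)


def stop_requested(f: ChannelFields, edi_pulse: bool) -> bool:
    """A stop needs Stop Enable, plus either an EDI pulse or Stop Condition."""
    return f.stop_enable and (f.stop_condition or edi_pulse)


def granularity(f: ChannelFields) -> str:
    """Boundary at which a stop or single-step takes effect."""
    return "element" if f.stop_granularity else "message"


def continue_action(f: ChannelFields) -> str:
    """Classify what setting the Continue field results in."""
    if not f.cont:
        return "none"
    return "single-step" if f.stop_condition else "continue"


# Stop enabled, waiting for an EDI pulse, message-level granularity.
ch = ChannelFields(stop_enable=True)
print(stop_requested(ch, edi_pulse=False), stop_requested(ch, edi_pulse=True))

# Element-level single-step: Continue and Stop Condition set together.
step = ChannelFields(stop_enable=True, stop_condition=True,
                     stop_granularity=True, cont=True)
print(continue_action(step), granularity(step))
```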

Figure 6.2: Interesting SoC debug points (MNI - Master Network Interface, SNI - Slave Network Interface).

The interfaces numbered 1–4 in Figure 6.2 denote the key points of debug control in our infrastructure for SoC debug. Each port of a network interface has an associated NI-Shell TPR, which defines the debug actions (stop, continue and single-step) for that port independently. If transaction-level debug is required, then only the interaction at the request (REQ) interface between the Master IP and the TNI (denoted by 1) should be controlled. This gives a master view of the traffic over the entire SoC. In case of multiple masters, the request interfaces of all masters should be controlled. Moreover, a message-level as well as an element-level debug view is possible at each of the interfaces. A very fine (per-channel) granularity of control is available to the user due to the presence of control bits per channel in the NI-Shell TPRs. Though the present infrastructure only allows control of the interactions of the NoC with its external components (the IP cores), this is sufficient for initial debug of the SoC.

6.3 Debug Flow

1. SoC Running (Initial Condition)

2. Programming of TPRs through TAP controller. (user)

• NI-Shell TPR - to program the various debug actions

• Monitor TPR - to program breakpoint hardware inside the monitor.

3. Update TPRs. (user)

4. Reset the chip (functional reset) - optional. (user)

5. Breakpoint condition occurs (detected by the monitor) or a pulse is given through the TAP controller, and the EDI communicates this to all NIs by pulses. (hardware)


6. NI Shells detect the pulse (for stopping in this case) and take action depending on the state they are in (FSM state) and the values programmed in the Stop Enable and Stop Granularity fields of the NI-Shell TPRs. (hardware)

7. NI Shells which have stop enabled, transition to stop (mirror) state. (hardware)

8. Detect / interpret whether or not the NoC has stopped all communication, i.e. it is in a quiescent state. (user)

9. While NoC hasn’t stopped do Steps 10 to 15.

10. Reprogramming of the TPRs. (user)

• Set Stop Enable (=1 for all channels) and Stop Granularity (=1 for all channels, fastest possible stop) fields in all NI-Shell TPRs.

11. Update TPRs. (user)

12. Give a pulse through the TAP controller to the stop module. (user)

13. This pulse is distributed by the EDI to all the NI Shells. (hardware)

14. NI Shells detect the pulse and those that haven't stopped transition to stop (mirror) state, while those which have stopped do not react. (hardware)

15. Detect / interpret whether or not the NoC has stopped all communication. (user)

16. If all the NoC communication has stopped (i.e. the NoC is in a quiescent state), then the internal state of the NoC can be scanned out via the IEEE 1149.1 TAP. (user) This is done as follows:

• First, switch from the functional clocks to the debug clock. This is done by programming the Test Control Block (TCB) using the TAP controller.

• Then the internal state is scanned out through the TAP.

17. Reprogramming the TPRs (user)

• Program a 1 in the Continue field for those channels which the user wants to continue functionally; by programming the Stop Condition field, a normal continue or a single-step can be enforced.

• The Stop Enable and Stop Granularity fields for the various channels can also be reprogrammed according to the action that is desired.

Note: when the Continue bits for the various channels are programmed, the user has to make sure not to stall the network and ensure that the desired continue action can actually complete.

18. Update TPRs (user)


19. NI Shells detect the change in the various NI-Shell TPRs and, depending on the value that has been programmed in the Continue field, transition out of the stop state (provided the data for transfer does not stall it further) and resume further operation. (hardware)

20. Continue action (hardware)

• If a single-step has been programmed (Stop Condition = 1), then depending on the Stop Granularity the NI Shells transition through the FSM and return to the stop state. GoTo Step 15.

• If a normal continue was programmed (Stop Condition = 0) then

* If Stop Condition = 0 for all the channels (and a Continue was programmed for all channels) then GoTo Step 1.

* Else GoTo Step 5.
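Steps 8 to 16 of the flow above form a loop on the host side: force a stop on every channel, pulse, and check for quiescence until the NoC can be scanned out. A minimal sketch of that loop is given below. The `DebugPort` interface and the `FakePort` stand-in are hypothetical (no such API is defined in this thesis), and the `max_rounds` bound is an added safeguard that is not part of the original flow.

```python
from typing import List, Protocol


class DebugPort(Protocol):
    """Hypothetical host-side operations; not an existing API."""
    def force_stop_all_channels(self) -> None: ...
    def update_tprs(self) -> None: ...
    def give_stop_pulse(self) -> None: ...
    def is_quiescent(self) -> bool: ...
    def switch_to_debug_clock(self) -> None: ...
    def scan_out(self) -> str: ...


def stop_and_dump(port: DebugPort, max_rounds: int = 10) -> str:
    rounds = 0
    while not port.is_quiescent():        # steps 8, 9 and 15
        if rounds == max_rounds:
            raise TimeoutError("NoC did not reach a quiescent state")
        port.force_stop_all_channels()    # step 10: Stop Enable = Stop Granularity = 1
        port.update_tprs()                # step 11
        port.give_stop_pulse()            # steps 12 to 14
        rounds += 1
    port.switch_to_debug_clock()          # step 16: program the TCB
    return port.scan_out()                # step 16: scan out through the TAP


class FakePort:
    """Toy stand-in that becomes quiescent after a given number of stop pulses."""

    def __init__(self, pulses_needed: int) -> None:
        self.pulses_needed = pulses_needed
        self.log: List[str] = []

    def force_stop_all_channels(self) -> None:
        self.log.append("program")

    def update_tprs(self) -> None:
        self.log.append("update")

    def give_stop_pulse(self) -> None:
        self.pulses_needed -= 1
        self.log.append("pulse")

    def is_quiescent(self) -> bool:
        return self.pulses_needed <= 0

    def switch_to_debug_clock(self) -> None:
        self.log.append("debug_clock")

    def scan_out(self) -> str:
        self.log.append("scan")
        return "statedump"


port = FakePort(pulses_needed=2)
print(stop_and_dump(port))
print(port.log)
```

How `is_quiescent` would actually be decided in hardware is left open here; it is exactly the detection step that the flow assigns to the user.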

7 Results

In this chapter we show the simulation and synthesis results for the debug infrastructure.

7.1 Programming the TPRs

In this section we show the gate-level traces of programming the Monitor Config TPR via the IEEE 1149.1 TAP. Figure 7.1 shows how this is done.

[Figure 7.1 waveform (SimVision, "Programming the MonitorConfig TPR"): JTAG port signals tck, tdi, tdo, tms, trstn; Monitor Config TPR signals monitor_config, tpr_se, si, so, tpr_bypass, tpr_config, tpr_enable, tpr_hold, tpr_update, tpr_tdi, tpr_tdo, tpr_tck; further signals ip_stop, stop_condition, stop_enable, stop_granularity, continue, link_data, link_data_r, monitor_stop, dtl_rst_n. The monitor_config trace goes from 000000000 to 100000357. Marker 1 = 26.183299695us, Marker 2 = 37.021587186us; time axis 26us to 38us.]

Figure 7.1: Programming of the Monitor Config TPR

When tpr_hold goes low, the value on tdi of the TAP starts shifting into the TPR (tpr_enable is high) via tpr_tdi. This is the start of the TPR programming (Marker 1 in the figure). The shifting takes place synchronously to the debug clock (tck). As soon as the shifting phase is complete, tpr_hold goes high. More importantly, the value remains stable as long as tpr_hold is high. The value is then programmed when both tpr_hold and tpr_update are high (Marker 2 in the figure); this is the update phase of the programming. It is reflected by the change in the value of monitor_config at precisely this point in time. The value is then seen by the monitor, which runs on the NoC functional clock. The separation between the shifting and the update phases allows for a safe crossover between clock domains and means that the Monitor Config TPR can be programmed while the NoC is functionally running without causing glitches or false breakpoint triggers. A more detailed description of programming via the IEEE 1149.1 TAP can be found in [35]. This is how the actual programming takes place in hardware. The user can invoke it using the PROGRAM TPR instruction as explained in Section 6.1. Figure 7.2 shows the programming of an NI Shell TPR.

Figure 7.2: Programming of the NI Shell TPR
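The separation of the shift and update phases described in Section 7.1 can be mimicked with a small behavioural model. This is only a sketch: the class name, the LSB-first shift order and the 33-bit width (suggested by the monitor_config trace of Figure 7.1) are assumptions, and clock-domain details are not modelled.

```python
class TprModel:
    """Behavioural sketch of a TPR with separate shift and update stages."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.shift_reg = 0  # changes while bits are shifted in
        self.value = 0      # what the functional logic sees (e.g. monitor_config)

    def shift(self, tdi_bit: int) -> None:
        """One tck cycle with tpr_hold low: shift one bit in (LSB-first assumed)."""
        self.shift_reg = (self.shift_reg >> 1) | ((tdi_bit & 1) << (self.width - 1))

    def update(self) -> None:
        """tpr_hold and tpr_update both high: copy the stable shift value out."""
        self.value = self.shift_reg


tpr = TprModel(width=33)
new_value = 0x100000357

for i in range(tpr.width):            # shift phase
    tpr.shift((new_value >> i) & 1)
print(hex(tpr.value))                 # 0x0: the visible value is still unchanged

tpr.update()                          # update phase
print(hex(tpr.value))                 # 0x100000357
```

Because the visible value changes only in `update`, the monitor never observes a partially shifted breakpoint condition, which is the property the text relies on.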

7.2 EDI stop pulse distribution

Figure 7.3 shows the traces for a stop module when it receives a stop pulse from the monitor, while Figure 7.4 shows the waveforms for a stop module when an external stop is given to it via the TAP. A stop module reacts to an incoming stop signal (monitor_stop is high) only when it is in state '00' (state_r); it then transitions to state '01'. In the next clock cycle it transitions to state '10' and outputs a signal one clock pulse wide on the output ports stop_out0..N.

[Figure 7.3 waveform ("Stop Module - Monitor Stop"): signals clk, jtag_stop, monitor_stop, rst_n, state_r, stop_out0, stop_out1, stop_out2; state_r sequence 00 01 10 11 00; Marker 1 = 45.360476909us; time axis 45.34us to 45.39us.]

Figure 7.3: Stop Module gate-level waveforms for monitor stop

[Figure 7.4 waveform ("Stop Module - JTAG Stop"): signals clk, jtag_stop, monitor_stop, rst_n, state_r, stop_out0, stop_out1, stop_out2; state_r sequence 00 01 10 11 01 10 11 01 10 11 01 10 11 00; Marker 1 = 40.400025695us; time axis 40.4us to 40.48us.]

Figure 7.4: Stop Module gate-level waveforms for external user stop through TAP
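The stop module behaviour described in Section 7.2 can be written down as a four-state machine. The sketch below is one reading of the text and of the state_r sequences in Figures 7.3 and 7.4, not the actual RTL: it assumes that the output pulse is driven while the module is in state '10', and that from state '11' the module restarts at '01' if the stop input is still high (which is what the repeated 01 10 11 pattern for the longer JTAG stop suggests).

```python
from typing import Iterable, List, Tuple


def next_state(state: int, stop_in: bool) -> int:
    """Next value of state_r for one functional clock cycle."""
    if state == 0b00:
        return 0b01 if stop_in else 0b00  # react to a stop only when idle
    if state == 0b01:
        return 0b10
    if state == 0b10:
        return 0b11
    # state 0b11: assumed to restart while the stop input is still high
    return 0b01 if stop_in else 0b00


def simulate(stop_inputs: Iterable[int]) -> List[Tuple[str, bool]]:
    """Return (state_r, stop_out) per clock cycle for a sequence of stop inputs."""
    state = 0b00
    trace = []
    for stop_in in stop_inputs:
        trace.append((format(state, "02b"), state == 0b10))  # pulse assumed in '10'
        state = next_state(state, bool(stop_in))
    return trace


# A stop input that is high for a single functional clock cycle.
for state_r, stop_out in simulate([1, 0, 0, 0, 0]):
    print(state_r, int(stop_out))
```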

7.3 Debug Control Actions in the shells

We show debug control at various granularities in a Master Network Interface (MNI). Figure 7.5 gives an overall picture of the debug flow as seen in gate-level simulations of its shell. The NI Shell TPR is initially programmed (stop is enabled for some channel). Then, after a functional reset, the NoC restarts communication. This functional reset is done in order to ensure that a breakpoint condition that has been programmed is not missed. When the breakpoint hit occurs, the EDI distributes it to all the network interfaces in the network. On arrival of a stop pulse from the EDI, the communication through the shell shown in Figure 7.5 stops due to the gating of the valid / accept handshake. The shell is then in a quiescent state. Now the user can reprogram the TPRs, read out internal state via the TAP, or do a combination of these steps. In the waveform shown in Figure 7.5, a continue is programmed and hence, as can be seen, the shell continues its functional behavior.

Figure 7.5: Waveform for debug flow in a MNI

In Figure 7.6, we show a stop on a request channel from master to MNI from the MNI's point of view. The stop has been enabled by programming the appropriate bit in the NI Shell TPR. On arrival of a stop pulse from the EDI, the accept signals for the command and write signal groups are gated (i.e. no more accepts are sent to the master IP). The communication on the response interface continues, however. Hence a stop does not occur immediately; when it occurs depends on the number of commands that have already been accepted and are pending, and also on how the interface between the SNI and the slave IP has been programmed.

Figure 7.7 is the most complex scenario and shows almost all the programmable debug actions. First, stop is enabled for one of the request channels. Then, on arrival of an EDI stop pulse, the shell stops. Next the stop_condition field is set to enforce an unconditional stop. Since the shell is already stopped, only after a continue pulse is given does the shell step ahead by one communication unit and then stop again. This is a single-step. Single-stepping at message-level or element-level granularity is obtained depending on the value programmed in the stop_granularity field, as shown in Figure 7.7. Finally, when the stop_granularity field is reset, the functional behavior continues normally when another continue pulse is given. This continues until another EDI pulse is seen or the stop_condition field is programmed again.

Figure 7.6: Request Stop in a MNI

In Figure 7.8 we show how only the response channel between the MNI and master IP is stopped while the request channel continues normally. It can also be seen that, even though the stop from the EDI arrives while a response message is being sent, the stop does not occur immediately, since the stop granularity is set to message (stop_granularity = '0').

Finally, in Figure 7.9 we show actions similar to those shown in Figure 7.7, but this time on the response channel.

All the scenarios described above are also implemented at the interface between the SNI and the slave IP. The shells there (in the SNI) have the same intelligence and are able to perform these debug control actions.
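The difference between message-level and element-level granularity seen in these waveforms can be illustrated with a toy model of one channel. The function below is a simplification made for illustration only: it treats a channel as a stream of elements with a flag marking the last element of each message, and assumes the stop request arrives while one element is in flight.

```python
from typing import List


def elements_before_stop(last_flags: List[bool], in_flight: int,
                         element_level: bool) -> int:
    """Number of elements transferred before the channel stops.

    last_flags[i] is True if element i is the last element of its message;
    in_flight is the index of the element being transferred when the stop
    request arrives.
    """
    if element_level:
        return in_flight + 1                     # only the ongoing element completes
    for i in range(in_flight, len(last_flags)):  # message-level: finish the message
        if last_flags[i]:
            return i + 1
    return len(last_flags)                       # stream ended inside a message


# Two messages of three elements each; the stop request arrives during element 1.
flags = [False, False, True, False, False, True]
print(elements_before_stop(flags, in_flight=1, element_level=True))   # 2
print(elements_before_stop(flags, in_flight=1, element_level=False))  # 3
```

With stop_granularity = '1' the channel stops after the element in flight, whereas with stop_granularity = '0' the rest of the current message is still transferred, which is why the stop in Figure 7.8 is not immediate.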

7.4 Area Cost and Speed

The example SoC shown in Figure 7.10 was used during simulation and synthesis to obtain the various waveforms shown in the previous sections of this chapter. It consists of 4 IP cores (2 masters and 2 slaves) which communicate by setting up the connections shown in the figure. The SoC was synthesized with the NoC targeted at 125 MHz and 250 MHz, both with and without the debug infrastructure that has been designed and developed.


Figure 7.7: Request Stop / Single-step / Continue in a MNI

The table below shows the running speed after synthesis and the area numbers for the different cases.

Target speed and debug for SoC    Area (no. of blocks)         Running speed of NoC (MHz)
125 MHz without debug             991893.44 (core area)        142.186
125 MHz with debug                1005559.62 (network area)    135.943
250 MHz without debug             1009201.50 (core area)       250.013
250 MHz with debug                1033722.94 (network area)    250.000

Figure 7.8: Response channel stop in a MNI

Since the level at which synthesis takes place differs for the SoC with and without debug, we cannot give the actual percentage increase in area when the debug infrastructure is included. For the SoC without debug the core is synthesized, whereas for the one with debug hardware inserted the network is synthesized. Even though we cannot comment on the percentage increase in area of the NoC / SoC when the designed debug infrastructure is added, we can comment on the location and complexity of the area cost. Most of the additional area cost is associated with the NI-Shell TPRs and the additional states and logic that have been added in the shells. For every channel in the network, we add 8 bits in the NI-Shell TPR plus another 8 registers in the shells. Hence for the TPRs and the shells, the area cost increases linearly with the number of network interfaces. Additional area cost is also due to the EDI, which consists of the stop modules. The stop modules have the same topology as the router network, with one stop module per router. Hence the area of the stop modules also increases linearly with the number of routers in the network. So overall, for the entire network, the increase in area is linear in the network size.


Figure 7.9: Response Stop / Single-step / Continue in a MNI

[Figure 7.10 diagram: a chip containing a network with four network interfaces (NI 1 to NI 4) and eight channels: Request (1), Request (2), Response (3), Response (4), Request (5), Request (6), Response (7), Response (8).]

Figure 7.10: Example SoC used during simulation and synthesis
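The linear area argument of Section 7.4 can be turned into a simple count. The sketch below uses only the per-channel figures stated in the text (8 TPR bits and 8 shell registers per channel, one stop module per router); it counts storage elements, not synthesized area, and the router count passed in the example is a placeholder since the number of routers in the example SoC is not stated. Monitor TPRs are not included.

```python
from typing import Dict

TPR_BITS_PER_CHANNEL = 8     # from the text
SHELL_REGS_PER_CHANNEL = 8   # from the text


def added_storage(channels: int, routers: int) -> Dict[str, int]:
    """Debug storage added for a network with the given channels and routers."""
    return {
        "tpr_bits": TPR_BITS_PER_CHANNEL * channels,
        "shell_registers": SHELL_REGS_PER_CHANNEL * channels,
        "stop_modules": routers,  # one stop module per router
    }


# Eight channels as in Figure 7.10; routers=1 is a placeholder value.
print(added_storage(channels=8, routers=1))
```

Doubling the number of channels and routers doubles every entry, which is the linear scaling claimed above.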

8 Conclusions

8.1 Conclusions

The ever increasing complexity of present-day Integrated Circuits (ICs) means that errors in the first design iteration are unavoidable. Building an error-free design may thus require multiple design iterations. Effective debug can help reduce the number of iterations (and the time to market) through fast and accurate detection of the majority of the errors that may be present. Additionally, shrinking feature size means that a greater number of IP cores can be integrated on a single IC, effectively shifting the complexity of the IC from the IP cores to the interconnect. Communication-centric debug has been proposed as a debug strategy in which the interconnect of the SoC is used to debug the IC. In this strategy, raising the abstraction level from clock cycles to a higher level (like transactions) allows for a consistent view in both hardware and software. This makes it easier to interpret the actual functional behaviour of the IC and thus might help locate errors faster.

A debug infrastructure has been built to facilitate communication-centric debug. This infrastructure allows the user to both control and observe the functional behaviour of the chip. By reusing some of the manufacturing test infrastructure we have tried to limit the increase in SoC area. The generation of the debug hardware components that have been designed is integrated with the Æthereal design flow. Finally, we have shown by simulation how this infrastructure is used to actually perform communication-centric debug. In this thesis we have also proposed a debug flow for SoCs which combines communication-centric debug and traditional core-based debug. Further, we propose a few thoughts for future work, one of which is necessary for the complete debug flow to be implemented, while the others merely facilitate a richer user experience.

8.2 Future Work

The present debug infrastructure allows for communication-centric debug of the SoC. For a comprehensive debug setup, however, a few additional features can be incorporated.

• After a breakpoint hit, a stop pulse is distributed throughout the network to stop the communication taking place in the SoC. After a stop has been detected, not all the interfaces of the NoC may stop their interactions, depending on the actions programmed for each of them in their corresponding TPRs. Even if they do stop, they may not all stop at the same instant in time. Hence a polling mechanism is required which can poll the network and determine whether or not all interactions in the network have stopped and the network is in a quiescent state. In this polling stage, NoC internal registers which reflect end-to-end flow control for the connections can be polled. NI internal registers such as credits / remote buffer space (Figure 8.1) can be read out during polling and a decision made based on these values. Only then can we safely switch from the functional clock to the debug clock in order to obtain a statedump.

Figure 8.1: Example registers that can be polled to decide on NoC quiescent state. [32]

• The existing monitors can only observe raw link data. Though this is useful at a very low debug abstraction level (like clock-level), a higher level of breakpoint programming is desired, especially when the debug itself is at a higher abstraction level. Hence monitors which allow breakpoints to be set for transactions on the various inter-IP communications will allow for this higher-abstraction-level programming of breakpoints.

• The debug infrastructure that has been designed allows only for NoC external debug (between the NoC and the IP interfaces). This is useful for SoC debug. The various IPs will have their own debug architectures to debug them stand-alone. As yet, however, there is no explicit debug infrastructure to debug the NoC itself, and one needs to be developed. This can be achieved by way of extensions to the present infrastructure. For example, the routers and network kernels can also be designed to respond to pulses from the EDI. With built-in intelligence like that of the NI Shells, stop, single-step and continue functionality can be incorporated in these components. TPRs similar to the NI-Shell TPR can help the user program the debug control actions for each of the routers and network interfaces.

• Although the TPRs and stop modules are instantiated per NI-Shell/Monitor and router respectively, their concatenation into a single scan chain may not always be optimal with respect to the routing length of wires. For example, Figure 8.2 shows a network of four stop modules. The numbering of the stop modules indicates their instantiation order. Presently, during concatenation into a single scan chain, the order along the red line (numbering order 1-2-3-4) would be followed. Instead it would be more efficient to follow the shortest path along the topology, which would minimize the routing length; in our case this is the orange line (numbering order 1-2-4-3). The same holds for the concatenation of all NI-Shell TPRs. A more topology-aware algorithm may be implemented in future to remove this inefficiency.

[Figure 8.2 diagram: four stop modules (Stop Module 1 to Stop Module 4) with two candidate scan-chain orders, the red line 1-2-3-4 and the orange line 1-2-4-3.]

Figure 8.2: Shows the scan-chain concatenation order for a stop-module network.

• The statedump file presently obtained can be back-annotated to the abstraction of the various internal registers. It would be interesting to investigate back-annotation to the level of messages and transactions. Here the debugger would, for example, get a view of where certain transactions / messages / elements are in the network. Figure 8.3 shows what this could possibly look like. The network has two connections, 1 and 2. A back-annotation would then tell the debugger in which component each of the messages of a connection is. In our example, message 1 of connection 1 (M 11) is in network interface 2 (NI 2). The second message of the same connection (M 12) is partly in router 2 (R 2) and the rest is in network interface 1 (NI 1). The third message (M 13) is in network interface 1 (NI 1).

[Figure 8.3 diagram: a chip with Master IP Core 1 and Slave IP Core 2 attached through NI ports to a network containing network interfaces NI 1 to NI 3 and routers R 1 and R 2, with messages M 11, M 12, M 13 and M 21 to M 24 placed in these components. Legend: Connection 1 - Master IP Core 1 to Slave IP Core 2; Connection 2 - Master IP Core 2 to Slave IP Core 2; M 12 - 2nd message of Connection 1; NI - Network Interface; R - Router.]

Figure 8.3: High-level back annotation from statedumps.
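The topology-aware scan-chain ordering suggested in the future-work list above could, for instance, start from a nearest-neighbour heuristic. The sketch below is not part of the implemented design flow: the 2x2 grid coordinates for the four stop modules are an assumed placement chosen to be consistent with the orders discussed for Figure 8.2, Manhattan distance stands in for wire length, and a greedy heuristic does not guarantee the shortest chain in general.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def manhattan(a: Position, b: Position) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order: List[int], pos: Dict[int, Position]) -> int:
    """Total distance between consecutive modules in a scan-chain order."""
    return sum(manhattan(pos[a], pos[b]) for a, b in zip(order, order[1:]))


def greedy_chain(pos: Dict[int, Position], start: int) -> List[int]:
    """Nearest-neighbour heuristic: repeatedly append the closest unvisited module."""
    order = [start]
    remaining = set(pos) - {start}
    while remaining:
        here = pos[order[-1]]
        # Ties are broken by the lower instantiation number.
        nxt = min(remaining, key=lambda m: (manhattan(here, pos[m]), m))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Assumed 2x2 placement of the four stop modules.
pos = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}
print(chain_length([1, 2, 3, 4], pos))  # 4: instantiation order
print(greedy_chain(pos, start=1))       # [1, 2, 4, 3]
print(chain_length([1, 2, 4, 3], pos))  # 3
```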

Bibliography

[1] M. Abramovici, M.A. Breuer, and A.D. Friedman, Digital Systems Testing and Testable Design, 1990.

[2] ARM, AMBA specification. rev. 2.0, 1999.

[3] ARM, Multi-layer AHB. overview, 2001.

[4] Edith Beigne, Fabien Clermidy, Pascal Vivet, Alain Clouard, and Marc Renaudin, An asynchronous NOC architecture providing low latency service and its multi-level design framework, Proc. Int'l Symposium on Asynchronous Circuits and Systems (ASYNC), 2005.

[5] Davide Bertozzi and Luca Benini, Xpipes: A network-on-chip architecture for gigascale systems-on-chip, IEEE Circuits and Systems Magazine (2004), 18–31.

[6] Tobias Bjerregaard, The MANGO clockless network-on-chip: Concepts and implementation, Ph.D. thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2006.

[7] Evgeny Bolotin, Israel Cidon, Ran Ginosar, and Avinoam Kolodny, QNoC: QoS architecture and design process for Network on Chip, Journal of Systems Architecture 50 (2004), no. 2–3, 105–128, Special issue on Networks on Chip.

[8] Calin Ciordas, Twan Basten, Andrei Radulescu, Kees Goossens, and Jef van Meerbergen, An Event-Based Network-on-Chip Monitoring Service, Proc. of the High-Level Design Validation and Test Workshop (HLDVT), November 2004, pp. 149–154.

[9] Calin Ciordas, Kees Goossens, Twan Basten, Andrei Radulescu, and Andre Boon, Transaction Monitoring in Networks on Chip: The On-Chip Run-Time Perspective, Proc. of the IEEE Symposium on Industrial Embedded Systems (IES), October 2006.

[10] Calin Ciordas, Andreas Hansson, Kees Goossens, and Twan Basten, A Monitoring-aware NoC Design Flow, Proc. of the EUROMICRO Symposium on Digital System Design (DSD), August 2006.

[11] DAFCA, DAFCA In-Silicon Debug: A Practical Example, June 2005.

[12] William J. Dally and Brian Towles, Route Packets, Not Wires: On-Chip Interconnection Networks, Proc. of the 38th Design Automation Conference (DAC), June 2001.

[13] Wilco de Boer and Bart Vermeulen, Silicon Debug: Avoid Needless Respins, Proc. Electronics Manufacturing Technology Symposium, July 2004, pp. 277–281.

[14] John Dielissen, Andrei Radulescu, Kees Goossens, and Edwin Rijpkema, Concepts and Implementation of the Philips Network-on-Chip, IP-Based SOC Design, November 2003.

[15] Kees Goossens, John Dielissen, and Andrei Radulescu, The Æthereal network on chip: Concepts, architectures, and implementations, IEEE Design and Test of Computers 22 (2005), no. 5, 21–31.

[16] Kees Goossens, John Dielissen, and Andrei Radulescu, The Æthereal Network on Chip: Concepts, Architectures, and Implementations, IEEE Design and Test of Computers 22 (2005), no. 5, 414–421.

[17] Kees Goossens, Bart Vermeulen, Remco van Steeden, and Martijn Bennebroek, Transaction-based communication-centric debug, Proc. Int'l Symposium on Networks on Chip (NOCS), May 2007, pp. 195–206.

[18] Pierre Guerrier and Alain Greiner, A Generic Architecture for On-Chip Packet-Switched Interconnections, Proc. Design, Automation and Test in Europe Conference and Exhibition (DATE), 2000, pp. 250–256.

[19] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oberg, M. Millberg, and D. Lindqvist, Network on chip: An architecture for billion transistor era, Proc. of the IEEE NorChip Conference, November 2000.

[20] Jorg Henkel, Wayne Wolf, and Srimat T. Chakradhar, On-chip networks: A scalable, communication-centric embedded system design paradigm, Proc. of the 17th International Conference on VLSI Design (VLSID), 2004, pp. 845–851.

[21] Kalon Holdbrook, Sunil Joshi, Samir Mitra, Joe Petolino, Renu Raman, and Michelle Wong, microSPARC™: A Case Study of Scan-Based Debug, ITC, 1994, pp. 70–75.

[22] A. Hopkins and K. McDonald-Maier, Debug support for complex systems on-chip: A review, IEE Proceedings Computer and Digital Techniques 153, no. 4.

[23] R. Leatherman and N. Stollon, An Embedded Debugging Architecture for SoCs, IEEE Potentials 24, no. 1.

[24] ARM Limited, AMBA AXI Protocol Specification. Version 1.0, March 2004.

[25] K.D. Maier, On-Chip Debug Support for Embedded Systems-on-Chip, Proc. Int'l Symposium on Circuits and Systems (ISCAS), 2003, pp. 565–568.

[26] Mikael Millberg, Ernald Nilsson, Rikard Thid, and Axel Jantsch, Guaranteed Bandwidth Using Looped Containers in Temporally Disjoint Networks within the Nostrum Network on Chip, Proc. Design, Automation and Test in Europe Conference and Exhibition (DATE), 2004.

[27] William Orme, Debug IP for SoC Debug, December 2005.

[28] OCP International Partnership, Open Core Protocol Specification. Version 2.0,September 2003.

[29] Philips Semiconductors, CoReUse 4.1: Core-based Scan Architecture for SiliconDebug. Version 1.4, February 2003.

[30] , CoReUse 4.2: Device Transaction Level (DTL) Protocol Specification. Ver-sion 2.4, February 2005.

[31] G.J. Rootselaar and B. Vermeulen, Silicon Debug: Scan Chains Alone Are NotEnough, Proceedings IEEE International Test Conference (ITC) (Atlantic City, NJ,USA), September 1999, pp. 892–902.

[32] Andrei Radulescu, John Dielissen, Kees Goossens, Edwin Rijpkema, and PaulWielage, An efficient on-chip network interface offering guaranteed services, shared-memory abstraction, and flexible network programming, Proc. Design, Automationand Test in Europe Conference and Exhibition (DATE) (Washington, DC, USA),vol. 2, IEEE Computer Society, February 2004, pp. 878–883.

[33] Andrei Radulescu and Kees Goossens, Æthereal Services, July 2003.

[34] Philips Semiconductors, The i2c-bus specification, January 2000.

[35] IEEE Computer Society, IEEE Standard Test Access Port and Boundary-ScanArchitecture-IEEE Std 1149.1-2001., 2001.

[36] Bart Vermeulen and Sandeep Kumar Goel, Design for Debug: Catching DesignErrors in Digital Chips, IEEE Des. Test 19 (2002), no. 3, 37–45.

[37] Bart Vermeulen, Kees Goossens, Remco van Steeden, and Martijn Bennebroek,Communication-centric SOC debug using transactions, Proc. European Test Sym-posium (ETS), May 2007.

[38] Bart Vermeulen, Steven Oostdijk, and Frank Bouwman, Test and debug strategy ofthe PNX8525 NexperiaTM digital video platform system chip., ITC, 2001, pp. 121–130.

[39] Bart Vermeulen, Tom Waayers, and Sandeep Kumar Goel, Core-Based Scan Architecture for Silicon Debug, ITC, 2002, pp. 638–647.

[40] Paul Wielage and Kees Goossens, Networks on Silicon: Blessing or Nightmare?, Proc. of the EUROMICRO Symposium on Digital System Design (DSD) (Dortmund, Germany), September 2002.


Appendix A. Constraints on External Stop Pulse

Present-day SoCs have multiple clock domains and hence there are bound to be clock-domain crossings. These crossings have to be taken into account and the designer has to make sure that there are no timing violations. As a result, certain constraints are often imposed to ensure the correct functional behaviour.

One such clock-domain crossing takes place in the stop modules discussed in Section 5.3. In our case the stop modules operate at the functional clock frequency of the NoC, while the external stop, which is given through the IEEE 1149.1 TAP, is at the debug clock (tck). To ensure safe clock-domain crossing in the stop modules, certain constraints are imposed on the duration of the external stop pulse, which is given by the user.

The minimum duration of the stop pulse is two functional clock cycles of the clock on which the stop module operates. This is obtained as follows. In Figure A.1 the first waveform is the functional clock of the stop module. At times t1 and t2 the external stop pulse is sampled (i.e. every rising edge of the stop module clock).

[Figure A.1 waveforms: stop module clock; external stop pulse for scenarios a, b and c; sampling times t1 and t2 on the time axis (t).]

Figure A.1: Timing diagrams showing minimum duration of external stop pulse

• In the first scenario ’a’ the external stop pulse is already high when it is sampled at time t1. Hence the external stop pulse will be sampled correctly.

• In scenario ’b’ the external stop pulse has not yet reached a value which is considered high at time t1. The pulse then goes low before it can be sampled a second time (at t2), hence this pulse will be missed.

• In the third scenario ’c’, at the first sampling time t1 the external pulse value is still low. But since it stays high for at least two functional clock cycles of the stop module, this pulse is detected high on the next rising edge (at t2). In this way the pulse will not be lost.
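
The three scenarios above can be reproduced with a small idealized model. The sketch below is not the stop module implementation from this thesis; it assumes an ideal sampler with rising edges at integer multiples of the clock period, a pulse that is high on the half-open interval [start, start + width), and no setup/hold or metastability effects. The function name `first_detection` and the integer time values are hypothetical, chosen only to mirror scenarios a, b and c.

```python
def first_detection(pulse_start, pulse_width, clock_period):
    """Return the time of the first rising edge that samples the pulse high.

    Idealized model (an assumption, not the thesis design): rising edges
    occur at integer multiples of clock_period, and the pulse is high on
    [pulse_start, pulse_start + pulse_width). All arguments are integers
    in the same time unit. Returns None if no edge sees the pulse high.
    """
    pulse_end = pulse_start + pulse_width
    # Index of the first rising edge at or after pulse_start
    # (ceiling division on integers).
    k = -(-pulse_start // clock_period)
    edge = k * clock_period
    return edge if edge < pulse_end else None


if __name__ == "__main__":
    T = 10  # hypothetical clock period; t1 = 10 and t2 = 20 in these units
    scenarios = {
        "a": (8, 5),    # already high at t1
        "b": (11, 6),   # rises after t1, falls before t2
        "c": (11, 20),  # rises after t1, stays high for two clock cycles
    }
    for name, (start, width) in scenarios.items():
        print(name, first_detection(start, width, T))
    # prints:
    # a 10
    # b None
    # c 20
```

In this model a pulse that is two clock periods wide spans at least one rising edge whatever its start phase, which is consistent with scenario c. The model does not capture the timing margins a real clock-domain crossing needs.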

There is no strict constraint on the maximum duration of the external pulse. It is interesting to note, however, that for every 3 clock cycles of the stop module functional clock, the stop module generates one pulse on the EDI, for the reasons explained in Section 5.3.

Appendix B. List of Acronyms

ATE      Automated Test Equipment
BE       Best Effort
BES      Best Effort Service
BP-TPR   BreakPoint TPR
DCI      Debug Control Interconnect
DDI      Debug Data Interconnect
DfD      Design-for-Debug
DSM      Deep Sub-Micron
DTL      Device Transaction Level
E2EFC    End-to-End Flow Control
FIFO     First-In, First-Out
GL       Gate Level
GS       Guaranteed Service
GT       Guaranteed Throughput
IEEE     Institute of Electrical and Electronics Engineers
IP       Intellectual Property
JTAG     Joint Test Action Group
MNI      Master Network Interface
MNIP     Master NIP
NI       Network Interface
NIP      Network Interface Port
NiK      Network interface Kernel
NiS      Network interface Shell
NoC      Network-on-Chip
OCP      Open Core Protocol
R        Router
RTL      Register Transfer Level
SNI      Slave Network Interface
SNIP     Slave NIP
SoC      System-on-Chip
TAP      Test Access Port
TCB      Test Control Block
TLM      Transaction Level Model
TPR      Test Point Register
VHDL     VHSIC Hardware Description Language
