
Improving Polynomial Datapath Debugging with HEDs
Somayeh Sadeghi-Kohan, Payman Behnam, Bijan Alizadeh†, Masahiro Fujita‡ and Zainalabedin Navabi†

†School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

‡VLSI Design and Education Center (VDEC), University of Tokyo, Tokyo, Japan †{sm.sadeghi79, payman.behnam, b.alizadeh, navabi}@ut.ac.ir, ‡[email protected]

Abstract—In this paper, we introduce a formal and scalable debugging approach to derive a reduced ordered set of design error candidates in polynomial datapath designs. To make our debugging method scalable for large designs, we utilize a Modular Horner Expansion Diagram (M-HED), which has been shown to be a scalable high level decision model. In our method, we extract data dependency graphs from the polynomial datapath designs using static slicing. Then we combine backward and forward path tracing to extract a reduced set of error candidates. In order to increase the accuracy of the method in the presence of multiple design errors, we rank the error candidates in decreasing order of their probability of being an error using a proposed priority criterion. In order to evaluate the effectiveness of our method, we have applied it to several large designs. The experimental results show that the proposed method enables us to locate even multiple errors with high accuracy in a short run time.

Keywords—Verification, Debugging, Error Ranking, HED, RTL.

I. INTRODUCTION

Verification is the process of checking whether there is a discrepancy between a given design and its specification. Debugging aims to find the location of the error(s) observed in the verification phase, while repairing or correction refers to how error candidates can be rectified to make the design function the way it was intended based on the specification.

With the increasing size and complexity of digital system designs, automated debugging, especially in the case of multiple errors, becomes increasingly significant. In addition, in order to cope with the large sizes of real-world designs, reducing run times in verification and debugging is important. Despite advances in debugging techniques, the processes of finding minimal potential error locations, ranking them, and identifying the true design errors among them require a large amount of processing time and ad-hoc manual effort [1].

There is a body of work on verification and debugging at the gate level [5][6][7]. In fact, most hardware debugging tools are based on bit-level methods such as BDDs (Binary Decision Diagrams), SAT (SATisfiability), or MaxSAT (Maximum SAT). These methods suffer from space and time explosion when dealing with large designs with multiple errors.

Nowadays, due to faster design changes and increased design complexity, designers tend to move up in the level of design abstraction from the gate level to the Register Transfer Level (RTL) or Electronic System Level (ESL). On the other hand, the lack of scalable and powerful RTL error diagnosis tools significantly increases time to market and consequently reduces designers' productivity. To address this problem, some techniques have been proposed that work directly at the RTL [2][8][9][15].

The techniques presented in [8] employ a software analysis approach that implicitly uses multiplexers (MUXes) to identify which statements in the RTL code are error candidates. A drawback of this technique is that it produces a large set of potential error sites. One way to overcome this problem is to explicitly insert MUXes into the HDL code [2]. This approach makes use of hardware analysis techniques and improves the accuracy of error diagnosis. However, it requires simulating the whole circuit for debugging and correction, and is very time consuming.

There have been approaches based on SMT (Satisfiability Modulo Theories) solvers to debug RTL designs [9][15]. In [9], a diagnosis method is presented that extends SAT-based diagnosis to RTL designs. The design description as well as the error candidate signals are specified at the word level. Therefore, word-level MUXes are added to the error signals. Finally, a word-level SAT solver is used to solve the resulting formula. To find a single error location and fix it, a method is proposed in [15] that utilizes SMT solvers to check whether there is any replacement that corrects a given set of counterexamples. This method works only for single-error conditions. The method presented in [16] works for ESL designs; it finds error locations based on dynamic slicing and corrects them based on a mutation technique. Debugging approaches in this work ([16]) and others such as [5], [6], [9], and [15] are all based on counterexamples that use specific input stimuli. Because of this, they may fail when new counterexamples (not considered during debugging) are added.

In [11], a method called rank ordering of error candidates has been proposed to accelerate finding errors in the extracted reduced potential error location set. In this work, a new Probabilistic Confidence Score (PCS) has been suggested. This method takes error-masking situations into consideration in order to provide a more reliable and accurate debugging priority and reduce the error-searching process in the derived potential error set. The method is based on test sets generated by simulation. If the generated test set does not have enough coverage, the design error cannot be detected and hence the efficiency of debugging is reduced significantly. In addition, it cannot efficiently address a design with more than one error.

The ever growing usage of DSP and multimedia applications necessitates an efficient method to handle their debugging problem. For such applications, some works have been suggested in [12] and [14]. The work in [14] targets verification of pipeline processors with reconfigurable functional units. In addition, it utilizes complex formal methods such as Correspondence Checking and Positive Equality. The work in [12] can find and correct errors in an RTL design automatically. Although it uses a strong high level decision diagram, it cannot debug a design with more than two errors in a short time, because it uses exhaustive techniques for debugging a buggy design. To enhance the efficiency of [12], a method was presented in [18] to reduce debugging time using a simple heuristic approach that avoids computing all mutants. In practice, this method still cannot handle more than two bugs in a short run time.

Dealing with the debugging of datapath dominated designs requires a scalable representation model. In recent years, a strong and scalable high level decision diagram called the Horner Expansion Diagram (HED) has been proposed [13]. In order to verify polynomial datapath designs over bit-vectors, the authors of [4] have enhanced the HED to manipulate modular arithmetic circuits and called it Modular-HED (M-HED). This decision diagram has a compact and canonical form, and is close to high-level descriptions of a design. The other properties of M-HED, such as the facility for expressing primary outputs of a

design in terms of its primary inputs in polynomial form, presenting state variables in terms of integer equations in a formal model, and the availability of arithmetic operations at the word level, have made it a powerful and scalable platform for verification of datapath dominated applications, as mentioned in [3][4][12][13][18].
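The Horner expansion underlying HED rewrites a polynomial F with respect to its top variable x as F = const + x · F', applied recursively. A minimal sketch of that decomposition on coefficient lists (an illustration only, not the authors' M-HED, which additionally handles modular bit-vector arithmetic):

```python
def horner_split(coeffs):
    """Split a polynomial given as coeffs [c0, c1, c2, ...]
    (meaning c0 + c1*x + c2*x^2 + ...) into (const, rest) so that
    p(x) = const + x * rest(x), mirroring the Horner decomposition."""
    return coeffs[0], coeffs[1:]

def horner_eval(coeffs, x):
    # Evaluate recursively via F = const + x * F'
    if not coeffs:
        return 0
    const, rest = horner_split(coeffs)
    return const + x * horner_eval(rest, x)

# 5 + 3x + 2x^2 at x = 4: 5 + 12 + 32 = 49
print(horner_eval([5, 3, 2], 4))
```

Each recursive split corresponds to one HED node: the constant term labels the low edge and the remaining factor the high edge for the current variable.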

In this paper, we propose a novel debugging approach for datapath dominated applications with multiple design errors. Fig. 1 shows the proposed equivalence checking and debugging approach.

In our method we take a buggy implementation, and an algorithmic level specification of the design as inputs. First, the equivalence between the implementation and specification is checked using M-HED. If they are not equivalent, debugging is started. The debugging process includes three phases: 1) equivalence checking and looking for buggy outputs using M-HED, 2) finding a reduced ordered set of error candidates by using enhanced static slicing, and 3) ranking error candidates through a priority criterion.

Fig. 1. Proposed method for debugging of polynomial datapath designs

In summary, this paper makes the following key contributions:

• We combine backward and forward path tracing and incorporate M-HED to create an efficient way for debugging polynomial datapaths.

• We propose a scalable and formal debugging technique for datapath dominated designs. Our technique derives a reduced and ordered set of potential error candidates by taking advantage of M-HED. It is not limited to counterexamples; instead, we formally check the implementation against the specification using M-HED. Therefore, we are guaranteed to locate all error candidates.

The rest of this paper is organized as follows: Section II describes program slicing. Section III presents our proposed debugging method. Section IV gives experimental results, and finally, conclusions and future directions are presented in Section V.

II. PROGRAM SLICING

Program slicing [17] is a software engineering technique for extracting the parts of a program that have an impact on a selected set of variables. Portions of the program that cannot affect these variables are discarded and hence a reduced set of statements, called a slice, is obtained. Program slicing can reduce the size of the debugging problem significantly, and can be categorized into static slicing and dynamic slicing [17]. Static slicing considers dependencies between all statements in the program, while dynamic slicing considers only the statements that are actually executed for a particular set of inputs.
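The backward-closure idea behind static slicing can be sketched on a list of assignments. The statements below are hypothetical stand-ins (loosely echoing the len1 = len statement of Example 1), not the paper's tool:

```python
def static_slice(statements, target):
    """statements: ordered list of (lhs, set_of_rhs_vars) assignments
    for straight-line code (e.g. after symbolic simulation).
    Return indices of statements that can affect `target`,
    i.e. the backward static slice."""
    needed = {target}
    kept = []
    for i in reversed(range(len(statements))):
        lhs, rhs = statements[i]
        if lhs in needed:
            kept.append(i)
            needed |= rhs          # the rhs variables now matter too
    return sorted(kept)

prog = [
    ("t1",   {"tmr", "c"}),    # contributes to out
    ("t2",   {"tmi", "s"}),    # contributes to out
    ("len1", {"len"}),         # does not reach out: sliced away
    ("out",  {"t1", "t2"}),
]
print(static_slice(prog, "out"))   # len1's statement is excluded
```

Because the slice is computed from all dependencies rather than one executed path, it stays valid for every input assignment, which is exactly the property the proposed method relies on.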

We intend to use the concept of software program slicing for debugging hardware descriptions. In order to formally represent the description of a design, we use a data dependency graph for datapath designs. To make these models clear, we briefly define some terminology that we will use in the rest of the paper.

Definition 1 (Flowgraph): A flowgraph is a structure (N, E, n0), where N is a set of nodes (ni), E is a set of edges, (N, E) is a digraph, and n0 is a member of N such that there is a path from n0 to all other nodes in N.

Definition 2 (Path and Distance): A path in a digraph from a node n1 to a node n2 is a list of nodes p0, p1, ..., pk such that p0 = n1, pk = n2, and for all i, 0 ≤ i ≤ k − 1, (pi, pi+1) is in E. The least number of edges between n1 and n2 determines the distance between n1 and n2 [17].

Definition 3 (Hammock Graph): A hammock graph, a special case of a flowgraph, is a structure (N, E, n0, ne) with the property that (N, E, n0) and (N, E⁻¹, ne) are both flowgraphs, where n0 is the initial node, ne is the end node, and, as usual, E⁻¹ = {(a, b) | (b, a) is in E} [17]. There is a path from n0 to all other nodes in N, and from every node of N, excluding ne, there is a path to ne.

Definition 4 (Data Dependency Graph): A data dependency graph is a structure (N, E, Z), where N is a set of nodes, E is a set of edges, and Z ⊆ N is the set of output variables. An edge between two nodes (m, n) shows that n is dependent on m. Unlike the Hammock graph, which is a single-entry single-exit graph, this graph can have several inputs/outputs.

Definition 5 (Impaction): Node A has an impact on node B in the data dependency graph if there is a path between nodes A and B.

To construct a data dependency graph, we perform symbolic simulation to extract a sequence of RTL lines and the interactions between them. Then static slicing is started from the output lines of this sequence. The result of this slicing is a data dependency graph. Hence, the concurrency of RTL code does not affect our approach.

Example 1: Fig. 2 shows a simple Verilog code in the HDL code column. Suppose that tmr, tmi, len, c, s and windex are inputs and out is an output. Moreover, suppose that the value of windex is zero, while the other inputs have specific values. The second and third columns show static and dynamic slicing for line out. Filled circles in these columns mark statements that influence line out, while hollow circles mark statements that do not. As shown, node n3 (len1 = len) does not affect variable out, so it is excluded from the slice. Because windex is zero, the if part of the conditional statement is executed, so nodes n4 and n5 are excluded from dynamic slicing. The last two columns of Fig. 2 represent the Hammock and Data Dependency graphs. As the dependency graph shows, all nodes have an impact on the out node. It is clear that static slicing involves more statements of the original program (see Fig. 2) than dynamic slicing. This is due to the fact that static slicing keeps the behavior of the original program for all possible input values. Hence, it is possible that after static slicing the size of the slice and the original RTL program will be the same. Therefore, many debugging approaches use dynamic slicing instead of static slicing to narrow down the search space for the causes of design errors in their Design Under Verification (DUV). However, if we use dynamic slicing, we cannot ensure that the result of equivalence checking remains valid when new counterexamples are used for the DUV. To use the advantages of static slicing and alleviate some of its disadvantages, after static

slicing and constructing a data dependency graph, we perform a backward path tracing to find backward potential error locations. Then a forward path tracing is done to find forward potential error locations. The intersection of these path tracings forms a reduced list of potential error locations. The details are described in the following section.
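The combination of backward and forward path tracing can be sketched as two reachability passes over a dependency digraph whose intersection yields the candidate set. The toy graph and function names below are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def reachable(adj, sources):
    """All nodes reachable from `sources` by following edges in `adj`."""
    seen, stack = set(), list(sources)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, ()))
    return seen

def candidate_set(edges, buggy_outputs, operative_inputs):
    """Intersect backward tracing from buggy outputs with forward
    tracing from operative inputs, keeping intermediate nodes only."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for m, n in edges:          # edge (m, n): n depends on m
        fwd[m].append(n)
        bwd[n].append(m)
    pelb = reachable(bwd, buggy_outputs)      # backward path tracing
    pelf = reachable(fwd, operative_inputs)   # forward path tracing
    return (pelb & pelf) - set(buggy_outputs) - set(operative_inputs)

# toy graph: i1, i2 -> a -> b -> out1 ; i2 -> c -> out2
edges = [("i1", "a"), ("i2", "a"), ("a", "b"),
         ("b", "out1"), ("i2", "c"), ("c", "out2")]
print(sorted(candidate_set(edges, {"out1"}, {"i1"})))
```

Here only the nodes lying on some input-to-output path through both traversals survive, which is what shrinks the potential error location list.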

Fig. 2. Static and dynamic slicing, and Hammock, and Data Dependency graphs of a Verilog code

III. PROPOSED DEBUGGING APPROACH

Our debugging method consists of three phases. In the first phase, equivalence checking is done between the specification and the implementation to find buggy outputs using M-HED. These buggy outputs are defined in terms of operative primary inputs (i.e., inputs that affect a specific output). In the second phase, backward path tracing on the data dependency graph is done for each buggy output to find the intermediate nodes that affect the buggy outputs (PELBs in Fig. 5(a)). Then, forward path tracing from the operative inputs (found in Phase 1) is done to find the intermediate nodes that are affected by the operative inputs (PELFs in Fig. 5(b)). The intersection of PELBs and PELFs constitutes a reduced potential error location set (RPELs in Fig. 5(c)). Finally, in the third phase, the final error candidates are ranked in decreasing order of their probability of being an error. These phases are explained in detail in the following subsections.

Phase1: Equivalence Checking and Finding Buggy Outputs

As mentioned, the first phase of our verification and debugging method checks the equivalence between the specification and the implementation of the design. The pseudo code shown in Fig. 3 demonstrates the steps of this phase. Assume a specification (MS) and an implementation (MI) are given. Note that the specification used as a golden model is C code, and the HDL implementation is often modeled with a fixed-size datapath architecture. Hence, equivalence verification of the two polynomial functions extracted from the high-level specification and the implementation should be carried out over a finite word-length. To do so, M-HED is used as a canonical representation of the polynomial functions over finite word-lengths. First, symbolic simulation is done, which leads to a list of assignments for both the specification and the implementation. It is worth mentioning that the C and RTL codes used in our method do not run forever but terminate at a certain point. In addition, in order to handle sequential loops, time frame expansion is done. We implement this by using vectors for intermediate register values. By using vectors instead of single variables, the outputs of a specific register in different clock cycles are discriminated from each other. The elements of this vector are treated as different variables. The number of expanded frames is defined according to the control information or the total latency of the circuit.
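The time-frame expansion described above can be sketched on a hypothetical accumulator register; the naming scheme (acc0, in0, ...) is an assumption for illustration:

```python
def unroll_accumulator(n_frames):
    """Time-frame expansion: the register `acc` becomes a vector
    acc[0..n_frames], one symbolic variable per clock cycle, so the
    register's values in different cycles stay distinguishable."""
    acc = ["acc0"]                        # initial register value
    for t in range(n_frames):
        # acc[t+1] = acc[t] + in[t], kept as a symbolic expression
        acc.append(f"({acc[t]} + in{t})")
    return acc

print(unroll_accumulator(3)[-1])
```

Each vector element is an independent symbolic variable, so the assignment list produced by symbolic simulation refers to the register's value in a specific clock cycle rather than to a single mutable name.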

With the above, symbolic simulation can be performed and the assignment lists can be generated easily. These lists completely describe the datapath behavior of the specification and the implementation. Then the HEDs of both the specification and the implementation are constructed (Lines 1 and 2 of Fig. 3). Following this, we use M-HED to check whether both HEDs are equivalent (Line 3, Fig. 3). If so, the implementation is bug free and we are done (Lines 4-6). Otherwise, they are not equivalent and we should debug the implementation to find potential error locations. To do so, we extract the Output Polynomials of the specification (OPs) and of the implementation (OPi) (Lines 7-8). We then subtract the corresponding polynomials for each individual output j (OPsj - OPij). All of the non-zero polynomials become members of the Buggy Outputs set (BOs) that should be considered for debugging (Line 9).

Unlike the previous approaches, we use M-HED instead of SAT/SMT solvers to check whether each primary output is correct or not. In addition to simplifying the process, M-HED can describe the difference between primary outputs of a specification and its implementation in terms of polynomials.

Equivalence-Checking(I, S)
1  HS = HED of specification (S);
2  HI = HED of implementation (I);
3  Equivalence checking of HS and HI using M-HED;
4  IF (HI == HS)
5      RETURN bug free;
6  ELSE
7      OPs = output polynomials in the specification;
8      OPi = output polynomials in the implementation;
9      BOs = set of buggy outputs whose members are the non-zero results of the corresponding output-polynomial subtractions (OPsj - OPij);
10     RETURN BOs;

Fig. 3. Equivalence checking and finding buggy outputs
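Lines 7-9 of Fig. 3 can be sketched with output polynomials stored as monomial-to-coefficient maps; the polynomials below are toy data, and the real M-HED subtraction operates on the diagram itself rather than on explicit dictionaries:

```python
from collections import Counter

def poly_sub(p, q):
    """Subtract polynomials stored as {monomial_tuple: coeff} dicts,
    dropping cancelled terms (OPsj - OPij)."""
    d = Counter(p)
    d.subtract(q)
    return {m: c for m, c in d.items() if c != 0}

def buggy_outputs(spec, impl):
    """BOs = outputs whose spec/impl polynomials differ
    (non-zero subtraction result, Fig. 3 line 9)."""
    return {out for out in spec if poly_sub(spec[out], impl[out])}

spec = {"out0": {("inp0",): 2, ("inp1",): 1},
        "out1": {("inp0", "inp1"): 1}}
impl = {"out0": {("inp0",): 2, ("inp1",): 1},   # matches the spec
        "out1": {("inp0", "inp1"): -1}}         # wrong sign: buggy
print(sorted(buggy_outputs(spec, impl)))
```

The non-zero difference polynomial is kept, because Phase 2 reads the operative inputs directly off its monomials.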


Fig. 4. Data dependency graph of Verilog code describing FFT4 circuit

Example 2: To demonstrate Phase 1, suppose that we have a correct HDL implementation of a C code for the Fast Fourier Transform procedure (FFT4). Fig. 4 depicts the Data Dependency graph of this implementation. As shown, the design has ten inputs (inp0, ..., inp9) and eight outputs (out0, ..., out7). All input and output nodes have a corresponding node in the specification.

Fig. 5. Top view of the procedure for obtaining the potential error location set with (a) backward path tracing, (b) forward path tracing, and (c) obtaining the reduced potential error location set

We deliberately inject three bugs into our implementation; these bugs directly affect n9, n15, and n21, respectively, as indicated in Fig. 4. These errors can be any form of design error in an RTL code that changes the functionality of the design. In this example, the bugs are introduced as incorrect operators. As mentioned, M-HED can represent the primary outputs of a circuit in terms of the primary inputs in polynomial form. Therefore, by using the Equivalence-Checking(I, S) procedure of Fig. 3, the subtraction results of the polynomials are as follows:

subr0 = out0spc - out0imp = inp2×((-2)×inp9)
subr1 = out1spc - out1imp = inp5×(2×inp8) + (inp2×((-2)×inp6 + (2×inp7)))
subr2 = out2spc - out2imp = inp2×(2×inp9)
subr3 = out3spc - out3imp = inp5×(2×inp9)
subr4 = out4spc - out4imp = 0
subr5 = out5spc - out5imp = 2×inp8 + 2×inp9
subr6 = out6spc - out6imp = 0
subr7 = out7spc - out7imp = 0

Therefore, BOs = {out0, out1, out2, out3, out5} is the set of buggy outputs, as highlighted in Fig. 4.

Phase 2: Deriving a Reduced Set of Error Candidates

In the second phase, we derive a reduced set of error candidates. Fig. 6 shows the steps needed to be taken for this phase. For all elements of buggy outputs (BOsj) obtained in the previous phase, we traverse the data dependency graph by backward path tracing to find Effective Intermediate Nodes which affect (have impact on) each buggy output (EIN(BOsj)) (Lines 1-3). The union of these nodes (Line 4, Fig. 6) forms the Potential Error Locations in the Backward set (PELBs in Fig. 5(a)).

In this case, PELBs may still contain many error candidates that are not true error locations. To extract a Reduced Potential Error Locations set (RPELs), we first find the Operative Inputs that affect each of the buggy outputs (OI(BOsj)), using the polynomial subtraction results of the previous phase (Lines 5-7). Since M-HED is able to describe the primary outputs in terms of the primary inputs, this set indicates an effective set of primary inputs that have an effect on the buggy outputs. Then we traverse the data dependency graph by forward path tracing from these operative inputs to find the Effective Intermediate Nodes that can be affected by each operative input j (EIN(inpj)) (Lines 8-10, Fig. 6). The union of the EIN(inpj) sets gives the intermediate nodes that are affected by the inputs in the subtraction result set. This results in PELFs, shown in Fig. 5(b) (Line 11, Fig. 6).

The intersection of EIN(BOsj) and EIN(inpj) results in a Reduced Potential Error Location set for each buggy output j (RPELsj). Moreover, the intersection of PELBs and PELFs constitutes the overall Reduced Potential Error Location set (RPELs in Fig. 5(c)) (Lines 12-15).

Example 3: In this example, we show steps for obtaining RPELs. Let us consider the Data Dependency graph in Fig. 4. By traversing this graph, those intermediate nodes EIN(BOsj) that affect the buggy outputs are obtained as follows:

EIN(BOs0) = {n2, n6, n7, n8, n9, n10, n14, n22}
EIN(BOs1) = {n1, n5, n7, n8, n9, n10, n13, n21}
EIN(BOs2) = {n2, n6, n7, n8, n9, n10, n14}
EIN(BOs3) = {n1, n5, n7, n8, n9, n10, n13}
EIN(BOs5) = {n3, n11, n15}

PELBs is the union of all the above sets, which is {n1, n2, n3, n5, n6, n7, n8, n9, n10, n11, n13, n14, n15, n21, n22}.

Deriving-RPELSET(CGI, BOs)
1  FOR (all members of BOs (BOsj ∈ BOs))  // backward tracing
2      EIN(BOsj) = traverse the data dependency graph to find which intermediate nodes impact each BOsj;
3  END FOR;
4  PELBs = ∪j EIN(BOsj);
5  FOR (all members of BOs (BOsj ∈ BOs))
6      OI(BOsj) = inputs of the subtraction result for corresponding outputs of imp. and spec. in BOs;
7  END FOR;
8  FOR (each set OI(BOsj))  // forward tracing
9      EIN(inpj) = intermediate nodes affected by each member of each set OI(BOsj);
10 END FOR;
11 PELFs = ∪j EIN(inpj);
12 RPELsj = EIN(BOsj) ∩ EIN(inpj);
13 determine which buggy outputs each intermediate node can affect, according to the sets RPELsj;
14 RPELs = PELBs ∩ PELFs;
15 RETURN RPELsj and RPELs;

Fig. 6. Deriving the reduced potential error location set

It is obvious that a 16% (≈ 3/18) reduction is obtained, because the number of initial error candidates is 18 while |PELBs| is 15. On the other hand, based on the subrj computed in Phase 1 (see Example 2), the operative primary inputs of each buggy output are as follows:

OI(BOs0) = {inp2, inp9}
OI(BOs1) = {inp2, inp5, inp6, inp7, inp8}
OI(BOs2) = {inp2, inp9}
OI(BOs3) = {inp5, inp9}
OI(BOs5) = {inp8, inp9}

According to such information, we will be able to indicate those intermediate nodes that are affected by primary inputs determined in the previous step as follows:

EIN(inp2,9) = {n8, n9, n11, n13, n14, n15, n17, n19, n21, n22}
EIN(inp2,5,6,7,8) = {n7, n8, n9, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n21, n22}
EIN(inp5,9) = {n7, n9, n11, n13, n14, n15, n17, n19, n21, n22}
EIN(inp8,9) = {n9, n11, n13, n14, n15, n17, n19, n21, n22}

PELFs is the union of all the above sets, which is {n7, n8, n9, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n21, n22}. Also we have:

RPEL0 = EIN(BOs0) ∩ EIN(inp2,9) = {n8, n9, n14, n22}
RPEL1 = EIN(BOs1) ∩ EIN(inp2,5,6,7,8) = {n7, n8, n9, n10, n13, n21}
RPEL2 = EIN(BOs2) ∩ EIN(inp2,9) = {n8, n9, n14}
RPEL3 = EIN(BOs3) ∩ EIN(inp5,9) = {n7, n9, n13}
RPEL5 = EIN(BOs5) ∩ EIN(inp8,9) = {n11, n15}

This way, we are able to figure out which intermediate nodes affect which buggy outputs as listed below:

n7 → {BOs1, BOs3}
n8 → {BOs0, BOs1, BOs2}
n9 → {BOs0, BOs1, BOs2, BOs3}
n10 → {BOs1}
n11 → {BOs5}
n13 → {BOs1, BOs3}
n14 → {BOs0, BOs2}
n15 → {BOs5}
n21 → {BOs1}
n22 → {BOs0}

Therefore, RPELs = PELBs ∩ PELFs = {n7, n8, n9, n10, n11, n13, n14, n15, n21, n22}. As can be seen, |RPELs| is 10, so the reduction rate is 8/18 ≈ 44%.
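The set arithmetic of this example can be replayed directly; the sets below are copied verbatim from Example 3 (keyed by buggy-output index):

```python
# Backward tracing: intermediate nodes affecting each buggy output
EIN_BO = {
    0: {"n2", "n6", "n7", "n8", "n9", "n10", "n14", "n22"},
    1: {"n1", "n5", "n7", "n8", "n9", "n10", "n13", "n21"},
    2: {"n2", "n6", "n7", "n8", "n9", "n10", "n14"},
    3: {"n1", "n5", "n7", "n8", "n9", "n10", "n13"},
    5: {"n3", "n11", "n15"},
}
# Forward tracing: intermediate nodes reached from the operative inputs
EIN_IN = {
    0: {"n8", "n9", "n11", "n13", "n14", "n15", "n17", "n19", "n21", "n22"},
    1: {"n7", "n8", "n9", "n10", "n11", "n12", "n13", "n14", "n15",
        "n16", "n17", "n18", "n19", "n21", "n22"},
    2: {"n8", "n9", "n11", "n13", "n14", "n15", "n17", "n19", "n21", "n22"},
    3: {"n7", "n9", "n11", "n13", "n14", "n15", "n17", "n19", "n21", "n22"},
    5: {"n9", "n11", "n13", "n14", "n15", "n17", "n19", "n21", "n22"},
}

RPEL = {j: EIN_BO[j] & EIN_IN[j] for j in EIN_BO}   # per-output sets
PELBs = set().union(*EIN_BO.values())               # backward union
PELFs = set().union(*EIN_IN.values())               # forward union
RPELs = PELBs & PELFs                               # reduced set
print(sorted(RPEL[0]), len(RPELs))
```

Running this reproduces the sets stated in the text, including |RPELs| = 10 and the fact that RPELs equals the union of the per-output intersections.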

Phase3: Error Candidate Ranking

Although the reduced potential error set can significantly reduce the number of error candidates, it still contains several potential error locations. From these locations, identifying the true design errors by examining all the error candidates one by one requires considerable time and effort. To alleviate this problem, we introduce a priority criterion for ranking error candidates.

Let TNBO and NBOAj be the Total Number of Buggy Outputs and the Number of Buggy Outputs Affected by a specific error candidate j (RPELsj ∈ RPELs), respectively. In addition, let IDNBk be the Inverse of the Distance between the suspected Node and the kth Buggy output. These parameters can be obtained from RPELsj and the Data Dependency graph. The following function defines an efficient criterion for ranking the final error candidates:

Priority(j) = NBOAj / TNBO + (1 / NBOAj) × Σ(k = 1 ... NBOAj) IDNBk

The basic idea behind defining the first term is the fact that if an intermediate node affects more buggy outputs, it has more chance to be a true error location, and hence, should have a relatively higher rank. The second term also indicates that if a suspected node is closer to buggy outputs, it has a better chance to be a true error, because it has less interaction with other nodes in its path to the outputs. Then, nodes with the same priority are grouped in the same class of priority. They are named First, Second, Third, and Other Classes of Priorities (FCPs, SCPs, TCPs, OCPs).

Example 4: Let us consider the RPELs computed in Phase 2 (see Example 3). The priorities are as follows:

Priority-RPELs7 = 1×(2/5) + 1×(1/3+1/2)/2 = 0.8
Priority-RPELs8 = 1×(3/5) + 1×(1/3+1/2+1/3)/3 = 1.0
Priority-RPELs9 = 1×(4/5) + 1×(1/3+1/2+1/3+1/2)/4 = 1.2
Priority-RPELs10 = 1×(1/5) + 1×(1/3)/1 = 0.5
Priority-RPELs11 = 1×(1/5) + 1×(1/2)/1 = 0.7
Priority-RPELs13 = 1×(2/5) + 1×(1+1/2)/2 = 1.1
Priority-RPELs14 = 1×(2/5) + 1×(1+1/2)/2 = 1.1
Priority-RPELs15 = 1×(1/5) + 1×(1)/1 = 1.2
Priority-RPELs21 = 1×(1/5) + 1×(1)/1 = 1.2
Priority-RPELs22 = 1×(1/5) + 1×(1)/1 = 1.2

Therefore, we will have four classes of priorities: FCPs = {n9, n15, n21, n22}, SCPs = {n13, n14}, TCPs = {n8}, and OCPs = {n7, n10, n11}. Three nodes in FCPs represent true errors, which can easily be distinguished from the other nodes.
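Example 4 can be reproduced with a short Python sketch of the ranking criterion. The inverse-distance lists below are read off the worked example (in the actual flow they come from the data dependency graph), and all names are illustrative rather than taken from the paper's implementation:

```python
# Ranking criterion from Phase 3:
#   Priority(j) = NBOA_j / TNBO + (1 / NBOA_j) * sum_k IDNB_k
TNBO = 5  # buggy outputs: BOs0, BOs1, BOs2, BOs3, BOs5

# node -> list of 1/distance values (IDNB), one per buggy output it affects;
# values copied from Example 4.
idnb = {
    "n7":  [1/3, 1/2],
    "n8":  [1/3, 1/2, 1/3],
    "n9":  [1/3, 1/2, 1/3, 1/2],
    "n10": [1/3],
    "n11": [1/2],
    "n13": [1, 1/2],
    "n14": [1, 1/2],
    "n15": [1],
    "n21": [1],
    "n22": [1],
}

def priority(dists, tnbo=TNBO):
    nboa = len(dists)  # number of buggy outputs this node affects
    return nboa / tnbo + sum(dists) / nboa

# Round to one decimal, matching the precision used in the example.
ranks = {n: round(priority(d), 1) for n, d in idnb.items()}

# Group nodes with equal priority, highest first: FCPs, SCPs, TCPs, and the
# remaining classes, which the text merges into OCPs.
classes = {}
for n, p in ranks.items():
    classes.setdefault(p, []).append(n)
for p in sorted(classes, reverse=True):
    print(p, sorted(classes[p], key=lambda s: int(s[1:])))
```

The top class printed is {n9, n15, n21, n22}, matching the FCPs reported above.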

IV. EXPERIMENTAL RESULTS

In order to demonstrate the effectiveness of this debugging technique for polynomial datapath designs, we apply it to several large designs common in DSP and multimedia applications. The benchmarks include AlphaBlending, an image-processing algorithm that blends a foreground picture with a background one. The other benchmarks are Differential Equation (DIFFEQ), Sobel, a convolution algorithm that is at the core of many image-processing applications, Finite Impulse Response (FIR), (Inverse) Discrete Cosine Transform (DCT, IDCT), and Fast Fourier Transform in three sizes (FFT32, FFT256, FFT512). We have fully automated our procedure in C++, and simulations are carried out on an Intel 3.1 GHz Core i5 with 8 GB of main memory running Linux, with Qt Creator as the IDE. We randomly inject typical RTL design errors, such as false state transitions, incorrect assignments, and incorrect operators, that change the functionality of the design at the primary outputs. The inputs to our algorithm are a list of assignments obtained by symbolic simulation. All results are averaged over ten runs of each experiment.

Table I tabulates our experimental results for single and multiple errors. The Benchmark and #LS columns indicate the name of the benchmark and the number of lines obtained after symbolic simulation; the #LS column also represents the initial search space for error candidates. The HEDTime column indicates the time needed to construct the HED and perform equivalence checking. The Single Error and Multiple Errors column groups report the results for a single error and for multiple errors, respectively. Sub-columns Red.PELB and Red.RPEL show the reduction of the initial search space (all lines in the assignments list) achieved by the backward potential error location set (PELBs) and by the reduced potential error location set (RPELs) generated in the second phase of our approach. The RunTime sub-columns state the time in seconds for equivalence checking, debugging, and ranking. The #FEC sub-column reports the number of final error candidates (number of lines of RTL code) after mapping the list of assignments back to the RTL code. The #ERR sub-column shows the number of injected errors in the multiple-error case. As shown, on average, the PELBs reduction alone achieves 64% and 34% reduction in error candidates for single and multiple errors, respectively. In addition, constructing RPELs raises the reduction for single and multiple errors to 85% and 73%, respectively. The RunTime columns show that on average we can debug our circuits in a short time (75.3 and 106.0 seconds for single and multiple errors, respectively).

Fig. 7 and Fig. 8 demonstrate the effectiveness of our error candidate ranking for single and multiple errors on each benchmark. Clearly, our proposed ranking method works well for a single error: in the best cases the true error is found in the first class of priorities (FCPs), while even in the worst case (IDCT), on average 93% of the errors are found in FCPs. For multiple errors, in the best case (AlphaBlending) all errors occur in FCPs, and in the worst case (FFT512), on average, 68% of the errors occur in FCPs, 20% in SCPs, 9% in TCPs, and only 3% in OCPs. Our method is also considerably faster and more scalable than gate-level diagnosis [5][6][7], because errors are modeled at a higher level of abstraction and hence much more complicated polynomials can be processed quickly.

Table I. Experimental results of the proposed method for nine benchmarks. The Red.PELB, Red.RPEL, #FEC, and RunTime columns are reported once for the single-error case and once (together with #ERR) for the multiple-error case; run times are in seconds.

Benchmark      | #LS    | HEDTime | Red.PELB | Red.RPEL | #FEC | RunTime | #ERR | Red.PELB | Red.RPEL | #FEC | RunTime
---------------|--------|---------|----------|----------|------|---------|------|----------|----------|------|--------
AlphaBlending  | 100    | 0.4     | 82%      | 95%      | 5    | 24.6    | 2    | 81%      | 91%      | 5    | 30.9
DIFFEQ         | 111    | 0.4     | 79%      | 88%      | 4    | 25.3    | 2    | 75%      | 85%      | 6    | 32.7
Sobel          | 587    | 1.8     | 52%      | 74%      | 11   | 43.6    | 2    | 27%      | 62%      | 13   | 66.2
FIR            | 1560   | 5.1     | 47%      | 71%      | 8    | 79.5    | 4    | 19%      | 72%      | 15   | 111.5
DCT            | 1620   | 5.2     | 55%      | 86%      | 14   | 87.3    | 5    | 22%      | 72%      | 23   | 114.6
IDCT           | 592    | 1.9     | 54%      | 84%      | 15   | 45.0    | 5    | 15%      | 65%      | 21   | 88.2
FFT32          | 533    | 1.7     | 72%      | 93%      | 10   | 38.5    | 3    | 24%      | 81%      | 12   | 59.4
FFT256         | 6638   | 13.7    | 71%      | 84%      | 13   | 115.0   | 3    | 19%      | 75%      | 15   | 157.8
FFT512         | 30170  | 58.4    | 69%      | 85%      | 14   | 218.6   | 4    | 21%      | 64%      | 21   | 293.0
Average        | 4656.7 | 10.6    | 64%      | 85%      | 9.84 | 75.3    | 3.3  | 34%      | 73%      | 14.5 | 106.0

Clearly, a simple RTL design can translate into a complex gate-level netlist, which causes the number of error candidates to grow quickly. Therefore, debugging at the gate level is difficult and limited to specific errors [2]. In addition, many debugging methods rely on a limited number of counterexamples [5, 7, 9, 15]; hence, their results may fail or change when new counterexamples, not considered during debugging, arise. On the other hand, because of the scalability problem, considering all counterexamples is not practical. We instead use equivalence checking between the implementation and the specification, and thus we guarantee locating all error candidates. Furthermore, in comparison with [11, 12, 15, 18], which cannot debug designs with more than two design errors, our proposed method handles them efficiently. Hence, this method can be combined with work such as [18] to facilitate the correction step.

Fig. 7. Ranking results for single error

Fig. 8. Ranking results for multiple errors

V. CONCLUSIONS

In this paper we proposed a novel formal and scalable debugging technique, based on a combination of forward and backward path tracing incorporating M-HED, to deal with the problem of multiple design errors in polynomial datapath designs. Empirical results show that our method can significantly reduce the number of error candidates and rank them in a short time with high accuracy. One possible avenue for future work is to apply this debugging technique to control-dominated applications and to more complicated designs. In addition, we plan to add an auto-correction capability to our method, to create a bug-free implementation automatically.

REFERENCES

[1] P. Rashinkar, P. Paterson, and L. Singh, System-on-a-Chip Verification: Methodology and Techniques. Boston, MA: Kluwer, 2000.

[2] K. Chang, I. Wagner, V. Bertacco, and I. L. Markov, “Automatic Error Diagnosis and Correction for RTL Designs,” in Proc. of HLDVT'10, pp. 683-688, 2010.

[3] B. Alizadeh, and M. Fujita, “A Unified Framework for Equivalence Verification of Datapath Oriented Applications,” in IEICE Transactions, vol. E92-D, no. 5, pp. 985-994, 2009.

[4] B. Alizadeh, and M. Fujita, “Modular Datapath Optimization and Verification based on Modular-HED,” in IEEE Trans. on CAD, vol. 29, no. 9, pp. 1422-1435, 2010.

[5] A. Smith, A. Veneris, M. F. Ali, and A. Viglas, “Fault Diagnosis and Logic Debugging Using Boolean Satisfiability,” in IEEE Trans. on CAD, vol. 24, no. 10, pp. 1606-1621, 2005.

[6] A. Sülflow, G. Fey, R. Bloem, and R. Drechsler, “Using Unsatisfiable Cores to Debug Multiple Design Errors,” in Proc. of GLSVLSI'08, pp. 77-82, 2008.

[7] S. Safarpour, H. Mangassarian, A. Veneris, M. H. Liffiton, and K. A. Sakallah, “Improved Design Debugging Using Maximum Satisfiability,” in Proc. of FMCAD'07, pp. 13-19, 2007.

[8] C. H. Shi, and J. Y. Jou, “An Efficient Approach for Error Diagnosis in HDL Design,” in Proc. of ISCAS'03, pp. 732-735, 2003.

[9] S. Mirzaeian, F. Zheng, and K.-T.T. Cheng, "RTL Error Diagnosis Using a Word-Level SAT-Solver," in Proc. of ITC'08, pp. 1-8, 2008.

[10] S. Horeth, and R. Drechsler, “Formal verification of Word-level Specifications,” in Proc. of DATE’99, pp. 52-58, 1999.

[11] T. Jiang, C. Liu, and J. Jou, "Accurate Rank Ordering of Error Candidates for Efficient HDL Design Debugging," in IEEE Trans. on CAD, vol. 28, no. 2, pp. 272-284, 2009.

[12] B. Alizadeh, "A Formal Approach to Debug Polynomial Datapath Designs," in Proc of ASP-DAC'12, pp.683-688, 2012.

[13] B. Alizadeh, M. Fujita, “A Canonical and Compact Hybrid Word-Boolean Representation as a Formal Model for Hardware/Software Co-Designs,” in Proc. of CFV’07, pp. 15–29, 2007.

[14] M. N. Velev, and P. Gao, “Automatic Formal Verification of Reconfigurable DSPs,” in Proc. of ASP-DAC '11, pp. 293-296, 2011.

[15] T. Matsumoto, S. Ono, and M. Fujita, “An Efficient Method to Localize and Correct Bugs in High-Level Designs Using Counterexamples and Potential Dependence,” in Proc. of VLSI-SoC'12, pp. 291-294, 2012.

[16] U. Repinski, H. Hantson, M. Jenihhin, J. Raik, R. Ubar, G. Di Guglielmo, G. Pravadelli, and F. Fummi, “Combining Dynamic Slicing and Mutation Operators for ESL Correction,” in Proc. of ETS'12, pp. 1-6, 2012.

[17] M. Weiser, “Program Slicing,” in IEEE Trans. on Software Engineering, vol.10, no.4, pp.352–357, 1984.

[18] B. Alizadeh, and P. Behnam, “Formal Equivalence Verification and Debugging Techniques with Auto-Correction Mechanism for RTL Designs,” in Microprocessors and Microsystems, vol. 37, no. 8, pp. 1108-1121, 2013.

[Figs. 7 and 8: bar charts plotting the share of each priority class (FCPs, SCPs, TCPs, OCPs) in debugged errors (%) for each benchmark, for single and multiple errors respectively.]